# Temporal Downsampling Analysis
by wj
## Intuition:

Here is video_0001.json.
It contains the pedestrian bounding-box trajectories from a 30Hz video.
We analyzed it and found that 30Hz is too frequent for ChatGPT to understand the current situation: there is too much data, while a person's position may change very little in 1/30 s.
Therefore, I would like some suggestions: how should we split the object tracking information and feed the segments to ChatGPT?

## GPT analysis:
Certainly, handling high-frequency data like 30Hz for object tracking in a video can be quite challenging, especially when trying to integrate it with a system like ChatGPT. The trick is to balance the need for real-time feedback against the ability to process and interpret the data meaningfully. Here are some suggestions on how you could approach this:

### 1. Temporal Downsampling
Instead of analyzing every frame at 30Hz, you could downsample the data. For instance, you could take one frame per second or every half-second, depending on how critical the time factor is in your application. This would reduce the load on ChatGPT and make it more manageable.
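As a rough sketch of how a target rate such as "one frame every half-second" maps to a frame stride, the snippet below converts a source and target rate into a step size. The function name `downsample_by_rate` and its parameters are hypothetical, and it assumes frame ids are zero-padded strings so that sorting them gives chronological order.

```python
def downsample_by_rate(video_data, source_hz=30.0, target_hz=2.0):
    """Keep roughly `target_hz` frames per second from `source_hz` data.

    Hypothetical helper: assumes zero-padded frame ids ("0000", "0001", ...).
    """
    if target_hz <= 0 or target_hz > source_hz:
        raise ValueError("target_hz must be in (0, source_hz]")
    # 30 Hz -> 2 Hz gives a stride of 15 frames
    stride = max(1, round(source_hz / target_hz))
    frame_ids = sorted(video_data)
    return {fid: video_data[fid] for fid in frame_ids[::stride]}


# Synthetic input: 60 empty frames, i.e. 2 seconds at 30 Hz
frames = {f"{i:04d}": {"person": {}, "car": {}} for i in range(60)}
kept = downsample_by_rate(frames, source_hz=30, target_hz=2)
print(list(kept))  # ['0000', '0015', '0030', '0045']
```

A fixed stride is the simplest option, but it can skip over short events that fall between kept frames.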

### 2. Delta-based Analysis
Instead of analyzing every frame, only analyze a frame when a significant change in the bounding boxes or any other parameter is detected. You could define what a 'significant change' means in your context—e.g., a person moves more than a certain distance, changes direction, or there's a new object in the frame.
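One possible reading of "significant change" is that a box center has moved by more than some pixel threshold since the last frame that was kept, or that the set of tracked ids has changed. The sketch below follows that reading. The names `delta_filter` and `min_shift_px` and the 10-pixel threshold are illustrative assumptions, not values taken from the data.

```python
import math


def box_center(box):
    """Center (x, y) of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def delta_filter(video_data, min_shift_px=10.0, category="person"):
    """Keep a frame if a track appeared or vanished, or if any box center
    moved at least `min_shift_px` since the last kept frame (assumed rule)."""
    kept = {}
    last = None  # boxes from the most recently kept frame
    for fid in sorted(video_data):
        boxes = video_data[fid].get(category, {})
        if last is None or set(boxes) != set(last):
            changed = True
        else:
            changed = any(
                math.dist(box_center(boxes[tid]), box_center(last[tid])) >= min_shift_px
                for tid in boxes
            )
        if changed:
            kept[fid] = video_data[fid]
            last = boxes
    return kept


# Synthetic input: one person drifting 4 px to the right per frame
walk = {
    f"{i:04d}": {"person": {"1": [4 * i, 0, 4 * i + 10, 20]}, "car": {}}
    for i in range(7)
}
print(list(delta_filter(walk, min_shift_px=10)))  # ['0000', '0003', '0006']
```

Comparing against the last kept frame, rather than the previous frame, lets slow drift accumulate until it crosses the threshold.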

### 3. Segment Aggregation
You could aggregate data over a small time window (e.g., 0.5 or 1 second) and analyze the aggregated data. For example, take the average or median bounding box coordinates for each person over that window and feed that to ChatGPT.
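A minimal sketch of window aggregation, assuming the same frame-id to category to track-id layout as the JSON files here: each window of consecutive frames is collapsed into one per-coordinate median box per track. The function name `aggregate_windows` and the `"start-end"` key format are hypothetical choices.

```python
from statistics import median


def aggregate_windows(video_data, window=15, category="person"):
    """Collapse each run of `window` consecutive frames into one entry holding
    the per-coordinate median box of every track seen in that run."""
    frame_ids = sorted(video_data)  # assumes zero-padded frame ids
    summary = {}
    for start in range(0, len(frame_ids), window):
        chunk = frame_ids[start:start + window]
        tracks = {}  # track id -> list of boxes seen in this window
        for fid in chunk:
            for tid, box in video_data[fid].get(category, {}).items():
                tracks.setdefault(tid, []).append(box)
        summary[f"{chunk[0]}-{chunk[-1]}"] = {
            tid: [median(coord) for coord in zip(*boxes)]
            for tid, boxes in tracks.items()
        }
    return summary


# Synthetic input: three frames of one track moving 2 px per frame
clip = {
    "0000": {"person": {"1": [0, 0, 10, 10]}, "car": {}},
    "0001": {"person": {"1": [2, 0, 12, 10]}, "car": {}},
    "0002": {"person": {"1": [4, 0, 14, 10]}, "car": {}},
}
print(aggregate_windows(clip, window=3))  # {'0000-0002': {'1': [2, 0, 12, 10]}}
```

The median is less sensitive than the mean to a single bad detection inside the window, at the cost of hiding motion that happens within it.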

### 4. Priority-based Filtering
You could introduce a priority or scoring system based on the initial assessment of the scene. For example, focus more on pedestrians that are closer to the road or car, as they would be at higher risk. Only send updates related to high-priority objects to ChatGPT for further analysis.
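The JSON files contain only bounding boxes, so any priority score has to be derived from them. The sketch below uses box height as a crude stand-in for closeness to the camera, since taller boxes are usually nearer. This proxy, the function name `keep_top_priority`, and the `top_k` parameter are all assumptions for illustration. A real system would more likely use road geometry or distance estimates.

```python
def box_height(box):
    """Pixel height of an [x1, y1, x2, y2] box."""
    _, y1, _, y2 = box
    return y2 - y1


def keep_top_priority(video_data, top_k=1, category="person"):
    """Per frame, keep only the `top_k` tracks with the tallest boxes.
    Box height is an assumed proxy for proximity, not a calibrated distance."""
    filtered = {}
    for fid in sorted(video_data):
        boxes = video_data[fid].get(category, {})
        ranked = sorted(boxes, key=lambda tid: box_height(boxes[tid]), reverse=True)
        filtered[fid] = {tid: boxes[tid] for tid in ranked[:top_k]}
    return filtered


# Frame 0000 of the snippet: person 1 is about 240 px tall, person 2 about 123 px
sample = {
    "0000": {
        "person": {
            "1": [1391.74, 653.24, 1491.21, 893.04],
            "2": [461.4, 726.51, 544.0, 849.08],
        },
        "car": {},
    }
}
top = keep_top_priority(sample, top_k=1)
print(list(top["0000"]))  # ['1']
```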

### 5. Event-driven Analysis
Only analyze the frames where an 'event' occurs. An 'event' could be defined as any significant change in the scene—like a new pedestrian appearing, a pedestrian starting to cross the road, etc.
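One event type that can be read directly from the tracking JSON is a track id appearing or disappearing between consecutive frames. The sketch below detects only that. Events such as "starting to cross the road" would need scene information that is not in these files. The function name `detect_track_events` and the event labels are hypothetical.

```python
def detect_track_events(video_data, category="person"):
    """Return (frame_id, event, track_id) tuples for tracks that appear or vanish."""
    events = []
    previous = set()
    for fid in sorted(video_data):  # assumes zero-padded frame ids
        current = set(video_data[fid].get(category, {}))
        for tid in sorted(current - previous):
            events.append((fid, "appeared", tid))
        for tid in sorted(previous - current):
            events.append((fid, "disappeared", tid))
        previous = current
    return events


# Synthetic input: person 2 enters at frame 0001, person 1 leaves at frame 0002
scene = {
    "0000": {"person": {"1": [0, 0, 10, 10]}, "car": {}},
    "0001": {"person": {"1": [1, 0, 11, 10], "2": [50, 0, 60, 10]}, "car": {}},
    "0002": {"person": {"2": [51, 0, 61, 10]}, "car": {}},
}
for event in detect_track_events(scene):
    print(event)
# ('0000', 'appeared', '1')
# ('0001', 'appeared', '2')
# ('0002', 'disappeared', '1')
```

Only the frames named in the event list would then be sent to ChatGPT, together with a short description of each event.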

Let's try to simulate one of these approaches, like Temporal Downsampling, using the data snippet you provided. For simplicity, let's consider analyzing only every 2nd frame.

After applying temporal downsampling, we have reduced the frame count for analysis. Now only frames 0000, 0002, 0004, and 0006 are considered (4 of the 7 frames in the snippet), roughly halving the amount of data to process.

Would you like to proceed with evaluating the potential risk based on this downsampled data, or would you like to explore another approach?

In [None]:
# Provided JSON data for pedestrian tracking
video_data = {
    "0000": {
        "person": {
            "1": [1391.74, 653.24, 1491.21, 893.04],
            "2": [461.4, 726.51, 544.0, 849.0799999999999]
        },
        "car": {}
    },
    "0001": {
        "person": {
            "1": [1390.94, 653.24, 1492.01, 896.9],
            "2": [460.12, 726.9, 542.19, 848.6999999999999]
        },
        "car": {}
    },
    "0002": {
        "person": {
            "1": [1392.71, 655.3, 1494.38, 900.31],
            "2": [458.98, 727.31, 541.4100000000001, 849.67]
        },
        "car": {}
    },
    "0003": {
        "person": {
            "1": [1395.87, 655.63, 1497.9399999999998, 901.67],
            "2": [456.87, 727.77, 538.9300000000001, 849.64]
        },
        "car": {}
    },
    "0004": {
        "person": {
            "1": [1397.68, 655.03, 1502.1100000000001, 907.21],
            "2": [454.05, 727.58, 536.4, 849.94]
        },
        "car": {}
    },
    "0005": {
        "person": {
            "1": [1400.93, 655.43, 1505.75, 909.0],
            "2": [450.86, 727.18, 532.84, 849.0699999999999]
        },
        "car": {}
    },
    "0006": {
        "person": {
            "1": [1406.37, 654.94, 1513.6299999999999, 913.8100000000001],
            "2": [448.58, 727.04, 530.4399999999999, 848.77]
        },
        "car": {}
    },
}

# Temporal Downsampling: Only keep every 2nd frame for analysis
downsampled_video_data = {key: value for index, (key, value) in enumerate(video_data.items()) if index % 2 == 0}

downsampled_video_data



In [3]:
import json
import os

# Create a folder 'demo' for demonstration purposes
os.makedirs('demo/', exist_ok=True)

# Example input files in the 'bbox_txt_archive' folder (paths only; not used below)
sample_file_1 = '../bbox_txt_archive/cd09a73f-5f6b9212.json'
sample_file_2 = '../bbox_txt_archive/video_0055.json'

def temporal_downsample(json_file_path, output_file_path):
    """
    Perform temporal downsampling on a given JSON file and save the result to another JSON file.
    """
    # Load the JSON data from the file
    with open(json_file_path, 'r') as f:
        video_data = json.load(f)
    
    # Perform temporal downsampling: keep every 2nd frame (indices 0, 2, 4, ...)
    downsampled_data = {key: value for index, (key, value) in enumerate(video_data.items()) if index % 2 == 0}
    
    # Save the downsampled data to a new JSON file
    with open(output_file_path, 'w') as f:
        json.dump(downsampled_data, f, indent=4)

# Traverse all JSON files in the 'bbox_txt_archive' folder
for json_file in os.listdir('../bbox_txt_archive'):
    if json_file.endswith('.json'):
        json_file_path = os.path.join('../bbox_txt_archive', json_file)
        output_file_path = os.path.join('demo', f'downsampled_{json_file}')
        temporal_downsample(json_file_path, output_file_path)

# List files in the 'demo' folder to verify that the downsampled files were created
os.listdir('demo')


['downsampled_b1c9c847-3bda4659.json',
 'downsampled_b1d10d08-c35503b8.json',
 'downsampled_b1d22ed6-f1cac061.json',
 'downsampled_b1dac7f7-6b2e0382.json',
 'downsampled_b1f4491b-33824f31.json',
 'downsampled_b1f4491b-9958bd99.json',
 'downsampled_b2ae0446-4b0bfda7.json',
 'downsampled_b2be7200-b6f7fe0a.json',
 'downsampled_b2bee3e1-80c787bd.json',
 'downsampled_b2d22b2f-8302eb61.json',
 'downsampled_b2d502aa-f0b28e3e.json',
 'downsampled_b4542860-0b880bb4.json',
 'downsampled_cd09a73f-5f6b9212.json',
 'downsampled_cd17ff29-f393274e.json',
 'downsampled_cd26264b-22001629.json',
 'downsampled_video_0001.json',
 'downsampled_video_0003.json',
 'downsampled_video_0055.json',
 'downsampled_video_0056.json',
 'downsampled_video_0057.json',
 'downsampled_video_0194.json',
 'downsampled_video_0310.json',
 'downsampled_video_0313.json',
 'downsampled_video_0333.json',
 'downsampled_video_0343.json']