# Step 1: Object Detection and Data Extraction

Use the custom YOLO11n (nano) model to detect objects (bird, perches, etc.) in the videos located in `data/original_videos/`.
The function `read_video_and_save_frames_to_json` from `utils/frames.py` processes each video.
It extracts bounding box information for detected objects frame by frame, along with other video metadata (like frame rate and dimensions), and saves this data into a JSON file in the `data/raw_data/` directory.
Each video will have a corresponding JSON file.
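
Step 2 later reads the top-level keys `fps`, `frame_count`, and `frames` from each JSON file, so a quick sanity check on a freshly written file can catch incomplete output early. The sketch below is a minimal illustration, not part of `utils/frames.py`: the helper names `missing_keys` and `check_detection_json` are hypothetical, and nothing is assumed about the JSON beyond those three keys.

```python
import json

# Top-level keys that Step 2 reads from every detection file.
REQUIRED_KEYS = ("fps", "frame_count", "frames")


def missing_keys(data, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent or set to None."""
    return [key for key in required if data.get(key) is None]


def check_detection_json(json_path):
    """Load one detection file and report which required keys are missing."""
    with open(json_path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    return missing_keys(data)
```

An empty list from `check_detection_json` means the file has all three keys; it says nothing about whether the detections themselves are usable.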

Define the path to a video file, or to a directory containing video files, to be processed, and the path where the results will be saved.

In [1]:
from utils.frames import read_video_and_save_frames_to_json
import os

# --- Configuration ---
# Option 1: Specify a single video file
# input_path = 'data/original_videos/CAGE_250520_HA70342_exploration_IB.mp4' 
 
# Option 2: Specify a directory containing video files
input_path = 'data/original_videos/' 

output_dir = 'data/raw_data/'
model_path = 'yolo/custom_yolo11n_v2.pt'
allowed_extensions = ['.mp4', '.avi', '.mov'] # Add other video extensions if needed
# --- End Configuration ---

# Ensure output directory exists
os.makedirs(output_dir, exist_ok=True)

# Determine video paths to process
video_paths_to_process = []
if os.path.isfile(input_path):
    video_paths_to_process.append(input_path)
    print(f"Processing single video file: {input_path}")
elif os.path.isdir(input_path):
    print(f"Processing all videos in directory: {input_path}")
    for filename in os.listdir(input_path):
        if any(filename.lower().endswith(ext) for ext in allowed_extensions):
            video_paths_to_process.append(os.path.join(input_path, filename))
else:
    print(f"Error: Input path not found or is not a valid file/directory: {input_path}")

print(f"Found {len(video_paths_to_process)} video(s) to process.")

Processing all videos in directory: data/original_videos/
Found 11 video(s) to process.


Use the custom YOLO model to detect objects in the video files and store the bounding boxes found in each frame in JSON files.

In [3]:
# --- Object Detection ---
if 'video_paths_to_process' not in locals() or not video_paths_to_process:
    print("No video paths defined or found in the previous cell. Please run the previous cell first.")
else:
    total_videos = len(video_paths_to_process) # Get total number of videos
    print(f"Starting object detection for {total_videos} video(s)...")
    processed_count = 0
    error_count = 0
    
    for idx, video_path in enumerate(video_paths_to_process): # Use enumerate for index
        video_filename = os.path.basename(video_path)
        output_json_filename = os.path.splitext(video_filename)[0] + '.json'
        output_json_path = os.path.join(output_dir, output_json_filename)

        # Print progress before processing the current video
        print(f"\nProcessing video {idx + 1} of {total_videos}: {video_filename}...") 
        
        if os.path.exists(video_path):
            try:
                read_video_and_save_frames_to_json(
                    video_filepath=video_path, 
                    model_path=model_path, 
                    save_path=output_json_path
                    # max_frames=None # Optional: uncomment to limit frames
                )
                print(f"Saved detection data to {output_json_path}")
                processed_count += 1
            except Exception as e:
                print(f"Error processing {video_path}: {e}")
                error_count += 1
        else:
            print(f"Video file not found: {video_path}")
            error_count += 1
            
    print("\n--- Detection Summary ---")
    print(f"Successfully processed: {processed_count}")
    print(f"Errors encountered: {error_count}")
    print(f"Total attempted: {total_videos}")

Starting object detection for 11 video(s)...
HE21360_100721_21OW8_exploration_IB.json
data/raw_data/HE21360_100721_21OW8_exploration_IB.json

Processing video 1 of 11: HE21360_100721_21OW8_exploration_IB.mp4...


Frames:   0%|          | 17/18024 [00:04<1:21:43,  3.67it/s]



KeyboardInterrupt: 

# Step 2: Feature Extraction

Load the JSON files generated in the previous step (located in `data/raw_data/`).
Use the `extract_features` function from `utils/features.py` to calculate relevant behavioral features from the raw detection data.
The extracted features and quality metrics are saved to separate CSV files in the `data/extracted_features/` directory using the `save_features_to_csv` function.

## Extracted Features

| Feature     | Unit         | Description                                                                 |
|-------------|--------------|-----------------------------------------------------------------------------|
| latency     | Duration (s) | Time until first entry into the novel (exploration) area.                   |
| 5perches    | Duration (s) | Time spent in the novel area until the 5th distinct perch (1-5) is visited. |
| ground      | Duration (s) | Total time spent on the ground.                                             |
| perch1      | Duration (s) | Total time spent on perch 1.                                                |
| perch2      | Duration (s) | Total time spent on perch 2.                                                |
| perch3      | Duration (s) | Total time spent on perch 3.                                                |
| perch4      | Duration (s) | Total time spent on perch 4.                                                |
| perch5      | Duration (s) | Total time spent on perch 5.                                                |
| movements   | Count        | Number of movements (hops/flights) detected in the novel area.              |
| back_home   | Duration (s) | Time until first return to the home area after entering the novel area.     |
| T_new       | Duration (s) | Total time spent in the novel (exploration) area.                           |
| T_home      | Duration (s) | Total time spent in the home area.                                          |
| move_home   | Count        | Number of movements (hops/flights) detected in the home area.               |
| top         | Duration (s) | Total time spent in the top section of the cage.                            |
| middle      | Duration (s) | Total time spent in the middle section of the cage.                         |
| bottom      | Duration (s) | Total time spent in the bottom section of the cage.                         |
| fence       | Duration (s) | Total time spent detected near the fence/mesh.                              |
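
Most features in the table are durations, which in a frame-based pipeline typically come down to counting frames and dividing by the frame rate. The sketch below illustrates that arithmetic on a list of per-frame location labels. It is an assumption about how such values can be derived, not the implementation in `utils/features.py`; the label strings and the helper names `durations_from_labels` and `latency_to` are hypothetical.

```python
from collections import Counter


def durations_from_labels(labels, fps):
    """Total time in seconds spent under each label: frame count / fps."""
    counts = Counter(labels)
    return {label: n / fps for label, n in counts.items()}


def latency_to(labels, fps, target):
    """Seconds until the first frame carrying `target`, or None if it never occurs."""
    for frame_index, label in enumerate(labels):
        if label == target:
            return frame_index / fps
    return None


# Hypothetical example: 1 s in the home area, then 2 s in the novel area at 30 fps.
example_labels = ["home"] * 30 + ["novel"] * 60
example_durations = durations_from_labels(example_labels, fps=30)
example_latency = latency_to(example_labels, fps=30, target="novel")
```

Returning `None` when the target never occurs keeps "never entered" distinct from a latency of zero.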

## Quality Metrics

| Metric                 | Unit        | Description                                                                                             |
|------------------------|-------------|---------------------------------------------------------------------------------------------------------|
| camera_movement        | Boolean     | Indicates if significant camera/perch coordinate movement was detected during analysis.                 |
| perch_count            | Count       | Number of perches (out of 5) reliably identified in the novel area in the initial frames.               |
| close_perches          | Boolean     | Indicates if any identified perches (1-5) are potentially too close together.                           |
| bird_inbetween_zones   | Rate (ev/s) | Rate at which the bird was detected in ambiguous vertical zone boundaries (events per second).          |
| bird_inbetween_perches | Rate (ev/s) | Rate at which the bird was detected in an ambiguous location between perches 2 & 3 (events per second). |
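
The two rate metrics are expressed in events per second. One plausible reading, shown below as a sketch, counts each onset of an ambiguous detection (a frame flagged ambiguous whose predecessor was not) and divides by the recording length in seconds. What `extract_features` actually counts as an event is not stated here, so both the onset definition and the helper name `event_rate` are assumptions.

```python
def event_rate(flags, fps):
    """Onsets (False -> True transitions) per second of recording.

    `flags` holds one boolean per frame; this onset-based definition
    is an assumption made for illustration.
    """
    flags = list(flags)
    if not flags:
        return 0.0
    previous = [False] + flags[:-1]
    onsets = sum(1 for prev, cur in zip(previous, flags) if cur and not prev)
    duration_s = len(flags) / fps
    return onsets / duration_s


# Hypothetical example: two separate ambiguous episodes in 3 s of video at 2 fps.
example_rate = event_rate([False, True, True, False, True, False], fps=2)
```

Counting onsets rather than flagged frames keeps a long ambiguous episode from inflating the rate.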

Define the path to a JSON file, or to a directory containing JSON files, to be processed, and the path where the results will be saved.

In [1]:
import os
import pandas as pd

# --- Configuration ---
# Option 1: Specify a single JSON file
#json_input_path = 'data/raw_data/CAGE_200520_HA70336_exploration_IB.json'

# Option 2: Specify a directory containing JSON files
json_input_path = 'data/raw_data/'

# Output directory for extracted features
output_features_dir = 'data/extracted_features/'

allowed_json_extensions = ['.json']

# Feature Extraction Parameters
window_size_mean = 5  # Adjust as needed. Must be odd.
window_size_mode = 31  # Adjust as needed. Must be odd.
# --- End Configuration ---

# Ensure output directory exists
os.makedirs(output_features_dir, exist_ok=True)

# Determine JSON paths to process
json_paths_to_process = []
if os.path.isfile(json_input_path):
    if any(json_input_path.lower().endswith(ext) for ext in allowed_json_extensions):
        json_paths_to_process.append(json_input_path)
        print(f"Processing single JSON file: {json_input_path}")
    else:
        print(f"Error: Specified file is not a JSON file: {json_input_path}")
elif os.path.isdir(json_input_path):
    print(f"Processing all JSON files in directory: {json_input_path}")
    for filename in os.listdir(json_input_path):
        if any(filename.lower().endswith(ext) for ext in allowed_json_extensions):
            json_paths_to_process.append(os.path.join(json_input_path, filename))
else:
    print(f"Error: Input path not found or is not a valid file/directory: {json_input_path}")

print(f"Found {len(json_paths_to_process)} JSON file(s) to process.")

Processing all JSON files in directory: data/raw_data/
Found 21 JSON file(s) to process.
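
Both window sizes are required to be odd, which is what a centered window needs: the same number of frames on either side of the frame being smoothed. The sketch below shows centered rolling mean and rolling mode filters in plain Python. It assumes `window_size_mean` smooths numeric values such as coordinates and `window_size_mode` smooths per-frame labels; that mapping, the edge handling (windows shrink at the ends), and the helper names are assumptions, not the behavior of `extract_features`.

```python
from collections import Counter


def _check_odd(window_size):
    """Reject window sizes that cannot be centered on a frame."""
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")


def rolling_mean(values, window_size):
    """Centered moving average; windows are truncated at the sequence edges."""
    _check_odd(window_size)
    half = window_size // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def rolling_mode(labels, window_size):
    """Centered rolling mode: each frame takes the most common label around it."""
    _check_odd(window_size)
    half = window_size // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed
```

With a window of 3, a single-frame flicker such as `home, home, novel, home, home` is smoothed to all `home`. Larger windows suppress longer flickers but also delay or erase genuinely short visits.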


Extract features from JSON files and store results into CSV files.

In [2]:
from utils.frames import load_json_to_dict
from utils.features import extract_features, save_features_to_csv
import os
from IPython.display import clear_output

# --- Feature Extraction Loop ---
if 'json_paths_to_process' not in locals() or not json_paths_to_process:
    print("No JSON paths defined or found. Please run the previous cell first.")
else:
    total_files = len(json_paths_to_process)
    processed_count = 0
    error_count = 0
    error_files = [] # Initialize list to store filenames with errors
    all_features_list = [] # Optional: Collect all features in a list
    all_quality_list = []  # Optional: Collect all quality metrics
    
    # Status messages from completed iterations (one string per file);
    # they are re-printed after each clear_output call.
    last_outputs = []

    for idx, json_path in enumerate(json_paths_to_process):
        clear_output(wait=True)
        for output in last_outputs:
            print(output)

        json_filename = os.path.basename(json_path)
        base_filename = os.path.splitext(json_filename)[0]
        print(f"\nProcessing file {idx + 1} of {total_files}: {json_filename}...")

        try:
            # 1. Load JSON data
            raw_data = load_json_to_dict(json_path)

            # Extract necessary parameters from loaded data
            fps = raw_data.get('fps')
            frame_count = raw_data.get('frame_count')
            frames_data = raw_data.get('frames') # Check if frames data exists

            if fps is None or frame_count is None or frames_data is None:
                error_msg = "Missing required keys (fps, frame_count, frames) in JSON file."
                # The except block below records the message, so it is not appended here.
                raise ValueError(error_msg)

            # 2. Extract Features
            features_df, bird_status_array, quality_df = extract_features(
                data_raw=raw_data, # Pass the whole loaded dict
                window_size_mean=window_size_mean,
                window_size_mode=window_size_mode,
                fps=int(fps),
                frame_count=int(frame_count)
            )
            print("Features extracted successfully.")

            # Optional: Add identifier and collect DataFrames
            features_df['identifier'] = base_filename
            quality_df['identifier'] = base_filename
            all_features_list.append(features_df)
            all_quality_list.append(quality_df)

            # 3. Save Features to CSV
            print("Saving features to CSV...")
            save_features_to_csv(
                features_df=features_df,
                bird_status=bird_status_array,
                quality_df=quality_df,
                base_filename=base_filename,
                output_dir=output_features_dir
            )
            processed_count += 1
            last_outputs.append(f"Processed {json_filename} successfully.")
            

        except Exception as e:
            print(f"Error processing {json_filename}: {e}")
            error_count += 1
            error_files.append((json_filename, e))  # Record filename and exception
            last_outputs.append(f"Error processing {json_filename}: {e}")

    clear_output(wait=False)
    print("\n--- Feature Extraction Summary ---")
    print(f"Successfully processed: {processed_count}")
    print(f"Errors encountered: {error_count}")
    print(f"Total attempted: {total_files}")

    # Print filenames that caused errors
    if error_files:
        print("\n--- Files with Errors ---")
        for filename, error in error_files:
            print(f"{filename}: {error}")


--- Feature Extraction Summary ---
Successfully processed: 20
Errors encountered: 1
Total attempted: 21

--- Files with Errors ---
CAGE_220520_HA70337_exploration_IB.json: No perches found in the frame.


Combine the results and save them to separate CSV files.

In [3]:
# Optional: Combine all features into single DataFrames if needed
if all_features_list:
    combined_features_df = pd.concat(all_features_list, ignore_index=True)
    combined_quality_df = pd.concat(all_quality_list, ignore_index=True)
    print("\nCombined features DataFrame head:")
    # display(combined_features_df.head()) # Use display in Jupyter
    print(combined_features_df.head())
    print("\nCombined quality metrics DataFrame head:")
    # display(combined_quality_df.head())
    print(combined_quality_df.head())
    
    # Save the combined DataFrames to the features output directory
    combined_features_df.to_csv(os.path.join(output_features_dir, 'all_features_combined4.csv'), index=False)
    combined_quality_df.to_csv(os.path.join(output_features_dir, 'all_quality_combined4.csv'), index=False)


Combined features DataFrame head:
       perch1     perch2      perch3     perch4  perch5    5perches  \
0  116.133333  78.166667  109.900000  37.700000    54.4   10.066667   
1    0.000000   0.000000   33.600000  75.933333    36.3  147.433333   
2    0.000000   0.000000    0.000000   0.000000     0.0    0.000000   
3  170.566667  61.400000  138.066667   0.700000     4.5    8.333333   
4    0.000000   0.000000    0.000000   0.000000     0.0    0.000000   

        fence    ground     top  middle  bottom   T_new  T_home  latency  \
0   17.200000  0.000000  267.60  326.77    5.93  415.60  184.70     17.8   
1  329.666667  0.000000  170.07  429.87    0.00  147.43  452.50    413.7   
2    0.866667  0.000000   85.10   62.30  453.33    0.00  600.73      NaN   
3  135.300000  1.633333  255.73  336.83    8.07  538.67   61.97     62.0   
4    0.000000  0.000000  599.70    0.00    0.00    0.00  599.70      NaN   

    back_home  move_home  movements  home_perches_identified  \
0   58.133333    