In [1]:
import json
import pandas as pd
import numpy as np
import os

from pathlib import Path

# Step 2

## Overview
This Jupyter Notebook modifies configuration files for training SLEAP models. It reads a CSV file containing train, validation, and test splits, updates the initial configuration file with the appropriate paths for each split, and saves the modified configuration files.

## Prerequisites
Ensure the working directory contains the following files from the first Jupyter Notebook:
- `train_test_splits.csv`: A CSV file with columns `path`, `version`, `labeled_frames`, and `split_type`.
- `initial_config.json`: A JSON configuration file to be modified based on the splits.
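
A quick pre-flight check can catch a missing file or column before any configuration is written. The following is a minimal sketch, not part of the original notebook: `check_prerequisites` and `REQUIRED_COLUMNS` are hypothetical names, and the sketch reads only the CSV header with the standard `csv` module rather than loading the full file with pandas.

```python
import csv
from pathlib import Path

# Column names taken from the Prerequisites list above.
REQUIRED_COLUMNS = {"path", "version", "labeled_frames", "split_type"}


def check_prerequisites(working_dir):
    """Return a list of problems found in working_dir (empty if none)."""
    working_dir = Path(working_dir)
    problems = []

    # Both files from the first notebook must be present.
    for name in ("train_test_splits.csv", "initial_config.json"):
        if not (working_dir / name).is_file():
            problems.append(f"missing file: {name}")

    csv_path = working_dir / "train_test_splits.csv"
    if csv_path.is_file():
        # Read only the header row to compare column names.
        with open(csv_path, newline="") as f:
            header = next(csv.reader(f), [])
        missing = sorted(REQUIRED_COLUMNS - set(header))
        if missing:
            problems.append(f"missing columns: {missing}")

    return problems
```

Calling `check_prerequisites(working_dir)` before the later cells should return an empty list when the directory is ready.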

## User-Defined Inputs
- `working_dir`: The directory where the CSV and JSON files are located.

## Steps
1. **Import Libraries**: Import necessary libraries.
2. **Define Working Directory**: Set the `working_dir` variable.
3. **Load CSV File**: Read the `train_test_splits.csv` file into a DataFrame (`splits`).
4. **Load JSON File**: Read the `initial_config.json` file into a dictionary (`data`).
5. **Modify Configuration**: Update the JSON data with the appropriate paths for each split type (train, val, test).
6. **Save Modified Configurations**: Save the modified JSON data to new files in the corresponding directories.
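
Step 5 can also be sketched as a pure function that builds one configuration per version without touching the disk. This is an illustrative alternative, not the notebook's own loop: `build_configs` and `LABEL_KEYS` are hypothetical names, while the config keys (`data.labels.*`, `outputs.runs_folder`) are the ones the notebook's code cell sets. Each version starts from its own deep copy of the base config, so the result does not depend on row order.

```python
import copy
import os

# Maps each split_type value to the config key the notebook sets for it.
LABEL_KEYS = {
    "train": "training_labels",
    "val": "validation_labels",
    "test": "test_labels",
}


def build_configs(base_config, rows):
    """Return {version: config} with label paths filled in per version.

    `rows` is an iterable of dicts with `path`, `version`, and `split_type`.
    """
    configs = {}
    for row in rows:
        version = row["version"]
        if version not in configs:
            # Start each version from a fresh copy so no label path
            # carries over from another version.
            configs[version] = copy.deepcopy(base_config)
        config = configs[version]
        config["data"]["labels"][LABEL_KEYS[row["split_type"]]] = row["path"]

        # The runs folder sits next to the split file, as in the notebook.
        config_dir = os.path.dirname(row["path"])
        config["outputs"]["runs_folder"] = os.path.join(config_dir, "models")
    return configs
```

With the DataFrame from step 3, the rows could be passed as `splits.to_dict("records")`, and each returned config written once with `json.dump`.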

## Output
The final output is one modified configuration file per split version, saved in the parent directory of the files listed in the `path` column of the CSV file. Each modified configuration file is named `initial_config_modified_v00X.json`, where `X` is the version number from the CSV file.
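
Note that a literal `v00` prefix gives a three-digit suffix only for versions 0 to 9; version 12 would become `v0012`. If more than ten versions are ever generated, a zero-padded name may be preferable. The helper below is a hypothetical sketch (`config_filename` is not part of the notebook) that shows both variants side by side.

```python
def config_filename(version, padded=True):
    """Build the modified-config filename for a split version."""
    version = int(version)
    if padded:
        # Zero-pad to three digits: 0 -> v000, 12 -> v012.
        return f"initial_config_modified_v{version:03d}.json"
    # Literal "v00" prefix, as in the naming described above.
    return f"initial_config_modified_v00{version}.json"
```

Both variants agree for versions 0 to 9, so switching would not rename any file from the run shown in this notebook.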

## Usage
1. Ensure the working directory contains the required `train_test_splits.csv` and `initial_config.json` files. These are the outputs from step 1.
2. Update the `working_dir` variable with the path to your working directory. This should be the `output_path` from step 1.
3. Use `Run All` to process the data and generate the modified configuration files.

In [2]:
# This should be the output path from the previous step
working_dir = "D:/SLEAP/20250102_generalizability_experiment/primary/sorghum"

In [3]:
# Load the CSV file
csv_path = Path(working_dir) / "train_test_splits.csv"
splits = pd.read_csv(csv_path)
splits

Unnamed: 0,path,version,labeled_frames,split_type
0,D:\SLEAP\20250102_generalizability_experiment\...,0,210,train
1,D:\SLEAP\20250102_generalizability_experiment\...,0,45,val
2,D:\SLEAP\20250102_generalizability_experiment\...,0,45,test
3,D:\SLEAP\20250102_generalizability_experiment\...,1,210,train
4,D:\SLEAP\20250102_generalizability_experiment\...,1,45,val
5,D:\SLEAP\20250102_generalizability_experiment\...,1,45,test
6,D:\SLEAP\20250102_generalizability_experiment\...,2,210,train
7,D:\SLEAP\20250102_generalizability_experiment\...,2,45,val
8,D:\SLEAP\20250102_generalizability_experiment\...,2,45,test
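
The table shows one `train`, one `val`, and one `test` row for each version. Because the configuration for a version is only complete when all three are present, it may be worth confirming this before modifying anything. A minimal sketch, assuming the three split names above are the only expected ones (`incomplete_versions` and `EXPECTED_SPLITS` are hypothetical names):

```python
from collections import defaultdict

# Split types that every version is assumed to contain.
EXPECTED_SPLITS = {"train", "val", "test"}


def incomplete_versions(rows):
    """Return {version: missing split types} for versions lacking a split."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["version"]].add(row["split_type"])
    return {
        version: sorted(EXPECTED_SPLITS - found)
        for version, found in seen.items()
        if EXPECTED_SPLITS - found
    }
```

For the DataFrame loaded above, `incomplete_versions(splits.to_dict("records"))` should return an empty dict when every version has all three splits.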


In [4]:
# Path to the initial configuration file for the models
init_config_path = Path(working_dir) / "initial_config.json"

In [5]:
# Open the JSON file
with open(init_config_path, 'r') as file:
    # Load the data from the file
    data = json.load(file)

# Iterate over the rows of the DataFrame. `data` is shared across rows, so
# this relies on the CSV rows being grouped by version.
for _, row in splits.iterrows():
    # Set the label path that matches this row's split type
    if row['split_type'] == 'train':
        data['data']['labels']['training_labels'] = row['path']
    elif row['split_type'] == 'val':
        data['data']['labels']['validation_labels'] = row['path']
    elif row['split_type'] == 'test':
        data['data']['labels']['test_labels'] = row['path']

    # The config is saved next to the split file, in the version directory
    config_dir = os.path.dirname(row['path'])
    data['outputs']['runs_folder'] = os.path.join(config_dir, 'models')

    # Save the changes to a new JSON file. The file is overwritten on every
    # row of a version, so it holds all three label paths for that version
    # only after the version's last row has been processed.
    with open(os.path.join(config_dir, f'initial_config_modified_v00{row["version"]}.json'), 'w') as new_file:
        # Write the updated data to the file
        json.dump(data, new_file)
    print(f"Saved modified config for split {row['split_type']} version {row['version']} to {config_dir}")

Saved modified config for split train version 0 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v000
Saved modified config for split val version 0 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v000
Saved modified config for split test version 0 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v000
Saved modified config for split train version 1 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v001
Saved modified config for split val version 1 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v001
Saved modified config for split test version 1 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v001
Saved modified config for split train version 2 to D:\SLEAP\20250102_generalizability_experiment\primary\sorghum\train_test_split.v002
Saved modified config for split val version 2 to D:\SLEAP\202