
# JSON to HDF5 Workflow

This notebook walks through converting Unity behavioral JSON logs to HDF5 using `behavioral_analysis.processing.json_to_hdf5_processor`.


In [1]:

from pathlib import Path
import sys

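REPO_ROOT = Path.cwd().resolve()
# Look for src/ next to the notebook first, then one level up.
# (REPO_ROOT is already resolved, so REPO_ROOT / '..' / 'src' would be a duplicate.)
SRC_CANDIDATES = [
    REPO_ROOT / 'src',
    REPO_ROOT.parent / 'src',
]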

for candidate in SRC_CANDIDATES:
    if candidate.exists():
        src_path = candidate.resolve()
        break
else:
    raise RuntimeError('Could not locate the package src directory. Update the path setup cell.')

if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f'Added {src_path} to sys.path')


Added /groups/spruston/home/moharb/DELTA_Behavior/src to sys.path
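The setup cell above only checks the working directory and its parent. If the notebook might be run from a deeper folder, one option is to walk up the directory tree until a `src` directory is found. The sketch below is only an illustration: `find_src_dir` is a hypothetical helper, not part of `behavioral_analysis`.

```python
from pathlib import Path


def find_src_dir(start, name="src"):
    """Return the first `name` directory found in `start` or any of its parents.

    Hypothetical helper for illustration; returns None when nothing is found.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each ancestor up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_dir():
            return candidate
    return None
```

Calling `find_src_dir(Path.cwd())` and inserting the result into `sys.path` would then replace the fixed candidate list.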


## Imports
Load the processing helpers we'll use in this workflow.

In [2]:

import json
from pathlib import Path

import pandas as pd

from behavioral_analysis.processing.json_to_hdf5_processor import process_json_to_hdf5
from behavioral_analysis.io.hdf5_writer import list_hdf5_contents



## Choose a JSON log
Set `JSON_PATH` to the Unity log you want to convert. If the path does not exist, a compact demo dataset is generated so you can run through the workflow end-to-end.


In [4]:

JSON_PATH = Path('/groups/spruston/home/moharb/DELTA_Behavior/Log BM35 2025-09-22 session 1.json')

if JSON_PATH.exists():
    print(f'Using JSON log: {JSON_PATH}')
else:
    print('JSON_PATH does not exist; creating a synthetic demo dataset.')
    

Using JSON log: /groups/spruston/home/moharb/DELTA_Behavior/Log BM35 2025-09-22 session 1.json



## Run the conversion
The helper wraps the full pipeline: parsing the JSON, building pandas DataFrames, detecting corridors, optionally generating trial summaries, and writing everything to HDF5.


In [5]:

OUTPUT_DIR = Path('outputs')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

OUTPUT_PATH = OUTPUT_DIR / f"{JSON_PATH.stem}_with_global_position.h5"

result_path = process_json_to_hdf5(
    input_file=str(JSON_PATH),
    output_file=str(OUTPUT_PATH),
    corridor_length_cm=500.0,
    include_trials=True,
    include_combined=False,
    verbose=True,
)
print(f'Processed HDF5 saved to: {result_path}')


Processing JSON data with global position: /groups/spruston/home/moharb/DELTA_Behavior/Log BM35 2025-09-22 session 1.json
Step 1: Parsing JSON file
Parsing file: /groups/spruston/home/moharb/DELTA_Behavior/Log BM35 2025-09-22 session 1.json
File size: 22.77 MB
Processed 10000 events...
Processed 20000 events...
Processed 30000 events...
Processed 40000 events...
Processed 50000 events...
Processed 60000 events...
Processed 70000 events...
Processed 80000 events...
Processed 90000 events...
Processed 100000 events...
Processed 110000 events...
Processed 120000 events...
Processed 130000 events...
Processed 140000 events...
Processed 150000 events...
Processed 160000 events...
Processed 170000 events...
Processed 180000 events...
Processed 190000 events...
Processed 200000 events...
Processed 210000 events...
Processed 220000 events...
Processed 230000 events...
Processed 240000 events...

Processing completed in 1.14 seconds
Total events: 241714

Event counts by type:
  Position: 95201


your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block3_values] [items->Index(['name', 'gain'], dtype='object')]

  store[f'events/{safe_event_type}'] = df
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->values] [items->None]

  store['metadata'] = pd.Series(metadata)



## Inspect the HDF5 contents
Use the I/O helpers or pandas directly to see what was stored.


In [6]:

contents = list_hdf5_contents(result_path)
contents


[('/events/Corridor_Info',
  58,
  ['corridor_id', 'start_time', 'trigger', 'end_time']),
 ('/events/Cue_Result',
  398,
  ['time',
   'id',
   'id2',
   'position',
   'isRewarding',
   'hasGivenReward',
   'numLicksInReward',
   'numLicksInPre',
   'corridor_id',
   'position_cm',
   'global_position_cm']),
 ('/events/Cue_State',
  405,
  ['time',
   'id',
   'id2',
   'position',
   'isRewarding',
   'corridor_id',
   'position_cm',
   'global_position_cm']),
 ('/events/Info',
  1,
  ['time', 'session_time', 'project', 'scene', 'corridor_id']),
 ('/events/Lick', 2970, ['time', 'corridor_id']),
 ('/events/Linear_Controller_Settings',
  1,
  ['time',
   'name',
   'isActive',
   'loopPath',
   'gain',
   'inputSmooth',
   'corridor_id']),
 ('/events/Log', 47359, ['time', 'source', 'msg', 'corridor_id']),
 ('/events/Path_Position',
  95200,
  ['time',
   'name',
   'pathName',
   'position',
   'corridor_id',
   'position_cm',
   'global_position_cm']),
 ('/events/Position',
  95201,
 


## Preview key tables
Here we look at the position trace with the derived global position and any trials that were generated.


In [7]:

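with pd.HDFStore(result_path, mode='r') as store:
    position_preview = store['events/Position'].head()
    corridor_info = store['events/Corridor_Info']
    # HDFStore.get raises KeyError for a missing key, so test membership first.
    trials = store['events/Trials'] if 'events/Trials' in store else None

trials_preview = trials.head() if trials is not None else 'No trials table found'
position_preview, corridor_info, trials_preview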


(        time   name  position  heading  corridor_id  position_cm  \
 0   954.7454  Mouse         0        0          NaN          0.0   
 1   990.7839  Mouse         0        0          NaN          0.0   
 2  1393.4910  Mouse         0        0          NaN          0.0   
 3  1443.5110  Mouse         0        0          NaN          0.0   
 4  1455.5170  Mouse         0        0          NaN          0.0   
 
    global_position_cm  
 0                 NaN  
 1                 NaN  
 2                 NaN  
 3                 NaN  
 4                 NaN  ,
     corridor_id  start_time    trigger    end_time
 0             0    10977.16  first_cue    22231.28
 1             1    22231.28  cue_reset    52474.30
 2             2    52474.30  cue_reset    84919.63
 3             3    84919.63  cue_reset   118963.00
 4             4   118963.00  cue_reset   146804.50
 5             5   146804.50  cue_reset   165521.60
 6             6   165521.60  cue_reset   185771.40
 7             7 
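In the preview above, `global_position_cm` is NaN wherever `corridor_id` is NaN. A plausible reading, and only an assumption since the source does not show the formula, is that the global position stacks corridors end to end using the `corridor_length_cm=500.0` value passed to the converter. A minimal sketch under that assumption:

```python
import math

CORRIDOR_LENGTH_CM = 500.0  # same value passed as corridor_length_cm earlier


def global_position_cm(corridor_id, position_cm, corridor_length_cm=CORRIDOR_LENGTH_CM):
    """Assumed formula: corridors already completed * length + position in the current one."""
    # Samples with no corridor assignment have no defined global position.
    if corridor_id is None or math.isnan(corridor_id):
        return math.nan
    return corridor_id * corridor_length_cm + position_cm
```

Under this assumption, a sample 35.5 cm into corridor 2 would sit at 1035.5 cm globally.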

In [8]:
corridor_info

   corridor_id  start_time    trigger   end_time
0            0    10977.16  first_cue   22231.28
1            1    22231.28  cue_reset   52474.30
2            2    52474.30  cue_reset   84919.63
3            3    84919.63  cue_reset  118963.00
4            4   118963.00  cue_reset  146804.50
5            5   146804.50  cue_reset  165521.60
6            6   165521.60  cue_reset  185771.40
7            7   185771.40  cue_reset  206972.40
8            8   206972.40  cue_reset  229291.60
9            9   229291.60  cue_reset  273113.50
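The `Corridor_Info` rows above define back-to-back time intervals, and the Position preview shows `corridor_id` as NaN for samples recorded before the first corridor starts at 10977.16. The sketch below shows one way event times could be mapped onto those intervals. It assumes half-open `[start_time, end_time)` intervals and is not necessarily how the package assigns corridors; `corridor_for_time` is a hypothetical helper.

```python
from bisect import bisect_right

# (corridor_id, start_time, end_time), copied from the first three Corridor_Info rows.
CORRIDORS = [
    (0, 10977.16, 22231.28),
    (1, 22231.28, 52474.30),
    (2, 52474.30, 84919.63),
]
STARTS = [start for _, start, _ in CORRIDORS]


def corridor_for_time(t):
    """Return the corridor_id whose [start, end) interval contains t, else None."""
    # Index of the last corridor that starts at or before t.
    i = bisect_right(STARTS, t) - 1
    if i < 0:
        return None  # before the first corridor
    corridor_id, _, end = CORRIDORS[i]
    return corridor_id if t < end else None
```

With these three rows, `corridor_for_time(954.7454)` returns `None`, matching the unassigned samples at the start of the Position table, and a boundary time such as 22231.28 falls in corridor 1.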
