The goal of this notebook is to save the basic information for each eid in its respective folder. The notebook produces four deliverables:
1. left_df: wheel, nose, and whisker info, synced to the left camera clock
2. right_df: wheel, nose, and whisker info, synced to the right camera clock
3. trials_df: trial information
4. report: a summary of the syncing process, including fps, PC1 variance explained, max_age, pct_wheel_masked, etc.

Methodology:
1. **PCA (nose):** We compute a 2D PCA over (nose_tip_x, nose_tip_y) on the session clock. PC1’s sign is fixed so the X-loading ≥ 0, making it deterministic across sessions. We store the PC1 score (nose_pc1) and report the % variance explained by PC1.

2. **Wheel (position & velocity):** Wheel is sampled on its native (irregular) timestamps; velocity is the time-derivative on that native clock. To align with camera frames without interpolation, we use a causal hold-last-sample (backward-asof) join and record the age of the matched wheel sample (wheel_age_s). We then mask wheel_pos/vel when stale using a data-driven threshold max_age = min(max(3×p95_age, p99_age), 0.5s). This preserves temporal integrity (no future peeking), avoids blending, and cleanly flags gaps.

3. **Signals & clock:** All columns live on the session master clock; pose and whisker motion energy (whiskerME) are frame-aligned to camera times; wheel fields are attached per the rule above. No numeric interpolation anywhere.
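
To make the wheel rule in item 2 concrete, here is a minimal sketch on made-up data. The frame times, wheel samples, and the helper name `attach_wheel` are illustrative assumptions, not part of the pipeline; the join and threshold follow the rule as described above (backward-asof, then max_age = min(max(3×p95_age, p99_age), 0.5 s)).

```python
import numpy as np
import pandas as pd

def attach_wheel(frames, wheel, cap_s=0.5):
    """Hypothetical helper: hold-last-sample join plus staleness masking."""
    joined = pd.merge_asof(
        frames.sort_values("time"),
        wheel.sort_values("wheel_time"),
        left_on="time", right_on="wheel_time",
        direction="backward",  # only samples at or before the frame time
    )
    # Age of the matched wheel sample; NaN if no earlier sample exists.
    joined["wheel_age_s"] = joined["time"] - joined["wheel_time"]
    ages = joined["wheel_age_s"].dropna().to_numpy()
    if ages.size:
        q95, q99 = np.quantile(ages, [0.95, 0.99])
        max_age = min(max(3 * q95, q99), cap_s)
    else:
        max_age = cap_s
    # Mask wheel values whose matched sample is older than the threshold.
    joined["stale"] = joined["wheel_age_s"] > max_age
    joined.loc[joined["stale"], "wheel_pos"] = np.nan
    return joined, float(max_age)

# Toy data: 12 frames at 10 Hz, and a wheel stream silent from 0.25 s to 0.95 s.
frames = pd.DataFrame({"time": np.arange(12) / 10})
wheel = pd.DataFrame({
    "wheel_time": [0.0, 0.05, 0.15, 0.25, 0.95, 1.05],
    "wheel_pos":  [0.0, 0.10, 0.20, 0.30, 0.90, 1.00],
})

joined, max_age = attach_wheel(frames, wheel)
print(f"max_age = {max_age} s")
print(joined[["time", "wheel_time", "wheel_age_s", "wheel_pos", "stale"]])
```

In this toy case the long gap pushes 3×p95_age above the cap, so max_age falls back to 0.5 s, and only the frames whose last wheel sample is more than 0.5 s old are masked. Every matched `wheel_time` is at or before its frame time, which is the "no future peeking" property.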


In [1]:
! pip install ONE-api
! pip install ibllib

Collecting ONE-api
  Downloading one_api-3.4.0-py3-none-any.whl.metadata (4.2 kB)
Collecting iblutil>=1.14.0 (from ONE-api)
  Downloading iblutil-1.20.0-py3-none-any.whl.metadata (1.6 kB)
Collecting boto3 (from ONE-api)
  Downloading boto3-1.40.54-py3-none-any.whl.metadata (6.6 kB)
Collecting colorlog>=6.0.0 (from iblutil>=1.14.0->ONE-api)
  Downloading colorlog-6.10.1-py3-none-any.whl.metadata (11 kB)
Collecting botocore<1.41.0,>=1.40.54 (from boto3->ONE-api)
  Downloading botocore-1.40.54-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3->ONE-api)
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.15.0,>=0.14.0 (from boto3->ONE-api)
  Downloading s3transfer-0.14.0-py3-none-any.whl.metadata (1.7 kB)
Downloading one_api-3.4.0-py3-none-any.whl (1.0 MB)
Downloading iblutil-1.20.0-py3-none-any.whl (43

In [2]:
from one.api import ONE
from brainbox.io.one import SessionLoader
import numpy as np
import pandas as pd
import os
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [3]:
ONE.setup(base_url='https://openalyx.internationalbrainlab.org', silent=True)
one = ONE(password='international')

Connected to https://openalyx.internationalbrainlab.org as user "intbrainlab"


In [5]:
BASE_DIR = '/content/drive/MyDrive/S25/Langone/Breathing/Figures'

In [6]:
eid_list = ['862ade13-53cd-4221-a3fa-dda8643641f2']

In [7]:
# ---------- helpers ----------
def pca_pc1_xy(x: np.ndarray, y: np.ndarray):
    """PCA on [x,y] → (scores_pc1, loadings_pc1[2], var_ratio_pc1). Sign fixed so x-loading ≥ 0."""
    X = np.column_stack([x, y]).astype(float)
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, VT = np.linalg.svd(Xc, full_matrices=False)
    comps = VT
    eigvars = (S**2) / max(1, (len(X) - 1))
    var_ratio = eigvars / eigvars.sum()
    pc1 = comps[0].copy()
    if pc1[0] < 0: pc1 *= -1
    scores = Xc @ pc1
    return scores, pc1, float(var_ratio[0])

def _load_wheel(eid, one):
    wt = one.load_dataset(eid, '_ibl_wheel.timestamps.npy', collection='alf').astype(float)
    wp = one.load_dataset(eid, '_ibl_wheel.position.npy',   collection='alf')
    wheel = pd.DataFrame({'time': wt, 'wheel_pos': wp}).dropna().sort_values('time')
    wheel = wheel[~wheel['time'].duplicated(keep='last')].copy()       # drop exact dupes
    wheel['time'] = np.maximum.accumulate(wheel['time'].to_numpy())    # enforce monotonic
    wheel['wheel_vel'] = np.gradient(wheel['wheel_pos'].to_numpy(), wheel['time'].to_numpy())
    return wheel

def build_camera_df(eid, one, side: str):
    assert side in ('left', 'right')
    cam = side  # dataset names are prefixed with the side, e.g. leftCamera
    # load camera times, pose, whisker ROI motion energy
    t = one.load_dataset(eid, f'_ibl_{cam}Camera.times.npy', collection='alf').astype(float)
    pose_path = one.load_dataset(eid, f'_ibl_{cam}Camera.lightningPose.pqt', collection='alf', download_only=True)
    pose = pd.read_parquet(pose_path)
    me = one.load_dataset(eid, f'{cam}Camera.ROIMotionEnergy.npy', collection='alf')

    # frame-aligned truncate
    n = min(len(t), len(pose), len(me))
    t = t[:n]; pose = pose.iloc[:n]; me = me[:n]

    # base df (session clock)
    df = pd.DataFrame(index=pd.Index(t, name='time'))
    df[['nose_tip_x','nose_tip_y']] = pose[['nose_tip_x','nose_tip_y']].to_numpy()
    df['whiskerME'] = me

    # PCA on nose (deterministic sign)
    pc1_scores, pc1_load, pc1_vr = pca_pc1_xy(df['nose_tip_x'].to_numpy(), df['nose_tip_y'].to_numpy())
    df['nose_pc1'] = pc1_scores

    # wheel stream + backward-asof (hold-last-sample; no interpolation)
    wheel = _load_wheel(eid, one)
    joined = pd.merge_asof(
        df.reset_index().sort_values('time'),
        wheel.rename(columns={'time': 'wheel_time'}).sort_values('wheel_time'),
        left_on='time', right_on='wheel_time',
        direction='backward', tolerance=None  # always pick last-known sample
    )
    # staleness masking
    joined['wheel_age_s'] = joined['time'] - joined['wheel_time']
    ages = joined['wheel_age_s'].to_numpy()
    ages_valid = ages[np.isfinite(ages)]
    if ages_valid.size:
        q95 = float(np.quantile(ages_valid, 0.95))
        q99 = float(np.quantile(ages_valid, 0.99))
        MAX_AGE = min(max(3*q95, q99), 0.5)  # cap at 0.5 s
    else:
        MAX_AGE = 0.5
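    # Frames that precede the first wheel sample have no match: their age is NaN,
    # so they are not flagged as stale here, but their wheel fields are already NaN.
    stale = joined['wheel_age_s'] > MAX_AGE
    for c in ('wheel_pos','wheel_vel'):
        joined.loc[stale, c] = np.nan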

    df = joined.set_index('time')

    # meta
    fps_est = (len(t)-1)/(t[-1]-t[0]) if len(t) > 1 and (t[-1]-t[0]) > 0 else np.nan
    meta = dict(
        side=side,
        pc1_var_ratio=pc1_vr,
        pc1_loadings=[float(pc1_load[0]), float(pc1_load[1])],
        fps_est=float(fps_est),
        max_age=float(MAX_AGE),
        pct_wheel_masked=float(stale.mean()*100.0)
    )
    return df, meta

def build_left_right(eid, one):
    left_df,  left_meta  = build_camera_df(eid, one, 'left')
    right_df, right_meta = build_camera_df(eid, one, 'right')
    report = {'eid': eid, 'left': left_meta, 'right': right_meta}
    return left_df, right_df, report

In [8]:
import os
import pandas as pd

# assumes: BASE_DIR, one, eid_list, and build_left_right(...) already defined

def _save_df_csv(df: pd.DataFrame, path: str, name_for_log: str, index=True):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    df.to_csv(path, index=index)
    print(f"[SAVED] {name_for_log:<10} rows={len(df):>8} cols={df.shape[1]:>4} -> {path}")

def _save_report_csv(report: dict, path: str):
    """
    Flatten nested dict like {'left': {...}, 'right': {...}} to 2-column CSV:
    key,value with dotted keys e.g. left.pc1_var_ratio
    """
    flat = {}
    def _recurse(prefix, obj):
        if isinstance(obj, dict):
            for k, v in obj.items():
                _recurse(f"{prefix}.{k}" if prefix else k, v)
        else:
            flat[prefix] = obj
    _recurse("", report)
    df = pd.DataFrame(list(flat.items()), columns=["key", "value"])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    df.to_csv(path, index=False)
    print(f"[SAVED] report(csv) -> {path}")

for i, eid in enumerate(eid_list, start=1):
    print("\n" + "="*88)
    print(f"[{i:04d}/{len(eid_list)}] Processing eid: {eid}")
    try:
        info = one.alyx.rest("sessions", "read", id=eid)
        lab, subject = info['lab'], info['subject']
        SAVE_PATH = os.path.join(BASE_DIR, f"{lab}/{subject}/{eid}")
        os.makedirs(SAVE_PATH, exist_ok=True)
        print(f"[INFO]  Save dir: {SAVE_PATH}")

        # ---- trials.csv ----
        try:
            trials_obj = one.load_object(eid, 'trials')
            trials = trials_obj.to_df() if hasattr(trials_obj, "to_df") else pd.DataFrame(trials_obj)
        except Exception as e:
            print(f"[WARN]  trials missing/failed for {eid}: {e}")
            trials = pd.DataFrame()
        _save_df_csv(trials, os.path.join(SAVE_PATH, "trials.csv"), "trials(csv)", index=False)

        # ---- left_df.csv, right_df.csv, report.csv ----
        left_df, right_df, report = build_left_right(eid, one)

        # keep the time index for left/right so we preserve the session clock
        _save_df_csv(left_df,  os.path.join(SAVE_PATH, "left_df.csv"),  "left_df",  index=True)
        _save_df_csv(right_df, os.path.join(SAVE_PATH, "right_df.csv"), "right_df", index=True)
        _save_report_csv(report, os.path.join(SAVE_PATH, "report.csv"))

        L, R = report['left'], report['right']
        print(f"[DONE]  {eid} | "
              f"L_PC1%={L['pc1_var_ratio']*100:.1f} fps≈{L['fps_est']:.1f} mask%={L['pct_wheel_masked']:.1f} | "
              f"R_PC1%={R['pc1_var_ratio']*100:.1f} fps≈{R['fps_est']:.1f} mask%={R['pct_wheel_masked']:.1f}")

    except Exception as e:
        print(f"[FAIL]  {eid}: {e}")
        continue



[0001/1] Processing eid: 862ade13-53cd-4221-a3fa-dda8643641f2
[INFO]  Save dir: /content/drive/MyDrive/S25/Langone/Breathing/Figures/hoferlab/SWC_042/862ade13-53cd-4221-a3fa-dda8643641f2


(S3) /root/Downloads/ONE/openalyx.internationalbrainlab.org/hoferlab/Subjects/SWC_042/2020-07-15/001/alf/_ibl_trials.goCueTrigger_times.npy: 100%|██████████| 4.89k/4.89k [00:00<00:00, 43.1kB/s]
(S3) /root/Downloads/ONE/openalyx.internationalbrainlab.org/hoferlab/Subjects/SWC_042/2020-07-15/001/alf/_ibl_trials.table.pqt: 100%|██████████| 47.9k/47.9k [00:00<00:00, 256kB/s]
(S3) /root/Downloads/ONE/openalyx.internationalbrainlab.org/hoferlab/Subjects/SWC_042/2020-07-15/001/alf/_ibl_trials.stimOff_times.npy: 100%|██████████| 4.89k/4.89k [00:00<00:00, 47.3kB/s]


[SAVED] trials(csv) rows=     595 cols=  15 -> /content/drive/MyDrive/S25/Langone/Breathing/Figures/hoferlab/SWC_042/862ade13-53cd-4221-a3fa-dda8643641f2/trials.csv


(S3) /root/Downloads/ONE/openalyx.internationalbrainlab.org/hoferlab/Subjects/SWC_042/2020-07-15/001/alf/_ibl_leftCamera.times.npy: 100%|██████████| 3.33M/3.33M [00:00<00:00, 7.54MB/s]


[FAIL]  862ade13-53cd-4221-a3fa-dda8643641f2: Dataset "_ibl_leftCamera.lightningPose.pqt" not found 
 The ALF object was not found.  This may occur if the object or namespace or incorrectly formatted e.g. the object "_ibl_trials.intervals.npy" would be found with the filters `object="trials", namespace="ibl"` 
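
The run above stops because this session has no `_ibl_leftCamera.lightningPose.pqt` dataset. One possible way to handle that, sketched here under assumptions, is to try a list of candidate pose datasets in order and use the first one that loads. The helper `first_available`, the candidate list, and the choice of `_ibl_{cam}Camera.dlc.pqt` as a fallback are illustrative; whether that dataset exists for a given session, and whether it carries the `nose_tip_x` / `nose_tip_y` columns, would need to be checked. The stand-in loader below only simulates the failure so the sketch runs offline.

```python
# Candidate pose datasets, in order of preference (assumed names, see note above).
POSE_CANDIDATES = (
    "_ibl_{cam}Camera.lightningPose.pqt",
    "_ibl_{cam}Camera.dlc.pqt",
)

def first_available(load, cam, candidates=POSE_CANDIDATES):
    """Hypothetical helper: return (name, result) for the first dataset that loads.

    `load` is any callable taking a dataset name and raising on failure.
    """
    errors = []
    for template in candidates:
        name = template.format(cam=cam)
        try:
            return name, load(name)
        except Exception as exc:  # broad on purpose: the loader's error type is not assumed
            errors.append(f"{name}: {exc}")
    raise FileNotFoundError("No pose dataset found; tried " + "; ".join(errors))

# Stand-in loader that mimics a session without Lightning Pose output.
def fake_load(name):
    if "lightningPose" in name:
        raise KeyError(f'Dataset "{name}" not found')
    return f"/tmp/{name}"

name, pose_path = first_available(fake_load, "left")
print(name, "->", pose_path)
```

In the notebook, `load` could be something like `lambda n: one.load_dataset(eid, n, collection='alf', download_only=True)`, and the name of the dataset actually used could be recorded in the report so that sessions processed with different pose sources remain distinguishable.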
