## Script for Girls' Day 3rd April, 2025

### Data: in folder "output" in cwd (current working directory)

### JGU Jupyter Webserver:

https://cbdm-01.zdv.uni-mainz.de/~muro/teaching/p4b/mod4-2/SoSe23/c0_set_up/c0_jgu_jupyter_notebook_server.html


In [9]:
# imports (pre-installed libraries)
import pandas as pd
import numpy as np
import matplotlib.pyplot as pt
import scipy
import os
import re

In [10]:
#Folder that contains the data we will use
output_folder = './output'

We need a function to load the multi-index data

In [11]:
def load_multi_index_csv_as_df(path, index_cols):
    """
    Loads a CSV and applies transform_df_types, then sets MultiIndex.
    """
    df = pd.read_csv(path)

    # Protect index columns from being cast to int
    preserve_cols = index_cols + ['time_s','batch', 'session', 'rp_rm']
    df = transform_df_types(df, preserve_cols=preserve_cols)

    df.set_index(index_cols, inplace=True)
    return df

def transform_df_types(df, preserve_cols=None):
    """
    Casts 'time_s' to float, keeps preserve_cols as strings, and casts
    all remaining columns to int (values that cannot be parsed become 0).
    """
    if preserve_cols is None:
        preserve_cols = ['batch', 'session', 'rp_rm']

    for col in df.columns:
        if col == 'time_s':
            df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0.0).astype(float)
        elif col in preserve_cols:
            df[col] = df[col].astype(str)
        else:
            df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0).astype(int)

    return df

# 1.) Load the data

In [12]:
data = load_multi_index_csv_as_df(os.path.join(output_folder, 'checkpoint_with_index.csv'), ['file', 'index'])
print(data.index.names)

['file', 'index']


Take a look at the data:

In [13]:
data

file,index,time_s,immobile,csp,csm,id,batch,session,rp_rm
990_B_s4_rm_12_4kHz,0,0.00,0,0,0,990,B,4,rm
990_B_s4_rm_12_4kHz,1,0.04,0,0,0,990,B,4,rm
990_B_s4_rm_12_4kHz,2,0.08,0,0,0,990,B,4,rm
990_B_s4_rm_12_4kHz,3,0.12,0,0,0,990,B,4,rm
990_B_s4_rm_12_4kHz,4,0.16,0,0,0,990,B,4,rm
...,...,...,...,...,...,...,...,...,...
982_A_s5_rm_12_4kHz,40496,1619.84,1,0,0,982,A,5,rm
982_A_s5_rm_12_4kHz,40497,1619.88,1,0,0,982,A,5,rm
982_A_s5_rm_12_4kHz,40498,1619.92,1,0,0,982,A,5,rm
982_A_s5_rm_12_4kHz,40499,1619.96,1,0,0,982,A,5,rm
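
The table above has two index levels, `file` and `index`. As a minimal sketch of how such a MultiIndex can be used, the example below builds a tiny made-up table (the names `mouse_a` and `mouse_b` and all values are hypothetical, not taken from the real data) and selects one recording, one single row, and the list of all recordings.

```python
import io
import pandas as pd

# Tiny made-up table with the same index layout as the real data.
# The file names and values are illustrative only.
toy_csv = io.StringIO(
    "file,index,time_s,immobile\n"
    "mouse_a,0,0.00,0\n"
    "mouse_a,1,0.04,1\n"
    "mouse_b,0,0.00,1\n"
    "mouse_b,1,0.04,1\n"
)
toy = pd.read_csv(toy_csv).set_index(["file", "index"])

# All rows of one recording: select by the first index level
one_recording = toy.loc["mouse_a"]

# One single row: give both index levels as a tuple
one_row = toy.loc[("mouse_b", 1)]

# Names of all recordings in the table
recordings = toy.index.get_level_values("file").unique().tolist()

print(one_recording)
print(recordings)
```

The same `.loc[...]` pattern should work on `data` with a real key such as "990_B_s4_rm_12_4kHz".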


## Information about the data 

### The keys

The data contains keys like "990_B_s4_rm_12_4kHz". 

* The first part, "990", is the animal id (the number that identifies the animal)
* The second part, "B", is the batch (we don't need this info today)
* The third part, "s4", is the session
* The fourth part tells us which group the animal belongs to: "rm" = R- (susceptible) and "rp" = R+ (resilient)
* The fifth part is the frequency of the tone that was used: "12_4kHz" or "7_4kHz"
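
The parts of a key can also be pulled apart in code. The sketch below uses a regular expression; the pattern and the helper name `parse_key` are assumptions based only on the example keys shown here, so keys with a different layout would need a different pattern.

```python
import re

# Pattern is an assumption derived from keys like "990_B_s4_rm_12_4kHz"
KEY_PATTERN = re.compile(
    r"^(?P<id>\d+)_"          # animal id, e.g. 990
    r"(?P<batch>[A-Za-z]+)_"  # batch, e.g. B
    r"s(?P<session>\d+)_"     # session, e.g. s4 -> 4
    r"(?P<rp_rm>rp|rm)_"      # group: rp = R+, rm = R-
    r"(?P<tone>\d+_\d+kHz)$"  # tone frequency, e.g. 12_4kHz
)

def parse_key(key):
    """Split a key into its named parts; raise ValueError if it does not match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"Unexpected key format: {key!r}")
    return match.groupdict()

parts = parse_key("990_B_s4_rm_12_4kHz")
print(parts)
# {'id': '990', 'batch': 'B', 'session': '4', 'rp_rm': 'rm', 'tone': '12_4kHz'}
```

All parts come back as strings; convert `id` or `session` with `int(...)` if numbers are needed.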

In [16]:
def print_column_dtypes(df):
    print("\n Column Data Types:")
    print("-" * 38)
    for col, dtype in df.dtypes.items():
        print(f"{col:<25} : {dtype}")
        print("-" * 38)
        
        
print_column_dtypes(data)


 Column Data Types:
--------------------------------------
time_s                    : float64
--------------------------------------
immobile                  : int64
--------------------------------------
csp                       : int64
--------------------------------------
csm                       : int64
--------------------------------------
id                        : int64
--------------------------------------
batch                     : object
--------------------------------------
session                   : object
--------------------------------------
rp_rm                     : object
--------------------------------------


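## Information about the data
### The columns (of each key)

* time_s: the time of each sample in seconds
* immobile: 1 if the animal is immobile at that time, otherwise 0
* csp: 1 (True) while CS+ is on. CS+ is the "fear tone": the traumatic event is linked to this tone
* csm: 1 (True) while CS- is on. CS- is the "safe tone": nothing bad happened when this tone was on
* id, batch, session, rp_rm: the same information as in the key (animal id, batch, session, group)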
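
Because `immobile`, `csp` and `csm` are 0/1 columns, they can be used to filter rows. The sketch below uses a small made-up table (all values are hypothetical) to compare how often the animal is immobile while each tone is on.

```python
import pandas as pd

# Made-up example values, not real measurements
toy = pd.DataFrame({
    "time_s":   [0.00, 0.04, 0.08, 0.12, 0.16, 0.20],
    "immobile": [0, 0, 1, 0, 1, 1],
    "csp":      [0, 0, 1, 1, 1, 1],
    "csm":      [1, 1, 0, 0, 0, 0],
})

# Keep only the rows where the tone is on
during_csp = toy[toy["csp"] == 1]
during_csm = toy[toy["csm"] == 1]

# The mean of a 0/1 column is the share of rows that are 1
share_immobile_csp = during_csp["immobile"].mean()
share_immobile_csm = during_csm["immobile"].mean()

print(f"immobile during CS+: {share_immobile_csp:.2f}")
print(f"immobile during CS-: {share_immobile_csm:.2f}")
```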

## For our analysis we want to get the freezing behavior

Freezing is what a mouse does when it is in intense fear.

It is defined as an "immobile state lasting for at least 2 seconds".

This means: for each key (one animal recorded in one session), we need all immobile events that last 2 seconds or longer.
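
To make the 2-second rule concrete, here is a minimal sketch on made-up data. It is one possible approach, not necessarily the one used in this script: the helper name `mark_freezing` is hypothetical, and it assumes samples are evenly spaced so that a run of n immobile samples lasts n times the time step. The toy data uses a step of 0.5 s, so at least 4 samples in a row are needed; if the 0.04 s step seen in the rows of the real data holds throughout, that would be 50 samples.

```python
import pandas as pd

def mark_freezing(immobile, time_step_s, min_duration_s=2.0):
    """Return 1 for samples inside an immobile run lasting >= min_duration_s, else 0.

    Assumes evenly spaced samples (hypothetical helper for illustration).
    """
    min_samples = int(round(min_duration_s / time_step_s))
    is_immobile = immobile.astype(bool)
    # A new run starts whenever the value differs from the previous sample
    run_id = is_immobile.ne(is_immobile.shift()).cumsum()
    # Length of the run that each sample belongs to
    run_length = run_id.map(run_id.value_counts())
    return (is_immobile & (run_length >= min_samples)).astype(int)

# Made-up recording with a 0.5 s time step: 2 seconds = 4 samples
toy_immobile = pd.Series([1, 1, 1, 0, 1, 1, 1, 1, 1, 0])
toy_freezing = mark_freezing(toy_immobile, time_step_s=0.5)

print(toy_freezing.tolist())
# [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
```

The first run of three immobile samples lasts only 1.5 s, so it is not counted as freezing; the second run of five samples lasts 2.5 s and is. On the real data the runs would have to be found separately for each key, so that a run never continues from one recording into the next.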

# 2.) Add a new column called "freezing" to our data