# Reimer Lab Data

## TODO

- [ ] Understand meaning of each behavior, pupil, and walk data field in context of experiment
    - Need to re-read paper and look at `Reimer AV Data Notes.docx`
    - Also will probably need to talk to Mario about units
- [ ] Figure out how to connect behavior data with pupil and walk data
    - This will probably involve the `time_start` field in the behavior data and the `time` field in the pupil and walk data
- [ ] Build simple visualization of data

## Directory structure

- The main directory contains `behavior.csv`, which holds behavior data for all subjects and sessions
- Pupil and walk data for each subject (here, `W2372` and `W2402`) are stored in a separate folder per subject
- Each subject folder contains a `<subject>_<session>_df_pupil_preprocessed.csv` file and a `<subject>_<session>_df_walk.csv` file for each session, along with a `<subject>_behavior.xlsx` file storing the behavior data for that subject (this is redundant with `behavior.csv`)
- The main directory also contains `Reimer AV Data Notes.docx`, which explains the structure of the task

```
AV_Data_CSHL
├── behavior.csv
├── W2372
│   ├── W2372_28_df_pupil_preprocessed.csv
│   ├── W2372_28_df_walk.csv
│   ├── W2372_29_df_pupil_preprocessed.csv
│   ├── W2372_29_df_walk.csv
│   └── W2372_behavior.xlsx
├── W2402
│   ├── W2402_10_df_pupil_preprocessed.csv
│   ├── W2402_10_df_walk.csv
│   ├── W2402_9_df_pupil_preprocessed.csv
│   ├── W2402_9_df_walk.csv
│   └── W2402_behavior.xlsx
└── Reimer AV Data Notes.docx
```
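
The session files in the tree appear to follow the pattern `<subject>_<session>_df_<kind>.csv`. As a minimal sketch under that assumption, the helper below extracts the subject, session, and data type from a file name, which may help when enumerating the available sessions. The name `parse_session_file` and the regular expression are illustrative and not part of the dataset.

```python
import re

# Assumed naming pattern, inferred from the directory listing:
#   <subject>_<session>_df_<kind>.csv, with kind in {pupil_preprocessed, walk}
SESSION_FILE_RE = re.compile(
    r'^(?P<subject>[A-Za-z0-9]+)_(?P<session>\d+)_df_'
    r'(?P<kind>pupil_preprocessed|walk)\.csv$'
)


def parse_session_file(fname):
    """Return (subject_id, session_id, kind) for a session file name, or None."""
    match = SESSION_FILE_RE.match(fname)
    if match is None:
        return None
    return match['subject'], int(match['session']), match['kind']


fnames = [
    'W2372_28_df_pupil_preprocessed.csv',
    'W2372_28_df_walk.csv',
    'W2402_9_df_walk.csv',
    'W2372_behavior.xlsx',  # Not a session file, so it is skipped
]

# Collect the unique (subject, session) pairs found among the file names
parsed = [parse_session_file(f) for f in fnames]
sessions = sorted({(p[0], p[1]) for p in parsed if p is not None})
print(sessions)
```

In practice the list of file names could come from something like `os.listdir` on a subject folder instead of being written out by hand.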

## Imports and settings

In [1]:
import numpy as np
import pandas as pd

In [2]:
pd.set_option('display.width', 100)

## Specify data to load

In [3]:
# Directory where data is stored
data_fpath = '/Users/cmcgrory/engel_lab/brainstate_dm/AV_Data_CSHL'

# Path to behavior file (contains data for all subjects/sessions)
behavior_fpath = f'{data_fpath}/behavior.csv'

# Subject and session to inspect
subject_id = 'W2372'
session_id = 28

# Paths to pupil and walk data for subject/session
prefix = f'{data_fpath}/{subject_id}/{subject_id}_{session_id}_df'
pupil_fpath = f'{prefix}_pupil_preprocessed.csv'
walk_fpath = f'{prefix}_walk.csv'

## Behavior data

### Questions

- **What values are the `rt` and `time_trial` columns storing?**
- **What are the units?**

### Format

| Column | Name           | Description |
|:-------|:---------------|:------------|
| 0      | None           | Row index   |
| 1      | `block`        |             |
| 2      | `trial`        |             |
| 3      | `trial_type`   |             |
| 4      | `reward`       |             |
| 5      | `trial_result` |             |
| 6      | `hit`          |             |
| 7      | `miss`         |             |
| 8      | `fa`           |             |
| 9      | `cr`           |             |
| 10     | `stimulus`     |             |
| 11     | `response`     |             |
| 12     | `correct`      |             |
| 13     | `rt`           |             |
| 14     | `subject_id`   |             |
| 15     | `session_id`   |             |
| 16     | `time_trial`   |             |

In [4]:
# Load behavior data for all subjects
df_behavior_all = pd.read_csv(behavior_fpath, header=0, index_col=0)

# Select data from specified subject and session
idx_subject = df_behavior_all['subject_id'] == subject_id
idx_session = df_behavior_all['session_id'] == session_id
df_behavior = df_behavior_all[idx_subject & idx_session]
print(df_behavior)

       block  trial  trial_type  reward  trial_result    hit   miss     fa     cr  stimulus  \
41735      0      1         3.0       0             1  False   True  False  False      True   
41736      0      2         3.0       0             1  False   True  False  False      True   
41737      0      3         3.0       4             0   True  False  False  False      True   
41738      0      4         3.0       4             0   True  False  False  False      True   
41739      0      5         2.0       4             0   True  False  False  False      True   
...      ...    ...         ...     ...           ...    ...    ...    ...    ...       ...   
42180      0    446         4.0       0             2  False  False   True  False     False   
42181      0    447         4.0       4             3  False  False  False   True     False   
42182      0    448         4.0       4             3  False  False  False   True     False   
42183      0    449         4.0       0           
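
In the rows printed above, `trial_result` values 0, 1, 2, and 3 coincide with `hit`, `miss`, `fa`, and `cr` being `True`, respectively. Whether that mapping holds for the whole file is unconfirmed, so it seems worth checking before relying on it. The sketch below does this on a small stand-in frame (`df_demo`, a hypothetical name) copied by hand from the printed rows; the same steps could be run on `df_behavior`.

```python
import pandas as pd

# Small stand-in for df_behavior, copied from rows printed above
df_demo = pd.DataFrame(
    {
        'trial_result': [1, 0, 2, 3],
        'hit': [False, True, False, False],
        'miss': [True, False, False, False],
        'fa': [False, False, True, False],
        'cr': [False, False, False, True],
    },
    index=[41735, 41737, 42180, 42181],
)

outcome_cols = ['hit', 'miss', 'fa', 'cr']

# If the flags are one-hot, each trial has exactly one outcome flag set
n_flags = df_demo[outcome_cols].sum(axis=1)
is_one_hot = bool((n_flags == 1).all())

# Name of the flag set on each trial, cross-tabulated against trial_result
outcome = df_demo[outcome_cols].astype(int).idxmax(axis=1)
table = pd.crosstab(df_demo['trial_result'], outcome)

print(is_one_hot)
print(table)
```

If the mapping is consistent, each row of the cross-tabulation has a single non-zero column.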

## Pupil data

### Questions

- **What values are each of the columns storing?**
- **What are the units?**

### Format

| Column | Name         | Description |
|:-------|:-------------|:------------|
| 0      | None         | Row index   |
| 1      | `time`       |             |
| 2      | `pupil_x`    |             |
| 3      | `pupil_y`    |             |
| 4      | `blink`      |             |
| 5      | `pupil_raw`  |             |
| 6      | `pupil`      |             |
| 7      | `eyelid_raw` |             |
| 8      | `eyelid`     |             |

In [5]:
df_pupil = pd.read_csv(pupil_fpath, header=0, index_col=0)
print(df_pupil)

              time    pupil_x  pupil_y  blink  pupil_raw     pupil  eyelid_raw    eyelid
0         1.152785  340.02615  278.751    0.0   6722.788  0.341241   38137.438  0.832852
1         1.252785  340.02615  278.751    0.0   6722.788  0.341236   38137.438  0.832850
2         1.352785  340.02615  278.751    0.0   6722.788  0.341219   38137.438  0.832839
3         1.452785  340.02615  278.751    0.0   6722.788  0.341213   38137.438  0.832834
4         1.552785  340.02615  278.751    0.0   6722.788  0.341253   38137.438  0.832857
...            ...        ...      ...    ...        ...       ...         ...       ...
36098  3610.952785  340.02615  278.751    0.0   6722.788  0.341176   38137.438  0.832945
36099  3611.052785  340.02615  278.751    0.0   6722.788  0.341217   38137.438  0.832872
36100  3611.152785  340.02615  278.751    0.0   6722.788  0.341248   38137.438  0.832824
36101  3611.252785  340.02615  278.751    0.0   6722.788  0.341247   38137.438  0.832830
36102  3611.352785  3
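
As a partial handle on the units question, consecutive `time` values in the printed rows differ by 0.1. If `time` is in seconds (an assumption that still needs confirming), that would correspond to a 10 Hz sampling rate. The sketch below estimates the sampling interval from a `time` column and counts irregular steps; a short series copied from the output above stands in for `df_pupil['time']`, and the 1% threshold is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Stand-in for df_pupil['time'], copied from the first rows printed above
time = pd.Series([1.152785, 1.252785, 1.352785, 1.452785, 1.552785])

# Differences between consecutive samples
dt = np.diff(time.to_numpy())
dt_median = float(np.median(dt))

# Steps that deviate from the median by more than 1% (possible gaps)
irregular = np.flatnonzero(~np.isclose(dt, dt_median, rtol=0.01))

print(f'median step: {dt_median:.3f}')
print(f'irregular steps: {irregular.size}')
```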

## Walk data

### Questions

- **What values are each of the columns storing?**
- **What are the units?**

### Format

| Column | Name       | Description |
|:-------|:-----------|:------------|
| 0      | None       | Row index   |
| 1      | `time`     |             |
| 2      | `velocity` |             |
| 3      | `distance` |             |

In [6]:
df_walk = pd.read_csv(walk_fpath, header=0, index_col=0)
print(df_walk)

         time  velocity  distance
0         0.0  0.000798  0.000013
1         0.1  0.013860  0.000719
2         0.2  0.025089  0.001980
3         0.3  0.022633  0.002033
4         0.4  0.020176  0.002086
...       ...       ...       ...
36132  3613.2  0.000000  6.612637
36133  3613.3  0.000000  6.612637
36134  3613.4  0.000000  6.612637
36135  3613.5  0.000000  6.612637
36136  3613.6  0.000000  6.612637

[36137 rows x 3 columns]
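
For the TODO item on connecting the three tables, one possible approach is a nearest-timestamp join. The pupil `time` column starts at 1.152785 and the walk `time` column starts at 0.0, both in steps of 0.1, so the timestamps do not match exactly. The sketch below uses `pandas.merge_asof` and rests on two unconfirmed assumptions: that both `time` columns share the same clock and units, and that the behavior `time_trial` column marks when each trial occurs on that clock. All frames and values here are synthetic stand-ins, not real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for df_walk and df_pupil (not real data)
df_walk_demo = pd.DataFrame({
    'time_walk': np.round(np.arange(30) * 0.1, 1),
    'velocity': np.linspace(0.0, 0.029, 30),
})
df_pupil_demo = pd.DataFrame({
    'time': np.arange(10) * 0.1 + 1.152785,
    'pupil': np.linspace(0.34, 0.35, 10),
})

# For each pupil sample, attach the nearest walk sample within half a step
df_joined = pd.merge_asof(
    df_pupil_demo, df_walk_demo,
    left_on='time', right_on='time_walk',
    direction='nearest', tolerance=0.05,
)

# Hypothetical trial times on the same clock (assumption, see text above)
df_trials_demo = pd.DataFrame({'trial': [1, 2], 'time_trial': [1.3, 1.9]})

# For each trial, attach the nearest pupil/walk sample
df_trials_joined = pd.merge_asof(
    df_trials_demo, df_joined,
    left_on='time_trial', right_on='time',
    direction='nearest',
)
print(df_trials_joined)
```

`merge_asof` requires both key columns to be sorted. The `tolerance` argument leaves a sample unmatched (`NaN`) when no counterpart lies within the given window, which makes gaps in either recording visible.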
