# Trace Fear Conditioning Analysis Pipeline

This notebook is a prototype for an analysis pipeline for the Trace Fear Conditioning rig:

* <https://github.com/GergelyTuri/tFC-rig>

The goal of the pipeline and its supporting code is to:

* Fix inconsistent folder and file names
* Inspect data for inconsistencies (and fix some of these)
* Extract features from the data into a usable format (pandas DataFrames)
* Provide exploratory and numerical analysis tools for gaining insight into the data

## Assumptions

The analysis pipeline code makes the following assumptions:

* You are executing the code from a Google Colaboratory notebook
* Your data is stored on Google Drive
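
As a rough illustration of these two assumptions, the sketch below checks whether the code is running in Colab and, if so, mounts Google Drive. The helper name `running_in_colab` is hypothetical and not part of the `tFC-rig` code; the `/gdrive` mount point is chosen to match the `DATA_ROOT` path used later in this notebook.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the `google.colab` module has already been loaded.

    Hypothetical helper for illustration; not part of the tFC-rig code.
    """
    return "google.colab" in sys.modules


if running_in_colab():
    # Only importable inside a Colab runtime
    from google.colab import drive

    # Mount point assumed from the DATA_ROOT path used in this notebook
    drive.mount("/gdrive")
else:
    print("Not running in Colab; the pipeline assumptions may not hold.")
```

Outside Colab the block only prints a warning, so it is safe to run in any environment.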

## Notebook Setup

Using the `tFC-rig` analysis pipeline code requires a few notebook-specific steps. This section will typically be copied into any notebook analysing rig data.

In [None]:
import os
import sys

# NOTE: changing this path may break assumptions this code was developed
# under. Consider stepping through each section carefully and avoid
# renaming any files or directories until you have inspected the data file
# name patterns
DATA_ROOT = "/gdrive/Shareddrives/Turi_lab/Data/aging_project/"

# Google Colaboratory executes in an environment with a Linux-style
# file system, where the user should work under the `/content`
# directory
COLAB_ROOT = "/content"

REPO_URL = "https://github.com/GergelyTuri/tFC-rig.git"
REPO_ROOT = os.path.join(COLAB_ROOT, REPO_URL.split("/")[-1].split(".")[0])
REPO_BRANCH = "cpr/analysis-pipeline"

# Clones the `tFC-rig` repository at `/content/tFC-rig`
if not os.path.exists(REPO_ROOT):
  os.chdir(COLAB_ROOT)
  !git clone {REPO_URL}

# Pulls the latest code from the provided branch and adds the
# analysis pipeline source code to the Python system path
os.chdir(REPO_ROOT)
!git pull
!git checkout {REPO_BRANCH}
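ANALYSIS_ROOT = os.path.join(REPO_ROOT, "Analysis")
if ANALYSIS_ROOT not in sys.path:
  # Avoid adding a duplicate entry each time this cell is re-run
  sys.path.append(ANALYSIS_ROOT)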
os.chdir(COLAB_ROOT)

In [None]:
from tfcrig.notebook import Notebook

notebook = Notebook(
    file_root=COLAB_ROOT,
    repo_url=REPO_URL,
    repo_root=REPO_ROOT,
    repo_branch=REPO_BRANCH,
    data_root=DATA_ROOT,
    max_cell_height=1000,
)
notebook.setup()

## Data Prep

Check, prepare, and clean the data before analysis.

### Check Folders and Files

Perform a set of checks on the provided data folders and files. This can reveal issues in data collection before an analysis is run.

In [None]:
from tfcrig.files import RigFiles

files = RigFiles(
    data_root=DATA_ROOT,
    dry_run=True,
)
files.check()
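
To give a feel for what a file name check could involve, here is a minimal sketch. It does not reproduce what `RigFiles.check` does. The pattern `<mouse_id>_<14-digit session id>.json` and the helper `find_bad_names` are assumptions made up for this example; the real naming rules live in the `tFC-rig` repository.

```python
import re

# Hypothetical naming rule, assumed for illustration only:
# <mouse_id>_<14-digit session id>.json, e.g. 88_1_20240103120000.json
FILE_NAME_PATTERN = re.compile(
    r"^(?P<mouse_id>\d+_\d+)_(?P<session_id>\d{14})\.json$"
)


def find_bad_names(names):
    """Return the names that do not match the assumed pattern."""
    return [name for name in names if not FILE_NAME_PATTERN.match(name)]


example_names = [
    "88_1_20240103120000.json",  # matches the assumed pattern
    "88_2_2024-01-03.json",      # dashes in the session part
    "89_1_20240104093000.JSON",  # upper-case extension
]
bad_names = find_bad_names(example_names)
print(bad_names)
```

Reporting mismatches without renaming anything mirrors the spirit of a dry run: problems are listed first, and changes are made only after they have been reviewed.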

### Prep Data for Cleaning

Cleaning can modify folder and file names as well as the data files themselves. Before cleaning, ensure that a copy of each JSON file exists.

In [None]:
files.prep()
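
The idea of keeping a copy of each JSON file before cleaning can be sketched as follows. This is not the implementation of `RigFiles.prep`; the helper `backup_json_files` and the `.bak` suffix are hypothetical, and the example runs against a temporary directory rather than real rig data.

```python
import shutil
import tempfile
from pathlib import Path


def backup_json_files(data_root: Path, suffix: str = ".bak") -> list[Path]:
    """Copy every JSON file under `data_root` once; return new backups.

    Hypothetical helper; the `.bak` suffix is an assumption.
    """
    created = []
    # sorted() materializes the listing before any files are added
    for json_path in sorted(data_root.rglob("*.json")):
        backup_path = json_path.with_name(json_path.name + suffix)
        if not backup_path.exists():
            shutil.copy2(json_path, backup_path)  # preserves metadata
            created.append(backup_path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mouse_a").mkdir()
    (root / "mouse_a" / "session.json").write_text('{"data": []}')

    first_count = len(backup_json_files(root))   # creates the backup
    second_count = len(backup_json_files(root))  # nothing left to copy

print(first_count, second_count)
```

Skipping files that already have a backup keeps the step idempotent, so re-running the cell never overwrites an earlier copy with already-cleaned data.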

### Clean Folders and Files

Run the set of methods that clean the data.

In [None]:
files.clean()

## Analysis

Define an analysis, which extracts features from the data and provides plotting methods to visualize and analyze them. This requires pulling and processing data from all files, and can therefore be time-consuming.

In [None]:
from tfcrig.analysis import Analysis

analysis = Analysis(
    data_root=DATA_ROOT,
    verbose=False,
)

### Analysis Info

It can be useful to check the `info` on the `Analysis` object to confirm that no column contains null values, that the column names match what we expect from the rig data, and that memory usage is reasonable. If memory usage becomes unmanageable, it may be necessary to reduce it, for example by storing processed data in files and working only with feature data.

In [None]:
analysis.info()
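
The same kinds of checks can be run by hand on any pandas DataFrame. The sketch below uses a toy frame with made-up column names and values, not rig data. It also assumes, without having confirmed it, that `analysis.info()` reports something similar to `DataFrame.info`.

```python
import pandas as pd

# Toy data with hypothetical column names; not real rig output
df = pd.DataFrame(
    {
        "mouse_id": ["88_1", "88_1", "88_2"],
        "session_id": [20240103000000, 20240104000000, 20240103000000],
        "lick_count": [3, None, 5],  # one missing value on purpose
    }
)

# Non-null counts, dtypes, and memory usage in one report
df.info(memory_usage="deep")

# Explicit checks that are easier to assert on than the printed report
null_counts = df.isna().sum()
columns_with_nulls = null_counts[null_counts > 0].index.tolist()
memory_bytes = int(df.memory_usage(deep=True).sum())

print(columns_with_nulls, memory_bytes)
```

Passing `deep=True` makes pandas measure the actual size of string columns, which is usually where unexpected memory use hides.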

### Licks Over Time

First example plot of licks over time separated by trial type.

In [None]:
analysis.summarize_licks_per_session(
    mouse_ids=["88_1", "88_2", "88_3", "89_1", "89_2"],
    min_session=20240103000000,
    water_on=False,
    tail_length=12,
)

In [None]:
analysis.summarize_licks_per_session(
    mouse_ids=["88_1", "88_2", "88_3", "89_1", "89_2"],
    min_session=20240103000000,
    water_on=True,
    tail_length=12,
)
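
The `min_session` values above look like timestamps packed into integers (`YYYYMMDDHHMMSS`). That reading is an assumption based on the values alone, as is the inclusive comparison used below. Under those assumptions, the filter can be pictured with this sketch, where `session_to_datetime` is a hypothetical helper and the session ids are invented:

```python
from datetime import datetime


def session_to_datetime(session_id: int) -> datetime:
    """Parse an integer assumed to be laid out as YYYYMMDDHHMMSS."""
    return datetime.strptime(str(session_id), "%Y%m%d%H%M%S")


min_session = 20240103000000
# Invented session ids for illustration
session_ids = [20231215101500, 20240103000000, 20240105143000]

# Integers in this layout sort chronologically, so a plain
# comparison is enough (inclusive cutoff is an assumption)
kept = [s for s in session_ids if s >= min_session]
cutoff = session_to_datetime(min_session)

print(cutoff.isoformat())
print(kept)
```

With this layout, `min_session=0` keeps every session, since every timestamp-style integer is greater than zero.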

In [None]:
analysis.learning_rate_heat_map(
    mouse_ids=[
        # "74_1", "74_2", "74_4", "74_5",
        # "75_1", "75_2", "75_4", "75_5",
        "88_1", "88_2", "88_3", "89_1", "89_2",
        "90_1", "90_2", "90_5",
        "91_1", "91_2", "91_5",
        "92_1", "92_2",
        "93_1", "93_2", "93_5",
    ],
    min_session=0,
    water_on=False,
    tail_length=12,
)