# Data Ingestion & Processing

This notebook gives step-by-step instructions for ingesting data from the source and processing it so that it is ready for querying and further analysis.

Prerequisites: complete the pipeline installation and configuration. See the instructions here: [Pipeline Deployment](../PIPELINE_LOCAL_DEPLOYMENT.md)

This notebook has three main steps:
1. Create a new experiment
2. Manually insert subject information for the experiment
3. Run automated ingestion & processing

## Step 1 - Create a new experiment

This step assumes that you have downloaded the data for this experiment and configured the path correctly (see the prerequisites above).

The released data is for the experiment named `social0.2-aeon3`.

The following commands insert a new entry for the `social0.2-aeon3` experiment into the `acquisition.Experiment` table, along with other relevant metadata.


In [None]:
import datajoint as dj

from aeon.dj_pipeline import subject, acquisition
from aeon.dj_pipeline.create_experiments import create_socialexperiment

In [None]:
experiment_name = "social0.2-aeon3"

create_socialexperiment(experiment_name)

In [17]:
# Check `Experiment` table
acquisition.Experiment()

| experiment_name (e.g. exp0-aeon3) | experiment_start_time (datetime of the start of this experiment) | experiment_description | arena_name (unique name of the arena, e.g. circular_2m) | lab (abbreviated lab name) | location | experiment_type |
| --- | --- | --- | --- | --- | --- | --- |
| social0.2-aeon3 | 2024-03-01 16:46:12 | Social0.2 experiment on AEON3 machine | circle-2m | SWC | AEON3 | social |


In [13]:
acquisition.Experiment.Directory()

| experiment_name (e.g. exp0-aeon3) | directory_type | repository_name | directory_path | load_order (order of priority to load the directory) |
| --- | --- | --- | --- | --- |
| social0.2-aeon3 | processed | ceph_aeon | aeon/data/processed/AEON3/social0.2 | 0 |
| social0.2-aeon3 | raw | ceph_aeon | aeon/data/raw/AEON3/social0.2 | 1 |
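
The `load_order` column gives each directory a loading priority. As a rough illustration only, the sketch below sorts the two rows shown above by `load_order`, assuming that a lower value means the directory is consulted first. The helper `search_order` is hypothetical and is not part of the pipeline; the pipeline's actual lookup logic may differ.

```python
from pathlib import PurePosixPath

# Rows copied from the `Experiment.Directory` output above.
directories = [
    {"directory_type": "raw",
     "directory_path": "aeon/data/raw/AEON3/social0.2",
     "load_order": 1},
    {"directory_type": "processed",
     "directory_path": "aeon/data/processed/AEON3/social0.2",
     "load_order": 0},
]


def search_order(rows):
    """Hypothetical helper: return directory paths sorted by ascending load_order.

    Assumption: a lower `load_order` means a higher priority.
    """
    ranked = sorted(rows, key=lambda row: row["load_order"])
    return [PurePosixPath(row["directory_path"]) for row in ranked]


ordered = search_order(directories)
for path in ordered:
    print(path)
```

Under that assumption, the `processed` directory would be searched before the `raw` one.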


## Step 2 - Insert Subjects

The experiment "social0.2-aeon3" features two participating animals:
- BAA-1104045
- BAA-1104047

Let's add them to the `subject.Subject` table, then associate them with the experiment.

In [10]:
subject_list = [
    {'subject': 'BAA-1104045',
     'sex': 'U',
     'subject_birth_date': '2024-01-01',
     'subject_description': 'Subject for Social 0.2 experiment'},
    {'subject': 'BAA-1104047',
     'sex': 'U',
     'subject_birth_date': '2024-01-01',
     'subject_description': 'Subject for Social 0.2 experiment'}
]
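
Before inserting, it can help to sanity-check each record. The following is a minimal sketch, assuming the four keys used in `subject_list` above are the ones the table expects and that birth dates are ISO-formatted strings. `REQUIRED_KEYS` and `validate_subject` are hypothetical names, not part of the pipeline.

```python
from datetime import date

# Assumption: these are the keys used by the records in this notebook.
REQUIRED_KEYS = {"subject", "sex", "subject_birth_date", "subject_description"}


def validate_subject(record):
    """Hypothetical check: raise ValueError if a record is missing a key
    or its birth date is not an ISO date (YYYY-MM-DD)."""
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # date.fromisoformat raises ValueError on malformed dates.
    date.fromisoformat(record["subject_birth_date"])
    return True


example_record = {
    "subject": "BAA-1104045",
    "sex": "U",
    "subject_birth_date": "2024-01-01",
    "subject_description": "Subject for Social 0.2 experiment",
}
print(validate_subject(example_record))
```

The same check could be applied to every entry with `all(validate_subject(s) for s in subject_list)`.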

In [11]:
subject.Subject.insert(subject_list, skip_duplicates=True)

In [15]:
subject_experiment_list = [
    {'experiment_name': 'social0.2-aeon3', 'subject': 'BAA-1104045'},
    {'experiment_name': 'social0.2-aeon3', 'subject': 'BAA-1104047'}
]

In [16]:
acquisition.Experiment.Subject.insert(subject_experiment_list, skip_duplicates=True)

In [18]:
# Check Experiment.Subject table
acquisition.Experiment.Subject()

experiment_name  e.g exp0-aeon3,subject
social0.2-aeon3,BAA-1104045
social0.2-aeon3,BAA-1104047


## Step 3 - Data Ingestion & Processing

Data ingestion and processing are fully automated by the prepared routines below.

Ingestion with DataJoint (`populate`) is idempotent: entries that have already been ingested are skipped, so it is safe to run the same command multiple times.
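
To make the idea of idempotency concrete, here is a toy sketch in plain Python. It does not use DataJoint; a dictionary stands in for a table, and the hypothetical function `insert_skip_duplicates` only mimics the effect of skipping rows whose key is already present.

```python
def insert_skip_duplicates(table, rows, key):
    """Toy stand-in for an idempotent insert: add rows whose key is new,
    silently skip the rest, and return the number of rows added."""
    inserted = 0
    for row in rows:
        row_key = row[key]
        if row_key in table:
            continue  # already present, so nothing to do
        table[row_key] = row
        inserted += 1
    return inserted


table = {}
rows = [{"experiment_name": "social0.2-aeon3"}]

first = insert_skip_duplicates(table, rows, key="experiment_name")
second = insert_skip_duplicates(table, rows, key="experiment_name")
print(first, second, len(table))
```

Running the insert a second time adds nothing and leaves the table unchanged, which is the property that makes re-running the cells below safe.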

In [None]:
from aeon.dj_pipeline.populate.worker import (
    AutomatedExperimentIngestion,
    acquisition_worker,
    streams_worker,
    analysis_worker,
)

In [None]:
AutomatedExperimentIngestion.insert1({'experiment_name': 'social0.2-aeon3'}, skip_duplicates=True)

In [None]:
acquisition_worker.run()

In [None]:
streams_worker.run()

In [None]:
analysis_worker.run()