# Process calcium imaging data with DataJoint Elements

This notebook walks through processing two-photon calcium imaging data acquired
with ScanImage and processed with Suite2p.

The DataJoint Python API and Element Calcium Imaging provide features that support collaboration, automation, reproducibility, and visualization.

For more information on these topics, please visit our documentation: 
 
- [DataJoint Core](https://datajoint.com/docs/core/): General principles

- DataJoint [Python](https://datajoint.com/docs/core/datajoint-python/) and
  [MATLAB](https://datajoint.com/docs/core/datajoint-matlab/) APIs: in-depth reviews of
  specifics

- [DataJoint Element Calcium Imaging](https://datajoint.com/docs/elements/element-calcium-imaging/):
  A modular pipeline for calcium imaging analysis

In [None]:
# Import Statements

import os

if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")

import datajoint as dj
import datetime
import matplotlib.pyplot as plt
import numpy as np

### The Basics:

Any DataJoint workflow can be broken down into three basic parts:

- `Insert`
- `Populate` (or process)
- `Analyze`

In this demo we will:
- `Insert` information about an animal subject, recording session, and the parameters related
  to processing calcium imaging data through Suite2p or CaImAn.
- `Populate` tables with outputs of image processing including motion correction,
  segmentation, mask classification, and activity traces.
- `Analyze` the processed data by querying and plotting activity traces.

Each of these steps is explained in this notebook.

**Please Note**: Some cells in the notebook are hidden for clarity and readability of
the notebook. Please feel free to expand all the cells and explore the code in detail to
understand how you can use DataJoint Elements for your own dataset.

### Initial Setup: Define schema prefixes and root data directory

+ If we set prefix to `neuro_`, every schema created with the current workflow will start with `neuro_`, e.g. `neuro_lab`, `neuro_scan`, `neuro_imaging`, etc.

+ The example dataset is mounted to this GitHub Codespace environment. Feel free to explore it in the `example_data` directory.
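
As a rough illustration of the naming pattern described above, the sketch below joins a prefix to a few schema base names. The helper `schema_name` is hypothetical and is not part of DataJoint or the workflow package; it only mirrors how the prefix ends up at the start of each schema name.

```python
# Hypothetical helper, not part of DataJoint: it only illustrates the naming pattern.
def schema_name(prefix: str, base: str) -> str:
    """Return the full schema name for a given prefix and base name."""
    return f"{prefix}{base}"


prefix = "neuro_"
names = [schema_name(prefix, base) for base in ("lab", "scan", "imaging")]
print(names)  # ['neuro_lab', 'neuro_scan', 'neuro_imaging']
```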

In [None]:
dj.config["custom"] = {"database.prefix": "neuro_",
                       "imaging_root_data_dir": os.getenv('DJ_PUBLIC_S3_MOUNT_PATH', 'example_data')}

## Activate DataJoint Elements

+ The current workflow is composed of multiple database schemas, each of which corresponds to a module within `workflow_calcium_imaging.pipeline`.

In [None]:
from workflow_calcium_imaging.pipeline import lab, subject, session, scan, imaging, Equipment

## Workflow diagram

This workflow is assembled from four DataJoint Elements:
+ [element-lab](https://github.com/datajoint/element-lab)
+ [element-animal](https://github.com/datajoint/element-animal)
+ [element-session](https://github.com/datajoint/element-session)
+ [element-calcium-imaging](https://github.com/datajoint/element-calcium-imaging)

The schema diagram is a good reference for understanding the order in which tables
within the workflow should receive data manually using `insert()` or automatically using `populate()`. 


In [None]:
(
    dj.Diagram(subject.Subject)
    + dj.Diagram(session.Session)
    + dj.Diagram(scan)
    + dj.Diagram(imaging)
)

### Insert entries into manual tables and populate automated tables

To view details about a table's dependencies and attributes, use the `.describe()`
method and the `.heading` attribute, respectively.

In [None]:
subject.Subject.describe();

In [None]:
subject.Subject.heading

The cells above show the dependencies and attributes of the `subject.Subject` table.
Next, we insert an entry into it.

In [None]:
subject.Subject.insert1(
    dict(
        subject="subject1",
        sex="F",
        subject_birth_date="2020-01-01",
        subject_description="ScanImage acquisition. Suite2p processing.",
    )
)

Insert data into the `Equipment` table.

In [None]:
Equipment.insert1(dict(scanner="ScanImage"))

In [None]:
session.Session.describe();

In [None]:
session.Session.heading

The cells above show the dependencies and attributes for the `session.Session` table.


Here we demonstrate a convenient way of inserting data: assign the primary-key dictionary
to a variable, `session_key`. This variable can then be reused to insert entries into
tables that depend on the `Session` table and therefore include its primary key among
their attributes.

In [None]:
session_key = dict(subject="subject1", session_datetime="2021-04-30 12:22:15.032")

session.Session.insert1(session_key)

In [None]:
session.SessionDirectory.insert1(
    dict(
        **session_key,
        session_dir="subject1/session1",
    )
)
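
The key-reuse pattern used in these inserts relies on plain Python dictionary unpacking. The short sketch below runs without a database connection and shows how `**session_key` copies the key's fields into a new entry; `session_dir_entry` and `scan_entry` are illustrative names, and `scan_entry` is shortened to one extra field for brevity.

```python
session_key = dict(subject="subject1", session_datetime="2021-04-30 12:22:15.032")

# Unpacking copies the key's fields into a new dictionary and adds table-specific ones.
session_dir_entry = dict(**session_key, session_dir="subject1/session1")
scan_entry = dict(**session_key, scan_id=0)

# The original key is left unchanged, so it can be reused for later inserts.
print(sorted(session_dir_entry))
print(session_key)
```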

In [None]:
scan.Scan.insert1(
    dict(
        **session_key,
        scan_id=0,
        scanner="ScanImage",
        acq_software="ScanImage",
        scan_notes="",
    )
)

In [None]:
populate_settings = {
    "display_progress": True,
    "reserve_jobs": False,
    "suppress_errors": False,
}

In [None]:
# duration depends on your network bandwidth to s3
scan.ScanInfo.populate(**populate_settings)

In [None]:
import suite2p

params_suite2p = suite2p.default_ops()
params_suite2p["nonrigid"] = False  # disable non-rigid registration

imaging.ProcessingParamSet.insert_new_params(
    processing_method="suite2p",
    paramset_idx=0,
    params=params_suite2p,
    paramset_desc="Calcium imaging analysis with Suite2p using default Suite2p parameters",
)

In [None]:
imaging.ProcessingTask.insert1(
    dict(
        **session_key,
        scan_id=0,
        paramset_idx=0,
        task_mode='load', # load or trigger
        processing_output_dir="subject1/session1/suite2p",
    )
)

In [None]:
imaging.Processing.populate(**populate_settings)

In [None]:
imaging.Curation.insert1(
    dict(
        **session_key,
        scan_id=0,
        paramset_idx=0,
        curation_id=0,
        curation_time="2021-04-30 12:22:15.032",
        curation_output_dir="subject1/session1/suite2p",
        manual_curation=False,
        curation_note="",
    )
)

In [None]:
imaging.MotionCorrection.populate(**populate_settings)

In [None]:
imaging.Segmentation.populate(**populate_settings)

In [None]:
imaging.Fluorescence.populate(**populate_settings)

In [None]:
imaging.Activity.populate(**populate_settings)

## Query, fetch, and plot segmentations on average image

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
average_image = (
    imaging.MotionCorrection.Summary & session_key & "field_idx=0"
).fetch1("average_image")

In [None]:
mask_xpix, mask_ypix = (
    imaging.Segmentation.Mask * imaging.MaskClassification.MaskType
    & session_key
    & "mask_center_z=0"
    & "mask_npix > 130"
).fetch("mask_xpix", "mask_ypix")

In [None]:
mask_image = np.zeros(np.shape(average_image), dtype=bool)
for xpix, ypix in zip(mask_xpix, mask_ypix):
    mask_image[ypix, xpix] = True
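
To make the mask-building step easier to follow, here is a minimal sketch on toy data. The 4x5 image and the two small masks are made-up stand-ins for the fetched `average_image`, `mask_xpix`, and `mask_ypix`, so the cell runs without a database connection.

```python
import numpy as np

# Toy stand-ins for the fetched values: a 4x5 image and two masks,
# each given as arrays of x (column) and y (row) pixel coordinates.
average_image = np.zeros((4, 5))
mask_xpix = [np.array([0, 1]), np.array([4])]
mask_ypix = [np.array([0, 0]), np.array([3])]

mask_image = np.zeros(average_image.shape, dtype=bool)
for xpix, ypix in zip(mask_xpix, mask_ypix):
    # Rows are indexed by y and columns by x, hence [ypix, xpix].
    mask_image[ypix, xpix] = True

print(mask_image.astype(int))
```

Indexing with paired coordinate arrays sets every listed pixel of a mask in one assignment, so no inner loop over pixels is needed.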

In [None]:
plt.imshow(average_image)
plt.contour(mask_image, colors="white", linewidths=0.5);

## Drop schemas

+ Schemas are not typically dropped in a production workflow that contains real data.
+ During development, dropping schemas may be necessary when redesigning tables.
+ When all schemas must be dropped, drop them in dependency order (downstream schemas first), as listed in the cell below.

In [None]:
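def drop_databases(databases):
    """Drop the given schemas (names without the prefix), in the order given."""
    import pymysql.err

    conn = dj.conn()

    # Disable the interactive confirmation prompt while dropping.
    with dj.config(safemode=False):
        for database in databases:
            schema = dj.Schema(f'{dj.config["custom"]["database.prefix"]}{database}')
            # A table that other tables still reference may fail to drop on the
            # first pass, so keep looping until no tables remain.
            while schema.list_tables():
                for table in schema.list_tables():
                    try:
                        conn.query(f"DROP TABLE `{schema.database}`.`{table}`")
                    except pymysql.err.OperationalError:
                        print(f"Can't drop `{schema.database}`.`{table}`. Retrying...")
            schema.drop()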

# drop_databases(databases=['imaging_report', 'imaging', 'scan', 'session', 'subject', 'lab', 'reference'])
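
The drop order in the commented-out call can be thought of as the reverse of a creation order. The sketch below derives it with the standard-library `graphlib` module. The `depends_on` mapping is an assumption: a simple linear chain inferred from the order of that list, not the pipeline's actual foreign-key graph, which may differ.

```python
from graphlib import TopologicalSorter

# Assumed dependencies (schema -> schemas it depends on). This linear chain is
# inferred from the order of the drop list; the real pipeline may differ.
depends_on = {
    "reference": set(),
    "lab": {"reference"},
    "subject": {"lab"},
    "session": {"subject"},
    "scan": {"session"},
    "imaging": {"scan"},
    "imaging_report": {"imaging"},
}

# static_order() yields upstream schemas first, which is a valid creation order.
creation_order = list(TopologicalSorter(depends_on).static_order())

# Dropping must go the other way: downstream schemas first.
drop_order = creation_order[::-1]
print(drop_order)
```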