# DataJoint Elements for 2-Photon Calcium Imaging

#### Open-source data pipeline for processing and analyzing fluorescent imaging datasets.

Welcome to the tutorial for the DataJoint Element for calcium imaging. This tutorial
aims to provide a comprehensive understanding of the open-source data pipeline created
using `element-calcium-imaging`.

This package is designed to seamlessly process, ingest, and track calcium imaging data,
along with its associated parameters such as those used for image segmentation or motion
correction, and scan-level metadata. By the end of this tutorial, you will have a clear
grasp of how to set up and integrate `element-calcium-imaging` into your specific
research projects and lab.

![flowchart](../images/flowchart.svg)

### Prerequisites

Please see the [datajoint tutorials GitHub
repository](https://github.com/datajoint/datajoint-tutorials/tree/main) before
proceeding.

A basic understanding of the following DataJoint concepts will be beneficial to your
understanding of this tutorial: 
1. The `Imported` and `Computed` table types in `datajoint-python`.
2. The functionality of the `.populate()` method. 

#### **Tutorial Overview**

+ Setup
+ *Activate* the DataJoint pipeline.
+ *Insert* subject, session, and scan metadata.
+ *Populate* scan-level metadata from image files.
+ Run the image processing task.
+ Curate the results (optional).
+ Visualize the results.

### **Setup**

This tutorial examines calcium imaging data acquired with [ScanImage](https://www.mbfbioscience.com/products/scanimage/) and processed via
[suite2p](https://www.suite2p.org/). The goal is to store, track, and manage sessions of calcium imaging data,
including all outputs of image segmentations, fluorescence traces and deconvolved
activity traces. 

The results of this Element can be combined with **other modalities** to create
a complete, customizable data pipeline for your specific lab or study. For instance, you
can combine `element-calcium-imaging` with `element-array-ephys` and
`element-deeplabcut` to characterize the neural activity along with markerless
pose-estimation during behavior.

Let's start this tutorial by importing the packages necessary to run the notebook.

In [None]:
import datajoint as dj
import datetime
import matplotlib.pyplot as plt
import numpy as np

If the tutorial is run in Codespaces, a private, local database server is created and
made available for you. This is where we will insert and store our processed results.
Let's connect to the database server.

In [None]:
dj.conn()

### **Activate the DataJoint Pipeline**

This tutorial activates the `imaging.py` module from `element-calcium-imaging`, along
with upstream dependencies from `element-animal` and `element-session`. Please refer to the
[`tutorial_pipeline.py`](./tutorial_pipeline.py) for the source code.

In [None]:
from tests.tutorial_pipeline import (
    lab,
    subject,
    session,
    scan,
    imaging,
    imaging_report,
    Equipment,
)

We can represent the tables in the `scan` and `imaging` schemas as well as some of the
upstream dependencies to `session` and `subject` schemas as a diagram.

In [None]:
(
    dj.Diagram(subject.Subject)
    + dj.Diagram(session.Session)
    + dj.Diagram(scan)
    + dj.Diagram(imaging)
)

As evident from the diagram, this data pipeline encompasses tables associated with
scan metadata, results of image processing, and optional curation of image processing
results. A few tables, such as `subject.Subject` or `session.Session`,
while important for a complete pipeline, fall outside the scope of the `element-calcium-imaging`
tutorial and will therefore not be explored extensively here. The primary focus of
this tutorial will be on the `scan` and `imaging` schemas.

### **Insert subject, session, and scan metadata**

Let's start with the first table in the schema diagram (i.e. `subject.Subject` table).

To know what data to insert into the table, we can view its dependencies and attributes using the `.describe()` method and the `.heading` attribute.

In [None]:
subject.Subject()

In [None]:
print(subject.Subject.describe())

In [None]:
subject.Subject.heading

The cells above show all attributes of the subject table.
We will insert data into the
`subject.Subject` table. 

In [None]:
subject.Subject.insert1(
    dict(
        subject="subject1",
        subject_nickname="subject1_nickname",
        sex="F",
        subject_birth_date="2020-01-01",
        subject_description="ScanImage acquisition. Suite2p processing.",
    )
)
subject.Subject()

Let's repeat the steps above for the `Session` table and see how the output varies
between `.describe` and `.heading`.

In [None]:
print(session.Session.describe())

In [None]:
session.Session.heading

Notice that `describe` displays the table's structure and highlights its dependencies, such as its reliance on the `Subject` table. These dependencies represent foreign key references, linking data across tables.

On the other hand, `heading` provides an exhaustive list of the table's attributes. This
list includes both the attributes declared in this table and any inherited from upstream
tables.

With this understanding, let's move on to insert a session associated with our subject.

We will insert into the `session.Session` table by passing a dictionary to the `insert1` method.

In [None]:
session_key = dict(subject="subject1", session_datetime="2021-04-30 12:22:15")

In [None]:
session.Session.insert1(session_key)
session.Session()

Every experimental session produces a set of data files. The purpose of the `SessionDirectory` table is to locate these files. It references a directory path relative to a root directory, defined in `dj.config["custom"]`. More information about `dj.config` is provided in the [documentation](https://datajoint.com/docs/elements/user-guide/).

In [None]:
session.SessionDirectory.insert1(dict(**session_key, session_dir="subject1/session1"))
session.SessionDirectory()
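
The lookup that `SessionDirectory` makes possible can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Element's own code: `find_session_dir` is a hypothetical helper, and the list of root directories stands in for whatever you store under `dj.config["custom"]`.

```python
import tempfile
from pathlib import Path


def find_session_dir(root_dirs, session_dir):
    """Return the first existing `root / session_dir`, searching roots in order.

    Hypothetical helper for illustration; not part of element-calcium-imaging.
    """
    for root in root_dirs:
        candidate = Path(root) / session_dir
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"'{session_dir}' was not found under any of: {list(map(str, root_dirs))}"
    )


# Demonstrate with a throwaway root directory instead of real data.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "subject1" / "session1").mkdir(parents=True)
    found = find_session_dir([root], "subject1/session1")
    relative = found.relative_to(root).as_posix()

print(relative)  # subject1/session1
```

Storing only the relative path in the database keeps entries valid when the data are moved to another machine; only the root directory in the configuration has to change.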

As the diagram indicates, the tables in the `scan` schema need to
contain data before the tables in the `imaging` schema accept any data. Let's
start by inserting into `scan.Scan`, a table containing metadata about a calcium imaging
scan. 

In [None]:
print(scan.Scan.describe())

The `Scan` table depends on both the `Session` table **and** the `Equipment` table.
Let's insert into the `Equipment` table and then `Scan`.

In [None]:
Equipment.insert1(
    dict(
        device="Scanner1",
        modality="Calcium imaging",
        description="Example microscope",
    )
)

In [None]:
scan.Scan.insert1(
    dict(
        **session_key,
        scan_id=0,
        device="Scanner1",
        acq_software="ScanImage",
        scan_notes="",
    )
)
scan.Scan()

### **Populate calcium imaging scan metadata**

In the upcoming cells, the `.populate()` method will automatically extract and store the
recording metadata for each experimental session in the `scan.ScanInfo` table and its part table `scan.ScanInfo.Field`.

In [None]:
scan.ScanInfo()

In [None]:
scan.ScanInfo.Field()

In [None]:
# duration depends on your network bandwidth to s3
scan.ScanInfo.populate(display_progress=True)
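
As a rough mental model, populating a table means computing an entry for every upstream key that does not have one yet. The sketch below is a simplified plain-Python analogy, not DataJoint's implementation: `populate`, `make_scan_info`, and the example keys and values are all hypothetical.

```python
def populate(upstream_keys, table, make):
    """Fill `table` by calling `make(key)` for each upstream key it lacks.

    Simplified analogy; returns how many entries were added on this call.
    """
    added = 0
    for key in upstream_keys:
        if key not in table:
            table[key] = make(key)
            added += 1
    return added


def make_scan_info(key):
    # Placeholder for reading metadata from the image files of one scan.
    subject, scan_id = key
    return {"nfields": 1, "source": f"{subject}/scan{scan_id}"}


scans = [("subject1", 0), ("subject1", 1)]  # made-up upstream entries
scan_info = {}

first = populate(scans, scan_info, make_scan_info)
second = populate(scans, scan_info, make_scan_info)  # nothing left to compute
print(first, second)  # 2 0
```

The second call adds nothing, which is why it is safe to call `.populate()` again after new scans have been inserted.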

Let's view the information that was entered into each of these tables.

In [None]:
scan.ScanInfo()

In [None]:
scan.ScanInfo.Field()

### **Run the Processing Task**

We're almost ready to perform image processing with `suite2p`. An important step before
processing is managing the parameters which will be used in that step. To do so, we will
define the suite2p parameters in a dictionary and insert them into a DataJoint table
`ProcessingParamSet`. This table keeps track of all combinations of your image
processing parameters. You can choose which parameters are used during processing in a
later step. 

Let's view the attributes and insert data into `imaging.ProcessingParamSet`.

In [None]:
imaging.ProcessingParamSet.heading

In [None]:
import suite2p

params_suite2p = suite2p.default_ops()
params_suite2p["nonrigid"] = False

imaging.ProcessingParamSet.insert_new_params(
    processing_method="suite2p",
    paramset_idx=0,
    params=params_suite2p,
    paramset_desc="Calcium imaging analysis with Suite2p using default parameters",
)
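
Because this table tracks every parameter combination, it helps to be able to tell whether two parameter dictionaries describe the same set. The sketch below shows one possible way to fingerprint a dictionary so that key order does not matter. `paramset_fingerprint` is a hypothetical helper and the parameter values are made up; this is not necessarily how `insert_new_params` compares parameter sets.

```python
import hashlib
import json


def paramset_fingerprint(params):
    """Hash a parameter dict so equal contents give equal digests.

    Hypothetical helper; assumes all values are JSON-serializable.
    """
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up parameter values, for illustration only.
params_a = {"nonrigid": False, "fs": 10.0, "tau": 1.0}
params_b = {"tau": 1.0, "fs": 10.0, "nonrigid": False}  # same content, new order
params_c = {**params_a, "nonrigid": True}  # one value changed

same = paramset_fingerprint(params_a) == paramset_fingerprint(params_b)
different = paramset_fingerprint(params_a) != paramset_fingerprint(params_c)
print(same, different)  # True True
```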

DataJoint uses a `ProcessingTask` table to manage which `Scan` and `ProcessingParamSet`
should be used during processing. 

This table defines several important aspects of downstream processing.
Let's view the attributes to get a better understanding. 

In [None]:
imaging.ProcessingTask.heading

The `ProcessingTask` table contains two important attributes: 
+ `paramset_idx` - Allows the user to choose the parameter set with which
you want to run image processing.
+ `task_mode` - Can be set to `load` or `trigger`. When set to `load`,
running the processing step initiates a search for existing output files of the image
processing algorithm defined in `ProcessingParamSet`. When set to `trigger`, the
processing step will run image processing on the raw data. 

In [None]:
imaging.ProcessingTask.insert1(
    dict(
        **session_key,
        scan_id=0,
        paramset_idx=0,
        task_mode="load",  # load or trigger
        processing_output_dir="subject1/session1/suite2p",
    )
)
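
The difference between the two `task_mode` values can be sketched as a simple dispatch. This is an illustration under stated assumptions rather than the Element's code: `run_processing_task` and `fake_processing` are hypothetical, and a single empty file stands in for real processing output.

```python
import tempfile
from pathlib import Path


def list_files(directory):
    """Return the sorted names of the files directly inside `directory`."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


def run_processing_task(task_mode, output_dir, run_processing):
    """Dispatch on `task_mode`; hypothetical stand-in for the processing step.

    "load":    expect results to exist already and report the files found.
    "trigger": call `run_processing(output_dir)` to create the results.
    """
    output_dir = Path(output_dir)
    if task_mode == "load":
        files = list_files(output_dir)
        if not files:
            raise FileNotFoundError(f"No processing results in {output_dir}")
        return files
    if task_mode == "trigger":
        output_dir.mkdir(parents=True, exist_ok=True)
        run_processing(output_dir)
        return list_files(output_dir)
    raise ValueError(f"Unknown task_mode: {task_mode!r}")


def fake_processing(output_dir):
    # Placeholder: writes an empty file where a real algorithm would write results.
    (output_dir / "results.txt").write_text("")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "suite2p"
    triggered = run_processing_task("trigger", out, fake_processing)
    loaded = run_processing_task("load", out, fake_processing)

print(triggered, loaded)  # ['results.txt'] ['results.txt']
```

In this sketch, `load` fails loudly when the output directory is empty, which makes a mistyped `processing_output_dir` easy to spot.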

Let's call populate on the `Processing` table, which checks for Suite2p results since `task_mode=load`.

In [None]:
imaging.Processing.populate(session_key, display_progress=True)

### **Populate the results**

Once the `Processing` table finishes, we can populate the remaining tables in the
workflow, including `MotionCorrection`, `Segmentation`, `Fluorescence`, and `Activity`,
along with the report tables.

In [None]:
imaging.MotionCorrection.populate(display_progress=True)
imaging.Segmentation.populate(display_progress=True)
imaging.Fluorescence.populate(display_progress=True)
imaging.Activity.populate(display_progress=True)
imaging_report.ScanLevelReport.populate(display_progress=True)
imaging_report.TraceReport.populate(display_progress=True)

Now that we've populated the tables in this DataJoint pipeline, there are
several possible next steps. If you have an existing pipeline for
aligning neural activity to behavior data or other stimuli, you can easily
invoke `element-event` or define your custom DataJoint tables to extend the
pipeline.

### **Visualize the results**

In this tutorial, we will do some exploratory analysis by fetching the data from the database and creating a few plots.

First, we will fetch the `fluorescence` attribute for `mask=10` with the `fetch1` method by passing the attribute name as an argument.

Called without arguments, `fetch1()` returns all attributes of a single entry as a dictionary. It requires the query to contain exactly one entry and raises an error otherwise, so the restriction must narrow the query down to one row.

In [None]:
trace = (imaging.Fluorescence.Trace & "mask = '10'").fetch1("fluorescence")

In the query above, we fetch the fluorescence trace from the `Trace` part table
belonging to the `Fluorescence` parent table. 

Let's plot this trace after fetching the sampling rate of the data to define the x-axis values.

In [None]:
sampling_rate = (scan.ScanInfo & session_key & "scan_id=0").fetch1("fps")

In [None]:
plt.plot(np.arange(trace.size) / sampling_rate, trace)
plt.title("Fluorescence trace for mask 10")
plt.xlabel("Time (s)")
plt.ylabel("Activity (a.u.)")
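
The x-axis values used for the plot follow from a simple relation: sample `i` is acquired at time `i / sampling_rate` seconds. A minimal sketch with made-up numbers (the helper `time_axis` is hypothetical and uses plain Python lists rather than NumPy):

```python
def time_axis(n_samples, sampling_rate):
    """Return the time in seconds of each sample, starting at t = 0."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return [i / sampling_rate for i in range(n_samples)]


# Made-up numbers: 5 frames acquired at 10 frames per second.
times = time_axis(5, 10.0)
print(times)  # [0.0, 0.1, 0.2, 0.3, 0.4]
```

Note that the last sample falls at `(n_samples - 1) / sampling_rate`, one frame interval short of the total recording duration.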

DataJoint queries are a highly flexible tool to manipulate and visualize your data.
After all, visualizing traces or generating rasters is likely just the start of
your analysis workflow. This can also make the queries seem more complex at
first. However, we'll walk through them slowly to simplify their content in this notebook. 

The examples below perform several operations using DataJoint queries:
- Fetch the primary key attributes of the scan with `scan_id=0`.
- Use **multiple restrictions** to fetch the average motion-corrected image for this
  scan with `field_idx=0`.
- Use a **join** operation and **multiple restrictions** to fetch ROI mask coordinates
  and overlay them on the average motion-corrected image.
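
To build intuition for these operations, the sketch below mimics restriction and join on plain lists of dictionaries. It is a simplified analogy with made-up rows, not DataJoint code: `restrict` and `join` are hypothetical helpers, and DataJoint's own join applies stricter rules about which attributes may be matched.

```python
def restrict(rows, **conditions):
    """Keep rows matching every condition (chained restrictions act like AND)."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def join(left, right):
    """Pair rows that agree on all attributes the two tables share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


# Made-up rows standing in for a mask table and a classification table.
masks = [
    {"scan_id": 0, "mask": 1, "mask_npix": 150},
    {"scan_id": 0, "mask": 2, "mask_npix": 90},
    {"scan_id": 1, "mask": 1, "mask_npix": 200},
]
mask_types = [
    {"scan_id": 0, "mask": 1, "confidence": 0.9},
    {"scan_id": 0, "mask": 2, "confidence": 0.5},
]

result = restrict(join(masks, mask_types), scan_id=0, mask=1)
print(result)
# [{'scan_id': 0, 'mask': 1, 'mask_npix': 150, 'confidence': 0.9}]
```

The mask from `scan_id=1` drops out of the join because it has no matching classification row, and each additional restriction can only shrink the result further.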

In [None]:
imaging.MotionCorrection.Summary & session_key & "scan_id=0" & "field_idx=0"

In [None]:
average_image, max_proj_image = (
    imaging.MotionCorrection.Summary & session_key & "scan_id=0" & "field_idx=0"
).fetch1("average_image", "max_proj_image")

In [None]:
# Plotting
fig, ax = plt.subplots(1, 2, figsize=(8, 16))
ax[0].imshow(average_image)
ax[1].imshow(max_proj_image)
plt.tight_layout()
plt.show()

We will fetch the segmentation mask coordinates and overlay them on the average image:

In [None]:
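# Fetch the primary key of the scan, then use it to restrict the joined tables.
scan_key = (scan.Scan & session_key & "scan_id=0").fetch1("KEY")

mask_xpix, mask_ypix = (
    imaging.Segmentation.Mask * imaging.MaskClassification.MaskType
    & scan_key
    & "mask_center_z=0"
    & "mask_npix > 130"
    & "confidence >= 0.8"
).fetch("mask_xpix", "mask_ypix")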

In [None]:
# Plotting
fig = plt.figure(figsize=(8, 6))
plt.imshow(average_image, cmap="binary")
for i in range(len(mask_xpix)):
    plt.scatter(mask_xpix[i], mask_ypix[i], s=1, alpha=0.04)

plt.tight_layout()
plt.show()
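
The fetched coordinates hold one sequence of x positions and one sequence of y positions per ROI. As an alternative to scattering points, they can be painted into a label image. The sketch below uses tiny made-up ROIs and plain Python lists; `masks_to_label_image` is a hypothetical helper, and it assumes integer pixel coordinates that lie inside the image.

```python
def masks_to_label_image(height, width, mask_xpix, mask_ypix):
    """Paint each ROI's pixels with its 1-based index; 0 marks background.

    Where ROIs overlap, the ROI painted last wins.
    """
    image = [[0] * width for _ in range(height)]
    for label, (xs, ys) in enumerate(zip(mask_xpix, mask_ypix), start=1):
        for x, y in zip(xs, ys):
            image[y][x] = label  # row index is y, column index is x
    return image


# Two made-up ROIs on a 3 x 4 image.
roi_xpix = [[0, 1], [2, 3, 3]]
roi_ypix = [[0, 0], [1, 1, 2]]

label_image = masks_to_label_image(3, 4, roi_xpix, roi_ypix)
for row in label_image:
    print(row)
# [1, 1, 0, 0]
# [0, 0, 2, 2]
# [0, 0, 0, 2]
```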

## Summary

Following this tutorial, we have: 
+ Covered the essential functionality of `element-calcium-imaging`.
+ Learned how to manually insert data into tables.
+ Executed and ingested results of image processing with `suite2p`.
+ Visualized the results. 

#### Documentation and DataJoint Tutorials

+ [Detailed documentation on
  `element-calcium-imaging`](https://datajoint.com/docs/elements/element-calcium-imaging/).
+ [General `datajoint-python`
  tutorials](https://github.com/datajoint/datajoint-tutorials) covering fundamentals,
  such as table tiers, query operations, fetch operations, automated computations with the
  make function, and more.
+ [Documentation for
  `datajoint-python`](https://datajoint.com/docs/core/datajoint-python/).

##### Run this tutorial on your own data

To run this tutorial notebook on your own data, please use the following steps:
+ Download the [mysql-docker image for
  DataJoint](https://github.com/datajoint/mysql-docker) and run the container according
  to the instructions provided in the repository.
+ Create a fork of this repository to your GitHub account.
+ Clone the repository and open the files using your IDE.
+ Add a code cell immediately after the first code cell in the notebook. We will set up
  the local connection using this cell. In this cell, type in the following code. 

```python
import datajoint as dj
dj.config["database.host"] = "localhost"
dj.config["database.user"] = "<your-username>"
dj.config["database.password"] = "<your-password>"
dj.config["custom"] = {
    "imaging_root_data_dir": "path/to/your/data/dir",
    "database_prefix": "<your-username_>",
}
dj.config.save_local()
dj.conn()
```

+ Run the code block above and proceed with the rest of the notebook.