# DataJoint Element for Calcium Imaging

Open-source data pipeline to automate analyses and organize data

<img src =../docs/src/images/rawscans.gif title="value" width="200" height="200"/>
<img src =../docs/src/images/motioncorrectedscans.gif width="200" height="200"/>
<img src =../docs/src/images/cellsegmentation.png width="200" height="200"/>
<img src =../docs/src/images/calciumtraces.png width="200" height="200"/> 

Left to right: Raw scans, Motion corrected scans, Cell segmentations, Calcium events

In this tutorial, we will walk through processing two-photon calcium imaging data acquired with ScanImage and processed with Suite2p.

We will explain the following concepts as they relate to this pipeline:
- What is an Element versus a pipeline?
- Plot the pipeline with `dj.Diagram`
- Insert data into tables
- Query table contents
- Fetch table contents
- Run the pipeline for your experiments

For detailed documentation and tutorials on the general DataJoint principles that support collaboration, automation, reproducibility, and visualization, see:

- [DataJoint for Python - Interactive Tutorials](https://github.com/datajoint/datajoint-tutorials) - Fundamentals including table tiers, query operations, fetch operations, automated computations with the `make` function, etc.

- [DataJoint for Python - Documentation](https://datajoint.com/docs/core/datajoint-python/)

- [DataJoint Element for Calcium Imaging - Documentation](https://datajoint.com/docs/elements/element-calcium-imaging/)

Let's start by importing the packages necessary to run this pipeline. 

In [None]:
import datajoint as dj
import datetime
import matplotlib.pyplot as plt
import numpy as np

## Combine multiple Elements into a pipeline

Each DataJoint Element is a modular set of tables that can be combined into a complete pipeline.

Each Element contains 1 or more modules, and each module declares its own schema in the database.

This tutorial pipeline is assembled from four DataJoint Elements.

| Element | Source Code | Documentation | Description |
| -- | -- | -- | -- |
| Element Lab | [Link](https://github.com/datajoint/element-lab) | [Link](https://datajoint.com/docs/elements/element-lab) | Lab management related information, such as Lab, User, Project, Protocol, Source. |
| Element Animal | [Link](https://github.com/datajoint/element-animal) | [Link](https://datajoint.com/docs/elements/element-animal) | General animal metadata and surgery information. |
| Element Session | [Link](https://github.com/datajoint/element-session) | [Link](https://datajoint.com/docs/elements/element-session) | General information of experimental sessions. |
| Element Calcium Imaging | [Link](https://github.com/datajoint/element-calcium-imaging) | [Link](https://datajoint.com/docs/elements/element-calcium-imaging) |  Calcium imaging analysis with Suite2p, CaImAn, and EXTRACT. |

When the modules are imported for the first time, the schemas and tables are created in the database. On later imports, nothing is created again; the existing schemas and tables are simply made accessible.

The Elements are imported and activated within the `tutorial_pipeline` script.

In [None]:
from tutorial_pipeline import lab, subject, session, scan, imaging, imaging_report, Equipment

Each Python module (e.g. `subject`) contains a schema object that enables interaction with the schema in the database.

In [None]:
subject.schema

Each Python class in the module corresponds to a table in the database server.

In [None]:
subject.Subject()

## Diagram

Let's plot the diagram of tables within multiple schemas and their dependencies using `dj.Diagram()` (see [Diagram docs](https://datajoint.com/docs/core/concepts/getting-started/diagrams/)).

In [None]:
(
    dj.Diagram(subject.Subject)
    + dj.Diagram(session.Session)
    + dj.Diagram(scan)
    + dj.Diagram(imaging)
)

#### Table Types

There are 5 table types in DataJoint. Each of these appears in the diagram above.

| Table tier | Color and shape | Description |
| -- | -- | -- |
| Manual table | Green box | Data entered from outside the pipeline, either by hand or with external helper scripts. |
| Lookup table | Gray box | Small tables containing general facts and settings of the data pipeline; not specific to any experiment or dataset. |  
| Imported table | Blue oval | Data ingested automatically inside the pipeline but requiring access to data outside the pipeline. |
| Computed table | Red circle | Data computed automatically entirely inside the pipeline. |
| Part table | Plain text | Part tables share the same tier as their master table. |

The diagram becomes clear when it's approached as a hierarchy of tables that define the order in which the pipeline expects to receive data in each of the tables. 

The tables higher up in the diagram such as `subject.Subject()` should be the first to receive data.

Data is manually entered into the green rectangular tables with the `insert1()` method.

A table connected by a line to a table above it depends on entries from that upstream table.
 
Tables shown as a blue oval (Imported) or red circle (Computed) are filled automatically
  by calling `populate()`. For example, `scan.ScanInfo` and its part table
  `scan.ScanInfo.Field` are both populated with `scan.ScanInfo.populate()`.
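
The idea that the diagram fixes the order in which tables receive data can be sketched with a topological sort. This is only an illustration in plain Python, not how DataJoint works internally. The dependency map below is an assumption: it lists only a few of the tables named in this tutorial, and the real schemas contain more tables and foreign keys.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it depends on (its upstream tables).
# Assumption: a simplified subset of the pipeline, for illustration only.
dependencies = {
    "subject.Subject": set(),
    "Equipment": set(),
    "session.Session": {"subject.Subject"},
    "scan.Scan": {"session.Session", "Equipment"},
    "scan.ScanInfo": {"scan.Scan"},
}

# static_order() yields every table after all of the tables it depends on.
fill_order = list(TopologicalSorter(dependencies).static_order())
print(fill_order)
```

Any order produced this way places `subject.Subject` before `session.Session`, and `scan.Scan` before `scan.ScanInfo`, which matches reading the diagram from top to bottom.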

## DataJoint Basics

DataJoint pipelines can be run with four commands:
- `Insert`
- `Populate`
- `Query`
- `Fetch`

In this demo we will:
- `Insert` metadata about a subject, recording session, and 
  parameters related to processing calcium imaging data through Suite2p.

- `Populate` tables with outputs of image processing including motion correction,
  segmentation, fluorescence traces and deconvolved activity traces.

- `Query` the data from the database.

- `Fetch` and plot calcium activity traces.


## Insert entries into manual tables

Let's start with the first table in the schema diagram (i.e. `subject.Subject` table).

To know what data to insert into the table, we can view its dependencies and attributes using the `.describe()` method and the `.heading` attribute.

In [None]:
print(subject.Subject.describe())

In [None]:
subject.Subject.heading

The cells above show all attributes of the `subject.Subject` table.
We will now insert one entry into this table.

In [None]:
subject.Subject.insert1(
    dict(
        subject="subject1",
        sex="F",
        subject_birth_date="2020-01-01",
        subject_description="ScanImage acquisition. Suite2p processing.",
    )
)
subject.Subject()

Let's repeat the steps above for the `Session` table.

In [None]:
print(session.Session.describe())

In [None]:
session.Session.heading

Notice that `describe` displays the table definition with its dependencies (i.e. foreign key references). The `Session` table depends on the upstream `Subject` table.

In contrast, `heading` displays all the attributes of the table, regardless of
whether they are declared in an upstream table.

Next, we can insert an entry into the `session.Session` table by passing a dictionary to the `insert1` method.
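
Because rows are passed as dictionaries, a primary key can be stored once and reused for downstream tables. The sketch below uses plain Python only (no database connection) with the same values as this tutorial. The variable name `session_dir_row` is chosen here for illustration.

```python
# Primary-key values for one session, as used in this tutorial.
session_key = dict(subject="subject1", session_datetime="2021-04-30 12:22:15")

# Unpacking with ** copies the key and adds attributes for a downstream row.
session_dir_row = dict(**session_key, session_dir="subject1/session1")

print(session_dir_row)
```

The original `session_key` is left unchanged, so it can be unpacked again for each downstream entry.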

In [None]:
session_key = dict(subject="subject1", session_datetime="2021-04-30 12:22:15")

In [None]:
session.Session.insert1(session_key)
session.Session()

The `SessionDirectory` table locates the relevant data files in a directory path
relative to the root directory defined in your `dj.config["custom"]`. More
information about `dj.config` is provided in the [User Guide](https://datajoint.com/docs/elements/user-guide/).
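
As a rough illustration of how a relative session directory resolves against a root directory, consider the sketch below. The root path is hypothetical; in practice it comes from your own configuration.

```python
from pathlib import PurePosixPath

# Hypothetical root directory; replace with the root from your configuration.
root_data_dir = PurePosixPath("/data/imaging_root")

# Relative session directory, as stored in the `session_dir` attribute.
session_dir = "subject1/session1"

full_session_path = root_data_dir / session_dir
print(full_session_path)  # /data/imaging_root/subject1/session1
```

Storing only the relative part keeps the table entries valid when the data are moved to a machine with a different root directory.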

In [None]:
session.SessionDirectory.insert1(dict(**session_key, session_dir="subject1/session1"))
session.SessionDirectory()

Next, we'll use `describe` and `heading` for the Scan table. Do you notice anything we
might have missed here? 

In [None]:
print(scan.Scan.describe())

The `Scan` table depends on the `Session` table **and** the `Equipment` table.
Let's insert into the `Equipment` table first and then into `Scan`.

In [None]:
Equipment.insert1(dict(device="Mesoscope1", modality="Calcium imaging"))

In [None]:
scan.Scan.insert1(
    dict(
        **session_key,
        scan_id=0,
        device="Mesoscope1",
        acq_software="ScanImage",
        scan_notes="",
    )
)
scan.Scan()

## Automatically populate tables

`scan.ScanInfo` is the first table in the pipeline that can be populated automatically with the `populate` method.

If a table contains a Part table, `populate()` inserts data into both.
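
The behavior of `populate()` can be pictured with a plain-Python analogy: for every upstream key that has no entry yet, a `make` step computes the master row and its part rows together. This is a toy sketch with lists of dictionaries standing in for tables, not DataJoint's implementation, and all values are invented.

```python
# Toy stand-ins for database tables: lists of plain dictionaries.
upstream_keys = [{"scan_id": 0}, {"scan_id": 1}]  # rows of an upstream table
master_rows = [{"scan_id": 0, "nfields": 1}]      # scan 0 is already processed
part_rows = [{"scan_id": 0, "field_idx": 0}]


def make(key):
    """Compute one master row and its part rows for a single upstream key."""
    nfields = 2  # hypothetical result of reading the scan metadata
    master_rows.append({**key, "nfields": nfields})
    part_rows.extend({**key, "field_idx": i} for i in range(nfields))


def populate():
    """Call make() once for every upstream key that has no master row yet."""
    done = {row["scan_id"] for row in master_rows}
    for key in upstream_keys:
        if key["scan_id"] not in done:
            make(key)


populate()
populate()  # a second call finds nothing left to do
print(len(master_rows), len(part_rows))  # 2 3
```

Calling `populate()` repeatedly is therefore safe: only missing entries are computed.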

Let's populate the `scan.ScanInfo` and its Part table `scan.ScanInfo.Field`.

In [None]:
scan.ScanInfo.heading

In [None]:
scan.ScanInfo.Field.heading

In [None]:
scan.ScanInfo()

In [None]:
scan.ScanInfo.Field()

In [None]:
# duration depends on your network bandwidth to s3
scan.ScanInfo.populate(display_progress=True)

Let's view the information that was entered into these tables.

In [None]:
scan.ScanInfo()

In [None]:
scan.ScanInfo.Field()

Let's define the Suite2p parameters by making an entry in the `ProcessingParamSet` table.

In [None]:
import suite2p

params_suite2p = suite2p.default_ops()
params_suite2p["nonrigid"] = False

imaging.ProcessingParamSet.insert_new_params(
    processing_method="suite2p",
    paramset_idx=0,
    params=params_suite2p,
    paramset_desc="Calcium imaging analysis with Suite2p using default parameters",
)

The `ProcessingTask` table pairs a `Scan` entry with the `ProcessingParamSet` entry that the downstream tables will use to process it.

In [None]:
print(imaging.ProcessingTask.describe())

In [None]:
imaging.ProcessingTask.heading

The `ProcessingTask` table contains two important attributes:
+ `paramset_idx` - Allows the user to choose the parameter set with which
you want to run image processing.
+ `task_mode` - Can be set to `load` or `trigger`. When set to `load`,
running the processing step initiates a search for existing output files of the image
processing algorithm defined in `ProcessingParamSet`. When set to `trigger`, the
processing step will run image processing on the raw data. 
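
The difference between the two task modes can be sketched as a simple branch. Everything below is hypothetical and for illustration only: the function names, the `*.npy` file pattern, and the stand-in analysis are assumptions, not the Element's actual code.

```python
import tempfile
from pathlib import Path


def process(task_mode, output_dir, run_analysis):
    """Return the result files for one task, loading or computing them."""
    output_dir = Path(output_dir)
    if task_mode == "load":
        # Expect results that an earlier run already wrote to output_dir.
        files = sorted(output_dir.glob("*.npy"))
        if not files:
            raise FileNotFoundError(f"No existing results in {output_dir}")
        return files
    if task_mode == "trigger":
        # Run the analysis now, then collect what it wrote.
        run_analysis(output_dir)
        return sorted(output_dir.glob("*.npy"))
    raise ValueError(f"Unknown task_mode: {task_mode!r}")


def fake_analysis(output_dir):
    """Hypothetical stand-in for an image-processing run."""
    (output_dir / "F.npy").write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    triggered = process("trigger", tmp, fake_analysis)
    loaded = process("load", tmp, fake_analysis)
    print([p.name for p in triggered], [p.name for p in loaded])
```

In this sketch, `load` fails loudly when no results exist, which is usually preferable to silently recomputing.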

In [None]:
imaging.ProcessingTask.insert1(
    dict(
        **session_key,
        scan_id=0,
        paramset_idx=0,
        task_mode="load",  # load or trigger
        processing_output_dir="subject1/session1/suite2p",
    )
)

Let's call `populate` on the `Processing` table. Because `task_mode` is set to `load`, this step looks for the existing Suite2p results rather than running Suite2p.

In [None]:
imaging.Processing.populate(session_key, display_progress=True)

Once processing is complete, you can optionally curate the output of Suite2p using the `Curation` table.

In [None]:
imaging.Curation.heading

In [None]:
imaging.Curation.insert1(
    dict(
        **session_key,
        scan_id=0,
        paramset_idx=0,
        curation_id=0,
        curation_time="2021-04-30 12:22:15.032",
        curation_output_dir="subject1/session1/suite2p",
        manual_curation=False,
        curation_note="",
    )
)

Now, we will populate several tables that store the output of image processing, including
`MotionCorrection`, `Segmentation`, `Fluorescence`, and `Activity`.

In [None]:
imaging.MotionCorrection.populate(display_progress=True)
imaging.Segmentation.populate(display_progress=True)
imaging.Fluorescence.populate(display_progress=True)
imaging.Activity.populate(display_progress=True)

## Query

Queries allow you to view the contents of the database. The simplest query is an instance of the table class.

In [None]:
subject.Subject()

Let's query the contents of the `Mask` part table.

In [None]:
imaging.Segmentation.Mask()

With the `&` operator, we will restrict the contents of the `imaging.Segmentation.Mask` table to the entry where the `mask` attribute is 10.

In [None]:
imaging.Segmentation.Mask & "mask = '10'"

DataJoint queries are highly flexible and support several [operators](https://datajoint.com/docs/core/concepts/query-lang/operators/). The next operator we will explore is `join` (written `*`), which combines matching information from tables.

First let's view the contents of each table.

In [None]:
imaging.Segmentation.Mask()

In [None]:
imaging.MaskClassification.MaskType()

Let's use the `join` operator to combine matching information in `imaging.Segmentation.Mask` and `imaging.MaskClassification.MaskType`.   The result contains all matching combinations of entities from both arguments.
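
To make the semantics of restriction and join concrete, here is a plain-Python analogy on lists of dictionaries. It is a sketch only, not DataJoint's implementation. The helper names `restrict` and `join` and all row values are invented for illustration.

```python
# Toy rows; the values are invented for illustration only.
masks = [
    {"mask": 9, "mask_npix": 120},
    {"mask": 10, "mask_npix": 150},
]
mask_types = [
    {"mask": 10, "mask_type": "soma"},
]


def restrict(rows, **condition):
    """Keep only the rows whose attributes equal the given values."""
    return [r for r in rows if all(r[k] == v for k, v in condition.items())]


def join(left, right):
    """Combine rows that agree on every attribute name they share."""
    joined = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined


print(restrict(masks, mask=10))
print(join(masks, mask_types))
```

Mask 9 has no matching row in `mask_types`, so it does not appear in the joined result.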

In [None]:
imaging.Segmentation.Mask * imaging.MaskClassification.MaskType

We can chain these operators together.

In [None]:
imaging.Segmentation.Mask * imaging.MaskClassification.MaskType & "mask = '10'"

## Fetch

The `fetch` and `fetch1` methods download the data from the query object into the workspace.

Below we will run `fetch()` to return all attributes of all entries in the table.

In [None]:
imaging.Fluorescence.Trace.fetch(as_dict=True)

Next, we will fetch the `fluorescence` attribute for `mask=10` with the `fetch1` method by passing the attribute as an argument to the method.

By default, `fetch1()` returns all attributes of a single entry. The query must contain exactly one entry; if it contains none or more than one, `fetch1()` raises an error, so restrict the query first.
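
The exactly-one-entry expectation of `fetch1` can be mimicked in plain Python. The helper `fetch_one` and the rows below are hypothetical; the sketch only illustrates why a restriction usually comes before `fetch1`.

```python
def fetch_one(rows):
    """Return the only row, or fail if there is not exactly one."""
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row, found {len(rows)}")
    return rows[0]


rows = [{"mask": 9}, {"mask": 10}]  # invented values

# Restricting first leaves a single row, so the call succeeds.
only_row = fetch_one([r for r in rows if r["mask"] == 10])
print(only_row)

# Without a restriction there are two rows, so the call fails.
try:
    fetch_one(rows)
except ValueError as err:
    message = str(err)
    print(message)
```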

In [None]:
trace = (imaging.Fluorescence.Trace & "mask = '10'").fetch1("fluorescence")

Let's plot this trace.  First we will fetch the sampling rate of the data to define the x-axis values.
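
As a minimal sketch of how the x-axis values follow from the sampling rate, the example below builds a time vector for a placeholder trace. The sampling rate and trace length are hypothetical values, not taken from the tutorial data.

```python
import numpy as np

sampling_rate = 30.0   # hypothetical frames per second
trace = np.zeros(90)   # placeholder trace with 90 frames

# Frame i was acquired i / sampling_rate seconds after the first frame.
time_s = np.arange(trace.size) / sampling_rate

print(time_s[0], time_s[-1])
```

The time vector has one value per frame, so it can be passed to `plt.plot` together with the trace.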

In [None]:
sampling_rate = (scan.ScanInfo & session_key & "scan_id=0").fetch1("fps")

In [None]:
plt.plot(np.arange(trace.size) / sampling_rate, trace)
plt.title("Fluorescence trace for mask 10")
plt.xlabel("Time (s)")
plt.ylabel("Activity (a.u.)");

We will fetch and plot the average, motion-corrected image.

In [None]:
average_image = (imaging.MotionCorrection.Summary & session_key & "field_idx=0").fetch1(
    "average_image"
)

In [None]:
plt.imshow(average_image)

We will fetch mask coordinates and overlay these on the average image.
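
Turning per-mask pixel coordinates into a single boolean image can be tried on synthetic data first. In this sketch the image shape and pixel indices are invented, and each mask is assumed to be a pair of integer arrays giving column (x) and row (y) indices.

```python
import numpy as np

image_shape = (6, 8)  # (rows, columns) of a hypothetical average image

# One array of x (column) and y (row) pixel indices per mask; values invented.
mask_xpix = [np.array([1, 2, 2]), np.array([5, 6])]
mask_ypix = [np.array([0, 0, 1]), np.array([4, 4])]

mask_image = np.zeros(image_shape, dtype=bool)
for xpix, ypix in zip(mask_xpix, mask_ypix):
    # Images are indexed [row, column], so y comes first.
    mask_image[ypix, xpix] = True

print(mask_image.sum())  # 5
```

Indexing with `[ypix, xpix]` rather than `[xpix, ypix]` matters whenever the image is not square.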

In [None]:
mask_xpix, mask_ypix = (
    imaging.Segmentation.Mask * imaging.MaskClassification.MaskType
    & session_key
    & "mask_center_z=0"
    & "mask_npix > 130"
).fetch("mask_xpix", "mask_ypix")

In [None]:
mask_image = np.zeros(np.shape(average_image), dtype=bool)
for xpix, ypix in zip(mask_xpix, mask_ypix):
    mask_image[ypix, xpix] = True

In [None]:
plt.imshow(average_image)
plt.contour(mask_image, colors="white", linewidths=0.5);

This Element includes an interactive widget to plot the segmentations and traces to visualize the results after processing with Suite2p, CaImAn, or EXTRACT.

First, let's populate the tables in `imaging_report` with these plots, and then we can visualize the plots with the widget.

In [None]:
imaging_report.ScanLevelReport.populate()
imaging_report.TraceReport.populate()

In [None]:
from element_calcium_imaging.plotting.widget import main

In [None]:
main(imaging)

Congratulations!  You have learned about the DataJoint Element for Calcium Imaging and common DataJoint commands to interact with the pipeline, including insert, populate, query, and fetch.

## Next steps

Follow the steps below to run this pipeline for your experiments:

- Create a fork of this repository to your GitHub account.
- Clone the repository to your local machine and configure for use with the instructions in the [User Guide](https://datajoint.com/docs/elements/user-guide/).
- The DataJoint team offers free [Office Hours](https://datajoint.com/docs/community/support/) to help you set up this pipeline.
- If you have any questions, please reach out at support@datajoint.com.