# Identifying Regions of Interest with Segmentation
The Allen Institute uses [**Suite2P**](http://www.suite2p.org/) to process two-photon (2P) calcium imaging data. **Suite2P** takes a 2P movie and outputs information about the putative segmented *Regions of Interest* (ROIs). The most pertinent outputs are the locations and shapes of the ROIs within the movie's field of view and the fluorescence of each ROI at every frame of the movie. This notebook is a simple demonstration of how to pass a 2P movie to Suite2P and produce cell segmentation and fluorescence output.


### Environment Setup
⚠️**Note: If running on a new environment, run this cell once and then restart the kernel**⚠️

In [1]:
try:
    from dandi_utils import dandi_download_open
except ImportError:
    !git clone https://github.com/AllenInstitute/openscope_databook.git
    %cd openscope_databook
    %pip install -e .



In [2]:
import os
import suite2p

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from pynwb import NWBHDF5IO
from tifffile import imwrite as imsave  # imsave is deprecated in tifffile; imwrite is its replacement

%matplotlib inline

### Downloading Ophys NWB Files
Here you can download the ophys NWB file that contains the 2P movie to be segmented. To specify your own file, set `dandiset_id` to the ID of the dandiset you're interested in, and set `dandi_filepath` to the path of the file within that dandiset. When accessing an embargoed dataset, set `dandi_api_key` to your DANDI API key.

In [3]:
dandiset_id = "000336"
dandi_filepath = "sub-634402/sub-634402_ses-1209063020-acq-1209359211raw_ophys.nwb"
dandi_api_key = os.environ["DANDI_API_KEY"]

In [4]:
io = dandi_download_open(dandiset_id, dandi_filepath, "./", dandi_api_key=dandi_api_key)
# from pynwb import NWBHDF5IO
# io = NWBHDF5IO("./sub-661749_ses-1254161187-acq-1254305759raw_ophys.nwb", load_namespaces=True)
nwb = io.read()

File already exists
Opening file




### Preparing Suite2P Input
Typically, the 2P movie would not come prepared within an NWB file, but would be in some other form. Suite2P is capable of taking input in many forms, enumerated [here](https://suite2p.readthedocs.io/en/latest/inputs.html). For the purposes of this notebook, the movie is extracted from an Allen Institute NWB File and converted to a *tiff* file.

In [5]:
movie = np.array(nwb.acquisition["raw_suite2p_motion_corrected"].data).astype(np.uint16)
movie.shape

(44201, 512, 512)
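
Note that `astype(np.uint16)` does not raise an error if the stored values fall outside the range that `uint16` can represent. The following is a minimal sketch of a guarded cast, shown on a tiny synthetic array rather than the real movie. The helper name `to_uint16_checked` is hypothetical and is not part of NumPy, pynwb, or Suite2P.

```python
import numpy as np

def to_uint16_checked(frames):
    """Cast to uint16, raising instead of silently producing out-of-range values."""
    frames = np.asarray(frames)
    info = np.iinfo(np.uint16)
    lowest, highest = frames.min(), frames.max()
    if lowest < info.min or highest > info.max:
        raise ValueError(f"values span [{lowest}, {highest}], outside the uint16 range")
    return frames.astype(np.uint16)

# Tiny synthetic "movie" with shape (frames, height, width)
demo_movie = np.arange(2 * 3 * 4, dtype=np.int32).reshape(2, 3, 4)
demo_u16 = to_uint16_checked(demo_movie)
print(demo_u16.dtype, demo_u16.shape)
```

For a movie of the size shown above, computing the minimum and maximum requires a full pass over the data, so this check is best done once, before the array is written to disk.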

In [6]:
imsave("movie.tiff", movie)



In [7]:
results_folder = "./results/"
scratch_folder = "./results/"
input_movie_path = "./movie"
sampling_rate = nwb.acquisition["raw_suite2p_motion_corrected"].rate
suite2p_threshold_scaling = 1

### Running Suite2P
From there, input settings can be specified for Suite2P via the `ops` object. The available settings and their meanings are described [here](https://suite2p.readthedocs.io/en/latest/settings.html). The most pertinent settings are `save_folder` (where results are written), `fast_disk` (scratch space for temporary files), `data_path` (the list of folders searched for input movies), and `fs` (the sampling rate). Here, the sampling rate is retrieved from the NWB file; for your own movies, it should be known from the acquisition settings.

In [8]:
ops = suite2p.default_ops()
ops['threshold_scaling'] = suite2p_threshold_scaling
ops['fs'] = float(sampling_rate) # sampling rate of recording, determines binning for cell detection
ops['tau'] = 0.7 # timescale of gcamp to use for deconvolution
ops['do_registration'] = 0 # data was already registered
ops['save_NWB'] = 1 # also write the results to an NWB file
ops['save_folder'] = results_folder
ops['fast_disk'] = scratch_folder
ops["data_path"] = ["./"]

output_ops = suite2p.run_s2p(ops=ops)
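
Because `ops` is a plain dictionary, a forgotten setting only surfaces once the pipeline is already running. Below is a minimal sketch of a pre-flight check that could be run before `run_s2p`. The helper name `missing_ops_keys` and its list of required keys are assumptions chosen for illustration; they are not part of Suite2P.

```python
def missing_ops_keys(ops, required=("data_path", "save_folder", "fs")):
    """Return the required keys that are absent or empty in an ops dictionary."""
    missing = []
    for key in required:
        value = ops.get(key)
        if value is None or value == "" or value == []:
            missing.append(key)
    return missing

# Hypothetical settings with the input location left out
demo_ops = {"fs": 10.0, "save_folder": "./results/"}
print(missing_ops_keys(demo_ops))

# After adding the input location, nothing is reported as missing
demo_ops["data_path"] = ["./"]
print(missing_ops_keys(demo_ops))
```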


### Suite2P Output
Descriptions of the output of Suite2P can be found [here](https://suite2p.readthedocs.io/en/latest/outputs.html). The cells below show how to access each output file. Three files, `F.npy`, `Fneu.npy`, and `spks.npy`, are 2D arrays of shape (# ROIs, # frames) containing the fluorescence trace, the neuropil trace, and the deconvolved activity of each ROI, respectively. The settings and intermediate values are stored as a dictionary in `ops.npy`. `iscell.npy` stores a table which, for each ROI, contains a 1 or 0 indicating whether the ROI was classified as a cell, along with the classifier's probability that it is a cell.
Most importantly, the main statistics of each ROI are included in `stat.npy`, which is processed and shown as a dataframe below. Because each ROI's location is stored as `xpix` and `ypix`, which are arrays of pixel coordinates, the code below also produces an *ROI submask* for each ROI: a small binary image cropped to the ROI's bounding box. The ROI submasks are more convenient for some purposes.

#### Traces

In [None]:
fluorescence = np.load("./results/plane0/F.npy")
neuropil = np.load("./results/plane0/Fneu.npy")
spikes = np.load("./results/plane0/spks.npy")

print(fluorescence.shape)
print(neuropil.shape)
print(spikes.shape)

In [None]:
plt.plot(fluorescence[0])
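
The plot above uses the frame index as its x-axis. As a small sketch, the frame indices can be converted to seconds using the sampling rate, assuming evenly spaced frames with the first frame at t = 0. The helper name `frame_times` is hypothetical, and the trace here is synthetic.

```python
import numpy as np

def frame_times(n_frames, sampling_rate):
    """Time in seconds of each frame, assuming even spacing and a first frame at t = 0."""
    return np.arange(n_frames) / float(sampling_rate)

# Synthetic 5-frame trace sampled at 10 Hz
demo_trace = np.array([1.0, 1.2, 3.5, 2.0, 1.1])
demo_times = frame_times(len(demo_trace), sampling_rate=10.0)
print(demo_times)
```

With the real data, this could be used as `plt.plot(frame_times(fluorescence.shape[1], sampling_rate), fluorescence[0])`.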

#### Options and Intermediate Outputs

In [None]:
ops = np.load("./results/plane0/ops.npy", allow_pickle=True).item()
ops.keys()

#### Is-Cell Array

In [None]:
is_cell = np.load("./results/plane0/iscell.npy")
print(is_cell[:10])
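
Since the rows of the is-cell table line up with the rows of the trace arrays, the first column can be used as a boolean filter. The sketch below uses small synthetic stand-ins for `F.npy` and `iscell.npy`; all variable names are illustrative.

```python
import numpy as np

# Synthetic stand-ins: 4 ROIs x 5 frames of fluorescence, and an iscell-style table
demo_F = np.arange(20, dtype=float).reshape(4, 5)
demo_iscell = np.array([[1.0, 0.9],
                        [0.0, 0.2],
                        [1.0, 0.7],
                        [0.0, 0.4]])

# First column: 1 if the ROI was classified as a cell; second column: classifier probability
cell_rows = demo_iscell[:, 0] == 1
cell_F = demo_F[cell_rows]
cell_probability = demo_iscell[cell_rows, 1]
print(cell_F.shape)
```

The same boolean index could be applied to `neuropil` and `spikes`, because all three trace arrays share the ROI ordering.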

#### ROI Statistics

In [None]:
roi_stats = np.load("./results/plane0/stat.npy", allow_pickle=True)

In [None]:
stats_dict = {stat : [roi[stat] for roi in roi_stats] for stat in roi_stats[0].keys()}

In [None]:
def convert_pix_to_mask(xpix, ypix):
    x_loc = min(xpix)
    y_loc = min(ypix)
    rel_xpix = xpix - x_loc
    rel_ypix = ypix - y_loc
    width = max(xpix) - x_loc + 1
    height = max(ypix) - y_loc + 1
    
    mask = np.zeros((height, width))
    for y,x in zip(rel_ypix, rel_xpix):
        mask[y,x] = 1
    return mask
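
To make the submask idea concrete, here is a self-contained sketch on four made-up pixel coordinates. It uses a vectorized variant of the function above, named `pix_to_mask` here only to avoid clashing with it; the behavior is intended to be the same.

```python
import numpy as np

def pix_to_mask(xpix, ypix):
    """Binary mask cropped to the bounding box of the given pixel coordinates."""
    xpix = np.asarray(xpix)
    ypix = np.asarray(ypix)
    x_loc, y_loc = xpix.min(), ypix.min()
    height = ypix.max() - y_loc + 1
    width = xpix.max() - x_loc + 1
    mask = np.zeros((height, width))
    # Shift coordinates so the bounding box starts at (0, 0), then mark each pixel
    mask[ypix - y_loc, xpix - x_loc] = 1
    return mask

# A T-shaped ROI located away from the image origin
demo_xpix = np.array([10, 11, 12, 11])
demo_ypix = np.array([5, 5, 5, 6])
demo_mask = pix_to_mask(demo_xpix, demo_ypix)
print(demo_mask)
```

The resulting mask has 2 rows and 3 columns regardless of where the ROI sits in the field of view, which is what makes submasks compact to store.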

In [None]:
masks_col = []
for xpix, ypix in zip(stats_dict["xpix"], stats_dict["ypix"]):
    roi_mask = convert_pix_to_mask(xpix, ypix)
    masks_col.append(roi_mask)

# merge dicts so that the mask column goes first (the | operator on dicts requires Python 3.9+)
stats_dict = {"mask": masks_col} | stats_dict

In [None]:
stats_df = pd.DataFrame(data=stats_dict)
print(stats_df.columns)
stats_df

### Comparing Segmentation Output
Below we can compare multiple views of the segmentation output. The first is the full Suite2P output, which contains both cells and non-cells. The second is filtered to include only the ROIs that are classified as cells. The fields `Lx` and `Ly` from `ops.npy` contain the width and height of the movie in pixels, and `ypix` and `xpix` contain the pixel coordinates at which each ROI was identified. These are used to generate the image of all ROI masks. The last plot is the segmentation stored in the original NWB file from the Allen Institute, for comparison.

#### Segmentation from Suite2P

In [None]:
im = np.zeros((ops['Ly'], ops['Lx']))
for n in range(len(roi_stats)):
    ypix = roi_stats[n]['ypix']
    xpix = roi_stats[n]['xpix']
    im[ypix,xpix] = 1

plt.imshow(im)
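
The image above sets every ROI pixel to 1, so neighboring ROIs blend together. One possible variation, sketched here on two made-up ROIs, is to write each ROI's index into the image so that individual ROIs can be told apart by color. The `demo_rois` list mimics the `xpix`/`ypix` fields of `stat.npy` and is purely illustrative.

```python
import numpy as np

# Two synthetic ROIs in a 4 x 4 field of view
demo_rois = [
    {"ypix": np.array([0, 0, 1]), "xpix": np.array([0, 1, 0])},
    {"ypix": np.array([2, 3, 3]), "xpix": np.array([3, 3, 2])},
]

label_im = np.zeros((4, 4), dtype=int)
for index, roi in enumerate(demo_rois):
    # 0 is reserved for background, so ROI labels start at 1
    label_im[roi["ypix"], roi["xpix"]] = index + 1

print(label_im)
```

Where ROIs overlap, the later ROI overwrites the earlier one in this sketch.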

#### Cell Segmentation from Suite2P Filtered

In [None]:
im = np.zeros((ops['Ly'], ops['Lx']))
for i in range(len(roi_stats)):
    if is_cell[i][0] == 1.0:
        ypix = roi_stats[i]['ypix']
        xpix = roi_stats[i]['xpix']
        im[ypix,xpix] = 1

plt.imshow(im)

#### Segmentation From Original NWB

In [None]:
plt.imshow(nwb.processing["ophys"]["images"]["segmentation_mask_image"])
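
The comparison above is visual. If a number is wanted, one common choice is the intersection-over-union of two binary masks of the same shape. This is a sketch under that assumption, with synthetic masks; the helper name `mask_iou` is hypothetical and this metric is not something the notebook's pipeline computes.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two same-shaped masks; nonzero pixels count as ROI."""
    a = np.asarray(mask_a) > 0
    b = np.asarray(mask_b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks are empty
    return float(np.logical_and(a, b).sum() / union)

# Two 4-pixel squares that share 2 pixels
demo_a = np.zeros((4, 4))
demo_a[0:2, 0:2] = 1
demo_b = np.zeros((4, 4))
demo_b[1:3, 0:2] = 1
print(mask_iou(demo_a, demo_b))
```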

#### Ophys NWB
Finally, because `save_NWB` was set to 1 in the input settings, an NWB file containing all of these outputs was generated as well.

In [None]:
out_io = NWBHDF5IO("./results/ophys.nwb", load_namespaces=True)
out_nwb = out_io.read()

- download NWB
- save raw movie
- determine inputs
- run suite2p pipeline
- parse output
- compare to segmentation in NWB