<h1>Automated Gating with Immunova</h1>

This is the first of three experiments comparing the automated gating in immunova to expert manual gating. For this experiment I will be using pre-gated data from peritoneal dialysis patients.

<h3>Create project and experiments</h3>

In [1]:
from immunova.data.project import Project
from immunova.data.mongo_setup import global_init
from immunova.data.fcs_experiments import FCSExperiment, Panel
from immunova.data.utilities import get_fcs_file_paths
from immunova.flow.readwrite.read_fcs import explore_channel_mappings
from tqdm import tqdm_notebook, tqdm
import pandas as pd
global_init()

In [2]:
pd_project = Project(project_id='Peritonitis', owner='burtonrossj')
pd_project.save()

NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error collection: immunova.projects index: project_id_1 dup key: { project_id: "Peritonitis" })

In [5]:
str(pd_project.id)

'5d822c08a5130a969aafe40b'

I will create four experiments in total:
* PBMC_T: T cell panel for PBMC samples
* PBMC_M: Myeloid cell panel for PBMC samples
* PDMC_T: T cell panel for peritoneal fluid samples
* PDMC_M: Myeloid cell panel for peritoneal fluid samples

For each of these experiments I need to associate a flow cytometry panel. A Panel object defines the channel (fluorochrome) / marker (antibody) mappings for all associated flow data. This allows the flow cytometry metadata to be standardised at the point of entry.

Panel objects can be created from a Python dictionary or from an Excel template. In this case I have created an Excel template (see the documentation for details on creating panel templates).

It is often useful to explore the channel and marker names across a large selection of FCS files to get a feel for the naming conventions and to make sure you have covered all edge cases. There is a useful utility function in `immunova.flow.readwrite.read_fcs` called `explore_channel_mappings`. Given a directory, the function searches for all `.fcs` files and returns all permutations of channel/marker pairings found.
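To make the idea concrete, here is a minimal sketch of that kind of survey. It is not the implementation of `explore_channel_mappings`: I am assuming each file's mappings look like the `FCSFile.fluoro_mappings` output used later in this notebook, and the helper names (`find_fcs_files`, `labelling_schemes`, `explore_schemes`) are hypothetical. It counts distinct per-file labelling schemes, which may not be exactly what the immunova function counts.

```python
from pathlib import Path
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

# One file's mappings, in the same shape as FCSFile.fluoro_mappings:
# [{'channel': 'FSC-A', 'marker': ''}, ...]
Mappings = List[Dict[str, str]]
Scheme = FrozenSet[Tuple[str, str]]


def find_fcs_files(root: str) -> List[Path]:
    """Recursively collect .fcs files under root (extension matched case-insensitively)."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() == ".fcs")


def labelling_schemes(all_mappings: Iterable[Mappings]) -> Set[Scheme]:
    """Reduce per-file mappings to the set of distinct channel/marker labelling schemes.

    A frozenset is used so two files listing the same pairs in a different
    order count as one scheme.
    """
    return {frozenset((m["channel"], m["marker"]) for m in mappings) for mappings in all_mappings}


def explore_schemes(root: str, read_mappings: Callable[[str], Mappings]) -> Set[Scheme]:
    """Apply read_mappings (e.g. lambda p: FCSFile(p).fluoro_mappings) to every FCS file under root."""
    return labelling_schemes(read_mappings(str(p)) for p in find_fcs_files(root))


# In-memory example: the first two files share a scheme, the third labels CD14 differently.
example = [
    [{"channel": "Alexa Fluor 488-A", "marker": "CD14 FITC"}, {"channel": "FSC-A", "marker": ""}],
    [{"channel": "FSC-A", "marker": ""}, {"channel": "Alexa Fluor 488-A", "marker": "CD14 FITC"}],
    [{"channel": "FSC-A", "marker": ""}, {"channel": "Alexa Fluor 488-A", "marker": "CD14"}],
]
schemes = labelling_schemes(example)
print(len(schemes))
```

Passing the reader in as a callable keeps the directory walk independent of whichever FCS parser you use.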

In [None]:
cm_permutations = explore_channel_mappings('/media/ross/FCS_DATA/Raya PD Samples/ds_friendly')

In [6]:
len(cm_permutations)

20

So there are 20 permutations of the ways markers have been labelled in the FCS files. I can account for most cases using regular expressions, but in a few cases (e.g. live/dead staining) I have added like-for-like matches to the templates.
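The regular-expression-plus-exact-match approach can be sketched as follows. The raw labels come from the PDMC myeloid file examined further down (e.g. `Siglec8 APC` is standardised to `Siglec-8` and `Dead V500` to `L/D`), but the patterns, `MARKER_PATTERNS`, `EXACT_MATCHES` and `standardise_marker` are hypothetical. This is not the format of the immunova Excel templates, only an illustration of the matching logic.

```python
import re
from typing import Optional

# Hypothetical template rows: (regular expression for the raw label, standard marker name).
MARKER_PATTERNS = [
    (r"^CD14\b", "CD14"),
    (r"^CD16\b", "CD16"),
    (r"^Siglec-?8\b", "Siglec-8"),
    (r"^CD1c\b", "CD1c"),
    (r"^HLA-?DR\b", "HLA-DR"),
]

# Like-for-like matches for labels no sensible pattern covers (e.g. live/dead staining).
EXACT_MATCHES = {"Dead V500": "L/D"}


def standardise_marker(raw: str) -> Optional[str]:
    """Map a raw FCS marker label to its standard name, or return None if nothing matches."""
    label = raw.strip()
    if label in EXACT_MATCHES:  # exact matches take priority over patterns
        return EXACT_MATCHES[label]
    for pattern, standard in MARKER_PATTERNS:
        if re.search(pattern, label, flags=re.IGNORECASE):
            return standard
    return None


for raw in ["CD14 FITC", "Siglec8 APC", "Dead V500", "CD1c BV421", "CD45"]:
    print(f"{raw!r} -> {standardise_marker(raw)!r}")
```

Returning `None` for unmatched labels (here `CD45`, which has no pattern in this toy list) makes gaps in a template easy to spot before any data is loaded.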

In [6]:
t_panel = Panel(panel_name='peritonitis_t_panel')
m_panel = Panel(panel_name='peritonitis_m_panel')

In [7]:
t_panel.create_from_excel(path='experiment_data/peritonitis_t_template.xlsx')

True

In [8]:
m_panel.create_from_excel(path='experiment_data/peritonitis_m_template.xlsx')

True

The `create_from_excel` method populates the Panel object from the Excel template. I can now save the panels to the database.

In [9]:
t_panel.save()
m_panel.save()

<Panel: Panel object>

With the panels created I can now create the experiments. When you create an experiment you must always associate it with a project. We therefore use the `add_experiment` method of the Project object.
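The ownership relationship (a project owns experiments, and each experiment points at one panel) can be illustrated with a small in-memory model. `ProjectRecord`, `ExperimentRecord` and `add_study_experiments` are hypothetical stand-ins, not immunova's MongoDB documents; the duplicate check only mimics the unique-key behaviour seen when the project was saved twice above. It also shows that the four experiments are simply every combination of sample origin and panel.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List


@dataclass
class ExperimentRecord:
    experiment_id: str
    panel_name: str  # every experiment points at exactly one panel


@dataclass
class ProjectRecord:
    project_id: str
    owner: str
    experiments: Dict[str, ExperimentRecord] = field(default_factory=dict)

    def add_experiment(self, experiment_id: str, panel_name: str) -> ExperimentRecord:
        """Create an experiment owned by this project; IDs must be unique within the project."""
        if experiment_id in self.experiments:
            raise ValueError(f"experiment {experiment_id!r} already exists in {self.project_id!r}")
        experiment = ExperimentRecord(experiment_id, panel_name)
        self.experiments[experiment_id] = experiment
        return experiment


# Panel used for each panel code in the experiment IDs.
PANELS = {"T": "peritonitis_t_panel", "M": "peritonitis_m_panel"}


def add_study_experiments(project: ProjectRecord) -> List[ExperimentRecord]:
    """Add one experiment per (sample origin, panel) combination."""
    return [
        project.add_experiment(f"{origin}_{panel}", PANELS[panel])
        for origin, panel in product(("PBMC", "PDMC"), ("T", "M"))
    ]


project = ProjectRecord(project_id="Peritonitis", owner="burtonrossj")
created = add_study_experiments(project)
print([e.experiment_id for e in created])
```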

In [10]:
pbmc_t = pd_project.add_experiment(experiment_id='PBMC_T', panel_name='peritonitis_t_panel')
pdmc_t = pd_project.add_experiment(experiment_id='PDMC_T', panel_name='peritonitis_t_panel')
pbmc_m = pd_project.add_experiment(experiment_id='PBMC_M', panel_name='peritonitis_m_panel')
pdmc_m = pd_project.add_experiment(experiment_id='PDMC_M', panel_name='peritonitis_m_panel')

Experiment created successfully!
Experiment created successfully!
Experiment created successfully!
Experiment created successfully!

Now that the experiments have been created I can start adding the FCS files. The `add_new_sample` method creates a new FCS file entry in the MongoDB database and associates it with the experiment. See the documentation below:

In [11]:
?pbmc_t.add_new_sample

[0;31mSignature:[0m
[0mpbmc_t[0m[0;34m.[0m[0madd_new_sample[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0msample_id[0m[0;34m:[0m[0mstr[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mfile_path[0m[0;34m:[0m[0mstr[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mcontrols[0m[0;34m:[0m[0mlist[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mcomp_matrix[0m[0;34m:[0m[0;34m<[0m[0mbuilt[0m[0;34m-[0m[0;32min[0m [0mfunction[0m [0marray[0m[0;34m>=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mcompensate[0m[0;34m:[0m[0mbool[0m[0;34m=[0m[0;32mTrue[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mfeedback[0m[0;34m:[0m[0mbool[0m[0;34m=[0m[0;32mTrue[0m[0;34m,[0m[0;34m[0m
[0;34m[0m[0;34m)[0m [0;34m->[0m [0mstr[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Add a new sample (FileGroup) to this experiment
:param sample_id: primary ID for identification of sample (FileGroup.primary_id)
:param file_path: file path of the primary fcs fi

<h3>Add PDMC files (Myeloid cell panel)</h3>

A summary table of all the samples collected in the peritonitis study can provide us with the sample numbers and the manual gating results.

In [12]:
summary = pd.read_excel('/media/ross/FCS_DATA/Raya PD Samples/ClinicalData_and_ManualGatingResults.xlsx')

In [13]:
pdmc_sample_ids = summary.loc[summary['Cell origin'] == 'PDMC', 'Patient no.'].values

In [14]:
pdmc_sample_ids

array(['142-09', '175-09', '209-03', '209-05', '210-12', '210-14',
       '229-02', '237-06', '239-02', '239-04', '251-07', '251-08',
       '254-04', '254-05', '255-04', '255-05', '262-01', '264-02',
       '267-01', '267-02', '272-01', '273-01', '276-01', '279-03',
       '286-02', '286-03', '286-04', '288-02', '289-01', '294-01',
       '294-02', '294-03', '295-01', '298-01', '302-01', '305-01',
       '305-02', '305-03', '306-01', '307-01', '308-01', '308-02R',
       '308-03R', '308-04', '310-01', '315-01', '315-02', '316-01',
       '318-01', '320-01', '321-01', '322-01', '323-01', '323-02',
       '324-01', '326-01'], dtype=object)

We can use the utility function `get_fcs_file_paths` from immunova's data module to generate file paths for adding samples.
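`get_fcs_file_paths` works on one sample directory at a time, so adding every PDMC sample means building a directory and a sample ID for each patient number. A minimal sketch, assuming the layout `.../ds_friendly/PDMC/<patient no.>/m_panel` used in the call below and a sample-ID convention modelled on `pd142_09_m`. Both helpers are hypothetical, and how suffixes such as `308-02R` should be handled is a guess (here simply lower-cased).

```python
from pathlib import PurePosixPath

ROOT = "/media/ross/FCS_DATA/Raya PD Samples/ds_friendly"


def sample_dir(patient_no: str, origin: str = "PDMC", panel: str = "m", root: str = ROOT) -> str:
    """Directory holding one patient's FCS files for one panel, e.g. .../PDMC/142-09/m_panel."""
    return str(PurePosixPath(root, origin, patient_no, f"{panel}_panel"))


def sample_id(patient_no: str, panel: str = "m") -> str:
    """Database sample ID, e.g. '142-09' -> 'pd142_09_m' (assumed naming convention)."""
    return f"pd{patient_no.replace('-', '_').lower()}_{panel}"


print(sample_dir("142-09"))
print(sample_id("142-09"), sample_id("308-02R"))

# Hypothetical batch loop using only the calls shown in this notebook:
# for patient_no in pdmc_sample_ids:
#     paths = get_fcs_file_paths(fcs_dir=sample_dir(patient_no),
#                                control_names=['CD1c', 'HLA-DR'], ctrl_id='FMO')
#     pdmc_m.add_new_sample(sample_id=sample_id(patient_no),
#                           file_path=paths['primary'][0], controls=paths['controls'])
```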

In [15]:
get_fcs_file_paths(fcs_dir='/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel',
                   control_names=['CD1c', 'HLA-DR'], ctrl_id='FMO')

{'primary': ['/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel/Peri 142-09R PDMC 1 N Panel_N1_013.fcs'],
 'controls': [{'control_id': 'CD1c',
   'path': '/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel/Peri 142-09R PDMC 1 N Panel_N2 FMO CD1c_014.fcs'},
  {'control_id': 'HLA-DR',
   'path': '/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel/Peri 142-09R PDMC 1 N Panel_N2 FMO CD1c_014.fcs'},
  {'control_id': 'CD1c',
   'path': '/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel/Peri 142-09R PDMC 1 N Panel_N3 FMO HLA-DR_015.fcs'},
  {'control_id': 'HLA-DR',
   'path': '/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel/Peri 142-09R PDMC 1 N Panel_N3 FMO HLA-DR_015.fcs'}]}
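In the output above each FMO file appears twice, once under each control name, so `HLA-DR` is also paired with the CD1c FMO file and `CD1c` with the HLA-DR FMO file. If that is not intended, a stricter rule is to accept a control name only when it directly follows the `ctrl_id` token in the filename. The sketch below is a hypothetical `match_controls` helper showing that rule; it is not how `get_fcs_file_paths` works internally.

```python
import re
from pathlib import PurePath
from typing import Dict, List


def match_controls(paths: List[str], control_names: List[str], ctrl_id: str = "FMO") -> List[Dict[str, str]]:
    """Pair each control file with the control name that directly follows ctrl_id in its filename."""
    matched = []
    for path in paths:
        stem = PurePath(path).stem
        for name in control_names:
            # e.g. 'FMO CD1c' followed by '_', whitespace or the end of the stem
            pattern = rf"{re.escape(ctrl_id)}\s+{re.escape(name)}(?=[_\s]|$)"
            if re.search(pattern, stem, flags=re.IGNORECASE):
                matched.append({"control_id": name, "path": path})
    return matched


fmo_dir = "/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel"
fmo_files = [
    f"{fmo_dir}/Peri 142-09R PDMC 1 N Panel_N2 FMO CD1c_014.fcs",
    f"{fmo_dir}/Peri 142-09R PDMC 1 N Panel_N3 FMO HLA-DR_015.fcs",
]
for control in match_controls(fmo_files, ["CD1c", "HLA-DR"]):
    print(control["control_id"], "->", PurePath(control["path"]).name)
```

The lookahead stops a short name from matching the start of a longer one (for example a hypothetical `CD1` control would not match `FMO CD1c`).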

In [16]:
pdmc_m_142_09 = get_fcs_file_paths(fcs_dir='/media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel',
                                   control_names=['CD1c', 'HLA-DR'], ctrl_id='FMO')
primary = pdmc_m_142_09['primary'][0]
controls = pdmc_m_142_09['controls']

In [17]:
pdmc_m.add_new_sample(sample_id='pd142_09_m', file_path=primary, controls=controls)

Generating main file entry...
FSC-W,  pair not found!
Column mappings: dict_items([('FSC-A_', ['FSC-A', '']), ('FSC-H_', ['FSC-H', '']), ('SSC-A_', ['SSC-A', '']), ('SSC-H_', ['SSC-H', '']), ('SSC-W_', ['SSC-W', '']), ('Alexa Fluor 488-A_CD14 FITC', ['Alexa Fluor 488-A', 'CD14']), ('PerCP-A_CD16 PerCP-CY5-5', ['PerCP-A', 'CD16']), ('Alexa Fluor 647-A_Siglec8 APC', ['Alexa Fluor 647-A', 'Siglec-8']), ('Alexa Fluor 700-A_CD45', ['Alexa Fluor 700-A', 'CD45']), ('APC-Cy7-A_CD3 APC Fire 750', ['APC-Cy7-A', 'CD3']), ('Alexa Fluor 405-A_CD1c BV421', ['Alexa Fluor 405-A', 'CD1c']), ('AmCyan-A_Dead V500', ['AmCyan-A', 'L/D']), ('BV605-A_CD15', ['BV605-A', 'CD15']), ('BV711-A_HLA-DR', ['BV711-A', 'HLA-DR']), ('PE-A_CD116', ['PE-A', 'CD116']), ('PE-Cy7-A_CD19', ['PE-Cy7-A', 'CD19']), ('Time_', ['Time', ''])])
Error: invalid channel/marker mappings for pd142_09_m, at path /media/ross/FCS_DATA/Raya PD Samples/ds_friendly/PDMC/142-09/m_panel/Peri 142-09R PDMC 1 N Panel_N1_013.fcs, aborting.


In [18]:
from immunova.flow.readwrite.read_fcs import FCSFile

In [19]:
fcs = FCSFile(primary)

In [20]:
fcs.fluoro_mappings

[{'channel': 'FSC-A', 'marker': ''},
 {'channel': 'FSC-H', 'marker': ''},
 {'channel': 'SSC-A', 'marker': ''},
 {'channel': 'SSC-H', 'marker': ''},
 {'channel': 'SSC-W', 'marker': ''},
 {'channel': 'Alexa Fluor 488-A', 'marker': 'CD14 FITC'},
 {'channel': 'PerCP-A', 'marker': 'CD16 PerCP-CY5-5'},
 {'channel': 'Alexa Fluor 647-A', 'marker': 'Siglec8 APC'},
 {'channel': 'Alexa Fluor 700-A', 'marker': 'CD45'},
 {'channel': 'APC-Cy7-A', 'marker': 'CD3 APC Fire 750'},
 {'channel': 'Alexa Fluor 405-A', 'marker': 'CD1c BV421'},
 {'channel': 'AmCyan-A', 'marker': 'Dead V500'},
 {'channel': 'BV605-A', 'marker': 'CD15'},
 {'channel': 'BV711-A', 'marker': 'HLA-DR'},
 {'channel': 'PE-A', 'marker': 'CD116'},
 {'channel': 'PE-Cy7-A', 'marker': 'CD19'},
 {'channel': 'Time', 'marker': ''}]
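The log from `add_new_sample` reports `FSC-W,  pair not found!`, and the mappings listed here confirm that this file has FSC-A, FSC-H, SSC-A, SSC-H and SSC-W but no FSC-W channel, so the panel presumably expects one. A quick way to check any file is to diff the channels you expect against those present. In this sketch `missing_channels` is a hypothetical helper and the `expected` list is an assumption; in practice it would come from the panel template.

```python
from typing import Dict, Iterable, List


def missing_channels(expected: Iterable[str], fluoro_mappings: List[Dict[str, str]]) -> List[str]:
    """Return the expected channels that do not appear in a file's channel/marker mappings."""
    present = {m["channel"] for m in fluoro_mappings}
    return [channel for channel in expected if channel not in present]


# Scatter and time channels from the mappings printed above (markers omitted for brevity).
file_mappings = [{"channel": c, "marker": ""} for c in ["FSC-A", "FSC-H", "SSC-A", "SSC-H", "SSC-W", "Time"]]
# Hypothetical expected list; in practice this would come from the panel template.
expected = ["FSC-A", "FSC-H", "FSC-W", "SSC-A", "SSC-H", "SSC-W", "Time"]
print(missing_channels(expected, file_mappings))  # ['FSC-W']
```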

In [21]:
from flowio import FlowData

In [22]:
fcs = FlowData(primary)

In [26]:
fcs.text

{'beginanalysis': '0',
 'endanalysis': '0',
 'beginstext': '0',
 'endstext': '0',
 'begindata': '4912',
 'enddata': '76555911           ',
 'fil': 'PD142-09R PDMC 1 N Panel_N1_013.fcs',
 'sys': 'Windows 7 6.1',
 'tot': '1125750            ',
 'par': '17',
 'mode': 'L',
 'byteord': '4,3,2,1',
 'datatype': 'F',
 'nextdata': '0',
 'creator': 'BD FACSDiva Software Version 8.0.1',
 'tube name': 'N1',
 'src': 'PD142-09R PDMC 1 N Panel',
 'experiment name': 'PD142-09-P PDMC 1 N Panel 2017-09-27',
 'guid': '587eeb0d-ee43-4a9c-92e1-7ba4996accb0',
 'date': '28-SEP-2017',
 'btim': '09:55:03',
 'etim': '09:58:33',
 'cyt': 'LSRFortessa',
 'settings': 'Cytometer',
 'cytnum': '1',
 'window extension': '10.00',
 'export user name': 'Raya',
 'export time': '28-SEP-2017-10:25:01',
 'op': 'Raya',
 'fsc asf': '0.65',
 'autobs': 'TRUE',
 'inst': ' ',
 'laser1name': 'Blue',
 'laser1delay': '0.00',
 'laser1asf': '0.70',
 'laser2name': 'Red',
 'laser2delay': '123.01',
 'laser2asf': '0.74',
 'laser3name': 'Vio