## Importing AliGater & the AliGater config

AliGater will attempt to detect whether you are running an interactive Python session (IPython/Jupyter) or started it from a script (terminal mode), and prints a corresponding message to standard error.

In [1]:
import aligater as ag

AliGater started in Jupyter mode


This mainly switches plotting on and off.
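
AliGater's own detection code is not shown in this tutorial. As a rough illustration of how a library can tell these environments apart, the sketch below checks for IPython's `get_ipython` function. The name `detect_exec_mode` and the returned labels are assumptions made for this example, not part of AliGater's API.

```python
def detect_exec_mode():
    """Guess the execution environment (labels are illustrative only)."""
    try:
        # get_ipython is only injected into the namespace by IPython/Jupyter
        shell_name = get_ipython().__class__.__name__
    except NameError:
        return "terminal"  # plain python interpreter or script
    if shell_name == "ZMQInteractiveShell":
        return "jupyter"   # notebook / lab kernel
    return "ipython"       # e.g. IPython running in a terminal


print(detect_exec_mode())
```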

When aligater is imported, the config file is run. It is located at aligater/aligater/AGConf.py

This contains some settings that are well worth inspecting before going on to batch processing. For loading single files and exploring, the defaults are usually fine.

You can always access and change settings in the AGConf file after aligater has been imported, if needed:

In [2]:
ag.AGConfig.execMode = 'terminal'

It's strongly recommended to set the AliGater home directory correctly:

In [3]:
ag.AGConfig.ag_home

'/media/ludvig/Project_Storage/BloodVariome/aligater/'

Another useful path is the ag_tmp property. It defines AliGater's 'scratch space', where it stores intermediate files, downsampled images etc. The space requirements can be rather large for big batch runs, so I suggest setting this to a folder where you have plenty of space available:

In [4]:
ag.AGConfig.ag_tmp

'/media/ludvig/Project_Storage/aligater_temp/'
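
Since the scratch space can grow large, it may be worth checking the free disk space of a folder before pointing `ag_tmp` at it. The helper below is a small sketch using only the standard library. `free_gib` is a hypothetical name, and reassigning `ag.AGConfig.ag_tmp` is assumed to work the same way as the `execMode` assignment shown earlier.

```python
import shutil
import tempfile


def free_gib(folder):
    """Return the free space, in GiB, on the disk that holds `folder`."""
    return shutil.disk_usage(folder).free / 1024**3


# The system temp dir stands in for a candidate scratch folder here.
candidate = tempfile.gettempdir()
print(f"{candidate}: {free_gib(candidate):.1f} GiB free")

# Assumption: settings can be reassigned after import, like execMode above.
# ag.AGConfig.ag_tmp = candidate
```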

## Single file i/o

AliGater file i/o is mainly done in one of two ways: either by loading a single file with the loadFCS function, or by batch loading many files through an AGExperiment object.

By default, the loadFCS function returns the data as a pandas DataFrame:

In [5]:
ag.loadFCS(path=ag.AGConfig.ag_home+"tutorial/data/example1.fcs", 
           compensate=True, 
           flourochrome_area_filter=True)

Opening file example1 from folder /tutorial/data
Loaded dataset with 1000000 events.


,FSC 488/10-H,FSC 488/10-A,FSC 488/10-W,SSC 488/10-H,SSC 488/10-A,SSC 488/10-W,BB515 CD39-A,PE-Cy7 CD25-A,PE CD127-A,PE-Dazzle 594 CCR6-A,BV650 HLA-DR-A,BV711 CCR7-A,BV786 CXCR5-A,BV421 CXCR3-A,BV605 CD194-A,BV510 CD4-A,Alexa Fluor 700 CD3-A,APC-H7 CD8-A,APC CD45RA-A
0,88768.6144,115694.1312,78764.4416,23343.1808,25840.0768,66093.0560,120.370272,239.641722,58.923444,103.918683,93.028168,141.317305,258.044027,114.663754,398.293056,80.621713,206.555734,152.830524,4.659925
1,45153.5104,67816.0128,99375.5136,167609.8816,191416.7296,71129.4976,200.210499,693.326859,249.931336,295.704729,103.487257,752.449007,370.677672,1679.466896,3827.976214,45.983813,-84.886763,652.231400,53.660000
2,60145.6384,77661.2352,80288.1536,13294.7456,14618.8800,67498.8032,81.190419,132.194142,112.163318,3543.683607,1742.238333,3796.207810,1613.339615,400.131389,800.681072,35.508496,-776.301108,375.489958,195.150474
3,68961.9200,86361.0624,75399.1680,12068.2240,13162.2400,66538.7008,84.735906,161.763261,216.086232,147.602239,-33.096730,1246.019463,147.300612,207.899467,428.570031,505.577913,4579.946665,-108.898032,367.448467
4,131969.7664,208002.1504,96872.0384,214731.1360,214748.0000,83997.4912,275.688918,1305.255277,169.837805,139.485735,219.137415,878.919580,294.435400,1796.582907,4844.724148,484.844886,92.258240,610.634542,154.902509
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999995,89871.8720,113879.0912,77515.9808,17263.6160,18954.9568,67662.6432,163.243621,291.009162,61.342903,243.272295,-64.492626,388.948296,269.014414,335.925523,680.944393,102.200811,-84.231826,700.249344,569.440784
999996,80852.3520,107713.5616,80537.1904,20498.0224,22555.4944,67698.6880,59.597661,291.044370,87.118466,-101.975519,54.008898,561.816350,323.532090,341.294551,566.059552,55.544961,1156.231582,3450.219284,1382.457310
999997,75933.3376,95031.7056,76677.1200,22532.5056,24762.0352,67380.8384,155.831188,252.498242,349.383662,126.954442,-100.255948,336.092223,210.900084,565.794824,586.369531,43.023669,8508.503913,394.399828,127.832572
999998,157514.2144,213773.6960,84456.2432,140626.7904,157082.8032,69477.9904,211.728988,390.228888,315.458599,335.544054,114.022993,332.122552,416.700949,374.146323,771.626994,150.280757,363.049544,75.812150,66.347037


Normally I'd recommend loading it into an aligater AGsample object instead, which holds a DataFrame internally along with extra metadata:

In [6]:
sample = ag.loadFCS(path=ag.AGConfig.ag_home+"tutorial/data/example1.fcs", 
                    compensate=True, 
                    flourochrome_area_filter=True, 
                    return_type="agsample")

Opening file example1 from folder /tutorial/data
Loaded dataset with 1000000 events.


In [7]:
type(sample)

aligater.AGClasses.AGsample

You can always access the pandas DataFrame by calling the sample object:

In [8]:
sample()

,FSC 488/10-H,FSC 488/10-A,FSC 488/10-W,SSC 488/10-H,SSC 488/10-A,SSC 488/10-W,BB515 CD39-A,PE-Cy7 CD25-A,PE CD127-A,PE-Dazzle 594 CCR6-A,BV650 HLA-DR-A,BV711 CCR7-A,BV786 CXCR5-A,BV421 CXCR3-A,BV605 CD194-A,BV510 CD4-A,Alexa Fluor 700 CD3-A,APC-H7 CD8-A,APC CD45RA-A
0,88768.6144,115694.1312,78764.4416,23343.1808,25840.0768,66093.0560,120.370272,239.641722,58.923444,103.918683,93.028168,141.317305,258.044027,114.663754,398.293056,80.621713,206.555734,152.830524,4.659925
1,45153.5104,67816.0128,99375.5136,167609.8816,191416.7296,71129.4976,200.210499,693.326859,249.931336,295.704729,103.487257,752.449007,370.677672,1679.466896,3827.976214,45.983813,-84.886763,652.231400,53.660000
2,60145.6384,77661.2352,80288.1536,13294.7456,14618.8800,67498.8032,81.190419,132.194142,112.163318,3543.683607,1742.238333,3796.207810,1613.339615,400.131389,800.681072,35.508496,-776.301108,375.489958,195.150474
3,68961.9200,86361.0624,75399.1680,12068.2240,13162.2400,66538.7008,84.735906,161.763261,216.086232,147.602239,-33.096730,1246.019463,147.300612,207.899467,428.570031,505.577913,4579.946665,-108.898032,367.448467
4,131969.7664,208002.1504,96872.0384,214731.1360,214748.0000,83997.4912,275.688918,1305.255277,169.837805,139.485735,219.137415,878.919580,294.435400,1796.582907,4844.724148,484.844886,92.258240,610.634542,154.902509
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999995,89871.8720,113879.0912,77515.9808,17263.6160,18954.9568,67662.6432,163.243621,291.009162,61.342903,243.272295,-64.492626,388.948296,269.014414,335.925523,680.944393,102.200811,-84.231826,700.249344,569.440784
999996,80852.3520,107713.5616,80537.1904,20498.0224,22555.4944,67698.6880,59.597661,291.044370,87.118466,-101.975519,54.008898,561.816350,323.532090,341.294551,566.059552,55.544961,1156.231582,3450.219284,1382.457310
999997,75933.3376,95031.7056,76677.1200,22532.5056,24762.0352,67380.8384,155.831188,252.498242,349.383662,126.954442,-100.255948,336.092223,210.900084,565.794824,586.369531,43.023669,8508.503913,394.399828,127.832572
999998,157514.2144,213773.6960,84456.2432,140626.7904,157082.8032,69477.9904,211.728988,390.228888,315.458599,335.544054,114.022993,332.122552,416.700949,374.146323,771.626994,150.280757,363.049544,75.812150,66.347037


An AGsample object knows which file it was loaded from, and also stores a shortened sample name that includes the two parent folders.

Folder structure is a common way to sort files into case/control groups, cohorts, etc.

In [9]:
sample.filePath

'/media/ludvig/Project_Storage/BloodVariome/aligater/tutorial/data/example1.fcs'

In [10]:
sample.sample

'tutorial/data/example1'
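
To make the behaviour above concrete, here is a toy sketch of how an object could wrap tabular data, remember its file path, derive a short name from the two parent folders, and return the data when called. `SampleSketch` is a hypothetical class written for this example. It is not AliGater's AGsample implementation, which may derive these attributes differently.

```python
from pathlib import PurePosixPath


class SampleSketch:
    """Toy stand-in for a sample object: data plus file metadata."""

    def __init__(self, data, file_path):
        self._data = data
        self.filePath = file_path
        path = PurePosixPath(file_path)
        # Keep the two parent folders and drop the file extension.
        self.sample = "/".join(path.parts[-3:-1] + (path.stem,))

    def __call__(self):
        # Calling the object hands back the wrapped data.
        return self._data


toy = SampleSketch(
    data=[[1.0, 2.0], [3.0, 4.0]],  # placeholder for a DataFrame
    file_path="/media/ludvig/Project_Storage/BloodVariome/aligater/tutorial/data/example1.fcs",
)
print(toy.sample)  # tutorial/data/example1
print(toy())
```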

In the above function call, two parameters were passed, **compensate** and **flourochrome_area_filter**, which merit some explanation.

The compensate flag tells AliGater to apply the compensation data available in the FCS metadata - more on that shortly.

The flourochrome_area_filter flag is only relevant to certain .fcs files, depending on the flow machine and the way the data was exported. As with forward and side scatter, each fluorochrome channel can be reported with height and width in addition to area. In most setups these extra channels are not used. The filter removes them and keeps only the -A (area) channel, which is what's typically used in flow gating.
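
The idea behind the area filter can be sketched on a plain list of column names. This is only an illustration under assumed naming rules (scatter channels start with `FSC`/`SSC`, area channels end in `-A`). The helper `keep_area_channels` and the `-H`/`-W` fluorochrome columns in the example are made up, and AliGater's actual filtering logic may differ.

```python
def keep_area_channels(columns):
    """Keep scatter channels and -A channels; drop other -H/-W channels."""
    kept = []
    for name in columns:
        is_scatter = name.startswith(("FSC", "SSC"))
        if is_scatter or name.endswith("-A"):
            kept.append(name)
    return kept


columns = [
    "FSC 488/10-H", "FSC 488/10-A", "SSC 488/10-W",
    "BB515 CD39-A", "BB515 CD39-H", "BB515 CD39-W",
]
print(keep_area_channels(columns))
```

With a pandas DataFrame `df`, the same selection could be applied as `df[keep_area_channels(df.columns)]`.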

## Metadata & Compensation

Supplying the metadata flag to the loadFCS function makes it return two things: a Python dictionary with the metadata, followed by the AGsample or pandas DataFrame:

In [11]:
metadata, sample = ag.loadFCS(path=ag.AGConfig.ag_home+"tutorial/data/example1.fcs", 
                              metadata=True, 
                              compensate=True, 
                              flourochrome_area_filter=True, 
                              return_type="agsample")

Opening file example1 from folder /tutorial/data
Loaded dataset with 1000000 events.


In [12]:
type(metadata)

dict

It's a pretty big dictionary and would be too clunky to show the entire content in this tutorial. Feel free to browse the content yourself. One of the most important parts of the dictionary, however, is the spill matrix:

In [13]:
metadata['$SPILLOVER']

'13,FL02-A,FL08-A,FL12-A,FL14-A,FL15-A,FL16-A,FL17-A,FL19-A,FL20-A,FL21-A,FL22-A,FL23-A,FL25-A,1.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0008,0.0065,0.0002,0.0001,0.0004,0.0012,1.0000,0.0303,0.0088,0.0003,0.0038,0.1847,0.0000,0.0013,0.0000,0.0114,0.4024,0.0003,0.0002,0.0041,1.0000,0.2845,0.0103,0.0021,0.0007,0.0000,0.0418,0.0000,0.0001,0.0001,0.0001,0.0004,0.0391,0.0695,1.0000,0.0439,0.0125,0.0049,0.0000,0.1181,0.0000,0.0015,0.0002,0.0013,0.0000,0.0019,0.0006,0.0115,1.0000,0.2716,0.1033,0.0319,0.1924,0.0009,0.1265,0.0168,0.1011,0.0000,0.0180,0.0000,0.0000,0.0899,1.0000,0.5779,0.0465,0.0013,0.0046,0.9123,0.3237,0.0216,0.0000,0.0100,0.0000,0.0000,0.0029,0.0326,1.0000,0.0294,0.0017,0.0031,0.0071,0.1445,0.0006,0.0001,0.0000,0.0000,0.0000,0.0027,0.0004,0.0004,1.0000,0.0101,0.0935,0.0001,0.0000,0.0000,0.0000,0.0126,0.0187,0.2286,0.5096,0.1328,0.0603,0.0221,1.0000,0.0017,0.0006,0.0001,0.0010,0.0011,0.0001,0.0000,0.0002,0.3372,0.0959,0.0510,0.0010,0.8689,1.0000,0.0007,0.0004,0.

This matrix defines how much of each fluorochrome's signal spills over into the other channels and therefore needs to be corrected for, i.e. it's used for *compensation*.

Note that it will not always be called $SPILLOVER; there are several aliases, depending on the machine/software used for exporting.
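
One way to cope with the varying key names is to try a short list of candidates in order. The sketch below does that on a toy dictionary. `find_spill_key` is a hypothetical helper, and the default candidates are just the two key names that appear in this tutorial; other aliases would have to be added for other exporters.

```python
def find_spill_key(metadata, candidates=("$SPILLOVER", "SPILL")):
    """Return the first candidate key present in the metadata dict, else None."""
    for key in candidates:
        if key in metadata:
            return key
    return None


# Toy metadata standing in for the dictionary returned by loadFCS.
toy_metadata = {"$TOT": "1000000", "SPILL": "2,FL1-A,FL2-A,1,0.1,0.05,1"}
print(find_spill_key(toy_metadata))       # SPILL
print(find_spill_key({"$TOT": "1000"}))   # None
```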

To make it properly readable you need to reshape it by the number of colors, which is given in the first element. That count is followed by the channel names and then the matrix values.

Below is a somewhat complicated one-liner to push it into a more readable pandas DataFrame:

In [14]:
import numpy as np
import pandas as pd

In [15]:
# Drop the leading count and the 13 channel names, then reshape the remaining values to 13 x 13
pd.DataFrame(np.array(metadata['$SPILLOVER'].split(',')[13+1:]).reshape(13, 13).astype(float))

,0,1,2,3,4,5,6,7,8,9,10,11,12
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0008,0.0065,0.0002,0.0001,0.0004
1,0.0012,1.0,0.0303,0.0088,0.0003,0.0038,0.1847,0.0,0.0013,0.0,0.0114,0.4024,0.0003
2,0.0002,0.0041,1.0,0.2845,0.0103,0.0021,0.0007,0.0,0.0418,0.0,0.0001,0.0001,0.0001
3,0.0004,0.0391,0.0695,1.0,0.0439,0.0125,0.0049,0.0,0.1181,0.0,0.0015,0.0002,0.0013
4,0.0,0.0019,0.0006,0.0115,1.0,0.2716,0.1033,0.0319,0.1924,0.0009,0.1265,0.0168,0.1011
5,0.0,0.018,0.0,0.0,0.0899,1.0,0.5779,0.0465,0.0013,0.0046,0.9123,0.3237,0.0216
6,0.0,0.01,0.0,0.0,0.0029,0.0326,1.0,0.0294,0.0017,0.0031,0.0071,0.1445,0.0006
7,0.0001,0.0,0.0,0.0,0.0027,0.0004,0.0004,1.0,0.0101,0.0935,0.0001,0.0,0.0
8,0.0,0.0126,0.0187,0.2286,0.5096,0.1328,0.0603,0.0221,1.0,0.0017,0.0006,0.0001,0.001
9,0.0011,0.0001,0.0,0.0002,0.3372,0.0959,0.051,0.001,0.8689,1.0,0.0007,0.0004,0.0
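
The one-liner hard-codes the number of colors. A slightly more general sketch reads the count from the first element and also keeps the channel names. `parse_spillover` is a hypothetical helper written for this tutorial, and it assumes the layout seen in the string above: the count, then the names, then the values in row-major order.

```python
def parse_spillover(spill_string):
    """Split a spillover string into channel names and a square matrix."""
    fields = spill_string.split(",")
    n = int(fields[0])                        # number of channels
    names = fields[1:n + 1]                   # n channel names
    values = [float(v) for v in fields[n + 1:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, got {len(values)}")
    # Row-major layout: row i holds values[i*n : (i+1)*n]
    matrix = [values[i * n:(i + 1) * n] for i in range(n)]
    return names, matrix


names, matrix = parse_spillover("2,FL1-A,FL2-A,1.0,0.1,0.05,1.0")
print(names)   # ['FL1-A', 'FL2-A']
print(matrix)  # [[1.0, 0.1], [0.05, 1.0]]
```

The result can be turned into a labelled table with `pd.DataFrame(matrix, index=names, columns=names)`.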


**Typically you would only need to inspect this when there are compensation issues.**

AliGater will report that compensation information is missing if this matrix is equal to the identity matrix.

In that case you might want to compensate your flow data using external compensation information, such as from another sample.
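
For reference, an identity check of this kind can be sketched in a few lines. `is_identity` is a hypothetical helper operating on a list of rows, with an assumed tolerance. It only illustrates the condition and is not AliGater's internal check.

```python
def is_identity(matrix, tol=1e-9):
    """True if `matrix` (a list of rows) is square and equal to the identity."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            return False
        for j, value in enumerate(row):
            expected = 1.0 if i == j else 0.0
            if abs(value - expected) > tol:
                return False
    return True


print(is_identity([[1.0, 0.0], [0.0, 1.0]]))   # True
print(is_identity([[1.0, 0.1], [0.05, 1.0]]))  # False
```

For a square NumPy array `m`, the equivalent test is `np.allclose(m, np.eye(len(m)))`.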

## 'Manual' compensation
Below is such a sample, where compensation hasn't been applied for some reason, together with the AliGater warning shown when loading it.

In [16]:
metaDict, fcsDF = ag.loadFCS(ag.AGConfig.ag_home+"tutorial/data/Uncompensated.fcs", compensate=True, metadata=True)

Opening file Uncompensated from folder /tutorial/data
Loaded dataset with 500000 events.


Using the same 'hack' as before, we can inspect the compensation data:

In [17]:
pd.DataFrame(np.array(metaDict['SPILL'].split(',')[8+1:]).reshape(8, 8).astype(float))

,0,1,2,3,4,5,6,7
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


If another sample from the same run has a correct compensation matrix, AliGater lets you use the compensation data from that secondary sample.

For single files this can be achieved as shown below. There are ways to automate this process for batch runs.

In [18]:
metaDict, fcsDF = ag.loadFCS(ag.AGConfig.ag_home+"tutorial/data/Compensated.fcs",metadata=True)
marker_labels,compensation_matrix = ag.getCompensationMatrix(fcsDF, metaDict)
compensation_matrix

Opening file Compensated from folder /tutorial/data
Loaded dataset with 500000 events.


array([[1.00000000e+00, 1.32450863e-01, 1.02665103e-02, 2.01250917e-03,
        0.00000000e+00, 1.08101399e-02, 0.00000000e+00, 9.71868171e-03],
       [0.00000000e+00, 1.00000000e+00, 1.15291878e-01, 2.48027420e-02,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.18944635e-01],
       [0.00000000e+00, 9.32454529e-03, 1.00000000e+00, 3.97515291e-02,
        1.48551524e-04, 0.00000000e+00, 1.05581201e-04, 4.48574031e-02],
       [2.82618619e-04, 3.41073206e-04, 4.06687202e-02, 1.00000000e+00,
        1.08666204e-03, 8.90738978e-04, 1.41479627e-04, 8.85263402e-01],
       [1.78071188e-03, 6.43905570e-05, 0.00000000e+00, 0.00000000e+00,
        1.00000000e+00, 1.09196710e-01, 0.00000000e+00, 0.00000000e+00],
       [1.88402026e-02, 5.74699525e-03, 7.81401221e-05, 0.00000000e+00,
        7.51734389e-02, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [1.28217849e-02, 3.15879976e-01, 1.92875902e-04, 0.00000000e+00,
        0.00000000e+00, 1.04141284e-03, 1.00000000e+00, 6.

In [19]:
marker_labels

Index(['IgA', 'CD34', 'IgD', 'CD45', 'CD38', 'CD24', 'CD27', 'CD19'], dtype='object')

As seen above, the function getCompensationMatrix extracts the compensation matrix, as well as the marker labels, from the given DataFrame and metadata dictionary.

We can supply this compensation matrix to ag.loadFCS when we load the uncompensated sample. A confirmation message will be shown if successful:

In [20]:
ag.loadFCS(ag.AGConfig.ag_home+"tutorial/data/Uncompensated.fcs",
          compensate=True,
          comp_matrix=compensation_matrix)

Opening file Uncompensated from folder /tutorial/data
Loaded dataset with 500000 events.
External compensation matrix passed, applying
Applied passed compensation matrix


,FSC-A,FSC-H,SSC-A,SSC-H,IgA,CD34,IgD,CD45,CD38,CD24,CD27,CD19
0,16.158390,12.6673,18.025036,14.8789,0.041989,0.070435,0.074535,0.051577,0.046093,0.055830,0.005692,0.007555
1,15.402430,12.6671,16.398663,14.2751,0.049393,0.047140,0.093299,0.230910,0.041995,0.102625,0.003553,-0.012576
2,13.797511,10.7581,19.310524,15.5385,0.062510,0.057108,0.069877,0.141843,0.047633,0.097556,0.005288,0.008696
3,20.677742,15.9966,15.153702,11.8430,0.053194,0.009454,0.053682,0.364596,0.063264,0.059631,0.307893,-0.066244
4,8.752479,7.5609,3.602321,3.2077,0.012867,0.008182,0.015288,0.101952,0.300594,0.004831,-0.000988,-0.001819
...,...,...,...,...,...,...,...,...,...,...,...,...
499995,18.213146,14.6053,16.949608,13.7509,0.049350,0.033308,0.070916,0.126927,0.047641,0.054668,0.005007,-0.016638
499996,13.624652,10.3334,10.685856,8.6930,0.024250,0.005208,0.021362,0.056140,0.039952,0.048447,0.008228,0.010351
499997,11.418878,10.3168,2.538981,2.3776,0.004886,-0.021754,-0.002619,0.524873,0.001893,0.006141,0.207710,0.012792
499998,11.590866,10.6740,2.494267,2.3264,0.010679,-0.001599,-0.001570,0.758849,0.346149,0.004997,0.586761,-0.029186
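
To illustrate what applying a compensation matrix means numerically, the sketch below uses the common convention where each row of the spill matrix is a fluorochrome and each column a detector, so that observed = true × spill and the compensated values are observed × spill⁻¹. The toy numbers are made up, and whether AliGater computes it exactly this way internally is an assumption.

```python
import numpy as np

# Toy 2-colour spill matrix: row = fluorochrome, column = detector.
spill = np.array([[1.0, 0.2],
                  [0.1, 1.0]])

# Two events with known "true" signals, then the spilled (observed) values.
true_signal = np.array([[100.0, 0.0],
                        [0.0, 50.0]])
observed = true_signal @ spill

# Compensation: solve observed = compensated @ spill for compensated.
# Solving the transposed system avoids forming the inverse explicitly.
compensated = np.linalg.solve(spill.T, observed.T).T

print(np.round(compensated, 6))
```

Only the fluorochrome channels covered by the matrix take part in this step, which is consistent with the scatter columns sitting outside the eight marker labels above.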
