# Prepare your data

This tutorial guides you from an FCS file to the creation of a `Scyan` object.

## 1. Creation of your `AnnData` object (cytometry data)

Create an `AnnData` object containing your cytometry data. Consider reading the [anndata documentation](https://anndata.readthedocs.io/en/latest/) if you have never used `anndata` before.

### Loading a `FCS` file

You probably have `FCS` files that you want to load. To do so, use [`scyan.read_fcs`](../../api/read_fcs). Make sure you have already [installed scyan](../../getting_started).

In [1]:
import scyan

Global seed set to 0


In [2]:
adata = scyan.read_fcs("<path-to-fcs>.fcs")

You should get something like the following. In this example, we have $N = 52\,981$ cells and $M = 38$ markers.
Make sure your marker names look correct, or read the [`scyan.read_fcs`](../../api/read_fcs) documentation for more advanced usage.

If you have multiple `FCS` files, consider [concatenating your data](https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.concatenate.html#anndata.AnnData.concatenate).

In [3]:
print(adata)
print(f"\nThe markers names are: {', '.join(adata.var_names)}")

AnnData object with n_obs × n_vars = 52981 × 38
    obs: 'FSC-A', ..., 'Time'

The markers names are: CD8, CD4, ...


### Preprocess your data

Choose either the `asinh` or `logicle` transformation below, and scale your data.

In [4]:
# If you choose the logicle transform (recommended)
scyan.preprocess.auto_logicle_transform(adata)

# If you choose the asinh transform instead
# scyan.preprocess.asinh_transform(adata)

scyan.preprocess.scale(adata)  # To standardise your data
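
To give an intuition for what these two steps do, here is a generic NumPy sketch of an asinh transform followed by per-marker standardisation. It is not Scyan's implementation: the helper names, the cofactor of 5, and the toy intensities are assumptions chosen for illustration, and the defaults used by `scyan.preprocess` may differ.

```python
import numpy as np


def asinh_transform(x, cofactor=5.0):
    """Compress large intensities while staying roughly linear near zero.

    The cofactor value is an illustrative assumption.
    """
    return np.arcsinh(x / cofactor)


def standardise(x):
    """Give each marker (column) zero mean and unit standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


# Toy raw intensities: 3 cells (rows) x 2 markers (columns)
raw = np.array([[0.0, 10.0], [50.0, 200.0], [500.0, 3000.0]])

transformed = asinh_transform(raw)
scaled = standardise(transformed)
print(scaled)
```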

## 2. Creation of the knowledge table

The knowledge table, or marker-population table, contains the well-known marker expressions of each population. For instance, if you want `Scyan` to annotate CD4 T cells, you have to specify which markers CD4 T cells are expected to express and which they are not. Typically, depending on your panel, you may have CD4+, CD8-, CD45+, CD3+, etc. Values inside the table can be:

- `-1` for negative expressions.
- `1` for positive expressions.
- Some float values such as `0` or `0.5` for mid and low expressions respectively (use them only when necessary).
- `NA` when you don't know or if it is not applicable.

We recommend the `csv` format for this table. You can either directly create a `csv`, or use Excel and export the table as `csv`.

You can then import the `csv` to make a pandas `DataFrame`.
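
If you prefer to build the table programmatically, the following sketch creates a small table with pandas, writes it as `csv`, and reads it back the same way as in the example below. The three populations and their values are taken from the example table shown later in this tutorial; the in-memory buffer only stands in for a real file path.

```python
import io

import pandas as pd

nan = float("nan")  # written as an empty cell in the csv, read back as NaN

# Rows are populations, columns are markers
table = pd.DataFrame(
    [
        [1, 1, -1],      # CD4 T cells: CD3+, CD4+, CD8-
        [1, -1, 1],      # CD8 T cells: CD3+, CD4-, CD8+
        [-1, nan, nan],  # CD16+ NK cells: CD3-, CD4 and CD8 unknown
    ],
    index=pd.Index(["CD4 T cells", "CD8 T cells", "CD16+ NK cells"], name="Populations"),
    columns=["CD3", "CD4", "CD8"],
)

# Write to an in-memory buffer (replace it with a file path in practice)
buffer = io.StringIO()
table.to_csv(buffer)
buffer.seek(0)

# The first csv column holds the population names, hence index_col=0
reloaded = pd.read_csv(buffer, index_col=0)
print(reloaded)
```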

### Example

In [5]:
import pandas as pd

In [6]:
marker_pop_matrix = pd.read_csv("<path-to-csv>.csv", index_col=0)

In [7]:
marker_pop_matrix.head() # Display the first 5 rows of the table

| Populations | CD19 | CD4 | CD8 | CD34 | CD20 | CD45 | CD123 | CD11c | CD7 | CD16 | CD38 | CD3 | HLA-DR | CD64 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Basophils | -1 | NaN | -1.0 | -1 | -1.0 | NaN | 1 | -1 | -1.0 | -1.0 | NaN | -1 | -1.0 | -1.0 |
| CD4 T cells | -1 | 1.0 | -1.0 | -1 | -1.0 | NaN | -1 | -1 | NaN | -1.0 | NaN | 1 | -1.0 | -1.0 |
| CD8 T cells | -1 | -1.0 | 1.0 | -1 | -1.0 | NaN | -1 | -1 | 1.0 | -1.0 | NaN | 1 | -1.0 | -1.0 |
| CD16- NK cells | -1 | NaN | NaN | -1 | -1.0 | NaN | -1 | -1 | 1.0 | -1.0 | NaN | -1 | -1.0 | -1.0 |
| CD16+ NK cells | -1 | NaN | NaN | -1 | NaN | NaN | -1 | -1 | 1.0 | 1.0 | NaN | -1 | -1.0 | -1.0 |


See our [advice](../../advanced/advice) on creating this table.

Also, make sure your column names correspond to marker names in `adata.var_names`.
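
One way to check this correspondence is to compare the two sets of names with pandas. In the sketch below, `var_names` is a hypothetical stand-in for `adata.var_names`, and the `CD8a` column is deliberately mismatched to show what the check reports.

```python
import pandas as pd

# Hypothetical stand-in for adata.var_names
var_names = pd.Index(["CD3", "CD4", "CD8", "CD45"])

# Small knowledge table with one deliberately mismatched column name
marker_pop_matrix = pd.DataFrame(
    [[1, 1, -1], [1, -1, 1]],
    index=["CD4 T cells", "CD8 T cells"],
    columns=["CD3", "CD4", "CD8a"],
)

# Table columns that do not match any marker in the data
missing = marker_pop_matrix.columns.difference(var_names)

# Markers in the data that the table does not describe
unused = var_names.difference(marker_pop_matrix.columns)

print(f"Columns absent from the data: {list(missing)}")
print(f"Markers not used in the table: {list(unused)}")
```

A non-empty `missing` usually points to a naming difference (for example `CD8a` versus `CD8`) that should be fixed in the table or in the marker names.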

## 3. Creation of the `Scyan` model

In [8]:
model = scyan.Scyan(adata, marker_pop_matrix)

INFO:scyan.model:Initialized Scyan model with N=52981 cells, P=29 populations and M=38 markers. No covariate provided.


Congratulations! You can now follow our tutorial on [model training and visualisation](../usage).

## 4. (Optional) Save your data for later use

You can use [`scyan.data.add`](../../api/add) to save your data.

In [9]:
scyan.data.add("your-project-name", adata, marker_pop_matrix)

INFO:scyan.data.datasets:Creating new dataset folder at /.../your_project_name
INFO:scyan.data.datasets:Created file /.../your_project_name/default.h5ad
INFO:scyan.data.datasets:Created file /.../your_project_name/default.csv


You can now load it with [`scyan.data.load`](../../api/load).

In [10]:
adata, marker_pop_matrix = scyan.data.load("your-project-name")