# Tutorial for human gastrula dataset

## CIARA functions
Install the `ciara_python` package with pip. The cell below does this from inside the notebook; the same command also works on the command line.

In [2]:
import sys
!{sys.executable} -m pip install --upgrade ciara_python

Collecting ciara_python
  Downloading ciara_python-0.9.8-py3-none-any.whl (4.1 kB)
Installing collected packages: ciara-python
  Attempting uninstall: ciara-python
    Found existing installation: ciara-python 0.9.7
    Uninstalling ciara-python-0.9.7:
      Successfully uninstalled ciara-python-0.9.7
Successfully installed ciara-python-0.9.8
You should consider upgrading via the '/opt/python/bin/python3.8 -m pip install --upgrade pip' command.


Import the two main functions for the entropy-of-mixing analysis, `get_background_full` and `ciara`:

In [3]:
from ciara_python import get_background_full, ciara

## Import human gastrula dataset and KNN matrix

Note that an AnnData object stores the count matrix as cells x genes, which is the transpose of the genes x cells layout used by the Seurat pipeline in R.
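
To make the orientation concrete, here is a minimal sketch with a toy pandas DataFrame. The gene and cell names and the values are hypothetical; only the row/column layout matters.

```python
import pandas as pd

# Toy Seurat-style matrix: rows are genes, columns are cells (hypothetical values).
genes_by_cells = pd.DataFrame(
    [[0.0, 1.2, 3.4],
     [2.1, 0.0, 0.0]],
    index=["geneA", "geneB"],
    columns=["cell1", "cell2", "cell3"],
)

# AnnData expects observations (cells) as rows and variables (genes) as columns,
# so a genes x cells table has to be transposed first.
cells_by_genes = genes_by_cells.T

print(genes_by_cells.shape)  # (2, 3): genes x cells
print(cells_by_genes.shape)  # (3, 2): cells x genes
```

After the transpose, the row labels are cell names and the column labels are gene names, which is the layout the rest of this tutorial relies on.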

In [4]:
import scanpy as sc
import pandas as pd

human_gast_norm = sc.read_csv('/root/host_home/Documents/CIARA/Data/norm_elmir_5_30_transposed.csv', delimiter=',')
human_gast_norm = human_gast_norm.transpose()
print(human_gast_norm)

knn_matrix = pd.read_csv('/root/host_home/Documents/CIARA/Data/knn_matrix_elmir_5_30.csv', delimiter=',', index_col=0)


AnnData object with n_obs × n_vars = 1195 × 36570


## CIARA algorithm

### Step 1: Find background genes

The background genes get calculated and added to the gene metadata in the AnnData object:

In [7]:
import time
import numpy as np

t = time.perf_counter()

human_gast_norm.var["CIARA_background"] = get_background_full(human_gast_norm, threshold=1, n_cells=3, n_cells_high=20)

elapsed_time = time.perf_counter() - t
print("Execution time: " + str(np.round(elapsed_time, 2)) + "s")

#background_genes = human_gast_norm.var_names[human_gast_norm.var["CIARA_background"]]

Background genes: 5057
Execution time: 0.07s
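
As a rough picture of what this step selects, the sketch below assumes that a background gene is one whose expression exceeds `threshold` in at least `n_cells` cells and in at most `n_cells_high` cells. This reading of the parameters is an assumption based on their names, and `select_background_sketch` is a hypothetical helper, not the package's implementation.

```python
import numpy as np

def select_background_sketch(expr, threshold=1, n_cells=3, n_cells_high=20):
    """Return a boolean mask over genes for a cells x genes array.

    Assumed rule (not confirmed against ciara_python): keep a gene if it is
    expressed above `threshold` in between `n_cells` and `n_cells_high` cells.
    """
    # Number of cells in which each gene exceeds the expression threshold
    n_expressing = (expr > threshold).sum(axis=0)
    return (n_expressing >= n_cells) & (n_expressing <= n_cells_high)

# Toy cells x genes matrix: 30 cells, 3 genes
expr = np.zeros((30, 3))
expr[:2, 0] = 5.0    # gene 0: above threshold in 2 cells (too few)
expr[:5, 1] = 5.0    # gene 1: above threshold in 5 cells (kept)
expr[:25, 2] = 5.0   # gene 2: above threshold in 25 cells (too many)

mask = select_background_sketch(expr)
print(mask)  # [False  True False]
```

Under this assumed rule, only genes expressed in a small but non-trivial number of cells survive, which fits the goal of looking for markers of rare cell populations.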


### Step 2: Calculate entropy of mixing of background genes

The p-value for each background gene is added to the gene metadata in the AnnData object:

**Runtime (4-core MacBook Pro) by number of genes (no approximation):**
- 1 gene: **0.2s**
- 10 genes: **0.5s**
- 100 genes: **4s**
- 1000 genes: **10s**
- 5057 genes *(this dataset)*: **270s**

In [8]:
#human_gast_small = human_gast_norm[:,0:1000]
#human_gast_small = human_gast_small.copy()

t = time.perf_counter()

human_gast_norm.var["CIARA_p_value"] = ciara(human_gast_norm, knn_matrix, n_cores=4, p_value=0.001, odds_ratio=2, approximation=True, local_region=1)

elapsed_time = time.perf_counter() - t
print("\nExecution Time: " + str(np.round(elapsed_time, 2)) + "s")


---- Finished sucessfully! ----

Execution Time: 62.25s


## CIARA results

The AnnData object now holds the results in its gene metadata (`.var`):


In [9]:
human_gast_norm.var

| | CIARA_background | CIARA_p_value |
|---|---|---|
| A1BG | False | |
| A1BG.AS1 | False | |
| A1CF | True | 2.486022e-07 |
| A2M | False | |
| A2M.AS1 | False | |
| ... | ... | ... |
| ZXDC | True | 1.000000e+00 |
| ZYG11A | False | |
| ZYG11B | False | |
| ZYX | False | |
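
To pull candidate genes out of this table, one option is to filter on `CIARA_p_value`. The sketch below rebuilds a few rows of the table above as a stand-alone DataFrame. The cutoff of 0.001 mirrors the `p_value` argument used earlier; treating it as the significance cutoff here is an assumption.

```python
import numpy as np
import pandas as pd

# A few rows copied from the table above; NaN stands in for the empty p-value
# cells of genes that are not background genes.
var = pd.DataFrame(
    {
        "CIARA_background": [False, True, False, True],
        "CIARA_p_value": [np.nan, 2.486022e-07, np.nan, 1.0],
    },
    index=["A1BG", "A1CF", "A2M", "ZXDC"],
)

cutoff = 0.001  # assumed significance cutoff

# Comparisons against NaN evaluate to False, so genes without a p-value drop out.
significant = var[var["CIARA_p_value"] < cutoff].sort_values("CIARA_p_value")
print(significant.index.tolist())  # ['A1CF']
```

On the real object, the same filter would be applied to `human_gast_norm.var` instead of the toy `var` table.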
