# Tutorial for human gastrula dataset

## Entropy of mixing functions
Install the entropyofmixing package with pip (this can also be done from the command line):

```python
import sys
!{sys.executable} -m pip install --upgrade entropyofmixing
```

```
Collecting entropyofmixing
  Downloading entropyofmixing-0.9.4-py3-none-any.whl (4.0 kB)
Installing collected packages: entropyofmixing
  Attempting uninstall: entropyofmixing
    Found existing installation: entropyofmixing 0.9.3
    Uninstalling entropyofmixing-0.9.3:
      Successfully uninstalled entropyofmixing-0.9.3
Successfully installed entropyofmixing-0.9.4
```


Import the two main functions of the entropyofmixing package:

```python
from entropyofmixing import get_background_full, entropy_mixing
```

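## Import the human gastrula dataset and KNN matrix

Note that an AnnData object stores the count matrix as cells × genes, i.e. transposed relative to the Seurat pipeline in R (genes × cells). The CSV loaded below has genes in its rows, so it is transposed after loading to obtain cells as observations and genes as variables.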
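To make the orientation concrete, here is a minimal sketch on a toy table with made-up values and hypothetical gene and cell names: a Seurat-style genes × cells matrix becomes the AnnData-style cells × genes layout by a plain transpose, which is what the `.transpose()` call in the next cell does.

```python
import numpy as np
import pandas as pd

# Toy genes x cells matrix, laid out like a Seurat count matrix (hypothetical values)
seurat_like = pd.DataFrame(
    np.array([[0, 3, 1],
              [2, 0, 5]]),
    index=["GENE_A", "GENE_B"],           # rows: genes
    columns=["cell1", "cell2", "cell3"],  # columns: cells
)

# AnnData expects observations (cells) in rows and variables (genes) in columns
anndata_like = seurat_like.T
print(anndata_like.shape)  # (3, 2): 3 cells x 2 genes
```

The same reasoning explains the printed shape below: 1195 observations (cells) by 36570 variables (genes).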

```python
import scanpy as sc
import pandas as pd

# Load the normalized expression matrix (genes in rows) and transpose it to cells x genes
human_gast_norm = sc.read_csv('/root/host_home/Documents/EntropyOfMixing/Data/norm_elmir_5_30_transposed.csv', delimiter=',')
human_gast_norm = human_gast_norm.transpose()
print(human_gast_norm)

# Load the KNN matrix; the first column holds the row labels
knn_matrix = pd.read_csv('/root/host_home/Documents/EntropyOfMixing/Data/knn_matrix_elmir_5_30.csv', delimiter=',', index_col=0)
```


```
AnnData object with n_obs × n_vars = 1195 × 36570
```


## Entropy of mixing algorithm

**Step 1: Find background genes**

The background genes are computed and stored as a boolean column, `EOM_background`, in the gene metadata (`.var`) of the AnnData object:

```python
import time
import numpy as np

t = time.perf_counter()

# Flag background genes and store the boolean mask in the gene metadata
human_gast_norm.var["EOM_background"] = get_background_full(human_gast_norm, threshold=1, n_cells=10, n_cells_high=1000)

elapsed_time = time.perf_counter() - t
print("Execution time: " + str(np.round(elapsed_time, 2)) + "s")

# Optional: extract the names of the background genes
#background_genes = human_gast_norm.var_names[human_gast_norm.var["EOM_background"]]
```

```
Background genes: 8812
Execution time: 0.09s
```
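The tutorial does not describe how `get_background_full` selects genes. Purely to illustrate what parameters named `threshold`, `n_cells` and `n_cells_high` could control, the sketch below flags a gene when the number of cells expressing it at or above `threshold` lies between `n_cells` and `n_cells_high`. This rule and the helper name `background_mask` are hypothetical assumptions, not the package's implementation.

```python
import numpy as np

def background_mask(X, threshold=1.0, n_cells=10, n_cells_high=1000):
    """HYPOTHETICAL background filter; NOT the entropyofmixing implementation.

    X is a dense cells x genes matrix. A gene is flagged when the number of
    cells expressing it at >= threshold lies within [n_cells, n_cells_high].
    """
    n_expressing = (np.asarray(X) >= threshold).sum(axis=0)  # cells per gene
    return (n_expressing >= n_cells) & (n_expressing <= n_cells_high)

# Toy data: 3 cells x 3 genes
X = np.array([[0, 2, 1],
              [0, 3, 1],
              [1, 5, 0]])
mask = background_mask(X, threshold=1, n_cells=2, n_cells_high=2)
print(mask.tolist())  # [False, False, True]
```

Here the three genes are expressed in 1, 3 and 2 cells respectively, so only the third falls inside the window [2, 2]. For a sparse matrix, convert it with `.toarray()` before applying such a filter.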


**Step 2: Calculate entropy of mixing of background genes**

The entropy and the associated p-value for each background gene are added to the gene metadata of the AnnData object as `EOM_entropy` and `EOM_p_value`:

```python
# Optional: restrict the analysis to the first 1000 genes for a quick test run
#human_gast_small = human_gast_norm[:,0:1000]
#human_gast_small = human_gast_small.copy()

t = time.perf_counter()

# Compute the entropy of mixing and its p-value for the background genes
entropies, p_values = entropy_mixing(human_gast_norm, knn_matrix, n_cores=8, p_value=0.001, odds_ratio=2, approximation=True, local_region=2)

elapsed_time = time.perf_counter() - t
print("\nExecution Time: " + str(np.round(elapsed_time, 2)) + "s")

# Store the results in the gene metadata
human_gast_norm.var["EOM_entropy"] = entropies
human_gast_norm.var["EOM_p_value"] = p_values
```


```
---- Finished sucessfully! ----

Execution Time: 6.47s
```
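The tutorial does not spell out the score that `entropy_mixing` computes. As general background only, and not as the package's formula, a Shannon entropy over two states (for instance, expressing vs. non-expressing cells, which is an assumption here) is 0 when one state completely dominates and reaches its maximum of 1 bit for an even 50/50 mix. The hypothetical helper `binary_entropy` below computes it.

```python
import math

def binary_entropy(p):
    """Shannon entropy (base 2) of a two-state mixture with proportion p; lies in [0, 1]."""
    if p in (0.0, 1.0):
        return 0.0  # a pure state carries no mixing entropy
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(binary_entropy(0.0))  # 0.0 -> completely unmixed
print(binary_entropy(0.5))  # 1.0 -> perfectly mixed
```

The function is symmetric in p and 1 − p, so a 25/75 split and a 75/25 split have the same entropy.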


## Entropy of mixing results

The result is the same AnnData object, extended with the entropy of mixing results in its gene metadata. Genes that are not background genes have no entropy or p-value:


```python
human_gast_norm.var
```

| Gene | EOM_background | EOM_entropy | EOM_p_value |
|---|---|---|---|
| A1BG | False | | |
| A1BG.AS1 | False | | |
| A1CF | False | | |
| A2M | True | 0.0 | 1.570408e-07 |
| A2M.AS1 | False | | |
| ... | ... | ... | ... |
| ZXDC | False | | |
| ZYG11A | False | | |
| ZYG11B | True | 1.0 | 1.000000e+00 |
| ZYX | True | 1.0 | 1.000000e+00 |
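A typical next step is to pull out the background genes with small p-values. The sketch below uses a toy stand-in for `human_gast_norm.var` built from rows shown in the table above, with non-background values encoded as NaN; the cutoff `alpha = 0.001` is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for human_gast_norm.var, using rows from the output above
var = pd.DataFrame(
    {
        "EOM_background": [False, True, True, True],
        "EOM_entropy": [np.nan, 0.0, 1.0, 1.0],
        "EOM_p_value": [np.nan, 1.570408e-07, 1.0, 1.0],
    },
    index=["A1BG", "A2M", "ZYG11B", "ZYX"],
)

alpha = 0.001  # hypothetical significance cutoff
# NaN comparisons evaluate to False, so non-background genes drop out automatically
significant = var[var["EOM_background"] & (var["EOM_p_value"] < alpha)]
significant = significant.sort_values("EOM_entropy")  # lowest entropy first
print(significant.index.tolist())  # ['A2M']
```

With the real data, replace `var` with `human_gast_norm.var`.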
