# First step of the hematopoiesis application.

---------

## Table of contents

[**Data preprocessing**](#datapreprocess)
1. [Import the observations built from single-cell data.](#import)
2. [Standardize the observations' gene names.](#standardobs)
3. [Get interaction graph from DoRothEA database.](#dorothea)

[**Use of BoNesis for selection of components**](#bonesis)

---------

## Data preprocessing <a class="anchor" id="datapreprocess"></a>

### 1. Import the observations built from single-cell data. <a class="anchor" id="import"></a>

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_nesto = pd.read_csv("data/nestorowa_binarizedObservations.csv", sep=",", index_col=[0])

In [3]:
df_nesto

Unnamed: 0,1110008L16Rik,1110059E24Rik,1200007C13Rik,1300017J02Rik,1500005C15Rik,1600014C10Rik,1600020E01Rik,1700006J14Rik,1700017B05Rik,1700024P16Rik,...,Zscan21,Zscan22,Zscan29,Zswim3,Zswim4,Zufsp,Zxdb,Zxdc,Zyx,Zzz3
S1,0,0,,,,0,0,0,0,,...,0,0,0,,,0,,,1,0
S4,0,1,,,,1,0,0,0,,...,0,0,0,,,0,,,1,0
S0,0,1,,,,0,0,0,0,,...,0,0,0,,,0,,,1,0
S2,0,0,,,,1,1,0,0,,...,0,0,0,,,0,,,1,0
S5,0,0,,,,0,0,1,0,,...,0,0,0,,,0,,,1,0
S3,0,0,,,,0,0,0,1,,...,0,0,0,,,0,,,1,1


In [4]:
observations_from_singlecell_nestorowa = df_nesto.to_dict(orient="index")

In [5]:
observations_from_singlecell_nestorowa

{'S1': {'1110008L16Rik': 0,
  '1110059E24Rik': 0,
  '1200007C13Rik': nan,
  '1300017J02Rik': nan,
  '1500005C15Rik': nan,
  '1600014C10Rik': 0,
  '1600020E01Rik': 0,
  '1700006J14Rik': 0,
  '1700017B05Rik': 0,
  '1700024P16Rik': nan,
  '1700028E10Rik': nan,
  '1700029J07Rik': nan,
  '1700030K09Rik': 0,
  '1700065D16Rik': nan,
  '1700066M21Rik': nan,
  '1700096K18Rik': nan,
  '1810010H24Rik': nan,
  '1810011H11Rik': nan,
  '2010300C02Rik': nan,
  '2010315B03Rik': 0,
  '2210016F16Rik': 0,
  '2210016L21Rik': 0,
  '2310057M21Rik': 0,
  '2510009E07Rik': nan,
  '2610008E11Rik': nan,
  '2610021A01Rik': 0,
  '2610035D17Rik': 0,
  '2610044O15Rik8': 0,
  '2610301B20Rik': 0,
  '2610307P16Rik': 0,
  '2700029L08Rik': nan,
  '2810013P06Rik': 0,
  '2810021J22Rik': nan,
  '2810029C07Rik': nan,
  '2810414N06Rik': nan,
  '2810428J06Rik': nan,
  '2810468N07Rik': nan,
  '2810474O19Rik': 0,
  '2900005J15Rik': nan,
  '2900018N21Rik': nan,
  '2900092N22Rik': nan,
  '3110043O21Rik': nan,
  '3632451O06Rik': nan,
  ...

### 2. Standardize the observations' gene names. <a class="anchor" id="standardobs"></a>

We use the file "data/Mus_musculus.gene_info" downloaded on 2022-03-30 from NCBI (mouse gene info).
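
The internals of `gene_name_standardization` are not shown in this notebook. As a rough sketch of the idea only, the snippet below maps each symbol or synonym to an upper-cased official symbol, assuming the tab-separated NCBI `gene_info` layout with `Symbol` and `Synonyms` columns (synonyms separated by `|`, `-` when absent). The helper names `build_symbol_map` and `standardize` are hypothetical, and the two rows are a toy excerpt with placeholder `GeneID` values, not real file content.

```python
import csv
import io


def build_symbol_map(gene_info_lines):
    """Map upper-cased symbols and synonyms to the upper-cased official symbol."""
    reader = csv.DictReader(gene_info_lines, delimiter="\t")
    official = {}
    synonyms = {}
    for row in reader:
        symbol = row["Symbol"].upper()
        official[symbol] = symbol
        if row["Synonyms"] != "-":
            for alias in row["Synonyms"].split("|"):
                # Keep the first official symbol seen for an ambiguous alias.
                synonyms.setdefault(alias.upper(), symbol)
    # Official symbols take precedence over synonyms on conflict.
    return {**synonyms, **official}


def standardize(name, symbol_map):
    """Return the official symbol, or the upper-cased name if it is unknown."""
    return symbol_map.get(name.upper(), name.upper())


# Toy excerpt (placeholder GeneID values), in the assumed gene_info layout.
toy_gene_info = io.StringIO(
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\n"
    "10090\t0\tProrp\t-\t1110008L16Rik\n"
    "10090\t0\tZup1\t-\tZufsp\n"
)
symbol_map = build_symbol_map(toy_gene_info)
print(standardize("1110008L16Rik", symbol_map))
print(standardize("Zyx", symbol_map))
```

Names without a match are simply upper-cased in this sketch, which is consistent with the column names displayed in the standardized matrix, although the actual module may handle them differently.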

In [6]:
import gene_name_standardization as gns

In [7]:
standardized_observations = gns.observations_standardization(observations_from_singlecell_nestorowa, "data/Mus_musculus.gene_info")

In [8]:
# Visualisation of the matrix with standardized gene names:
df_standardized = pd.DataFrame.from_dict(standardized_observations, orient="index").fillna('')
df_standardized

Unnamed: 0,PRORP,1110059E24RIK,1200007C13RIK,1300017J02RIK,1500005C15RIK,1600014C10RIK,1600020E01RIK,1700006J14RIK,1700017B05RIK,FYB2,...,ZSCAN21,ZSCAN22,ZSCAN29,ZSWIM3,ZSWIM4,ZUP1,ZXDB,ZXDC,ZYX,ZZZ3
S1,0,0,,,,0,0,0,0,,...,0,0,0,,,0,,,1,0
S4,0,1,,,,1,0,0,0,,...,0,0,0,,,0,,,1,0
S0,0,1,,,,0,0,0,0,,...,0,0,0,,,0,,,1,0
S2,0,0,,,,1,1,0,0,,...,0,0,0,,,0,,,1,0
S5,0,0,,,,0,0,1,0,,...,0,0,0,,,0,,,1,0
S3,0,0,,,,0,0,0,1,,...,0,0,0,,,0,,,1,1


In [9]:
# Genes present in this single-cell dataset:
standardized_genenames_in_singlecell = set(df_standardized)

In [28]:
len(standardized_genenames_in_singlecell)

4768

### 3. Get interaction graph from DoRothEA database. <a class="anchor" id="dorothea"></a>

Interactions with confidence levels A, B and C were extracted from DoRothEA on 2021-07-07.
The extraction from DoRothEA and the standardization of the gene names were performed as described in the [tutorial](https://github.com/StephanieChevalier/notebooks_for_bonesis/blob/main/tutorials/Tutorial_for_interaction_graph_preprocessing.ipynb).
We only kept edges from transcription factors to transcription factors or to genes with expression values in the Nestorowa single-cell dataset (1001 nodes, 2777 edges).
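
The filtering rule can be illustrated with a minimal sketch. It assumes the edges are available as `(source, sign, target)` triples. The function name `filter_edges` and the toy gene names are hypothetical, and the actual preprocessing is the one described in the linked tutorial.

```python
def filter_edges(edges, transcription_factors, measured_genes):
    """Keep TF -> TF edges and TF -> measured-gene edges; return edges and nodes."""
    kept = [
        (source, sign, target)
        for source, sign, target in edges
        if source in transcription_factors
        and (target in transcription_factors or target in measured_genes)
    ]
    nodes = {node for source, _, target in kept for node in (source, target)}
    return kept, nodes


# Toy data with hypothetical names, for illustration only.
toy_edges = [
    ("TF1", 1, "TF2"),     # TF -> TF: kept
    ("TF1", -1, "GENE1"),  # TF -> measured gene: kept
    ("TF2", 1, "GENE2"),   # target neither a TF nor measured: dropped
]
kept, nodes = filter_edges(
    toy_edges,
    transcription_factors={"TF1", "TF2"},
    measured_genes={"GENE1"},
)
print(f"{len(nodes)} nodes, {len(kept)} edges")
```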

## Use of BoNesis for selection of components <a class="anchor" id="bonesis"></a>

In [None]:
import bonesis

In [None]:
standardized_dorothea = bonesis.InfluenceGraph.from_sif("data/standardized_dorothea_20221129.sif", maxclause=8, allow_skipping_nodes=True, canonic=False)

In [None]:
bo = bonesis.BoNesis(standardized_dorothea, standardized_observations)

In [None]:
print(f"domain: {len(standardized_dorothea.nodes())} nodes, {len(standardized_dorothea.edges())} edges")

In [None]:
standardized_observations

### Dynamics
<img src="img/trajectoire.png" alt="nestorowa stream trajectory" style="width:40%;"/>

#### Fixpoints

In [None]:
s2 = bo.fixed(~bo.obs("S2"))
s4 = bo.fixed(~bo.obs("S4"))
s5 = bo.fixed(~bo.obs("S5"))
s2 != s4
s5 != s4
s2 != s5;
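
To make the notion of a fixpoint concrete, here is a small sketch independent of BoNesis. It enumerates the fixed points of a hypothetical two-gene toggle switch, where a configuration is fixed when every node's update rule returns the node's current value. The network and the helper `is_fixed_point` are illustrative assumptions, not part of the hematopoiesis model.

```python
from itertools import product

# Hypothetical toy network: two genes that repress each other.
toy_rules = {
    "a": lambda state: int(not state["b"]),
    "b": lambda state: int(not state["a"]),
}


def is_fixed_point(rules, state):
    """A state is fixed when no node changes value under its update rule."""
    return all(rule(state) == state[node] for node, rule in rules.items())


nodes = sorted(toy_rules)
fixed_points = [
    dict(zip(nodes, values))
    for values in product([0, 1], repeat=len(nodes))
    if is_fixed_point(toy_rules, dict(zip(nodes, values)))
]
print(fixed_points)
```

The toy switch has two distinct stable states. The constraints above appear to play a similar role by requiring three pairwise different fixed configurations, each matching one of the observations S2, S4 and S5.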

#### Positive reachability

In [None]:
~bo.obs('S1') >= ~bo.obs('S0') >= s2
~bo.obs('S0') >= ~bo.obs('S3') >= s4
~bo.obs('S3') >= s5;

#### Negative reachability

In [None]:
~bo.obs("S3") / s2;
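
As an illustration of positive and negative reachability, the sketch below runs a breadth-first search over the state graph of a hypothetical two-gene toggle switch. It assumes a fully asynchronous update (one node changes per step), which is a simplification chosen for the example and may differ from the update semantics BoNesis actually uses. All names are illustrative.

```python
from collections import deque

NODES = ("a", "b")
# Hypothetical toy network: two genes that repress each other.
RULES = {
    "a": lambda s: int(not s["b"]),
    "b": lambda s: int(not s["a"]),
}


def async_successors(state):
    """Yield the states obtained by updating exactly one node."""
    as_dict = dict(zip(NODES, state))
    for i, node in enumerate(NODES):
        new_value = RULES[node](as_dict)
        if new_value != state[i]:
            yield state[:i] + (new_value,) + state[i + 1:]


def is_reachable(start, goal):
    """Breadth-first search from start; True if goal is encountered."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for nxt in async_successors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(is_reachable((0, 0), (1, 0)))  # positive reachability in the toy model
print(is_reachable((1, 0), (0, 1)))  # no path between the two stable states
```

In the toy model the undecided state can reach either stable state, while one stable state cannot reach the other. The notebook's constraints express the same two kinds of requirement on the inferred models.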

### Optimization & view

In [None]:
bo.maximize_nodes()
bo.maximize_strong_constants()

In [None]:
view = bonesis.NonStrongConstantNodesView(bo, mode="opt")

In [None]:
view.standalone(output_filename="maxnodes_maxstrongconstant.sh")