# Overview

This notebook demonstrates how to scan for TF binding motifs. The base GRN will be generated by combining the ATAC-seq peaks and motif information.

### Notebook file
The notebook file is available on CellOracle's GitHub page.
https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/02_atac_peaks_to_TFinfo_with_celloracle_20200801.ipynb


# 0. Import libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


import seaborn as sns

import os, sys, shutil, importlib, glob
from tqdm.notebook import tqdm

In [4]:
import celloracle as co
from celloracle import motif_analysis as ma
from celloracle.utility import save_as_pickled_object
co.__version__


'0.14.0'

In [5]:
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

plt.rcParams['figure.figsize'] = (15,7)
plt.rcParams["savefig.dpi"] = 600

# 1. Reference genome data preparation
## 1.1. Check reference genome installation

Before starting the analysis, make sure the reference genome data is installed with `genomepy`. If it is not, install the correct reference genome using the instructions below.

By default, `genomepy` installs reference genome data under the home directory. If you have installed, or want to install, the reference genome in another location, specify it with the `genomes_dir` argument.

In [8]:
# PLEASE make sure reference genome is correct.
ref_genome = "mm10"

genome_installation = ma.is_genome_installed(ref_genome=ref_genome,
                                             genomes_dir=None)
print(ref_genome, "installation: ", genome_installation)

mm10 installation:  True


## 1.2. Install reference genome (if refgenome is not installed)

Before installing the reference genome, please check that it is in CellOracle's list of supported reference genomes. 
You can view the supported reference genomes with `ma.SUPPORTED_REF_GENOME`.

If your reference genome is not in the list, please send us a request through the CellOracle GitHub issue page.

In [6]:
ma.SUPPORTED_REF_GENOME

Unnamed: 0,species,ref_genome,provider
0,Human,hg38,UCSC
1,Human,hg19,UCSC
2,Mouse,mm39,UCSC
3,Mouse,mm10,UCSC
4,Mouse,mm9,UCSC
5,S.cerevisiae,sacCer2,UCSC
6,S.cerevisiae,sacCer3,UCSC
7,Zebrafish,danRer7,UCSC
8,Zebrafish,danRer10,UCSC
9,Zebrafish,danRer11,UCSC
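
As a minimal sketch of the check described above, the lookup can be done with ordinary pandas filtering. The `supported` frame below is a hand-typed stand-in for a few rows of `ma.SUPPORTED_REF_GENOME` (values copied from the output above), and `lookup_provider` is a hypothetical helper, not part of CellOracle.

```python
import pandas as pd

# Stand-in for a few rows of ma.SUPPORTED_REF_GENOME (values copied from the table above).
supported = pd.DataFrame(
    {
        "species": ["Human", "Mouse", "Mouse", "Zebrafish"],
        "ref_genome": ["hg38", "mm39", "mm10", "danRer11"],
        "provider": ["UCSC", "UCSC", "UCSC", "UCSC"],
    }
)


def lookup_provider(table, ref_genome):
    """Return the provider for ref_genome, or None if the genome is not listed."""
    hits = table.loc[table["ref_genome"] == ref_genome, "provider"]
    return hits.iloc[0] if len(hits) else None


provider_mm10 = lookup_provider(supported, "mm10")
provider_unknown = lookup_provider(supported, "rn6")
print(provider_mm10, provider_unknown)
```

A `None` result here corresponds to the case where a request on the GitHub issue page would be needed.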


In [9]:
if not genome_installation:
    import genomepy
    genomepy.install_genome(name=ref_genome, provider="UCSC", genomes_dir=None)
else:
    print(ref_genome, "is installed.")

mm10 is installed.


In [10]:
os.getcwd()

'/run/user/1001/gvfs/smb-share:server=tierra,share=sc/LAB_RB/LAB/Alvaro/Bioinformatics/Analysis/scAGM_Embryos/Notebooks/CellOracle/scATAC_data'

In [9]:
ATAC_peaks = pd.read_csv("../../../celloracle/ATAC_data/GSE174591_E9_Ctrl_ATAC_peaks.bed", sep="\t", header=None)
ATAC_peaks

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,chr1,3361054,3361163,E9_Ctrl_peak_1,21,.,3.04494,4.04380,2.11738,104
1,chr1,3670577,3672386,E9_Ctrl_peak_2,1972,.,13.83824,200.71835,197.29630,1195
2,chr1,3994834,3995006,E9_Ctrl_peak_3,155,.,7.47394,17.86846,15.56998,116
3,chr1,4142468,4142777,E9_Ctrl_peak_4,107,.,6.08987,12.95787,10.74613,212
4,chr1,4228321,4228579,E9_Ctrl_peak_5,34,.,3.59856,5.41878,3.42216,40
...,...,...,...,...,...,...,...,...,...,...
99293,chrX,169936599,169937653,E9_Ctrl_peak_99387,236,.,8.22457,26.06754,23.65968,768
99294,chrY,809067,809441,E9_Ctrl_peak_99392,635,.,18.26962,66.32692,63.58292,111
99295,chrY,1010223,1010644,E9_Ctrl_peak_99393,207,.,8.85800,23.17297,20.79975,102
99296,chrY,1163164,1163328,E9_Ctrl_peak_99394,64,.,4.70581,8.52811,6.42195,122
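
The ten unnamed columns above match the ENCODE narrowPeak layout that peak callers such as MACS2 write. That reading is an assumption, since the file itself carries no header. Under that assumption, the sketch below names the columns (the snake_case names are my own paraphrase of the format's fields) and builds `chr_start_end` identifiers in the style used by the processed peak file later in this notebook.

```python
import io
import pandas as pd

# Column names paraphrasing the ENCODE narrowPeak fields (assumed to apply to this file).
NARROWPEAK_COLUMNS = [
    "chrom", "start", "end", "name", "score",
    "strand", "signal_value", "p_value", "q_value", "summit_offset",
]

# Two rows copied from the output above, tab-separated as in the BED file.
demo_bed = io.StringIO(
    "chr1\t3361054\t3361163\tE9_Ctrl_peak_1\t21\t.\t3.04494\t4.04380\t2.11738\t104\n"
    "chr1\t3670577\t3672386\tE9_Ctrl_peak_2\t1972\t.\t13.83824\t200.71835\t197.29630\t1195\n"
)

demo_peaks = pd.read_csv(demo_bed, sep="\t", header=None, names=NARROWPEAK_COLUMNS)

# Build peak identifiers in the chr_start_end style used by the processed peak file.
demo_peaks["peak_id"] = (
    demo_peaks["chrom"] + "_"
    + demo_peaks["start"].astype(str) + "_"
    + demo_peaks["end"].astype(str)
)
print(demo_peaks[["peak_id", "score", "summit_offset"]])
```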


In [13]:
ATAC_bulk = pd.read_csv("../../../celloracle/ATAC_data/GSE174591_E9_Ctrl_ATAC.bw", sep="\t", header=None, encoding='latin1')
ATAC_bulk

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1: invalid start byte
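
The error above is what one would expect if the `.bw` file is a binary bigWig: the bigWig magic number is `0x888FFC26`, and in little-endian byte order its second byte is `0xfc`, matching the byte reported at position 1. Below is a small sketch that checks the first four bytes before attempting a text parse. `looks_like_bigwig` is a hypothetical helper, and the demonstration writes its own throwaway files rather than touching the real data.

```python
import os
import struct
import tempfile

BIGWIG_MAGIC = 0x888FFC26  # magic number at the start of a bigWig file


def looks_like_bigwig(path):
    """Return True if the first four bytes match the bigWig magic number."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4:
        return False
    # The magic number may be stored in either byte order.
    little = struct.unpack("<I", header)[0]
    big = struct.unpack(">I", header)[0]
    return BIGWIG_MAGIC in (little, big)


# Demonstration on throwaway files (not the real GSE174591 data).
with tempfile.TemporaryDirectory() as tmp:
    fake_bw = os.path.join(tmp, "fake.bw")
    with open(fake_bw, "wb") as handle:
        handle.write(struct.pack("<I", BIGWIG_MAGIC) + b"\x00" * 60)

    fake_bed = os.path.join(tmp, "fake.bed")
    with open(fake_bed, "w") as handle:
        handle.write("chr1\t3361054\t3361163\n")

    bw_detected = looks_like_bigwig(fake_bw)
    bed_detected = looks_like_bigwig(fake_bed)

print(bw_detected, bed_detected)
```

A bigWig signal track would normally be opened with a dedicated reader such as pyBigWig rather than `pd.read_csv`.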

# 2. Load data


## 2.0. Input data format

In this notebook, we explain how to make a base GRN.

Please see the previous steps for details on preprocessing data for a base GRN:

https://morris-lab.github.io/CellOracle.documentation/tutorials/base_grn.html#step1-scatac-seq-analysis-with-cicero



The scATAC-seq data needs to be converted into a csv file with three columns:
- The first column is index.
- The second column is peak_id.
- The third column is gene_short_name.
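
A file in this three-column layout can be mocked up in memory and loaded as follows. This is only an illustration: `demo_csv` is a hand-typed stand-in, with rows copied from the peak table shown later in this notebook.

```python
import io
import pandas as pd

# A three-column CSV: unnamed index column, peak_id, gene_short_name.
demo_csv = io.StringIO(
    ",peak_id,gene_short_name\n"
    "0,chr17_5491289_5491763,Zdhhc14\n"
    "1,chr17_5492301_5493509,Zdhhc14\n"
    "2,chr1_177622779_177622880,Opn3\n"
)

# index_col=0 turns the first column into the DataFrame index.
demo_peaks = pd.read_csv(demo_csv, index_col=0)
print(demo_peaks)
```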


A correctly formatted file will look like this:



<img src="https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/processed_peak_csv.png">


We will load the .csv file as a `pandas.DataFrame` using `pd.read_csv()`.


<img src="https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/processed_peak_df.png">




In [10]:
os.getcwd()

'/run/user/1001/gvfs/smb-share:server=tierra,share=sc/LAB_RB/LAB/Alvaro/Bioinformatics/Analysis/scAGM_Embryos/Notebooks/CellOracle'

In [21]:
file_path =  "../../celloracle/ATAC_data/"

peaks = pd.read_csv("".join([file_path, "processed_peak_file_GSE174591_E9_Ctrl_ATAC.csv"]), sep=",", header=0, index_col=0)

peaks

Unnamed: 0,peak_id,gene_short_name
0,chr17_5491289_5491763,Zdhhc14
1,chr17_5492301_5493509,Zdhhc14
2,chr1_177622779_177622880,Opn3
3,chr1_177623345_177624034,Opn3
4,chr7_108986028_108986272,Inppl1
...,...,...
3227,chr2_179711924_179712067,4921531C22Rik
3228,chr16_35363369_35364173,Sec22a
3229,chr16_55973316_55974920,Zbtb11os1
3230,chr18_34380026_34380349,Apc


### Load Zhu et al Dataset

In [14]:
file_path =  "../../../celloracle/ATAC_Zhu/"

peaks = pd.read_csv("".join([file_path, "processed_peak_file_Zhu_et_al.csv"]), sep=",", header=0, index_col=0)

peaks

Unnamed: 0,peak_id,gene_short_name
0,chr10_100015425_100016651,Kitl
1,chr10_100486568_100487889,Tmtc3
2,chr10_100588506_100589498,4930430F08Rik
3,chr10_100741132_100741585,Gm35722
4,chr10_100741989_100742521,Gm35722
...,...,...
18051,chrX_99975073_99976635,Eda
18052,chrY_1010050_1010791,Eif2s3y
18053,chrY_1244887_1246076,Uty
18054,chrY_1285830_1286760,Ddx3y


## 2.0. Download demo data

You can download the demo file by running the following command.

Note: If the file download fails, please manually download and unzip the data.

https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/processed_peak_file.csv


In [7]:
# Download file. 
!wget https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/processed_peak_file.csv
    
# If you are using macOS, please try the following command.
#!curl -O https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/processed_peak_file.csv


--2023-07-08 17:00:17--  https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/processed_peak_file.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 569448 (556K) [text/plain]
Saving to: ‘processed_peak_file.csv’


2023-07-08 17:00:17 (38.0 MB/s) - ‘processed_peak_file.csv’ saved [569448/569448]



## 2.1. Load processed peak data 

In [23]:
# Load annotated peak data.
peaks = pd.read_csv("processed_peak_file.csv", index_col=0)
peaks.head()

FileNotFoundError: [Errno 2] No such file or directory: 'processed_peak_file.csv'

## 2.1. Check data format

The function below checks the peak data format, including chromosome names and peak lengths.

In [15]:
peaks = ma.check_peak_format(peaks, ref_genome, genomes_dir=None)

Peaks before filtering:  18056
Peaks with invalid chr_name:  0
Peaks with invalid length:  0
Peaks after filtering:  18056
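
To make the two checks concrete, here is a minimal sketch of what validating a `chr_start_end` peak identifier can involve. It is not CellOracle's implementation: `parse_peak_id`, `is_valid_peak`, the toy chromosome set, and the minimum length of 5 bp are all assumptions chosen for illustration.

```python
def parse_peak_id(peak_id):
    """Split 'chr_start_end' into (chrom, start, end).

    rsplit is used so chromosome names that contain underscores stay intact.
    """
    chrom, start, end = peak_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def is_valid_peak(peak_id, known_chroms, min_length=5):
    """Check the chromosome name and the peak length (both rules are illustrative)."""
    try:
        chrom, start, end = parse_peak_id(peak_id)
    except ValueError:
        # Raised when the identifier has too few fields or non-numeric coordinates.
        return False
    return chrom in known_chroms and (end - start) >= min_length


# Toy chromosome set; a real check would use the names in the reference genome.
known = {"chr10", "chrX", "chrY"}

print(parse_peak_id("chr10_100015425_100016651"))
print(is_valid_peak("chr10_100015425_100016651", known))
print(is_valid_peak("chrZ_10_200", known))
print(is_valid_peak("chr10_100_102", known))
```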


## 2.2. [Optional step] Load custom motifs

You can choose to use either a custom TF binding reference or CellOracle’s default motifs during the motif analysis. If you would like to use our default motifs, you can continue to the next step without loading any additional data.


If you would like to use a custom motif dataset, please choose one of the following options.

- Motifs provided by gimmemotifs
 >Gimmemotifs is a Python package for motif analysis. It provides many motif datasets. https://gimmemotifs.readthedocs.io/en/master/overview.html#motif-databases
 > 
 > Please use this notebook to learn how to load motif data from gimmemotifs database. 
 > https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/motif_data_preparation/01_How_to_load_gimmemotifs_motif_data.ipynb

- Custom motifs provided by CellOracle.
 
 >CellOracle also provides many motif datasets generated from CisBP. http://cisbp.ccbr.utoronto.ca/
 >
 >Please look at this notebook to learn how to load the CisBP motifs. https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/motif_data_preparation/02_How_to_load_CisBPv2_motif_data.ipynb


- Make your own custom motif data.
 >You can create custom motif data by yourself.
 >
 >Please look at this notebook to learn how to create your custom motif dataset. https://github.com/morris-lab/CellOracle/blob/master/docs/notebooks/02_motif_scan/motif_data_preparation/03_How_to_make_custom_motif.ipynb


# 3. Instantiate TFinfo object and search for TF binding motifs
The motif analysis module has a custom class, `TFinfo`. 
The `TFinfo` object executes the steps below.

- Converts the peak data into DNA sequences.
- Scans the DNA sequences for TF binding motifs.
- Post-processes the motif scan results.
- Converts the data into an appropriate format. The result can be exported as a base GRN, formatted as either a Python dictionary or a pandas dataframe. This output is the final base GRN used in the GRN model construction step.

## 3.1. Instantiate TFinfo object
If your reference genome files are installed in a non-default location, please specify the location using `genomes_dir`.

In [16]:
# Instantiate TFinfo object
tfi = ma.TFinfo(peak_data_frame=peaks, 
                ref_genome=ref_genome,
                genomes_dir=None) 

'mm10'

## 3.2. Motif scan


You can specify the TF binding motif data as follows. 

`tfi.scan(motifs=motifs)`

If you do not specify the motifs or set motifs to `None`, the default motifs will be loaded automatically.

- For mouse and human, "gimme.vertebrate.v5.0." will be used as the default motifs. 

- For other species, the species-specific TF binding motif data extracted from CisBP ver2.0 will be used.



**If your jupyter notebook kernel is killed during the motif scan process, please see the link below.**

https://morris-lab.github.io/CellOracle.documentation/installation/python_step_by_step_installation.html#install-gimmemotifs-with-conda

In [17]:
%%time

file_path =  "../../../celloracle/ATAC_Zhu/"


# Scan motifs. !!CAUTION!! This step may take several hours if you have many peaks!
tfi.scan(fpr=0.02, 
         motifs=None,  # If you enter None, default motifs will be loaded.
         verbose=True)

# Save tfinfo object
# tfi.to_hdf5(file_path="".join([file_path, "Zhu_et_al_ATAC.celloracle.tfinfo"]))

No motif data entered. Loading default motifs for your species ...
 Default motif for vertebrate: gimme.vertebrate.v5.0. 
 For more information, please see https://gimmemotifs.readthedocs.io/en/master/overview.html 

Initiating scanner... 

Calculating FPR-based threshold. This step may take substantial time when you load a new ref-genome. It will be done quicker on the second time. 

Motif scan started .. It may take long time.



scanning:   0%|          | 0/16365 [00:00<?, ? sequences/s]

OSError: [Errno 95] Driver write request failed (file write failed: time = Thu Feb 15 11:47:20 2024
, filename = '../../../celloracle/ATAC_Zhu/Zhu_et_al_ATAC.celloracle.tfinfo', file descriptor = 76, errno = 95, error message = 'Operation not supported', buf = 0x2bd8f5920, total write size = 1012, bytes this sub-write = 1012, bytes actually written = 18446744073709551615, offset = 0)


RuntimeError: Can't decrement id ref count (file write failed: time = Thu Feb 15 11:47:20 2024
, filename = '../../../celloracle/ATAC_Zhu/Zhu_et_al_ATAC.celloracle.tfinfo', file descriptor = 76, errno = 95, error message = 'Operation not supported', buf = 0x86487e0, total write size = 20, bytes this sub-write = 20, bytes actually written = 18446744073709551615, offset = 0)


In [11]:
os.getcwd()

'/run/user/1001/gvfs/smb-share:server=tierra,share=sc/LAB_RB/LAB/Alvaro/Bioinformatics/Analysis/scAGM_Embryos/Notebooks/CellOracle/scATAC_data'

In [26]:
# file_path =  "../../../celloracle/ATAC_Zhu/"

file_path = "/home/aregano/PhD/"

# Save tfinfo object
tfi.to_hdf5(file_path="".join([file_path, "Zhu_et_al_ATAC.celloracle.tfinfo"]))

# Then move the saved file to the Tierra share (writing directly to the mounted share failed above)
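
The `Operation not supported` errors earlier in this section occurred while writing the HDF5 file to a path on the mounted SMB share, and the cell above sidesteps that by saving to a local directory first. The same pattern can be sketched generically with the standard library. `save_locally_then_move` and the `write_fn` callback are hypothetical names, and the demonstration uses temporary directories in place of the real local and shared folders.

```python
import os
import shutil
import tempfile


def save_locally_then_move(write_fn, final_path):
    """Write a file in a local temporary directory, then move it to final_path.

    write_fn is any callable that takes a file path and writes the file there.
    """
    with tempfile.TemporaryDirectory() as local_dir:
        local_path = os.path.join(local_dir, os.path.basename(final_path))
        write_fn(local_path)
        # shutil.move falls back to copy-and-delete across filesystems.
        shutil.move(local_path, final_path)
    return final_path


def write_demo(path):
    # Stand-in for a call such as tfi.to_hdf5(file_path=path).
    with open(path, "wb") as handle:
        handle.write(b"demo payload")


# Temporary directory standing in for the shared folder.
with tempfile.TemporaryDirectory() as share_dir:
    target = os.path.join(share_dir, "demo.celloracle.tfinfo")
    saved = save_locally_then_move(write_demo, target)
    moved_ok = os.path.exists(saved)
    with open(saved, "rb") as handle:
        payload = handle.read()

print(moved_ok, payload)
```

The point of the design is that the writer only ever sees a local path, so any random-access writes happen on a local filesystem and the share receives a single finished file.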

In [14]:
file_path =  "../../../celloracle/ATAC_Zhu/"

# Load tfinfo object

tfi = co.load_hdf5(file_path="".join([file_path, "Zhu_et_al_ATAC.celloracle.tfinfo"]))

In [15]:
# Check motif scan results
tfi.scanned_df.head()

Unnamed: 0,seqname,motif_id,factors_direct,factors_indirect,score,pos,strand
0,chr10_100015425_100016651,GM.5.0.Homeodomain.0001,TGIF1,"ENSG00000234254, TGIF1",10.288487,869,1
1,chr10_100015425_100016651,GM.5.0.Mixed.0001,,"SRF, EGR1",7.904716,347,1
2,chr10_100015425_100016651,GM.5.0.Mixed.0001,,"SRF, EGR1",7.301316,777,-1
3,chr10_100015425_100016651,GM.5.0.Mixed.0001,,"SRF, EGR1",7.257605,677,-1
4,chr10_100015425_100016651,GM.5.0.Mixed.0001,,"SRF, EGR1",6.936637,1077,1


We have the score for each sequence and motif_id pair.
In the next step we will filter the motifs with low scores.

# 4. Filtering motifs

In [16]:
# Reset filtering 
tfi.reset_filtering()

# Do filtering
tfi.filter_motifs_by_score(threshold=10)

# Format post-filtering results.
tfi.make_TFinfo_dataframe_and_dictionary(verbose=True)



Filtering finished: 11017764 -> 2124202
1. Converting scanned results into one-hot encoded dataframe.


  0%|          | 0/16346 [00:00<?, ?it/s]

2. Converting results into dictionaries.


  0%|          | 0/16077 [00:00<?, ?it/s]

  0%|          | 0/1094 [00:00<?, ?it/s]
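
Conceptually, the score filter keeps only motif hits whose score clears the threshold. Below is a toy sketch with pandas, using the five rows shown in the `tfi.scanned_df.head()` preview above. Whether CellOracle treats the boundary as inclusive is not stated here, so the `>=` is an assumption, and `toy_scan` is an illustrative stand-in for the real table.

```python
import pandas as pd

# Five hits copied from the scanned_df preview above (subset of columns).
toy_scan = pd.DataFrame(
    {
        "seqname": ["chr10_100015425_100016651"] * 5,
        "motif_id": ["GM.5.0.Homeodomain.0001"] + ["GM.5.0.Mixed.0001"] * 4,
        "score": [10.288487, 7.904716, 7.301316, 7.257605, 6.936637],
    }
)

threshold = 10
# Assumed rule: keep hits scoring at or above the threshold.
kept = toy_scan[toy_scan["score"] >= threshold]

print(f"Filtering: {len(toy_scan)} -> {len(kept)}")
print(kept)
```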

# 5. Get final base GRN

## 5.1. Get results as a dataframe

In [17]:
df = tfi.to_dataframe()
df.head()

Unnamed: 0,peak_id,gene_short_name,9430076c15rik,Ac002126.6,Ac012531.1,Ac226150.2,Afp,Ahr,Ahrr,Aire,...,Znf784,Znf8,Znf816,Znf85,Zscan10,Zscan16,Zscan22,Zscan26,Zscan31,Zscan4
0,chr10_100015425_100016651,Kitl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,chr10_100486568_100487889,Tmtc3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,chr10_100588506_100589498,4930430F08Rik,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,chr10_100741132_100741585,Gm35722,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,chr10_100741989_100742521,Gm35722,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
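
Each row of this dataframe is a peak with its annotated gene, and each remaining column is a TF, with 1.0 marking a retained motif hit for that TF in that peak. Below is a hedged sketch of reading candidate regulators off such a table. `toy_grn` is a hand-built stand-in with three of the TF columns from the output above, and `candidate_tfs` is a hypothetical helper, not a CellOracle function.

```python
import pandas as pd

# Two rows and three TF columns copied from the dataframe preview above.
toy_grn = pd.DataFrame(
    {
        "peak_id": ["chr10_100015425_100016651", "chr10_100486568_100487889"],
        "gene_short_name": ["Kitl", "Tmtc3"],
        "Afp": [0.0, 0.0],
        "Znf784": [0.0, 1.0],
        "Zscan22": [0.0, 1.0],
    }
)


def candidate_tfs(grn, gene):
    """List TF columns with at least one motif hit in peaks annotated to gene."""
    tf_columns = grn.columns.drop(["peak_id", "gene_short_name"])
    rows = grn.loc[grn["gene_short_name"] == gene, tf_columns]
    has_hit = rows.sum(axis=0) > 0
    return sorted(has_hit[has_hit].index)


tmtc3_tfs = candidate_tfs(toy_grn, "Tmtc3")
kitl_tfs = candidate_tfs(toy_grn, "Kitl")
print(tmtc3_tfs, kitl_tfs)
```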


# 6. Save results
Save the results. We will use this information when constructing the GRN models later.

In [18]:
# Save result as a dataframe


# file_path =  "../../celloracle/ATAC_data/"

# df = tfi.to_dataframe()
# df.to_parquet("".join([file_path, "GSE174591_E9_Ctrl_ATAC_GRN_dataframe.parquet"]))


# Save Zhu et al results

file_path =  "../../../celloracle/ATAC_Zhu/"

df = tfi.to_dataframe()
df.to_parquet("".join([file_path, "Zhu_et_al_ATAC_GRN_dataframe.parquet"]))


**We will use this base GRN data in the GRN construction section.**

https://morris-lab.github.io/CellOracle.documentation/tutorials/networkanalysis.html