## Subset Oracle objects for in silico KO perturbation

- last updated: 04/17/2024
- author: Yang-Joon Kim


### Goals
- Load the Oracle objects and subset them so that the in silico KO simulation runs on a subset of the cell population (for example, the NMP trajectories, as in Zebrahub). This focuses the analysis on genes whose KO effects change over developmental stages. (Followed by examination of the GRNs over timepoints.)
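
The cell-selection step behind this goal can be sketched as follows. This is a minimal illustration in plain Python, not the notebook's actual subsetting code: the annotation labels and the set of trajectory clusters are hypothetical placeholders, and in practice the labels would come from the `global_annotation` column that the Oracle metadata reports as `cluster_name`.

```python
from collections import Counter

# Hypothetical per-cell annotations (placeholders, not real Zebrahub labels).
# In the notebook these would be read from the `global_annotation` column.
annotations = ["NMPs", "PSM", "Somites", "Neural", "NMPs", "Epidermis", "PSM"]

# Hypothetical set of clusters assumed to make up the trajectory of interest.
trajectory_clusters = {"NMPs", "PSM", "Somites"}


def select_cells(labels, keep):
    """Return the positions of cells whose label is in `keep`."""
    return [i for i, label in enumerate(labels) if label in keep]


selected = select_cells(annotations, trajectory_clusters)

# Count the retained cells per cluster as a sanity check on the subset.
subset_counts = Counter(annotations[i] for i in selected)

print(f"kept {len(selected)} of {len(annotations)} cells")
print(dict(subset_counts))
```

Checking the per-cluster counts after selection is a cheap way to confirm that no cluster of the trajectory was dropped by a misspelled label.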

In [1]:
import copy
import glob
import time
import os
import shutil
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
from tqdm.auto import tqdm

## 0.2. Import our library

In [2]:
import celloracle as co
from celloracle.applications import Pseudotime_calculator
co.__version__


'0.14.0'

## 0.3. Plotting parameter setting

In [3]:
#plt.rcParams["font.family"] = "arial"
plt.rcParams["figure.figsize"] = [5,5]
%config InlineBackend.figure_format = 'retina'
plt.rcParams["savefig.dpi"] = 300

%matplotlib inline

# 1. Load data

- If you have an `Oracle` object, run **1.1. [Option1] Load oracle data.**

- If you have not made an `Oracle` object yet and want to calculate pseudotime using an `Anndata` object, run **1.2. [Option2] Load anndata.**

In this notebook, we will load a demo `Oracle` object and add pseudotime information to it.

## 1.1. [Option1] Load oracle data

In [7]:
# # Load demo scRNA-seq data.
# oracle = co.data.load_tutorial_oracle_object()

# # Instantiate pseudotime object using oracle object.
# pt = Pseudotime_calculator(oracle_object=oracle)

Data not found in the local folder. Loading data from github. Data will be saved at /home/yang-joon.kim/celloracle_data/tutorial_data



In [4]:
# Load the TDR119 oracle data (the file path below points to TDR119)
oracle = co.load_hdf5("/hpc/projects/data.science/yangjoon.kim/zebrahub_multiome/data/processed_data/TDR119_cicero_output/06_TDR119.celloracle.oracle")
oracle

Oracle object

Meta data
    celloracle version used for instantiation: 0.14.0
    n_cells: 13022
    n_genes: 3000
    cluster_name: global_annotation
    dimensional_reduction_name: X_umap.atac
    n_target_genes_in_TFdict: 12674 genes
    n_regulatory_in_TFdict: 872 genes
    n_regulatory_in_both_TFdict_and_scRNA-seq: 318 genes
    n_target_genes_both_TFdict_and_scRNA-seq: 1637 genes
    k_for_knn_imputation: 325
Status
    Gene expression matrix: Ready
    BaseGRN: Ready
    PCA calculation: Done
    Knn imputation: Done
    GRN calculation for simulation: Not finished
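
The loading cell above uses a hard-coded path on a cluster file system, so it can help to check the path before loading. The sketch below is one way to do that, under stated assumptions: `load_oracle_checked` is a hypothetical helper, not part of celloracle, and the `.celloracle.oracle` extension check is inferred from the file name used in this notebook. The loader is passed in as an argument so that the example runs without celloracle installed.

```python
import os
import tempfile


def load_oracle_checked(path, loader):
    """Call `loader(path)` only if `path` looks like an existing Oracle file.

    `loader` is any callable that takes a file path, e.g. `co.load_hdf5`.
    """
    # Assumption: Oracle files end with ".celloracle.oracle", as in this notebook.
    if not path.endswith(".celloracle.oracle"):
        raise ValueError(f"unexpected file extension: {path}")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Oracle file not found: {path}")
    return loader(path)


# Demonstration with a stand-in loader and an empty temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.celloracle.oracle")
    open(demo_path, "w").close()
    result = load_oracle_checked(demo_path, loader=lambda p: "loaded")

print(result)
```

In the notebook, the call would look like `oracle = load_oracle_checked(path, co.load_hdf5)`, where `path` is the file path used in the loading cell.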