# Setup

First, we need to initialize the object from a dataset bundled with the Pertpy package and, if the data aren't already preprocessed and/or clustered, perform those steps.

## Imports & Options

In [19]:
%load_ext autoreload
%autoreload 2

import crispr as cr 
from crispr import Crispr
import pertpy as pt
import scanpy as sc
import pandas as pd
import numpy as np

# Options
# List the public analysis functions in `cr.ax` (skipping names containing "__")
print("\nAnalysis Functions\n\t" + "\n\t".join(
    [x for x in dir(cr.ax) if "__" not in x]))
file = "perturb-seq"
pd.options.display.max_columns = 100

#  Set Arguments
kwargs_init = dict(assay="rna", assay_protein="adt",
                   col_sample_id="replicate",
                   col_gene_symbols="gene_symbol",  
                   col_cell_type="leiden", 
                   col_perturbed="perturbation", 
                   col_guide_rna="NT", 
                   col_num_umis=None,
                   kws_process_guide_rna=False,
                   col_condition="gene_target", 
                   key_control="NT", 
                   key_treatment="Perturbed")
target_gene_idents = ["NT","JAK2","STAT1","IFNGR1","IFNGR2", "IRF1"]
file_path = pt.dt.papalexi_2021()  # despite the name, this is a MuData object
# X holds raw counts, so keep a copy of them as the "counts" layer
file_path[kwargs_init["assay"]].layers["counts"] = file_path[
    kwargs_init["assay"]].X.copy()


Analysis Functions
	analyze_causal_network
	analyze_composition
	analyze_receptor_ligand
	calculate_dea_deseq2
	cluster
	clustering
	communication
	composition
	compute_distance
	find_marker_genes
	perform_augur
	perform_celltypist
	perform_dea
	perform_differential_prioritization
	perform_gsea
	perform_mixscape
	perform_pathway_interference
	perturbations


## Object

This code instantiates the `Crispr` object, which is the main way end users interact with this package.

The setup cell above contains more code than you would need in real life; the extra steps (e.g., copying the raw counts in `X` into a `counts` layer) are there only because this public dataset is loaded directly from its source.

In [20]:
self = Crispr(file_path, **kwargs_init)



<<< INITIALIZING OMICS CLASS OBJECT >>>

col_gene_symbols="gene_symbol"
col_cell_type="leiden"
col_sample_id="replicate"
col_batch="replicate"
col_subject=None
col_condition="gene_target"
col_num_umis=None
key_control="NT"
key_treatment="Perturbed"


<<< LOADING OBJECT >>>
Traceback (most recent call last):
  File "/home/elizabeth/elizabeth/crispr/crispr/processing/preprocessing.py", line 137, in create_object
    adata.obs_names_make_unique()
  File "/home/elizabeth/elizabeth/miniconda3/envs/py-bio/lib/python3.10/site-packages/mudata/_core/mudata.py", line 974, in obs_names_make_unique
    self._obs.index = obs_names
  File "/home/elizabeth/elizabeth/miniconda3/envs/py-bio/lib/python3.10/site-packages/pandas/core/generic.py", line 6307, in __setattr__
    return object.__setattr__(self, name, value)
  File "properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/home/elizabeth/elizabeth/miniconda3/envs/py-bio/lib/python3.10/site-packages/pandas/core/generi

# Processing

We use near-default preprocessing arguments (very little filtering, consistent with the Pertpy tutorial for this dataset): highly variable gene selection and scaling are enabled, and no covariates are regressed out.

## Preprocess

In [21]:
self.preprocess(kws_hvg=True, kws_scale=True, regress_out=None)
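For intuition, the core transformations in a typical single-cell preprocessing run (library-size normalization, `log1p`, and per-gene scaling) can be sketched in plain Python as below. This is an illustrative sketch, not the implementation behind `self.preprocess()`: the function names and the `target_sum=1e4` default are assumptions, highly variable gene selection is omitted, and libraries differ on details such as which standard deviation they use when scaling.

```python
import math
from statistics import mean, pstdev


def normalize_log1p(counts, target_sum=1e4):
    """Rescale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    out = []
    for row in counts:
        total = sum(row)
        # cells with zero total counts are left as all zeros
        out.append([math.log1p(x * target_sum / total) if total else 0.0 for x in row])
    return out


def scale_genes(matrix):
    """Z-score each gene (column) using the population standard deviation."""
    scaled_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        # constant genes carry no information and are set to 0
        scaled_cols.append([(x - mu) / sd if sd else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


# toy matrix: 2 cells x 2 genes of raw counts
counts = [[1, 3], [2, 2]]
normalized = normalize_log1p(counts, target_sum=4)
scaled = scale_genes(normalized)
```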

## Cluster

In [None]:
_ = self.cluster()

# Analyses

The following examples concern CRISPR or other perturbation design-specific analyses.

## Mixscape: Cell-Level Perturbation Classification & Scoring

**Is a perturbed cell detectably perturbed, and to what extent?** Mixscape first calculates a "**perturbation signature**" for each cell: it determines which control-condition cells most closely resemble that cell in terms of mRNA expression and subtracts their average expression from the cell's own (i.e., it centers each perturbed cell's gene expression on its control neighbors).
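The perturbation-signature step can be sketched in plain Python: for every cell, find its `k` nearest control cells in some embedding and subtract their mean expression from the cell's own. This is a simplified illustration, not Mixscape's or Pertpy's code; the function name, the toy data, `k=2`, and the use of the expression values themselves as the embedding are assumptions (in practice a reduced-dimensional representation such as PCA and more neighbors are typical).

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def perturbation_signature(expr, embedding, is_control, k=2):
    """Hypothetical sketch: center each cell on the mean of its k nearest control cells.

    expr:       per-cell expression vectors
    embedding:  per-cell coordinates used for the neighbor search (e.g., PCA)
    is_control: per-cell booleans marking control (e.g., "NT") cells
    """
    control_idx = [i for i, ctrl in enumerate(is_control) if ctrl]
    signatures = []
    for i, cell in enumerate(expr):
        # k control cells closest to this cell (a control cell counts itself here)
        nearest = sorted(control_idx, key=lambda j: euclidean(embedding[i], embedding[j]))[:k]
        neighbor_mean = [sum(expr[j][g] for j in nearest) / len(nearest)
                         for g in range(len(cell))]
        signatures.append([x - m for x, m in zip(cell, neighbor_mean)])
    return signatures


# toy data: two control cells and one perturbed cell, two genes each
expr = [[1.0, 1.0], [1.2, 0.8], [5.0, 1.0]]
sig = perturbation_signature(expr, embedding=expr, is_control=[True, True, False])
# the perturbed cell's signature is approximately [3.9, 0.1]
```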

Then, it **identifies** perturbed **cells with no detectable perturbation** by assigning each perturbed cell to a predicted class (perturbed versus non-perturbed) and removes the non-perturbed cells from downstream analysis. You can then create visuals based on whether a cell is detectably perturbed, "non-perturbed" (not detectably perturbed), or control (no treatment). Optionally, for certain multimodal data, you can visualize protein expression by predicted class.
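The classification step can be illustrated with a simplified two-component Gaussian mixture: each cell's signature is projected onto the direction separating the mean perturbed and mean control signatures, the "not perturbed" component is fixed to the control cells' score distribution, and expectation-maximization (EM) fits only the "perturbed" component. Cells with posterior probability above 0.5 are called perturbed. This is a hypothetical sketch, not Mixscape's implementation; the function names, the 0.5 cutoff, the iteration count, and the toy data are assumptions, and the real method includes further steps (such as restricting to differentially expressed genes) that are omitted here.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def project(sig, direction):
    """Projection coefficient of a signature onto the perturbation direction."""
    return sum(s * d for s, d in zip(sig, direction)) / sum(d * d for d in direction)


def classify_perturbed(control_sigs, perturbed_sigs, n_iter=50, min_sd=1e-3):
    """Hypothetical sketch: posterior probability that each treated cell is truly perturbed."""
    n_genes = len(control_sigs[0])
    direction = [mean(s[g] for s in perturbed_sigs) - mean(s[g] for s in control_sigs)
                 for g in range(n_genes)]
    ctrl_scores = [project(s, direction) for s in control_sigs]
    scores = [project(s, direction) for s in perturbed_sigs]

    # "not perturbed" component: fixed to the control cells' score distribution
    mu0, sd0 = mean(ctrl_scores), max(pstdev(ctrl_scores), min_sd)
    # "perturbed" component: initialized from all treated cells
    mu1, sd1, w1 = mean(scores), max(pstdev(scores), min_sd), 0.5

    for _ in range(n_iter):
        # E-step: responsibility of the "perturbed" component for each cell
        post = []
        for x in scores:
            p1 = w1 * normal_pdf(x, mu1, sd1)
            p0 = (1 - w1) * normal_pdf(x, mu0, sd0)
            post.append(p1 / (p0 + p1) if p0 + p1 > 0 else float(abs(x - mu1) < abs(x - mu0)))
        total = sum(post)
        if total == 0:
            break
        # M-step: update only the "perturbed" component and the mixing weight
        w1 = total / len(scores)
        mu1 = sum(r * x for r, x in zip(post, scores)) / total
        sd1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(post, scores)) / total), min_sd)
    return post


control_sigs = [[-0.1, 0.0], [0.1, 0.0], [0.0, 0.1], [0.0, -0.1]]
perturbed_sigs = [[2.0, 0.0], [2.2, 0.1], [1.8, -0.1], [0.05, 0.0], [-0.05, 0.05]]
post = classify_perturbed(control_sigs, perturbed_sigs)
labels = ["perturbed" if p > 0.5 else "non-perturbed" for p in post]
# the first three treated cells are called perturbed; the last two look like controls
```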

**Are there perturbation-specific clusters?** Mixscape uses linear discriminant analysis (LDA) to group cells that resemble each other in terms of gene expression and perturbation condition. _(LDA reduces dimensionality while attempting to maximize the separability of classes. Non-perturbed cells are removed from this analysis.)_
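To illustrate what LDA optimizes, the two-class case (Fisher's linear discriminant) fits in a few lines: the projection direction is the inverse of the pooled within-class scatter matrix times the difference of the class means, which pushes the class means apart relative to the spread within each class. Mixscape's LDA involves more classes and many more genes, so this 2-D, two-class version, with a hypothetical function name and toy points, is only a sketch of the idea.

```python
from statistics import mean


def fisher_lda_direction(class_a, class_b):
    """Hypothetical sketch of two-class Fisher LDA in 2-D: w = S_w^-1 (m_a - m_b)."""
    m_a = [mean(p[i] for p in class_a) for i in range(2)]
    m_b = [mean(p[i] for p in class_b) for i in range(2)]
    # pooled within-class scatter matrix S_w (2 x 2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class_a, m_a), (class_b, m_b)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m_a[0] - m_b[0], m_a[1] - m_b[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# both classes spread widely along x but are separated along y
class_a = [(0, 1.0), (2, 1.2), (4, 0.8), (6, 1.0)]
class_b = [(0, -1.0), (2, -0.8), (4, -1.2), (6, -1.0)]
w = fisher_lda_direction(class_a, class_b)
proj_a = [w[0] * x + w[1] * y for x, y in class_a]
proj_b = [w[0] * x + w[1] * y for x, y in class_b]
```

Projected onto `w`, the two classes no longer overlap, even though their raw x-coordinates are identical.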

<u> __Features__ </u>  

- Plot targeting efficiency
- Remove confounds (e.g., cell cycle, batch effects)
- Classify cells as affected or unaffected (i.e., "escapees") by the perturbation
- Quantify and visualize degree of perturbation response

<u> __Input__ </u> 

See the documentation for all arguments; the key ones are listed here.

* **col_cell_type**: To use a different cell classification column (e.g., CellTypist annotations rather than the column originally specified in `self._columns["col_cell_type"]`), pass `col_cell_type=<column name>`.
* **target_gene_idents**: A list of gene symbols to focus on in plots/analyses. Specify as True to include all.
* **Data layer**: By default, the "log1p" layer is used. When deciding whether to use centered and/or scaled data instead, remember that Mixscape already centers each perturbed cell on its control neighbors.
* **protein_of_interest**: If you have protein expression data (e.g., CITE-seq), you can specify a protein whose expression to plot against the perturbation conditions.

<u> __Output__ </u>  

Assuming your `Crispr` object is named "self":
- Targeting Efficiency: `self.figures["mixscape"]["targeting_efficiency"]`
- Differential Expression Ordered by Posterior Probabilities: `self.figures["mixscape"]["DEX_ordered_by_ppp_heat"]`
- Posterior Probabilities Violin Plot: `self.figures["mixscape"]["ppp_violin"]`
- Perturbation Scores: `self.figures["mixscape"]["perturbation_score"]`
- Perturbation Clusters (from LDA): `self.figures["mixscape"]["perturbation_clusters"]`

The paths above access output through the object's attributes. If you instead assign the output to a variable `figs` (i.e., write `figs = ` in place of `_ = ` in the code below), replace `self.figures["mixscape"]` in those paths with `figs`.

<u> __Notes__ </u>  

- If `self._columns["col_sample_id"]` is not `None`, perturbation scores are by default calculated and/or plotted split by that column (e.g., by biological replicate) unless you pass `col_split_by=False`. You can also set `col_split_by` to a different column name explicitly; that column is then used in place of the sample ID as the split-by argument to the Pertpy Mixscape functions.
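The defaulting rule in this note can be summarized with a small helper. The name `resolve_split_by` is hypothetical (it is not part of the `crispr` package), and treating any value other than `False` or a column name as "use the default" is an assumption.

```python
def resolve_split_by(col_sample_id, col_split_by=None):
    """Hypothetical helper: decide which column (if any) Mixscape results are split by.

    col_sample_id: the object's sample ID column name, or None
    col_split_by:  False to disable splitting, a column name to override,
                   anything else (assumed) to fall back to the sample ID column
    """
    if col_split_by is False:
        return None  # splitting explicitly disabled
    if isinstance(col_split_by, str):
        return col_split_by  # explicit column overrides the sample ID
    return col_sample_id  # default: sample ID column (None means no split)


print(resolve_split_by("replicate"))           # replicate
print(resolve_split_by("replicate", False))    # None
print(resolve_split_by("replicate", "batch"))  # batch
```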

#### Run Mixscape

In [None]:
%%time

target_gene_idents = ["NT","JAK2","STAT1","IFNGR1","IFNGR2", "IRF1"]
_ = self.run_mixscape(target_gene_idents=target_gene_idents,
                      protein_of_interest="PDL1")

#### Create Different Mixscape Plots

If you later want to re-create Mixscape <u>**plots with different target genes and/or proteins of interest**</u>, you can use `self.plot_mixscape(<ONE OR MORE TARGET GENES>)`. To use a different color for the perturbation score curves, pass `color=` to that method.

In [None]:
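# Randomly pick one key of the stored Mixscape results to re-plot
tgis = pd.Series(list(self.rna.uns["mixscape"].keys())).sample(1)
_ = self.plot_mixscape(tgis, color="red")  # custom color for the score curves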