# PitViper Report

In [2]:
# Load necessary libraries
import sys
import os

# Load PitViper functions
modules_path = ['workflow/notebooks/', "../../../workflow/notebooks/"]
for module in modules_path:
    module_path = os.path.abspath(os.path.join(module))
    if module_path not in sys.path:
        sys.path.append(module_path)

from functions_pitviper_report import * 

# Change working directory
working_directory_update(snakemake.output[0])

# Initialize token
token = snakemake.params

HTML('''<head><script src="https://ajax.googleapis.com/ajax/libs/jquery/3.7.0/jquery.min.js"></script></head>
        <script> code_show=true;  
            function code_toggle() {  if (code_show){  $('div.input').hide(); $('div.jp-Cell-inputWrapper').hide(); }
                                      else {  $('div.input').show(); $('div.jp-Cell-inputWrapper').show(); }
                     code_show = !code_show }  $( document ).ready(code_toggle); </script> 
        <form action="javascript:code_toggle()"><input type="submit" value="Toggle Code"></form>''')


R[write to console]: snapshotDate(): 2021-10-19



NameError: name 'snakemake' is not defined

In [None]:
md(f"""This notebook was generated automatically by PitViper.

## Summary

This report contains the results of the analysis of the data contained in the folder `results/{token}`.

Graphs are generated using the Python library [Altair](https://altair-viz.github.io/index.html). Figures can be downloaded in SVG format from the drop-down menu at the top right of each graphic.""")

## Import results

In [None]:
results_directory, tools_available = import_results(token)

## Download config YAML

In [None]:
download_config(token)

## Download raw data

Raw data can be downloaded by clicking the button below. The data are embedded in the notebook and remain downloadable after the notebook is exported to HTML.

In [None]:
download_raw_counts(token)

## Download normalized data

Normalized data can be downloaded by clicking the button below. The data are embedded in the notebook and remain downloadable after the notebook is exported to HTML.

In [None]:
download_normalized_counts(token)

## Mapping Quality Control

If available, mapping quality control metrics are shown by the `show_mapping_qc` function.

In [None]:
show_mapping_qc(token)

## Read count distribution

The normalized read count distribution for all replicates is shown by calling the `show_read_count_distribution` function.

In [None]:
alt.data_transformers.disable_max_rows()

show_read_count_distribution(token)

## Principal component analysis

The PCA projection of normalized read counts from all replicates is shown using the `pca_counts` function.

In [None]:
pca_counts(token)

## Global results

- MAGeCK MLE:

> The **beta score** describes how the gene is selected: a positive beta score indicates a positive selection, and a negative beta score indicates a negative selection. [source](https://www.bioconductor.org/packages/release/bioc/vignettes/MAGeCKFlute/inst/doc/MAGeCKFlute.html)

- MAGeCK RRA:

> lfc:  **Gene log fold changes** (LFC) from sgRNA LFCs. Median by default. [source](https://sourceforge.net/p/mageck/wiki/Home/)

- BAGEL:

> BF: evaluates the **likelihood** that the observed fold changes for gRNA targeting the gene were drawn from either the essential or the nonessential training distributions. [source](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1015-8)

- CRISPhieRmix:

> locfdr: a mixture deconvolution approach to estimate **local false discovery rates**. [source](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1538-6)

- SSREA:

> NES: **normalized enrichment score** (NES) is the primary statistic for examining gene set enrichment results. By normalizing the enrichment score, GSEA accounts for differences in gene set size and in correlations between gene sets and the expression dataset; therefore, the normalized enrichment scores (NES) can be used to compare analysis results across gene sets. In this context, genesets are replaced by lists of sgRNAs targeting the same element. [source](https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html)

- Directional Scoring Method:

> score: sum of `log2FoldChange * (-log10(padj))` over the sgRNAs that target the same gene and pass the individual thresholds described in the PitViper article.

In [None]:
tool_results(results_directory, tools_available, token)

## sgRNA read counts - Heatmap

The next module displays a row-normalized heatmap of read counts per guide.

Replicates can be discarded or rearranged by dragging and dropping from left to right.
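To make "row-normalized" concrete, here is a minimal sketch that rescales each guide's counts across replicates. It assumes a per-row z-score; the exact normalization used by `show_sgRNA_counts` is not stated here, and the guide names and counts are invented.

```python
from statistics import mean, pstdev

def row_zscore(counts):
    """Center and scale one guide's counts across replicates (z-score)."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # A constant row carries no contrast; map it to zeros.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]

# Hypothetical normalized read counts: one row per guide, one value per replicate.
counts_by_guide = {
    "guide_1": [10, 20, 30],
    "guide_2": [100, 200, 300],
}
normalized = {guide: row_zscore(c) for guide, c in counts_by_guide.items()}
```

After this step both guides have the same profile even though their raw counts differ tenfold, which is why row normalization makes patterns comparable across guides in a heatmap.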

In [None]:
show_sgRNA_counts(token)

### sgRNA read counts - Line plot

Same data as above, but displayed as a line plot. Error bars represent the standard deviation of the normalized read counts.
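The quantities behind such a line plot can be sketched as a mean and a standard deviation per condition. The sample standard deviation is assumed here (the notebook does not say which estimator is used), and the condition names and counts are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical normalized counts for one sgRNA, grouped by condition.
replicates = {
    "control": [100.0, 110.0, 90.0],
    "treated": [40.0, 50.0, 60.0],
}

def summarize(values):
    """Return (mean, sample standard deviation) for one condition."""
    return mean(values), stdev(values)

# One (mean, sd) pair per condition: the point and the half-length of its error bar.
summary = {cond: summarize(vals) for cond, vals in replicates.items()}
```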

In [None]:
show_sgRNA_counts_lines(token)

## Results by tool and by element

The following section lets you browse results by tool and by element. It produces one plot per tool and gene.

In [None]:
tool_results_by_element(results_directory, tools_available, token)

## Data exploration

This section lets you explore the data interactively.

Conditions can be selected with the first widget. The second widget selects the tools to use for the analysis; several tools can be selected at the same time. Once tools are selected, their parameter widgets appear below.

The selection mode can be changed with the third widget. The default mode, "Intersection", keeps only genes that are selected in all conditions. The "Union" mode keeps all genes that are selected in at least one condition.
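The two selection modes correspond to plain set operations. The sketch below uses hypothetical condition names and gene lists, and `combine_hits` is an illustrative helper, not a PitViper function.

```python
def combine_hits(hit_sets, mode="Intersection"):
    """Combine per-condition gene sets according to the selection mode."""
    sets = [set(s) for s in hit_sets]
    if not sets:
        return set()
    if mode == "Intersection":
        # Keep genes selected in every condition.
        return set.intersection(*sets)
    if mode == "Union":
        # Keep genes selected in at least one condition.
        return set.union(*sets)
    raise ValueError(f"Unknown selection mode: {mode!r}")

# Hypothetical genes passing the filters in each condition.
hits_by_condition = {
    "cond_A_vs_ctrl": {"GENE1", "GENE2", "GENE3"},
    "cond_B_vs_ctrl": {"GENE2", "GENE3", "GENE4"},
}
common = combine_hits(hits_by_condition.values(), mode="Intersection")
any_hit = combine_hits(hits_by_condition.values(), mode="Union")
```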

For each tool, the following parameters can be changed to filter the results:

- MAGeCK MLE: beta score threshold (score), adjusted p-value threshold (FDR) and orientation relative to the score (greater or lower).

- MAGeCK RRA: log fold change threshold (score), adjusted p-value threshold (FDR) and orientation relative to the score (selection). Negative or positive results can be selected (direction).

- BAGEL: Bayes Factor threshold (score) and orientation relative to the score (greater or lower).

- CRISPhieRmix: local false discovery rate threshold (FDR), mean LFC of the top 3 sgRNAs (score) and orientation relative to the score (selection).

- SSREA: normalized enrichment score threshold (score), adjusted p-value threshold (FDR) and orientation relative to the score (selection).

- Directional Scoring Method: direction of the selection.
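Most of the filters listed above combine a score threshold, an FDR threshold and an orientation. A minimal sketch of that logic follows; `passes_filter`, the strict inequalities and the example rows are assumptions made for illustration and may differ from what the widgets actually apply.

```python
def passes_filter(score, fdr, score_threshold, fdr_threshold, orientation="greater"):
    """Return True if a gene passes both the FDR and the score criteria.

    Assumptions: the FDR must be strictly below its threshold, and the score
    must be strictly above ("greater") or strictly below ("lower") its threshold.
    """
    if fdr >= fdr_threshold:
        return False
    if orientation == "greater":
        return score > score_threshold
    if orientation == "lower":
        return score < score_threshold
    raise ValueError(f"Unknown orientation: {orientation!r}")

# Hypothetical per-gene results from one tool.
results = [
    {"gene": "GENE1", "score": 1.8, "fdr": 0.01},
    {"gene": "GENE2", "score": -2.1, "fdr": 0.02},
    {"gene": "GENE3", "score": 2.5, "fdr": 0.30},
]
enriched = [r["gene"] for r in results
            if passes_filter(r["score"], r["fdr"], 1.0, 0.05, "greater")]
depleted = [r["gene"] for r in results
            if passes_filter(r["score"], r["fdr"], -1.0, 0.05, "lower")]
```

In this example `GENE3` has the highest score but is excluded in both directions because its FDR exceeds the threshold.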

In [None]:
multiple_tools_results(tools_available, token)

# Compare conditions

Choose a tool and two comparisons. Results for each gene in each comparison will be displayed in a scatter plot. To highlight genes, enter a list of genes separated by commas.

In addition to the plot, the lists of genes in each quadrant are displayed below it.
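One way to picture the quadrant lists is to split genes by the sign of their score in each comparison. The sketch below assumes the quadrants are separated at zero on both axes; the actual boundaries used by `condition_comparison` are not described here, and the gene names and scores are hypothetical.

```python
from collections import defaultdict

def quadrant(x, y):
    """Name the quadrant of a point, with both axes split at zero (assumption)."""
    vertical = "top" if y >= 0 else "bottom"
    horizontal = "right" if x >= 0 else "left"
    return f"{vertical}_{horizontal}"

# Hypothetical scores: gene -> (score in comparison 1, score in comparison 2)
scores = {
    "GENE1": (1.5, 2.0),
    "GENE2": (-1.2, 0.8),
    "GENE3": (-2.0, -1.1),
    "GENE4": (0.9, -0.4),
}

genes_by_quadrant = defaultdict(list)
for gene, (x, y) in scores.items():
    genes_by_quadrant[quadrant(x, y)].append(gene)
```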

In [None]:
condition_comparison(results_directory, tools_available, token)