# Multiplexed shRNA screening AP009 Analytics
This notebook contains the analysis scripts that process the count table from the multiplexed shRNA screening experiments on the AP009 cell line (received from Cellecta) and evaluate:
* whether the multi-shRNA system works as expected,
* the prognostic effect of genotype X, and
* the predictive effect of genotype X on response to treatment.

## Dependency packages

In [6]:
import pandas as pd
import numpy as np
from prettytable import PrettyTable

# seed of random number generator
rng_seed = 1234
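
The seed is defined here but not yet consumed in this section. As a minimal sketch of how it could drive reproducible sampling later on, the snippet below builds two NumPy generators from the same seed and checks that they agree. The names `rng_a`, `rng_b`, and the integer draw are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

rng_seed = 1234  # same value as defined above

# Hypothetical usage: two independent generators built from the same seed
rng_a = np.random.default_rng(rng_seed)
rng_b = np.random.default_rng(rng_seed)

# Identical seeds should yield identical draws
draw_a = rng_a.integers(0, 100, size=5)
draw_b = rng_b.integers(0, 100, size=5)

print(np.array_equal(draw_a, draw_b))
```

Passing one generator object to every function that samples is usually easier to reason about than reseeding global state.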

## Experimental design and parameters

Load the experimental design xlsx file (from Ian):
* "Master list" tab: target genes, vector types, and annotations
* Experimental design tab: groups and samples

Load the sample description xlsx file (from Cellecta):
* Actual Sample ID and description on the flowcell, plus inferred Group, Day_Tx, Replicate, Dox, etc.
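
Some of the inferred columns appear to be encoded in the `Sample Description` string (for example `T13_2_Dox_0.6nM`). The sketch below shows one way such a label could be split, assuming every label follows the layout `<timepoint>_<replicate>_<Dox token>_<dose><unit>`. The helper `parse_sample_description` and its field names are hypothetical, and only the Dox-treated labels visible in this notebook were considered; labels with a different layout return `None` rather than a guess.

```python
import re

# Assumed label layout: <timepoint>_<replicate>_<Dox token>_<dose><unit>
LABEL_PATTERN = re.compile(
    r"^(?P<timepoint>T\d+)_(?P<replicate>\d+)_(?P<dox_token>[A-Za-z]+)_"
    r"(?P<dose>\d+(?:\.\d+)?)(?P<unit>[A-Za-z]+)$"
)

def parse_sample_description(label):
    """Split a label into its parts; return None if the layout differs."""
    match = LABEL_PATTERN.match(label.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["replicate"] = int(parts["replicate"])  # e.g. "2" -> 2
    parts["dose"] = float(parts["dose"])          # e.g. "0.6" -> 0.6
    return parts

for label in ["T13_2_Dox_0.6nM", "T13_1_Dox_0.6nM", "T13_2_Dox_3.5nM"]:
    print(label, "->", parse_sample_description(label))
```

The parsed replicate can be compared with the `Replicate` column as a consistency check, and any label that returns `None` should be inspected by hand.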

See also the schematic workflow of the experimental design (from Zheng):

![multiplexed shRNA screening experiment](schematic_workflow_experimental_design.png)

In [7]:
#### experimental design xlsx file
file_path = "~/Documents/Projects/Multi_shRNA_screening_AP009/data/Multiplexed shRNA screen in 2D AP009 - Cellecta.xlsx"

xls = pd.ExcelFile(file_path)

## Gene targets
gene_targets_df = pd.read_excel(xls, sheet_name="Master list")
# Filter out "Individual dual shRNA vector" from gene targets
filtered_gene_targets_df = gene_targets_df[gene_targets_df["Vector type"] != "Individual dual shRNA vector"]
filtered_gene_targets = filtered_gene_targets_df["Mouse gene symbol"].dropna().unique()

# ## Define experimental parameters
# # Num clonal barcodes per gene
# num_clonal_barcodes = 12000  
# # Num shRNAs per gene
# num_shRNAs_per_gene = 10  
# # N_reps per condition per timepoint
# num_replicates = 2  

## Define experimental conditions and timepoints 
# available timepoints
time_points = ["0d", "3d", "6d", "9d"]
# experimental conditions based on design
conditions = [
    "Baseline_NoDox_Vehicle",
    "Baseline_Dox_PreTx",  # Only at 0d
    "Prognostic_Dox_Vehicle",
    "Predictive_Dox_7977_LowDose",  # IC30 early, IC50 later
    "Predictive_Dox_7977_HighDose"  # IC90
]

#### sample description xlsx file
file_path_cellecta = "/Users/bli/Documents/Projects/Multi_shRNA_screening_AP009/data/sample description.xlsx"

sd_df = pd.read_excel(file_path_cellecta, sheet_name='Sheet1')

# Set the first row as column headers and remove it from the data
sd_df.columns = sd_df.iloc[0]
sd_df = sd_df[1:].reset_index(drop=True)

# Rename columns to remove any unintended whitespace
sd_df.columns = sd_df.columns.str.strip()
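
The header clean-up can be tried on a small in-memory table. This sketch uses made-up rows (not the Cellecta sheet) and adds one optional step that is an assumption rather than part of the original workflow: dropping columns whose header is missing, which would otherwise be printed under the name `nan`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw sheet: the real header sits in the first data row
raw = pd.DataFrame([
    [" Sample ID", "Sample Description ", np.nan, "Group"],
    ["D1", "T13_2_Dox_0.6nM", np.nan, 5],
    ["D2", "T13_1_Dox_0.6nM", np.nan, 5],
])

toy_df = raw.copy()
toy_df.columns = toy_df.iloc[0]               # promote first row to header
toy_df = toy_df[1:].reset_index(drop=True)    # drop the header row from the data
toy_df.columns = toy_df.columns.str.strip()   # trim stray whitespace

# Optional (assumption): drop columns whose header is missing
toy_df = toy_df.loc[:, toy_df.columns.notna()]

print(toy_df.columns.tolist())
```

Whether the empty column should be dropped depends on whether it is only a visual spacer in the spreadsheet, so that step is worth confirming against the source file.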

Utility function for viewing tables

In [12]:
## utility function for printing a DataFrame as a table
def ViewTable(df, top_n_rows=None):
    # One table column per DataFrame column
    table = PrettyTable(df.columns.tolist())
    # Show only the first top_n_rows rows when requested, else all rows of df
    df_tmp = df.head(top_n_rows) if top_n_rows else df
    for row in df_tmp.itertuples(index=False, name=None):
        table.add_row(row)
    print(table)
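
Where `prettytable` is not installed, pandas' own `DataFrame.to_string` gives a comparable plain-text view. A minimal sketch on toy data follows; the function name `view_table_plain` is hypothetical and is offered as an alternative, not as the notebook's method.

```python
import pandas as pd

def view_table_plain(df, top_n_rows=None):
    """Return a plain-text rendering of df, optionally limited to the first rows."""
    subset = df if top_n_rows is None else df.head(top_n_rows)
    return subset.to_string(index=False)

# Toy data for illustration only
toy_df = pd.DataFrame({
    "Sample ID": ["D1", "D2", "D3"],
    "Replicate": [2, 1, 2],
})

text = view_table_plain(toy_df, top_n_rows=2)
print(text)
```

Returning the string instead of printing it makes the helper easier to reuse, for example when writing the table to a log file.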

In [13]:
ViewTable(sd_df, 5)

+-----------+--------------------+----------------+------------------------------------------------+-------------------+-----+-------+--------+-----------+-----+------+
| Sample ID | Sample Description |    Library     |                     Vector                     |      Flowcell     | nan | Group | Day_Tx | Replicate | Dox | Note |
+-----------+--------------------+----------------+------------------------------------------------+-------------------+-----+-------+--------+-----------+-----+------+
|     D1    |  T13_2_Dox_0.6nM   | 2.2K-REVMED-ZZ |  pRSIT16cb-U6tet-sh-CMV-tetR-2A-TagRFP-2A-Puro | 25-03-11  102190  | nan |   5   |   9    |     2     |  Y  | nan  |
|     D2    |  T13_1_Dox_0.6nM   | 2.2K-REVMED-ZZ |  pRSIT16cb-U6tet-sh-CMV-tetR-2A-TagRFP-2A-Puro | 25-03-11  102190  | nan |   5   |   9    |     1     |  Y  | nan  |
|     D3    |  T13_2_Dox_3.5nM   | 2.2K-REVMED-ZZ |  pRSIT16cb-U6tet-sh-CMV-tetR-2A-TagRFP-2A-Puro | 25-03-11  102190  | nan |   4   |   9    |     2     |

In [14]:
ViewTable(sd_df)

+-----------+--------------------+----------------+------------------------------------------------+-------------------+-----+--------+--------+-----------+-----+--------------------------------+
| Sample ID | Sample Description |    Library     |                     Vector                     |      Flowcell     | nan | Group  | Day_Tx | Replicate | Dox |              Note              |
+-----------+--------------------+----------------+------------------------------------------------+-------------------+-----+--------+--------+-----------+-----+--------------------------------+
|     D1    |  T13_2_Dox_0.6nM   | 2.2K-REVMED-ZZ |  pRSIT16cb-U6tet-sh-CMV-tetR-2A-TagRFP-2A-Puro | 25-03-11  102190  | nan |   5    |   9    |     2     |  Y  |              nan               |
|     D2    |  T13_1_Dox_0.6nM   | 2.2K-REVMED-ZZ |  pRSIT16cb-U6tet-sh-CMV-tetR-2A-TagRFP-2A-Puro | 25-03-11  102190  | nan |   5    |   9    |     1     |  Y  |              nan               |
|     D3    |  T13_2

In [None]:
# Summary statistics for every column of the sample description table
print(sd_df.describe(include="all"))