### Equalizing Codebook, 4 barcoding rounds
#### Author: Arun Chakravorty 
##### Purpose: An algorithm that generates a codebook whose expression is equalized as evenly as possible across all hybridizations (hybs).


In [None]:
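import pandas as pd  # used below as `pd`; imported explicitly rather than relying on the wildcard import
from FPKM_Equalizing_4BarcodingRounds import *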

### 1. Obtaining FPKM per gene  

Run this section only if you need to pair your genes of interest with their expression values. Any gene that is not present in the expression table has its expression set to 0. Otherwise, start directly from section 2 (Creating Codebook).

In [None]:
# Load the dataframe containing the genes and the expression per gene
# Make sure the column headings are `gene_symbols` and `avg`
max_tpm = pd.read_csv('ExampleFiles/max_tpm_ensemble.csv')
max_tpm = max_tpm.rename(columns={'Unnamed: 0': 'gene_symbols', 'TPM': 'avg'})
max_tpm.head()

In [None]:
# Load the genes for each channel.
# The code pairs each 'ensemble id' with the expression found under `gene_symbols` in the max_tpm table.
# Choose 645 genes (rows 5 to 649 of each file) so they fit within the 9^4 barcode space with a parity check.

channel1 = pd.read_csv('ExampleFiles/channel1_genes.csv')[5:650]
channel2 = pd.read_csv('ExampleFiles/channel2_genes.csv')[5:650]
channel3 = pd.read_csv('ExampleFiles/channel3_genes.csv')[5:650]
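
The gene count can be sanity-checked with a little arithmetic. The sketch below assumes that "parity check" means one of the four barcoding rounds is fully determined by the other three, which would leave 9^3 of the 9^4 barcodes usable. The helper `codebook_capacity` is hypothetical and is not part of `FPKM_Equalizing_4BarcodingRounds`.

```python
def codebook_capacity(num_pseudocolors, num_rounds, parity_rounds=1):
    """Count the barcodes available when `parity_rounds` rounds carry no free choice.

    Assumption: each parity round is determined by the remaining rounds.
    """
    return num_pseudocolors ** (num_rounds - parity_rounds)


full_space = 9 ** 4                      # barcodes with no constraint at all
usable = codebook_capacity(9, 4)         # barcodes left under the one-parity-round assumption
genes_per_channel = len(range(5, 650))   # number of rows selected by the [5:650] slice

print(full_space, usable, genes_per_channel, genes_per_channel <= usable)
```

Under that assumption, the 645 genes selected per channel fit inside the 729 usable barcodes.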

In [None]:
channel1.head()

In [None]:
# Obtain FPKMS per gene 
Channel1FPKMS = getFPKMS(channel1, max_tpm)
Channel1FPKMS.head()
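
The pairing step described in this section can be pictured as a left join followed by filling the gaps with 0. The sketch below is not the implementation of `getFPKMS`. It is a minimal stand-in that assumes both tables share a `gene_symbols` column, and the function name `pair_genes_with_expression` and the toy gene names are hypothetical.

```python
import pandas as pd


def pair_genes_with_expression(genes, expression, gene_col="gene_symbols", value_col="avg"):
    """Hypothetical stand-in for the pairing step: left-join, then fill missing values with 0."""
    paired = genes[[gene_col]].merge(
        expression[[gene_col, value_col]], on=gene_col, how="left"
    )
    # Genes that are absent from the expression table come back as NaN; set them to 0.
    paired[value_col] = paired[value_col].fillna(0)
    return paired


# Toy data: GeneB has no entry in the expression table.
genes = pd.DataFrame({"gene_symbols": ["GeneA", "GeneB", "GeneC"]})
expression = pd.DataFrame({"gene_symbols": ["GeneA", "GeneC"], "avg": [12.5, 3.0]})

paired = pair_genes_with_expression(genes, expression)
print(paired)
```

A left join keeps every requested gene, so a gene that is missing from the expression table still appears in the result with an expression of 0.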

### 2. Creating Codebook

You may start here if you already have a dataframe of genes and the average expression of each. Make sure the columns are named `gene_symbols` and `avg`, like the dataframe above. **IMPORTANT:** Make sure the dataframe is sorted by expression.

Parameters: number of pseudocolors, FPKM dataframe (must have the columns `gene_symbols` and `avg`)
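
Before generating a codebook, it may help to confirm that the input table meets these requirements. The sketch below is a hypothetical helper, `check_fpkm_table`, that is not part of `FPKM_Equalizing_4BarcodingRounds`. Because the required sort direction is not stated here, it accepts either ascending or descending order.

```python
import pandas as pd


def check_fpkm_table(fpkms):
    """Raise ValueError if required columns are missing or rows are not sorted by 'avg'."""
    missing = {"gene_symbols", "avg"} - set(fpkms.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    avg = fpkms["avg"]
    # Assumption: either sort direction is acceptable, since the direction is not specified.
    if not (avg.is_monotonic_increasing or avg.is_monotonic_decreasing):
        raise ValueError("table is not sorted by 'avg'")
    return True


example = pd.DataFrame(
    {"gene_symbols": ["GeneA", "GeneB", "GeneC"], "avg": [30.0, 12.5, 0.0]}
)
check_fpkm_table(example)
```

If the table turns out to be unsorted, `DataFrame.sort_values("avg")` can reorder it before it is passed on.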

In [None]:
'''
Inputs
pc - Integer number of pseudocolors in codebook design.
FPKMS - Pandas dataframe containing the genes under 'gene_symbols', and the expression value under 'avg'.

Returns a pandas dataframe containing all genes and their barcode assignment.
'''

Channel1Codebook = GenerateCodebook(9, Channel1FPKMS)

### 3. Checking Results

In [None]:
VerifyCodebook(Channel1Codebook)

In [None]:
'''
Inputs
Codebook - Pandas dataframe with the gene name under 'Gene', and the codeword assignment under 'hyb1', 'hyb2', etc.
FPKMS_table - Pandas dataframe containing the genes under 'gene_symbols', and the expression value under 'avg'.
numColors - Integer number of pseudocolors in codebook design.

Returns a pandas dataframe containing the average expression for each hybridization. Each column is a barcoding round, and each row is a pseudocolor.
'''

FinalFPKMS = FPKMSforCodebook(Channel1Codebook, Channel1FPKMS, 9)
FinalFPKMS
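
To make the shape of this table concrete, the sketch below builds a similar summary from a toy codebook. It is not the implementation of `FPKMSforCodebook`. It assumes that pseudocolors are numbered from 1 and that the summary is the mean of `avg` over the genes assigned to each pseudocolor in each round. The function name `expression_per_hyb` and the toy data are hypothetical.

```python
import pandas as pd


def expression_per_hyb(codebook, fpkms, num_colors):
    """Hypothetical sketch: mean 'avg' per pseudocolor (rows) and barcoding round (columns)."""
    merged = codebook.merge(fpkms, left_on="Gene", right_on="gene_symbols", how="left")
    hyb_cols = [c for c in codebook.columns if c.startswith("hyb")]
    table = pd.DataFrame(
        {hyb: merged.groupby(hyb)["avg"].mean() for hyb in hyb_cols}
    )
    # Assumption: pseudocolors are labelled 1..num_colors; unused ones show up as NaN.
    return table.reindex(range(1, num_colors + 1))


# Toy codebook: 4 genes, 2 pseudocolors, 2 barcoding rounds.
toy_codebook = pd.DataFrame(
    {"Gene": ["GeneA", "GeneB", "GeneC", "GeneD"], "hyb1": [1, 1, 2, 2], "hyb2": [1, 2, 1, 2]}
)
toy_fpkms = pd.DataFrame(
    {"gene_symbols": ["GeneA", "GeneB", "GeneC", "GeneD"], "avg": [10.0, 20.0, 30.0, 40.0]}
)

toy_table = expression_per_hyb(toy_codebook, toy_fpkms, 2)
print(toy_table)
```

In a well-equalized codebook, the values within each column of such a table should be close to one another.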

### 4. Save Codebook

In [None]:
Channel1Codebook.to_csv('put_your_path_here', index=False)

In [None]:
FinalFPKMS.to_csv('your_path')