### Equalizing Codebook, 4 barcoding rounds
#### Author: Arun Chakravorty 
##### Purpose: An algorithm that generates a codebook with maximally equalized expression across all hybridizations (hybs).


In [1]:
from FPKM_Equalizing_4BarcodingRounds import *

### 1. Obtaining FPKM per gene  

Run this step only if you need to pair your genes of interest with their expression values. Genes that are not present in the expression table are assigned an expression of 0. Otherwise, skip directly to step 2 (Creating Codebook).

In [2]:
# Load the dataframe containing genes and the expression per gene
# Make sure the column headings are `gene_symbols` and `avg`
max_tpm = pd.read_csv('ExampleFiles/max_tpm_ensemble.csv')
max_tpm = max_tpm.rename(columns={'Unnamed: 0': 'gene_symbols', 'TPM': 'avg'})
max_tpm.head()

Unnamed: 0,gene_symbols,avg
0,ENSMUSG00000007777,31.121931
1,ENSMUSG00000042208,13.532153
2,ENSMUSG00000020831,149.176683
3,ENSMUSG00000107002,50.3949
4,ENSMUSG00000058706,4.861402


In [3]:
# Load the genes for each channel.
# The code pairs each 'ensemble id' with the expression found under `gene_symbols` in the max_tpm table.
# Choose 645 genes per channel so that they fit within the 9^4 barcode space with a parity check.

channel1 = pd.read_csv('ExampleFiles/channel1_genes.csv')[5:650]
channel2 = pd.read_csv('ExampleFiles/channel2_genes.csv')[5:650]
channel3 = pd.read_csv('ExampleFiles/channel3_genes.csv')[5:650]
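
The gene count chosen above can be sanity-checked with a little arithmetic. The sketch below assumes that the parity check fixes one of the four barcoding rounds, so only three rounds are free; under that assumption the usable barcode count is 9^3 rather than 9^4. The variable names are illustrative and are not part of `FPKM_Equalizing_4BarcodingRounds`.

```python
# Capacity check for a 4-round, 9-pseudocolor design.
# Assumption: one round is a parity round determined by the other three.
pc = 9          # pseudocolors per round
rounds = 4      # barcoding rounds

full_space = pc ** rounds         # all 4-round combinations, no parity
usable = pc ** (rounds - 1)       # combinations left once one round is fixed by parity

# The slice [5:650] keeps rows 5..649, provided the file has at least 650 rows.
n_genes = len(range(5, 650))

print(f"full space: {full_space}, usable with parity: {usable}, genes: {n_genes}")
if n_genes > usable:
    raise ValueError("More genes than available barcodes")
```

If the parity scheme differs from this assumption, the usable count changes accordingly.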

In [4]:
channel1.head()

Unnamed: 0.1,Unnamed: 0,ensemble id,gene sym
5,5,ENSMUSG00000057278,Snrpg
6,6,ENSMUSG00000026069,Il1rl1
7,7,ENSMUSG00000029304,Spp1
8,8,ENSMUSG00000011752,Pgam1
9,9,ENSMUSG00000090553,Snrpe


In [5]:
# Obtain FPKMS per gene 
Channel1FPKMS = getFPKMS(channel1, max_tpm)
Channel1FPKMS.head()

Unnamed: 0,gene_symbols,avg
0,ENSMUSG00000057278,1003.7533
1,ENSMUSG00000026069,975.897425
2,ENSMUSG00000029304,902.313874
3,ENSMUSG00000011752,816.43778
4,ENSMUSG00000090553,782.729194
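
For readers who want to see what the pairing step involves, here is a minimal sketch. It is not the source of `getFPKMS`; it assumes the pairing is a left join on the gene ID, that missing genes get an expression of 0 (as described above), and that the result is sorted from highest to lowest expression. `pair_expression` and its `id_col` parameter are hypothetical names.

```python
import pandas as pd

def pair_expression(genes, expression, id_col="ensemble id"):
    """Pair each gene ID with its expression; genes without a match get 0."""
    paired = genes[[id_col]].rename(columns={id_col: "gene_symbols"})
    # Left join keeps every requested gene, even if it has no expression entry.
    paired = paired.merge(
        expression[["gene_symbols", "avg"]], on="gene_symbols", how="left"
    )
    paired["avg"] = paired["avg"].fillna(0)
    # Sort so that the highest-expressing genes come first.
    return paired.sort_values("avg", ascending=False).reset_index(drop=True)

# Toy data: G3 is absent from the expression table.
genes = pd.DataFrame({"ensemble id": ["G1", "G2", "G3"]})
expression = pd.DataFrame({"gene_symbols": ["G2", "G1"], "avg": [5.0, 2.0]})
paired = pair_expression(genes, expression)
print(paired)
```

Note that a left join repeats a gene if the expression table lists it more than once, so duplicate IDs should be resolved beforehand.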


### 2. Creating Codebook. 

You may start here if you already have a dataframe of genes and their average expression. Make sure the columns are named `gene_symbols` and `avg`, as in the dataframe above. **IMPORTANT:** The dataframe must be sorted by expression, as in the dataframe above, which lists the highest-expressing genes first.

Parameters: number of pseudocolors, FPKM dataframe (must have the columns `gene_symbols` and `avg`)
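
If you bring your own dataframe, a quick check like the one below can catch missing columns or a wrong sort order before the codebook is generated. It is only a sketch on toy data and assumes that "sorted by expression" means highest expression first, as in the example output of step 1.

```python
import pandas as pd

# Toy stand-in for a user-supplied expression table (deliberately unsorted).
fpkms = pd.DataFrame({"gene_symbols": ["a", "b", "c"], "avg": [2.0, 9.0, 5.0]})

# Check that the two required columns are present.
missing = {"gene_symbols", "avg"} - set(fpkms.columns)
if missing:
    raise ValueError(f"Missing required columns: {sorted(missing)}")

# Sort from highest to lowest expression and reset the index.
fpkms = fpkms.sort_values("avg", ascending=False).reset_index(drop=True)

# Confirm the order before passing the table on.
print(fpkms["avg"].is_monotonic_decreasing)
```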

In [6]:
'''
Inputs
pc - Integer number of pseudocolors in codebook design.
FPKMS - Pandas dataframe containing the genes under 'gene_symbols', and the expression value under 'avg'.

Returns a pandas dataframe containing all genes and their barcode assignment.
'''

Channel1Codebook = GenerateCodebook(9, Channel1FPKMS)

Created empty codebook
Assigned Highest Expressing Genes
RemainingCodebook Length 720
[8, 8, 3, 1]
1
RemainingCodebook Length 719
[9, 9, 9, 9]
2
RemainingCodebook Length 718
[5, 1, 1, 7]
3
RemainingCodebook Length 717
[7, 6, 1, 5]
4
RemainingCodebook Length 716
[6, 5, 6, 8]
5
RemainingCodebook Length 715
[2, 1, 8, 2]
6
RemainingCodebook Length 714
[1, 7, 3, 2]
7
RemainingCodebook Length 713
[5, 3, 5, 4]
8
RemainingCodebook Length 712
[3, 7, 2, 3]
9
RemainingCodebook Length 711
[7, 2, 7, 7]
10
RemainingCodebook Length 710
[8, 4, 2, 5]
11
RemainingCodebook Length 709
[9, 9, 8, 8]
12
RemainingCodebook Length 708
[6, 7, 9, 4]
13
RemainingCodebook Length 707
[7, 6, 4, 8]
14
RemainingCodebook Length 706
[3, 4, 7, 5]
15
RemainingCodebook Length 705
[7, 8, 5, 2]
16
RemainingCodebook Length 704
[4, 3, 9, 7]
17
RemainingCodebook Length 703
[2, 4, 4, 1]
18
RemainingCodebook Length 702
[1, 5, 7, 4]
19
RemainingCodebook Length 701
[4, 9, 8, 3]
20
RemainingCodebook Length 700
[7, 1, 2, 1]
21
...

181
RemainingCodebook Length 539
[2, 7, 4, 4]
182
RemainingCodebook Length 538
[1, 1, 8, 1]
183
RemainingCodebook Length 537
[6, 8, 3, 8]
Adding to i
184
RemainingCodebook Length 536
[6, 8, 7, 3]
185
RemainingCodebook Length 535
[9, 6, 6, 3]
186
RemainingCodebook Length 534
[4, 7, 2, 4]
187
RemainingCodebook Length 533
[1, 3, 4, 8]
188
RemainingCodebook Length 532
[8, 2, 1, 2]
Adding to i
189
RemainingCodebook Length 531
[7, 9, 9, 7]
190
RemainingCodebook Length 530
[8, 9, 3, 2]
Adding to i
191
RemainingCodebook Length 529
[9, 1, 1, 2]
192
RemainingCodebook Length 528
[3, 2, 5, 1]
193
RemainingCodebook Length 527
[6, 8, 3, 8]
Adding to i
194
RemainingCodebook Length 526
[2, 8, 3, 4]
195
RemainingCodebook Length 525
[5, 7, 6, 9]
196
RemainingCodebook Length 524
[4, 2, 2, 8]
197
RemainingCodebook Length 523
[2, 2, 1, 5]
Adding to i
Adding to i
Adding to i
198
RemainingCodebook Length 522
[8, 5, 3, 7]
199
RemainingCodebook Length 521
[7, 7, 7, 3]
200
RemainingCodebook Length 520
...

353
RemainingCodebook Length 367
[4, 9, 2, 6]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
354
RemainingCodebook Length 366
[6, 9, 9, 6]
Adding to i
Adding to i
Adding to i
355
RemainingCodebook Length 365
[1, 7, 8, 7]
356
RemainingCodebook Length 364
[5, 9, 3, 8]
357
RemainingCodebook Length 363
[6, 1, 4, 2]
Adding to i
Adding to i
Adding to i
358
RemainingCodebook Length 362
[6, 5, 4, 6]
359
RemainingCodebook Length 361
[2, 5, 4, 2]
360
RemainingCodebook Length 360
[7, 8, 5, 2]
Adding to i
361
RemainingCodebook Length 359
[3, 6, 5, 5]
362
RemainingCodebook Length 358
[9, 8, 3, 2]
363
RemainingCodebook Length 357
[3, 3, 7, 4]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
364
RemainingCodebook Length 356
[9, 1, 8, 9]
Adding to i
365
RemainingCodebook Length 355
[4, 3, 5, 3]
Adding to i
Adding to i
Adding to i
366
RemainingCodebook Length 354
[5, 7, 5, 8]
Adding to i
367
RemainingCodebook Length 353
[2, 7, 9, 9]
368
RemainingCodebook Length 352
...

Adding to i
Adding to i
Adding to i
485
RemainingCodebook Length 235
[6, 3, 5, 5]
486
RemainingCodebook Length 234
[8, 5, 3, 7]
Adding to i
Adding to i
Adding to i
Adding to i
487
RemainingCodebook Length 233
[8, 9, 8, 7]
488
RemainingCodebook Length 232
[2, 1, 4, 7]
Adding to i
Adding to i
489
RemainingCodebook Length 231
[4, 1, 7, 3]
Adding to i
Adding to i
Adding to i
Adding to i
490
RemainingCodebook Length 230
[7, 6, 7, 2]
Adding to i
491
RemainingCodebook Length 229
[7, 6, 1, 5]
Adding to i
492
RemainingCodebook Length 228
[9, 3, 7, 1]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
493
RemainingCodebook Length 227
[7, 3, 1, 2]
494
RemainingCodebook Length 226
[2, 2, 5, 9]
495
RemainingCodebook Length 225
[8, 9, 2, 1]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
496
RemainingCodebook Length 224
[8, 2, 6, 7]
Adding to i
Adding to i
Adding to i
Adding to i
...

Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
589
RemainingCodebook Length 131
[8, 7, 4, 1]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
590
RemainingCodebook Length 130
[3, 1, 8, 3]
Adding to i
Adding to i
591
RemainingCodebook Length 129
[9, 1, 4, 5]
592
RemainingCodebook Length 128
[1, 2, 3, 6]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
593
RemainingCodebook Length 127
[1, 2, 7, 1]
Adding to i
594
RemainingCodebook Length 126
[1, 6, 2, 9]
595
RemainingCodebook Length 125
[9, 4, 7, 2]
Adding to i
Adding to i
596
RemainingCodebook Length 124
[9, 8, 7, 6]
Adding to i
Adding to i
Adding to i
597
RemainingCodebook Length 123
[9, 1, 5, 6]
Adding to i
Adding to i
598
RemainingCodebook Length 122
[3, 4, 8, 6]
599
RemainingCodebook Length 121
[2, 3, 5, 1]
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
Adding to i
600
RemainingCodebook Length 120
[9, 2, 3, 5]
Adding to i
Adding to i
601
...
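
To give a feel for what "equalizing" means, here is a small, deterministic sketch of one possible balancing heuristic. It is not the method used by `GenerateCodebook`, whose internals are not shown in this notebook. It assumes a parity scheme in which the last round is derived from the sum of the first three, and it greedily gives each gene (highest expression first) the free barcode whose pseudocolors currently carry the least total expression. All function names are hypothetical.

```python
import itertools

def parity_codewords(pc, rounds=4):
    """All barcodes whose last round is a parity value of the earlier rounds."""
    words = []
    for prefix in itertools.product(range(1, pc + 1), repeat=rounds - 1):
        last = sum(prefix) % pc + 1  # assumed parity rule, mapped to 1..pc
        words.append(prefix + (last,))
    return words

def greedy_equalize(expr, pc, rounds=4):
    """expr: list of (gene, expression) pairs sorted from high to low."""
    free = parity_codewords(pc, rounds)
    if len(expr) > len(free):
        raise ValueError("More genes than available barcodes")
    # load[r][c - 1] is the total expression on pseudocolor c in round r.
    load = [[0.0] * pc for _ in range(rounds)]
    assignment = {}
    for gene, value in expr:
        # Pick the free barcode whose pseudocolors carry the least load so far.
        best = min(free, key=lambda w: sum(load[r][w[r] - 1] for r in range(rounds)))
        free.remove(best)
        for r in range(rounds):
            load[r][best[r] - 1] += value
        assignment[gene] = best
    return assignment, load

# Toy run: 20 genes with expression 20, 19, ..., 1 and 3 pseudocolors.
genes = [(f"g{i}", float(20 - i)) for i in range(20)]
assignment, load = greedy_equalize(genes, pc=3)
for r, row in enumerate(load, start=1):
    print(f"hyb{r}", row)
```

A greedy pass like this is simple to reason about, but it does not guarantee the best possible balance.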

### 3. Checking Results

In [7]:
VerifyCodebook(Channel1Codebook)

Codebook has correct length and no repeats
All genes are unique
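
The two messages above suggest checks for repeated barcodes and repeated genes. A minimal sketch of such checks is given below; it is not the source of `VerifyCodebook`, and it assumes the codebook columns are named `Gene` and `hyb1` to `hyb4`. `check_codebook` is a hypothetical helper.

```python
import pandas as pd

def check_codebook(codebook, hyb_cols=("hyb1", "hyb2", "hyb3", "hyb4")):
    """Report whether barcodes and gene names are each unique."""
    hyb_cols = list(hyb_cols)
    return {
        # A repeated row across the hyb columns means two genes share a barcode.
        "barcodes_unique": not codebook.duplicated(subset=hyb_cols).any(),
        "genes_unique": bool(codebook["Gene"].is_unique),
    }

good = pd.DataFrame(
    {"Gene": ["A", "B"], "hyb1": [1, 2], "hyb2": [1, 1], "hyb3": [3, 3], "hyb4": [2, 2]}
)
# Same barcode given to two genes.
bad = pd.DataFrame(
    {"Gene": ["A", "B"], "hyb1": [1, 1], "hyb2": [1, 1], "hyb3": [3, 3], "hyb4": [2, 2]}
)
print(check_codebook(good))
print(check_codebook(bad))
```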


In [8]:
'''
Inputs
Codebook - Pandas dataframe with the gene name under 'Gene' and the codeword assignment under 'hyb1', 'hyb2', etc.
FPKMS_table - Pandas dataframe containing the genes under 'gene_symbols', and the expression value under 'avg'.
numColors - Integer number of pseudocolors in codebook design.

Returns a pandas dataframe containing the average expression for each hybridization. Columns are each barcoding round, while each row represents the pseudocolor. 
'''

FinalFPKMS = FPKMSforCodebook(Channel1Codebook, Channel1FPKMS, 9)
FinalFPKMS

Unnamed: 0,hyb1,hyb2,hyb3,hyb4
1,8918.809878,8835.880757,8787.741483,8851.762409
2,8861.290372,8804.384146,8808.546926,8808.390003
3,8849.856409,8840.102942,8841.57833,8801.511324
4,8790.818898,8835.340419,8841.273122,8851.774883
5,8811.405771,8826.793243,8892.395211,8753.612601
6,8777.635651,8782.232906,8840.860454,8903.601922
7,8777.798596,8839.656434,8840.68081,8851.267372
8,8819.456964,8837.522527,8789.715166,8800.76653
9,8826.468218,8831.627382,8790.749256,8810.853713
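
A table like the one above can be rebuilt and summarized with a few lines of pandas. The sketch below is not the source of `FPKMSforCodebook`. It assumes the expression on each pseudocolor is aggregated as a sum over the genes assigned to it, and it adds a simple spread metric, (max - min) / mean over all cells, as one way to put a number on how well equalized a codebook is. Both function names are hypothetical.

```python
import pandas as pd

def per_round_expression(codebook, fpkms, hyb_cols):
    """Total expression per pseudocolor (rows) for each barcoding round (columns)."""
    merged = codebook.merge(fpkms, left_on="Gene", right_on="gene_symbols", how="left")
    merged["avg"] = merged["avg"].fillna(0)
    # One groupby per round; the resulting Series align on the pseudocolor index.
    table = pd.DataFrame({hyb: merged.groupby(hyb)["avg"].sum() for hyb in hyb_cols})
    return table.fillna(0)

def relative_spread(table):
    """(max - min) / mean over all cells; 0 means perfectly equalized."""
    values = table.to_numpy()
    return (values.max() - values.min()) / values.mean()

# Toy codebook with two rounds and two pseudocolors.
codebook = pd.DataFrame(
    {"Gene": ["A", "B", "C", "D"], "hyb1": [1, 2, 1, 2], "hyb2": [1, 1, 2, 2]}
)
fpkms = pd.DataFrame(
    {"gene_symbols": ["A", "B", "C", "D"], "avg": [10.0, 20.0, 30.0, 40.0]}
)
table = per_round_expression(codebook, fpkms, ["hyb1", "hyb2"])
spread = relative_spread(table)
print(table)
print(spread)
```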


### 4. Save Codebook

In [9]:
Channel1Codebook.to_csv('put_your_path_here', index=False)

In [None]:
FinalFPKMS.to_csv('your_path')