## Neat Sequencing barcode scheme

How would you tag ~100,000 molecular biology samples with unique barcodes?

One way to do it is to assign each sample a pair of barcodes, $A$ and $B$.

Let's say all the barcodes have to fit on a single 384-well plate. That gives us 192 $A$ barcodes and 192 $B$ barcodes, so the total number of samples we can uniquely barcode is $192 \times 192 = 36{,}864$. That's a pretty good number for a PacBio run, which will provide about 10 times that many consensus reads, giving us roughly 10X coverage in the ideal case.
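The arithmetic above is easy to check in a few lines. This is only a sketch: the variable names are made up for illustration, and the factor of 10 is the rough reads-per-sample figure quoted above, not a measured yield.

```python
import math

n_wells = 384                  # one barcode plate
n_A = n_B = n_wells // 2       # half the wells hold A barcodes, half hold B
capacity = n_A * n_B           # number of unique (A, B) pairs
reads = 10 * capacity          # assumption: ~10 consensus reads per sample

print(n_A, capacity, reads)

# barcodes needed per side to cover the ~100,000 samples from the opening question
target = 100_000
per_side = math.isqrt(target - 1) + 1   # integer ceil(sqrt(target))
print(per_side, per_side ** 2)
```

So a single 384-well barcode plate covers a bit over a third of the ~100,000 samples; getting all the way there with the same pairing scheme would take 317 barcodes per side.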

In [64]:
import numpy as np
import pandas
from itertools import product 

In [65]:
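# 192 barcodes of each kind, named A000..A191 and B000..B191
barcodes_A = [ 'A{0:03d}'.format( i ) for i in range( 192 ) ]
barcodes_B = [ 'B{0:03d}'.format( i ) for i in range( 192 ) ]
# product() gives a lazy, one-shot iterator over all 36,864 (A, B) pairs
barcodes = product( barcodes_A, barcodes_B )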
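One thing worth knowing about `product`: it returns an iterator, not a list, so every pair we pull off it is gone for good. Here is a small sketch with a hypothetical 3 barcodes per side (instead of 192) that shows the order the pairs come out in and how the iterator keeps its place.

```python
from itertools import islice, product

# toy example: 3 barcodes per side instead of 192
small_A = ['A{0:03d}'.format(i) for i in range(3)]
small_B = ['B{0:03d}'.format(i) for i in range(3)]
pairs = product(small_A, small_B)

# B cycles fastest: A000 is paired with every B before A001 shows up
first_four = list(islice(pairs, 4))
print(first_four)

# the iterator remembers where it stopped, so only 9 - 4 = 5 pairs are left
remaining = list(pairs)
print(len(remaining))
```

That one-shot behaviour is what lets us hand out pairs with `next()` and never give the same pair to two samples.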

Plates come in with a variable number of samples. Here, we assume that every plate has at least 5 samples and that they're all 384-well plates, because those are the only kind we sell ;)

In [66]:
n_plates = 30
# randint excludes the upper bound, so use 385 to allow a completely full 384-well plate
plates = [ range( np.random.randint( 5, 385 ) ) for _ in range( n_plates ) ]
plates

[range(0, 92),
 range(0, 345),
 range(0, 215),
 range(0, 354),
 range(0, 63),
 range(0, 273),
 range(0, 121),
 range(0, 69),
 range(0, 63),
 range(0, 122),
 range(0, 290),
 range(0, 375),
 range(0, 167),
 range(0, 201),
 range(0, 57),
 range(0, 229),
 range(0, 200),
 range(0, 14),
 range(0, 219),
 range(0, 89),
 range(0, 47),
 range(0, 63),
 range(0, 325),
 range(0, 58),
 range(0, 218),
 range(0, 186),
 range(0, 296),
 range(0, 24),
 range(0, 313),
 range(0, 267)]
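Since the plate sizes are random, they change on every run. A minimal sketch of a reproducible version, assuming NumPy's `default_rng` generator and an arbitrary seed of 0, which also checks that the samples fit within the 36,864 available barcode pairs:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed 0 is an arbitrary choice
n_plates = 30

# integers() excludes the upper bound, so 385 allows a full 384-well plate
plate_sizes = rng.integers(5, 385, size=n_plates)

total_samples = int(plate_sizes.sum())
capacity = 192 * 192
print(total_samples, total_samples <= capacity)
```

With 30 plates of at most 384 samples each, the total can never exceed 11,520, so we're comfortably inside the barcode budget here.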

In [67]:
df = pandas.DataFrame(None, columns=['plate_index', 'well_index', 'barcode_A', 'barcode_B'])
counter = 0
for plate_index, plate in enumerate( plates ):
    for well_index in plate:
        # each sample takes the next unused (A, B) pair from the iterator
        barcode_A, barcode_B = next( barcodes )
        df.loc[ counter ] = ( plate_index, well_index, barcode_A, barcode_B )
        counter += 1
df

   plate_index  well_index barcode_A barcode_B
0          0.0         0.0      A000      B000
1          0.0         1.0      A000      B001
2          0.0         2.0      A000      B002
3          0.0         3.0      A000      B003
4          0.0         4.0      A000      B004
5          0.0         5.0      A000      B005
6          0.0         6.0      A000      B006
7          0.0         7.0      A000      B007
8          0.0         8.0      A000      B008
9          0.0         9.0      A000      B009
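Filling a DataFrame one row at a time with `df.loc` gets slow as the table grows, and it's also why the index columns show up as floats. Here's a sketch of an alternative that collects plain tuples first and builds the frame in one go. The three plate sizes are just the first three from the list above, and `records` and `layout` are names made up for this example.

```python
from itertools import product

import pandas

barcodes_A = ['A{0:03d}'.format(i) for i in range(192)]
barcodes_B = ['B{0:03d}'.format(i) for i in range(192)]
plates = [range(92), range(345), range(215)]   # first three plate sizes from above

pairs = product(barcodes_A, barcodes_B)

# one tuple per sample; next(pairs) hands out each (A, B) pair exactly once
records = [
    (plate_index, well_index, *next(pairs))
    for plate_index, plate in enumerate(plates)
    for well_index in plate
]

layout = pandas.DataFrame(
    records, columns=['plate_index', 'well_index', 'barcode_A', 'barcode_B']
)
print(layout.dtypes)
print(layout.tail(3))
```

Built this way, `plate_index` and `well_index` stay integers, and a quick `layout.duplicated(['barcode_A', 'barcode_B']).any()` confirms that no pair got used twice.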


In [73]:
# do all the pipet steps for A
# TODO: group by plate first! well_index alone doesn't say which plate the well is on

# the default sort isn't stable, so wells within one barcode can come out of order
sorted_by_A = df.sort_values( by='barcode_A' )

for i, row in sorted_by_A.iterrows():
    print( 'Transfer 1.0 µL from barcode_plate {} to well {}'.format( row.barcode_A, row.well_index ) )

# do all the pipet steps for B

sorted_by_B = df.sort_values( by='barcode_B' )

for i, row in sorted_by_B.iterrows():
    pass # TODO: do the pipet

Transfer 1.0 µL from barcode_plate A000 to well 0.0
Transfer 1.0 µL from barcode_plate A000 to well 30.0
Transfer 1.0 µL from barcode_plate A000 to well 31.0
Transfer 1.0 µL from barcode_plate A000 to well 32.0
Transfer 1.0 µL from barcode_plate A000 to well 33.0
Transfer 1.0 µL from barcode_plate A000 to well 34.0
Transfer 1.0 µL from barcode_plate A000 to well 35.0
Transfer 1.0 µL from barcode_plate A000 to well 36.0
Transfer 1.0 µL from barcode_plate A000 to well 37.0
Transfer 1.0 µL from barcode_plate A000 to well 38.0
Transfer 1.0 µL from barcode_plate A000 to well 29.0
Transfer 1.0 µL from barcode_plate A000 to well 39.0
Transfer 1.0 µL from barcode_plate A000 to well 41.0
Transfer 1.0 µL from barcode_plate A000 to well 42.0
Transfer 1.0 µL from barcode_plate A000 to well 43.0
Transfer 1.0 µL from barcode_plate A000 to well 44.0
Transfer 1.0 µL from barcode_plate A000 to well 45.0
Transfer 1.0 µL from barcode_plate A000 to well 46.0
Transfer 1.0 µL from barcode_plate A000 to well