# Ligation and Digestion ECHO Dispense
### Designing gRNA combinations for dispensing via ECHO

<b>General notebook overview:</b>
* The notebook takes a set of ART recommendations as gRNA arrays
* Each gRNA array is split into gRNA units
* Each gRNA unit is assigned prefix and suffix linkers according to its position in the array
* The number of prefix/suffix linkers (max=4) is then translated into a dictionary, which serves as the ECHO Source plate

<b>Requirements:</b>
* ART recommendations
* Stock plasmid concentrations

<b>Outputs: </b>
- .csv files for the ECHO to do dispensing
- .csv for cryostock inoculation
- .csv description for the source plate
- .csv description for the destination plate, which may then be used in the Assemblies workbook

#### Import libraries and relevant files

* filtered_recs_for_dbtl{cycle}.csv, the filtered ART recommendations
* IY_Library.csv, the plasmid library used to assign Stock No.

In [1]:
# Only the libraries used in this notebook
import os
import string
from datetime import datetime

import numpy as np
import pandas as pd

Import .csv with list of new combinations, sort the values, and then reindex the sorted values

Set "cycle" so that output files are named for the appropriate cycle (e.g., DBTL4, DBTL5)

In [2]:
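cycle = 6
filename = f'filtered_recs_for_dbtl{cycle}.csv'

data = pd.read_csv(f"Recommendations/{filename}")
dispense_path = f"dbtl{cycle}"
os.makedirs(dispense_path, exist_ok=True)  # make sure the output folder exists
timestamp = datetime.now().strftime("%Y%m%d")

# Sort by number of gRNAs (descending), breaking ties by line name (ascending).
# A single multi-column sort keeps both orderings; two successive default sorts do not.
df = data.sort_values(['number_of_grna', 'line_name'], ascending=[False, True])
df = df.reset_index(drop=True)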

#### Import IY Plasmid library and then make the library a dictionary for ease of cryostock inoculation (Stock No.)
* Make a dictionary for the 384-well plate format

In [3]:
IY_Library=pd.read_csv('IY_Library.csv')
lookup_dict = dict(zip(IY_Library['Gene'], IY_Library['Stock No.']))

In [4]:
# Map 1-based well indices to 384-well names in row-major order (1 -> A1, 25 -> B1, ..., 384 -> P24)
rows = list(string.ascii_uppercase[:16])  # 'A' to 'P'
wells = [f"{row}{col}" for row in rows for col in range(1, 25)]
df384_translate = {i: wells[i-1] for i in range(1, 385)}
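
The comprehension above lays wells out row by row, so index i falls in row (i - 1) // 24 and column (i - 1) % 24 + 1. A minimal sketch of the same arithmetic as a standalone function, which can be handy for spot-checking a single well; `index_to_well` is a hypothetical helper, not part of the notebook:

```python
import string

def index_to_well(i: int, n_rows: int = 16, n_cols: int = 24) -> str:
    """Convert a 1-based, row-major well index to a well name (1 -> 'A1', 25 -> 'B1')."""
    if not 1 <= i <= n_rows * n_cols:
        raise ValueError(f"Index {i} is outside a {n_rows} x {n_cols} plate")
    row, col = divmod(i - 1, n_cols)  # zero-based row and column
    return f"{string.ascii_uppercase[row]}{col + 1}"

print(index_to_well(25), index_to_well(384))  # B1 P24
```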

#### Making the initial assembly file
* Split the recommendation line names into their constituent guides based on their position in the string
* Truncate the DataFrame to include only the columns needed in this notebook
* Save a .csv for the ECHO Assemblies

In [5]:
# Split a line name such as 'PP_0368_PP_0528_PP_0751_PP_0815' into its locus tags, one per gRNA position
def split_guides(line_name):
    split_data = line_name.split('_')
    return pd.Series([f"{split_data[i]}_{split_data[i+1]}" for i in range(0, len(split_data), 2)])

# Apply the function to df and name the columns; arrays with fewer than four guides are padded with NaN
df[['gRNA_1', 'gRNA_2', 'gRNA_3', 'gRNA_4']] = df['line_name'].apply(split_guides)

# Keep only the guide columns (drops line names, isoprenol predictions, etc.)
df_truncated = df[['number_of_grna', 'gRNA_1', 'gRNA_2', 'gRNA_3', 'gRNA_4']]

df_truncated.to_csv(os.path.join(dispense_path, f'{timestamp}_Targets_DBTL{cycle}.csv'), index=False)
df_truncated.head()

|  | number_of_grna | gRNA_1 | gRNA_2 | gRNA_3 | gRNA_4 |
|---|---|---|---|---|---|
| 0 | 4 | PP_0368 | PP_0528 | PP_0751 | PP_0815 |
| 1 | 4 | PP_0368 | PP_0814 | PP_0815 | PP_1769 |
| 2 | 4 | PP_0812 | PP_0813 | PP_0815 | PP_1506 |
| 3 | 4 | PP_0751 | PP_0814 | PP_0815 | PP_1769 |
| 4 | 4 | PP_0751 | PP_0814 | PP_0815 | PP_1506 |
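
`split_guides` relies on pandas to pad arrays with fewer than four guides when the per-row Series are combined. The sketch below makes that padding explicit and rejects malformed names. It assumes, as in the outputs shown here, that every locus tag contains exactly one underscore (e.g., PP_0368); `split_line_name` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd

def split_line_name(line_name: str, n_positions: int = 4) -> list:
    """Split 'PP_0368_PP_0528' into ['PP_0368', 'PP_0528', None, None]."""
    tokens = line_name.split('_')
    # Assumes each locus tag is exactly two '_'-separated tokens, e.g. 'PP' + '0368'
    if len(tokens) % 2 != 0:
        raise ValueError(f"Cannot split line name {line_name!r} into locus tags")
    guides = ['_'.join(tokens[i:i + 2]) for i in range(0, len(tokens), 2)]
    if len(guides) > n_positions:
        raise ValueError(f"{line_name!r} has more than {n_positions} guides")
    # Pad shorter arrays so every row has one entry per position
    return guides + [None] * (n_positions - len(guides))

names = pd.Series(['PP_0368_PP_0528_PP_0751_PP_0815', 'PP_0812_PP_0813'])
guides = pd.DataFrame(names.map(split_line_name).tolist(),
                      columns=['gRNA_1', 'gRNA_2', 'gRNA_3', 'gRNA_4'])
print(guides)
```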


#### Dispense Library for Ligation and Digestion

* Parse the line names into different gRNAs based on their position
* Record how many times each gRNA_Target is used at each position, in case a target would exceed the available volume in the downstream ECHO Assemblies (this would require more than 35 uses)

In [6]:
# Function to get unique values and their counts from a column
def get_unique_counts(column):
    unique_values = column.value_counts().reset_index()
    unique_values.columns = ['gRNA_Target', 'Count']
    return unique_values

# Initialize an empty DataFrame to store the results
result_df = pd.DataFrame()

# Process each column
for col in df_truncated.columns:
    if col != 'number_of_grna':
        unique_counts_df = get_unique_counts(df_truncated[col])
        unique_counts_df['Position'] = col
        result_df = pd.concat([result_df, unique_counts_df])

# Reset index for the final result
result_df=result_df.reset_index(drop=True, inplace=False)
result_df.head()

|  | gRNA_Target | Count | Position |
|---|---|---|---|
| 0 | PP_0368 | 17 | gRNA_1 |
| 1 | PP_0751 | 12 | gRNA_1 |
| 2 | PP_0528 | 10 | gRNA_1 |
| 3 | PP_0815 | 10 | gRNA_1 |
| 4 | PP_0812 | 4 | gRNA_1 |


* Count how many gRNA positions are in use; this sets how many vectors are added in the next step

In [7]:
def count_unique_gRNA_positions(df, Position):
    """Print and count the distinct values in the given position column."""
    unique_positions = df[Position].unique()
    print(f"Unique gRNA positions: {unique_positions}")
    return len(unique_positions)

# Count the unique gRNA positions in the 'Position' column
unique_gRNA_positions = count_unique_gRNA_positions(result_df, 'Position')
print(f"Number of unique gRNA positions: {unique_gRNA_positions}")

Unique gRNA positions: ['gRNA_1' 'gRNA_2' 'gRNA_3' 'gRNA_4']
Number of unique gRNA positions: 4


In [8]:
# Add one vector entry per array size: Vector_for_2, Vector_for_3, Vector_for_4
new_entries = [f'Vector_for_{i+1}' for i in range(1, unique_gRNA_positions)]
new_df = pd.DataFrame({'gRNA_Target': 'Vector', 'Count': 'Variable', 'Position': new_entries})

# Concatenate the original DataFrame with the new DataFrame
result_df2 = pd.concat([result_df, new_df], ignore_index=True)

#### Associating prefix and suffix linkers based on the position of the guide

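- Position 1: prefix 1P, suffix 2S
- Position 2: prefix 2P, suffix 3S
- Position 3: prefix 3P, suffix 5S
- Position 4: prefix 5P, suffix 6S
- Arrays of 2, 3, and 4 gRNAs each need their own vector, so three vectors are used:
    - Vector_for_2: prefix 1S, suffix 3P
    - Vector_for_3: prefix 1S, suffix 5P
    - Vector_for_4: prefix 1S, suffix 6P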
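
Reading the list above, the suffix of one position shares its number with the prefix of the next (2S/2P, 3S/3P, 5S/5P), and each vector closes the array from 1S back to the suffix number of the last guide. The sketch below checks that reading mechanically. The pairing rule (every linker number appears once as a prefix end, P, and once as a suffix end, S) is our interpretation of the table, not something the notebook states; `LINKERS`, `assembly_parts`, and `linkers_are_paired` are illustrative names:

```python
from collections import Counter

# Linkers per position, copied from the list above
LINKERS = {
    'gRNA_1': ('1P', '2S'),
    'gRNA_2': ('2P', '3S'),
    'gRNA_3': ('3P', '5S'),
    'gRNA_4': ('5P', '6S'),
    'Vector_for_2': ('1S', '3P'),
    'Vector_for_3': ('1S', '5P'),
    'Vector_for_4': ('1S', '6P'),
}

def assembly_parts(n_grna):
    """Positions combined in one n-gRNA array: the n guides plus the matching vector."""
    return [f'gRNA_{i}' for i in range(1, n_grna + 1)] + [f'Vector_for_{n_grna}']

def linkers_are_paired(n_grna, linkers=LINKERS):
    """Assumed rule: each linker number occurs exactly once as P and once as S."""
    ends = Counter(end for part in assembly_parts(n_grna) for end in linkers[part])
    numbers = {end[:-1] for end in ends}  # '3P' -> '3'
    return all(ends[f'{k}P'] == 1 and ends[f'{k}S'] == 1 for k in numbers)

for n in (2, 3, 4):
    print(n, linkers_are_paired(n))
```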

In [9]:
# Prefix and suffix linkers for each gRNA position and vector
LINKERS_BY_POSITION = {
    'gRNA_1': ('1P', '2S'),
    'gRNA_2': ('2P', '3S'),
    'gRNA_3': ('3P', '5S'),
    'gRNA_4': ('5P', '6S'),
    'Vector_for_2': ('1S', '3P'),
    'Vector_for_3': ('1S', '5P'),
    'Vector_for_4': ('1S', '6P'),
}

def add_prefix_suffix(row):
    """Add the Prefix and Suffix linkers for the row's position."""
    if row['Position'] not in LINKERS_BY_POSITION:
        raise ValueError(f"ERROR: Position has an invalid value: {row['Position']}")
    row['Prefix'], row['Suffix'] = LINKERS_BY_POSITION[row['Position']]
    return row

# Apply the function to each row in result_df2
dest_df = result_df2.apply(add_prefix_suffix, axis=1)
dest_df

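|  | gRNA_Target | Count | Position | Prefix | Suffix |
|---|---|---|---|---|---|
| 0 | PP_0368 | 17 | gRNA_1 | 1P | 2S |
| 1 | PP_0751 | 12 | gRNA_1 | 1P | 2S |
| 2 | PP_0528 | 10 | gRNA_1 | 1P | 2S |
| 3 | PP_0815 | 10 | gRNA_1 | 1P | 2S |
| 4 | PP_0812 | 4 | gRNA_1 | 1P | 2S |
| 5 | PP_0813 | 4 | gRNA_1 | 1P | 2S |
| 6 | PP_0814 | 2 | gRNA_1 | 1P | 2S |
| 7 | PP_1506 | 1 | gRNA_1 | 1P | 2S |
| 8 | PP_0815 | 18 | gRNA_2 | 2P | 3S |
| 9 | PP_0751 | 9 | gRNA_2 | 2P | 3S |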


#### Accounting for dispenses of prefix or suffix linkers that exceed the maximum volume of the 384PP ECHO plate
* Relabel linkers used more than max_dispenses times by appending a well index: with max_dispenses = 8, uses 9 to 16 get "_2", uses 17 to 24 get "_3", and so on

In [10]:
max_dispenses = 8

def rename_values_after_threshold(dest_df, column, max_dispenses):
    """Relabel `column` in place: uses 1 to max_dispenses keep the original label,
    the next max_dispenses uses get '_2', the next '_3', and so on."""
    value_counts = dest_df[column].value_counts()
    # Running use count and current well index, only for labels that need more than one well
    renamed_counts = {value: 0 for value in value_counts.index if value_counts[value] > max_dispenses}
    threshold_crosses = {value: 1 for value in value_counts.index if value_counts[value] > max_dispenses}

    for index, row in dest_df.iterrows():
        value = row[column]
        if value in renamed_counts:
            renamed_counts[value] += 1
            # Move to the next well once the current one has served max_dispenses uses
            if renamed_counts[value] > max_dispenses * threshold_crosses[value]:
                threshold_crosses[value] += 1
            if renamed_counts[value] > max_dispenses:
                threshold_count = threshold_crosses[value]
                dest_df.at[index, column] = f"{value}_{threshold_count}"

# Apply renaming to both Prefix and Suffix columns after max_dispenses
rename_values_after_threshold(dest_df, 'Prefix', max_dispenses)
rename_values_after_threshold(dest_df, 'Suffix', max_dispenses)
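
The loop above can also be expressed with `groupby().cumcount()`, which numbers each label's uses in row order. A minimal vectorized sketch, assuming the same rule (uses 1 to max_dispenses unchanged, then "_2", "_3", ...); `relabel_by_well` is a hypothetical name, and unlike the original it returns a new Series instead of editing `dest_df` in place:

```python
import pandas as pd

def relabel_by_well(labels: pd.Series, max_dispenses: int = 8) -> pd.Series:
    """Return labels with '_2', '_3', ... appended after every max_dispenses uses."""
    nth_use = labels.groupby(labels).cumcount()   # 0-based use number per label
    well_index = nth_use // max_dispenses + 1      # 1 for the first well, 2 for the next, ...
    return labels.where(well_index == 1, labels + '_' + well_index.astype(str))

prefixes = pd.Series(['1P'] * 10 + ['2P'] * 3)
print(relabel_by_well(prefixes).tolist())
```

Applied to this notebook, the equivalent call would be `dest_df['Prefix'] = relabel_by_well(dest_df['Prefix'], max_dispenses)`.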

##### First template for the ECHO mapping

In [11]:
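# Look up each target's IY Stock No. (placeholder) and reorder the columns
dest_df['Placeholder'] = dest_df['gRNA_Target'].map(lookup_dict)
new_order = ['Placeholder', 'gRNA_Target', 'Count', 'Position', 'Prefix', 'Suffix']
dest_df = dest_df[new_order].copy()  # copy so later column assignments do not raise SettingWithCopyWarning

dest_df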

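|  | Placeholder | gRNA_Target | Count | Position | Prefix | Suffix |
|---|---|---|---|---|---|---|
| 0 | IY1529 | PP_0368 | 17 | gRNA_1 | 1P | 2S |
| 1 | IY2131 | PP_0751 | 12 | gRNA_1 | 1P | 2S |
| 2 | IY1608 | PP_0528 | 10 | gRNA_1 | 1P | 2S |
| 3 | IY2099 | PP_0815 | 10 | gRNA_1 | 1P | 2S |
| 4 | IY2096 | PP_0812 | 4 | gRNA_1 | 1P | 2S |
| 5 | IY2097 | PP_0813 | 4 | gRNA_1 | 1P | 2S |
| 6 | IY2098 | PP_0814 | 2 | gRNA_1 | 1P | 2S |
| 7 | IY2113 | PP_1506 | 1 | gRNA_1 | 1P | 2S |
| 8 | IY2099 | PP_0815 | 18 | gRNA_2 | 2P | 3S |
| 9 | IY2131 | PP_0751 | 9 | gRNA_2 | 2P | 3S |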


### Diluting plasmids

* Plasmid concentrations vary after miniprepping. Plasmids above 100 ng/uL are diluted with water to 100 ng/uL in a 4.0 uL volume; plasmids at or below 100 ng/uL are dispensed undiluted at 4.0 uL
* Add the dispense volumes to the DataFrame

In [12]:
# Generate a .csv file based on the gRNAs used 
unique_values = dest_df['gRNA_Target'].unique()
unique_df = pd.DataFrame(unique_values, columns=['gRNA_Target'])
unique_df.to_csv(os.path.join(dispense_path, f"DBTL{cycle}_gRNAs.csv"), index=False)

In [13]:
# Cryostock inoculation list: each gRNA target with its IY Stock No., sorted by Stock No.
unique_df2 = unique_df.copy()
unique_df2['Placeholder'] = unique_df2['gRNA_Target'].map(lookup_dict)
unique_df2 = unique_df2.sort_values(by='Placeholder')
unique_df2.to_csv(os.path.join(dispense_path, f"cryostock_inoculation_dbtl{cycle}.csv"), index=False)

##### User input plasmid concentrations via .csv file

* Import the plasmid concentrations from a .csv populated with nanodrop values (columns gRNA_Target and Concentration, in ng/uL)
* When prompted, enter the file name including the .csv extension

In [14]:
def read_plasmid_concentrations():
    """Prompt for the nanodrop .csv and return it as a DataFrame."""
    file_name = input("Please enter the .csv file name (with extension; e.g., dbtl6_plasmid_conc.csv): ")
    # pd.read_csv raises (e.g., FileNotFoundError) if the file cannot be read,
    # which stops the notebook here instead of failing in a later cell
    plasmid_conc = pd.read_csv(file_name)
    print("File read successfully!")
    print(plasmid_conc.head())  # Display the first few rows of the DataFrame
    return plasmid_conc

plasmid_conc = read_plasmid_concentrations()

Please enter the .csv file name (with extension; e.g., dbtl6_plasmid_conc.csv):  dbtl6_plasmid_conc.csv


File read successfully!
  gRNA_Target  Concentration
0     PP_0368            350
1     PP_0751            200
2     PP_0528            125
3     PP_0815            115
4     PP_0812            200


In [15]:
plasmid_df=pd.DataFrame(plasmid_conc, columns=['gRNA_Target', 'Concentration'])
plasmid_df.head()

|  | gRNA_Target | Concentration |
|---|---|---|
| 0 | PP_0368 | 350 |
| 1 | PP_0751 | 200 |
| 2 | PP_0528 | 125 |
| 3 | PP_0815 | 115 |
| 4 | PP_0812 | 200 |


##### Adding water to specified plasmid wells based on concentration provided

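* From each plasmid concentration (ng/uL), determine the volumes of plasmid and water needed for dilution
* The combined plasmid and water volume is 4 uL
* Each plasmid is delivered at 100 ng/uL (400 ng per reaction), or as 4 uL undiluted if its stock is at or below 100 ng/uL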
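
The dilution computed in the next cell amounts to the following, with $V_{\text{total}} = 4$ uL and $C_{\text{target}} = 100$ ng/uL:

$$
V_{\text{plasmid}} = \min\left(V_{\text{total}},\; V_{\text{total}} \cdot \frac{C_{\text{target}}}{C_{\text{stock}}}\right), \qquad V_{\text{water}} = V_{\text{total}} - V_{\text{plasmid}}
$$

For example, PP_0368 at 350 ng/uL gives $4 \cdot 100 / 350 \approx 1.14$, rounded to 1.1 uL of plasmid and 2.9 uL of water, which matches the volumes listed for PP_0368 in the destination table below.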

In [16]:
target_concentration = 100  # Target concentration in ng/uL
target_volume = 4  # Total plasmid + water volume in uL

# Stock volume that gives target_concentration in target_volume; stocks at or below
# the target concentration are capped at the full target_volume (no water added)
plasmid_df['plasmid_volume'] = (target_volume * target_concentration / plasmid_df['Concentration']).clip(upper=target_volume)

# Round volumes to 1 decimal place (0.1 uL) for clarity
plasmid_df['plasmid_volume'] = plasmid_df['plasmid_volume'].round(1)
plasmid_df['water_volume'] = (target_volume - plasmid_df['plasmid_volume']).round(1)
plasmid_df.tail()

|  | gRNA_Target | Concentration | plasmid_volume | water_volume |
|---|---|---|---|---|
| 10 | PP_4191 | 370 | 1.1 | 2.9 |
| 11 | PP_4192 | 235 | 1.7 | 2.3 |
| 12 | PP_4120 | 100 | 4.0 | 0.0 |
| 13 | PP_4189 | 100 | 4.0 | 0.0 |
| 14 | Vector | 150 | 2.7 | 1.3 |


##### Map to ECHO working DataFrame
- Match the plasmid and water volumes to their associated gRNA_Target
- Display the full destination DataFrame with the plasmid and water volumes (uL) for each reaction

In [17]:
plasmid_dict = plasmid_df.set_index('gRNA_Target')['plasmid_volume'].to_dict()
water_dict = plasmid_df.set_index('gRNA_Target')['water_volume'].to_dict()

dest_df['plasmid_volume'] = dest_df['gRNA_Target'].map(plasmid_dict)
dest_df['water_volume'] = dest_df['gRNA_Target'].map(water_dict)
dest_df



|  | Placeholder | gRNA_Target | Count | Position | Prefix | Suffix | plasmid_volume | water_volume |
|---|---|---|---|---|---|---|---|---|
| 0 | IY1529 | PP_0368 | 17 | gRNA_1 | 1P | 2S | 1.1 | 2.9 |
| 1 | IY2131 | PP_0751 | 12 | gRNA_1 | 1P | 2S | 2.0 | 2.0 |
| 2 | IY1608 | PP_0528 | 10 | gRNA_1 | 1P | 2S | 3.2 | 0.8 |
| 3 | IY2099 | PP_0815 | 10 | gRNA_1 | 1P | 2S | 3.5 | 0.5 |
| 4 | IY2096 | PP_0812 | 4 | gRNA_1 | 1P | 2S | 2.0 | 2.0 |
| 5 | IY2097 | PP_0813 | 4 | gRNA_1 | 1P | 2S | 4.0 | 0.0 |
| 6 | IY2098 | PP_0814 | 2 | gRNA_1 | 1P | 2S | 2.3 | 1.7 |
| 7 | IY2113 | PP_1506 | 1 | gRNA_1 | 1P | 2S | 1.0 | 3.0 |
| 8 | IY2099 | PP_0815 | 18 | gRNA_2 | 2P | 3S | 3.5 | 0.5 |
| 9 | IY2131 | PP_0751 | 9 | gRNA_2 | 2P | 3S | 2.0 | 2.0 |


##### Accounting for the amount of water and plasmid to make sure we have enough

* The maximum fill volume of an ECHO 384PP well is 65 uL
* Once more than 40 uL has been drawn from a water well, another water well is needed

In [18]:
# Total water (uL) to be dispensed across all destination wells; non-numeric entries are ignored
water_volumes = pd.to_numeric(dest_df['water_volume'], errors='coerce').dropna()
sum_water = np.ceil(water_volumes.sum())
print(f"Total amount of water is: {sum_water}")

Total amount of water is: 63.0


### Developing the ECHO Source Plate based on plasmid concentrations, linkers, and guides

- Input concentrations of plasmid
- Assume the following reaction:

| Reagent | Volume (uL) |
|:---|:---|
| Suffix | 5.0 |
| Prefix | 5.0 |
| T4 Ligase | 0.5 |
| BsaI v2 | 0.5 |
| Plasmid | 4.0 |
| Water | 12.0 |
| T4 ligase buffer | 3.0 |

    
- This will require more than one well for:
    - Some prefix/suffixes
    - Water

- Briefly check that the vectors are labeled correctly with prefixes/suffixes

In [19]:
dest_df

|  | Placeholder | gRNA_Target | Count | Position | Prefix | Suffix | plasmid_volume | water_volume |
|---|---|---|---|---|---|---|---|---|
| 0 | IY1529 | PP_0368 | 17 | gRNA_1 | 1P | 2S | 1.1 | 2.9 |
| 1 | IY2131 | PP_0751 | 12 | gRNA_1 | 1P | 2S | 2.0 | 2.0 |
| 2 | IY1608 | PP_0528 | 10 | gRNA_1 | 1P | 2S | 3.2 | 0.8 |
| 3 | IY2099 | PP_0815 | 10 | gRNA_1 | 1P | 2S | 3.5 | 0.5 |
| 4 | IY2096 | PP_0812 | 4 | gRNA_1 | 1P | 2S | 2.0 | 2.0 |
| 5 | IY2097 | PP_0813 | 4 | gRNA_1 | 1P | 2S | 4.0 | 0.0 |
| 6 | IY2098 | PP_0814 | 2 | gRNA_1 | 1P | 2S | 2.3 | 1.7 |
| 7 | IY2113 | PP_1506 | 1 | gRNA_1 | 1P | 2S | 1.0 | 3.0 |
| 8 | IY2099 | PP_0815 | 18 | gRNA_2 | 2P | 3S | 3.5 | 0.5 |
| 9 | IY2131 | PP_0751 | 9 | gRNA_2 | 2P | 3S | 2.0 | 2.0 |


- From the working destination:
    - Extract guides and their positions for assembly down the line

In [20]:
# Label each guide with its array position, e.g. PP_0368 at gRNA_1 -> PP_0368_1
assembly_df = dest_df[['gRNA_Target', 'Position']].copy()
position_suffix = assembly_df['Position'].str[-1:]  # last character of 'Position'
assembly_df['gRNA_Target'] = assembly_df['gRNA_Target'] + '_' + position_suffix

# 'Position' is no longer needed
assembly_df = assembly_df.drop(columns=['Position'])
assembly_df.head()

|  | gRNA_Target |
|---|---|
| 0 | PP_0368_1 |
| 1 | PP_0751_1 |
| 2 | PP_0528_1 |
| 3 | PP_0815_1 |
| 4 | PP_0812_1 |


- Flatten the gRNA_Target, Prefix, and Suffix columns of the working destination into one list of unique source reagents

In [21]:
all_values = dest_df[['gRNA_Target', 'Prefix', 'Suffix']].values.ravel()
unique_values = pd.unique(all_values)

# Create a new DataFrame from the list of unique values
source_df = pd.DataFrame({'Source': unique_values})
source_df

|  | Source |
|---|---|
| 0 | PP_0368 |
| 1 | 1P |
| 2 | 2S |
| 3 | PP_0751 |
| 4 | PP_0528 |
| 5 | PP_0815 |
| 6 | PP_0812 |
| 7 | PP_0813 |
| 8 | PP_0814 |
| 9 | PP_1506 |


#### Add extra prefix and suffix wells if dispense volume is above a given threshold
- Each reaction uses 5 uL of prefix and 5 uL of suffix linker, so one linker source well serves at most 8 dispenses (40 uL). If a linker is dispensed more often, make a new source well for it.
- Get the value_counts of the linker labels to determine whether any is above this threshold
- Each plasmid is used at most once per position, so it never needs more than 4 x 4 uL = 16 uL; this check is therefore needed only for the linkers
- Label the additional wells "_2", "_3", etc.
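
With a limit of 8 dispenses per linker well, a linker used n times needs ceil(n / 8) source wells, i.e. (n - 1) // 8 extra wells beyond the first. A small sketch of that arithmetic; `extra_linker_wells` and `linker_well_labels` are hypothetical helpers, and the labels follow the same "_2", "_3" scheme as the relabelling step above:

```python
def extra_linker_wells(n_uses: int, max_dispenses: int = 8) -> int:
    """Extra source wells, beyond the first, for a linker dispensed n_uses times."""
    if n_uses <= 0:
        return 0
    return (n_uses - 1) // max_dispenses

def linker_well_labels(name: str, n_uses: int, max_dispenses: int = 8) -> list:
    """Source labels for one linker: name, name_2, name_3, ..."""
    n_extra = extra_linker_wells(n_uses, max_dispenses)
    return [name] + [f"{name}_{k}" for k in range(2, n_extra + 2)]

print(linker_well_labels('2P', 18))  # ['2P', '2P_2', '2P_3']
```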

In [22]:
value_counts = dest_df['Prefix'].value_counts()

# Linker threshold: 8 dispenses x 5 uL = 40 uL per source well, which stays within
# the 65 uL maximum fill after accounting for 15 uL dead volume
max_dispenses = 8

# Initialize an empty list to store new rows
new_rows_list = []

# Add one extra source well (_2, _3, ...) for each further block of max_dispenses uses.
# The relabelling step above already caps each prefix label at max_dispenses uses,
# so this normally adds no rows and acts as a safeguard.
for value, count in value_counts.items():
    if count > max_dispenses:
        additional_rows = (count - 1) // max_dispenses
        for k in range(2, additional_rows + 2):
            new_rows_list.append({source_df.columns[0]: f"{value}_{k}"})

# Convert the list of new rows to a DataFrame and add them to the source plate
new_rows_df = pd.DataFrame(new_rows_list, columns=source_df.columns)
source_df2 = pd.concat([source_df, new_rows_df], ignore_index=True)
source_df2

|  | Source |
|---|---|
| 0 | PP_0368 |
| 1 | 1P |
| 2 | 2S |
| 3 | PP_0751 |
| 4 | PP_0528 |
| 5 | PP_0815 |
| 6 | PP_0812 |
| 7 | PP_0813 |
| 8 | PP_0814 |
| 9 | PP_1506 |


#### Add extra water rows if dispense volume is above a given threshold

- Label the water wells "Water_1", "Water_2", etc.

In [23]:
# Number of water source wells, using the same 40 uL per-well limit as the
# water-well assignment further below (65 uL maximum fill)
water_per_well = 40
water_wells = int(np.ceil(sum_water / water_per_well))

# One source row per water well: Water_1, Water_2, ...
water_df = pd.DataFrame({'Source': [f"Water_{i+1}" for i in range(water_wells)]})
source_df3 = pd.concat([source_df2, water_df], ignore_index=True)
source_df3

|  | Source |
|---|---|
| 0 | PP_0368 |
| 1 | 1P |
| 2 | 2S |
| 3 | PP_0751 |
| 4 | PP_0528 |
| 5 | PP_0815 |
| 6 | PP_0812 |
| 7 | PP_0813 |
| 8 | PP_0814 |
| 9 | PP_1506 |


#### Curation
* Reorder the source plate so that sources are alphabetical
* Map the full source plate (ECHO_source) to a 384-well plate, written as a dictionary
* Inspect the ECHO_source DataFrame

In [24]:
ECHO_source=source_df3.sort_values('Source',ascending=(True))
ECHO_source=ECHO_source.reset_index(drop=True)

In [25]:
# Assign source wells in 384-well order (1 -> A1, 2 -> A2, ...)
ECHO_source.index = range(1, len(ECHO_source) + 1)
ECHO_source['Well'] = ECHO_source.index.map(df384_translate)
ECHO_source = ECHO_source[['Well', 'Source']]

# Save to the dispense folder with the timestamp defined at the top of the notebook
ECHO_source.to_csv(os.path.join(dispense_path, f'{timestamp}_ECHO_LD_Source_Map_DBTL{cycle}.csv'), index=False)

In [26]:
ECHO_source

|  | Well | Source |
|---|---|---|
| 1 | A1 | 1P |
| 2 | A2 | 1S |
| 3 | A3 | 2P |
| 4 | A4 | 2P_2 |
| 5 | A5 | 2S |
| 6 | A6 | 3P |
| 7 | A7 | 3P_2 |
| 8 | A8 | 3S |
| 9 | A9 | 3S_2 |
| 10 | A10 | 5P |


Create a destination plate DataFrame for later use as the source plate in assemblies

In [27]:
# Destination wells in the same 384-well order used for the ligation/digestion dispenses
ECHO_dest = assembly_df.reset_index(drop=True)
ECHO_dest.index = range(1, len(ECHO_dest) + 1)
ECHO_dest['Well'] = ECHO_dest.index.map(df384_translate)
ECHO_dest = ECHO_dest[['Well', 'gRNA_Target']]

ECHO_dest.to_csv(os.path.join(dispense_path, f'{timestamp}_ECHO_Assembly_Source_Map_DBTL{cycle}.csv'), index=False)

#### Make the source plate into a dictionary

- The dictionary maps each source reagent name (e.g., 1P, PP_0368, Water_1) to its source-plate well

In [28]:
# Map each source reagent name to its well on the ECHO source plate
replacement_dict = ECHO_source.set_index('Source')['Well'].to_dict()

Assign a water source well to each reaction, moving to the next water well once more than 40 uL would be drawn from the current one

In [29]:
# Initialize variables
running_sum = 0
well_counter = 1
well_list = []

# Loop through the DataFrame and assign wells
for value in dest_df['water_volume']:
    running_sum += value
    if running_sum > 40:
        well_counter += 1
        running_sum = value  # start new well with current value
    well_list.append(f'Water_{well_counter}')

dest_df['water_source'] = well_list

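
The loop above is a next-fit packing: it opens a new water well as soon as the next dispense would push the current well past 40 uL. Because a well can be closed with less than 40 uL drawn, next-fit can need more wells than a simple total-divided-by-capacity estimate, like the one used to size the water source wells earlier. A sketch that returns the labels and the well count from the same pass, so the two cannot disagree; `assign_water_wells` is a hypothetical helper, not the notebook's code, and the volumes in the example are made up:

```python
import math

def assign_water_wells(volumes, max_per_well=40.0):
    """Next-fit assignment of water dispenses (uL) to Water_1, Water_2, ..."""
    labels = []
    well, drawn = 1, 0.0
    for v in volumes:
        if drawn + v > max_per_well:  # this dispense would exceed the current well
            well += 1
            drawn = 0.0
        drawn += v
        labels.append(f"Water_{well}")
    n_wells = well if labels else 0
    return labels, n_wells

# Made-up volumes chosen to show next-fit needing more wells than the simple estimate
vols = [3.0] * 13 + [3.9] * 10 + [1.5]
labels, n_wells = assign_water_wells(vols)
print(n_wells, math.ceil(sum(vols) / 40))  # 3 2
```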


In [30]:
# Replace reagent names with their source-plate wells (e.g., '1P' -> 'A1');
# only the columns that name a source reagent are mapped
for column in ['gRNA_Target', 'Prefix', 'Suffix', 'water_source']:
    dest_df[column] = dest_df[column].map(replacement_dict).fillna(dest_df[column])

LD_dispense_partial_df = dest_df.copy()
LD_dispense_partial_df



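|  | Placeholder | gRNA_Target | Count | Position | Prefix | Suffix | plasmid_volume | water_volume | water_source |
|---|---|---|---|---|---|---|---|---|---|
| 0 | IY1529 | A15 | 17 | gRNA_1 | A1 | A5 | 1.1 | 2.9 | B6 |
| 1 | IY2131 | A17 | 12 | gRNA_1 | A1 | A5 | 2.0 | 2.0 | B6 |
| 2 | IY1608 | A16 | 10 | gRNA_1 | A1 | A5 | 3.2 | 0.8 | B6 |
| 3 | IY2099 | A21 | 10 | gRNA_1 | A1 | A5 | 3.5 | 0.5 | B6 |
| 4 | IY2096 | A18 | 4 | gRNA_1 | A1 | A5 | 2.0 | 2.0 | B6 |
| 5 | IY2097 | A19 | 4 | gRNA_1 | A1 | A5 | 4.0 | 0.0 | B6 |
| 6 | IY2098 | A20 | 2 | gRNA_1 | A1 | A5 | 2.3 | 1.7 | B6 |
| 7 | IY2113 | A22 | 1 | gRNA_1 | A1 | A5 | 1.0 | 3.0 | B6 |
| 8 | IY2099 | A21 | 18 | gRNA_2 | A3 | A8 | 3.5 | 0.5 | B6 |
| 9 | IY2131 | A17 | 9 | gRNA_2 | A3 | A8 | 2.0 | 2.0 | B6 |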


##### Assign destination wells in 384-well plate order

In [31]:
LD_dispense_partial_df.index = range(1, len(LD_dispense_partial_df) + 1)

# Assign destination wells in 384-well order (1 -> A1, 2 -> A2, ...)
LD_dispense_partial_df['Destination Well'] = LD_dispense_partial_df.index.map(df384_translate)
LD_dispense_partial_df

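|  | Placeholder | gRNA_Target | Count | Position | Prefix | Suffix | plasmid_volume | water_volume | water_source | Destination Well |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | IY1529 | A15 | 17 | gRNA_1 | A1 | A5 | 1.1 | 2.9 | B6 | A1 |
| 2 | IY2131 | A17 | 12 | gRNA_1 | A1 | A5 | 2.0 | 2.0 | B6 | A2 |
| 3 | IY1608 | A16 | 10 | gRNA_1 | A1 | A5 | 3.2 | 0.8 | B6 | A3 |
| 4 | IY2099 | A21 | 10 | gRNA_1 | A1 | A5 | 3.5 | 0.5 | B6 | A4 |
| 5 | IY2096 | A18 | 4 | gRNA_1 | A1 | A5 | 2.0 | 2.0 | B6 | A5 |
| 6 | IY2097 | A19 | 4 | gRNA_1 | A1 | A5 | 4.0 | 0.0 | B6 | A6 |
| 7 | IY2098 | A20 | 2 | gRNA_1 | A1 | A5 | 2.3 | 1.7 | B6 | A7 |
| 8 | IY2113 | A22 | 1 | gRNA_1 | A1 | A5 | 1.0 | 3.0 | B6 | A8 |
| 9 | IY2099 | A21 | 18 | gRNA_2 | A3 | A8 | 3.5 | 0.5 | B6 | A9 |
| 10 | IY2131 | A17 | 9 | gRNA_2 | A3 | A8 | 2.0 | 2.0 | B6 | A10 |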


In [32]:
# Melt the DataFrame
LD_dispense_df = pd.melt(LD_dispense_partial_df, id_vars=['Destination Well'], value_vars=['gRNA_Target', 'Prefix', 'Suffix'],
                    var_name='Category', value_name='Source Well')

# Sort by 'Destination Well' (string order, e.g. A1, A10, A11, ...)
LD_dispense_df = LD_dispense_df.sort_values(by='Destination Well').reset_index(drop=True)

print("There will be", f'{len(LD_dispense_df)}', "dispenses if enzyme, ligase, and water are already mixed.")

There will be 114 dispenses if enzyme, ligase, and water are already mixed.


In [33]:
LD_dispense_df

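|  | Destination Well | Category | Source Well |
|---|---|---|---|
| 0 | A1 | gRNA_Target | A15 |
| 1 | A1 | Suffix | A5 |
| 2 | A1 | Prefix | A1 |
| 3 | A10 | Prefix | A3 |
| 4 | A10 | gRNA_Target | A17 |
| ... | ... | ... | ... |
| 109 | B8 | Suffix | A14 |
| 110 | B8 | Prefix | A10 |
| 111 | B9 | Prefix | A10 |
| 112 | B9 | Suffix | A14 |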


### Dispense library for multiplexed gRNA plasmid construction

* The plasmids are not all at the same concentration, so each plasmid and its dilution water are dispensed at individual volumes

In [34]:
LD_dispense_partial_df

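|  | Placeholder | gRNA_Target | Count | Position | Prefix | Suffix | plasmid_volume | water_volume | water_source | Destination Well |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | IY1529 | A15 | 17 | gRNA_1 | A1 | A5 | 1.1 | 2.9 | B6 | A1 |
| 2 | IY2131 | A17 | 12 | gRNA_1 | A1 | A5 | 2.0 | 2.0 | B6 | A2 |
| 3 | IY1608 | A16 | 10 | gRNA_1 | A1 | A5 | 3.2 | 0.8 | B6 | A3 |
| 4 | IY2099 | A21 | 10 | gRNA_1 | A1 | A5 | 3.5 | 0.5 | B6 | A4 |
| 5 | IY2096 | A18 | 4 | gRNA_1 | A1 | A5 | 2.0 | 2.0 | B6 | A5 |
| 6 | IY2097 | A19 | 4 | gRNA_1 | A1 | A5 | 4.0 | 0.0 | B6 | A6 |
| 7 | IY2098 | A20 | 2 | gRNA_1 | A1 | A5 | 2.3 | 1.7 | B6 | A7 |
| 8 | IY2113 | A22 | 1 | gRNA_1 | A1 | A5 | 1.0 | 3.0 | B6 | A8 |
| 9 | IY2099 | A21 | 18 | gRNA_2 | A3 | A8 | 3.5 | 0.5 | B6 | A9 |
| 10 | IY2131 | A17 | 9 | gRNA_2 | A3 | A8 | 2.0 | 2.0 | B6 | A10 |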


Split the dispenses into plasmid (gRNA_Target) transfers and linker (prefix and suffix) transfers; linkers are dispensed at a fixed 5.0 uL

In [35]:
gRNA_Target_df = LD_dispense_df[LD_dispense_df['Category'] == 'gRNA_Target'].reset_index(drop=True)
other_categories_df = LD_dispense_df[LD_dispense_df['Category'] != 'gRNA_Target'].reset_index(drop=True)
dispense_volumes={'Suffix': 5.0, 'Prefix':5.0}
other_categories_df['Transfer Volume'] = other_categories_df['Category'].map(dispense_volumes)


#### Consolidate individual dispenses
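* Each transfer is defined by three values: source well, transfer volume, and destination well. The ECHO pick list adds the plate-level columns shown below; Transfer Volume is given in nL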

| Sample Group | Source Plate Type | Source Well | Source Plate Name | Destination Plate Name | Destination Well | Transfer Volume |
|--------------|-------------------|-------------|-------------------|------------------------|------------------|-----------------|
| AQ           | 384PP_AQ_SP2      | A1          | Parts_Library_1   | Digest_and_Ligate      | A1               | 2000            |
| AQ           | 384PP_AQ_SP2      | A2          | Parts_Library_1   | Digest_and_Ligate      | A2               | 2000            |


In [37]:
# Columns that define a single ECHO transfer
other = ['Source Well', 'Transfer Volume', 'Destination Well']

# Plasmid transfers: the source is the gRNA plasmid well
gRNAs_df = LD_dispense_partial_df[['gRNA_Target', 'plasmid_volume', 'Destination Well']].rename(
    columns={'gRNA_Target': 'Source Well', 'plasmid_volume': 'Transfer Volume'})

# Water transfers: skip reactions that need no water
water_df = LD_dispense_partial_df[['water_source', 'water_volume', 'Destination Well']].rename(
    columns={'water_source': 'Source Well', 'water_volume': 'Transfer Volume'})
water_df_filtered = water_df[water_df['Transfer Volume'] != 0]

# Linker transfers (prefix and suffix)
other_df = other_categories_df[other]

# Concatenate the DataFrames so that every transfer is in one list
Final_LD_Dispense = pd.concat([gRNAs_df, water_df_filtered, other_df], axis=0)

In [38]:
Final_LD_Dispense['Transfer Volume'].min()

0.5

Add the ECHO specific columns (sample_group, source_plate_name, destination_plate_name)

In [39]:
ECHO_columns = {
    "Sample Group": "AQ",
    "Source Plate Name": f"Parts_Library_DBTL{cycle}",
    "Destination Plate Name": f"Ligation_Digestion_DBTL{cycle}",
    "Source Plate Type": "384PP_AQ_SP2"
}

# Add new columns to DataFrame
Final_LD_Dispense = Final_LD_Dispense.assign(**ECHO_columns)
Final_LD_Dispense.head(1)

|  | Source Well | Transfer Volume | Destination Well | Sample Group | Source Plate Name | Destination Plate Name | Source Plate Type |
|---|---|---|---|---|---|---|---|
| 1 | A15 | 1.1 | A1 | AQ | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | 384PP_AQ_SP2 |


In [40]:
# Reorder columns to the ECHO pick-list layout and convert volumes from uL to nL
columns_order = ['Sample Group', 'Source Plate Type', 'Source Well', 'Source Plate Name',
                 'Destination Plate Name', 'Destination Well', 'Transfer Volume']
Final_LD_Dispense = Final_LD_Dispense[columns_order].copy()
Final_LD_Dispense['Transfer Volume'] = Final_LD_Dispense['Transfer Volume'].multiply(1000).round()
Final_LD_Dispense

|  | Sample Group | Source Plate Type | Source Well | Source Plate Name | Destination Plate Name | Destination Well | Transfer Volume |
|---|---|---|---|---|---|---|---|
| 1 | AQ | 384PP_AQ_SP2 | A15 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | A1 | 1100.0 |
| 2 | AQ | 384PP_AQ_SP2 | A17 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | A2 | 2000.0 |
| 3 | AQ | 384PP_AQ_SP2 | A16 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | A3 | 3200.0 |
| 4 | AQ | 384PP_AQ_SP2 | A21 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | A4 | 3500.0 |
| 5 | AQ | 384PP_AQ_SP2 | A18 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | A5 | 2000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 71 | AQ | 384PP_AQ_SP2 | A12 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | B7 | 5000.0 |
| 72 | AQ | 384PP_AQ_SP2 | A14 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | B8 | 5000.0 |
| 73 | AQ | 384PP_AQ_SP2 | A10 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | B8 | 5000.0 |
| 74 | AQ | 384PP_AQ_SP2 | A10 | Parts_Library_DBTL6 | Ligation_Digestion_DBTL6 | B9 | 5000.0 |


In [41]:
length = len(Final_LD_Dispense)
print(f"The total number of dispenses, including water for plasmid dilution, is {length}")

The total number of dispenses, including water for plasmid dilution, is 145


In [42]:
Final_LD_Dispense = Final_LD_Dispense.sort_values('Destination Well',ascending=(True))

# Save the pick list to the dispense folder
Final_LD_Dispense.to_csv(os.path.join(dispense_path, f'{timestamp}_ECHO_LD_Dispense_DBTL{cycle}.csv'), index=False)
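
Before loading the pick list, it can help to confirm that every destination well receives the expected volume. Assuming the premix of the remaining reagents (enzymes, buffer, and bulk water in the reaction table) is added outside this pick list, as the dispense count above suggests, each well should receive 5 uL prefix + 5 uL suffix + 4 uL plasmid and water, i.e. 14,000 nL. A minimal sketch; `check_well_totals` is a hypothetical helper and the toy rows are made up:

```python
import pandas as pd

def check_well_totals(pick_list: pd.DataFrame, expected_nl: float = 14000) -> pd.Series:
    """Return the destination wells whose summed Transfer Volume (nL) differs from expected_nl."""
    totals = pick_list.groupby('Destination Well')['Transfer Volume'].sum()
    return totals[totals != expected_nl]

# Made-up pick list: A3 is missing its 500 nL of water
toy = pd.DataFrame({
    'Destination Well': ['A1'] * 4 + ['A2'] * 3 + ['A3'] * 3,
    'Transfer Volume': [1100.0, 2900.0, 5000.0, 5000.0,
                        4000.0, 5000.0, 5000.0,
                        3500.0, 5000.0, 5000.0],
})
print(check_well_totals(toy))
```

Run on the notebook's data as `check_well_totals(Final_LD_Dispense)`, an empty result would indicate that every well received the expected 14,000 nL.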