# BME 590 - Workshop 4 - Modular Cloning
**Professor:** Emma Chory, Ph.D.

**Authors:** 
Rick Wierenga, Joe Laforet, Stefan Golas, Ben Perry

---

### Usage Note
**Reminder** - You should be running this notebook **locally** in **VS Code**, not viewing it on **GitHub**.

**Reminder 2** - Remember to run `git pull` before copying this notebook so you get the most recent class updates. See [Section 6 of the class README](https://github.com/chory-lab/bme590-fall-2025#step-6-updating-to-the-latest-version).

---

## Working from Files

In lab automation, it can be useful to load instructions or "work orders" from a file.

For example, imagine a workflow that runs a traditional bioinformatic analysis to determine which genes are most expressed in a given experiment, and then uses the **mapped locations** of the relevant wells on a liquid handling deck to proceed with an experiment.

Connecting an outside workflow to the robot via file transfer is useful on its own; being able to **map biological elements** to their respective wells and plates on a deck integrates the workflows even further.

Another example is golden mutagenesis, a technique that uses the modularity of golden gate cloning to generate diverse plasmid libraries. Libraries of DNA fragments can be assembled as modules to generate combinatorial diversity from a relatively small number of individual parts that can be assembled in a standardized way.

This technique [has been used](https://www.nature.com/articles/s41598-019-47376-1) to evolve a number of enzymes in recent years.

### Pandas, Polars, and Golden Gate Assembly

One of the most common types of data file is the **.csv** file, or comma-separated values file. There are many ways to store information in files; for now, let's suppose we have a **mapping** between DNA fragment IDs and the wells that hold each fragment on the liquid handling deck.

To illustrate this scenario, let's set up our PLR deck. First, imports:

In [None]:
# standard imports
from pylabrobot.liquid_handling.backends.backend import LiquidHandlerBackend
from pylabrobot.liquid_handling import LiquidHandler
from pylabrobot.liquid_handling.backends import LiquidHandlerChatterboxBackend
from pylabrobot.resources.opentrons import OTDeck
from pylabrobot.visualizer.visualizer import Visualizer

# resources for deck setup
from pylabrobot.resources import (
    Deck,
    set_tip_tracking,
    set_volume_tracking,
    set_cross_contamination_tracking,
    corning_96_wellplate_360ul_flat,
    opentrons_96_tiprack_1000ul,
    opentrons_24_tuberack_eppendorf_2ml_safelock_snapcap_acrylic
)

# standard library module used for building file paths
import os

Enable error tracking for cross-contamination, volumes, and tips.

In [None]:
set_tip_tracking(enabled = True)
set_volume_tracking(enabled = True)
set_cross_contamination_tracking(enabled = True)

Now define our standard deck visualization function.

In [None]:
async def visualize_deck(deck: Deck,
                         backend: LiquidHandlerBackend):
    # try setting up the deck with error-catching
    try:
        lh = LiquidHandler(backend=backend, deck=deck)
        vis = Visualizer(resource = lh)
        await lh.setup()
        await vis.setup()
        return lh
    except Exception as e:
        print(f"Error! Got exception: {e}")

Now, let's set up our Opentrons OT-2 deck with plates for our DNA fragments and a plate on which to combine them.

In [None]:
async def make_golden_gate_ot2():

    # instantiate deck
    deck = OTDeck()

    # add plates with fragments
    plate_slots = [4, 5, 6]
    for i, plate_slot in enumerate(plate_slots):
        deck.assign_child_at_slot(corning_96_wellplate_360ul_flat(name = f"fragments_{i}"), plate_slot)
    
    # add tip racks
    tip_rack_slots = [1, 2, 3]
    for i, tip_slot in enumerate(tip_rack_slots):
        deck.assign_child_at_slot(opentrons_96_tiprack_1000ul(name = f"tip_rack_{i}"), tip_slot)
    
    # add working plate
    deck.assign_child_at_slot(corning_96_wellplate_360ul_flat(name = "cloning_plate"), 10)

    return deck

# call function
deck = await make_golden_gate_ot2()
lh = await visualize_deck(deck, LiquidHandlerChatterboxBackend())

Now, assume we have created a mapping file of DNA ID - Plate Name - Well Location and saved it as a `.csv` file. We need some way to **load the file into memory** to see our ID mapping.

Luckily, one of the most widely used Python libraries is pandas, which lets you load data into a **table in memory, known as a DataFrame**, on which you can perform operations.


To start, let's **import pandas**

In [None]:
import pandas as pd
print(pd.__version__)

You should have gotten an **error** along these lines (if you did not, you can go ahead and skip this section)

```txt
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[5], line 1
----> 1 import pandas as pd

ModuleNotFoundError: No module named 'pandas'
```

This happens because **pandas is not part of the Python standard library** and we do not have it installed in our current environment (remember our setup from the class README, section 3.4?). In general, external packages must be installed from the terminal, inside your environment. Let's go ahead and install it.

---

### **Exercise 0.** Pandas Install (5 pts)


In this case, we can install pandas into our **conda environment** from the terminal as follows:

1. Run `conda activate lab-automation` to activate the lab-automation environment

2. Install pandas via `conda install pandas`.

3. Now, if needed, **reload your Jupyter Kernel** by clicking **Restart** in VS Code. 

4. Now, if needed, **run all the cells up to this point again**. You should no longer get a pandas import error since it is installed!

**Note:** There is no submission for this problem because you will need pandas for some of the other exercises. You will lose these points only if your other exercises do not use pandas.

Now that pandas is installed, let's **load our CSV** into memory using the `read_csv()` function. We need to point this function to our data, which is present in the newest version of the GitHub repo, under the `workshop_4_data/` folder.


---

**Note:** If you have not used pandas before, you may find [this](https://pandas.pydata.org/docs/user_guide/10min.html) tutorial helpful. **You WILL be using Pandas in the exercises for this workshop, so this is recommended reading!**

---

In [None]:
cwd = os.getcwd()
print(os.path.dirname(cwd))

**Important:** You are telling Python where to look for the data you will load. The path printed above (the parent folder of `cwd`) should therefore point to `bme590-fall-2025`. If it does not, copy and paste the path of the folder containing this notebook and assign it to the `cwd` variable as a string.
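
As a quick check of how the path pieces fit together, the sketch below builds a made-up working directory (the `home/student/...` location is purely illustrative, not your real path) and shows what `os.path.dirname` and `os.path.join` return for it.

```python
import os

# Hypothetical working directory, for illustration only.
example_cwd = os.path.join("home", "student", "bme590-fall-2025", "assignments")

# os.path.dirname drops the last path component, giving the parent folder.
repo_root = os.path.dirname(example_cwd)

# os.path.join glues components together with the right separator for your OS.
example_csv_path = os.path.join(repo_root, "workshop_4_data", "fragments.csv")

print(repo_root)
print(example_csv_path)

# For a real path, os.path.exists() tells you whether the file is actually there.
print(os.path.exists(example_csv_path))
```

If the last line prints `False` for your real `csv_path`, the most likely cause is that `cwd` is not the folder you expected.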

In [None]:
cwd = os.getcwd() # -> this should point to bme590-fall-2025/assignments. If it doesn't, replace os.getcwd() with the copy-pasted path (as a string) of the folder containing this notebook.

# define the path pointing to our data file
csv_path = os.path.join(os.path.dirname(cwd), "workshop_4_data", "fragments.csv")

# read the dataframe to RAM
df = pd.read_csv(csv_path)

# print the first 5 rows of the data
df.head() # <- NOTE: df.head(n) returns the first n rows of a dataframe (5 by default). df.tail(n) does the same but for the last n rows.

As you can see, we have the following columns:

- `well_id` - The identifier of the well for each DNA fragment

- `fragment_id` - The identifier of the DNA fragment itself

- `plate_id` - The identifier of the plate itself.

- `volume (uL)` - The volume of that fragment in a given plate well.
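
If you would like to experiment without touching the class file, a table with the same four columns can be built from inline text. The rows below are invented for illustration and are not taken from `fragments.csv`; `io.StringIO` simply lets `read_csv()` treat a string as if it were a file.

```python
import io
import pandas as pd

# Made-up rows that mimic the column layout described above.
toy_csv = """well_id,fragment_id,plate_id,volume (uL)
A1,F001,0,100
A2,F002,0,150
B1,F001,1,200
"""

toy_df = pd.read_csv(io.StringIO(toy_csv))

print(toy_df.columns.tolist())  # column names, including the space in "volume (uL)"
print(toy_df.dtypes)            # pandas infers a type for each column
print(toy_df.shape)             # (number of rows, number of columns)
```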

We can now write a for loop to iterate through each row using `df.iterrows()`. **Note** that this function yields a tuple of the form `index, row` at each iteration, where `row` is a pandas **Series** that can be indexed by column name, much like a **dictionary** of column_name-value pairs.

The code below runs a for loop to show how to iterate through the loaded dataframe and extract the value in each column.

**Note -** The `break` keyword exits the loop immediately, so only one iteration is done, as an example print statement.

In [None]:
for idx, row in df.iterrows(): # iterate over rows
    print(f"{row.keys()=}") # print the keys in the row dictionary
    print()
    print(f"{row['well_id']=}") # extract the well id
    print(f"{row['fragment_id']=}") # extract the fragment id
    print(f"{row['plate_id']=}") # extract the plate id
    print(f"{row['volume (uL)']=}") # extract the volume
    break
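
To see what `iterrows()` hands back without using the class data, here is a minimal sketch on an invented two-row table. It collects one formatted string per row, using the column names as keys.

```python
import pandas as pd

# Invented example data (not from fragments.csv).
toy_df = pd.DataFrame(
    {
        "well_id": ["A1", "A2"],
        "fragment_id": ["F001", "F002"],
        "volume (uL)": [100, 150],
    }
)

labels = []
for idx, row in toy_df.iterrows():
    # idx is the row label; row is a pandas Series indexed by column name
    labels.append(f"{row['fragment_id']} in {row['well_id']}")

print(type(row).__name__)  # Series
print(labels)              # ['F001 in A1', 'F002 in A2']
```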

You can also filter the dataframe by values in certain columns. For example, the cell below keeps only the rows belonging to plate 2.

In [None]:
df_plate_2 = df[df['plate_id'] == 2]
df_plate_2.head()

You can also filter on more specific things, such as finding the plate and well containing a fragment ID of interest.

In [None]:
df_specific_fragment = df[df["fragment_id"] == "X095"]
df_specific_fragment.head()
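
Both filters above follow the same pattern: a comparison on a column produces a boolean mask, and indexing with that mask keeps only the rows marked `True`. Here is a minimal sketch on invented data, including how two conditions can be combined with `&` (each condition needs its own parentheses):

```python
import pandas as pd

# Invented example data (not from fragments.csv).
toy_df = pd.DataFrame(
    {
        "well_id": ["A1", "A2", "B1", "B2"],
        "fragment_id": ["F001", "F002", "F001", "F003"],
        "plate_id": [0, 0, 1, 1],
    }
)

# a comparison returns one True/False value per row
mask = toy_df["fragment_id"] == "F001"
print(mask.tolist())  # [True, False, True, False]

# indexing with the mask keeps only the True rows
only_f001 = toy_df[mask]
print(len(only_f001))  # 2

# two conditions combined with & (logical AND)
both = toy_df[(toy_df["fragment_id"] == "F001") & (toy_df["plate_id"] == 1)]
print(both["well_id"].tolist())  # ['B1']

# an ID that is absent gives an empty DataFrame rather than an error
missing = toy_df[toy_df["fragment_id"] == "F999"]
print(len(missing))  # 0
```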

Since there are multiple wells containing our fragment of interest, we can **randomly choose one** like this.

**Hint -** This is probably going to be useful in the exercises. If you needed more volume than was present in a single well, you might need to use information from multiple rows. However, for this workshop, we will keep it simple.

In [None]:
row = df_specific_fragment.sample(n = 1)
row
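
Note that `sample(n = 1)` returns a one-row DataFrame, not a bare value. The sketch below, again on invented data, shows one way to pull a single cell out of that row with `.item()`; `random_state` is set here only so the example picks the same row every time.

```python
import pandas as pd

# Invented example data: the same fragment stored in three wells.
toy_df = pd.DataFrame(
    {
        "well_id": ["A1", "B1", "C1"],
        "fragment_id": ["F001", "F001", "F001"],
    }
)

picked = toy_df.sample(n=1, random_state=0)
print(type(picked).__name__)  # DataFrame
print(picked.shape)           # (1, 2)

# .item() works on a Series holding exactly one value and returns that value
well = picked["well_id"].item()
print(well)
```

Calling `.item()` on a column with more than one row raises an error, which is a useful safeguard against accidentally skipping the sampling step.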

These are the basics for what you need to complete this workshop, but we encourage you to explore other pandas functionalities.

As a side note, if you ever work with **large-scale datasets**, such as those in the gigabyte range, a useful library is [Polars](https://pola.rs/), which offers multi-threaded dataframe operations implemented in a lower-level language (Rust), and can even run [on the GPU](https://pola.rs/posts/gpu-engine-release/).

## Exercises

---

**TO-DO:** For each of the following exercises, since they are a bit different from each other, we will require submission of one or more of the following:

- a `.txt` file containing your code.

- a `.gif` file containing a GIF animation of your protocol running

- a `.png` or `.jpg` image of your deck setup, if needed.

- a `.csv` file, for some exercises

We will explicitly tell you for each exercise and sub-exercise, which items to submit. There will be a **sample submission format** for each exercise.

---

Some exercises below will ask you to define your own **functions or classes**. We will provide the **function or class name** and sometimes the **input argument names** for you, but in general, the body of the functions is up to you.

You should include `time.sleep(x)` calls between every step so you have time to visualize the protocol as it runs. At a minimum, `x = 0.1` for 0.1 s delay. Experiment with this value for one that works for you.
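
As a rough sketch of where such pauses go, the loop below stands in for a protocol. `fake_step` is a hypothetical placeholder, not a PyLabRobot call, and keeping the delay in one named constant (`PAUSE_S`, also an invented name) makes it easy to tune.

```python
import time

PAUSE_S = 0.1  # delay between steps, in seconds

def fake_step(name, log):
    # hypothetical stand-in for a real liquid handling call
    log.append(name)

log = []
start = time.perf_counter()
for name in ["pick up tip", "aspirate", "dispense", "drop tip"]:
    fake_step(name, log)
    time.sleep(PAUSE_S)  # pause so the visualizer has time to show the step
elapsed = time.perf_counter() - start

print(log)
print(f"total run time is roughly {elapsed:.1f} s")
```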

Once you get your protocol working as intended for each problem, you will need to **record a GIF** of your protocol running for each exercise, as directed.

**IMPORTANT** - You should comment your code thoroughly to explain its function. We will grade every problem by **quality of code**. Excessively long code or a lack of comments will be subject to **point deduction**.

Furthermore, you can write the code how you see fit. However, **do not change function names** and make sure to **include your imports** at the top of your .txt file submission.

---


### **Exercise 1.** Deck Setup (15 pts)

First, given the deck we set up, complete the following function `add_fragments()`, which should take in the following parameters:

- `deck` - The already set up OT-2 deck from earlier in this workshop.

- `fragment_csv_path` - The path to the fragment data we were using earlier.

Using the techniques learned from the **Liquid handling** workshop about **setting initial liquids**, write a function that puts the correct volume of each fragment in its target well, solely determined by the data in the `fragments.csv` file.

In [None]:
def add_fragments(deck : Deck,
                  fragment_csv_path : str):
    ... # YOUR CODE HERE


# call the function
add_fragments(deck, csv_path)

Submit your code as `exercise_1.txt`. Make sure it is commented. Imports are not necessary to include here.

---


### **Exercise 2.** Work Order Parsing (25 pts)

We have now been provided a list of work orders of golden mutagenesis targets in the `cloning.csv` file. Let's go ahead and take a look at some example rows of this dataframe.

In [None]:
# set the path
work_order_csv_path = os.path.join(os.path.dirname(cwd), "workshop_4_data", "cloning.csv")

# read the dataframe to RAM
work_order_df = pd.read_csv(work_order_csv_path)

# print the first 5 rows of the data
work_order_df.head()

Great! This file contains two columns:

- `well_id` - The well ID on the `cloning_plate` of fragments to combine. **Note -** the well ID index now is of the form A01, which should translate to A1. You will have to find a way to translate these IDs. See hint below.

- `fragment_tuple` - A string representation of a tuple of form: `(ID_1,ID_2,ID_3)` where each ID corresponds to a DNA fragment.
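
Because the CSV stores each combination as text, the `fragment_tuple` value arrives as a single string rather than a Python tuple. One possible way to split it, assuming the exact `(ID_1,ID_2,ID_3)` layout described above, is sketched below. The IDs are invented, and `parse_fragment_tuple` is a hypothetical helper name, not part of the assignment.

```python
def parse_fragment_tuple(text: str) -> tuple:
    """Turn a string such as "(F001,F002,F003)" into a tuple of ID strings."""
    inner = text.strip().strip("()")  # remove outer whitespace, then the parentheses
    return tuple(part.strip() for part in inner.split(","))

print(parse_fragment_tuple("(F001,F002,F003)"))  # ('F001', 'F002', 'F003')

# stray spaces around the IDs are tolerated as well
print(parse_fragment_tuple("( F001, F002 ,F003 )"))  # ('F001', 'F002', 'F003')
```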

For this exercise, you will need to write a **generator** called `extract_fragment_combinations()` which creates a for loop over the rows in the dataframe, and yields data in the form: `well_id, fragments` where:

- `well_id` is the **converted** id from the form "A01" to "A1" for all combinations of IDs.

- `fragments` is a **tuple** of the form **fragment_X, fragment_Y, fragment_Z**

**HOWEVER**, our list was assembled by a mad scientist! Some of the combinations therefore point to IDs which **don't exist** in the fragments set at all. If that is the case, we should have this function return **None, None**.
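
If generators are new to you, the toy example below (unrelated to the cloning data) shows the general pattern: a function containing `yield` hands back one item per loop iteration, and items that fail a check are simply never yielded. The names `valid_pairs` and `known` are made up for this illustration.

```python
def valid_pairs(entries, known_names):
    """Yield (position, name) only for names found in known_names."""
    for position, name in enumerate(entries):
        if name not in known_names:
            continue  # skip this entry; nothing is yielded for it
        yield position, name

known = {"red", "green", "blue"}

# a generator is consumed with a normal for loop
for position, name in valid_pairs(["red", "purple", "blue"], known):
    print(position, name)

# it can also be collected into a list
print(list(valid_pairs(["red", "purple", "blue"], known)))  # [(0, 'red'), (2, 'blue')]
```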

**2.A.** Let's break down this problem piece by piece. First, let's write a function to convert the **well ID** to the right format. This should simply input the `old_well_id` and convert it to a normalized ID. To do this:

- Take the first character of the ID as the **row_id**.

- Convert the rest of the string to an **integer** using the `int()` conversion function.

- Finally, using [string concatenation](https://www.w3schools.com/python/gloss_python_string_concatenation.asp) or f-strings, combine the row_id and integer number together and return it.

In [None]:
def convert_well_id(old_well_id: str):
    ... # YOUR CODE HERE

**2.B.** Great! Now we need a function called `get_well()` which can input:

- `fragment_id` - The identifier of the fragment we are checking is present.

- `fragment_df` - The dataframe of fragment IDs and locations, originally loaded as **df**

The function should use pandas filtering to:

- Filter the dataframe down to only the rows where `fragment_id` matches the fragment ID being searched for.

- Implement logic that checks if the length of the filtered dataframe is 0. If so, return `None, None`

- Otherwise, sample **only one row** and use the `.item()` function to get the specific cell entry for given row and return a tuple containing data of the form `well_id, plate_name`

In [None]:
def get_well(fragment_id: str,
             fragment_df: pd.DataFrame):
    ... # YOUR CODE HERE

**2.C.** Finally, implement the `extract_fragment_combinations` function, which takes the work-order and existing-fragments CSV paths as input and:

- Load each dataframe into memory using `pd.read_csv(...)`

- Iterate through the rows of the work order dataframe using `iterrows()`

- For each row, extract the `fragment_tuple` and implement logic to check that each fragment is valid:

    - For each fragment in the fragment tuple, call `get_well()`. If `None, None` is returned for any of them, set the boolean flag `all_fragments_valid` to False.

- If `all_fragments_valid` is true, **yield** data of the form `well_id, fragment_tuple`

In [None]:
def extract_fragment_combinations(cloning_csv_path: str | os.PathLike,
                                  fragment_csv_path: str | os.PathLike):

    ... # YOUR CODE HERE

    # iterate over work orders
    for _, row in ...:

        # boolean logic implementation
        all_fragments_valid = True 
        
        ... # YOUR CODE HERE

        # scaffold logic
        for ... in ...:
            if ...:
                all_fragments_valid = False
                break  

        # only if fragments are valid, then yield the relevant results.
        if all_fragments_valid:
            well_id = convert_well_id(...)
            yield well_id, fragment_tuple

Great! Submit your code as a file named `exercise_2.txt`. Make sure it is commented. Imports are not necessary here.

---

### **Exercise 3.** Modular Cloning (40 pts)

Now that we have the deck set up and a way to iterate over our work-order data, we can design a liquid handling protocol that will achieve the following:

- Iterate over the work order list to get the well for cloning and the triplet of clones to pipette together.

- For each well in this data, do the following:

    - For each DNA fragment in the fragments to combine, find the well and plate it is located in from the `fragments.csv`.

    - Using this information, pipette **10 uL of each fragment** to the target well in the work order list. **You should use a separate tip for each fragment.**

    - Finally, **mix** the **30 uL** contents in the target well together and move on to the next well

Since you have now gained experience in writing liquid handling protocols, we will not provide any **strict guidelines on helper functions** or structure of the protocol for this exercise. 

However, do note that your code will still be graded for its **efficiency** and **quality of comments**. At a minimum, you should aim to have (and we will look for):

- A function to perform a mixing operation.

- A function to perform one entire pipetting step given a well ID and a triplet to combine.

- A function to find the appropriate well given a fragment ID.

For each function you write, make sure to include **robust comments** as to how it works. It may be helpful to look back at **workshop 2** for inspiration of function structure.

Your final protocol should run through the `run_protocol_exercise_3()` function.

Comment your code well and include all imports **not included at the top of the notebook** that you may need. 

**Include a description of each helper function in the commented section below.**

In [None]:
# add any imports needed here, e.g. "import x" or "from x import y"

# --- HELPER FUNCTIONS ---

# Function N Code:
# Function N Description: 

# Function N Code:
# Function N Description: 

# Function N Code:
# Function N Description: 


async def run_protocol_exercise_3(deck: Deck,
                                  lh: LiquidHandler,
                                  fragment_df_path: str | os.PathLike):

    # for each well fragment work order, if the fragments returned are available, then run the code
    for well_id, fragments in extract_fragment_combinations(work_order_csv_path, fragment_df_path):
        if fragments is not None:
            ...
        # otherwise, one of the fragments is missing, so continue on
        else:
            continue

# setup the deck
deck = await make_golden_gate_ot2()
lh = await visualize_deck(deck, LiquidHandlerChatterboxBackend())

# add fragments
add_fragments(deck, csv_path)

# run protocol
await run_protocol_exercise_3(deck, lh, csv_path)

Submit a GIF of your protocol running as `exercise_3.gif` and a file containing your code **and function explanations** as `exercise_3.txt`.

---

### **Exercise 4.** State Verification (15 pts)

Now that you have completed your experiment, you should utilize liquid tracking to write a function `verify_results()` which is able to iterate over all the original work orders and verify, for each order, whether or not it was completed successfully.

To do this, simply:

- Iterate over the work list for each well.

- Determine, for that well, if it contains all 3 of the requisite liquids by checking against the provided tuple.

- If it was **not successful** (i.e. skipped because one of the clones was missing), add the well ID to the `empty_wells` list and add the fragment IDs of that order to the growing lists `fragment_x_list`, `fragment_y_list`, and `fragment_z_list`.

- The function provided will save your results to an output `exercise_4.csv` file. Submit this along with the code for exercise 4.

In [None]:
def verify_results(cloning_csv_path : str,
                   deck: Deck):

    ... # YOUR CODE HERE

    # lists to store empty wells and fragment tuples for saving as a DataFrame
    empty_wells = []
    fragment_x_list = []
    fragment_y_list = []
    fragment_z_list = []

    # for each work order
    for ... in ...:

        liquids = ...

        if len(liquids) == 0:
            empty_wells.append(...)
            fragment_x_list.append(...)
            fragment_y_list.append(...)
            fragment_z_list.append(...)
        
        # otherwise, assert that all the liquids match the appropriate names
        else:
            for ... in ...:
                assert fragment == liquid[0], "liquid did not match!"
    
    # convert results to a dataframe
    result = pd.DataFrame(
        {
            "well_id" : empty_wells,
            "fragment_x" : fragment_x_list,
            "fragment_y" : fragment_y_list,
            "fragment_z" : fragment_z_list,
        }
    )

    return result

# save data to file
save_path = os.path.join(os.path.dirname(cwd), "workshop_4_data", "exercise_4.csv")
result_df = verify_results(work_order_csv_path, deck)
result_df.to_csv(save_path, index = False)
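
For reference, the save step relies on two pandas features: building a DataFrame from a dictionary of equal-length lists, and `to_csv(..., index = False)`, which leaves the row index out of the file. Here is a small sketch with invented values, writing to an in-memory buffer instead of disk:

```python
import io
import pandas as pd

# Invented values, only to show the shape of the data.
toy_result = pd.DataFrame(
    {
        "well_id": ["A1", "B3"],
        "fragment_x": ["F001", "F004"],
    }
)

# index=False writes only the named columns
buffer = io.StringIO()
toy_result.to_csv(buffer, index=False)
print(buffer.getvalue())

# with the default index=True, each line would start with the row number
buffer_with_index = io.StringIO()
toy_result.to_csv(buffer_with_index)
print(buffer_with_index.getvalue())
```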

Submit the resulting data file as `exercise_4.csv` and the code as `exercise_4.txt`.


---

#### Conclusion

That's all for workshop 4! Double check that you have submitted a `.txt` file for every exercise, a `.gif` file for exercise 3, and a `.csv` file for exercise 4. You should have submitted:

- `exercise_1.txt` with the code for exercise 1.

- `exercise_2.txt` with the code for exercise 2.

- `exercise_3.txt` with the code for exercise 3.

- `exercise_3.gif` showing the full protocol in exercise 3, starting with the set-up OT-2 deck with DNA clones.

- `exercise_4.txt` with the code for exercise 4.

- `exercise_4.csv` with the resulting data from running your code in exercise 4.

**EVERY CODE BLOCK SHOULD HAVE WELL-WRITTEN COMMENTS**

If you are still feeling unsure about deck setup, please **reach out to the teaching team**, whose contact info can be found in the `README.md` file on the [class GitHub](https://github.com/chory-lab/bme590-fall-2025).

---