# Question 1(a)
 
1. PyPlate is a [Python package](https://pyplate-hte.readthedocs.io/en/working/index.html) for designing high-throughput chemistry and biology experiments. Suppose that you need to screen conditions for 12 cross-coupling reactions of the form:

$A_i + B_i \rightarrow C_i$

where $A_i$ and $B_i$ are starting materials, $C_i$ is a product, and i runs from 1…12. For each reaction, let $A_i$ be the limiting reagent (0.1 mmol), add 1.1 equivalents of $B_i$, 10 mol% $Pd(OAc)_2$, and 15 mol% of ligand (see below). (Equivalents are relative to limiting reagent.)

(a) We’d like to screen a common set of 2 temperatures (60 ℃ and 80℃), 4 solvents (toluene, glyme, TBME, and dichloroethane), and 3 ligands (XPhos, SPhos, and dppf). Please write a PyPlate Recipe that implements the above experimental design. Use a total reaction volume of 200 uL and 96 well plates with a maximum volume of 500 uL. Use a random number generator with a fixed seed to set the molecular weights of $A_i$ and $B_i$ (set between 100 and 500 g/mol). Use the real molecular weights of everything else.

Of course, multiple experimental designs are possible here. Can you design a Recipe that is not only easy to code but also practical to carry out in the lab? What sort of practical considerations are there? Please provide your answer as a clearly documented Jupyter notebook. You should graphically illustrate your design. These diagrams should _not_ show the details of precisely what is in each well, but rather explain the concept behind the design.

Note that PyPlate does not currently have a feature to specify temperatures. You will have to keep track of that manually. As you go through this exercise, please read through the PyPlate documentation. If you notice anything that could be improved (and there are a lot of possibilities), please email me. I will award significant extra credit for thoughtful, chemically sensible and computationally reasonable proposals along these lines.

# Solution



# Terminology

I mostly follow the terminology of the challenge description and add a few terms for clarity. The same terms are used consistently in the code and in the documentation.

- Substrate: The starting material in a chemical reaction, $A_1, A_2, ..., A_{12}$ and $B_1, B_2, ..., B_{12}$.
- Catalyst: $Pd(OAc)_2$
- Ligand: XPhos, SPhos, or dppf
- Solvent: toluene, glyme, TBME, or dichloroethane

# Plate Design

> The plate is designed on the following assumptions:

We ignore the possibility of cross-contamination between wells, and we also ignore evaporation of the solvent.

## Visualization

### Plate size (12 \* 8)

For each temperature we need **two** 96-well plates but fill only **one and a half** of them: 12 substrate pairs × 3 ligands × 4 solvents = 144 wells. The plates are named `plate#1` (fully used) and `plate#2` (rows A-D only). Each plate has 12 columns and 8 rows. Below is a sample layout of each plate.

| Plate#1 | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  |
| ------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A       |     |     |     |     |     |     |     |     |     |     |     |     |
| B       |     |     |     |     |     |     |     |     |     |     |     |     |
| ...     |     |     |     |     |     |     |     |     |     |     |     |     |
| H       |     |     |     |     |     |     |     |     |     |     |     |     |

plate#2:

| Plate#2 | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  |
| ------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A       |     |     |     |     |     |     |     |     |     |     |     |     |
| B       |     |     |     |     |     |     |     |     |     |     |     |     |
| C       |     |     |     |     |     |     |     |     |     |     |     |     |
| D       |     |     |     |     |     |     |     |     |     |     |     |     |
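
The plate count above follows from simple counting. A minimal sketch of that arithmetic, with constant names chosen here purely for illustration:

```python
import math

# Design parameters taken from the question.
N_SUBSTRATE_PAIRS = 12
N_LIGANDS = 3
N_SOLVENTS = 4
N_TEMPERATURES = 2
WELLS_PER_PLATE = 96  # 8 rows x 12 columns

# Every substrate pair meets every ligand in every solvent.
wells_per_temperature = N_SUBSTRATE_PAIRS * N_LIGANDS * N_SOLVENTS
plates_filled = wells_per_temperature / WELLS_PER_PLATE
plates_needed = math.ceil(plates_filled)
total_wells = wells_per_temperature * N_TEMPERATURES

print(f"{wells_per_temperature} wells per temperature "
      f"= {plates_filled} plates filled, {plates_needed} plates needed")
print(f"{total_wells} wells over both temperatures")
```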

### Catalyst $Pd(OAc)_2$:

Added to every well in use: all 96 wells of `plate#1` and rows A-D of `plate#2`.

### Substrate `A` & `B`:

> Ideally, we want to group the reagents so that they are both easy to pipette and easy to keep track of when setting up the reaction. We have 12 pairs of substrates, 3 ligands, and 4 solvents, and a plate has 12 columns, so I assign one substrate pair to each column.

Add $A_1$ and $B_1$ to `column 1` of both plates (wells `A1`-`H1`, rows A-D only on `plate#2`), $A_2$ and $B_2$ to `column 2` (`A2`-`H2`), and so on up to $A_{12}$ and $B_{12}$ in `column 12` (`A12`-`H12`).

|     | 1          | 2          | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12               |
| --- | ---------- | ---------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---------------- |
| A   | $A_1, B_1$ | $A_2, B_2$ | ... |     |     |     |     |     |     |     |     | $A_{12}, B_{12}$ |
| B   | $A_1, B_1$ | ↓          |     |     |     |     |     |     |     |     |     | ...              |
| ... | ...        |            |     |     |     |     |     |     |     |     |     |                  |
| H   | $A_1, B_1$ | $A_2, B_2$ |     |     |     |     |     |     |     |     |     |                  |

### Ligand `XPhos`, `SPhos`, `dppf`:

> Since we use one and a half plates for each temperature, I've divided the ligands and solvents so that each ligand occupies one contiguous block of four rows and can be pipetted without switching plates.

Add XPhos to `plate#1: row A-D`, SPhos to `plate#1: row E-H`, and dppf to `plate#2: row A-D`.

|     | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  |
| --- | ----- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A   | XPhos | →   |     |     |     |     |     |     |     |     |     |     |
| B   | ↓ →   |     |     |     |     |     |     |     |     |     |     |     |
| C   | ↓ →   |     |     |     |     |     |     |     |     |     |     |     |
| D   | →     |     |     |     |     |     |     |     |     |     |     |     |
| E   | SPhos | →   |     |     |     |     |     |     |     |     |     |     |
| F   | ↓ →   |     |     |     |     |     |     |     |     |     |     |     |
| ... | ...   |     |     |     |     |     |     |     |     |     |     |     |

### Solvent `toluene`, `glyme`, `TBME`, `dichloroethane`:
> Finally, we have the solvents. With the ligand and substrate layout fixed, assigning one solvent per row within each ligand block is the only logical option left.

Add toluene to `row A`, glyme to `row B`, TBME to `row C`, dichloroethane to `row D`, and toluene **again** to `row E`. Repeat the pattern for the remaining rows and for `plate#2`.

|     | 1              | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  |
| --- | -------------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A   | toluene        | →   |     |     |     |     |     |     |     |     |     |     |
| B   | glyme          | →   |     |     |     |     |     |     |     |     |     |     |
| C   | TBME           | →   |     |     |     |     |     |     |     |     |     |     |
| D   | dichloroethane | →   |     |     |     |     |     |     |     |     |     |     |
| E   | **toluene**    | →   |     |     |     |     |     |     |     |     |     |     |
| ... | ...            |     |     |     |     |     |     |     |     |     |     |     |
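
Taken together, the three layouts above fix the well for every (substrate pair, ligand, solvent) combination. A small sketch of that mapping follows. `well_for` and the constant names are illustrative helpers, not part of PyPlate.

```python
LIGANDS = ("XPhos", "SPhos", "dppf")
SOLVENTS = ("toluene", "glyme", "TBME", "dichloroethane")
ROW_LABELS = "ABCDEFGH"


def well_for(substrate_pair, ligand, solvent):
    """Map one condition to (plate name, well label) for a single temperature.

    substrate_pair is the index i of (A_i, B_i), from 1 to 12.
    """
    if not 1 <= substrate_pair <= 12:
        raise ValueError("substrate_pair must be between 1 and 12")
    ligand_index = LIGANDS.index(ligand)
    solvent_index = SOLVENTS.index(solvent)
    # XPhos and SPhos share plate#1; dppf goes on plate#2.
    plate = "plate#1" if ligand_index < 2 else "plate#2"
    # Each ligand owns a block of four rows; the solvent picks the row in it.
    row = ROW_LABELS[4 * (ligand_index % 2) + solvent_index]
    # The substrate pair picks the column.
    return plate, f"{row}{substrate_pair}"


print(well_for(1, "XPhos", "toluene"))
print(well_for(12, "SPhos", "dichloroethane"))
print(well_for(5, "dppf", "TBME"))
```

Because the mapping is one-to-one, it can also serve as a lookup when reading results back from the plate.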




# Set up chemical properties

> Assume all substances are solids, except the solvents.


In [1]:
from pyplate import Substance, Container, Plate, Recipe
import pandas as pd
import random

random.seed(0)

## create a list of 12 substrate pairs, {a_1, b_1} to {a_12, b_12}, with random molecular weights between 100 and 500 g/mol
substrates = [
    {
        "a": Substance.solid(name=f"a_{i}",
                             mol_weight=random.uniform(100, 500)),
        "b": Substance.solid(name=f"b_{i}",
                             mol_weight=random.uniform(100, 500)),
    }
    for i in range(1, 13)
]

## ligands
# XPhos
ligand_XPhos = Substance.solid(
    name="XPhos",
    mol_weight=476.72,
)

# SPhos
ligand_SPhos = Substance.solid(
    name="SPhos",
    mol_weight=410.53,
)

# dppf
ligand_dppf = Substance.solid(
    name="dppf",
    mol_weight=554.391,
)

## catalyst
catalyst = Substance.solid(name="Pd(OAc)2", mol_weight=224.51)
# solvent
# toluene - probably the only one with real data. @link https://webbook.nist.gov/cgi/fluid.cgi?P=1&TLow=60&THigh=80&TInc=10&Digits=5&ID=C108883&Action=Load&Type=IsoBar&TUnit=C&PUnit=atm&DUnit=g%2Fml&HUnit=kJ%2Fmol&WUnit=m%2Fs&VisUnit=uPa*s&STUnit=N%2Fm&RefState=DEF
toluene_60 = Substance.liquid(name="toluene_60", mol_weight=92.141,
                              density=0.82923)
toluene_80 = Substance.liquid(name="toluene_80", mol_weight=92.141,
                              density=0.80986)
# glyme - the values below are educated guesses. Standard-state density 0.8683 g/mL
glyme_60 = Substance.liquid(name="glyme_60", mol_weight=90.122, density=0.8683)
glyme_80 = Substance.liquid(name="glyme_80", mol_weight=90.122, density=0.8464)
# TBME - the values below are rough guesses. bp is about 55 °C, so good luck running it at 60 °C, let alone 80 °C. Standard-state density 0.7404 g/mL
tbme_60 = Substance.liquid(name="TBME_60", mol_weight=88.15, density=0.7404)
tbme_80 = Substance.liquid(name="TBME_80", mol_weight=88.15, density=0.7208)
# dichloroethane - the values below are also educated guesses. bp is about 83.5 °C. Standard-state density 1.253 g/mL
dichloroethane_60 = Substance.liquid(
    name="dichloroethane_60", mol_weight=98.959, density=1.253
)
dichloroethane_80 = Substance.liquid(
    name="dichloroethane_80", mol_weight=98.959, density=1.229
)

## Packing

ligands = [ligand_XPhos, ligand_SPhos, ligand_dppf]
solvents_60 = [toluene_60, glyme_60, tbme_60, dichloroethane_60]
solvents_80 = [toluene_80, glyme_80, tbme_80, dichloroethane_80]

print("substrates", substrates)
print("ligands", ligands)
print("solvents_60", solvents_60)
print("solvents_80", solvents_80)
print("catalyst", catalyst)

substrates [{'a': a_1 (SOLID), 'b': b_1 (SOLID)}, {'a': a_2 (SOLID), 'b': b_2 (SOLID)}, {'a': a_3 (SOLID), 'b': b_3 (SOLID)}, {'a': a_4 (SOLID), 'b': b_4 (SOLID)}, {'a': a_5 (SOLID), 'b': b_5 (SOLID)}, {'a': a_6 (SOLID), 'b': b_6 (SOLID)}, {'a': a_7 (SOLID), 'b': b_7 (SOLID)}, {'a': a_8 (SOLID), 'b': b_8 (SOLID)}, {'a': a_9 (SOLID), 'b': b_9 (SOLID)}, {'a': a_10 (SOLID), 'b': b_10 (SOLID)}, {'a': a_11 (SOLID), 'b': b_11 (SOLID)}, {'a': a_12 (SOLID), 'b': b_12 (SOLID)}]
ligands [XPhos (SOLID), SPhos (SOLID), dppf (SOLID)]
solvents_60 [toluene_60 (LIQUID), glyme_60 (LIQUID), TBME_60 (LIQUID), dichloroethane_60 (LIQUID)]
solvents_80 [toluene_80 (LIQUID), glyme_80 (LIQUID), TBME_80 (LIQUID), dichloroethane_80 (LIQUID)]
catalyst Pd(OAc)2 (SOLID)
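
The boiling-point remarks in the solvent comments above point to a practical concern: some solvents boil at or near the screening temperatures. A quick check, assuming approximate literature boiling points at atmospheric pressure, could look like this. The table values and the helper name `solvents_at_risk` are assumptions made here, not part of PyPlate.

```python
# Approximate normal boiling points in °C (rounded literature values; assumption).
BOILING_POINT_C = {
    "toluene": 110.6,
    "glyme": 85.0,
    "TBME": 55.2,
    "dichloroethane": 83.5,
}


def solvents_at_risk(temperature_c, margin_c=0.0):
    """Return solvents whose boiling point is at or below temperature_c + margin_c."""
    return sorted(
        name
        for name, boiling_point in BOILING_POINT_C.items()
        if boiling_point <= temperature_c + margin_c
    )


for temperature_c in (60, 80):
    print(temperature_c, solvents_at_risk(temperature_c, margin_c=4.0))
```

Any solvent flagged here would probably need a sealed plate or a lower temperature, and the density guesses above become less meaningful for it.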


## At 60°C


In [2]:
stock_volume = 1  # mL
stock_concentration = 0.5  # mmol/1 mL
str_stock_quantity = f"{stock_volume} mL"
str_stock_concentration = f"{stock_concentration} mmol/mL"
print("str_stock_quantity", str_stock_quantity)
print("stock_concentration", str_stock_concentration)

DEBUG = False

quantity = {  # all unit in mmol
    "a": 0.1,
    "b": 0.11,
    "catalyst": 0.01,
    "ligand": 0.015
}

plate1 = Plate("plate1", max_volume_per_well="500 uL")
plate2 = Plate("plate2", max_volume_per_well="500 uL")

recipe = Recipe()
recipe.uses(plate1, plate2)

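# debug aid: two 8 x 12 DataFrames of 1s (rows x columns); each reagent multiplies the wells it reaches by its own prime (2 = catalyst, 3 = ligand, 5 = substrates), so a well holding everything ends at 30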
df1 = pd.DataFrame(1, columns=range(1, 13), index=range(1, 9))
df2 = pd.DataFrame(1, columns=range(1, 13), index=range(1, 9))

for i, solvent in enumerate(solvents_60):

    if DEBUG:
        df1.loc[i + 1::4] *= 2
        df2.loc[i + 1] *= 2
        print(f"catalyst:{quantity['catalyst'] / stock_concentration} mL")

    catalyst_solution = recipe.create_solution(
        name=f"{catalyst} in {solvent.name}", solute=catalyst,
        concentration=str_stock_concentration, solvent=solvent,
        total_quantity=str_stock_quantity)

    # we only loop through the 4 solvents, so the plates are covered row-wise; otherwise the solvent in the catalyst solution would contaminate wells meant for another solvent.
    # regarding the math: 0.01 mmol per well = 0.02 mL of 0.5 mmol/mL solution, from V = n/c
    recipe.transfer(source=catalyst_solution,
                    destination=plate1[i + 1::4],
                    quantity=f"{quantity['catalyst'] / stock_concentration} mL")
    recipe.transfer(source=catalyst_solution,
                    destination=plate2[i + 1],
                    quantity=f"{quantity['catalyst'] / stock_concentration} mL")
    ## the concentration has to be kept lower here because two solutes are dissolved in the same stock
    for j, substrate in enumerate(substrates):
        substrate_concentration = 0.8  # mmol/mL
        # create one substrate stock per solvent and substrate pair: 0.1 mmol A per well = 0.125 mL of 0.8 mmol/mL solution, which also delivers 0.11 mmol B at 0.88 mmol/mL
        substrate_solution = recipe.create_solution(
            name=f"{substrate['a'].name} + {substrate['b'].name} in {solvent.name}",
            solute=[substrate["a"], substrate["b"]],
            concentration=[f"{substrate_concentration} mmol/mL",
                           f"{substrate_concentration * 1.1} mmol/mL"],
            solvent=solvent,
            total_quantity=str_stock_quantity,
        )

        if DEBUG:
            df1.loc[i + 1::4, j + 1] *= 5
            df2.loc[i + 1, j + 1] *= 5
            print(f"substrate:{quantity['a'] / substrate_concentration} mL")

        # substrates vary by column and solvents by row, so the two are independent axes and the nested loop visits every well.
        recipe.transfer(source=substrate_solution,
                        destination=plate1[i + 1::4, j + 1],
                        quantity=f"{quantity['a'] / substrate_concentration} mL")
        recipe.transfer(source=substrate_solution,
                        destination=plate2[i + 1, j + 1],
                        quantity=f"{quantity['a'] / substrate_concentration} mL")

# the ligands are on two plates, one plate at a time
# plate#1 with ligands XPhos and SPhos
for i, solvent in enumerate(solvents_60):
    for k, ligand in enumerate(ligands[:2]):
        ligand_solution = recipe.create_solution(
            name=f"{ligand} in {solvent.name}",
            solute=ligand,
            concentration=str_stock_concentration,
            solvent=solvent,
            total_quantity=str_stock_quantity,
        )
        if DEBUG:
            df1.loc[4 * k + 1 + i] *= 3
            print(f"ligand:{quantity['ligand'] / stock_concentration} mL")

        # 0.015 mmol ligand per well = 0.03 mL of 0.5 mmol/mL solution
        recipe.transfer(source=ligand_solution,
                        destination=plate1[4 * k + 1 + i, :],
                        quantity=f"{quantity['ligand'] / stock_concentration} mL")
# plate#2 with ligand dppf
for i, solvent in enumerate(solvents_60):
    for k, ligand in enumerate(ligands[2:]):
        ligand_solution = recipe.create_solution(
            name=f"{ligand} in {solvent.name}",
            solute=ligand,
            concentration=str_stock_concentration,
            solvent=solvent,
            total_quantity=str_stock_quantity,
        )
        if DEBUG:
            df2.loc[4 * k + 1 + i] *= 3
            print(f"ligand:{quantity['ligand'] / stock_concentration} mL")

        # 0.015 mmol ligand per well = 0.03 mL of 0.5 mmol/mL solution
        recipe.transfer(source=ligand_solution,
                        destination=plate2[4 * k + 1 + i, :],
                        quantity=f"{quantity['ligand'] / stock_concentration} mL")

if DEBUG:
    print(df1)
    print(df2)

## fill all wells to 200 uL
for i, solvent in enumerate(solvents_60):
    recipe.fill_to(destination=plate1[i + 1::4], solvent=solvent,
                   quantity="200 uL")
    recipe.fill_to(destination=plate2[i + 1], solvent=solvent,
                   quantity="200 uL")
result = recipe.bake()

plate1 = result["plate1"]
plate2 = result["plate2"]

str_stock_quantity 1 mL
stock_concentration 0.5 mmol/mL
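
As a sanity check on the recipe above, the per-well volumes and the stock consumption can be tallied by hand. The sketch below repeats the amounts and concentrations from the cell. The well counts per stock are my reading of the slices used in the loops, so treat them as an assumption.

```python
TOTAL_VOLUME_UL = 200.0   # target reaction volume per well
STOCK_VOLUME_UL = 1000.0  # each stock solution is prepared as 1 mL

# name: (amount per well in mmol, stock concentration in mmol/mL,
#        number of wells served by one stock solution)
ADDITIONS = {
    "catalyst": (0.01, 0.5, 36),   # one stock per solvent: 3 rows x 12 columns
    "substrates": (0.1, 0.8, 3),   # one stock per solvent and pair: 3 wells
    "ligand": (0.015, 0.5, 12),    # one stock per solvent and ligand: 1 row
}


def aliquot_ul(amount_mmol, concentration_mmol_per_ml):
    """Volume in uL that delivers amount_mmol from the stock (V = n / c)."""
    return 1000.0 * amount_mmol / concentration_mmol_per_ml


per_well_ul = {name: aliquot_ul(n, c) for name, (n, c, _) in ADDITIONS.items()}
dispensed_ul = sum(per_well_ul.values())
top_up_ul = TOTAL_VOLUME_UL - dispensed_ul
stock_used_ul = {
    name: per_well_ul[name] * wells for name, (_, _, wells) in ADDITIONS.items()
}

for name, volume in per_well_ul.items():
    print(f"{name}: {volume:.0f} uL per well, "
          f"{stock_used_ul[name]:.0f} uL of stock used")
print(f"solvent top-up: {top_up_ul:.0f} uL per well")
```

The dispensed volume stays below 200 uL, so `fill_to` only ever adds solvent, and no stock is drawn beyond the 1 mL prepared.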


## Check answers

In [10]:
# print the amount (in mmol) of one substance in every well of both plates
def print_mmol(substance=None):
    print("\n", substance)
    for plate in [plate1, plate2]:
        print(plate.get_volumes(substance=substance, unit="mmol"))


for e in ligands: print_mmol(e)
print_mmol(catalyst)
print_mmol(substrates[0]["a"])
print_mmol(substrates[0]["b"])


 XPhos (SOLID)
[[0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015]
 [0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015]
 [0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015]
 [0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015]
 [0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   ]]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

 SPhos (SOLID)
[[0.    0.    0.    0.    0.    0.    0.    0.    0.  

## At 80°C 

It'll be an exact copy of the 60°C setup, except that it uses the 80°C solvents in `solvents_80`.
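
Because PyPlate has no temperature field, the temperature has to be tracked outside the recipe. One way to keep that bookkeeping explicit is a small manifest that maps each physical plate to its temperature and solvent list. The `PlateRun` class and labels such as `plate1_60C` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateRun:
    """One physical plate and the conditions it is run under."""
    plate_label: str
    temperature_c: int
    solvent_list: str  # name of the solvent list used to build the plate
    rows_used: str


def build_manifest(temperatures_c=(60, 80)):
    """List the plates needed: one full and one half plate per temperature."""
    runs = []
    for temperature_c in temperatures_c:
        solvent_list = f"solvents_{temperature_c}"
        runs.append(PlateRun(f"plate1_{temperature_c}C", temperature_c,
                             solvent_list, "A-H"))
        runs.append(PlateRun(f"plate2_{temperature_c}C", temperature_c,
                             solvent_list, "A-D"))
    return runs


for run in build_manifest():
    print(run)
```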

# Question 1(b)

(b) As you can see, PyPlate currently allows absolute quantities like 1 mmol to be specified, but not _relative_ quantities like 1.1 equivalents of $B_i$. In fact, one could imagine that it would be ideal to specify something like "10 mol% ligand."

**Without writing any actual code**, please explain how you would modify PyPlate to incorporate this feature. Which parts of the API would have to change? How would you ensure that the relative quantities are logically and chemically reasonable? Can you write docstrings for the new or modified functions?


# Complaint

PyPlate cannot add a solid directly to an existing solution.


2. Chromatography is frequently used to determine the outcome of experiments. However, most chromatography instrument manufacturers provide data in proprietary data formats. We’ve developed the [Rainbow](https://github.com/evanyeyeye/rainbow) package to unlock these files and we want to know whether you can extend Rainbow.

[Here](https://drive.google.com/drive/folders/1tyYTM94BdOkCkvZCJ4a1gT5CYYb-EKDj?usp=sharing) are three folders with artificially generated and encoded chromatography data:

(a) **pear** challenge (easy): time vs. intensity data

(b) **scale** challenge (intermediate): time vs. wavelength vs. absorbance data

(c) **sixtysix** (hard): time vs. mass vs. intensity data

In each folder, you will find a `sample/` subfolder and `problemX` subfolders (where X=1,2,3). The sample subfolder contains a matched binary/csv pair. You should examine this pair with a hex editor (or any other tool of your choice) to determine its binary organization. The rest of the folders contain only binaries. Your decoding script should run on these files. I will check that the csv output matches what is expected.

**Note:** your answers should not include any hard-coded magic numbers (other than the lengths of headers, chunks, footers, etc.)

**Please provide a concise and clear explanation for each file structure in markdown format.** What is the format of the header, data, and footer? I suggest writing a couple paragraphs to accompany a table like this:

| Location | Length (bytes) | Endianness | Format | Value        |
| -------- | -------------- | ---------- | ------ | ------------ |
| 0x180    | 4              | big        | uint   | time[0] (ms) |
| 0x184    | 4              | little     | uint   | intensity[0] |
| ...      |                |            |        |              |

Please document your code clearly with comments and docstrings. Please provide your answer as one `.py` file per problem (so, one for pear, one for scale, and one for sixtysix). Please provide the decoded `.csv` files so I can check them against the expected results. Place one decoded csv file per problem directory like this: `pear/problem1/pear.csv`.
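
To make the example table above concrete, here is a minimal sketch of reading such a layout with Python's `struct` module. It assumes a purely hypothetical file: a 0x180-byte header followed by repeating 8-byte records, each a big-endian uint time and a little-endian uint intensity. The real challenge files have to be inspected before any of these offsets can be trusted.

```python
import struct

HEADER_LENGTH = 0x180  # hypothetical header size, taken from the example table
RECORD_LENGTH = 8      # 4-byte time + 4-byte intensity


def decode_records(blob):
    """Decode (time_ms, intensity) pairs that follow the header."""
    body = blob[HEADER_LENGTH:]
    usable = len(body) - len(body) % RECORD_LENGTH  # ignore a partial tail
    records = []
    for offset in range(0, usable, RECORD_LENGTH):
        (time_ms,) = struct.unpack_from(">I", body, offset)        # big-endian
        (intensity,) = struct.unpack_from("<I", body, offset + 4)  # little-endian
        records.append((time_ms, intensity))
    return records


def make_example_blob(records):
    """Build a synthetic file in the assumed layout, for demonstration only."""
    header = bytes(HEADER_LENGTH)
    body = b"".join(
        struct.pack(">I", time_ms) + struct.pack("<I", intensity)
        for time_ms, intensity in records
    )
    return header + body


example = make_example_blob([(0, 10), (500, 250), (1000, 65536)])
print(decode_records(example))
```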


In [None]:
import rainbow as rb

datedir = rb.read()