# Your Chemistry, Your Data, Your Insights

# Jupyter Setup

Assuming we are in the "dm_public/01_Setup_Intro/notebooks" directory, let's change to the base directory so that relative paths are the same in Jupyter as when running Python from the command line. If you did not launch Jupyter from the dm_public directory, do not execute this cell. Also be sure to run it only once: every run moves up another two directory levels.

In [1]:
%cd ../..

C:\Users\David Pattison\Code\dm_public
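Because each run of `%cd ../..` climbs two more levels, re-running the cell by accident leaves every relative path below pointing at the wrong place. A rerun-safe alternative is sketched here, assuming the base directory is the nearest ancestor that contains the `01_Setup_Intro` folder. The helper name `change_to_base_dir` and that marker choice are illustrative, not part of the course code.

```python
import os
from pathlib import Path
from typing import Optional, Union


def change_to_base_dir(marker: str = "01_Setup_Intro",
                       start: Optional[Union[str, Path]] = None) -> Path:
    """Change the working directory to the nearest ancestor containing `marker`.

    Hypothetical helper: it searches upward from the current (or `start`)
    directory instead of always stepping up two levels, so repeated calls
    leave you in the same place.
    """
    here = Path(start if start is not None else Path.cwd()).resolve()
    # Check the starting directory first, then each parent in turn
    for candidate in (here, *here.parents):
        if (candidate / marker).is_dir():
            os.chdir(candidate)
            return candidate
    raise FileNotFoundError(f"No ancestor of {here} contains a '{marker}' directory")
```

In the notebook, a single call to `change_to_base_dir()` would take the place of the magic and return the directory it switched to. Searching for a marker folder, rather than counting levels, is what makes a second run harmless.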


# Reading PCML
Specify the PCML file we will work with in this notebook, and parse it from disk into an lxml element tree.

In [2]:
from lxml import etree

pcml_recipe_file = './01_Setup_Intro/data/3a_recipe.pcml'

pcml_obj = etree.parse(pcml_recipe_file)
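Before querying the tree, it can help to see which element names it contains. A minimal sketch that counts every tag under the root is shown below. It uses the standard-library `xml.etree.ElementTree` so it runs without lxml, and the same `iter()`/`tag` calls work on `pcml_obj.getroot()`. The sample string is invented for illustration and only mimics element names that the queries later in this notebook rely on; it is not real PCML content.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_overview(root) -> Counter:
    """Count how often each element tag occurs at or below `root`."""
    # lxml also yields comments/processing instructions, whose .tag is not a
    # string; keep only real elements so the counts mean the same on both.
    return Counter(el.tag for el in root.iter() if isinstance(el.tag, str))


# Invented sample that only mimics tags used later in this notebook
SAMPLE_PCML = """
<pcml>
  <chemicals>
    <chemical role="solvent"><name>example</name></chemical>
  </chemicals>
  <step type="synthesis">
    <group><operation><text>Example operation.</text></operation></group>
  </step>
</pcml>
"""

overview = tag_overview(ET.fromstring(SAMPLE_PCML))
print(overview.most_common())
```

On the real file, `tag_overview(pcml_obj.getroot())` gives a quick inventory of the element names available to the XPath queries below.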


# Extracting Recipe Content

In [3]:
# list the name of every chemical used (the first child of each <chemical> element)
chem_elem = pcml_obj.find(".//chemicals")
for c in chem_elem:
    print("Chemical: {}".format(c[0].text))

Chemical: 1-Naphthoyl chloride
Chemical: N-(4-methylbenzenesulfonyl)naphthalene-1-carbohydrazide
Chemical: 4-dimethylaminopyridine
Chemical: p-toluenesulfonyl hydrazide
Chemical: dichloromethane
Chemical: triethylamine
Chemical: saturated NH₄Cl solution
Chemical: water
Chemical: 10% aqueous citric acid solution
Chemical: saturated sodium chloride solution
Chemical: sodium sulfate
Chemical: dichloromethane
Chemical: hexane
Chemical: 1,3,5-trimethoxybenzene
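dichloromethane appears twice in the list above because two chemical entries carry that name. To list each name only once while keeping the order in which they first appear, a minimal sketch using `dict.fromkeys` (the helper name `unique_in_order` is illustrative):

```python
def unique_in_order(names):
    """Return names with duplicates removed, keeping first-occurrence order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(names))


names = ["1-Naphthoyl chloride", "dichloromethane", "triethylamine", "dichloromethane"]
print(unique_in_order(names))
# ['1-Naphthoyl chloride', 'dichloromethane', 'triethylamine']
```

In the notebook, the input would be `[c[0].text for c in chem_elem]`.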


In [4]:
#search for specific safety code
import itertools

code_to_search = "H318"
has_code = len(pcml_obj.xpath('.//safetycode/code[text()="{}"]'.format(code_to_search))) > 0
print("{} {} code associated with recipe chemicals".format("Found" if has_code else "Did not find", code_to_search))


Found H318 code associated with recipe chemicals
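The XPath test above only matches `code` elements whose entire text equals the search string. The next cell splits entries on `" + "`, which shows that some entries are combined statements; a code that appears only inside such a combination would be missed. A minimal sketch that checks the individual parts instead (the helper names are illustrative, and the sample entries are invented):

```python
def split_code(entry: str) -> set[str]:
    """Split a combined statement such as 'P305 + P351 + P338' into single codes."""
    return {part.strip() for part in entry.split("+") if part.strip()}


def has_safety_code(entries, target: str) -> bool:
    """True if `target` appears on its own or inside any combined entry."""
    # Skip empty <code/> elements, whose .text is None
    return any(target in split_code(entry) for entry in entries if entry)


entries = ["H225", "H314", "P305 + P351 + P338"]
print(has_safety_code(entries, "P351"))  # True
print(has_safety_code(entries, "H318"))  # False
```

On the real file, the entries would be `[c.text for c in pcml_obj.findall(".//safetycode/code")]`. Comparing whole codes after splitting also avoids false hits from substring matching (for example, "H318" inside a longer code).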


In [5]:
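#List off safety codes
import itertools

safety_elem = pcml_obj.findall(".//safetycode/code")
all_s_codes = [s.text for s in safety_elem]

# split combined statements (joined with " + ") into individual codes, then deduplicate
uniq_s_codes = set(itertools.chain.from_iterable([x.split(" + ") for x in all_s_codes]))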
print("Found the following unique safety codes:", sorted(uniq_s_codes))

Found the following unique safety codes: ['H-N/A', 'H-Unknown', 'H225', 'H242', 'H301', 'H302', 'H304', 'H310', 'H311', 'H314', 'H315', 'H318', 'H319', 'H331', 'H335', 'H336', 'H351', 'H361d', 'H373', 'H411', 'H412', 'P-Unknown', 'P201', 'P210', 'P261', 'P264', 'P273', 'P280', 'P301', 'P302', 'P303', 'P304', 'P305', 'P308', 'P310', 'P312', 'P313', 'P330', 'P331', 'P337', 'P338', 'P340', 'P351', 'P352', 'P353', 'P361', 'P370', 'P378', 'R-N/A', 'R-Unknown', 'S-N/A', 'S-Unknown']
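The list above mixes several code families: H and P codes are GHS hazard and precautionary statements, while R and S are the older EU risk and safety phrases. It also contains placeholder entries such as 'H-N/A' and 'S-Unknown'. A minimal sketch that groups codes by their leading letter and sets the placeholders aside, assuming (from their names) that entries ending in "-N/A" or "-Unknown" are not real codes; the helper name `group_codes` is illustrative:

```python
from collections import defaultdict


def group_codes(codes):
    """Group codes by leading letter (H, P, R, S, ...) and collect placeholders."""
    groups = defaultdict(list)
    placeholders = []
    for code in sorted(codes):
        # Entries like 'H-N/A' or 'S-Unknown' are treated as placeholders (assumption)
        if code.endswith(("-N/A", "-Unknown")):
            placeholders.append(code)
        else:
            groups[code[0]].append(code)
    return dict(groups), placeholders


groups, placeholders = group_codes({"H225", "H318", "P201", "H-N/A", "S-Unknown"})
print(groups)        # {'H': ['H225', 'H318'], 'P': ['P201']}
print(placeholders)  # ['H-N/A', 'S-Unknown']
```

In the notebook, `group_codes(uniq_s_codes)` would separate the hazard statements from the precautionary ones and show at a glance which chemicals had missing safety data.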


In [6]:
#extract and count roles of chemicals
from collections import Counter
import pprint

role_elems = pcml_obj.xpath('.//chemicals/chemical')
role_counts = Counter([r.get("role", None) for r in role_elems])

pp = pprint.PrettyPrinter()
pp.pprint(role_counts)

Counter({'reagent': 4,
         'solvent': 3,
         'washing-solution': 3,
         'starting-material': 1,
         'product': 1,
         'quenching-solution': 1,
         'drying-agent': 1})
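The counts show how many chemicals fill each role but not which ones. A minimal sketch that maps each role to the names carrying it is below. It assumes, as the chemical-listing cell does, that a chemical's first child element holds its name. It is written against the ElementTree API, so it accepts the list returned by `pcml_obj.xpath('.//chemicals/chemical')`. The helper name and the sample chemicals ("chem A" and so on) are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def names_by_role(chemical_elements):
    """Map each chemical's role attribute to the list of names carrying it."""
    roles = defaultdict(list)
    for chem in chemical_elements:
        # First child holds the name (same assumption as the listing cell)
        name = chem[0].text if len(chem) else None
        roles[chem.get("role")].append(name)
    return dict(roles)


# Invented sample; the real input would be pcml_obj.xpath('.//chemicals/chemical')
sample = ET.fromstring(
    "<chemicals>"
    "<chemical role='solvent'><name>chem A</name></chemical>"
    "<chemical role='reagent'><name>chem B</name></chemical>"
    "<chemical role='solvent'><name>chem C</name></chemical>"
    "</chemicals>"
)
print(names_by_role(sample.findall("chemical")))
# {'solvent': ['chem A', 'chem C'], 'reagent': ['chem B']}
```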


In [28]:
# group operations by the type of their step (operation -> group -> step)
from collections import defaultdict

step_ops = defaultdict(list)
op_elems = pcml_obj.xpath('/pcml/step/group/operation')
for oe in op_elems:
    # the grandparent of each operation is its <step> element
    step_ops[oe.getparent().getparent().get("type")].append(oe)

for step, ops in step_ops.items():
    print("{} has {} operations".format(step, len(ops)))

synthesis has 30 operations
isolation has 20 operations
purification has 14 operations
analysis has 14 operations
characterisation has 6 operations
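Each step is further divided into `group` elements, which the totals above hide. A minimal sketch that walks step, then group, then operation from the root element and lists the number of operations in each group. It is written against the ElementTree API, so `ops_per_group(pcml_obj.getroot())` should work on the lxml tree. The helper name and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def ops_per_group(root):
    """For each step type, list the number of operations in each of its groups."""
    counts = defaultdict(list)
    # root.findall("step") on a <pcml> root matches the XPath '/pcml/step'
    for step in root.findall("step"):
        counts[step.get("type")].extend(
            len(group.findall("operation")) for group in step.findall("group")
        )
    return dict(counts)


sample = ET.fromstring(
    "<pcml>"
    "<step type='synthesis'>"
    "<group><operation/><operation/></group>"
    "<group><operation/></group>"
    "</step>"
    "<step type='isolation'><group><operation/></group></step>"
    "</pcml>"
)
print(ops_per_group(sample))  # {'synthesis': [2, 1], 'isolation': [1]}
```

Summing each list reproduces the per-step totals printed by the cell above.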


In [25]:
# number the operations of the synthesis step starting from 1
for i, op in enumerate(step_ops["synthesis"], 1):
    print("Operation {}: {}".format(i, op.find("text").text))

Operation 1: The NCU should be powered and connected to the internet.
Operation 2: All sensors used (i.e DeviceX, ESP, IKA hot plate) should be on.
Operation 3: Pick up a clean, oven-dried, 1 L three-neck round bottom flask
Operation 4: Secure the flask with a clamp on top of a stirring mantle.
Operation 5: Add a 3 cm rod-shaped, Teflon coated, magnetic stirrer bar to the reaction flask.
Operation 6: Attach a 50 mL dropping funnel to the reaction flask.
Operation 7: Place DeviceX into one of the flask's side necks.
Operation 8: Place a UV torch in a way that it shines on to the UV sensor.
Operation 9: Turn the UV torch on.
Operation 10: Stopper any left over open side necks with a rubber septum.
Operation 11: Evacuate and refill the flask with argon twice.
Operation 12: Weigh 1.83 g of 4-dimethylaminopyridine.
Operation 13: Transfer it into the reaction flask.
Operation 14: Weigh 14.0 g of p-toluenesulfonyl hydrazide.
Operation 15: Transfer it into the reaction flask.
Operation 16: Mea