# Your Chemistry, Your Data, Your Insights

# Jupyter Setup

This cell assumes the notebook is running from the "dm_public/01_Setup_Intro/notebooks" directory. Changing to the base directory keeps relative paths consistent between Jupyter and Python at the command line. If you did not launch Jupyter from the dm_public directory, do not execute this cell. Run it only once: every additional run moves up two more directory levels.

In [None]:
%cd ../..
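Because `%cd ../..` moves up again on every run, a guarded variant can make the cell safe to re-run. The following is a minimal sketch, not part of the original notebook: the helper name `base_dir_from` is hypothetical, and it assumes the notebook folder is named exactly "01_Setup_Intro/notebooks".

```python
import os
from pathlib import Path


def base_dir_from(cwd: Path) -> Path:
    """Return the base directory if cwd is the notebooks folder, else cwd unchanged."""
    # Only move up when the last two path components match the notebook folder.
    if cwd.parts[-2:] == ("01_Setup_Intro", "notebooks"):
        return cwd.parents[1]
    return cwd


# Running this repeatedly is harmless: after the first change the guard no longer matches.
os.chdir(base_dir_from(Path.cwd()))
print("Working directory:", Path.cwd())
```
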

# Reading PCML
Specify the PCML file we will work with in this notebook, and read it from disk into an lxml ElementTree object.

In [None]:
from lxml import etree

pcml_recipe_file = './01_Setup_Intro/data/3a_recipe.pcml'

pcml_obj = etree.parse(pcml_recipe_file)
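
Before querying a parsed document, it can help to print an outline of its element tree. The sketch below does this for a small, made-up PCML-like snippet; the element names are inferred from the queries used in this notebook, the real schema may differ, and the chemicals and operation text are invented for illustration. The `outline` helper is hypothetical, and the import falls back to the standard library if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # the standard library offers a similar API for this sketch
    import xml.etree.ElementTree as etree

# Hypothetical, heavily simplified snippet; not taken from the real recipe file.
sample = """
<pcml>
  <chemicals>
    <chemical role="reagent"><name>sodium hydroxide</name></chemical>
    <chemical role="solvent"><name>water</name></chemical>
  </chemicals>
  <step type="synthesis">
    <group>
      <operation><text>Dissolve sodium hydroxide in water.</text></operation>
    </group>
  </step>
</pcml>
"""

root = etree.fromstring(sample.strip())


def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

For the file loaded above, the same helper could be applied to `pcml_obj.getroot()`.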


# Extracting Recipe Content

In [None]:
#list all the chemicals used
chem_elem = pcml_obj.find(".//chemicals")
for c in chem_elem:
    # print the text of each chemical's first child element
    print("Chemical: {}".format(c[0].text))

In [None]:
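#search for a specific safety code
code_to_search = "H318"
# Pass the code as an XPath variable instead of formatting it into the query string.
# Note: this matches only <code> elements whose entire text equals the code, so
# entries that combine several codes with " + " are not matched here.
has_code = len(pcml_obj.xpath('.//safetycode/code[text()=$code]', code=code_to_search)) > 0
print("{} {} code associated with recipe chemicals".format("Found" if has_code else "Did not find", code_to_search))


In [None]:
#list the unique safety codes
import itertools

safety_elem = pcml_obj.findall(".//safetycode/code")
# skip empty <code> elements, whose text is None
all_s_codes = [s.text for s in safety_elem if s.text]

# an entry may combine several codes joined by " + ", so split those apart
uniq_s_codes = set(itertools.chain.from_iterable(x.split(" + ") for x in all_s_codes))
print("Found the following unique safety codes:", sorted(uniq_s_codes))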
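
To make the splitting step concrete, here is a small standalone sketch on invented sample values (they are not read from the recipe file). The helper name `split_codes` is hypothetical; it assumes combined entries use " + " as the separator, as in the cell above.

```python
from itertools import chain


def split_codes(raw_codes):
    """Split combined entries on " + " and return the sorted unique codes."""
    # Drop missing entries (None or empty strings) before splitting.
    parts = chain.from_iterable(c.split(" + ") for c in raw_codes if c)
    return sorted(set(parts))


# Made-up input: one plain code, one combined entry, a duplicate, and a missing value.
print(split_codes(["H318", "H302 + H312", "H318", None]))
# prints ['H302', 'H312', 'H318']
```
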

In [None]:
#extract and count roles of chemicals
from collections import Counter
import pprint

role_elems = pcml_obj.xpath('.//chemicals/chemical')
# chemicals without a "role" attribute are counted under None
role_counts = Counter(r.get("role") for r in role_elems)

pprint.pprint(role_counts)

In [None]:
#get operation groupings by step
from collections import defaultdict

step_ops = defaultdict(list)
op_elems = pcml_obj.xpath('/pcml/step/group/operation')
for oe in op_elems:
    # operation -> group -> step: key each operation by its step's "type" attribute
    step_ops[oe.getparent().getparent().get("type")].append(oe)

for step, ops in step_ops.items():
    print("{} has {} operations".format(step, len(ops)))


In [None]:
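# the [] default avoids a TypeError when the recipe has no "synthesis" step
for i, op in enumerate(step_ops.get("synthesis", []), 1):
    # findtext returns None instead of raising if an operation has no <text> child
    print("Operation {}: {}".format(i, op.findtext("text")))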