# Writing Data

In this section, we show how to add data to an empty database, considering only the "experimentalist" side of the schema. Our goal is to add a row to the `experiment` table, with its results stored in `xy_data`. To satisfy all the relationship constraints, we also need to fill rows in the following tables: `experiment_machine`, `experiment_type`, `lab`, `synthesis`, `synthesis_machine` and `data_units`, as well as `molecule` and `molecule_type` for the molecule targeted by the synthesis.
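Because of these constraints, the rows have to be inserted in an order where every referenced row already exists. The sketch below derives one such order with a topological sort. The dependency map is an assumption read off the `*_id` arguments used later in this section, not something queried from the mdb schema.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map read off the foreign keys used in this section:
# each table maps to the set of tables it references. The self-reference of
# `molecule` (its reactants are molecules too) is left out to avoid a cycle.
DEPENDENCIES = {
    "lab": set(),
    "molecule_type": set(),
    "experiment_type": set(),
    "data_units": set(),
    "molecule": {"molecule_type"},
    "synthesis_machine": {"lab"},
    "experiment_machine": {"lab", "experiment_type"},
    "synthesis": {"synthesis_machine", "molecule"},
    "experiment": {"synthesis", "experiment_machine"},
    "xy_data": {"experiment", "data_units"},
}


def insertion_order(dependencies):
    """Return the tables ordered so that each comes after the tables it references."""
    return list(TopologicalSorter(dependencies).static_order())


print(insertion_order(DEPENDENCIES))
```

Several orders are valid; the walkthrough below follows one of them (lab, then synthesis machine, molecules, synthesis, experiment machine, experiment and finally the data).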

The first step is to connect to the database:

```python
import mdb

client = mdb.MDBClient(hostname='localhost',
                       username='postgres',
                       password='',
                       database='mdb')
```

Then, let's add a lab. A lab has a `name` and a `short_name`, both of which must be unique. The `short_name` is a three-letter code that makes it convenient to select your lab.

```python
rec = client.add_lab(name='Mad Lab', short_name='MAD')
```
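In the database these constraints are presumably enforced by the `lab` table itself. The following in-memory sketch only mimics them to make the rules explicit; `LabRegistry` is a hypothetical class, and the "exactly three ASCII letters" check is an assumption based on the description above.

```python
import re


class LabRegistry:
    """Hypothetical in-memory stand-in for the unique constraints of the lab table."""

    def __init__(self):
        self._names = set()
        self._short_names = set()

    def add(self, name, short_name):
        # Assumption: a short_name is exactly three ASCII letters.
        if not re.fullmatch(r"[A-Za-z]{3}", short_name):
            raise ValueError(f"short_name must be three letters, got {short_name!r}")
        if name in self._names:
            raise ValueError(f"duplicate lab name: {name!r}")
        if short_name in self._short_names:
            raise ValueError(f"duplicate short_name: {short_name!r}")
        self._names.add(name)
        self._short_names.add(short_name)


registry = LabRegistry()
registry.add("Mad Lab", "MAD")  # accepted
```

A second lab reusing `'MAD'`, or a short name such as `'MA'`, would be rejected with a `ValueError` in this sketch.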

All the `add_*` methods return their corresponding row in the [eventstore](05_event_sourcing.ipynb). This row records everything about the change made to the database:

```python
print(f'event:    {rec.event}')
print(f'type:     {rec.type}')
print(f'uuid:     {rec.uuid}')
print(f'data:     {rec.data}')
print(f'timestamp {rec.timestamp}')
```

```
event:    create
type:     lab
uuid:     5541b619-c181-4647-98b6-79516ee24800
data:     {'name': 'Mad Lab', 'short_name': 'MAD'}
timestamp 2020-04-01 09:20:25.145175
```


Returning this record means we do not need to query the database again to find the `lab_id` of the lab that was just added: `rec.uuid` already holds it. We can therefore go on and add a `synthesis_machine`:

```python
rec = client.add_synthesis_machine(name='Mad Machine Doing Chemistry',
                                   make='MadChem',
                                   model='v2048',
                                   metadata={'details': 'some mad details'},
                                   lab_id=rec.uuid)  # uuid of the lab added above
```
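The pattern used in the last two cells (every `add_*` call returns an event record whose `uuid` is the id of the new row, which is then passed on as a foreign key) can be sketched in plain Python. `EventRecord`, `EVENTS` and `create` are hypothetical stand-ins that only mirror the fields printed above; they are not part of mdb.

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import uuid4


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical mirror of an eventstore row."""
    event: str
    type: str
    uuid: str
    data: dict
    timestamp: datetime = field(default_factory=datetime.now)


EVENTS = []  # in-memory event log standing in for the eventstore table


def create(table, **data):
    """Record a 'create' event for a new row and return it, like the add_* methods."""
    rec = EventRecord(event="create", type=table, uuid=str(uuid4()), data=data)
    EVENTS.append(rec)
    return rec


lab = create("lab", name="Mad Lab", short_name="MAD")
# The id of the new lab comes back with the event, so no extra lookup is needed.
machine = create("synthesis_machine", name="Mad Machine Doing Chemistry", lab_id=lab.uuid)
```

Keeping every change as an appended event is what makes the history of the data inspectable later on.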

Note that a `synthesis_machine` could also describe a human chemist. Adding a `synthesis` also requires some molecules:

```python
event = client.add_molecule_type('fragment')
client.add_molecule(smiles='A', molecule_type_id=event.uuid)
client.add_molecule(smiles='B', molecule_type_id=event.uuid)
client.add_molecule(smiles='C', molecule_type_id=event.uuid)

event = client.add_molecule_type('new_molecule')
rec = client.add_molecule(smiles='A-D-C',
                          molecule_type_id=event.uuid,
                          reactant_id=[client.get_id('molecule', smiles='A'),
                                        client.get_id('molecule', smiles='B'),
                                        client.get_id('molecule', smiles='C')])
```

This cell can be broken down as follows:

 * line 1 adds the molecule type `fragment`
 * lines 2-4 add molecules A, B and C
 * line 6 adds the molecule type `new_molecule`
 * lines 7-11 add a new molecule 'A-D-C' built from reactants A, B and C
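The reactants are referenced by id, and those ids are looked up from their SMILES string with `get_id`. A minimal in-memory sketch of that lookup follows; `molecules`, `add_molecule` and `get_id` here are hypothetical helpers, and raising an error when zero or several rows match is an assumption, not documented mdb behaviour.

```python
from uuid import uuid4

molecules = {}  # hypothetical in-memory molecule table: id -> row


def add_molecule(smiles, molecule_type, reactant_ids=()):
    """Store a molecule and return its generated id."""
    mol_id = str(uuid4())
    molecules[mol_id] = {"smiles": smiles,
                         "molecule_type": molecule_type,
                         "reactant_ids": list(reactant_ids)}
    return mol_id


def get_id(smiles):
    """Return the id of the single molecule with this SMILES string."""
    matches = [mol_id for mol_id, row in molecules.items() if row["smiles"] == smiles]
    if len(matches) != 1:
        raise LookupError(f"expected one molecule with smiles={smiles!r}, found {len(matches)}")
    return matches[0]


for fragment in ("A", "B", "C"):
    add_molecule(fragment, "fragment")

# The new molecule points at its reactants through their ids.
abc = add_molecule("ABC", "new_molecule", [get_id(s) for s in ("A", "B", "C")])
```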
 
The attentive reader will have noticed a small mistake: 'A-D-C' should be 'ABC'. The following cells show how to fix such a mistake.

```python
df = client.get('molecule')
df
```

|   | molecule_id | inchi | cid | molecule_type_id | created_on | metadata | iupac_name | cas | smiles | updated_on |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6da175a0-af06-42b4-95bf-dfb59d87d16e | | | 47313a9d-1312-4c22-b57c-38b8bf8dd978 | 2020-04-01 09:21:14.152737 | {} | | | A-UGLY-TYPO-C | 2020-04-01 09:21:14.152737 |
| 1 | 5762121f-fd3d-4766-bdfd-4396079bb4e6 | InChI=1S/CH4/h1H4 | 297.0 | 4dcfb667-5c9c-4d4f-a123-97cd8bee0513 | 2020-04-01 09:20:57.534246 | {} | methane | 74-82-8 | C | 2020-04-01 09:20:57.534246 |
| 2 | 1338c13b-8ec0-455d-806e-549256c5e2b4 | InChI=1S/BH3/h1H3 | 6331.0 | 4dcfb667-5c9c-4d4f-a123-97cd8bee0513 | 2020-04-01 09:20:41.715173 | {} | borane | 13283-31-3 | B | 2020-04-01 09:20:41.715173 |
| 3 | 0b0a88df-4d69-4eea-88a9-8c3bceeeeba0 | | | 4dcfb667-5c9c-4d4f-a123-97cd8bee0513 | 2020-04-01 09:20:25.475240 | {} | | | A | 2020-04-01 09:20:25.475240 |


By default, the client's `get` method returns a `pandas.DataFrame`, which is convenient to work with. The typo can be fixed directly in this DataFrame, and the change is then written back to the database with `update`:

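```python
# Row 0 of the DataFrame above holds the misspelled molecule
df.at[0, 'smiles'] = 'ABC'
# Write the modified DataFrame back; update returns a list of eventstore records
rec = client.update('molecule', df)

print(f'event:    {rec[0].event}')
print(f'type:     {rec[0].type}')
print(f'uuid:     {rec[0].uuid}')
print(f'data:     {rec[0].data}')
print(f'timestamp {rec[0].timestamp}')
```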

```
event:    update
type:     molecule
uuid:     6da175a0-af06-42b4-95bf-dfb59d87d16e
data:     {'cas': None, 'cid': None, 'inchi': None, 'smiles': 'ABC', 'metadata': {}, 'iupac_name': None, 'molecule_type_id': '47313a9d-1312-4c22-b57c-38b8bf8dd978'}
timestamp 2020-04-01 09:21:16.182975
```


Like the `add_*` methods, the `update` method returns a list of records describing the change. Note that the event is now `update`, since we are modifying an existing entry rather than creating one.
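A minimal in-memory sketch of how such `update` records could be produced is shown below. It matches rows on their id column and emits one record per changed row, with the row's remaining columns as `data`, similar in shape to the output above. `UpdateEvent` and `update_events` are hypothetical, and skipping unchanged rows is a design choice of this sketch; whether mdb filters unchanged rows is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UpdateEvent:
    """Hypothetical mirror of an eventstore row for an update."""
    event: str
    type: str
    uuid: str
    data: dict
    timestamp: datetime


def update_events(table, id_column, old_rows, new_rows):
    """Return an 'update' event for every row whose content changed.

    Assumption: rows are dicts keyed by column name and matched on id_column.
    """
    old_by_id = {row[id_column]: row for row in old_rows}
    events = []
    for row in new_rows:
        if old_by_id.get(row[id_column]) != row:
            data = {key: value for key, value in row.items() if key != id_column}
            events.append(UpdateEvent("update", table, row[id_column], data, datetime.now()))
    return events


old = [{"molecule_id": "m1", "smiles": "A-D-C"}, {"molecule_id": "m2", "smiles": "A"}]
new = [{"molecule_id": "m1", "smiles": "ABC"}, {"molecule_id": "m2", "smiles": "A"}]
events = update_events("molecule", "molecule_id", old, new)  # one event, for m1
```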

The `update` method accepts a dictionary, a list of dictionaries or a `pandas.DataFrame` as its second argument. A synthesis can be added in a similar fashion.

```python
rec = client.add_synthesis(synthesis_machine_id=client.get_id('synthesis_machine',
                                                              name='Mad Machine Doing Chemistry'),
                           targeted_molecule_id=client.get_id('molecule', smiles='ABC'),
                           xdl='<xdl><recipe></recipe></xdl>',
                           notes='')
```

The `synthesis` table has a special column `hid` (human-readable id) that is meant for chemists to label their samples. This `hid` is generated automatically from the `short_name` of the lab, the current date and a number.

```python
rec.data['hid']
```

```
'MAD_2020-04-01_0'
```
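The format of the generated value (`<short_name>_<YYYY-MM-DD>_<n>`) can be reproduced with a small helper. `make_hid` is hypothetical, and the rule that `n` counts the syntheses already recorded for the same lab on the same day, starting at 0, is an assumption consistent with the `_0` seen above, not confirmed mdb behaviour.

```python
from datetime import date


def make_hid(short_name, day, existing_hids):
    """Build a human-readable id such as 'MAD_2020-04-01_0'.

    Assumption: the trailing number counts the hids already issued for this
    lab on this day, starting at 0.
    """
    prefix = f"{short_name}_{day.isoformat()}_"
    n = sum(1 for hid in existing_hids if hid.startswith(prefix))
    return f"{prefix}{n}"


print(make_hid("MAD", date(2020, 4, 1), []))  # MAD_2020-04-01_0
```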

A chemist will often make observations during the synthesis. These observations can be saved in the database as well, in the `notes` field.

```python
notes = """
# Synthesis of ABC

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
"""

# This is another way of using the update method:
client.update('synthesis', {'notes': notes}, id=rec.uuid)
```

```
[<sqlalchemy.ext.automap.eventstore at 0x11406d0b8>]
```
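This call passes a single dictionary together with an explicit `id`, whereas the molecule fix passed a whole DataFrame. One way to accept all these input shapes is to normalize them into a list of row dictionaries first, as in the sketch below. `normalize_update_input` is hypothetical; it detects a DataFrame by duck typing on pandas' `to_dict(orient="records")` so it runs without pandas, and `row_id` / `id_column` stand in for mdb's `id=` argument.

```python
def normalize_update_input(values, row_id=None, id_column="id"):
    """Turn a dict, a list of dicts or a DataFrame-like object into a list of row dicts.

    Assumption: any object with a pandas-style to_dict(orient="records") method is
    treated as a DataFrame. An explicit row_id is attached to a single row.
    """
    if hasattr(values, "to_dict"):
        rows = values.to_dict(orient="records")
    elif isinstance(values, dict):
        rows = [values]
    elif isinstance(values, list) and all(isinstance(v, dict) for v in values):
        rows = list(values)
    else:
        raise TypeError(f"unsupported input type: {type(values).__name__}")
    if row_id is not None:
        if len(rows) != 1:
            raise ValueError("an explicit id only makes sense for a single row")
        rows = [{**rows[0], id_column: row_id}]
    return rows


print(normalize_update_input({"notes": "# Synthesis of ABC"}, row_id="s1", id_column="synthesis_id"))
```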

The notes above were written in Markdown, which has the advantage of being easy to render:

```python
from IPython.display import HTML
from markdown import markdown

synthesis = client.get('synthesis', filters=[client.models.synthesis.synthesis_id == rec.uuid])

HTML(markdown(synthesis.at[0, 'notes']))
```

Now that a `synthesis` has been entered in the database, it is possible to add an experiment. However, an experiment must be linked to an `experiment_machine`, which in turn references an `experiment_type` and a `lab`.

```python
# getting the lab_id
lab_id = client.get_id('lab', short_name='MAD')

# Adding an experiment type
exp_type = client.add_experiment_type('NMR')

# Adding a machine
exp_machine = client.add_experiment_machine(name='Mad NMR',
                                            make='Super NMR',
                                            model='V3000',
                                            metadata={},
                                            experiment_type_id=exp_type.uuid,
                                            lab_id=lab_id)

# Adding an experiment (raw string so the backslashes are kept as typed)
experiment = client.add_experiment(synthesis_id=synthesis.at[0, 'synthesis_id'],
                                   experiment_machine_id=exp_machine.uuid,
                                   raw_data_path=r'C:\On\The\Computer\The\Machine\Is\Connected\To.raw',
                                   metadata={'param1': 100, 'knob 22': 'on'},
                                   notes='')
```
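The `raw_data_path` is a Windows path. In an ordinary Python string literal, backslash sequences such as `\T` or `\M` are invalid escapes (Python warns about them), and sequences such as `\t` or `\n` would silently become tab or newline characters. A raw string keeps every backslash as typed. The standard library's `pathlib.PureWindowsPath` can then parse such a path on any operating system, for instance to check its extension before storing it; this check is an illustration, not something mdb is known to do.

```python
from pathlib import PureWindowsPath

# Raw string: the backslashes are part of the value, not escape sequences.
raw_data_path = r"C:\On\The\Computer\The\Machine\Is\Connected\To.raw"

# PureWindowsPath only parses the string; it never touches the file system.
path = PureWindowsPath(raw_data_path)
print(path.drive)   # C:
print(path.name)    # To.raw
print(path.suffix)  # .raw
```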


Let's add a dummy data unit and the `xy_data`. If your data is not simple (x, y) data but, for example, (x, y1, y2), you can store (x, y1) and (x, y2) as two separate entries in the database.

```python
import numpy as np

# Generating some fake data
x = np.linspace(0, 10, num=100)
y = np.cos(x)

units = client.add_data_unit('dummy_units')

client.add_xy_data_experiment(experiment_id=experiment.uuid,
                              name='NMR v1',
                              x=x.tolist(),
                              y=y.tolist(),
                              x_units_id=units.uuid,
                              y_units_id=units.uuid)
```

```
<sqlalchemy.ext.automap.eventstore at 0x1142ee4e0>
```
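Splitting (x, y1, y2, ...) data into separate (x, y) entries, as suggested above, only takes a small helper. `split_xy_columns` is hypothetical: it returns one dictionary per y column with `name`, `x` and `y` keys, which could then be passed to `add_xy_data_experiment` together with the experiment and unit ids. The naming scheme (`"<name> <column label>"`) is an arbitrary choice of this sketch.

```python
import math


def split_xy_columns(name, x, ys):
    """Split (x, y1, y2, ...) data into one (x, y) entry per y column.

    `ys` maps a column label to its y values; every column must have as many
    points as `x`.
    """
    entries = []
    for label, y in ys.items():
        if len(y) != len(x):
            raise ValueError(f"column {label!r} has {len(y)} points, expected {len(x)}")
        entries.append({"name": f"{name} {label}", "x": list(x), "y": list(y)})
    return entries


xs = [i / 10 for i in range(101)]
entries = split_xy_columns("NMR v1", xs, {"cos": [math.cos(v) for v in xs],
                                          "sin": [math.sin(v) for v in xs]})
```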