# Welcome to a crash-course in pyOpenMS
You will learn the basics of reading mass-spectrometry file formats in Python and storing your results.
You can use this notebook, for example, with the container `docker://mwalzer/openms-pyopenms:V2.4.0_Ubuntu1804_py3_AcquisitionInfoFix` (*does not include the Jupyter server*).

## Setting the scene
We need to import pyOpenMS into Python so that we can load a peak list file of type mzML into memory.

In [5]:
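import urllib.request

import pyopenms as oms

# Remote mzML file (PRIDE dataset PXD012983) and the local path to store it at
wf_url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2019/10/PXD012983/TOR_BR4_re_MultiCon.mzML"
fp = "/tmp/wf.mzML"

# Download the file and write it to disk
with urllib.request.urlopen(wf_url) as filedata:
    datatowrite = filedata.read()
with open(fp, 'wb') as f:
    f.write(datatowrite)

# Parse the mzML file into an in-memory MSExperiment
exp = oms.MSExperiment()
oms.MzMLFile().load(fp, exp)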

The data is organised in an MSExperiment object, which can be accessed like a read-only list of MSSpectrum objects. These are not Python lists but Cython-wrapped vectors, which makes looping a bit more elaborate. (*Also, you cannot persistently write into objects you retrieve that way; you can only `push_back` new items or swap the whole vector.*) The experiment contains a mix of MS1 and MS2 spectra. You can tell them apart by their MS level, which you get by calling `getMSLevel`. More information is available, e.g. via `getNativeID`. Importantly, almost all strings you retrieve from Cython objects are returned as `bytes`, so you have to decode them first.

In [25]:
for i in range(0,3):
    print("Spectrum {i}: level={l}, nativeID={n}".format(i=i,l=exp[i].getMSLevel(),n=exp[i].getNativeID().decode()))

Spectrum 0: level=2, nativeID=scan=3066 file=99
Spectrum 1: level=2, nativeID=scan=3067 file=99
Spectrum 2: level=2, nativeID=scan=3078 file=99
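Calling `.decode()` only works on `bytes`, so code like the loop above breaks if a value arrives as `str`. The following is a minimal sketch of a defensive helper. `to_text` is a hypothetical name and not part of pyOpenMS, and the idea that some setups return `str` instead of `bytes` is an assumption you should check against your installed version.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding it first if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    # Already text (or something else printable): pass it through as str
    return str(value)


# Works the same for both kinds of input
print(to_text(b"scan=3066 file=99"))
print(to_text("scan=3066 file=99"))
```

With such a helper, `to_text(exp[i].getNativeID())` could replace `exp[i].getNativeID().decode()` in the loop.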


The other way of identifying spectra is by their RT, MZ, and charge coordinates. RT is pretty straightforward (`getRT`), but the others are found in the spectrum's precursor objects. *Remember how MS/MS acquisition works: spectra are recorded by collecting ions of a certain mass, i.e. precursors, which are then fragmented to yield the typical peaks.* Since some techniques involve more than one precursor, the precursors are represented as another list of objects. We'll just look at the first one. (*Potential test scenario?*)

In [26]:
print("RT:{},MZ:{},z:{}".format(exp[0].getRT(),exp[0].getPrecursors()[0].getMZ(),exp[0].getPrecursors()[0].getCharge()))

RT:1227.08442,MZ:1122.16284,z:2


We should also keep in mind that matching with these coordinates is not precise, because decimals may be rounded differently by the original conversion software, the identification software, and pyOpenMS. So we need to prepare for coordinate clashes (i.e. more than one spectrum matching a certain set of coordinates)! Also, the charge information is sometimes not reliably present.
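One way to cope with rounding differences is to compare coordinates within a tolerance and to report every candidate, so that clashes become visible instead of being resolved silently. The sketch below runs on plain dictionaries and needs no pyOpenMS. The function name `find_matches` and the tolerance values are illustrative assumptions, not recommended settings.

```python
import math

# Coordinates as printed by the cells in this notebook
spectra = {
    "scan=3066 file=99": {"RT": 1227.08442, "MZ": 1122.16284, "c": 2},
    "scan=3067 file=99": {"RT": 1227.2184, "MZ": 1114.16504, "c": 2},
    "scan=3078 file=99": {"RT": 1228.81326, "MZ": 1488.22241, "c": 2},
}


def find_matches(spectra, rt, mz, charge=None, rt_tol=0.01, mz_tol=0.001):
    """Return the native IDs of all spectra within tolerance of the query."""
    hits = []
    for native_id, coords in spectra.items():
        if not math.isclose(coords["RT"], rt, abs_tol=rt_tol):
            continue
        if not math.isclose(coords["MZ"], mz, abs_tol=mz_tol):
            continue
        # Compare charges only when both sides actually carry one
        if charge and coords.get("c") and coords["c"] != charge:
            continue
        hits.append(native_id)
    return hits


# Tight tolerances: a rounded query still finds exactly one spectrum
unique = find_matches(spectra, rt=1227.084, mz=1122.163, charge=2)

# Deliberately loose tolerances: two spectra match, i.e. a clash
clash = find_matches(spectra, rt=1227.084, mz=1122.163, rt_tol=0.2, mz_tol=10.0)

print(unique)
print(len(clash))
```

Returning a list rather than a single ID leaves the decision about clashes to the caller, who can raise an error, pick the closest hit, or flag the identification as ambiguous.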

## Easy data handover with JSON
Next, we might want to store the information we retrieved, preferably in JSON. For this, it is advisable to collect the information in a Python dictionary. Dictionaries can be automatically (de-)serialised with the Python json library as long as the keys and values are built-in Python data types (with some exceptions like datetime). (What to do if the data is not immediately serialisable?)

In [21]:
# Collect the coordinates of the first three spectra, keyed by native ID
res = dict()
for i in range(0, 3):
    res[exp[i].getNativeID().decode()] = {
        'level': exp[i].getMSLevel(),
        'RT': exp[i].getRT(),
        'MZ': exp[i].getPrecursors()[0].getMZ(),
        'c': exp[i].getPrecursors()[0].getCharge(),
    }

__N.B.: Remember to check that the data you want to access is present and plan for the right exceptions!__
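As an illustration of such a check, the sketch below guards the precursor access used above, since `getPrecursors()[0]` raises an `IndexError` when a spectrum (e.g. an MS1 scan) has no precursor. `precursor_coords`, `FakeSpectrum`, and `FakePrecursor` are hypothetical names. The two classes are stand-ins that mimic only the methods used in this notebook, so the example runs without pyOpenMS. Treating a charge of 0 as "not assigned" is an assumption.

```python
def precursor_coords(spectrum):
    """Collect RT, precursor m/z and charge, using None where data is missing."""
    coords = {"RT": spectrum.getRT(), "MZ": None, "c": None}
    precursors = spectrum.getPrecursors()
    if len(precursors) > 0:
        first = precursors[0]
        coords["MZ"] = first.getMZ()
        # Assumption: a charge of 0 means the charge was not assigned
        coords["c"] = first.getCharge() or None
    return coords


# Hypothetical stand-ins exposing the same methods as the pyOpenMS objects
class FakePrecursor:
    def __init__(self, mz, charge):
        self._mz, self._charge = mz, charge

    def getMZ(self):
        return self._mz

    def getCharge(self):
        return self._charge


class FakeSpectrum:
    def __init__(self, rt, precursors):
        self._rt, self._precursors = rt, precursors

    def getRT(self):
        return self._rt

    def getPrecursors(self):
        return self._precursors


with_precursor = FakeSpectrum(1227.08442, [FakePrecursor(1122.16284, 2)])
without_precursor = FakeSpectrum(1227.08442, [])

print(precursor_coords(with_precursor))
print(precursor_coords(without_precursor))
```

Because the function relies only on `getRT`, `getPrecursors`, `getMZ`, and `getCharge`, it should accept a real spectrum such as `exp[i]` in place of the stand-in.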

In [24]:
import json
with open('/tmp/test.json', 'w') as f:
    json.dump(res,f)

import pprint
pp = pprint.PrettyPrinter(indent=4)
with open('/tmp/test.json', 'r') as f:
    pp.pprint(json.load(f))

{   'scan=3066 file=99': {   'MZ': 1122.16284,
                             'RT': 1227.08442,
                             'c': 2,
                             'level': 2},
    'scan=3067 file=99': {   'MZ': 1114.16504,
                             'RT': 1227.2184,
                             'c': 2,
                             'level': 2},
    'scan=3078 file=99': {   'MZ': 1488.22241,
                             'RT': 1228.81326,
                             'c': 2,
                             'level': 2}}
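On the question of data that is not immediately serialisable: one option is the `default` hook of `json.dump`/`json.dumps`, which is called for every value the encoder cannot handle. The sketch below converts `bytes` and `datetime` values. `to_serialisable` is a hypothetical helper name, and the timestamp in `record` is an arbitrary example value, not taken from the file.

```python
import json
from datetime import datetime


def to_serialisable(obj):
    """Fallback for json: convert values the encoder cannot handle itself."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode("utf-8")
    if isinstance(obj, datetime):
        return obj.isoformat()
    # Fail loudly for anything unexpected instead of writing nonsense
    raise TypeError("Cannot serialise object of type " + type(obj).__name__)


record = {
    "nativeID": b"scan=3066 file=99",          # undecoded bytes value
    "acquired": datetime(2019, 10, 1, 12, 0),  # arbitrary example timestamp
    "RT": 1227.08442,
}

text = json.dumps(record, default=to_serialisable)
print(text)
```

Note that the hook only applies to values. Dictionary keys that are `bytes` still have to be decoded before serialisation, as done with `getNativeID().decode()` above.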
