This tutorial addresses questions and concerns that users have raised since publication of the isotomics paper. Currently, it covers the following topic:

- generating explicit errors for each ion beam of a simulated dataset.

In [1]:
#global imports
import sys; sys.path.insert(0, '..')

from datetime import date

today = date.today()

import copy
import json

import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
import seaborn as sns

import basicDeltaOperations as op
import calcIsotopologues as ci
import fragmentAndSimulate as fas
import solveSystem as ss
import alanineTest
import spectrumVis
import readInput as ri

In [2]:
#Generating explicit errors for each ion beam of a dataset
'''
In Tutorial Part 3, we assigned every observed ion beam in the simulation the same relative error, for example 1 per mil. If a 13C ion beam was observed with this error, then the 15N and 2H ion beams had the same error as well.

This scenario is unrealistic, so we include a way to specify error bars explicitly. To do so, define a dictionary giving the error for each ion beam (see 'explicitErrors', below). All values are relative: in the dictionary below, the 'D' substitution of the 'full' ion beam has an error of 5 per mil (0.005), while the '15N' substitution has an error of 2 per mil (0.002).

Then pass the 'explicitErrors' dictionary as the 'error' argument of the readComputedData function, below.

All of the other code is as written in Tutorial Part 3 and is included here so that this solution stands alone and is easy to replicate.
'''

#Set dictionary of explicit errors (relative; 0.001 = 1 per mil)
explicitErrors = {'M1':{'full':{'D':0.005,'15N':0.002,'13C':0.001,'17O':0.010},
                        '44':{'Unsub':0.001,'D':0.005,'15N':0.002,'13C':0.001}}}
###GENERATE SAMPLE DATA
deltasSmp = [-40,-20,0,-10,25,40]
fragSubset = ['full', '44']
df, expandedFrags, fragSubgeometryKeys, fragmentationDictionary = alanineTest.initializeAlanine(deltasSmp, fragSubset)
unresolvedDict = {}
calcFF = True
forbiddenPeaks = {'M1':{'full':['17O','D'],'44':['D']}}
UValueList = ['13C']
unresolvedDict = {'M1':{'full':{'17O':'13C'}}}
predictedMeasurement, MNDict, fractionationFactors = alanineTest.simulateMeasurement(
    df, fragmentationDictionary, expandedFrags, fragSubgeometryKeys,
    abundanceThreshold = 0.0,
    outputPath = str(today) + " TUTORIAL 3 Sample",
    calcFF = True,
    ffstd = 0.05,
    unresolvedDict = unresolvedDict,
    outputFull = False,
    omitMeasurements = forbiddenPeaks,
    UValueList = UValueList,
    massThreshold = 1)
                                                   
###GENERATE STANDARD DATA
deltasStd = [-30,-30,0,0,0,0]
df, expandedFrags, fragSubgeometryKeys, fragmentationDictionary = alanineTest.initializeAlanine(deltasStd, fragSubset)

predictedMeasurement, MNDict, FF = alanineTest.simulateMeasurement(
    df, fragmentationDictionary, expandedFrags, fragSubgeometryKeys,
    abundanceThreshold = 0.0,
    outputPath = str(today) + " TUTORIAL 3 Standard",
    calcFF = False,
    ffstd = 0.05,
    fractionationFactors = fractionationFactors,
    unresolvedDict = unresolvedDict,
    outputFull = False,
    omitMeasurements = forbiddenPeaks,
    UValueList = UValueList,
    massThreshold = 1)

###GENERATE FORWARD MODEL
deltas = [-30,-30,0,0,0,0]
fragSubset = ['full','44']
df, expandedFrags, fragSubgeometryKeys, fragmentationDictionary = alanineTest.initializeAlanine(deltas, fragSubset)

forbiddenPeaks = {}

predictedMeasurement, MNDictStd, FF = alanineTest.simulateMeasurement(
    df, fragmentationDictionary, expandedFrags, fragSubgeometryKeys,
    abundanceThreshold = 0,
    unresolvedDict = {},
    outputFull = False,
    omitMeasurements = forbiddenPeaks,
    massThreshold = 1)

###READ AND SOLVE SYSTEM, NOTING THAT WE INPUT THE EXPLICITERRORS DICTIONARY
standardJSON = ri.readJSON(str(today) + " TUTORIAL 3 Standard.json")
processStandard = ri.readComputedData(standardJSON, error = explicitErrors, theory = predictedMeasurement)

sampleJSON = ri.readJSON(str(today) + " TUTORIAL 3 Sample.json")
processSample = ri.readComputedData(sampleJSON, error = explicitErrors)
UValuesSmp = ri.readComputedUValues(sampleJSON, error = 0.0001)

isotopologuesDict = fas.isotopologueDataFrame(MNDictStd, df)
OCorrection = ss.OValueCorrectTheoretical(predictedMeasurement, processSample, massThreshold = 1)

M1Results = ss.M1MonteCarlo(processStandard, processSample, OCorrection, isotopologuesDict,
                            fragmentationDictionary, experimentalOCorrectList = [], 
                            N = 100, GJ = False, debugMatrix = False,
                           perturbTheoryOAmt = 0.001, debugUnderconstrained = True, plotUnconstrained = False)

processedResults = ss.processM1MCResults(M1Results, UValuesSmp, isotopologuesDict, df, GJ = False, 
                                         UMNSub = ['13C'])
ss.updateSiteSpecificDfM1MC(processedResults, df)

Delta 18O
0.0
Calculating Isotopologue Concentrations


100%|██████████| 1512/1512 [00:00<00:00, 79686.72it/s]


Compiling Isotopologue Dictionary


100%|██████████| 1512/1512 [00:00<00:00, 58645.86it/s]


Simulating Measurement
Delta 18O
0.0
Calculating Isotopologue Concentrations


100%|██████████| 1512/1512 [00:00<00:00, 79483.97it/s]


Compiling Isotopologue Dictionary


100%|██████████| 1512/1512 [00:00<00:00, 72866.47it/s]


Simulating Measurement
Delta 18O
0.0
Calculating Isotopologue Concentrations


100%|██████████| 1512/1512 [00:00<00:00, 80793.28it/s]


Compiling Isotopologue Dictionary


100%|██████████| 1512/1512 [00:00<00:00, 69350.84it/s]


Simulating Measurement


100%|██████████| 100/100 [00:00<00:00, 382.51it/s]


Solution is underconstrained
processM1MCResults will not work with GJ Solution
After solving null space:
Actually Constrained:
13C Ccarboxyl
13C Calphabeta
15N Namine
D Hretained


100%|██████████| 100/100 [00:00<00:00, 545.40it/s]


Unnamed: 0,IDS,Number,deltas,full_01,44_01,VPDB etc. Deltas,VPDB etc. Deltas Error,Relative Deltas,Relative Deltas Error,M1 M+N Relative Abundance,M1 M+N Relative Abundance Error,UM1,UM1 Error,Calc U Values,Calc U Values Error
Calphabeta,C,2,-30,1,1,-40.415787,1.665003,-10.737925,1.716498,0.563792,0.000814,0.038057,4.1e-05,0.021456,3.7e-05
Ccarboxyl,C,1,-30,1,x,-19.17198,3.324461,11.162907,3.427279,0.288138,0.001177,0.038057,4.1e-05,0.010966,3.7e-05
Ocarboxyl,O,2,0,1,x,-295.277302,35.97224,-295.277302,35.97224,0.014069,0.000708,0.038057,4.1e-05,0.000535,2.7e-05
Namine,N,1,0,1,1,-10.490522,2.440298,-10.490522,2.440298,0.095578,0.000217,0.038057,4.1e-05,0.003637,9e-06
Hretained,H,6,0,1,1,-8.296931,46.256212,-8.296931,46.256212,0.024353,0.001135,0.038057,4.1e-05,0.000927,4.3e-05
Hlost,H,2,0,1,x,718.824814,87.736608,718.824814,87.736608,0.014069,0.000708,0.038057,4.1e-05,0.000535,2.7e-05
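
The explicit errors in the cell above are entered as fractions (0.005 for 5 per mil). If it is more natural to think in per mil, a small helper can build the same nested dictionary. The sketch below is not part of isotomics: per_mil_to_relative and errors_per_mil are hypothetical names introduced here for illustration, and the values are copied from the explicitErrors dictionary above.

```python
def per_mil_to_relative(errors_per_mil):
    """Recursively divide every numeric leaf of a nested dict by 1000.

    Hypothetical helper (not an isotomics function): converts errors given
    in per mil into the fractional form used by explicitErrors.
    """
    if isinstance(errors_per_mil, dict):
        return {key: per_mil_to_relative(value)
                for key, value in errors_per_mil.items()}
    return errors_per_mil / 1000.0


# Same layout as explicitErrors: mass selection -> fragment -> substitution.
errors_per_mil = {
    "M1": {
        "full": {"D": 5, "15N": 2, "13C": 1, "17O": 10},
        "44": {"Unsub": 1, "D": 5, "15N": 2, "13C": 1},
    }
}

explicit_errors = per_mil_to_relative(errors_per_mil)
print(explicit_errors["M1"]["full"])
```

The resulting dictionary has the same layout as explicitErrors, so it could be passed to readComputedData in the same way.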


In [3]:
#To check the explicit error bars, inspect the processStandard (or processSample) variable.
#Each 'Error' entry is the relative error multiplied by the 'Observed Abundance'.
processStandard

{'M1': {'full': {'Subs': ['15N', '13C'],
   'Predicted Abundance': [0.0961908829046088, 0.8513207144292605],
   'Observed Abundance': [0.09783471811682339, 0.9021652818831767],
   'Error': [0.00019566943623364677, 0.0009021652818831767],
   'Perturbed': array([0.09796256, 0.90203744]),
   'Correction Factor': array([1.01841833, 1.05957417])},
  '44': {'Subs': ['Unsub', '15N', '13C'],
   'Predicted Abundance': [0.31180709070203644,
    0.0961908829046088,
    0.5675471429528403],
   'Observed Abundance': [0.3081031402169525,
    0.08801863967348175,
    0.6038782201095657],
   'Error': [0.0003081031402169525,
    0.0001760372793469635,
    0.0006038782201095657],
   'Perturbed': array([0.308197  , 0.08791448, 0.60388852]),
   'Correction Factor': array([0.98842205, 0.91395854, 1.06403234])}}}
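
A quick way to confirm that the explicit errors were applied is to divide each 'Error' entry by the matching 'Observed Abundance' and compare the ratio with the requested relative error. The sketch below assumes, based on the output above, that the stored error is the relative error multiplied by the observed abundance. It hard-codes the printed values so that it runs without the isotomics modules, and recover_relative_errors is a hypothetical helper name.

```python
# Relative errors requested for each ion beam (copied from explicitErrors).
explicit_errors = {
    "M1": {
        "full": {"D": 0.005, "15N": 0.002, "13C": 0.001, "17O": 0.010},
        "44": {"Unsub": 0.001, "D": 0.005, "15N": 0.002, "13C": 0.001},
    }
}

# Relevant subset of the processStandard output printed above.
process_standard = {
    "M1": {
        "full": {
            "Subs": ["15N", "13C"],
            "Observed Abundance": [0.09783471811682339, 0.9021652818831767],
            "Error": [0.00019566943623364677, 0.0009021652818831767],
        },
        "44": {
            "Subs": ["Unsub", "15N", "13C"],
            "Observed Abundance": [0.3081031402169525,
                                   0.08801863967348175,
                                   0.6038782201095657],
            "Error": [0.0003081031402169525,
                      0.0001760372793469635,
                      0.0006038782201095657],
        },
    }
}


def recover_relative_errors(processed):
    """Return {mass selection: {fragment: {sub: error / observed abundance}}}."""
    recovered = {}
    for mass_key, fragments in processed.items():
        recovered[mass_key] = {}
        for frag_key, frag in fragments.items():
            recovered[mass_key][frag_key] = {
                sub: err / obs
                for sub, obs, err in zip(frag["Subs"],
                                         frag["Observed Abundance"],
                                         frag["Error"])
            }
    return recovered


relative = recover_relative_errors(process_standard)
for mass_key, fragments in relative.items():
    for frag_key, subs in fragments.items():
        for sub, rel in subs.items():
            requested = explicit_errors[mass_key][frag_key][sub]
            print(f"{mass_key} {frag_key} {sub}: "
                  f"requested {requested:.3f}, recovered {rel:.6f}")
```

Only the observed peaks appear in the comparison: the 'D' and '17O' entries of explicitErrors have no counterpart in processStandard because those peaks were omitted through forbiddenPeaks.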