# Analyzing GCMS from Agilent

To analyze GCMS data from an Agilent GCMS using `valence.analyze` you will need three things:

1. A `.csv` containing library IDs (`library_id`) with corresponding retention times
2. A `.csv` containing peak areas with corresponding retention times
3. A `.csv` with calibration curve information

After reading in the GCMS data, concentrations can be determined using the methods provided in `valence.analyze`. The following methods are available for analysis:

- *match_area* : Matches each species in the library data to the peak in the area data with the closest retention time, provided the difference between the two retention times is smaller than the set threshold.
- *std_curves* : Takes the compiled dataframe returned by *match_area* (species with areas and IDs) and a standards dataframe, and calculates the corresponding response factor (RF) for each species.
- *concentrations* : Calculates the concentration of each species.
- *concentrations_exp* : Returns only species with unknown concentrations, no standards.
- *concentrations_std* : Returns only species with known concentrations, i.e. standards.

## Preparing Dataframes
For this example we will use the `AgilentGcms` class to import the data and create the dataframes for the library data and the area data, as shown in the code below. For a full description of the class, look [here](https://github.com/blakeboswell/chemtbd/blob/master/example.ipynb).

In [1]:
from valence.build import AgilentGcms
agi = AgilentGcms.from_root('data')
lib = agi.results_lib
area = agi.results_tic

In [2]:
lib.head()

| key | header= | pk | rt | pct_area | library_id | ref | cas | qual |
|---|---|---|---|---|---|---|---|---|
| FA03.D | 1= | 1 | 5.7877 | 2.0335 | Methyl octanoate | 17 | 000000-00-0 | 96 |
| FA03.D | 2= | 2 | 7.3441 | 3.4015 | Methyl decanoate | 1 | 000000-00-0 | 98 |
| FA03.D | 3= | 3 | 8.0364 | 1.7448 | Methyl undecanoate | 2 | 000000-00-0 | 98 |
| FA03.D | 4= | 4 | 8.6715 | 3.9674 | Methyl dodecanoate | 3 | 000000-00-0 | 98 |
| FA03.D | 5= | 5 | 9.2781 | 1.9607 | Methyl tridecanoate | 4 | 000000-00-0 | 99 |


In [3]:
area.head()

| key | header= | peak | rt | first | max | last | pk_ty | height | area | pct_max | pct_total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FA03.D | 1= | 1 | 5.788 | 465 | 473 | 494 | rBV3 | 257808 | 1489466 | 13.12 | 2.034 |
| FA03.D | 2= | 2 | 7.344 | 733 | 745 | 812 | rBV | 964743 | 2491449 | 21.94 | 3.401 |
| FA03.D | 3= | 3 | 8.036 | 859 | 866 | 904 | rBV | 608418 | 1277982 | 11.25 | 1.745 |
| FA03.D | 4= | 4 | 8.672 | 970 | 977 | 1017 | rBV | 1929049 | 2905961 | 25.59 | 3.967 |
| FA03.D | 5= | 5 | 9.278 | 1077 | 1083 | 1116 | rBV | 882521 | 1436154 | 12.65 | 1.961 |


The last dataframe contains the known concentrations of the calibration standards. It can be imported using pandas and should have a `library_id` column, with each remaining column named after a calibration data file. Each row should correspond to a species in the curve.

**IMPORTANT :**
1. The dataframe headers must have a library_id column and the remaining columns should be the file names for each standard. 
2. Be sure the file names are exactly the same as the files in the subfolder, including any extension in the name (e.g. ".D").
3. UNITS UNITS UNITS. The concentrations should be entered in molar (mol/L).

In [4]:
import pandas as pd
stnd = pd.read_csv('data/standards.csv')
stnd.head()

|  | library_id | FA03.D | FA04.D | FA05.D |
|---|---|---|---|---|
| 0 | Methyl palmitate | 0.25 | 0.5 | 1 |
| 1 | Methyl heptadecanoate | 0.25 | 0.5 | 1 |
| 2 | Methyl docosanoate | 0.25 | 0.5 | 1 |
| 3 | Methyl undecanoate | 0.25 | 0.5 | 1 |
| 4 | Methyl cis-8,11,14-eicosatrienoate | 0.25 | 0.5 | 1 |


## Compiling Library and Area Dataframes
The `lib` and `area` dataframes can now be merged into one dataframe using `match_area`. This method matches rows from each dataframe based on the retention time.

In [5]:
from valence.analyze import match_area
comp = match_area(lib,area)
comp.head()

| key | pk | rt | library_id | cas | qual | area | area% |
|---|---|---|---|---|---|---|---|
| FA03.D | 1 | 5.7877 | Methyl octanoate | 000000-00-0 | 96 | 1489466.0 | 0.020335 |
| FA03.D | 2 | 7.3441 | Methyl decanoate | 000000-00-0 | 98 | 2491449.0 | 0.034015 |
| FA03.D | 3 | 8.0364 | Methyl undecanoate | 000000-00-0 | 98 | 1277982.0 | 0.017448 |
| FA03.D | 4 | 8.6715 | Methyl dodecanoate | 000000-00-0 | 98 | 2905961.0 | 0.039674 |
| FA03.D | 5 | 9.2781 | Methyl tridecanoate | 000000-00-0 | 99 | 1436154.0 | 0.019607 |
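To make the matching rule concrete, here is a minimal sketch of nearest-retention-time matching in plain Python. It illustrates the idea only and is not the `valence` implementation: the function `match_by_rt` is a hypothetical name, and the default `threshold` of 0.01 is an assumed value, not the library's. The peaks are the first three rows of `FA03.D` from the tables above; in practice the matching would be done separately for each data file.

```python
def match_by_rt(lib_peaks, area_peaks, threshold=0.01):
    """Pair each library peak with the closest area peak by retention time.

    lib_peaks  -- list of (rt, library_id) tuples
    area_peaks -- list of (rt, area) tuples
    Pairs whose retention times differ by `threshold` or more are dropped.
    """
    if not area_peaks:
        return []
    matched = []
    for lib_rt, library_id in lib_peaks:
        # Find the area peak closest to this library peak.
        area_rt, area = min(area_peaks, key=lambda p: abs(p[0] - lib_rt))
        if abs(area_rt - lib_rt) < threshold:
            matched.append((lib_rt, library_id, area))
    return matched

# First three peaks of FA03.D, copied from the tables above.
lib_peaks = [
    (5.7877, "Methyl octanoate"),
    (7.3441, "Methyl decanoate"),
    (8.0364, "Methyl undecanoate"),
]
area_peaks = [(5.788, 1489466), (7.344, 2491449), (8.036, 1277982)]

for row in match_by_rt(lib_peaks, area_peaks):
    print(row)
```

A library peak with no area peak inside the threshold is simply left out of the result, which is why the threshold should be small relative to the spacing between neighbouring peaks.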


## Standard Curves

The next step is to create calibration curves. To do this, the following two pieces of information are needed, both captured in the standards dataframe:

- the files within the subfolder that contain calibration curve data
- the known concentrations of each species in each file

We recommend either putting the data into a csv file and importing it using pandas, as was done for `stnd` above, or creating a pandas dataframe directly. The curves are then calculated with `std_curves`.

In [6]:
from valence.analyze import std_curves
curves = std_curves(comp,stnd)
curves.head()

|  | library_id | responsefactor | intercept | rvalue | pvalue | stderr | max | min |
|---|---|---|---|---|---|---|---|---|
| 0 | All cis-4,7,10,13,16,19-docosahexaenoate methy... | 1.367304e-07 | 0.103774 | 0.999932 | 0.007421 | 1.593827e-09 | 6566581.0 | 1094384.0 |
| 1 | Methyl arachidate | 7.146232e-08 | -0.063537 | 0.999076 | 0.027367 | 3.073928e-09 | 14785000.0 | 4222622.0 |
| 2 | Methyl arachidonate | 1.344684e-07 | 0.067813 | 0.999319 | 0.000681 | 3.51112e-09 | 7056294.0 | 1390417.0 |
| 3 | Methyl cis-11,14,17-eicosatrienoate | 1.306034e-07 | 0.102132 | 1.0 | 0.0 | 0.0 | 6874772.0 | 3046387.0 |
| 4 | Methyl cis-11-eicosenoate | 6.336257e-08 | 0.008515 | 1.0 | 0.00011 | 1.092154e-11 | 15647413.0 | 3810380.0 |
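The `responsefactor` and `intercept` columns read like the slope and intercept of a straight line fitted through the standards of each species. As a minimal sketch of that idea, assuming an ordinary least-squares fit of known concentration against peak area (an assumption about what `std_curves` does, not taken from its source), the fit for one species could look like this. The function `fit_calibration` and the peak areas are hypothetical.

```python
def fit_calibration(areas, concentrations):
    """Least-squares line: concentration = response_factor * area + intercept."""
    n = len(areas)
    mean_a = sum(areas) / n
    mean_c = sum(concentrations) / n
    # Sums of squares and cross products about the means.
    sxx = sum((a - mean_a) ** 2 for a in areas)
    sxy = sum((a - mean_a) * (c - mean_c) for a, c in zip(areas, concentrations))
    response_factor = sxy / sxx
    intercept = mean_c - response_factor * mean_a
    return response_factor, intercept

# Hypothetical peak areas for one species in three standards (not real data).
areas = [1_000_000, 2_000_000, 4_000_000]
# Known concentrations in mol/L, as in the standards table.
known = [0.25, 0.5, 1.0]

rf, b = fit_calibration(areas, known)
print(f"response factor = {rf:.3e}, intercept = {b:.3g}")
```

With at least two standards per species the line is defined; three or more give a meaningful measure of fit quality such as the `rvalue` column.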


## Determine Concentrations

Now that we have everything we need, let's load the data into GCQuant and see what analysis is now available.

In [7]:
from valence.analyze import concentrations
conc = concentrations(comp,curves)

conc.head()

| key | pk | rt | library_id | cas | qual | area | area% | responsefactor | intercept | max | min | conc | conc% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FA03.D | 1 | 5.7877 | Methyl octanoate | 000000-00-0 | 96 | 1489466.0 | 0.020335 | 1.11362e-07 | 0.075876 | 8256723.0 | 1489466.0 | 0.241746 | 0.03515 |
| FA03.D | 2 | 7.3441 | Methyl decanoate | 000000-00-0 | 98 | 2491449.0 | 0.034015 | 1.026929e-07 | -0.003581 | 9783363.0 | 2491449.0 | 0.252273 | 0.03668 |
| FA03.D | 3 | 8.0364 | Methyl undecanoate | 000000-00-0 | 98 | 1277982.0 | 0.017448 | 1.833527e-07 | 0.018361 | 5360876.0 | 1277982.0 | 0.252682 | 0.03674 |
| FA03.D | 4 | 8.6715 | Methyl dodecanoate | 000000-00-0 | 98 | 2905961.0 | 0.039674 | 9.68541e-08 | -0.037806 | 10679280.0 | 2905961.0 | 0.243649 | 0.035426 |
| FA03.D | 5 | 9.2781 | Methyl tridecanoate | 000000-00-0 | 99 | 1436154.0 | 0.019607 | 1.640559e-07 | 0.013704 | 6009837.0 | 1436154.0 | 0.249314 | 0.03625 |


Nice! We now have the calculated concentrations in the `conc` column. Additionally, GCQuant has provided the normalized concentration, `conc%`, and normalized area, `area%`.
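The printed values are consistent with a linear calibration, `conc = responsefactor * area + intercept`. This relationship is inferred from the numbers in the output above rather than from the `valence` source, so treat it as a sanity check, not a specification. The sketch below recomputes `conc` for the first three rows; the function name `concentration` is a hypothetical helper.

```python
def concentration(area, response_factor, intercept):
    """Apply a linear calibration curve to a peak area."""
    return response_factor * area + intercept

# (library_id, area, responsefactor, intercept, conc reported in the output above)
rows = [
    ("Methyl octanoate", 1489466.0, 1.11362e-07, 0.075876, 0.241746),
    ("Methyl decanoate", 2491449.0, 1.026929e-07, -0.003581, 0.252273),
    ("Methyl undecanoate", 1277982.0, 1.833527e-07, 0.018361, 0.252682),
]

for library_id, area, rf, b, reported in rows:
    calc = concentration(area, rf, b)
    print(f"{library_id}: calculated {calc:.6f}, reported {reported:.6f}")
```

The calculated and reported values agree to within the rounding of the printed response factors, and each sits close to the 0.25 mol/L entered for `FA03.D` in the standards table.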

Save the dataframe to a csv for later.

In [8]:
conc.to_csv('analysis.csv')

But the dataframe isn't always the best way to view the results. Check out the `reporting` notebook to organize and plot your data.