# Reading Agilent GCMS Files with `Valence`

Currently there is a hierarchy of objects for reading GCMS data:

- `AgilentGcmsDir` object will read `RESULTS.csv` and `DATA.MS` from a single Agilent `.D` directory
- `AgilentGcmsDataMs` object will read a `DATA.MS` file
- `AgilentGcmsResults` object will read a `RESULTS.csv` file

These objects are available for import and direct use. However, the main interface for file reading is the `valence.build.AgilentGcms` object, which wraps the objects above.

To use `valence.build.AgilentGcms`, import it as follows. The directory that contains the Agilent `.D` folder has to be the working directory.

In [1]:
from valence.build import AgilentGcms 

The main read functions of `AgilentGcms` include:

- `from_dir` expects a path to a single Agilent `.D` directory as input
- `from_root` expects a path to a parent directory containing only `.D` directories as children
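
Because `from_root` expects every child of the parent directory to be a `.D` directory, it can help to check the layout before loading. The sketch below uses only the Python standard library. The helper name `find_d_dirs` is hypothetical and not part of `valence`, and matching the `.D` suffix case-insensitively is an assumption.

```python
from pathlib import Path


def find_d_dirs(root):
    """Split the children of `root` into `.D` directories and everything else.

    Hypothetical helper for illustration; not part of valence.
    """
    root = Path(root)
    d_dirs, others = [], []
    for child in sorted(root.iterdir()):
        # Assumption: a run folder is a directory whose name ends in ".D".
        if child.is_dir() and child.suffix.upper() == ".D":
            d_dirs.append(child.name)
        else:
            others.append(child.name)
    return d_dirs, others
```

If `others` comes back non-empty, the directory does not match the layout that `from_root` is described as expecting.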

For example, let's load all `.D` folders from the directory `data`:

In [2]:
agi = AgilentGcms.from_root('data')

Let's see which `.D` folders were loaded from the above directory:

In [3]:
agi.keys

dict_keys(['FA03.D', 'FA04.D', 'FA05.D', 'FA12.D', 'FA13.D', 'FA14.D'])

## Accessing All Files

We can access the __RESULTS.CSV__ `lib`, `fid`, and `tic` tables from all Agilent directories as a single pandas DataFrame using the commands below.

In [4]:
agi.results_lib.head()

| key | header= | pk | rt | pct_area | library_id | ref | cas | qual |
|---|---|---|---|---|---|---|---|---|
| FA03.D | 1= | 1 | 5.7877 | 2.0335 | Methyl octanoate | 17 | 000000-00-0 | 96 |
| FA03.D | 2= | 2 | 7.3441 | 3.4015 | Methyl decanoate | 1 | 000000-00-0 | 98 |
| FA03.D | 3= | 3 | 8.0364 | 1.7448 | Methyl undecanoate | 2 | 000000-00-0 | 98 |
| FA03.D | 4= | 4 | 8.6715 | 3.9674 | Methyl dodecanoate | 3 | 000000-00-0 | 98 |
| FA03.D | 5= | 5 | 9.2781 | 1.9607 | Methyl tridecanoate | 4 | 000000-00-0 | 99 |


In [5]:
agi.results_fid.head()

| key | header= | peak | rt | first | end | pk_ty | height | area | pct_max | pct_total |
|---|---|---|---|---|---|---|---|---|---|---|
| FA03.D | 1= | 1 | 6.250716 | 5.93818 | 6.563252 | M | 2578080 | 14894660 | 1 | 1.962 |
| FA03.D | 2= | 2 | 7.858187 | 7.465278 | 8.251096 | M | 9647430 | 24914490 | 1 | 3.282 |
| FA03.D | 3= | 3 | 8.357856 | 7.939963 | 8.775749 | M | 6084180 | 12779820 | 1 | 1.683 |
| FA03.D | 4= | 4 | 9.798795 | 9.308855 | 10.288735 | M | 19290490 | 29059610 | 1 | 3.828 |
| FA03.D | 5= | 5 | 10.669815 | 10.136324 | 11.203306 | M | 8825210 | 14361540 | 1 | 1.892 |


We can access the __DATA.MS__ `tme` table from all Agilent directories as a single pandas DataFrame using the command below. The same command will work for the `tic` table. For plotting the chromatograms, see the `reporting` notebook.

In [6]:
agi.chromatogram.head()

| key | tic | tme |
|---|---|---|
| FA03.D | 3572944.0 | 3.0869 |
| FA03.D | 2629864.0 | 3.092633 |
| FA03.D | 2000653.0 | 3.09835 |
| FA03.D | 1589731.0 | 3.104067 |
| FA03.D | 1318152.0 | 3.1098 |
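
One way to picture these combined tables is as per-directory DataFrames stacked under a shared `key` index. The sketch below reproduces that shape with plain pandas and made-up numbers. The folder names `A.D` and `B.D` and the level name `row` are invented for illustration, and this shows the resulting layout only, not how `valence` builds its tables internally.

```python
import pandas as pd

# Made-up chromatogram fragments for two hypothetical run folders.
per_dir = {
    "A.D": pd.DataFrame({"tic": [10.0, 20.0], "tme": [3.0, 3.1]}),
    "B.D": pd.DataFrame({"tic": [30.0, 40.0], "tme": [3.0, 3.1]}),
}

# Stack the frames; the dictionary keys become the outer index level "key".
combined = pd.concat(per_dir, names=["key", "row"])

# Drop the inner row counter so that "key" is the only index level.
combined = combined.reset_index(level="row", drop=True)
```

With this layout, every row carries the name of the directory it came from, which is what makes per-directory selection by index label possible.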


## Accessing a Single Agilent Directory

By default, the `key` (the directory name) is the index of the Agilent DataFrames. Therefore, we can access the `RESULTS.CSV` and `DATA.MS` data for each `.D` directory individually through standard pandas index selection:

In [7]:
agi.results_lib.loc['FA03.D'].head()

| key | header= | pk | rt | pct_area | library_id | ref | cas | qual |
|---|---|---|---|---|---|---|---|---|
| FA03.D | 1= | 1 | 5.7877 | 2.0335 | Methyl octanoate | 17 | 000000-00-0 | 96 |
| FA03.D | 2= | 2 | 7.3441 | 3.4015 | Methyl decanoate | 1 | 000000-00-0 | 98 |
| FA03.D | 3= | 3 | 8.0364 | 1.7448 | Methyl undecanoate | 2 | 000000-00-0 | 98 |
| FA03.D | 4= | 4 | 8.6715 | 3.9674 | Methyl dodecanoate | 3 | 000000-00-0 | 98 |
| FA03.D | 5= | 5 | 9.2781 | 1.9607 | Methyl tridecanoate | 4 | 000000-00-0 | 99 |


In [8]:
agi.results_fid.loc['FA03.D'].head()

| key | header= | peak | rt | first | end | pk_ty | height | area | pct_max | pct_total |
|---|---|---|---|---|---|---|---|---|---|---|
| FA03.D | 1= | 1 | 6.250716 | 5.93818 | 6.563252 | M | 2578080 | 14894660 | 1 | 1.962 |
| FA03.D | 2= | 2 | 7.858187 | 7.465278 | 8.251096 | M | 9647430 | 24914490 | 1 | 3.282 |
| FA03.D | 3= | 3 | 8.357856 | 7.939963 | 8.775749 | M | 6084180 | 12779820 | 1 | 1.683 |
| FA03.D | 4= | 4 | 9.798795 | 9.308855 | 10.288735 | M | 19290490 | 29059610 | 1 | 3.828 |
| FA03.D | 5= | 5 | 10.669815 | 10.136324 | 11.203306 | M | 8825210 | 14361540 | 1 | 1.892 |
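
Label-based selection on a repeated index has one pandas detail worth knowing: a scalar label that matches a single row returns a Series, while a list of labels always returns a DataFrame. The sketch below shows this on a toy table with invented values and made-up keys `A.D` and `B.D`; it uses only pandas and does not call `valence`.

```python
import pandas as pd

# Toy table indexed by a made-up "key" column; all values are invented.
df = pd.DataFrame(
    {"pk": [1, 2, 1], "rt": [5.8, 7.3, 5.9]},
    index=pd.Index(["A.D", "A.D", "B.D"], name="key"),
)

one_run = df.loc["A.D"]           # two matching rows -> DataFrame
single_row = df.loc["B.D"]        # one matching row -> Series
as_frame = df.loc[["B.D"]]        # list of labels -> always a DataFrame
several = df.loc[["A.D", "B.D"]]  # rows from both runs
```

Passing a list of labels is the safer choice when downstream code expects a DataFrame regardless of how many rows a directory contributed.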


For further analysis, use the `valence.analyse` module, which has examples in the `agilent_analyze` notebook in this repository. To see how to plot results and create pivot tables from `valence` data, see the `reporting` notebook in this repository.