# Reading Agilent GCMS Files with `chemtbd`

> __NOTE__: We need a name.  See [issue 3](https://github.com/blakeboswell/chemtbd/issues/3).

Currently there is a hierarchy of objects for reading GCMS data:

- `GcmsDir` object will read `RESULTS.csv` and `DATA.MS` from a single Agilent `.D` directory
- `GcmsData` object will read a `DATA.MS` file
- `GcmsResults` object will read a `RESULTS.csv` file
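For orientation, here is a minimal sketch of locating the two files that `GcmsDir` reads inside one `.D` directory. It assumes both files sit at the top level of the `.D` folder. It matches names case-insensitively because this notebook spells the results file both `RESULTS.csv` and `RESULTS.CSV`. `find_agilent_files` is a hypothetical helper, not part of `chemtbd`.

```python
from pathlib import Path

# Assumption: RESULTS.csv and DATA.MS live directly inside the .D folder.
EXPECTED = ("RESULTS.CSV", "DATA.MS")


def find_agilent_files(d_dir):
    """Map each expected file name (upper-cased) to its Path, or None if missing.

    Hypothetical helper for illustration; it is not part of chemtbd.
    """
    found = {name: None for name in EXPECTED}
    for child in Path(d_dir).iterdir():
        key = child.name.upper()  # case-insensitive match on the file name
        if child.is_file() and key in found:
            found[key] = child
    return found


# Usage with this notebook's test data:
# find_agilent_files("data/test3/FA01.D")
```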

These objects can be imported and used directly. However, the main interface for reading files is `chemtbd.io.Agilent`, a wrapper around the objects above.

To use `chemtbd.io.Agilent`, import it as follows. The directory that contains the `chemtbd` folder must be the working directory.

```python
from chemtbd.io import Agilent
```
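The working-directory requirement presumably comes from how Python resolves imports: the folder that contains `chemtbd` has to be on `sys.path`, and a notebook's current directory is typically on it. The usual alternative is to add that parent folder to `sys.path` explicitly. Below is a minimal sketch of this, assuming `chemtbd` is an importable package folder. `add_package_parent` is a hypothetical helper, not part of `chemtbd`, and the path in the usage comment is a placeholder.

```python
import sys
from pathlib import Path


def add_package_parent(parent_dir):
    """Prepend parent_dir (the folder that *contains* chemtbd/) to sys.path once.

    Hypothetical helper, not part of chemtbd: it only uses Python's standard
    import search path, so the import works from any working directory.
    """
    parent = str(Path(parent_dir).resolve())
    if parent not in sys.path:
        sys.path.insert(0, parent)  # searched before site-packages
    return parent


# Usage, with a placeholder location:
# add_package_parent("/path/to/parent-of-chemtbd")
# from chemtbd.io import Agilent
```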

`Agilent` provides three main read functions:

- `from_dir` expects a path to a single Agilent `.D` directory as input
- `from_root` expects a path to a parent directory containing only `.D` directories as children
- `from_list` expects a list of paths to Agilent `.D` directories
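With `from_list`, you choose the directories yourself, for example to load only a subset of a root folder. Below is a minimal sketch of building that list with `pathlib`. It assumes `from_list` accepts plain path strings. `list_d_dirs` and its `keep` parameter are hypothetical, not part of `chemtbd`.

```python
from pathlib import Path


def list_d_dirs(root, keep=None):
    """Return sorted path strings for the .D subdirectories directly under root.

    Hypothetical helper, not part of chemtbd. The .D suffix is matched
    case-insensitively; if `keep` is given, only directory names in it are kept.
    """
    dirs = sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and p.suffix.upper() == ".D"
    )
    if keep is not None:
        dirs = [p for p in dirs if p.name in keep]
    return [str(p) for p in dirs]


# Usage, assuming from_list accepts plain path strings:
# paths = list_d_dirs("data/test3", keep={"FA01.D", "FA05.D"})
# agi_subset = Agilent.from_list(paths)
```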

For example, let's load all `.D` folders from the directory `data/test3`:

```python
agi = Agilent.from_root('data/test3')
```

Let's look at which `.D` folders were loaded from that directory:

```python
agi.keys()
```

# Accessing all Files

The `lib`, `fid`, and `tic` tables from __RESULTS.CSV__ are each available as a single pandas DataFrame that stacks all loaded Agilent directories (`agi.results_lib`, `agi.results_fid`, and `agi.results_tic`):

```python
agi.results_lib.head()
```

```python
agi.results_fid.head()
```
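One way to picture these stacked tables is as per-directory DataFrames concatenated with the directory name as the outer index level. Below is a minimal sketch of that idea using `pd.concat` on made-up numbers. It is an illustration only, not necessarily how `chemtbd` builds `results_lib` or `results_fid` internally, and the level names `key` and `row` are assumptions.

```python
import pandas as pd

# Toy per-directory tables; the numbers are made up, not real GCMS results.
per_dir = {
    "FA01.D": pd.DataFrame({"height": [10.0, 20.0], "area": [1.0, 2.0]}),
    "FA05.D": pd.DataFrame({"height": [30.0], "area": [3.0]}),
}

# Concatenating a dict uses its keys as the outer index level, so every row
# remembers which .D directory it came from; `names` labels the two levels.
stacked = pd.concat(per_dir, names=["key", "row"])
print(stacked)

# Selecting one directory drops the outer level again.
print(stacked.loc["FA05.D"])
```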

The `tme` tables from __DATA.MS__ are likewise available for all Agilent directories as a single pandas DataFrame, `agi.datams`. The same command works for the `tic` table.

```python
agi.datams.head()
```

> __NOTE__: `tme` and `tic` from `DATA.MS` should probably be in the same table. See [issue 2](https://github.com/blakeboswell/chemtbd/issues/2) for discussion.

## Accessing a Single Agilent Directory

By default, the `key` (the `.D` directory name) is the index of the Agilent DataFrames. We can therefore access the `RESULTS.CSV` and `DATA.MS` data for each `.D` directory individually with standard pandas index selection (`.loc`):

```python
agi.results_tic.loc['FA01.D'].head()
```

```python
agi.results_tic.loc['FA05.D'].head()
```
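Beyond single keys, the usual pandas selectors also work on a directory-keyed index. Below is a small sketch using a toy stand-in for `agi.results_tic`. It assumes a two-level index whose outer level is the directory name. The inner level name `peak`, the `GB02.D` key, and all numbers are made up.

```python
import pandas as pd

# Toy stand-in for agi.results_tic (made-up numbers); outer level = directory key.
tic = pd.DataFrame(
    {"height": [5.0, 7.0, 2.0, 9.0], "area": [50.0, 70.0, 20.0, 90.0]},
    index=pd.MultiIndex.from_tuples(
        [("FA01.D", 0), ("FA01.D", 1), ("FA05.D", 0), ("GB02.D", 0)],
        names=["key", "peak"],
    ),
)

one = tic.loc["FA01.D"]                  # one directory: outer level is dropped
several = tic.loc[["FA01.D", "FA05.D"]]  # list of keys: outer level is kept

# Boolean mask on the outer level, e.g. every directory whose name starts with "FA".
mask = tic.index.get_level_values("key").str.startswith("FA")
subset = tic[mask]
```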

Aggregate metrics for each folder can be computed with standard pandas methods by grouping on the first index level:

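```python
# Per-directory summary of height and area (level 0 of the index is the .D key).
# Lists of statistic names are used because nested renaming dicts passed to
# .agg() raise SpecificationError in pandas >= 1.0.
metrics = ['min', 'max', 'mean']
agi.results_tic.groupby(level=0).agg({'height': metrics, 'area': metrics})
```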
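Passing lists of statistic names to `agg` produces two-level column labels such as `('height', 'min')`. Below is a minimal sketch on toy data (made-up numbers, with a single-level index of repeated directory names) of what the per-directory summary looks like. It also shows how to flatten the column labels into names like `height_min`, which can be easier to plot or export. The underscore naming scheme is just one choice.

```python
import pandas as pd

# Toy data shaped like agi.results_tic (made-up numbers).
tic = pd.DataFrame(
    {"height": [5.0, 7.0, 2.0], "area": [50.0, 70.0, 20.0]},
    index=pd.Index(["FA01.D", "FA01.D", "FA05.D"], name="key"),
)

stats = ["min", "max", "mean"]
summary = tic.groupby(level=0).agg({"height": stats, "area": stats})

# agg with lists yields MultiIndex columns like ("height", "min");
# join each pair into one flat name such as "height_min".
summary.columns = ["_".join(col) for col in summary.columns]
print(summary)
```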

```python
%matplotlib inline

# Plot the per-directory min/max/mean of height and area.
stats = ['min', 'max', 'mean']
agi.results_tic.groupby(level=0).agg({'height': stats, 'area': stats}).plot()
```

## Chromatogram?

Below is a temporary interface for accessing data from `DATA.MS` files. We're not sure what to do with this data yet.

```python
from chemtbd.io import GcmsData
```

Read directly from a single file (there is no stacking across directories yet, because it's not clear the data are stackable):

```python
# The folder is named FA01.D; on case-sensitive file systems the suffix case must match.
gcms_data = GcmsData('data/test3/FA01.D/DATA.MS')
```

```python
chrom = gcms_data.chromatogram
```

The resulting DataFrame has its `index` equal to time and its `columns` equal to ions (m/z values).

```python
chrom.head()
```

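| | 204.9 | 52.1 | 155.0 | 104.1 | 53.1 | 207.0 | 155.8 | 105.0 | 54.1 | 207.9 | ... | 326.6 | 251.2 | 316.2 | 354.4 | 331.6 | 105.7 | 268.5 | 122.1 | 202.2 | 126.7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.086817 | 185.0 | 8090.0 | 225.0 | 166.0 | 40736.0 | 1851.0 | 193.0 | 365.0 | 21416.0 | 313.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3.092533 | 0.0 | 6851.0 | 0.0 | 0.0 | 32048.0 | 2084.0 | 0.0 | 0.0 | 17080.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3.09825 | 0.0 | 5660.0 | 203.0 | 0.0 | 24736.0 | 1931.0 | 0.0 | 568.0 | 15641.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3.103983 | 0.0 | 4772.0 | 210.0 | 205.0 | 23920.0 | 2675.0 | 0.0 | 0.0 | 14445.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3.1097 | 0.0 | 4327.0 | 0.0 | 218.0 | 0.0 | 2216.0 | 0.0 | 506.0 | 13051.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |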
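Each row of `chrom` is one scan and each column one ion, so two common GCMS views follow from simple pandas operations. Summing across the ion columns gives a total ion chromatogram, and picking a single column gives an extracted ion chromatogram. Below is a minimal sketch using a three-scan, three-ion excerpt of the `chrom.head()` output above. Because it uses only an excerpt, its row sums are partial totals. `nearest_ion` is a hypothetical helper, and whether such sums match the `tic` table stored in `DATA.MS` has not been checked here.

```python
import numpy as np
import pandas as pd

# Excerpt of the chrom.head() output: first three scans, three of the ion columns.
chrom = pd.DataFrame(
    {52.1: [8090.0, 6851.0, 5660.0],
     53.1: [40736.0, 32048.0, 24736.0],
     207.0: [1851.0, 2084.0, 1931.0]},
    index=pd.Index([3.086817, 3.092533, 3.09825], name="time"),
)

# Row sums over the ion columns: a (here partial) total ion chromatogram.
tic = chrom.sum(axis=1)


def nearest_ion(frame, mz):
    """Return the column label closest to the requested m/z (hypothetical helper)."""
    cols = frame.columns.to_numpy(dtype=float)
    return frame.columns[np.abs(cols - mz).argmin()]


# Extracted ion chromatogram for the ion nearest m/z 53.
eic = chrom[nearest_ion(chrom, 53.0)]
print(tic)
print(eic)
```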
