## 3. yadg usage: chromatography example

As of `yadg-4.0.0`, the package contains parsers for several chromatography file formats, and a powerful chromatogram integrator with a calibration interface. Both of these features will be discussed below. 

### 3.1. example without calibration: Fusion `json` files
As the Fusion `json` files contain integrated peak areas and concentrations of detected species, we will use those to illustrate the usage of `yadg` in parsing simple chromatographic data. The data files are placed in the `data/` directory. To process these with `yadg` we need a `dataschema` file:

```json
# schemas/fusionjson.schema.json
{
    "metadata": {
        "provenance": "manual",
        "schema_version": "4.0.0"
    },
    "steps": [
        {
            "parser": "chromtrace",
            "import": {
                "folders": ["data/fusion/"]
            },
            "parameters": {"tracetype": "fusion.json"}
        }
    ]
}
```

As in the [electrochemistry](electrochemistry.ipynb) example, the `metadata` key contains information about the `provenance` of the `dataschema`, as well as the version information. 

The Fusion `json` files are placed in the `data/fusion/` folder, and every file in that folder will be parsed by `yadg` using the `chromtrace` parser.
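The `# ...` line at the top of the listing is only a filename label and is not valid JSON, so it has to be left out of the file itself. As a minimal sketch, the same `dataschema` could be written from Python with the standard `json` module; the helper name `write_fusion_schema` is hypothetical and not part of `yadg`:

```python
import json
from pathlib import Path


def write_fusion_schema(path):
    """Write the Fusion json dataschema from the listing above to `path`.

    Hypothetical helper for illustration only; it is not part of yadg.
    """
    schema = {
        "metadata": {"provenance": "manual", "schema_version": "4.0.0"},
        "steps": [
            {
                "parser": "chromtrace",
                "import": {"folders": ["data/fusion/"]},
                "parameters": {"tracetype": "fusion.json"},
            }
        ],
    }
    path = Path(path)
    # Create the target folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(schema, indent=4))
    return schema
```

Calling `write_fusion_schema("schemas/fusionjson.schema.json")` would produce a file with the same keys as the listing, without the label line.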

The above `dataschema` is saved in [schemas/fusionjson.schema.json](schemas/fusionjson.schema.json), and can be processed as follows:

In [5]:
! yadg process -v schemas/fusionjson.schema.json output/fusionjson.dg.json

INFO:root:yadg process: Reading input json from 'schemas/fusionjson.schema.json'.
INFO:root:schema_validator: Tag not present in step 0. Using '00'
INFO:root:schema_validator: Encoding not present in step 0. Using 'utf-8'
INFO:root:process_schema: processing step 0:
INFO:root:yadg process: Saving datagram to 'output/fusionjson.dg.json'.


Note that `yadg` warns the user that no integration/calibration information has been provided.

If the above command finished successfully, the output `datagram` will be located in [output/fusionjson.dg.json](output/fusionjson.dg.json).

The `datagram` contains a single `step`, with 4 timesteps (corresponding to the 4 processed files). The trace data is present in the `raw` section of each timestep, containing the $[t, y]$ axes of the chromatogram, and the integrated peak data from the raw data file, if available. See [the parser description](https://dgbowl.github.io/yadg/4.0.0/parsers.chromtrace.html) for more details.
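A quick way to check this structure is to load the `datagram` and count the timesteps. The sketch below assumes that the top-level `steps` list holds one dictionary per step, and that each step keeps its timesteps in a list under a `data` key. The `data` key name is an assumption about the `datagram` layout and should be checked against the parser description; `summarise_datagram` is a hypothetical helper, not a `yadg` function.

```python
import json


def summarise_datagram(datagram):
    """Return the timestep count and the keys found in `raw` for each step.

    Assumes each step stores its timesteps in a list under "data";
    this key name is an assumption, not confirmed by the text above.
    """
    summary = []
    for step in datagram["steps"]:
        timesteps = step["data"]
        # Collect every key that appears in the `raw` section of any timestep.
        raw_keys = sorted({key for ts in timesteps for key in ts.get("raw", {})})
        summary.append({"timesteps": len(timesteps), "raw_keys": raw_keys})
    return summary


def summarise_file(path):
    """Load a datagram from a json file and summarise it."""
    with open(path) as infile:
        return summarise_datagram(json.load(infile))
```

Under these assumptions, `summarise_file("output/fusionjson.dg.json")` should report a single step with 4 timesteps.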

### 3.2. example with calibration: EZChrom `asc` files
The EZChrom ASCII files, exported using the EZChrom utility from Agilent instruments, do not contain integrated peak areas. Therefore, we need to specify a set of integration/calibration parameters in the `dataschema` file:

```json
# schemas/ezchrom.schema.json
{
    "metadata": {
        "provenance": "manual",
        "schema_version": "4.0.0"
    },
    "steps": [
        {
            "parser": "chromtrace",
            "import": {
                "folders": ["data/agilentcsv/"],
                "prefix": "2019"
            },
            "parameters": {
                "tracetype": "ezchrom.asc",
                "calfile": "calibrations/gccal.json"
            }
        }
    ]
}
```

The Agilent `dat.asc` files are placed in the `data/agilentcsv/` folder given in the `dataschema`, and every file in that folder whose name starts with the `prefix` `2019` will be parsed by `yadg` using the `chromtrace` parser.
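To preview which files the `folders` and `prefix` entries of the `import` section would pick up, the selection can be imitated with `pathlib`. This is only a sketch of the filtering idea, assuming that `prefix` is matched against the start of each file name; `select_files` is a hypothetical helper and does not reproduce `yadg`'s own file handling.

```python
from pathlib import Path


def select_files(folder, prefix=""):
    """List the names of files in `folder` that start with `prefix`, sorted.

    Hypothetical helper; assumes the prefix is matched against the file name.
    """
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.name.startswith(prefix)
    )
```

For example, `select_files("data/agilentcsv/", prefix="2019")` would list the candidate files for the step defined above.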

Note that the calibration information is provided in the `calfile` argument to the `parameters` entry. More details about this functionality can be found [in the manual](https://dgbowl.github.io/yadg/4.0.0/yadg.parsers.chromtrace.html#module-yadg.parsers.chromtrace.integration).

The above `dataschema` is saved in [schemas/ezchrom.schema.json](schemas/ezchrom.schema.json), and can be processed as follows:

In [10]:
! yadg process schemas/ezchrom.schema.json output/ezchrom.dg.json



If the above command finished successfully, the output `datagram` will be located in [output/ezchrom.dg.json](output/ezchrom.dg.json).

The `datagram` contains a single `step`, with 5 timesteps (corresponding to the 5 processed files). The trace data is present in the `raw` section of each timestep, containing the $[t, y]$ axes of the chromatogram. The integrated peak data, obtained using the calibration provided in the `calfile` entry of the `dataschema`, is located in the `derived` section of each timestep. It includes the peak data of every matched peak in each trace, as well as the normalised mole fractions in `xout`.
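As an illustration of what "normalised" means here, the sketch below scales a set of per-species values so that they sum to one. It assumes that the normalisation is a simple division by the total, which is the usual definition of a mole fraction but is not spelled out above. The function name and the species in the usage note are hypothetical.

```python
def normalise(amounts):
    """Scale per-species values so that they sum to one.

    Assumes normalisation by the total; illustrative only, not yadg's code.
    """
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("cannot normalise: all values are zero")
    return {species: value / total for species, value in amounts.items()}
```

With made-up inputs, `normalise({"H2": 1.0, "CO": 3.0})` gives `{"H2": 0.25, "CO": 0.75}`.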

[Back to index](index.ipynb)