# ChromProcess Introduction Part 2

In ChromProcess Introduction Part 1, peak collections were created from chromatogram files. In Part 2, these peak tables are used to compile series data (e.g. changes in compound concentrations or peak integrals over time).

Note that the results of each operation should be inspected or validated before they are used to evaluate the 'final' data. The software may miss important features of the data. Tuning the analysis parameters shown in this notebook may remedy such issues; otherwise, further code may be needed to achieve satisfactory results. It is up to the reader to choose suitable methods for validating and inspecting the results.

In [1]:
import os
from ChromProcess import Classes

Peak collections can be loaded into objects.

In [2]:
peak_collection_directory = 'Examples/Example/ExperimentalData/ExamplePeakCollections'

# Load every file in the directory as a PeakCollection
peak_tables = []
for file in os.listdir(peak_collection_directory):
    peak_tables.append(Classes.PeakCollection(file = os.path.join(peak_collection_directory, file)))

The experiment conditions go hand-in-hand with the data, so they are loaded into an object.

In [3]:
conditions_file = 'Examples/Example/ExperimentalData/example_conditions.csv'
conditions = Classes.Experiment_Conditions(information_file = conditions_file)

The peak tables and experiment conditions are then used to create a series of peak collections in a single object.

In [4]:
# Create series of peak collections
series = Classes.PeakCollectionSeries(
    peak_tables,
    name = 'Example',
    conditions = conditions.conditions
)

Several operations can now be performed on this series of peak collections. First, consider that corresponding peaks in different chromatograms do not have *exactly* the same retention time. Some of this variability can be reduced by aligning each chromatogram according to the position of its internal standard. The internal standard retention time can be set to zero, and all other retention times are adjusted accordingly.

In [5]:
IS_pos = 0.0
series.align_peaks_to_IS(IS_pos)
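As a rough illustration of what this alignment does to the retention times, the following minimal sketch applies a constant shift to one chromatogram so that its internal standard lands on the chosen position. It is plain Python rather than ChromProcess code: the function `shift_to_internal_standard` and the retention times are hypothetical, and the library's own implementation may differ.

```python
def shift_to_internal_standard(retention_times, is_retention_time, is_position=0.0):
    """Shift all retention times so the internal standard sits at is_position."""
    offset = is_position - is_retention_time
    return [rt + offset for rt in retention_times]

# Hypothetical retention times (min) for one chromatogram; the IS elutes at 6.52
retention_times = [6.52, 8.10, 9.95]
aligned = shift_to_internal_standard(retention_times, is_retention_time=6.52)
```

After the shift, the internal standard sits at zero and every other peak is expressed relative to it, so chromatograms with slightly different absolute retention times become directly comparable.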

Next, the integrals of all peaks can be normalised to the internal standard's integral.

In [6]:
series.reference_integrals_to_IS()
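Conceptually, this normalisation divides each peak integral by the integral of the internal standard in the same chromatogram. The sketch below shows that arithmetic in plain Python, assuming a simple division; `integrals_relative_to_is` and the integral values are hypothetical and are not part of ChromProcess.

```python
def integrals_relative_to_is(integrals, is_integral):
    """Express each integral as a fraction of the internal standard integral."""
    if is_integral <= 0:
        raise ValueError("internal standard integral must be positive")
    return [integral / is_integral for integral in integrals]

# Hypothetical raw integrals; the first entry is the internal standard itself
raw_integrals = [200.0, 50.0, 10.0]
relative_integrals = integrals_relative_to_is(raw_integrals, is_integral=raw_integrals[0])
```

With this convention the internal standard has a relative integral of 1, and a value of 0.05 corresponds to 5% of the internal standard integral.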

If low-intensity peaks are considered negligible (a judgement left to the user), those with integrals below a chosen threshold can be removed.

In [7]:
peak_removal_limit = 0.05 # 5% of internal standard integral if integrals are normalised to IS
series.remove_peaks_below_threshold(peak_removal_limit)
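The effect of the threshold can be pictured with a small filter over one chromatogram's peaks. This is only a sketch under stated assumptions: peaks are represented as a hypothetical mapping of retention time to normalised integral, and a peak exactly at the threshold is kept, which may not match how ChromProcess treats the boundary.

```python
def drop_small_peaks(peaks, threshold):
    """Keep only peaks whose integral is at least the threshold."""
    return {rt: integral for rt, integral in peaks.items() if integral >= threshold}

# Hypothetical peaks: retention time relative to the IS -> normalised integral
peaks = {1.58: 0.25, 2.10: 0.03, 3.43: 0.05}
kept = drop_small_peaks(peaks, threshold=0.05)
```

Here the peak at 2.10, with an integral of 3% of the internal standard, is discarded, while the other two are retained.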

The above operations clean up the data in preparation for tracing each peak through the set of peak collections as a series. The following method uses a simple agglomerative clustering algorithm to create clusters of peak retention times. Each cluster will be used to identify which peaks will go into a series.

In [13]:
peak_agglomeration_boundary = 0.025 # distance cutoff 
series.get_peak_clusters(bound = peak_agglomeration_boundary)
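To give a feel for how a distance cutoff groups retention times, the sketch below pools retention times from several chromatograms, sorts them, and starts a new cluster whenever the gap to the previous value exceeds the cutoff. This gap-based grouping is one simple form of agglomerative clustering in one dimension; it is an assumption made for illustration, and the algorithm inside `get_peak_clusters` may differ. The function `cluster_retention_times` and the data are hypothetical.

```python
def cluster_retention_times(retention_times, bound):
    """Group sorted retention times; a gap larger than bound starts a new cluster."""
    clusters = []
    for rt in sorted(retention_times):
        if clusters and rt - clusters[-1][-1] <= bound:
            clusters[-1].append(rt)  # close enough to the previous peak
        else:
            clusters.append([rt])  # gap too large: begin a new cluster
    return clusters

# Hypothetical aligned retention times pooled from several chromatograms
pooled = [1.58, 3.43, 1.59, 3.44, 1.57, 5.20]
clusters = cluster_retention_times(pooled, bound=0.025)
```

Note that with this scheme a chain of closely spaced peaks can form a cluster wider than the cutoff itself, which is one reason the resulting clusters should be inspected before use.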

These clusters can then be used to output the series data for the peak integrals. First, information about the chromatographic method, which is stored in a dedicated 'calibrations' file, is loaded into an object (`Instrument_Calibration`) and passed on as a source of information for the output. Next, a method is called to assemble the peak clusters into a numpy array. Finally, a DataReport object is created from the PeakCollectionSeries object and written to a file.

In [15]:
data_report_directory = 'Examples/Example/ExperimentalData/DataReports'
calibrations_file = 'Examples/example_calibrations.csv'
calib = Classes.Instrument_Calibration(file = calibrations_file)

series.make_integral_series() # create arrays for output
integral_data_report = series.create_integral_DataReport(calib) # create DataReport
integral_data_report.write_to_file(path = data_report_directory) # Save to file
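The assembly of peak clusters into an array can be pictured as building a table with one row per chromatogram and one column per cluster. The sketch below is a hedged illustration only: it uses nested lists in place of a numpy array, assigns a peak to a cluster when its retention time lies within the cluster's range, sums the integrals if several peaks match, and fills in 0.0 where no peak is found. These choices, the function `build_integral_series`, and the data are all assumptions and do not describe the internals of `make_integral_series`.

```python
def build_integral_series(chromatograms, clusters):
    """Return one row per chromatogram and one column per cluster."""
    series = []
    for peaks in chromatograms:
        row = []
        for cluster in clusters:
            low, high = min(cluster), max(cluster)
            # Collect integrals of peaks whose retention time falls in this cluster
            matches = [integral for rt, integral in peaks if low <= rt <= high]
            row.append(sum(matches, 0.0))  # 0.0 when the peak is absent
        series.append(row)
    return series

# Hypothetical clusters and per-chromatogram (retention time, integral) pairs
clusters = [[1.57, 1.58, 1.59], [3.43, 3.44]]
chromatograms = [
    [(1.58, 0.25), (3.43, 0.10)],
    [(1.59, 0.40)],
    [(1.57, 0.55), (3.44, 0.30)],
]
integral_series = build_integral_series(chromatograms, clusters)
```

Each column of the result then traces one compound's normalised integral across the experiment, with the second chromatogram showing a zero where the second peak was not detected.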