In [23]:
from pyhdx import PeptideCSVFile
from pyhdx.fileIO import read_dynamx
from numpy.lib.recfunctions import stack_arrays


In [9]:

file_path = r'../tests/test_data/ds1.csv'
data1 = read_dynamx(file_path)
len(data1)

702

Data from a DynamX file is read with the `read_dynamx` function. Here, a total of 702 peptide entries are found and
returned. The data is in the form of a numpy structured array with the fields shown in the output below. The
`start` and `end` entries indicate which residues in the protein a particular peptide covers, where the first
residue in the protein has index 1 and the `end` entry is inclusive.
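
As a minimal sketch of this indexing convention, the toy structured array below uses a subset of the field names with made-up values (the protein string and peptides are hypothetical and not taken from the test data). It shows how the one-based, inclusive `start` and `end` entries translate into peptide lengths and Python slices.

```python
import numpy as np

# Toy structured array with a subset of the fields; values are invented
# for illustration and do not come from the DynamX test files.
dtype = [('start', 'i4'), ('end', 'i4'), ('sequence', 'U10')]
peptides = np.array([(1, 5, 'MKVLA'), (3, 9, 'VLAGTRE')], dtype=dtype)

# Because `end` is inclusive, the number of residues is end - start + 1.
lengths = peptides['end'] - peptides['start'] + 1

# One-based inclusive indices map to zero-based Python slices as
# [start - 1 : end].
protein = 'MKVLAGTRE'  # hypothetical protein sequence
slices = [protein[p['start'] - 1:p['end']] for p in peptides]

print(lengths)
print(slices)
```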

In [13]:
for k in data1.dtype.fields.keys():
    print(k)

Protein
start
end
sequence
modification
fragment
max_uptake
MHP
state
exposure
center
center_sd
uptake
uptake_sd
RT
RT_sd


In [8]:

file_path = r'../tests/test_data/ds2.csv'
data2 = read_dynamx(file_path)
len(data2)

1359

Multiple datasets can be combined with the `stack_arrays` function from `numpy.lib.recfunctions`.


In [25]:
combined_data = stack_arrays([data1, data2], autoconvert=True)
len(combined_data)

2061
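
The combined length equals the sum of the two inputs (702 + 1359 = 2061), since `stack_arrays` appends records field by field. A small self-contained sketch with made-up records illustrates this. Note that `usemask=False` is a choice made in this sketch to get a plain `ndarray` back; the cell above keeps the default, which returns a masked array.

```python
import numpy as np
from numpy.lib.recfunctions import stack_arrays

# Two toy structured arrays sharing the same fields (values are invented).
dtype = [('start', 'i4'), ('end', 'i4'), ('state', 'U10')]
a = np.array([(1, 5, 'FD'), (3, 9, 'FD')], dtype=dtype)
b = np.array([(2, 8, 'Native')], dtype=dtype)

# Stack the records of `b` below those of `a`.
# usemask=False returns a regular ndarray instead of a MaskedArray.
stacked = stack_arrays([a, b], autoconvert=True, usemask=False)

print(len(stacked))
print(stacked['state'])
```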

The first step is to put the data into a `PeptideCSVFile` object. Here, we can specify how many N-terminal amino acids
to ignore (default = 1), split the data into groups by their `state`, or apply control measurements.

In [26]:
combined_data.dtype.fields


mappingproxy({'Protein': (dtype('<U2'), 0),
              'start': (dtype('int32'), 8),
              'end': (dtype('int32'), 12),
              'sequence': (dtype('<U57'), 16),
              'modification': (dtype('<U15'), 244),
              'fragment': (dtype('bool'), 304),
              'max_uptake': (dtype('float64'), 305),
              'MHP': (dtype('float64'), 313),
              'state': (dtype('<U25'), 321),
              'exposure': (dtype('float64'), 421),
              'center': (dtype('float64'), 429),
              'center_sd': (dtype('float64'), 437),
              'uptake': (dtype('float64'), 445),
              'uptake_sd': (dtype('float64'), 453),
              'RT': (dtype('float64'), 461),
              'RT_sd': (dtype('float64'), 469)})

In [27]:

pcf = PeptideCSVFile(combined_data, drop_first=1)
states = pcf.groupby_state()
states.keys()

dict_keys(['FD', 'Native folded', 'PpiA-FD', 'PpiANative', 'folding_4C_10secLabelling'])
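
To make the idea of grouping by `state` concrete, the sketch below splits a toy structured array into a dictionary keyed by state using plain numpy. This is not how `groupby_state` is implemented in pyhdx and the objects it returns may differ; the records and state names here are made up.

```python
import numpy as np

# Toy structured array; values are invented for illustration.
dtype = [('start', 'i4'), ('end', 'i4'), ('state', 'U10'), ('exposure', 'f8')]
data = np.array(
    [
        (1, 5, 'FD', 0.0),
        (1, 5, 'Native', 10.0),
        (3, 9, 'FD', 0.0),
        (3, 9, 'Native', 10.0),
        (3, 9, 'Native', 30.0),
    ],
    dtype=dtype,
)

# Build one sub-array per unique value of the `state` field
# using a boolean mask on that field.
groups = {str(s): data[data['state'] == s] for s in np.unique(data['state'])}

for name, entries in groups.items():
    print(name, len(entries))
```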