Feature/rawdata #54

singjc · 2023-12-15T16:59:33Z

This is a fairly large PR.

This implements raw targeted feature extraction from DIA/diaPASEF mzML files using a spectral library (TSV/PQP), optionally supplemented with search results (DIA-NN TSV/OSW). The extracted results can then be visualized in 1D, 2D, and 3D.

Note: I do not include any test files in this branch; I will add tests later in a separate branch.

Inputs:

Spectral library - can be either TSV or PQP
Mass Spec Files - can be either DIA or diaPASEF data strictly in mzML format
Search results (Optional) - can be either a DIA-NN report file or an OSW SQLite file

Plotting:

1D plotting - spectrum, chromatogram and mobilogram (diaPASEF) plots
2D plotting - heatmaps of any combination of two dimensions. Precursors and fragments can optionally be aggregated into a single heatmap; the default is to split them into individual heatmaps.
3D plotting - vertical line spectrum-chromatogram plots for DIA and diaPASEF data. Aggregated 3D scatter heatmap plots of precursors and fragments, and contour plots for any two dimensions for diaPASEF data.

Loaders:

reportLoader - TSV-based search results loader (based on the DIA-NN report). TODO: generalize to other TSV-based reports such as OSW or Spectronaut
DiaNNLoader - Loads a DIA-NN report file
OSWLoader - Loads an OSW SQLite-based report file
mzMLLoader - Loads an mzML file into an onDiscExperiment
TargetedDIADataAccess - Main class that does the actual targeted extraction
TargetDIALoader - Ties everything together: loads the mzML data, performs the extraction, and exports the data to a DataFrame or TransitionGroup per precursor and per file.

Preprocess:

transformations - moved min-max scaling to this file from the chromatogram_data_handling.py file
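
For reference, min-max scaling linearly rescales a signal so that its minimum maps to 0 and its maximum to 1. The following is a minimal, dependency-free sketch of the idea, not the code in the transformations file; the function name `min_max_scale` and the handling of constant input are assumptions made for illustration.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal has no range; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical usage on a short intensity trace.
scaled = min_max_scale([2.0, 4.0, 6.0])
```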

Server:

RawTargetedExtractionAnalysisServer - Main script to perform targeted extraction analysis
OneDimensionPlotterServer (likewise Two and Three) - scripts that prepare and generate the interactive plots by calling the individual interactive plotter methods from plotting

UI:

RawTargetedExtractionAnalysisFormUI - front-end user input form for input files
FileInputRawDataUISettings - file input settings for displaying files in the sidebar and extracting the list of files if a path is given for raw data.
RawTargetedExtractionAnalysisUI - front end for displaying components in the sidebar (i.e. search results information and extraction parameters) and for displaying plots and dataframes.

Examples

DIA (two runs)

diaPASEF (one run)

Restructured the check for whether the annotation column needs to be generated. The previous structure would pass the check for the annotation column being present, but would fail the inner if statement checking for NULL or NA assignment in the annotation column. This resulted in the stmt variable never being defined.
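
The failure mode described here can be sketched as follows. This is an illustrative reconstruction, not the actual loader code: the function names, the `"ANNOTATION"` column name, and the placeholder return strings are all hypothetical. The point is that a nested check can leave a variable unassigned on one path, while a single combined condition assigns it on every path.

```python
from typing import List, Optional


def build_stmt_buggy(columns: List[str], annotations: List[Optional[str]]) -> str:
    # Nested checks: `stmt` is assigned only on some paths.
    if "ANNOTATION" in columns:
        if any(a is None for a in annotations):
            stmt = "generate_annotation"
        # Column present and fully populated: nothing is assigned here.
    else:
        stmt = "generate_annotation"
    return stmt  # raises UnboundLocalError on the unassigned path


def build_stmt_fixed(columns: List[str], annotations: List[Optional[str]]) -> str:
    # One combined condition, so `stmt` is defined on every path.
    needs_generation = "ANNOTATION" not in columns or any(a is None for a in annotations)
    if needs_generation:
        stmt = "generate_annotation"
    else:
        stmt = "use_existing_annotation"
    return stmt
```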

SpectralLibraryLoader is meant to be the main class for loading transition files. It uses the individual access class to retrieve the data based on file type, then takes the data returned as a dataframe and stores it as the `data` attribute of the SpectralLibraryLoader class. This avoids instances of `data.data.pd.DataFrame` and avoids issues when caching data in Streamlit: on first execution, the `data.data` attribute contains the dataframe from the access class, but after caching, `data.data` no longer represents the access class and is instead the actual retrieved pd.DataFrame.
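
A minimal sketch of this pattern, under stated assumptions: the class names `TsvAccess`, `PqpAccess`, and `LibraryLoader` are hypothetical stand-ins rather than the massseer classes, and a list of dicts stands in for the pd.DataFrame so the example has no dependencies. The top-level loader picks an access class by file type and keeps only the returned table, so `loader.data` is always the table itself and never a wrapper object.

```python
from typing import Dict, List

Table = List[Dict[str, object]]  # stand-in for pd.DataFrame


class TsvAccess:
    """Hypothetical access class for TSV libraries."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Table:
        # A real implementation would parse the file; fixed rows keep this runnable.
        return [{"source": "tsv", "PrecursorMz": 500.25}]


class PqpAccess(TsvAccess):
    """Hypothetical access class for PQP libraries."""

    def load(self) -> Table:
        return [{"source": "pqp", "PrecursorMz": 500.25}]


class LibraryLoader:
    """Hypothetical top-level loader: `data` holds the table, not the access object."""

    def __init__(self, path: str) -> None:
        access_cls = PqpAccess if path.lower().endswith(".pqp") else TsvAccess
        self.data: Table = access_cls(path).load()


loader = LibraryLoader("library.pqp")
```

Because `data` has the same type before and after any caching layer, callers never need to reach through a second `.data` attribute.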

singjc added 30 commits November 2, 2023 11:00

[ADD] Transition List class

0e4aeaa

[ADD] targeted experiment UI

95ed28d

[MINOR] changes to get targeted extraction working

b1f758e

Merge branch 'dev' of github.com:Roestlab/massseer into feature/rawdata

5ea2f12

[MINOR] changes to get targeted extraction working

a1b29c2

Merge branch 'dev' of github.com:Roestlab/massseer into feature/rawdata

9ef192e

Merge branch 'patch/masterRefactor' of github.com:Roestlab/massseer i…

3deac99

…nto feature/rawdata Pull in changes from masterRefactor to avoid further conflicts down the road and to implement refactoring in parallel with targeted extraction

[ADD] classes for different molecule levels

1880059

Merge branch 'patch/molecule_structs' of github.com:Roestlab/massseer…

1123523

… into feature/rawdata

[FIX] dependencies

f75c1ab

Merge branch 'dev' of github.com:Roestlab/massseer into feature/rawdata

d219e33

[MINOR]

914bcd8

Merge branch 'patch/transition_loaders' of github.com:Roestlab/massse…

80aa74f

…er into feature/rawdata

[ADD] docs and added adders to Protein and Peptide instead of passing…

645c700

… as arg

Merge branch 'patch/molecule_structs' of github.com:Roestlab/massseer…

bce9bb3

… into feature/rawdata

[FIX] imports

8d4c6f5

Merge branch 'patch/molecule_structs' of github.com:Roestlab/massseer…

4183cb5

… into feature/rawdata

[MINOR]

ebad23e

Merge branch 'patch/transition_loaders' of github.com:Roestlab/massse…

21448b4

…er into feature/rawdata

[MOVE] Product to Precursor

1c87c88

[ADD] getters and setters for precursor and product

e7e9c99

Merge branch 'patch/molecule_structs' of github.com:Roestlab/massseer…

99ad6cb

… into feature/rawdata

[MINOR]

81dc20d

[ADD] print statement via __str__ for molecule class objects

60a0b6f

Merge branch 'patch/molecule_structs' of github.com:Roestlab/massseer…

f47d540

… into feature/rawdata

Merge branch 'patch/masterRefactor' of github.com:Roestlab/massseer i…

11ae6ca

…nto feature/rawdata

[MINOR]

0ffa7e9

[ADD] method to get sequence without periods for terminal modifications

e7d7a5a

Merge branch 'patch/molecule_structs' of github.com:Roestlab/massseer…

a69993e

… into feature/rawdata

[MINOR]

2d374c8

jcharkow and others added 27 commits December 31, 2023 16:19

[FIX] getting items in TransitionPQPLoader

8930f22

apply same fix as TransitionTSVLoader that cannot access TransitionPQPLoader dataframe from the SpectrumLibraryLoader class

[FIX] clean up TransitionGroupFeature access methods

63aaa18

rename columns in returned dataframe

[TEST] update ionMobilityTest2 to contain IM data

d803920

[FIX] consensusApexIntensity not being recorded

318c09a

[FIX] use featureIM rather than libIM in extraction

cf1980c

[FIX] config object required by loadTransitionGroups

72066bf

Also minor documentation code linting

[FEATURE] implement loadTransitionGroupDf for mzML loader

dca34f0

[FIX] SqMassLoader errors

47e3019

implement loadTopTransitionGroupFeatureDf and fix __init__ function

[FIX] Add check for when no spectra are found, and throw a warning

2bc3a08

[ADD] UI message to inform of failed extraction

0cff531

[ADD] method to update st.write

7317ad2

[ADD] session_state variables to keep track of interactive changes

6957773

[UPDATE] caching resources

b9cf0f9

Added caching where necessary, and added checks to see if caches need to be cleared on new input interaction. Changed time_block to MeasureBlock for performance metrics.

[MINOR] lint

663a944

[REMOVE] st.write debugging statement

42d7005

[CHANGE] default plot smoothing to none instead of sgolay

7c0eb21

Update plotting for jupyter notebook compatibility

3d5cdfb

add show() methods; add context property to config

Add plotting directly from TransitionGroup

cbee08e

address circular import

eb5bb5a

ensure other 3D plots work in jupyter context

7a28007

[FIX] bug of m/z heatmap

147501c

m/z heatmap with fragments split up had wrong axis, fix this

change plot sizes

e29b5c3

allow for 2 columns on multiplots

adjust ratio surface plot

a10a206

Merge pull request #63 from Roestlab/patch/notebookPlotting

4cfab95

Patch/notebook plotting

Merge pull request #56 from Roestlab/patch/josh_rawdata_2

c1ca78e

Josh's Suggestions for rawData refactoring

singjc merged commit 78d960f into patch/masterRefactor Jan 4, 2024

singjc deleted the feature/rawdata branch January 7, 2024 05:58

singjc mentioned this pull request Jan 8, 2024

Feature: Raw Data Targeted Extraction #14

Closed
