Feature/rawdata #54

Merged
merged 215 commits into from
Jan 4, 2024
Conversation

singjc
Collaborator

@singjc singjc commented Dec 15, 2023

This is a fairly large PR.

This implements raw targeted feature extraction from DIA/diaPASEF mzML files using a spectral library (tsv/pqp) and optionally supplemented with search results (diann tsv/osw). The extracted results can then be visualized in 1D, 2D and 3D.

Note: This branch does not include any test files; I will add tests later in a separate branch.

Inputs:

  • Spectral library - can be either TSV or PQP
  • Mass Spec Files - can be either DIA or diaPASEF data strictly in mzML format
  • Search results (Optional) - can be either a DIA-NN report file or an OSW SQLite file

Plotting:

  • 1D plotting - spectrum, chromatogram and mobilogram (diaPASEF) plots
  • 2D plotting - heatmaps of any combination of two dimensions. Precursors and fragments are split into individual heatmaps by default, with an option to aggregate them all into a single heatmap.
  • 3D plotting - vertical line spectrum-chromatogram plots for DIA and diaPASEF data. Aggregated 3D scatter heatmap plots of precursors and fragments, and contour plots for any two dimensions for diaPASEF data.

Loaders:

  • reportLoader - TSV-based search results loader (based on the DIA-NN report). TODO: generalize to other TSV-based reports such as OSW or Spectronaut
  • DiaNNLoader - Loads a DIA-NN report file
  • OSWLoader - Loads an OSW SQLite-based report file
  • mzMLLoader - Loads an mzML file into an onDiscExperiment
  • TargetedDIADataAccess - Main class that does the actual targeted extraction
  • TargetDIALoader - Ties everything together: loads the mzML data, performs the extraction, and exports the data as a DataFrame or TransitionGroup per precursor and per file.
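
To make the idea of targeted extraction concrete, here is a minimal sketch of the core filtering step: keep only the peaks that fall inside an m/z tolerance window and a retention-time window around a library target, then collapse them into a chromatogram. This is not the code in this PR. `Peak`, `ppm_window`, `extract_targeted` and `to_chromatogram` are hypothetical names, the 20 ppm default is an assumed value, and the ion-mobility dimension needed for diaPASEF is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """One centroided peak (hypothetical stand-in for data read from mzML)."""
    rt: float         # retention time, seconds
    mz: float         # mass-to-charge ratio
    intensity: float


def ppm_window(target_mz, tol_ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    half_width = target_mz * tol_ppm / 1e6
    return target_mz - half_width, target_mz + half_width


def extract_targeted(peaks, target_mz, rt_center, rt_half_window, tol_ppm=20.0):
    """Keep the peaks inside both the m/z window and the RT window."""
    mz_lo, mz_hi = ppm_window(target_mz, tol_ppm)
    rt_lo, rt_hi = rt_center - rt_half_window, rt_center + rt_half_window
    return [
        p for p in peaks
        if mz_lo <= p.mz <= mz_hi and rt_lo <= p.rt <= rt_hi
    ]


def to_chromatogram(peaks):
    """Sum intensity per retention time, sorted by retention time."""
    trace = {}
    for p in peaks:
        trace[p.rt] = trace.get(p.rt, 0.0) + p.intensity
    return sorted(trace.items())
```

Calling `extract_targeted` once per precursor and fragment m/z from the spectral library, and once per file, gives the per-precursor, per-file grouping described for the loaders above.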

Preprocess:

  • transformations - moved min-max scaling to this file from the chromatogram_data_handling.py file
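
For reference, min-max scaling maps a series of intensities linearly onto a target range. The sketch below is a plain-Python illustration of the technique, not the function that was moved in this PR; the name `min_max_scale` and the handling of a constant series are assumptions.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    values = list(values)
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; map every value to new_min (assumed convention).
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]
```

Scaling each trace this way puts chromatograms of very different absolute intensity on a common axis for plotting.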

Server:

  • RawTargetedExtractionAnalysisServer - Main script to perform targeted extraction analysis
  • OneDimensionPlotterServer (Two and Three) - scripts to prepare and generate the interactive plots calling the individual interactive plotter methods from plotting

UI:

  • RawTargetedExtractionAnalysisFormUI - front-end user input form for input files
  • FileInputRawDataUISettings - file input settings for displaying files in the sidebar and extracting the list of files if a path is given for raw data.
  • RawTargetedExtractionAnalysisUI - front end for displaying components in the sidebar, i.e. search results information and extraction parameters, and for displaying plots and dataframes.

Examples

DIA (two runs)


raw_dia_example_1

raw_dia_example_2

diaPASEF (one run)


raw_diapasef_example_1

raw_diapasef_example_3

raw_diapasef_example_4

raw_diapasef_example_5

raw_diapasef_example_6

…nto feature/rawdata

Pull in changes from masterRefactor to avoid further conflicts down the
road and to implement refactoring in parallel with targeted extraction
jcharkow and others added 27 commits December 31, 2023 16:19
apply same fix as TransitionTSVLoader that cannot access TransitionPQPLoader dataframe from the SpectrumLibraryLoader class
rename columns in returned dataframe
Also minor documentation code linting
implement loadTopTransitionGroupFeatureDf and fix __init__ function
restructured checking for the need to generate the annotation column
previous structure would pass the check for the annotation column being
present, but would fail the inner if statement checking for NULL or NA
assignment in the annotation column. This results in the stmt variable
never being defined.
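
The failure mode described in this commit message can be reproduced with a small sketch. This is an illustration of the pattern only, not the actual loader code: the function names, the `ANNOTATION` column name, and the placeholder statement strings are all hypothetical.

```python
GENERATE = "query that generates the annotation"        # placeholder, not real SQL
READ_EXISTING = "query that reads the existing annotation"  # placeholder, not real SQL


def build_stmt_buggy(columns, annotation_has_null):
    """Previous structure: stmt is only assigned on some of the paths."""
    if "ANNOTATION" in columns:
        if annotation_has_null:
            stmt = GENERATE
        # Column present and fully populated: nothing is assigned here.
    else:
        stmt = GENERATE
    return stmt  # raises UnboundLocalError on the unassigned path


def build_stmt_fixed(columns, annotation_has_null):
    """Restructured check: decide once whether generation is needed."""
    needs_generation = "ANNOTATION" not in columns or annotation_has_null
    return GENERATE if needs_generation else READ_EXISTING
```

Folding both conditions into a single boolean guarantees that every path assigns a statement.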
SpectralLibraryLoader is meant to be the main class for loading
transition files. Use the individual access class to retrieve the data
based on file type, then return the data as a dataframe and store
that as the `data` attribute in the SpectralLibraryLoader class. This
avoids instances of `data.data.pd.DataFrame` and avoids issues when
caching data in streamlit. On first execution, the `data.data` attribute
contains the dataframe from the access class. However, after caching,
`data.data` no longer represents the access class, but the actual
retrieved pd.DataFrame.
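
A minimal sketch of the delegation described in this commit message, assuming a simplified design: the main loader picks an access class by file type, asks it for the table, and stores the table itself (never the access object) on `data`. The class names `TsvAccess`, `PqpAccess` and `LibraryLoader` are hypothetical, and a list of dicts stands in for the pandas DataFrame so the example has no dependencies.

```python
import os


class TsvAccess:
    """Hypothetical access class for TSV libraries."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # A real implementation would parse the file; this returns dummy rows.
        return [{"source": "tsv", "path": self.path}]


class PqpAccess:
    """Hypothetical access class for PQP (SQLite) libraries."""

    def __init__(self, path):
        self.path = path

    def load(self):
        return [{"source": "pqp", "path": self.path}]


class LibraryLoader:
    """Main entry point: `data` always holds the loaded table, never an access object."""

    _ACCESS_BY_SUFFIX = {".tsv": TsvAccess, ".pqp": PqpAccess}

    def __init__(self, path):
        self.path = path
        self.data = None

    def load(self):
        suffix = os.path.splitext(self.path)[1].lower()
        access_cls = self._ACCESS_BY_SUFFIX.get(suffix)
        if access_cls is None:
            raise ValueError(f"Unsupported library format: {suffix!r}")
        # Store the table directly so callers never need `loader.data.data`.
        self.data = access_cls(self.path).load()
        return self.data
```

Because `data` has the same type before and after a cache round trip, callers do not need to distinguish between the two cases.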
Added caching where necessary, and added checks to see if caches need to
be cleared on new input interaction
Changed time_block to MeasureBlock for performance metrics
add show() methods
add context property to config
m/z heatmap with fragments split up had wrong axis, fix this
allow for 2 columns on multiplots
Josh's Suggestions for rawData refactoring
@singjc singjc merged commit 78d960f into patch/masterRefactor Jan 4, 2024
@singjc singjc deleted the feature/rawdata branch January 7, 2024 05:58
Labels
enhancement New feature or request
3 participants