Josh's Suggestions for rawData refactoring #56

jcharkow · 2023-12-21T15:39:55Z

Here is my attempt at refactoring the rawData code a bit. The main idea of this refactor is to make the loader interface consistent between SqMassLoader and the raw data extraction, so that both are easier to use from Python.

Note: This has not been tested and is likely very buggy, but I wanted to start the PR early so everyone is up to date.

Major Changes include:

Major refactoring of loaders
- Create a new "access" folder, which is meant to contain the more "low level" access methods
  - e.g. direct pyopenms calls and SQL queries
- mzMLDataLoader.py - links an mzML file, a results file and a spectral library to do the heavy lifting of targeted extraction. The main method is loadFeatureMaps(). This is more consistent with how SqMassLoader is implemented, and it replaces OSWLoader, DIANNLoader, TargetedExtractionLoader and MzMLLoader.
- Reporttsv is renamed to ResultsTSVDataAccess. Currently only DIA-NN .tsv is supported; I will work on adding OSW .tsv.
- GenericLoader is changed to be the parent of both mzMLDataLoader and SqMassDataLoader (and a future .d loader)
- TransitionGroupFeature now stores more meta info so that this data structure can be used more widely
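
To make the intended hierarchy concrete, here is a rough sketch of GenericLoader as the shared parent. The constructor arguments and the returned values are assumptions for illustration only; the class names and loadFeatureMaps() are the only things taken from the description above.

```python
from abc import ABC, abstractmethod


class GenericLoader(ABC):
    """Shared parent; the constructor signature here is hypothetical."""

    def __init__(self, rsltsFile, dataFiles, libraryFile=None):
        self.rsltsFile = rsltsFile
        self.dataFiles = list(dataFiles)
        self.libraryFile = libraryFile

    @abstractmethod
    def loadFeatureMaps(self, peptide, charge):
        """Return one entry per data file for the given precursor."""


class mzMLDataLoader(GenericLoader):
    def loadFeatureMaps(self, peptide, charge):
        # Placeholder: the real loader would run targeted extraction here.
        return {f: ("mzML", peptide, charge) for f in self.dataFiles}


class SqMassDataLoader(GenericLoader):
    def loadFeatureMaps(self, peptide, charge):
        # Placeholder: the real loader would query the sqMass file here.
        return {f: ("sqMass", peptide, charge) for f in self.dataFiles}
```

With this shape, calling code can hold either loader and call loadFeatureMaps() without caring about the file type behind it.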

Server methods will have to be adjusted based on this, which I am currently working on.

- GenericResultsAccess - abstract class outlining the methods that results files should implement. Current children are OSWDataAccess and ResultsTSVDataAccess.
- ResultsTSVDataAccess - implementation of GenericResultsAccess for DIA-NN .tsv files
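
As a rough illustration of that split, the sketch below defines the abstract class and one TSV-backed child. The method name getTopFeature, the open_handle argument, and the use of Q.Value to pick the best row are all assumptions made for this example; the column names are assumed to follow the usual DIA-NN report layout.

```python
import csv
import io
from abc import ABC, abstractmethod


class GenericResultsAccess(ABC):
    @abstractmethod
    def getTopFeature(self, peptide, charge):
        """Return the best-scoring result row for a precursor, or None."""


class ResultsTSVDataAccess(GenericResultsAccess):
    def __init__(self, open_handle):
        # open_handle: zero-argument callable returning a text stream,
        # e.g. lambda: open("report.tsv", newline="")
        self._open = open_handle

    def getTopFeature(self, peptide, charge):
        best = None
        with self._open() as handle:
            # Stream row by row instead of reading the whole file at once.
            for row in csv.DictReader(handle, delimiter="\t"):
                if row["Modified.Sequence"] != peptide:
                    continue
                if int(row["Precursor.Charge"]) != charge:
                    continue
                if best is None or float(row["Q.Value"]) < float(best["Q.Value"]):
                    best = row
        return best


# Tiny in-memory report standing in for a real file.
EXAMPLE_TSV = (
    "Modified.Sequence\tPrecursor.Charge\tRT\tQ.Value\n"
    "PEPTIDEK\t2\t20.5\t0.01\n"
    "PEPTIDEK\t2\t21.0\t0.001\n"
    "OTHERK\t3\t5.0\t0.0001\n"
)
access = ResultsTSVDataAccess(lambda: io.StringIO(EXAMPLE_TSV))
top = access.getTopFeature("PEPTIDEK", 2)
```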

jcharkow · 2023-12-21T16:11:41Z

@singjc I know that this is still in progress but can you please have a quick look (even just at the description) to let me know if this refactoring sounds ok? E.g. not going to screw everything up?

singjc · 2023-12-21T17:13:16Z

@jcharkow Looks fine / makes sense. I renamed the reportLoader already in the main branch, so there may be some conflict there. I just pulled in the most recent changes from the main feature/rawdata branch, but there are some conflicts with some of the changes I was working on. I will fix those and then probably leave it for you to work on the rest of the refactoring.

I did update the oswDataAccess for the get_top_rank_feature methods to use a feature hash table to only index on indices.
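
For readers following along, one way to picture a feature hash table is a dictionary built once that maps each precursor to the row indices of its features, so later lookups touch only those rows. This is only a guess at the general idea, not the code in oswDataAccess; the field names and function signatures are hypothetical.

```python
from collections import defaultdict

# Hypothetical feature rows; in practice these would come from the OSW file.
features = [
    {"peptide": "PEPTIDEK", "charge": 2, "rank": 2, "rt": 1210.5},
    {"peptide": "PEPTIDEK", "charge": 2, "rank": 1, "rt": 1198.2},
    {"peptide": "OTHERK", "charge": 3, "rank": 1, "rt": 640.0},
]


def build_feature_index(rows):
    """Map (peptide, charge) to the positions of the matching rows."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        index[(row["peptide"], row["charge"])].append(i)
    return index


def get_top_rank_feature(rows, index, peptide, charge):
    """Return the lowest-rank row for a precursor, or None if absent."""
    candidates = index.get((peptide, charge), [])
    if not candidates:
        return None
    return rows[min(candidates, key=lambda i: rows[i]["rank"])]


feature_index = build_feature_index(features)
```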

jcharkow · 2023-12-21T17:17:45Z

Thanks for looking it over and letting me refactor your code. I'm sorry that it seems we are getting a lot of conflicts but hopefully, this leads to a more unified interface overall :)

Restructured the check for whether the annotation column needs to be generated. The previous structure would pass the check for the annotation column being present, but would then fail the inner if statement that checks for NULL or NA values in the annotation column. This resulted in the stmt variable never being defined.
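
A minimal sketch of a flattened check, where stmt is assigned on every path, could look like the following. The table and column names (TRANSITION, TYPE, ORDINAL, CHARGE, ANNOTATION) and the generated annotation format are assumptions for illustration and may not match the actual query.

```python
import sqlite3


def annotation_select(conn):
    """Build a SELECT that always yields an ANNOTATION column."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(TRANSITION)")]

    usable = False
    if "ANNOTATION" in columns:
        missing = conn.execute(
            "SELECT COUNT(*) FROM TRANSITION "
            "WHERE ANNOTATION IS NULL OR ANNOTATION IN ('NA', '')"
        ).fetchone()[0]
        usable = missing == 0

    # Single if/else, so stmt is defined whichever branch runs.
    if usable:
        stmt = "SELECT ID, ANNOTATION FROM TRANSITION"
    else:
        stmt = (
            "SELECT ID, TYPE || ORDINAL || '^' || CHARGE AS ANNOTATION "
            "FROM TRANSITION"
        )
    return stmt
```

Collapsing the two nested conditions into one boolean is the design point here: the column being present but unusable falls into the same branch as the column being absent.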

SpectralLibraryLoader is meant to be the main class for loading transition files. It uses the individual access class to retrieve the data based on file type, then returns the data as a dataframe and stores it as the `data` attribute of the SpectralLibraryLoader class. This avoids instances of `data.data.pd.DataFrame` and avoids issues when caching data in streamlit. On first execution, the `data.data` attribute contains the Df from the access class. However, after caching, `data.data` no longer represents the access class, but the actual retrieved pd.DataFrame.
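
The pattern can be sketched without pandas or streamlit: the loader asks the access class for the table once and keeps only the table. In this sketch a list of row dicts stands in for the pd.DataFrame, and TransitionTSVAccess, load() and transitions_for() are hypothetical names.

```python
class TransitionTSVAccess:
    """Hypothetical low-level access class; rows stand in for a parsed file."""

    def __init__(self, rows):
        self._rows = rows

    def load(self):
        # Return a plain table (copy), not a reference to the access object.
        return [dict(row) for row in self._rows]


class SpectralLibraryLoader:
    def __init__(self, access):
        # Store the retrieved table itself, so `loader.data` has the same
        # type on the first run and on any later cached run.
        self.data = access.load()

    def transitions_for(self, peptide):
        return [row for row in self.data if row["peptide"] == peptide]


library = SpectralLibraryLoader(
    TransitionTSVAccess(
        [
            {"peptide": "PEPTIDEK", "product_mz": 504.3},
            {"peptide": "PEPTIDEK", "product_mz": 617.4},
            {"peptide": "OTHERK", "product_mz": 389.2},
        ]
    )
)
```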

jcharkow added 10 commits December 20, 2023 12:13

Rename mzMLLoader to mzMLDataAccess

2584010

given class functionality mzMLDataAccess is a more appropriate name

Merge branch 'patch/masterRefactor' into patch/josh_rawdata_2

ae3fcce

Reimplement ResultsTSV data access

64e31e0

results TSV data access does not load all memory as per Justin's implementation

create mzMLDataLoader file

d7a098d

loosely based off reportLoader, similar to SqMassLoader but for mzMLFiles

Move TargetedDIAConfig to own structure

69bdf9d

Combine TargetedDIADataAccess and mzmlDataAccess

69ca890

Move TransitionPQP/TSV loader to access

8c2684a

these are more low level loaders so should be access, high level implementation is SpectralLibraryLoader

[REFACTOR] mzMLLoader

1dbb564

mzMLLoader links a spectral library, results file and mzMLfile to do on the fly extraction of a given peptide and charge. Note that the peptide must be found in the experiment to know where the feature is.

Remove old files

c14bca6

jcharkow requested a review from singjc December 21, 2023 16:10

initial refactoring of rawDataAnalysisServer

9d6f7d6

refactor for usage with new mzMLDataLoader interface

jcharkow added 15 commits December 21, 2023 16:54

Initial attempt adapt rawTargetedExtractionServer

a8ec8e9

Merge branch 'feature/rawdata' of github.com:Roestlab/massseer into p…

934a8c5

…atch/josh_rawdata_2

Merge branch 'patch/masterRefactor' into patch/josh_rawdata_2

79950b0

Merge branch 'feature/rawdata' into patch/josh_rawdata_2

f899c32

minor code linting

9977378

more minor changes

148a452

Merge branch 'dev' into patch/josh_rawdata_2

6c1c2ae

Merge branch 'dev' into patch/josh_rawdata_2

14eb130

Merge branch 'dev' into patch/josh_rawdata_2

f5cf52d

fix bugs related to refactoring

ee120ab

specifically related to loading DIA-NN file with mzMLLoader

[SPEEDUP] mzML data access

323ebca

now only load the full data if required, this should speed up the mzML data access a bit?
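
One common way to defer an expensive load until it is needed is a cached property, sketched below. This is not necessarily how the commit does it; LazyMzMLAccess, read_full and full_data are hypothetical names, and the reader function is a stand-in for the real mzML parsing.

```python
from functools import cached_property


class LazyMzMLAccess:
    def __init__(self, path, read_full):
        self.path = path
        # read_full: callable that does the expensive full load for a path.
        self._read_full = read_full

    @cached_property
    def full_data(self):
        # Runs on first access only; the result is stored on the instance.
        return self._read_full(self.path)


load_calls = []


def fake_reader(path):
    """Stand-in for an expensive parse; records each call."""
    load_calls.append(path)
    return {"path": path, "spectra": [1, 2, 3]}


lazy_access = LazyMzMLAccess("run1.mzML", fake_reader)
```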

[FEATURE][BUGFIX] explore to featureMap

6ab3fee

fix error in everything being ms1 output and no RT. Assign the closest annotations

allow indexing in transitionTSV data access

c566457

update 2d plotter for more pythonic interface

fad6ed3

class saves the plot config, feature map is provided per instance. FeatureMaps saved in two dimension server object

[FIX] loading IM files from OSW

2c9ae07

jcharkow and others added 27 commits December 31, 2023 16:11

[FIX] check annotation column

270ff4d

fix crash when the annotation column is not present. The annotation column is now generated if it is missing or NA

[FIX] getting items in TransitionPQPLoader

8930f22

apply same fix as TransitionTSVLoader that cannot access TransitionPQPLoader dataframe from the SpectrumLibraryLoader class

[FIX] clean up TransitionGroupFeature access methods

63aaa18

rename columns in returned dataframe

[TEST] update ionMobilityTest2 to contain IM data

d803920

[FIX] consensusApexIntensity not being recorded

318c09a

[FIX] use featureIM rather than libIM in extraction

cf1980c

[FIX] config object required by loadTransitionGroups

72066bf

Also minor documentation code linting

[FEATURE] implement loadTransitionGroupDf for mzML loader

dca34f0

[FIX] SqMassLoader errors

47e3019

implement loadTopTransitionGroupFeatureDf and fix __init__ function

[FIX] Add check for when no spectra is found, and throw a warning

2bc3a08

[ADD] UI message to inform of failed extraction

0cff531

[ADD] method to update st.write

7317ad2

[ADD] session_state variables to keep track of interactive changes

6957773

[UPDATE] caching resources

b9cf0f9

Added caching where necessary, and added checks to see if caches need to be cleared on new input interaction. Changed time_block to MeasureBlock for performance metrics.
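
For context, a timing helper of this kind is usually a small context manager. The sketch below shows the general shape only; the attributes and constructor of the real MeasureBlock are not shown in this PR, so everything other than the class name is an assumption.

```python
import time


class MeasureBlock:
    """Time the body of a `with` block (hypothetical minimal version)."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None  # seconds, set when the block exits

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        return False  # never swallow exceptions from the block


with MeasureBlock("sum of range") as block:
    total = sum(range(1000))
```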

[MINOR] lint

663a944

[REMOVE] st.write debugging statement

42d7005

[CHANGE] default plot smoothing to none instead of sgolay

7c0eb21

Update plotting for jupyter notebook compatibility

3d5cdfb

add show() methods; add context property to config

Add plotting directly from TransitionGroup

cbee08e

address circular import

eb5bb5a

ensure other 3D plots work in jupyter context

7a28007

[FIX] bug of m/z heatmap

147501c

m/z heatmap with fragments split up had wrong axis, fix this

change plot sizes

e29b5c3

allow for 2 columns on multiplots

adjust ratio surface plot

a10a206

Merge pull request #63 from Roestlab/patch/notebookPlotting

4cfab95

Patch/notebook plotting

singjc merged commit c1ca78e into feature/rawdata Jan 4, 2024

singjc deleted the patch/josh_rawdata_2 branch January 7, 2024 05:58

singjc mentioned this pull request Jan 8, 2024

Feature: Raw Data Targeted Extraction #14

Closed
