Josh's Suggestions for rawData refactoring #56
Given the class's functionality, mzMLDataAccess is a more appropriate name.
GenericResultsAccess - abstract class outlining the methods that results-file classes should implement; current children are OSWDataAccess and ResultsTSVDataAccess.
ResultsTSVDataAccess - implementation of GenericResultsAccess for DIA-NN .tsv files.
results TSV data access does not load all data into memory, as per Justin's implementation
loosely based off reportLoader, similar to SqMassLoader but for mzML files
these are lower-level loaders, so they should be named "Access"; the high-level implementation is SpectralLibraryLoader
mzMLLoader links a spectral library, a results file and an mzML file to do on-the-fly extraction of a given peptide and charge. Note that the peptide must be found in the experiment to know where the feature is.
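The abstract-class layout described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the method name `get_feature_rt` and the column names `Modified.Sequence`, `Precursor.Charge` and `RT` are assumptions chosen for the example, and the real classes likely expose more methods.

```python
import csv
from abc import ABC, abstractmethod


class GenericResultsAccess(ABC):
    """Abstract interface that every results-file reader implements."""

    def __init__(self, filename):
        self.filename = filename

    @abstractmethod
    def get_feature_rt(self, peptide, charge):
        """Return the retention time of the matching feature, or None.

        Hypothetical method name, used only for this sketch.
        """


class ResultsTSVDataAccess(GenericResultsAccess):
    """Reads a tab-separated report row by row instead of loading it whole."""

    def get_feature_rt(self, peptide, charge):
        with open(self.filename, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                # Column names are assumed for illustration.
                same_peptide = row["Modified.Sequence"] == peptide
                same_charge = int(row["Precursor.Charge"]) == charge
                if same_peptide and same_charge:
                    return float(row["RT"])
        return None
```

Because the file is streamed, memory use stays flat regardless of report size, at the cost of rescanning the file on each lookup.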
@singjc I know that this is still in progress but can you please have a quick look (even just at the description) to let me know if this refactoring sounds ok? E.g. not going to screw everything up?
refactor for usage with new mzMLDataLoader interface
@jcharkow Looks fine / makes sense. I renamed the reportLoader already in the main branch, so there may be some conflict there. I just pulled in the most recent changes from the main feature/rawdata branch, but there are some conflicts with some of the changes I was working on. I will fix those and then probably leave it for you to work on the rest of the refactoring. I did update the oswDataAccess for the get_top_rank_feature methods to use a feature hash table to only index on indices.
Thanks for looking it over and letting me refactor your code. I'm sorry that it seems we are getting a lot of conflicts, but hopefully this leads to a more unified interface overall :)
…atch/josh_rawdata_2
specifically related to loading DIA-NN file with mzMLLoader
now only load the full data if required, this should speed up the mzML data access a bit?
fix error where everything was output as MS1 with no RT. Assign the closest annotations
class saves the plot config; the feature map is provided per instance. FeatureMaps are saved in a two-dimensional server object
fix crash when the annotation column is not present. Now generates the annotation column if it is not present or is NA
apply the same fix as for TransitionTSVLoader: the TransitionPQPLoader dataframe could not be accessed from the SpectrumLibraryLoader class
rename columns in returned dataframe
Also minor documentation and code linting
implement loadTopTransitionGroupFeatureDf and fix __init__ function
restructured the check for whether the annotation column needs to be generated. The previous structure would pass the check for the annotation column being present, but would fail the inner if statement checking for NULL or NA assignment in the annotation column. This resulted in the stmt variable never being defined.
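The failure mode in that commit message follows a common pattern, sketched below under assumptions. The function names, flags and statement strings are hypothetical; only the shape of the nested check and the undefined `stmt` come from the description above.

```python
GENERATE = "generate annotation column"
SELECT = "select existing annotation column"


def build_stmt_old(has_column, is_null_or_na):
    """Nested check: one path never assigns stmt."""
    if has_column:
        if is_null_or_na:
            stmt = GENERATE
        # No else branch: a present, populated column leaves stmt unassigned.
    else:
        stmt = GENERATE
    return stmt


def build_stmt_new(has_column, is_null_or_na):
    """Flattened check: decide once, then assign on every path."""
    needs_generation = (not has_column) or is_null_or_na
    return GENERATE if needs_generation else SELECT
```

Computing a single `needs_generation` flag first means every combination of inputs reaches an assignment, so the unbound-variable path cannot reappear.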
SpectralLibraryLoader is meant to be the main class for loading transition files. Use the individual access class to retrieve the data based on file type, then return the data as a dataframe and store that as the `data` attribute in the SpectralLibraryLoader class. This avoids instances of `data.data.pd.DataFrame` and avoids issues when caching data in streamlit. On first execution, the `data.data` attribute contains the Df from the access class. However, after caching, `data.data` no longer represents the access class, but the actual retrieved pd.DataFrame.
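A rough sketch of that design, with loud caveats: the class names carry a `Sketch` suffix because they are hypothetical, and a list of dicts stands in for the pd.DataFrame so the example runs without pandas. The point is only that the loader keeps the retrieved table itself, not the access object.

```python
class TransitionAccessSketch:
    """Hypothetical file-type-specific access class."""

    def __init__(self, rows):
        self._rows = rows

    def load(self):
        # Return a fresh table (stand-in for a pd.DataFrame).
        return [dict(row) for row in self._rows]


class SpectralLibraryLoaderSketch:
    """Hypothetical high-level loader that stores the table directly."""

    def __init__(self, access):
        # Store the retrieved table, not the access object, so that
        # `loader.data` has the same type before and after caching.
        self.data = access.load()


access = TransitionAccessSketch([{"Annotation": "y4^1", "ProductMz": 500.25}])
loader = SpectralLibraryLoaderSketch(access)
```

With this layout, callers read `loader.data` directly and never need to reach through a nested `data.data` attribute.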
Added caching where necessary, and added checks to see if caches need to be cleared on new input interaction. Changed time_block to MeasureBlock for performance metrics
add show() methods; add context property to config
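For readers unfamiliar with timing blocks, a context manager along these lines is one way such a helper can work. This is a generic sketch named `MeasureBlockSketch`; the project's actual MeasureBlock may have a different signature and may record more than elapsed time.

```python
import time


class MeasureBlockSketch:
    """Hypothetical timing helper: records wall-clock time of a with-block."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None  # seconds, set when the block exits

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        return False  # do not swallow exceptions raised inside the block


with MeasureBlockSketch("sum") as block:
    total = sum(range(1000))
```

After the block exits, `block.elapsed` holds the duration in seconds and can be logged or displayed alongside `block.label`.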
m/z heatmap with fragments split up had wrong axis, fix this
allow for 2 columns on multiplots
Patch/notebook plotting
Here is my attempt at refactoring the rawData a bit. The main idea of this refactor is to make the loader interface more consistent between SqMassLoader and the raw data extraction for easier Python usage.
Note: This has not been tested and is likely very buggy, but I wanted to start the PR early so everyone is up to date.
Major Changes include:
- e.g. this is direct pyopenms methods and SQL queries
Server methods will have to be adjusted based on this, which I am currently working on.