@enryH enryH commented Apr 11, 2023

  • increase interoperability by moving from pkl default to csv (implemented before, but now set as a default to integrate NAguideR)
  • create empty data folder (done by snakemake, but ensure it runs by notebook execution)
  • remove some obsolete imports

Henry added 2 commits April 11, 2023 17:49
- increase interoperability by moving from pkl default to csv
  (implemented before, but now set as a default to integrate NAguideR)
- create empty data folder (done by snakemake,
   but ensure it runs by notebook execution)
- remove some obsolete imports
- add model specific config ("train_{model}.yaml")
- add scikit-learn KNNImputer to comparison
- 🐛 set data splits format to csv (-> for R based methods to work)
- 🎨 remove unused imports
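
For readers unfamiliar with the scikit-learn imputer added to the comparison above, here is a minimal sketch of how `KNNImputer` fills missing values. The toy data, the feature names and `n_neighbors=2` are assumptions for illustration and are not taken from the workflow configuration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy intensity matrix: rows are samples, columns are features.
# Names and values are hypothetical, not from the project data.
data = pd.DataFrame(
    {
        "prot_A": [20.0, 21.0, np.nan, 23.0],
        "prot_B": [15.0, np.nan, 17.0, 18.0],
        "prot_C": [30.0, 31.0, 32.0, 33.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Each missing entry is replaced by the mean of that feature over the
# n_neighbors most similar samples (NaN-aware Euclidean distance).
imputer = KNNImputer(n_neighbors=2)  # n_neighbors chosen arbitrarily here
imputed = pd.DataFrame(
    imputer.fit_transform(data), index=data.index, columns=data.columns
)

# Observed values are left untouched; only the gaps are filled.
observed = data.notna().to_numpy()
```

Because the imputer works on a samples-by-features matrix, it fits naturally with the wide CSV data splits introduced in this PR.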
@enryH enryH force-pushed the extend_comparison branch 2 times, most recently from c6c115f to 579a9ea Compare April 13, 2023 13:19
- after experimenting a lot with R over the last week, add some NAGuideR methods
   -> installation issues still to be fixed for some others
- Basic R integration:
  - transfer data to format expected by NAGuideR (01_0_transform_data_to_wide_format.ipynb)
  - run selected R methods based on function provided in NAGuideR shiny app
    (01_1_train_NAGuideR_methods.ipynb)
  - transfer results back into format expected by workflow
     (01_1_transfer_NAGuideR_pred.ipynb)
- workflow updated:
  - base intermediate results on csv format
  - query for predictions
  - add R methods to config
- environment: try to install some packages using conda
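
The data hand-over to R described above (long format to wide format, CSV on disk, then back) can be sketched as follows. This is only an illustration under assumptions: the index names `Sample ID` and `protein group`, the toy values and the in-memory buffer standing in for a CSV file are hypothetical, and the exact layout NAGuideR expects is not shown in this PR.

```python
import io

import pandas as pd

# Long format: one intensity per (sample, feature) pair.
# Index names and values are hypothetical.
long_format = pd.DataFrame(
    {
        "Sample ID": ["s1", "s1", "s2", "s2", "s3"],
        "protein group": ["A", "B", "A", "B", "A"],
        "intensity": [20.1, 15.2, 21.3, 16.0, 19.8],
    }
).set_index(["Sample ID", "protein group"])["intensity"]

# Wide format: samples as rows, features as columns.
# Combinations that were never measured become NaN.
wide = long_format.unstack("protein group")

# Round trip through CSV (a buffer is used here instead of a file on disk).
buffer = io.StringIO()
wide.to_csv(buffer)
buffer.seek(0)
wide_from_csv = pd.read_csv(buffer, index_col="Sample ID")

# Back to long format, dropping entries that are still missing.
long_again = wide_from_csv.stack().dropna()
```

Unlike a pickle, the CSV can be read by R directly, which is the interoperability gain this PR is after. Note that the column-axis name (`protein group`) is not preserved by the CSV round trip.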
@enryH enryH force-pushed the extend_comparison branch 8 times, most recently from d72f317 to 7e0a753 Compare April 14, 2023 08:53
- conda-forge has to be default channel (first), otherwise r packages
  might not work: conda-forge/r-stringi-feedstock#13
  - restrict pandas to version 1
- set channel priority to strict
- check some r imports
@enryH enryH force-pushed the extend_comparison branch from 7e0a753 to 19a5373 Compare April 14, 2023 09:39
Henry added 15 commits April 14, 2023 16:33
- ✨ add NAGuideR methods which need local installation
- 🐛 fix Snakefile to consider all models in comparison
- 🎨 adapt plot (legends) or notebook descriptions
Some packages are not available via conda. Install manually
  (as before with locally provided packages)
- impseq, impseqrob -> rrcovNA
- qrilc, mindet, minprob
- single train config per model
- added some NAGuideR methods (some failed)
- collect runtime of entire nb using benchmark directive from snakemake
  (comparison distorted as nbs partly extend the analysis)
- set patience as input parameter for early stopping
- collect data dumps and figures saved to disk
- save more data used for figures to disk
- only plot performance annotation on bar plots if there is a bar

-> prepare for manual aggregation of best models plots
- pick best models as defined in default workflow
- needed to change some data handling
- repeat splitting the data with different seeds
  (so not fold splitting, but random splitting)
- added Snakefile for that
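
To make the difference between fold splitting and repeated random splitting concrete, here is a minimal sketch. The function name `random_split`, the split fractions and the seeds are assumptions for illustration and do not reflect the actual data splitting notebook.

```python
import numpy as np


def random_split(n_items, frac_val=0.1, frac_test=0.1, seed=0):
    """Randomly assign item positions to train, validation and test sets.

    Hypothetical helper: fractions and naming are illustrative only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)
    n_val = int(round(n_items * frac_val))
    n_test = int(round(n_items * frac_test))
    return {
        "val": idx[:n_val],
        "test": idx[n_val:n_val + n_test],
        "train": idx[n_val + n_test:],
    }


# One independent split per seed. Unlike k-fold splitting, the test sets of
# different repeats may overlap, and an item may never end up in a test set.
splits = {seed: random_split(100, seed=seed) for seed in range(3)}
```

Fixing the seed per repeat keeps each split reproducible, so a workflow run can be repeated per seed and the metrics aggregated afterwards.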
- move collecting of metrics to separate notebook (easier to debug)
- permute protein data per feature
  -> best model is now the median model
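
The permutation check above can be sketched as below. Shuffling each feature independently keeps every feature's distribution (and thus its median) intact but destroys the relationships between features within a sample, which is presumably why no model can beat a per-feature median on such data. The helper name `permute_per_feature` and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def permute_per_feature(df, seed=0):
    """Shuffle the values of every column independently (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    permuted = {col: rng.permutation(df[col].to_numpy()) for col in df.columns}
    return pd.DataFrame(permuted, index=df.index)


# Toy data with two strongly correlated features (values are made up).
rng = np.random.default_rng(42)
base = rng.normal(loc=25.0, scale=2.0, size=200)
data = pd.DataFrame(
    {
        "prot_A": base,
        "prot_B": base + rng.normal(scale=0.1, size=200),
    }
)

permuted = permute_per_feature(data, seed=1)

corr_before = data["prot_A"].corr(data["prot_B"])
corr_after = permuted["prot_A"].corr(permuted["prot_B"])
```

In this toy case the per-feature medians are unchanged, while the correlation between the two features drops from close to one to roughly zero.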
- in order to pass "None" as argument papermill needs the interpreted
  parameter for the metadata ("-p")
- for both peptides and evidence less than 50 samples are retained
   (-> lower threshold of sample completeness?)
- datasets are the different levels: protein group, peptide and precursor
- change legend of binned errors plot
- for peptides and precursors -> allow fewer feat. per sample
  -> retain all 50 samples
  -> see second step Fig. S1
- actually give a better impression of figure size (dpi setting)
- set fonts explicitly for Fig. 2
- add meta data to figures: N samples, M feat.
Idea will be to run it for each dataset for diagnosis
- create data visualization plots
- group plotting functions
- make baseline model choice available
- change params and outputfolder structure
- diff analysis should be run one by one,
  then aggregation is in 10_3_ald_compare_methods.py
So far done for features that were shared between the approaches and
that had competing outcomes in the diff. analysis.
- bin features by median (by median's integer values)
- plot prop. missing for each feature in bin
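
The binning described above can be sketched roughly as follows. The sketch assumes that "integer value of the median" means the floor of each feature's median intensity and that the proportion missing is computed per feature over samples; the toy data and names are hypothetical, and the plotting itself is left out.

```python
import numpy as np
import pandas as pd

# Toy wide-format data: samples as rows, features as columns (made-up values).
data = pd.DataFrame(
    {
        "f1": [20.2, 20.8, np.nan, 20.5],
        "f2": [20.9, np.nan, np.nan, 20.1],
        "f3": [25.1, 25.3, 25.2, 25.4],
    }
)

# Median intensity per feature (NaN values are skipped by default).
feat_median = data.median()

# Assumption: the bin is the integer part (floor) of the feature median.
bins = np.floor(feat_median).astype(int)

# Proportion of missing values per feature.
prop_missing = data.isna().mean()

summary = pd.DataFrame({"bin": bins, "prop_missing": prop_missing})

# One row per feature can be plotted per bin (e.g. as a box plot);
# here only the mean proportion per bin is aggregated.
by_bin = summary.groupby("bin")["prop_missing"].mean()
```

With such a table, lower-intensity bins can be compared against higher-intensity bins to see whether missingness depends on abundance.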
Henry added 10 commits May 5, 2023 16:55
- different kind of data visualizations
- default: example data provided with the package,
  - two configs for two analyzed datasets
- structure and format notebook
- add clustermaps of hierarchical clustering (seaborn)
- add heatmaps (based on hierarchical clustering)

-> todo: unify plotting layout (check defaults which are set,
  rather pick up from rc)
 (no imputation added) - 10_1
   Aggregation of scores removed, run one by one
- compare two imputation choices against
   each other (10_2), aggregate there
- add # samples linear regression is based on
  to differential analysis
- dump results of logistic regression
   -> allow custom plots later

Having a different setup in the ALD study
than others is still a bit tricky. Next is to
generate visualizations for intensities (add more models)
- Add more methods to histogram and swarmplot of
  measured values vs imputed values

- next: Update workflow, limit models to requested?
- adapt workflow
- and configs

to changes in nbs setup

Next is to add the last notebook.
- for now used for ALD data, but should be
  relatively general already to be used with other data
- new rule added for last notebook
  (10_4_ald_compare_single_pg)
- small adaptations performed
- rules ordered by notebook number
- add optional feature annotations on scores (10_1_ald_diff_analysis.ipynb)
  - remove annotations part from 10_2_ald_compare_methods.ipynb
- dump qvalues and rejection of null hypothesis (equality)
   for further analysis in 10_4_ald_compare_single_pg.ipynb
- add new outputs to workflow(s)
- use snakemake to execute snakemake
- aggregate results of repeated workflow runs
- option to not have only a few labels shown in
   case of high-dimensional data
- other customizations
…nd_comparison

- remove conflict in data_splitting notebook
  (removed comments of other configurations)
@enryH enryH merged commit d8a1f0f into dev May 17, 2023
@enryH enryH deleted the extend_comparison branch May 18, 2023 12:40