Pride submission prep #43

enryH · 2023-03-09T10:49:32Z

Pride data release preparation.

pride submission scripts -> 00_0_*.ipynb
update erda (FTP server) notebooks, build dumps for pride of 7,444 selected files -> erda_*.ipynb
add python version of notebooks for better diffs in the future

- save some intermediate files
- visualize some stats

- prepare upload to pride (with unified and anonymized identifiers)

- data curation
- allow selection of raw data

- only load intensities
- transpose and create mask view in separate document
- dump counts for samples and features
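A minimal sketch of the counting step described above, assuming a pandas table with features as rows and samples as columns. The toy data and all variable names are hypothetical and not taken from the notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: rows are features, columns are samples.
intensities = pd.DataFrame(
    {"sample_a": [1.0, np.nan, 3.0], "sample_b": [np.nan, np.nan, 2.0]},
    index=["feat_1", "feat_2", "feat_3"],
)

wide = intensities.T  # transpose: samples as rows, features as columns
mask = wide.notna()   # True where an intensity was observed

# Number of observed features per sample, and samples per feature.
counts_per_sample = mask.sum(axis=1)
counts_per_feature = mask.sum(axis=0)
```

Both count series could then be written out with `to_csv` or `to_json`, depending on which dump format is preferred.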

- create machine specific subfolders for pride
- instrument_name added for subfolders

- don't support long data for now
- skip categorical data checking; keep old code as comments for now (as a reminder)

- was at some point used to investigate which data to use

- notebook is for exploration of single MaxQuant folder

- erda notebooks create dumps which are then processed in "hela" notebooks
- rename and describe

- create folder and put commands for raw files
- use -f for using commands read from a file with lftp
- start uploading
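As an illustration of the command-file approach, the following sketch writes one `mkdir -p` per instrument subfolder and one `put` per raw file. The function name, the file names and the instrument names are hypothetical; only the lftp commands `mkdir -p` and `put ... -o` and the `lftp -f` option are assumed from lftp itself. Paths containing spaces would need quoting, which this sketch does not handle.

```python
from pathlib import PurePosixPath


def build_lftp_commands(files):
    """Return lftp command lines for (local_path, instrument_name) pairs."""
    lines = []
    seen_folders = set()
    for local_path, instrument_name in files:
        # Create each instrument subfolder once, before its first upload.
        if instrument_name not in seen_folders:
            lines.append(f"mkdir -p {instrument_name}")
            seen_folders.add(instrument_name)
        remote_path = PurePosixPath(instrument_name) / PurePosixPath(local_path).name
        lines.append(f"put {local_path} -o {remote_path}")
    return lines


# Hypothetical example input.
example_files = [
    ("raw/run_001.raw", "instrument_1"),
    ("raw/run_002.raw", "instrument_1"),
    ("raw/run_003.raw", "instrument_2"),
]
commands = build_lftp_commands(example_files)
```

Writing `"\n".join(commands)` to a text file gives a script that could be run with `lftp -f <file>` after the connection commands have been added.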

- sanity checks and upload missing or incomplete files
- rename sample names in MQ output
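One way such a sanity check could look, assuming file sizes are used to detect incomplete transfers (the notebooks may use a different criterion). The function and the example mappings of file name to size in bytes are hypothetical.

```python
def find_files_to_upload(expected_sizes, remote_sizes):
    """Compare local and remote {file name: size} mappings.

    Returns (missing, incomplete): names absent on the server, and names
    present on the server but with a different size than expected.
    """
    missing = sorted(name for name in expected_sizes if name not in remote_sizes)
    incomplete = sorted(
        name
        for name, size in expected_sizes.items()
        if name in remote_sizes and remote_sizes[name] != size
    )
    return missing, incomplete


# Hypothetical example: one file missing, one only partially transferred.
expected = {"run_001.raw": 100, "run_002.raw": 200, "run_003.raw": 300}
remote = {"run_001.raw": 100, "run_002.raw": 150}
missing, incomplete = find_files_to_upload(expected, remote)
```

The two lists together would be the files to upload again.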

technology type -> indicates that it is not RNA (MAGE-TAB format)

- all 7444 selected files for upload are used to create unified dumps
- small plotting improvements and minor other changes

- check all files are in list of files (queried from server)
- create some dummy files (placeholders) locally for pride submission tool
- manually annotate the submission.px text file from the submission tool (basically: add files)
- 🐛 SDRF file had ontology issues (and cellline template was not enough)

- plots based on metadata
- metadata is provided on pride ("pride_metadata.csv")

Split metadata creation from analysis

- relevant information of mq_summaries.csv also provided in metadata_pride.csv

Henry added 25 commits March 9, 2023 10:35

🎨 sync all erda notebooks

1edda16

🎨 sync and slightly update meta data nbs

55c8d48

- save some intermediate files
- visualize some stats

✨ new id and sftp commands

bfdca02

- prepare upload to pride (with unified and anonymized identifiers)

✨ Merge and analyze metadata for report

a19174f

- data curation
- allow selection of raw data

🎨🚧 clean-up and multiprocessing

903ddee

🎨 clean-up

5a0c4ff

🐛 pass parameter

009a859

🎨 clean-up data aggregation

7070161

- only load intensities
- transpose and create mask view in separate document
- dump counts for samples and features

✨ lftp commands and renaming

bae4803

- create machine specific subfolders for pride
- instrument_name added for subfolders

🚧 long -> wide data

09a5f52

- don't support long data for now
- skip categorical data checking; keep old code as comments for now (as a reminder)

🔥 remove old notebook

3a10433

- was at some point used to investigate which data to use

🐛🎨 brief update [to clean-up!]

77b0ad3

- notebook is for exploration of single MaxQuant folder

🎨 labels and save file

4229ee2

🎨 rename and re-order hela dev data

5e7f8af

- erda notebooks create dumps which are then processed in "hela" notebooks
- rename and describe

🚧 update lftp commands

ac2d966

- create folder and put commands for raw files
- use -f for using commands read from a file with lftp
- start uploading

🐛 also create parents in case it's needed

db4cf1d

🎨 add log of current mirror command, remove parallel upload

8bccd2a

✨ check transferred files, rename samples in MQ output

6f6f164

- sanity checks and upload missing or incomplete files
- rename sample names in MQ output

✨ create SDRF file outline

602ff7c

🐛 add type

f7c6179

technology type -> indicates that it is not RNA (MAGE-TAB format)

🎨🚧 update dumps for pride upload

3f10223

- all 7444 selected files for upload are used to create unified dumps
- small plotting improvements and minor other changes

🎨 dump all MQ summaries for curated pride dataset

7d10dca

✨🎨 Separate metadata creation from analysis

d51162b

- plots based on metadata
- metadata is provided on pride ("pride_metadata.csv")

Split metadata creation from analysis

🎨 update to pride file - mq_summaries.csv

c89c769

- relevant information of mq_summaries.csv also provided in metadata_pride.csv

enryH changed the base branch from dev to extend_comparison May 16, 2023 16:39

enryH changed the base branch from extend_comparison to dev May 16, 2023 16:40

enryH changed the title ~~Pride prep~~ Pride submission prep May 16, 2023

enryH merged commit 85c06f6 into dev May 17, 2023

enryH deleted the pride_prep branch May 18, 2023 12:40
