Study Submission Process Refactor #753
hepcat72 changed the title from "Automatic stub creation of data submission" to "Study Submission Process refactor (aka, Automatic stub creation)" on Dec 28, 2023
hepcat72 changed the title from "Study Submission Process refactor (aka, Automatic stub creation)" to "Study Submission Process Refactor" on Dec 29, 2023
All items completed, changed, or transferred to separate issues.
formerly... "Automatic stub creation"
This issue has been converted to an issue-tracking issue. Requirements and design may be modified (and this issue's contents stale), so refer to the issues linked under dependencies for the latest design/requirements.
FEATURE REQUEST
Inspiration
I noted that issue #705 was similar to my submission process proposal, so it inspired me to codify my proposal in an issue.
Description
Based on my submission process proposal from March, which is compiled and annotated in my full proposal, I think we should use the version 3.0 effort as an opportunity to streamline the sheets in the Excel file, as I have in my (process) proposal:
New "Study" (e.g. Excel) doc
compounds.tsv
Much of the above will be pre-populated (see discussion) by the validation interface. All that the validation interface will require is:
Optional additional inputs for validation:
The process will go like this:
Alternatives
None
Dependencies
This is an issue-tracking issue for the following issues:
1. Make every table-based loader take either be a tab-delimited or excel file #820 (For requirements 1. & 2.) Excel/TSV input & access
2. Study Data Loader #821 (For requirements 9., 7.1., and 8.1.) Study Tab in excel doc
3. Tracer and infusates loader #822 (For requirements 6., 8.6., and 8.7.) Infusates/Tracers Tab in excel doc
4. Modify Compounds Loader #823 (For requirements 5.1.1., 5.2, and 8.8.) Compounds Tab in excel doc
5. Sequences Loader #824 (For requirements 8.11. and 8.12.) Sequences and Defaults Tabs in excel doc
6. Modify the accucor loader to use the study doc tabs instead of the LCMS metadata file #825 (For requirements 4., 8.9., and 8.10.) Peak Annotation/LCMS Tabs in excel doc
7. Modify Tissues Loader #826 (For requirements 5.1.2., 5.2, and 8.5.) Tissues Tab in excel doc
8. Modify protocols loader #827 (For requirements 5.1.3., 5.2, and 8.4.) Treatments Tab in excel doc
9. Break the animal sample loader into separate animal and samples loaders #828 (For requirements 7.2., 7.3., 8.2., and 8.3.) Animals and Samples tabs in excel doc
10. Modify load_study.py to account for the new excel sheets #839 Modify load_study.py to handle sheets in the study doc
11. Validation interface creates and pre-populates a study excel doc #829 (For requirement 3.) Validation interface auto-population of excel doc
Comment
[...] sample1_pos being the same as sample1). I created an example version of the Study Excel doc:
animal_sample_table.xlsx
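The comment above treats sample1_pos as the same sample as sample1, and requirement 3.1.2.1.1. mentions a heuristic for removing _scan and _charge suffixes. Here is a minimal sketch of what such a heuristic might look like. The suffix patterns (`_pos`, `_neg`, `_scan<digits>`) and the function names `guess_sample_name` and `group_headers_by_sample` are assumptions for illustration only; the actual patterns would need to be settled in the linked issues.

```python
import re

# ASSUMPTION: these suffix patterns are illustrative. The issue only says
# "_scan and _charge suffixes" and gives sample1_pos as an example.
SUFFIX_PATTERN = re.compile(r"(_pos|_neg|_scan\d+)+$", re.IGNORECASE)


def guess_sample_name(sample_header):
    """Strip trailing scan/charge suffixes from a peak annotation sample header."""
    stripped = SUFFIX_PATTERN.sub("", sample_header)
    # Fall back to the original header rather than return an empty name.
    return stripped or sample_header


def group_headers_by_sample(sample_headers):
    """Map each guessed sample name to the headers that reduce to it."""
    groups = {}
    for header in sample_headers:
        groups.setdefault(guess_sample_name(header), []).append(header)
    return groups
```

With these assumed patterns, `group_headers_by_sample(["sample1_pos", "sample1_neg", "sample2"])` puts both sample1 headers under the single name `sample1`, which is the kind of mapping the Peak Annotation Details sheet would then let a researcher review and correct.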
ISSUE OWNER SECTION
Assumptions
None
Limitations
None
Affected Components
accucor_data_loader.py
load_study.py
validation.py
Requirements
NOTE: These requirements are NOT final/complete. They were originally drafted in this issue, but then regrouped and fleshed out in the issues that were created to break up this issue (see the Dependencies section).
1. Every table-based input file can either be a tab-delimited or excel file
2. All excel tabs are accessed by name (not by index)
3. Validation interface modes
  3.1. Peak annotation only (with optional mzXML files) [Linked issue: Make load_study accept mzxml files (in addition to the study doc) #1086]
    3.1.1. Runs with only accucor/isocorr file(s) (currently, it requires the sample table file, I think)
    3.1.2. Generates a stubbed-out study doc with the following tabs' pre-populated fields (see 8. for all changed columns)
      3.1.2.1. Samples Pre-populated Columns (based on peak annotation file contents)
        3.1.2.1.1. Sample Name (a heuristic will be used to remove _scan and _charge suffixes)
      3.1.2.2. Treatments (optional - required if any are new) Pre-populated Columns
        3.1.2.2.1. Animal Treatment (based on Study doc, Animals tab, Treatment column contents)
        3.1.2.2.2. Description (based on database, empty/required if not in DB)
      3.1.2.3. Tissues (optional - required if any are new) Pre-populated Columns
        3.1.2.3.1. TraceBase Tissue Name (based on Study doc, Animals tab, Tissue column contents)
        3.1.2.3.2. Description (based on database, empty/required if not in DB)
      3.1.2.4. Infusates Pre-populated Columns
        3.1.2.4.1. Infusate Number (based on Study doc, Tracers tab contents)
        3.1.2.4.2. Tracer Group Name (if exists in the database)
        3.1.2.4.3. Infusate Name (based on Study doc, Infusates tab's Tracer Group Name and Tracer Name columns)
      3.1.2.5. Tracers Pre-populated Columns
        3.1.2.5.1. Tracer Name
      3.1.2.6. Compounds (optional - required if any are new) Pre-populated Columns
        3.1.2.6.1. Compound (based on peak annotation file contents)
        3.1.2.6.2. Formula (based on peak annotation file contents)
        3.1.2.6.3. HMDB ID (if exists in the database)
        3.1.2.6.4. Synonyms (if exists in the database)
      3.1.2.7. Peak Annotation Files Pre-populated Columns
        3.1.2.7.1. Peak Annotation File Name (based on peak annotation file names)
        3.1.2.7.2. Peak Annotation File Type (inferred from peak annotation header contents)
        [ ] 3.1.2.7.3. Sample Name Prefix (if not unique, uses study ID, if still not unique, uses animal ID, if still not unique, uses both. If not unique after that, it will keep both, but an error will prompt the user to manually change it.) [Note: Prefix is not necessary, given the Peak Annotation Details sheet explicitly maps sample to sample header]
      3.1.2.8. Peak Annotation Details Pre-populated Columns
        3.1.2.8.1. Sample Name (based on heuristically modified peak annotation file contents)
        3.1.2.8.2. Sample Data Header (based on peak annotation file contents)
        [ ] 3.1.2.8.3. mzXML File Name (based on peak annotation file contents and omitted if mzXML files supplied and no match) [Note: Decided not to autofill this. It could end up wrong. The default behavior would find the file anyway.]
        3.1.2.8.4. Peak Annotation File Name (based on peak annotation file name and sample header)
        [ ] 3.1.2.8.5. Polarity (based on mzXML file content - empty if no matching file) [Note: Polarity now only comes from mzXML files]
      3.1.2.9. Defaults (optional - required if any data is missing or generates errors/warnings, e.g. researcher name variation) Pre-populated Columns [Linked issue: Add the defaults sheet to the downloaded template #1099]
        [ ] 3.1.2.9.1. Researchers Confirmed (True if all are existing, empty/required if warnings/errors)
  3.2. Study doc only (with optional mzXML files) [Note: Cannot accept mzXMLs in the form. They're too big.]
  3.3. Full mode: Study doc and Peak annotation (with optional mzXML files)
  3.4. Fields in the stub that require manual entry should be highlighted [Linked issue: Color excel sheet cells that have errors with cell locations #1105]
  [ ] 3.5. Each pre-population action will be a separate method or a method that takes the tab name, column header, and row
4. The accucor data loader will
  4.1. Take the study doc instead of an LCMS Metadata file
    [ ] 4.1.1. Merge the Peak Annotation Details and Sequences sheet
    [ ] 4.1.2. Re-use the LCMS metadata processing code with the new merged sheets
  4.2. Use the defaults tab instead of command line options
5. The following loaders (in 5.1.) will take the study doc and meet the requirements under 5.2.
  5.1. Ancillary Data Loaders
    5.1.1. Compounds
    5.1.2. Tissues
    5.1.3. Treatments
  5.2. Ancillary Data Loading Requirements
    5.2.1 Take either a tab-delimited file or the Study excel file
    5.2.2. If no errors and not in validate mode, append rows to the consolidated data file (e.g. compounds.tsv)
    5.2.3. If no errors and not in validate mode, remove the tab from the study doc
6. New loader scripts
  6.1. Tracers Loader
  6.2. Infusates Loader
7. The animal sample loader will be broken up into
  7.1. A study data loader
  7.2. An animal loader
  7.3. A sample loader
8. New Study doc (augmenting the existing animals/samples table) with the following tabs
  8.1. Study Tab
    8.1.1. Add Columns
      [ ] 8.1.1.1. Study ID
      8.1.1.2. Name
      8.1.1.3. Description
  8.2. Animals Tab
    8.2.1. Remove Columns
      [ ] 8.2.1.1. Study Name (moved to Study sheet)
      8.2.1.2. Study Description (moved to Study sheet)
      8.2.1.3. Tracer Concentrations (moved to Tracers sheet)
    [ ] 8.2.2. Add Columns
      [ ] 8.2.2.1. Study ID
  8.3. Samples Tab
    [ ] 8.3.1. Add Columns
      [ ] 8.3.1.1. Skip [Note: Moved to Peak Annotation Details sheet]
  8.4. Treatments Tab (optional - required if any are new)
  8.5. Tissues Tab (optional - required if any are new)
  8.6. Infusates Tab
    8.6.1. Add Columns
      8.6.1.1. Infusate Number
      8.6.1.2. Tracer Group Name
      8.6.1.3. Infusate Name
      8.6.1.4. Tracer Number
      8.6.1.5. Tracer Concentration
  8.7. Tracers Tab
    8.7.1. Add Columns
      8.7.1.1. Tracer Number
      8.7.1.2. Compound Name
      8.7.1.3. Element
      8.7.1.4. Mass Number
      8.7.1.5. Label Count
      8.7.1.6. Label Positions
      8.7.1.7. Tracer Name (based on Study doc, Tracers tab contents)
  8.8. Compounds Tab (optional - required if any are new)
  8.9. Peak Annotation Files tab
    8.9.1. Add Columns
      8.9.1.1. Peak Annotation File Name
      8.9.1.2. Peak Annotation File Type
      8.9.1.3. Sample Name Prefix
  8.10. Peak Annotation Details Tab
    8.10.1. Add Columns
      8.10.1.1. Sample Name
      8.10.1.2. Sample Data Header
      8.10.1.3. mzXML File Name
      8.10.1.4. Peak Annotation File Name
      [ ] 8.10.1.5. Polarity
      8.10.1.6. Sequence Number
  8.11. Sequences Tab
    8.11.1. Add Columns
      8.11.1.1. Sequence Number
      8.11.1.2. Operator
      8.11.1.3. Date
      8.11.1.4. Instrument
      [ ] 8.11.1.5. LC Protocol [Note: Replaced with sequence name]
      [ ] 8.11.1.6. LC Run Length [Note: Replaced with sequence name]
      [ ] 8.11.1.7. LC Description [Note: Replaced with sequence name]
      8.11.1.8. Notes
  [ ] 8.12. Defaults Tab (optional - required if any data is missing or generates errors/warnings, e.g. researcher name variation) [Note: Columns were changed to sheet, header, and value]
    [ ] 8.12.1. Add Columns
      [ ] 8.12.1.1. Researcher
      [ ] 8.12.1.2. Researchers Confirmed
      [ ] 8.12.1.3. Peak Annotation Format
      [ ] 8.12.1.4. Polarity
      [ ] 8.12.1.5. Sequence Date
      [ ] 8.12.1.6. LC Protocol Name
      [ ] 8.12.1.7. Instrument
9. Add a Study ID field to the Study model
DESIGN
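Requirements 1. and 2. ask that every table-based input be either a tab-delimited file or an Excel file, and that Excel tabs be accessed by name rather than by index. One way this could be approached is sketched below. The function name `read_table` and the choice of pandas for the Excel branch are assumptions for illustration, not the loaders' actual code; the Excel branch also assumes an Excel engine such as openpyxl is installed.

```python
import csv
from pathlib import Path


def read_table(path, sheet=None):
    """Return rows (a list of dicts) from a TSV file or a named Excel sheet.

    Hypothetical helper: dispatches on the file extension.
    """
    suffix = Path(path).suffix.lower()
    if suffix in (".xlsx", ".xls"):
        # Requirement 2: tabs are looked up by name, never by position.
        if not isinstance(sheet, str):
            raise TypeError("Excel input requires a sheet name, not an index")
        # Imported lazily so TSV-only use does not need pandas.
        import pandas as pd

        frame = pd.read_excel(path, sheet_name=sheet, dtype=str)
        return frame.to_dict("records")
    # Anything else is treated as tab-delimited text.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

A call such as `read_table("study.xlsx", sheet="Compounds")` and a call such as `read_table("compounds.tsv")` would then hand a loader the same row structure. Rejecting integer sheet arguments up front keeps a reordered workbook from silently loading the wrong tab.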
Interface Change description
Code Change Description
I intend to perform the work in phases (with separate PRs), listed here by requirement number:
1. & 2.
9., 7.1., and 8.1.
8.6. and 8.7. (adding those tabs and implementing all sub-items, and leaving the other sheets alone)
5.1.1., 5.2, and 8.8.
8.11. and 8.12.
8.10.
8.9. (Peak Annotation Files Tab), 8.10. (Peak Annotation Details Tab), and 4. (accucor loader)
5.1.2., 5.2, and 8.5.
5.1.3., 5.2, and 8.4.
7.2., 7.3., 8.2., and 8.3.
3.
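For the ancillary loader phases, requirement 5.2.2. says rows are appended to the consolidated data file (e.g. compounds.tsv) only when there are no errors and the loader is not in validate mode. A minimal sketch of that gate follows. The function name `append_to_consolidated` and its parameters are hypothetical, and it does not cover requirement 5.2.3. (removing the tab from the study doc).

```python
import csv
import os


def append_to_consolidated(rows, consolidated_path, fieldnames, errors, validate=False):
    """Append loaded rows to a consolidated TSV; return True if rows were written.

    Hypothetical helper: nothing is written when errors were collected
    or when running in validate mode.
    """
    if errors or validate:
        return False
    # Write a header only when starting a new (or empty) consolidated file.
    needs_header = (
        not os.path.exists(consolidated_path)
        or os.path.getsize(consolidated_path) == 0
    )
    with open(consolidated_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
    return True
```

Returning a boolean lets the caller decide whether the follow-up step in 5.2.3. should run at all.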
Tests
A test for each requirement