updating runinfo CSV - #8

Open
ctb opened this issue Oct 15, 2022 · 8 comments

Comments

ctb commented Oct 15, 2022

search at https://www.ncbi.nlm.nih.gov/sra/ for "METAGENOMIC"[Source] NOT amplicon[All Fields]

direct link courtesy of luiz:

https://www.ncbi.nlm.nih.gov/sra/?term=%22METAGENOMIC%22%5BSource%5D%20NOT%20amplicon%5BAll%20Fields%5D
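For reference, the direct link is just the search term percent-encoded into the `term` query parameter. A minimal sketch, assuming Python's standard `urllib.parse`; the function name `sra_search_link` is made up for illustration:

```python
from urllib.parse import quote

SRA_SEARCH_URL = "https://www.ncbi.nlm.nih.gov/sra/"
QUERY = '"METAGENOMIC"[Source] NOT amplicon[All Fields]'


def sra_search_link(term: str) -> str:
    """Return a browser link to the SRA search results for `term`."""
    # safe="" so that quotes, brackets, and spaces are all percent-encoded.
    encoded = quote(term, safe="")
    return f"{SRA_SEARCH_URL}?term={encoded}"


link = sra_search_link(QUERY)
print(link)
```

This should reproduce the link above, so other queries can be turned into shareable links the same way.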

to download the results as a file:

Send to > File, with format Summary.

consider updating file https://osf.io/download/762mk/ referenced over at https://github.com/sourmash-bio/2022-search-sra-with-mastiff/blob/main/interpret-sra-live.ipynb

ctb commented Oct 15, 2022

planning to keep latest runinfo here on farm:

~ctbrown/transfer/sra-runinfo-latest.tar.gz

ctb commented Oct 15, 2022

so I am very confused about all of this 😆 - the runinfo file produced by the above contains the following headers:

"Experiment Accession","Experiment Title","Organism Name","Instrument","Submitter","Study Accession","Study Title","Sample Accession","Sample Title","Total Size, Mb","Total RUNs","Total Spots","Total Bases","Library Name","Library Strategy","Library Source","Library Selection"

but I also want these headers which come from ...somewhere else and are per-run. Working on it.

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
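One possible way to reconcile the two files, sketched here under an assumption that is not confirmed anywhere in this thread: that `Experiment Accession` in the summary file and `Experiment` in the per-run file hold the same accession. The rows below are placeholders trimmed to a few of the columns listed above, not real records, and the helper names are hypothetical.

```python
import csv
import io

# Placeholder rows (not real SRA records), using a subset of the headers above.
summary_csv = '''"Experiment Accession","Experiment Title","Total RUNs"
"SRX0000001","placeholder experiment","2"
'''
runinfo_csv = '''Run,spots,bases,Experiment
SRR0000001,100,15000,SRX0000001
SRR0000002,200,30000,SRX0000001
'''


def index_summary(text):
    """Map experiment accession -> per-experiment summary row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Experiment Accession"]: row for row in reader}


def annotate_runs(runinfo_text, summary_by_exp):
    """Yield one dict per run, with its experiment title attached."""
    for run in csv.DictReader(io.StringIO(runinfo_text)):
        # ASSUMPTION: runinfo "Experiment" matches summary "Experiment Accession".
        exp = summary_by_exp.get(run["Experiment"], {})
        yield {**run, "Experiment Title": exp.get("Experiment Title", "")}


rows = list(annotate_runs(runinfo_csv, index_summary(summary_csv)))
for row in rows:
    print(row["Run"], row["Experiment"], row["Experiment Title"])
```

The summary file has one row per experiment and the per-run file has one row per run, so the join fans out: each experiment row is repeated for every run that points at it.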

ctb commented Oct 15, 2022

ok, when sending to file, select 'Run Info' as the format to get the per-run headers. It doesn't work for the whole result set, though.
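If the web export can't handle the full result set, one option might be to page through it with NCBI E-utilities instead. The sketch below only builds the request URLs and makes no network calls. It assumes the usual E-utilities pattern (`esearch` with `usehistory=y`, then `efetch` with `WebEnv`/`query_key`) and that `rettype=runinfo` works for `db=sra`; the batch size and function names are arbitrary choices for illustration, not something tested in this thread.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term):
    """URL that stores the search on NCBI's history server (usehistory=y)."""
    params = {"db": "sra", "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_runinfo_urls(webenv, query_key, total, batch_size=1000):
    """One efetch URL per batch of records from a stored search."""
    urls = []
    for start in range(0, total, batch_size):
        params = {
            "db": "sra",
            "WebEnv": webenv,        # taken from the esearch response
            "query_key": query_key,  # taken from the esearch response
            "rettype": "runinfo",    # ASSUMPTION: per-run CSV output
            "retmode": "text",
            "retstart": start,
            "retmax": batch_size,
        }
        urls.append(f"{EUTILS}/efetch.fcgi?{urlencode(params)}")
    return urls


# Placeholder values; real ones come from parsing the esearch response.
urls = efetch_runinfo_urls("WEBENV_PLACEHOLDER", "1", total=2500, batch_size=1000)
print(len(urls))
```

Each batch would come back with its own header line, so the pieces need de-duplicated headers when concatenated.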

ctb commented Oct 15, 2022

this is useful: https://eaton-lab.org/articles/sra-downloads/

ctb commented Oct 15, 2022

ctb commented Oct 15, 2022

aaaaand maybe we should be using bigquery (or NCBI wants us to use it, whichever) - https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery/

ctb commented Oct 15, 2022

ok finally found my code for doing this on a sample by sample basis -

https://github.com/dib-lab/2022-sra-gather/blob/main/summarize-sample.py

run here:

https://github.com/dib-lab/2022-sra-gather/blob/main/Snakefile

ctb commented Dec 15, 2022
