Channels not found with grep to $all_reads #1

vsoch · 2017-09-22T21:18:07Z

Hey @amojarro! I'm working on some Singularity images (like Docker, but safe for HPC) to go along with a publication for an internal container organization format, and someone in the community recommended your pipeline (do you know Pim?). It's going well so far, and I have two versions of the container:

https://github.com/vsoch/carrierseq/tree/singularity

but I am hitting a snag. This call:

grep -Eo '_ch[0-9]+_|ch=[0-9]+' $all_reads > $output_folder/06_poisson_calculation/01_reads_channels.lst

returns nothing. I am using the data that you linked, so I think either the data changed or the grep call needs to be adjusted. Once nothing is found, the Python script fails because it is given 0 for the denominator.
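One way to fail earlier and more clearly might be to check the channel list before the Poisson step runs. This is only a minimal sketch: `channels_found` is a hypothetical helper name, and the demo uses throwaway files rather than the pipeline's real `01_reads_channels.lst`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): succeeds only if the
# given channel list exists and contains at least one byte.
channels_found() {
    [ -s "$1" ]
}

# Throwaway demo files standing in for 01_reads_channels.lst (assumption).
tmp_dir=$(mktemp -d)
trap 'rm -rf "$tmp_dir"' EXIT
: > "$tmp_dir/empty.lst"               # what grep produced in this issue
printf 'ch=434\n' > "$tmp_dir/ok.lst"  # what grep should produce

for lst in "$tmp_dir/empty.lst" "$tmp_dir/ok.lst"; do
    if channels_found "$lst"; then
        echo "channels found in $(basename "$lst"); safe to continue"
    else
        echo "no channels matched in $(basename "$lst"); skipping Poisson step" >&2
    fi
done
```

A guard like this would turn the division by zero into a readable message, though it would not fix the missing channel information itself.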

Thanks for your help with this!

vsoch · 2017-09-22T23:34:05Z

and see here --> https://github.com/vsoch/carrierseq/blob/singularity/docs/singularity.md for the overall idea.

amojarro · 2017-09-24T14:37:27Z

Hi @vsoch, it looks like the Sequence Read Archive (SRA) has replaced the original read headers.

Normally, the sequence data would contain either the output information from the Albacore basecaller or from a Poretools fastq conversion command (fast5 > fastq).

For example, an Albacore header would look like [read ID, run ID, read, channel, start_time]:

@cc74d4a9-b62f-4274-86d0-7d95370b6aba runid=55268 read=23015 ch=434 start_time=2017-06-22T17:44:34Z

And a Poretools header [read ID, path/to/fast5]:

@channel_434_cc74d4a9-b62f-4274-86d0-7d95370b6aba_template /Users/mojarro/Documents/Sequencing/Low_Input_Sequencing/minknow_1_5_18/fast5/pass/127/VENUSAUR_20170511_FNFAE22530_MN17220_sequencing_run_sample_id_55268_ch434_read23015_strand.fast5

However, the header information has now been replaced with an SRA ID and only the read ID:

>gnl|SRA|SRR5935058.1 895b5243-42d4-4cc6-8b5b-c29c813bf663_Basecall_1D template (Biological)

Thank you for the comment. I will investigate how to preserve the original metadata on NCBI. In the meantime, I have uploaded the original fastq file to Dropbox.

https://www.dropbox.com/sh/vyor82ulzh7n9ke/AAC4W8rMe4z5hdb7j4QhF_IYa?dl=0
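To confirm that the headers explain the empty result, the pipeline's pattern can be run against the three header styles quoted above. This is a rough sketch: `extract_channel` is a hypothetical helper name, and the sample headers are copied from this comment rather than read from a fastq file.

```shell
#!/bin/sh
# Hypothetical helper: print whatever the pipeline's channel pattern
# matches in a single header line (prints nothing if there is no match).
extract_channel() {
    # grep exits non-zero when nothing matches; `|| true` keeps that
    # from aborting scripts that run under `set -e`.
    printf '%s\n' "$1" | grep -Eo '_ch[0-9]+_|ch=[0-9]+' || true
}

# Sample headers taken from the examples in this thread.
albacore='@cc74d4a9-b62f-4274-86d0-7d95370b6aba runid=55268 read=23015 ch=434 start_time=2017-06-22T17:44:34Z'
poretools='@channel_434_cc74d4a9-b62f-4274-86d0-7d95370b6aba_template /Users/mojarro/Documents/Sequencing/Low_Input_Sequencing/minknow_1_5_18/fast5/pass/127/VENUSAUR_20170511_FNFAE22530_MN17220_sequencing_run_sample_id_55268_ch434_read23015_strand.fast5'
sra='>gnl|SRA|SRR5935058.1 895b5243-42d4-4cc6-8b5b-c29c813bf663_Basecall_1D template (Biological)'

for header in "$albacore" "$poretools" "$sra"; do
    match=$(extract_channel "$header")
    echo "match: '${match}'"
done
```

The Albacore header yields `ch=434` and the Poretools header yields `_ch434_`, while the SRA header yields an empty match, which is consistent with the empty `01_reads_channels.lst`.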

vsoch · 2017-09-24T14:46:43Z

Fantastic! Thanks for your quick response and for looking into this. I'll give it another try with the updated file, and will keep a lookout for updates from you here. A similar thing happened to me and a colleague with data URLs, and we ultimately opted to serve the data ourselves.

Update/scif

vsoch closed this as completed Sep 26, 2017

amojarro pushed a commit that referenced this issue Feb 16, 2018

Merge pull request #1 from vsoch/update/scif

65f5fe5
