Running LRASms_Zymo.Rmd and hung up at learnErrors() #7
Comments
That is strange to me. As dada2 versions are updated, there have occasionally been minor alterations to the output of the base functions, but this seems beyond that. Can you post your full analysis script up to
Here you go:
R version 4.3.2 (2023-10-31 ucrt)
Matrix products: default
locale:
time zone: America/Chicago
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
Also, because I was concerned that the fastq.gz file might be different after downloading:
An update from NCBI:
So, in short, don't use the fastq/fasta download link. Use the SRA Toolkit to get the sequencing data.
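A minimal sketch of that advice from within R, assuming the SRA Toolkit programs `prefetch` and `fastq-dump` are installed and on the PATH. The helper names `sra_commands()` and `run_sra_download()` and the `out_dir` default are illustrative only and not part of dada2 or the workflow in this thread.

```r
# Build the argument vectors for the two SRA Toolkit calls.
# sra_commands() is a hypothetical helper, shown only for illustration.
sra_commands <- function(accession, out_dir = "Zymo") {
  list(
    prefetch = accession,                                  # prefetch <accession>
    dump     = c("--gzip", "--outdir", out_dir, accession) # fastq-dump --gzip --outdir <dir> <accession>
  )
}

# Run the download; stops early if the toolkit is not installed.
run_sra_download <- function(accession, out_dir = "Zymo") {
  tools <- Sys.which(c("prefetch", "fastq-dump"))
  if (any(tools == "")) {
    stop("SRA Toolkit (prefetch, fastq-dump) not found on PATH")
  }
  cmds <- sra_commands(accession, out_dir)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  system2(tools[["prefetch"]], cmds$prefetch)
  system2(tools[["fastq-dump"]], cmds$dump)
  # Expected location of the gzipped FASTQ written by fastq-dump.
  invisible(file.path(out_dir, paste0(accession, ".fastq.gz")))
}

# Usage (not run here; needs network access and the toolkit):
# run_sra_download("SRR9089357")
```

The resulting file would then need to be renamed or pointed to wherever the Rmd expects its input.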
Thank you for the follow-ups. I suppose NCBI needs to find ways to trim the fat and push the larger files into cloud services.
However, I noticed that some of the other original sequences are stuck behind AWS S3, which is paywall-ish. One other thing I caught in the Zymo code is that the "tax" variable returns a table that has "Escherichia" as a "typo" in the Silva call, which someone serendipitously asked about on Stack Overflow: https://stackoverflow.com/questions/77170403/error-while-running-pacbio-dada2-workflow/78004296#78004296
Hi benjjneb,
I've downloaded the fastq.gz from NCBI pertaining to zymo_CCS_99.9.fastq.gz (SRR9089357), and I am going through LRASms_Zymo.Rmd. I'm hung up on line ~92, err <- learnErrors(drp, BAND_SIZE=32, multithread=TRUE, errorEstimationFunction=dada2:::PacBioErrfun)
The output is: Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
The dada2 version is 1.30.0.
I'm comparing my run of the Rmd to your GitHub LRASms_Zymo.html and I'm noticing some differences:
track <- fastqFilter(nop, filt, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2, verbose=TRUE)
Warning: '.\Zymo\noprimers\filtered' already exists
Overwriting file:./Zymo//noprimers/filtered/zymo_CCS_99_9.fastq.gz
Read in 73057, output 72940 (99.8%) filtered sequences.
drp <- derepFastq(filt, verbose=TRUE)
Dereplicating sequence entries in Fastq file: ./Zymo//noprimers/filtered/zymo_CCS_99_9.fastq.gz
Encountered 22309 unique sequences from 72940 total sequences read.
err <- learnErrors(drp, BAND_SIZE=32, multithread=TRUE, errorEstimationFunction=dada2:::PacBioErrfun)
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
I do not know whether the higher number of output reads, 72940 (99.8%) in my run vs. 69367 (94.9%) in yours, has anything to do with the ultimate learnErrors() error.
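Since both runs use the same filter settings, a different pass rate suggests the input reads may differ, possibly in their quality scores. This is only one possible explanation and is not confirmed by the outputs above. The maxEE filter depends on the quality string alone: a read's expected errors are the sum of 10^(-Q/10) over its bases. The base-R sketch below uses made-up quality strings (not data from the Zymo file) and hypothetical helper names to show how that value changes with the qualities.

```r
# Decode a Phred+33 quality string into integer quality scores.
phred33 <- function(qual) utf8ToInt(qual) - 33L

# Expected errors: sum of per-base error probabilities 10^(-Q/10).
expected_errors <- function(qual) sum(10^(-phred33(qual) / 10))

# Two hypothetical 1,500-base reads (illustrative only):
high_q  <- strrep("~", 1500)                             # Q93 at every base
mixed_q <- paste0(strrep("~", 1400), strrep("+", 100))   # last 100 bases at Q10

expected_errors(high_q)         # far below 2
expected_errors(mixed_q)        # roughly 10, dominated by the Q10 bases
expected_errors(mixed_q) <= 2   # FALSE: this read would not pass maxEE=2
```

Running the same two helpers on the quality lines of a few reads from each copy of the file would show whether the downloads carry different quality scores.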
I had a similar situation running the LRASms_fecal.Rmd as well.
Thank you for your time,
Mark