Replies: 3 comments
-
There used to be a problem with fastq-dump that I reported and that was fixed here: bioconda/bioconda-recipes#31396. Can you run fastq-dump (the one from the environment) and report the version that fails?
There is also this issue that seems to occur, also with the conda-based fastq-dump.
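For reporting which fastq-dump is actually being picked up, something like the sketch below might help. The function name `report_tool` is made up for illustration; it only relies on `command -v` and the tool's own `--version` flag.

```shell
# Hypothetical helper: show where a tool resolves on PATH and what version it reports.
report_tool() {
    tool_name="$1"
    if tool_path=$(command -v "$tool_name"); then
        echo "$tool_name found at: $tool_path"
        # The version call itself may fail on a broken install, so guard it.
        "$tool_name" --version || echo "$tool_name --version failed"
    else
        echo "$tool_name not found on PATH"
    fi
}

report_tool fastq-dump
```

Running this inside the activated environment and again outside it should make it clear whether the conda copy or a manually installed copy is the one being called.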
-
I have also noticed that even when it works, it takes a very long time to run. Even the test command that the official SRA page recommends:
takes almost a minute to run. I wonder whether this has to do with some flakiness at NIH. Actually, a few seconds later, the same command that executed fine before now exits with this error:
-
I was unable to reproduce the fastq-dump error, but it is something that occasionally occurs. I will add a note to the book about manually installing fastq-dump.
As for the snpEff problem, it seems to be caused by the new version of snpEff that was recently released, which does not appear to operate correctly. I have opened an issue with snpEff; in the meantime, the solution is to downgrade to snpEff 5.0.
The version lock will be added to the install instructions.
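To confirm which snpEff version a run used, the version can be read from the "SnpEff version" line of the build log. This is a rough sketch: the function name `snpeff_version_from_log` is hypothetical, and the sample line is copied from the log posted in this thread.

```shell
# Hypothetical helper: read log text on stdin and print the first snpEff version found.
snpeff_version_from_log() {
    sed -n 's/.*SnpEff version SnpEff \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Sample line copied from the build log in this thread.
sample='00:00:00 SnpEff version SnpEff 5.1 (build 2022-01-21 06:23), by Pablo Cingolani'
version=$(printf '%s\n' "$sample" | snpeff_version_from_log)

if [ "$version" = "5.0" ]; then
    echo "snpEff $version: matches the suggested 5.0 pin"
else
    echo "snpEff $version: differs from the suggested 5.0 pin"
fi

# Assumption (not run here): with bioconda the downgrade might look like
#   conda install snpeff=5.0
```

In a real run, the log could be piped in instead of the sample line, for example by saving the output of the build step to a file first.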
-
When testing the "run a realistic analysis" section ( https://www.biostarhandbook.com/software-installation.html#run-a-realistic-analysis ) as of just now, calling fastq-dump as installed by the handbook instructions (sratoolkit, I assume via conda) doesn't work on a new install on Windows/Ubuntu 20.04.4 LTS.
The error was:
$ make vcf
mkdir -p reads
fastq-dump -F -X 10000 --split-files -O reads SRR1553425
Failed to call external services.
make: *** [Makefile:110: reads/SRR1553425_1.fastq] Error 64
I had to download the latest Ubuntu version of sratoolkit from NCBI at
http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
I unpacked it and put it in the home/myusername/bin folder created for doctor.py,
and added the SRA toolkit's bin directory to the PATH in .bashrc:
export PATH=~/bin/sratoolkit.2.11.2-ubuntu64/bin:$PATH
That seemed to work better. The output is pasted below, down to a new error:
$ make vcf
mkdir -p reads
fastq-dump -F -X 10000 --split-files -O reads SRR1553425
Read 10000 spots for SRR1553425
Written 10000 spots for SRR1553425
Will generate both adapters.
echo ">illumina" > reads/adapter.fa
echo "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" >> reads/adapter.fa
echo ">nextera" >> reads/adapter.fa
echo "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC" >> reads/adapter.fa
Apply the trimming.
trimmomatic PE -threads 4 -phred33 -basein reads/SRR1553425_1.fastq -baseout reads/SRR1553425.fq
ILLUMINACLIP:reads/adapter.fa:2:30:5 SLIDINGWINDOW:4:15 MINLEN:50
TrimmomaticPE: Started with arguments:
-threads 4 -phred33 -basein reads/SRR1553425_1.fastq -baseout reads/SRR1553425.fq ILLUMINACLIP:reads/adapter.fa:2:30:5 SLIDINGWINDOW:4:15 MINLEN:50
Using templated Input files: reads/SRR1553425_1.fastq reads/SRR1553425_2.fastq
Using templated Output files: reads/SRR1553425_1P.fq reads/SRR1553425_1U.fq reads/SRR1553425_2P.fq reads/SRR1553425_2U.fq
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 10000 Both Surviving: 9834 (98.34%) Forward Only Surviving: 96 (0.96%) Reverse Only Surviving: 30 (0.30%) Dropped: 40 (0.40%)
TrimmomaticPE: Completed successfully
mkdir -p bam
Note how we filter alignment for mapped reads only.
bwa mem -t 4 refs/AF086833.fa reads/SRR1553425_1P.fq reads/SRR1553425_2P.fq | samtools view -b -F 4 | samtools sort -@ 4 > bam/SRR1553425-AF086833.bam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 19668 sequences (1965522 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1054, 7603, 33, 974)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (99, 177, 270)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 612)
[M::mem_pestat] mean and std.dev: (193.76, 119.01)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 783)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (193, 280, 387)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 775)
[M::mem_pestat] mean and std.dev: (298.31, 140.61)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 969)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (39, 98, 259)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 699)
[M::mem_pestat] mean and std.dev: (146.67, 163.09)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 919)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (105, 177, 279)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 627)
[M::mem_pestat] mean and std.dev: (199.78, 122.85)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 801)
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 19668 reads in 1.062 CPU sec, 0.276 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 4[bam_sort_core] merging from 0 files and 4 in-memory blocks...
refs/AF086833.fa reads/SRR1553425_1P.fq reads/SRR1553425_2P.fq
[main] Real time: 0.376 sec; CPU: 1.094 sec
samtools index bam/SRR1553425-AF086833.bam
samtools flagstat bam/SRR1553425-AF086833.bam
20615 + 0 in total (QC-passed reads + QC-failed reads)
19569 + 0 primary
0 + 0 secondary
1046 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
20615 + 0 mapped (100.00% : N/A)
19569 + 0 primary mapped (100.00% : N/A)
19569 + 0 paired in sequencing
9785 + 0 read1
9784 + 0 read2
19478 + 0 properly paired (99.53% : N/A)
19568 + 0 with itself and mate mapped
1 + 0 singletons (0.01% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
mkdir -p vcf
bcftools mpileup -O v -f refs/AF086833.fa bam/SRR1553425-AF086833.bam | bcftools call --ploidy 1 -mv -O z -o vcf/SRR1553425-AF086833.vcf.gz
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250
bcftools index vcf/SRR1553425-AF086833.vcf.gz
VCF file: vcf/SRR1553425-AF086833.vcf.gz
Snpeff needs the files in specific folders.
mkdir -p data/AF086833
Download the GenBank file, has to be called genes.gbk.
bio fetch AF086833 > data/AF086833/genes.gbk
Append entry to current genome to the config.
echo "AF086833.genome : AF086833" >> snpEff.config
Build the snpEff database.
snpEff build -v AF086833
00:00:00 SnpEff version SnpEff 5.1 (build 2022-01-21 06:23), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'AF086833'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'AF086833'
00:00:00 Reading config file: /home/nbowen/work/snpEff.config
00:00:00 done
00:00:00 Chromosome: 'AF086833.2' length: 18959
00:00:00 Create exons from CDS (if needed):
...........00:00:00 Exons created for 9 transcripts.
00:00:00 Deleting redundant exons (if needed):
00:00:00 Total transcripts with deleted exons: 0
00:00:00 Collapsing zero length introns (if needed):
.
0 00:00:00 Total collapsed transcripts: 1
00:00:00 Adding genomic sequences to genes:
00:00:00 Done (4 sequences added).
00:00:00 Adding genomic sequences to exons:
00:00:00 Done (10 sequences added, 0 ignored).
00:00:00 Finishing up genome
00:00:00 Adjusting transcripts:
00:00:00 Adjusting genes:
WARNING_GENE_COORDINATES: Gene 'Gene_8287_9739' (name:'VP30'), adjusting start coordinate from 8287 to 8508
WARNING_GENE_COORDINATES: Gene 'Gene_8287_9739' (name:'VP30'), adjusting end coordinate from 9739 to 9374
WARNING_GENE_COORDINATES: Gene 'Gene_9884_11517' (name:'VP24'), adjusting start coordinate from 9884 to 10344
WARNING_GENE_COORDINATES: Gene 'Gene_9884_11517' (name:'VP24'), adjusting end coordinate from 11517 to 11099
WARNING_GENE_COORDINATES: Gene 'Gene_11500_18281' (name:'L'), adjusting start coordinate from 11500 to 11580
WARNING_GENE_COORDINATES: Gene 'Gene_11500_18281' (name:'L'), adjusting end coordinate from 18281 to 18218
WARNING_GENE_COORDINATES: Gene 'Gene_3031_4406' (name:'VP35'), adjusting start coordinate from 3031 to 3128
WARNING_GENE_COORDINATES: Gene 'Gene_3031_4406' (name:'VP35'), adjusting end coordinate from 4406 to 4150
WARNING_GENE_COORDINATES: Gene 'Gene_55_3025' (name:'NP'), adjusting start coordinate from 55 to 469
WARNING_GENE_COORDINATES: Gene 'Gene_55_3025' (name:'NP'), adjusting end coordinate from 3025 to 2688
WARNING_GENE_COORDINATES: Gene 'Gene_4389_5893' (name:'VP40'), adjusting start coordinate from 4389 to 4478
WARNING_GENE_COORDINATES: Gene 'Gene_4389_5893' (name:'VP40'), adjusting end coordinate from 5893 to 5458
WARNING_GENE_COORDINATES: Gene 'Gene_5899_8304' (name:'GP'), adjusting start coordinate from 5899 to 6038
WARNING_GENE_COORDINATES: Gene 'Gene_5899_8304' (name:'GP'), adjusting end coordinate from 8304 to 8067
00:00:00 Adjusting chromosomes lengths:
00:00:00 Ranking exons:
00:00:00 Create UTRs from CDS (if needed):
00:00:00 Remove empty chromosomes:
00:00:00 Marking as 'coding' from CDS information:
00:00:00 Done: 0 transcripts marked
00:00:00
00:00:00 Caracterizing exons by splicing (stage 1) :
00:00:00 Caracterizing exons by splicing (stage 2) :
00:00:00 done.
00:00:00 [Optional] Rare amino acid annotations
WARNING_FILE_NOT_FOUND: Rare Amino Acid analysis: Cannot read protein sequence file '/home/nbowen/work/./data/AF086833/protein.fa', nothing done.
ERROR: CDS check file '/home/nbowen/work/./data/AF086833/cds.fa' not found.
00:00:00 Protein check file: '/home/nbowen/work/./data/AF086833/genes.gbk'
00:00:00 Checking database using protein sequences
00:00:00 Comparing Proteins...
Labels:
'+' : OK
'.' : Missing
'*' : Error
++++++++*00:00:00
00:00:00 Protein sequences comparison failed!
ERROR: Database check failed.
00:00:00 Logging
00:00:01 Checking for updates...
00:00:01 Done.
make: *** [Makefile:154: data/AF086833/snpEffectPredictor.bin] Error 255
(bioinfo)
then:
$ ls
Makefile bam data reads refs snpEff.config sratoolkit.tar.gz vcf
(bioinfo)
$ find .
.
./bam
./bam/SRR1553425-AF086833.bam
./bam/SRR1553425-AF086833.bam.bai
./data
./data/AF086833
./data/AF086833/genes.gbk
./Makefile
./reads
./reads/adapter.fa
./reads/SRR1553425_1.fastq
./reads/SRR1553425_1P.fq
./reads/SRR1553425_1U.fq
./reads/SRR1553425_2.fastq
./reads/SRR1553425_2P.fq
./reads/SRR1553425_2U.fq
./refs
./refs/AF086833.fa
./refs/AF086833.fa.amb
./refs/AF086833.fa.ann
./refs/AF086833.fa.bwt
./refs/AF086833.fa.fai
./refs/AF086833.fa.pac
./refs/AF086833.fa.sa
./refs/AF086833.gff
./refs/NC_045512.fa
./refs/NC_045512.fa.amb
./refs/NC_045512.fa.ann
./refs/NC_045512.fa.bwt
./refs/NC_045512.fa.fai
./refs/NC_045512.fa.pac
./refs/NC_045512.fa.sa
./refs/NC_045512.gff
./snpEff.config
./sratoolkit.tar.gz
./vcf
./vcf/SRR1553425-AF086833.vcf.gz
./vcf/SRR1553425-AF086833.vcf.gz.csi
(bioinfo)
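For reference, the build log names three files under data/AF086833 (genes.gbk, cds.fa, protein.fa), and the listing above shows only genes.gbk. A small check like the one below would make that explicit. The function name `check_snpeff_inputs` is hypothetical, and whether cds.fa and protein.fa are actually required is not established here: the log only reports them as not found, while the step that failed was the protein sequence comparison.

```shell
# Hypothetical helper: report which of the files named in the snpEff log exist in a directory.
check_snpeff_inputs() {
    data_dir="$1"
    for name in genes.gbk cds.fa protein.fa; do
        if [ -f "$data_dir/$name" ]; then
            echo "present: $data_dir/$name"
        else
            echo "missing: $data_dir/$name"
        fi
    done
}

check_snpeff_inputs data/AF086833
```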