Assembling failed #3

Closed
SolayMane opened this issue Jul 23, 2018 · 8 comments

@SolayMane

I am trying to assemble a chloroplast genome from raw reads using:
get_organelle_reads.py -1 /sanhome2/trimmed/out2045_1.clean.fastq -2 /sanhome2/trimmed/out2045_2.clean.fastq -s pc_ref.fa -w 103 -J 3 -M 5 -o chloro_out -R 5 -k 75,85,95,105 -P 1000000
I have about 70,000,000 150 bp paired-end reads.

I got the "Assembling failed" error, and I think it is because the filtered paired-read files were empty, but I don't know why.
Below are some log excerpts to help understand this issue:

2018-07-20 17:50:45,343 - INFO: Separating filtered fastq file finished!
2018-07-20 17:50:47,679 - INFO: Assembling using SPAdes ...
2018-07-20 17:50:48,293 - ERROR: Error in SPAdes:
== Error == system call for: "['/home1/software/SPAdes-3.11.1-linux/bin/hammer', '/sanhome2/Organnelle/chloro_out/filtered_spades/corrected/configs/config.info']" finished abnormally, err code: 255

2018-07-20 17:50:48,298 - ERROR: Assembling failed.

Total Calc-cost 20067.3784549
Thanks you!
#############
config.info
; = HAMMER =
; input options: working dir, input files, offset, and possibly kmers
dataset /sanhome2/Organnelle/chloro_out/filtered_spades/input_dataset.yaml
input_working_dir /sanhome2/Organnelle/chloro_out/filtered_spades/tmp/hammer_BH7RTS
input_trim_quality 4
input_qvoffset
output_dir /sanhome2/Organnelle/chloro_out/filtered_spades/corrected

; == HAMMER GENERAL ==
; general options
general_do_everything_after_first_iteration 1
general_hard_memory_limit 250
general_max_nthreads 4
general_tau 1
general_max_iterations 1
general_debug 0

; count k-mers
count_do 1
count_numfiles 16
count_merge_nthreads 4
count_split_buffer 0
count_filter_singletons 0

; hamming graph clustering
hamming_do 1
hamming_blocksize_quadratic_threshold 50

; bayesian subclustering
bayes_do 1
bayes_nthreads 4
bayes_singleton_threshold 0.995
bayes_nonsingleton_threshold 0.9
bayes_use_hamming_dist 0
bayes_discard_only_singletons 0
bayes_debug_output 0
bayes_hammer_mode 0
bayes_write_solid_kmers 0
bayes_write_bad_kmers 0
bayes_initial_refine 1

; iterative expansion step
expand_do 1
expand_max_iterations 25
expand_nthreads 4
expand_write_each_iteration 0
expand_write_kmers_result 0

; read correction
correct_do 1
correct_discard_bad 0
correct_use_threshold 1
correct_threshold 0.98
correct_nthreads 4
correct_readbuffer 100000
correct_stats 1

Thank you for your help!

@Kinggerm
Owner

Can you show me the complete log file?
As you said, the filtered paired-read files were empty, so it could be a problem with read extending. I need your complete log to help you.
By the way, you have about 10G of raw data, which is really more than necessary. But that is another matter and should not be the reason for the failure.

@SolayMane
Author

Below is the content of the log file:

GetOrganelle v1.0.3a

This pipeline get organelle reads and genomes from genome skimming data by extending.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

/home1/software/GetOrganelle/get_organelle_reads.py -1 /sanhome2/trimmed/out2045_1.clean.fastq -2 /sanhome2/trimmed/out2045_2.clean.fastq -s pc_ref.fa -w 103 -J 3 -M 5 -o chloro_out -R 5 -k 75,85,95,105 -P 1000000

2018-07-20 12:16:20,997 - INFO: Unzipping reads ...
2018-07-20 12:16:20,997 - INFO: Unzipping reads finished.

2018-07-20 12:16:20,998 - INFO: Reading seeds ...
2018-07-20 12:16:20,998 - INFO: Making seed - bowtie2 index ...
2018-07-20 12:16:21,195 - INFO: Making seed - bowtie2 index finished.
2018-07-20 12:16:21,195 - INFO: Mapping reads to seed - bowtie2 index ...
2018-07-20 12:42:44,225 - INFO: Mapping finished.
2018-07-20 12:42:44,225 - INFO: Reading seeds finished.

2018-07-20 12:42:44,226 - INFO: Pre-reading fastq ...
2018-07-20 13:01:24,167 - INFO: 133898628 candidates in all 154713318 reads
2018-07-20 13:01:24,629 - INFO: Pre-reading fastq finished.

2018-07-20 13:01:24,630 - INFO: Pre-grouping reads...
2018-07-20 13:01:38,120 - INFO: 1000000/9475444 used/duplicated
2018-07-20 13:05:27,871 - INFO: 53791 groups made.

2018-07-20 13:05:56,287 - INFO: Adding initial words ...
2018-07-20 13:16:32,772 - INFO: Adding initial words finished.

2018-07-20 13:16:32,773 - INFO: Extending ...
2018-07-20 13:36:12,155 - INFO: Round 1: 133898628/133898628 AI 20604128 AW 331308344
2018-07-20 14:11:32,280 - INFO: Round 2: 133898628/133898628 AI 30262507 AW 431633632
2018-07-20 14:32:01,445 - INFO: Round 3: 133898628/133898628 AI 37325034 AW 507733592
2018-07-20 14:53:01,108 - INFO: Round 4: 133898628/133898628 AI 42667564 AW 566007848
2018-07-20 15:14:54,873 - INFO: Round 5: 133898628/133898628 AI 46775908 AW 611042474
2018-07-20 15:14:54,875 - INFO: Hit the round limit 5 and terminated ...
2018-07-20 17:46:45,566 - INFO: Extending finished.

2018-07-20 17:46:45,567 - INFO: Separating filtered fastq file ...
2018-07-20 17:50:45,343 - INFO: Separating filtered fastq file finished!
2018-07-20 17:50:47,679 - INFO: Assembling using SPAdes ...
2018-07-20 17:50:48,293 - ERROR: Error in SPAdes:
== Error == system call for: "['/home1/software/SPAdes-3.11.1-linux/bin/hammer', '/sanhome2/Organnelle/chloro_out/filtered_spades/corrected/configs/config.info']" finished abnormally, err code: 255

2018-07-20 17:50:48,298 - ERROR: Assembling failed.

Total Calc-cost 20067.3784549
Thanks you!

@Kinggerm
Owner

Kinggerm commented Jul 24, 2018

Sorry for the trouble and thanks a lot for the information!
As far as I can tell from the log file, the extending is normal. But you mentioned that the filtered paired-read files were empty, so could you show me a few reads from both out2045_1.clean.fastq and out2045_2.clean.fastq by running:

head -n 8 /sanhome2/trimmed/out2045_1.clean.fastq
head -n 8 /sanhome2/trimmed/out2045_2.clean.fastq

I want to check whether the problem is with the read-format detection.

@SolayMane
Author

head -n 8 /sanhome2/trimmed/out2045_1.clean.fastq:

@SRR6062045.201.1 201 length=151
TGGAGAACAAAGGATTTTATGTGCCAGTGGTGATCCTTTTTCAAATCTTGCTTTCTTCTAACTCTGGTTATTGCTTTTGTAGTGGTGGTGAGGTGCTCTGTGATTTTGTACTTCTAACTCTTCTTTCTCGTCTGTATGTGCACGTACAA
+
AFFJJJJJJJJJJAJJ7-FFJFJJJAFJAJ-<--<J<FAFAJFJJJFJJJFAF-FJJJF<FAAFFJJA-JA777<<AF<77A7<JF7JFJFAAFJFJ<F-<7A-FJFJFFJAJFAAFFAFJ7AJJ7FAFJ7AAJJ7A-<7<<FF<FJJF
@SRR6062045.202.1 202 length=151
TGCTTAAAGTTCATTCAAATTACAAAAATTAATTTAAGAAATTATGTAAAAATATCTACACAAAAATTAATTTCTTTCCCCTTTTTTTGTTTTTTAAACTAAACTAACCCTAAATTAACTTGTGCATACTGTCATCTGGAGCAAAAAAG
+
AFFJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJFJJ7FFJ<FJJJJJ<JJFJJJJFFFAJJ-AAFJJ<FJJJJFFFJJJ-<AJJJJJJA

head -n 8 /sanhome2/trimmed/out2045_2.clean.fastq:
@SRR6062045.201.2 201 length=151
AGGAAGCAATAGATCTCTCTCTCAACGACAAGAGAGTGCTCCCTTCCCCTTCTACTTTACAAAAACCGTGAAACGTAAGCATCTGCAAACCACAAATCTACCCCCTGAATTGAAATCAAAATTAAAAGACTAGTTGTACGTGCACATACAG
+
AAFFFJJJJJJ7JFJ<FJFAF-FJJJFJA<AAJFFF7FFJFJJJJAJJJJFJFFJJFJFJJJJJJJFF-AFJJJJJJJFAJAFFFJAJF<FJJFA-<FAJFF7FFAJJAAAFFJF-7<FJAF-<<-<777<<7<F7<<---<-A7F<F7F-
@SRR6062045.202.2 202 length=151
GGTGGGTTCCTTGTGGCAACCGGTCAATGCCAACCCCCTTGGTGGGGCCGCTGGCTTCCTGATTTACCTCTCACACGATGCTCGTGGGGCCGGGTGGTGTGGGCCGGTTAGGGTGTCCGGACAATACACCTTTTTTGCTCCAGATGACAGT
+
A-FFFJJJJJJJJJJFJJJJJJJJFJJJJJJJJJJJJJF<FJFJJJJJJJJJJJJ<JJJJJJJJJJJJJJFJJFJJJJJJJ-FJ<<JFJJJJJJ-7AAJFJJFFJJJAJAJJJ7J<AJJJFJF7-<-<<AA<<FJJ-A-7<FJ<FF-A-A-

@Kinggerm
Owner

Thanks!
Now I see the problem. The read headers are not compatible with this GetOrganelle version.
I am going to fix this later for your type of data and will let you know once it is done. It should be quick.
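
For reference, the incompatibility is visible in the read names themselves. Here is a minimal check; the exact format expected for pair matching is assumed here (the common "/1"/"/2" or Casava 1.8 style conventions), not quoted from the code:

# Compare the first whitespace-delimited token of the first read name in each file.
head -n 1 /sanhome2/trimmed/out2045_1.clean.fastq | cut -d' ' -f1
# @SRR6062045.201.1
head -n 1 /sanhome2/trimmed/out2045_2.clean.fastq | cut -d' ' -f1
# @SRR6062045.201.2
# These SRA-dumped names end in ".1"/".2" rather than "/1"/"/2" (or the
# Casava 1.8 "NAME 1:N:0:..." form), so pair-name detection written for
# those conventions would treat the two files as unmatched.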

@SolayMane
Author

Thank you very much!

@Kinggerm
Owner

Hi, I made a few changes so that it works for a small test dataset. You can simply use git pull to update GetOrganelle.
Let me know whether you can get through your data with the new version. Also, I strongly suggest you reduce your dataset to about 2G per end for plastome assembly; 2G is really enough and much faster. A rough sketch of both steps follows below.
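
The sketch below assumes GetOrganelle was installed via git clone into /home1/software/GetOrganelle and that seqtk is available (both are assumptions, and the subsampled file names are just examples). With 151 bp reads, about 13 million reads per end is roughly 2G of bases per file:

# Update GetOrganelle in place (assumes a git clone at this path).
cd /home1/software/GetOrganelle && git pull

# Downsample each end to ~13 million reads; using the same seed (-s100)
# for both files keeps the read pairs in sync.
seqtk sample -s100 /sanhome2/trimmed/out2045_1.clean.fastq 13000000 > out2045_1.sub.fastq
seqtk sample -s100 /sanhome2/trimmed/out2045_2.clean.fastq 13000000 > out2045_2.sub.fastq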

@SolayMane
Author

Hi, thank you for your help, the issue is solved!
