WARNING: Please check the final result to confirm whether they are simply flip-flop configurations! #5

agroppi · 2019-03-05T10:13:07Z

I'm trying to assemble the chloroplast of a prunus.
I have set a prunus_cp file in GetOrganelle/Library/SeqReference
and generated the corresponding blast db (makeblastdb in /gpfs/home/ag/GetOrganelle/Library/NotationReference
But i'm facing this error at the desentangling step :
Disentangling failed: list index out of range
Here is the full log :
get_org.log.txt.zip

The text was updated successfully, but these errors were encountered:

Kinggerm · 2019-03-06T01:19:26Z

Hi,

I thought I had made it impossible to use a choice like "prunus_cp", it seemed that I missed adding "exit()" after the printing the ERROR. Thanks for debugging!
The suggested way for you to use your customized blast db is "-F anonym --genes prunus.genes.fasta -s prunus.fasta". The "prunus.genes.fasta" is the fasta file you used to make the blast db. For looking into the error you faced, I need the fasta file you used to make the blast db.
Actually, for species within angiosperms, especially for taxa in Rosaceae which I had a bunch of tests, there is generally no need to using a customized blast db. You can easily use the default plant_cp. Let me know if it does not work for you.
Besides, the automatic word size seemed too small for your data. You can try "-w 65" or "-w 71". Since you used a closely related reference, maximum rounds is no need to be 15. You can try "-R 4" or "-R 5".

Let me know the progress!

agroppi · 2019-03-06T09:20:41Z

Hi,
I first try to run the script with the default options and plant_cp :

get_organelle_reads.py \
-t 20 \
-1 reads_1.fq \
-2 reads_2.fq \
-o /outdir \
-R 15 \
-k 75,85,95,105 \
-F plant_cp

But the resulting assembly was bad :

I will try to
1/ to run the script on plant_cp with w 65 or w 71
2/ to run on prunus_cp by adding the -F option you described

Here is my prunus_cp.fasta
prunus_cp.zip

Kinggerm · 2019-03-06T12:30:50Z

Actually, even if you use the rbcL gene of an angiosperm species Amborella trichopoda (Genbank: NC_005086) as the reference, and try to assemble the dataset sequencing a gymnosperm species Gnetum gnemon (Genbank: ERR268421), you would easily get the right complete circular plastome for Gnetum gnemon. We included similar tests on our upcoming paper. I mean if the data is good enough, the reference should have little effects. However, You may still have the chance to find some reads to close the gap with a more closely related reference, although not a big chance.
If this assembly graph is based on k-mer=95, the base coverage of plastome is really high. It is unusual, but it happens for some data samples (~ 10%). Given this graph, I think you do not need to adjust the -w nor the -R (which is previously suggested for efficiency more than success). You should change the "-k 75,85,95,105" in your options into "-k 35,45,55,65,75".

Let me know the updates!

Kinggerm · 2019-03-06T12:38:27Z

A bit more explanation for using "-F anonym --genes prunus.genes.fasta -s prunus.fasta". (Should be included when preparing for a GetOrganelle manual):

The prunus.fasta file followed with "-s" should be a fasta format sequence file containing your reference(s). Your prunus_cp.fasta file is just in the right form.

The prunus.genes.fasta, however, should be a fasta format file in the following form:

>rbcL gene
ATGC
>matK gene
ATGC

agroppi · 2019-03-06T16:03:37Z

I have run the script on plant_cp with
-k 35,45,55,65,75
I have now a lot of warning like this :
Disentangling failed: Incomplete/Complicated graph

graph in bandage :

Log :
get_org.log.txt.zip

Kinggerm · 2019-03-07T01:36:15Z

Thanks for the graph and log file. There is a big improvement now. Only one break point left (the grey line in the following image).

You could try
further decrease the minimum k-mer, such as "-k 21,35,45,55,65,75" and increase the data used by "--max-reads 3E7" or larger (default is 1.5E7, that is why "Number of reads exceeded 15000000 in NG-6201_A2067_lib13395_1_sequence.good.fq, only top 15000000 reads are used in downstream analysis" is logged).

agroppi · 2019-03-07T15:23:09Z

results with -k 21,35,45,55,65,75 and --max-reads 4E7

graph :

and the log :
get_org.log.txt

One last question :
in the final step I have :
WARNING: More than one circular genome structure produced ...
WARNING: Please check the final result to confirm whether they are simply flip-flop configurations!
(PATH1 and PATH2)
How can I choose the best one ?

Thanks again

Kinggerm · 2019-03-08T01:18:37Z

Looks great now!
For this graph, they are surely flip-flop configurations (Walker et al. 2015). Both of them are right and coexist in plant. For publication, people tend to use just one of them. You can pick the configuration of which the SSC region is in the same direction with most of your data or outgroup, which makes downstream annotation/analysis easier.

Good luck with you study!

agroppi · 2019-03-08T11:05:32Z

Thanks you very much for your help !
I will apply this to chloroplasts close species

Kinggerm closed this as completed Mar 12, 2019

Kinggerm changed the title ~~Disentangling failed: list index out of range~~ A broad range of k-mer gradient would be preferred .. Mar 13, 2019

Kinggerm changed the title ~~A broad range of k-mer gradient would be preferred ..~~ WARNING: Please check the final result to confirm whether they are simply flip-flop configurations! Jun 19, 2019

Kinggerm mentioned this issue Oct 16, 2019

Differences of complete*.1.path_* and complete*.2.path_* output files #22

Closed

Kinggerm mentioned this issue Dec 17, 2019

On the interpretation of visualized *.fastg file #26

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WARNING: Please check the final result to confirm whether they are simply flip-flop configurations! #5

WARNING: Please check the final result to confirm whether they are simply flip-flop configurations! #5

agroppi commented Mar 5, 2019

Kinggerm commented Mar 6, 2019 •

edited

Loading

agroppi commented Mar 6, 2019

Kinggerm commented Mar 6, 2019

Kinggerm commented Mar 6, 2019 •

edited

Loading

agroppi commented Mar 6, 2019

Kinggerm commented Mar 7, 2019 •

edited

Loading

agroppi commented Mar 7, 2019

Kinggerm commented Mar 8, 2019 •

edited

Loading

agroppi commented Mar 8, 2019

WARNING: Please check the final result to confirm whether they are simply flip-flop configurations! #5

WARNING: Please check the final result to confirm whether they are simply flip-flop configurations! #5

Comments

agroppi commented Mar 5, 2019

Kinggerm commented Mar 6, 2019 • edited Loading

agroppi commented Mar 6, 2019

Kinggerm commented Mar 6, 2019

Kinggerm commented Mar 6, 2019 • edited Loading

agroppi commented Mar 6, 2019

Kinggerm commented Mar 7, 2019 • edited Loading

agroppi commented Mar 7, 2019

Kinggerm commented Mar 8, 2019 • edited Loading

agroppi commented Mar 8, 2019

Kinggerm commented Mar 6, 2019 •

edited

Loading

Kinggerm commented Mar 6, 2019 •

edited

Loading

Kinggerm commented Mar 7, 2019 •

edited

Loading

Kinggerm commented Mar 8, 2019 •

edited

Loading