The problem of reconstruct_contig.py #13

crystal0721 · 2017-11-09T08:15:41Z

hi, Dear
when I test reconstruct_contig.py，there is a problem as follows：
reconstruct_contig.py -h
Traceback (most recent call last):
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/bin/reconstruct_contig.py", line 4, in
import('pkg_resources').run_script('Cogent==2.1', 'reconstruct_contig.py')
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/pkg_resources/init.py", line 748, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/pkg_resources/init.py", line 1517, in run_script
exec(code, namespace, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/Cogent-2.1-py2.7.egg/EGG-INFO/scripts/reconstruct_contig.py", line 301, in
args = parser.parse_args()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1701, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1733, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1939, in _parse_known_args
start_index = consume_optional(start_index)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1879, in consume_optional
take_action(action, args, option_string)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1807, in take_action
action(self, namespace, argument_values, option_string)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 996, in call
parser.print_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 2340, in print_help
self._print_message(self.format_help(), file)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 2314, in format_help
return formatter.format_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 281, in format_help
help = self._root_section.format_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 211, in format_help
func(*args)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 211, in format_help
func(*args)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 517, in _format_action
help_text = self._expand_help(action)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 603, in _expand_help
return self._get_help_string(action) % params
ValueError: unsupported format character ')' (0x29) at index 32

The python version is 2.7.14，could you tell be how to solove it，thank you very much.

Crystal

Magdoll · 2017-11-09T20:11:40Z

Hi @crystal0721 ,

reconstruct_contig.py currently does not support the -h option, despite what it says when you directly type the command. If you want to know what options are avail just type the program name itself. Sorry about that. I'll fix it in the next release.

(anaCogent)etseng@login14-biofx02:~$ reconstruct_contig.py
usage: reconstruct_contig.py [-h] [-e EXPECTED_ERROR_RATE]
                             [--nx_cycle_detection] [-k KMER_SIZE]
                             [-p OUTPUT_PREFIX] [-D GMAP_DB_PATH]
                             [-d GMAP_SPECIES] [--small_genome] [--version]
                             [--debug]
                             dirname
reconstruct_contig.py: error: too few arguments

-Liz

crystal0721 · 2017-11-10T04:05:53Z

hi Magdoll,
Thank you very much for your replay and I had soloved. While having something that I want to get help or suggestion from you. There is a way that can combine transcripts produced by Trinity and Full-length transcripts produced by SMRTlink or SMRTanalysis besides CDHIT，the results we want to obtain is GeneID and its corresponding Transcripts ID in order to the following analysis.

Crystal

Magdoll · 2017-11-13T20:21:22Z

Hi @crystal0721 ,

One possibility is to combine {Trinity} + {HQ isoforms from SMRTLink/SMRTAnalysis} into a single fasta file and run it through Cogent. After reconstruction, use the collapse script per this tutorial to collapse any redundant transcripts. The collapse script should not care if the sequence comes from Trinity or HQ. Let me know if you run into issues.

--Liz

crystal0721 · 2017-11-14T07:39:28Z

hi Magdoll,
Thank you for replaying. I would know what you mean，mybe I will try it.While there are some questions I want to discuss.When we use cogent to collapse redundant transcripts to obtain cogent2.renamed.fasta which was considered as genome，why we use Cupcake to redundant transcript again？I thought the cogent2.renamed.fasta was non-redundant sequence which could be as genome sequence of Pacbio data.
Now I had finish Cogent and obtained cogent2.renamed.fasta for Pacbio data. For illumina data I could choose the long transcript as unigene. There is no way could combine cogent2.renamed.fasta and unigene. Or this way is not good than above（combine transcript from Trinity and SMRT using cogent and redundanting by Cupcake.）
I hope the questions was described clearly and doest not bother you. Thank you again.

Crystal

Magdoll · 2017-12-05T22:18:04Z

Hi @crystal0721 ,

cogent2.renamed.fasta is the reconstructed "fake" genome that consists of just the transcribed coding portions of a genome. So yes, you are correct that "cogent2.renamed.fasta was non-redundant sequence which could be as genome sequence of Pacbio data".

I was answering your question in how to combine (Illumina) Trinity data with PacBio HQ isoforms. The reason I suggested combining Trinity + HQ isoforms and run Cogent is to get the union of the genes the two platforms cover.

Another possibility, which you have hinted, is:

run Cogent on HQ (which you've done)
align both HQ + Trinity to cogent2.renamed.fasta and collapse. All HQ will align. Some Trinity data may not. The ones from Trinity that did not map will be genes seen in Illumina only.
extract the Illumina-only Trininty sequences and pick the longest transcript as unigenes, or run CD-HIT, or whatever other tool you find suitable.

The combination of {2} + {3} will then get you the combined final dataset.

--Liz

jayceejiao · 2018-05-23T04:43:57Z

Hi @Magdoll,

I also want to combine the HQ and rnaspades assembly to generate a complete transcriptome. I used the above strategy:

run Cogent on HQ (which you've done)
align both HQ + Trinity to cogent2.renamed.fasta and collapse.
I have concatenated the sequences from both sources and collapsed them using Cupcake. But I can't get the results. I received the following message:

alt. junction found at 1910
alt. junction found at 1957
alt. junction found at 2413
alt. junction found at 3265
alt. junction found at 3277
alt. junction found at 3279
alt. junction found at 230
alt. junction found at 1837
alt. junction found at 3118
alt. junction found at 3154
alt. junction found at 24
alt. junction found at 1893
Traceback (most recent call last):
File "/home/jiaochen/miniconda3/bin/collapse_isoforms_by_sam.py", line 4, in
import('pkg_resources').run_script('cupcake==5.3', 'collapse_isoforms_by_sam.py')
File "/home/jiaochen/miniconda3/lib/python2.7/site-packages/pkg_resources/init.py", line 741, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/jiaochen/miniconda3/lib/python2.7/site-packages/pkg_resources/init.py", line 1502, in run_script
exec(code, namespace, namespace)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 245, in
main(args)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 204, in main
collapse_fuzzy_junctions(f_good.name, f_txt.name, args.allow_extra_5exon, internal_fuzzy_max_dist=args.max_fuzzy_junction)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 149, in collapse_fuzzy_junctions
_size = get_fl_from_id(group_info[pbid])
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in get_fl_from_id
return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in
return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members)
IndexError: list index out of range
Do you know what's wrong?

If I only use the HQ data, I can get the final results without any error. But 10% of the HQ sequences were ignored because of the coverage issues, is that normal?

Thanks!

Chen

Magdoll · 2018-05-23T19:42:05Z

Hi @jayceejiao ,

The collapse scripts expect a certain sequence ID format that the RNA-seq data is not following. It needs to look like this:

>HQ_sample1|cb123_10/f10p0/1234

it's looking for the "f10p0" which needs to be after the right "/" symbol.

YOu can use a Python script to force RNA-seq/Trinity data to have that:

from Bio import SeqIO
f = open('Trinity.modified.fasta', 'w')
i = 1
for r in SeqIO.parse(open('Trinity.fasta'),'fasta'):
    f.write(">HQ_Trinity|cb1_{0}/f2p0/{1} {2}\n".format(i, len(r.seq), r.id))
    f.write("{0}\n".format(r.seq))
    i += 1
f.close()

--Liz

jayceejiao · 2018-05-23T20:24:53Z

Hi @Magdoll,

I have changed the transcript id and got the final representative transcriptome. Thanks!

Magdoll closed this as completed May 23, 2018

Magdoll mentioned this issue May 23, 2018

--max_fuzzy_junction=0 Magdoll/cDNA_Cupcake#34

Closed

mathog mentioned this issue Jun 26, 2018

Excessive clustering observed for one gene #39

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The problem of reconstruct_contig.py #13

The problem of reconstruct_contig.py #13

crystal0721 commented Nov 9, 2017

Magdoll commented Nov 9, 2017

crystal0721 commented Nov 10, 2017

Magdoll commented Nov 13, 2017 •

edited

Loading

crystal0721 commented Nov 14, 2017

Magdoll commented Dec 5, 2017

jayceejiao commented May 23, 2018

Magdoll commented May 23, 2018

jayceejiao commented May 23, 2018

The problem of reconstruct_contig.py #13

The problem of reconstruct_contig.py #13

Comments

crystal0721 commented Nov 9, 2017

Magdoll commented Nov 9, 2017

crystal0721 commented Nov 10, 2017

Magdoll commented Nov 13, 2017 • edited Loading

crystal0721 commented Nov 14, 2017

Magdoll commented Dec 5, 2017

jayceejiao commented May 23, 2018

Magdoll commented May 23, 2018

jayceejiao commented May 23, 2018

Magdoll commented Nov 13, 2017 •

edited

Loading