Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The problem of reconstruct_contig.py #13

Closed
crystal0721 opened this issue Nov 9, 2017 · 8 comments
Closed

The problem of reconstruct_contig.py #13

crystal0721 opened this issue Nov 9, 2017 · 8 comments

Comments

@crystal0721
Copy link

hi, Dear
when I test reconstruct_contig.py,there is a problem as follows:
reconstruct_contig.py -h
Traceback (most recent call last):
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/bin/reconstruct_contig.py", line 4, in
import('pkg_resources').run_script('Cogent==2.1', 'reconstruct_contig.py')
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/pkg_resources/init.py", line 748, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/pkg_resources/init.py", line 1517, in run_script
exec(code, namespace, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/Cogent-2.1-py2.7.egg/EGG-INFO/scripts/reconstruct_contig.py", line 301, in
args = parser.parse_args()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1701, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1733, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1939, in _parse_known_args
start_index = consume_optional(start_index)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1879, in consume_optional
take_action(action, args, option_string)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1807, in take_action
action(self, namespace, argument_values, option_string)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 996, in call
parser.print_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 2340, in print_help
self._print_message(self.format_help(), file)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 2314, in format_help
return formatter.format_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 281, in format_help
help = self._root_section.format_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 211, in format_help
func(*args)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 211, in format_help
func(*args)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 517, in _format_action
help_text = self._expand_help(action)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 603, in _expand_help
return self._get_help_string(action) % params
ValueError: unsupported format character ')' (0x29) at index 32

The python version is 2.7.14,could you tell be how to solove it,thank you very much.

Crystal

@Magdoll
Copy link
Owner

Magdoll commented Nov 9, 2017

Hi @crystal0721 ,

reconstruct_contig.py currently does not support the -h option, despite what it says when you directly type the command. If you want to know what options are avail just type the program name itself. Sorry about that. I'll fix it in the next release.

(anaCogent)etseng@login14-biofx02:~$ reconstruct_contig.py
usage: reconstruct_contig.py [-h] [-e EXPECTED_ERROR_RATE]
                             [--nx_cycle_detection] [-k KMER_SIZE]
                             [-p OUTPUT_PREFIX] [-D GMAP_DB_PATH]
                             [-d GMAP_SPECIES] [--small_genome] [--version]
                             [--debug]
                             dirname
reconstruct_contig.py: error: too few arguments

-Liz

@crystal0721
Copy link
Author

hi Magdoll,
Thank you very much for your replay and I had soloved. While having something that I want to get help or suggestion from you. There is a way that can combine transcripts produced by Trinity and Full-length transcripts produced by SMRTlink or SMRTanalysis besides CDHIT,the results we want to obtain is GeneID and its corresponding Transcripts ID in order to the following analysis.

Crystal

@Magdoll
Copy link
Owner

Magdoll commented Nov 13, 2017

Hi @crystal0721 ,

One possibility is to combine {Trinity} + {HQ isoforms from SMRTLink/SMRTAnalysis} into a single fasta file and run it through Cogent. After reconstruction, use the collapse script per this tutorial to collapse any redundant transcripts. The collapse script should not care if the sequence comes from Trinity or HQ. Let me know if you run into issues.

--Liz

@crystal0721
Copy link
Author

hi Magdoll,
Thank you for replaying. I would know what you mean,mybe I will try it.While there are some questions I want to discuss.When we use cogent to collapse redundant transcripts to obtain cogent2.renamed.fasta which was considered as genome,why we use Cupcake to redundant transcript again?I thought the cogent2.renamed.fasta was non-redundant sequence which could be as genome sequence of Pacbio data.
Now I had finish Cogent and obtained cogent2.renamed.fasta for Pacbio data. For illumina data I could choose the long transcript as unigene. There is no way could combine cogent2.renamed.fasta and unigene. Or this way is not good than above(combine transcript from Trinity and SMRT using cogent and redundanting by Cupcake.)
I hope the questions was described clearly and doest not bother you. Thank you again.

Crystal

@Magdoll
Copy link
Owner

Magdoll commented Dec 5, 2017

Hi @crystal0721 ,

cogent2.renamed.fasta is the reconstructed "fake" genome that consists of just the transcribed coding portions of a genome. So yes, you are correct that "cogent2.renamed.fasta was non-redundant sequence which could be as genome sequence of Pacbio data".

I was answering your question in how to combine (Illumina) Trinity data with PacBio HQ isoforms. The reason I suggested combining Trinity + HQ isoforms and run Cogent is to get the union of the genes the two platforms cover.

Another possibility, which you have hinted, is:

  1. run Cogent on HQ (which you've done)
  2. align both HQ + Trinity to cogent2.renamed.fasta and collapse. All HQ will align. Some Trinity data may not. The ones from Trinity that did not map will be genes seen in Illumina only.
  3. extract the Illumina-only Trininty sequences and pick the longest transcript as unigenes, or run CD-HIT, or whatever other tool you find suitable.

The combination of {2} + {3} will then get you the combined final dataset.

--Liz

@jayceejiao
Copy link

Hi @Magdoll,

I also want to combine the HQ and rnaspades assembly to generate a complete transcriptome. I used the above strategy:

  1. run Cogent on HQ (which you've done)
  2. align both HQ + Trinity to cogent2.renamed.fasta and collapse.
    I have concatenated the sequences from both sources and collapsed them using Cupcake. But I can't get the results. I received the following message:

alt. junction found at 1910
alt. junction found at 1957
alt. junction found at 2413
alt. junction found at 3265
alt. junction found at 3277
alt. junction found at 3279
alt. junction found at 230
alt. junction found at 1837
alt. junction found at 3118
alt. junction found at 3154
alt. junction found at 24
alt. junction found at 1893
Traceback (most recent call last):
File "/home/jiaochen/miniconda3/bin/collapse_isoforms_by_sam.py", line 4, in
import('pkg_resources').run_script('cupcake==5.3', 'collapse_isoforms_by_sam.py')
File "/home/jiaochen/miniconda3/lib/python2.7/site-packages/pkg_resources/init.py", line 741, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/jiaochen/miniconda3/lib/python2.7/site-packages/pkg_resources/init.py", line 1502, in run_script
exec(code, namespace, namespace)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 245, in
main(args)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 204, in main
collapse_fuzzy_junctions(f_good.name, f_txt.name, args.allow_extra_5exon, internal_fuzzy_max_dist=args.max_fuzzy_junction)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 149, in collapse_fuzzy_junctions
_size = get_fl_from_id(group_info[pbid])
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in get_fl_from_id
return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members)
File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in
return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members)
IndexError: list index out of range
Do you know what's wrong?

If I only use the HQ data, I can get the final results without any error. But 10% of the HQ sequences were ignored because of the coverage issues, is that normal?

Thanks!

Chen

@Magdoll
Copy link
Owner

Magdoll commented May 23, 2018

Hi @jayceejiao ,

The collapse scripts expect a certain sequence ID format that the RNA-seq data is not following. It needs to look like this:

>HQ_sample1|cb123_10/f10p0/1234

it's looking for the "f10p0" which needs to be after the right "/" symbol.

YOu can use a Python script to force RNA-seq/Trinity data to have that:

from Bio import SeqIO
f = open('Trinity.modified.fasta', 'w')
i = 1
for r in SeqIO.parse(open('Trinity.fasta'),'fasta'):
    f.write(">HQ_Trinity|cb1_{0}/f2p0/{1} {2}\n".format(i, len(r.seq), r.id))
    f.write("{0}\n".format(r.seq))
    i += 1
f.close()    

--Liz

@jayceejiao
Copy link

Hi @Magdoll,

I have changed the transcript id and got the final representative transcriptome. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants