Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mikado serialise: sqlalchemy.exc.InvalidRequestError: No transaction is begun. #206

Closed
alelim-bio opened this issue Aug 5, 2019 · 6 comments
Assignees
Labels
Projects
Milestone

Comments

@alelim-bio
Copy link

alelim-bio commented Aug 5, 2019

Hello @lucventurini ,

I have moved onto mikado serialise and have ran into another error. I was wondering if I could get your help?

Below is the error that I am currently getting.

2019-08-05 18:12:49,628 - main - __init__.py:123 - ERROR - main - MainProcess - Mikado crashed, cause:
2019-08-05 18:12:49,629 - main - __init__.py:124 - ERROR - main - MainProcess - No transaction is begun.
Traceback (most recent call last):
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/__init__.py", line 109, in main
    args.func(args)
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/subprograms/serialise.py", line 336, in serialise
    load_orfs(args, logger)
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/subprograms/serialise.py", line 145, in load_orfs
    serializer()
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/serializers/orf.py", line 317, in __call__
    self.serialize()
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/serializers/orf.py", line 291, in serialize
    self.session.commit()
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 1024, in commit
    raise sa_exc.InvalidRequestError("No transaction is begun.")
sqlalchemy.exc.InvalidRequestError: No transaction is begun.

Here is the command I am currently using:

mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs ../stringtie/longest_orfs.pep.transdecoder.bed

Additionally, I have attached a toy dataset of what I am working with.

Kind Regards,

Alex Lim

Question unrelated to the error if you have some time:

I was wondering if it was possible to learn what was done to generate the .gff3 file from the Trinity transcripts using GMAP. If my understanding is correct usually you get a standard alignment file in (.sam/.bam) format?

toy_bed.zip

@lucventurini lucventurini self-assigned this Aug 6, 2019
@lucventurini lucventurini added this to the 2.0 milestone Aug 6, 2019
@lucventurini
Copy link
Collaborator

Hi @AsclepiusDoc, many thanks for reporting this. I will try to replicate and solve it ASAP.

@lucventurini
Copy link
Collaborator

Dear @AsclepiusDoc , I was able to reproduce the bug. I will fix it today.
However, the bug was triggered because you provided Mikado serialise with the wrong input file - ie ORFs called on the input transcript assemblies for Mikado prepare (StringTie, if I am not mistaken) rather than the output of Mikado prepare itself.

To avoid this, I would recommend using the pipeline manager we developed - Daijin, included in Mikado - to perform the analysis on cotton. Specifically, Daijin can be used to run all the mikado steps.

In order:
1- run mikado configure, as you did before, but adding the --daijin switch.
a- if you want to use TransDecoder instead of Prodigal, please add --use-transdecoder to the command line
b- if you want to use BLAST+ instead of DIAMOND, please add --use-blast to the command line
2- run daijin to perform the Mikado pipeline:

daijin mikado --jobs {maximum number of jobs} --cores {# of cores} --threads {# of cores} -nd --nolock configuration.yaml

This will ensure that the steps are executed correctly.
Further information can be found on the online documentation.

Thank you for reporting the bug!

@lucventurini
Copy link
Collaborator

PS:

I was wondering if it was possible to learn what was done to generate the .gff3 file from the Trinity transcripts using GMAP. If my understanding is correct usually you get a standard alignment file in (.sam/.bam) format?

gmap can return a GFF3 output if called with the -f 2 switch.

lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 7, 2019
@lucventurini
Copy link
Collaborator

Dear @AsclepiusDoc , now Mikado will die informatively when a situation like yours arises; this should make it easier to spot a problem in the pipeline's usage.

The rationale is that if Mikado had not died during serialisation, the data present in the database would have been useless - Mikado would have had no way to link the ORFs with the transcripts present in the GTF (as their names change due to the juxtaposition of a label).

lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 7, 2019
lucventurini added a commit to lucventurini/mikado that referenced this issue Aug 7, 2019
@alelim-bio
Copy link
Author

alelim-bio commented Aug 7, 2019

Hello @lucventurini ,

Thank you so much for your help! That did the trick and I was able to get Mikado outputs. Additionally, I followed your suggestion and was able to get a Trinity output for Mikado however, I seem to have ran into another parsing issue with the Trinity .gff3 file. I have attached the error here if that is okay?

Error:

2019-08-07 13:05:09,936 - prepare - prepare.py:496 - ERROR - prepare - MainProcess - tr_TRINITY_GG_1628_c45_g1_i1.mrna1 is present multiple times in /pylon5/mc5fr6p/alelim/ACTIVE_ANALYSIS/cotton/transcript_assembly/SA_1766/genome_guided_trinity/gmap_alignment/cleaned_trinity_gmap.gff3(label: tr). This breaks the input parsing.
        Please ensure that the file is properly formatted, with unique IDs.
Traceback (most recent call last):
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 466, in prepare
    max_intron=args.json_conf["prepare"]["max_intron_length"],)
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 373, in load_exon_lines
    _load_exon_lines_single_thread(args, shelve_names, logger, min_length, strip_cds, max_intron)
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 270, in _load_exon_lines_single_thread
    strand_specific=strand_specific or is_reference)
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/preparation/annotation_parser.py", line 438, in load_from_gff
    __raise_invalid(tid, gff_handle.name, label)
  File "/pylon5/mc5fr6p/alelim/CONDA/anaconda3/envs/bio/lib/python3.6/site-packages/Mikado/preparation/annotation_parser.py", line 169, in __raise_invalid
    "(label: {0})".format(label) if label != '' else ""))
Mikado.exceptions.InvalidAssembly: tr_TRINITY_GG_1628_c45_g1_i1.mrna1 is present multiple times in /pylon5/mc5fr6p/alelim/ACTIVE_ANALYSIS/cotton/transcript_assembly/SA_1766/genome_guided_trinity/gmap_alignment/cleaned_trinity_gmap.gff3(label: tr). This breaks the input parsing.
        Please ensure that the file is properly formatted, with unique IDs.

Additionally, I grepped the string that was reported to be non-unique and got the following results:

10000001        SA1766  mRNA    38780180        40626956        .       -       .       ID=TRINITY_GG_1628_c45_g1_i1.mrna1;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.path1;coverage=100.0;identity=99.6;matches=232;mismatches=1;indels=0;unknowns=0
10000001        SA1766  exon    40626876        40626956        100     -       .       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.exon1;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 81 1 .
10000001        SA1766  exon    38785950        38785957        100     -       .       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.exon2;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 97 90 .
10000001        SA1766  exon    38781015        38781024        100     -       .       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.exon3;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 107 98 .
1       SA1766  exon    38780180        38780313        99      -       .       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.exon4;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 243 110 .
10000001        SA1766  CDS     40626876        40626913        100     -       0       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.cds1;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 81 44 .
10000001        SA1766  CDS     38785950        38785957        100     -       1       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.cds2;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 97 90 .
10000001        SA1766  CDS     38781015        38781024        100     -       0       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.cds3;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 107 98 .
10000001        SA1766  CDS     38780296        38780313        100     -       0       ID=TRINITY_GG_1628_c45_g1_i1.mrna1.cds4;Name=TRINITY_GG_1628_c45_g1_i1;Parent=TRINITY_GG_1628_c45_g1_i1.mrna1;Target=TRINITY_GG_1628_c45_g1_i1 127 110 .

Furthermore, I will look into the Daijin pipeline as you have recommended. Once again thank you for all your help!

Kind Regards,

Alex Lim

@lucventurini
Copy link
Collaborator

Dear @AsclepiusDoc, the problem is that one of the lines is on chromosome 1 (rather than 10000001). That is clearly a mistake on GMAP part, as the culprit exon is actually missing from the correct chromosome (it's the starting exon of the model).

Mikado presumes that models should belong to the same scaffold /chromosome, so this is flagged as a clear a mistake. I do not know why it happened; it looks like a bug in GMAP. I would try a different version of the program to see whether the bug represents itself again.

Glad to hear that my suggestions solved the problem for you!

@lucventurini lucventurini added this to Closed in Version 2 Oct 15, 2020
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021
* Fix EI-CoreBioinformatics#206. Now Mikado serialise will raise an error and exit when trying to add ORFs for transcripts that have not been serialised (if any has already). This will avoid a common mistake (see in EI-CoreBioinformatics#206) whereupon experimenters will try to use ORFs called on Mikado prepare's **inputs** rather than its output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

No branches or pull requests

2 participants