Bugs in daijin #209

Closed
asdcid opened this issue Aug 22, 2019 · 8 comments

asdcid commented Aug 22, 2019

Hi,

I tried to run daijin assemble:

daijin assemble -nd -C 20 -nd daijin.yaml

And then I got this error:

KeyError in line 40 of /home/raymond/devel/python/thirdparty/anaconda2/envs/mikado1.5/lib/python3.6/site-packages/Mikado-2.0rc4-py3.6-linux-x86_64.egg/Mikado/daijin/tr.snakefile:
'pick'
  File "/home/raymond/devel/python/thirdparty/anaconda2/envs/mikado1.5/lib/python3.6/site-packages/Mikado-2.0rc4-py3.6-linux-x86_64.egg/Mikado/daijin/tr.snakefile", line 40, in <module>

The configuration file created by daijin configure is:

#  This is a standard configuration file for Daijin. Fields:
#  - short_reads: this section deals with RNA-Seq short read input data.
#  - name: name of the species under analysis.
#  - reference: reference data to use. A reference genome is required.
align_methods:
  hisat:
  - ''
asm_methods:
  class:
  - ''
  cufflinks:
  - ''
  scallop:
  - ''
  stringtie:
  - ''
  trinity:
  - ''
  trinitydn: false
blastx:
  chunks: 1
  evalue: 1.0e-07
  max_target_seqs: 10
  prot_db:
  - Reference/Aedes_aegypti.fasta
extra:
  #  Options related to indexing.
  star_index: ''
long_read_align_methods: {}
long_reads:
  #  Parameters related to long reads to use for the assemblies.
  files: []
  samples: []
  skip_split: true
  strandedness: []
mikado:
  db_settings:
    #  Settings related to DB connection. Parameters:
    #  db: the DB to connect to. Required. Default: mikado.db
    #  dbtype: Type of DB to use. Choices: sqlite, postgresql, mysql. Default: sqlite.
    #  dbhost: Host of the database. Unused if dbtype is sqlite. Default: localhost
    #  dbuser: DB user. Default:
    #  dbpasswd: DB password for the user. Default:
    #  dbport: Integer. It indicates the default port for the DB.
    db: mikado.db
    dbhost: localhost
    dbpasswd: ''
    dbport: 0
    dbtype: sqlite
    dbuser: ''
  modes:
  - permissive
  use_diamond: true
  use_prodigal: true
name: Dmelanogaster
out_dir: Dmelanogaster
portcullis:
  #  Options related to portcullis
  canonical_juncs: C,S
  do: true
reference:
  genome: Reference/Drosophila_melanogaster.BDGP6.dna.toplevel.fa
  genome_fai: ''
  transcriptome: ''
scheduler: ''
short_reads:
  #  Parameters related to the reads to use for the assemblies. Voices:
  #  - r1: array of left read files.
  #  - r2: array of right read files. It must be of the same length of r1; if one
  #    one or more of the samples are single-end reads, add an empty string.
  #  - samples: array of the sample names. It must be of the same length of r1.
  #  - strandedness: array of strand-specificity of the samples. It must be of the
  #    same length of r1. Valid values: fr-firststrand, fr-secondstrand, fr-unstranded.
  r1:
  - Reference/Reads/ERR1662533_1.fastq.gz
  r2:
  - Reference/Reads/ERR1662533_2.fastq.gz
  samples:
  - ERR1662533
  strandedness:
  - fr-unstranded
tgg:
  #  Options related to genome-guided Trinity.
  coverage: 0.7
  identity: 0.95
  max_mem: 6000
  npaths: 0
threads: 2
transdecoder:
  execute: true
  min_protein_len: 30

The version I used is Mikado v2.0rc4. Compared with v1.2.4 from conda, the configuration file generated by v2.0rc4 seems to be missing intron_len, scoring_file and other settings.
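A KeyError such as the one in the traceback usually means that the pipeline looked up a configuration key that the generated file does not contain. The following is a minimal sketch of a pre-flight check, not Mikado's own validation: the helper `find_missing_keys`, the trimmed-down `config` dictionary and the list of required paths are all hypothetical, and placing `pick` at the top level is an assumption based only on the key name in the traceback.

```python
from typing import Any, Dict, List, Tuple


def find_missing_keys(config: Dict[str, Any],
                      required: List[Tuple[str, ...]]) -> List[str]:
    """Return the dotted paths from `required` that are absent in `config`."""
    missing = []
    for path in required:
        node: Any = config
        for key in path:
            # Stop at the first level where the key cannot be found.
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Trimmed-down stand-in for the configuration posted above.
config = {
    "mikado": {"modes": ["permissive"], "use_diamond": True},
    "threads": 2,
}

# Hypothetical list of required paths; only the name "pick" comes from
# the traceback, and its top-level position is an assumption.
required = [("mikado", "modes"), ("pick",), ("threads",)]

print(find_missing_keys(config, required))
```

Running a check like this before starting the workflow reports every missing section at once instead of failing on the first lookup.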

Also, there seem to be some bugs in Mikado/daijin/tr.snakefile:
line 389
@functools.lru_cahe(maxsize=4, typed=True) (lru_cache is misspelled; the second "c" is missing),
line 709
output: touch(os.path.join(ALIGN_DIR, "gmap", "index", NAME, "index.done") #os.path.join(ALIGN_DIR, "gmap", "index", NAME, NAME+".sachildguide1024")
(the closing ")" for touch( is missing after "index.done")).
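For reference, the decorator in the standard library is `functools.lru_cache`. The sketch below shows that the misspelled name fails as soon as it is looked up, and that the corrected decorator caches repeated calls. The function `sample_label` and the `calls` list are hypothetical and are not taken from tr.snakefile.

```python
import functools

# Looking up the misspelled name fails immediately, which is why the
# Snakefile cannot even be loaded with the typo in place.
try:
    functools.lru_cahe
except AttributeError as exc:
    print("typo fails with", type(exc).__name__)

calls = []


@functools.lru_cache(maxsize=4, typed=True)
def sample_label(name):
    """Hypothetical helper that records each real invocation."""
    calls.append(name)
    return name.upper()


sample_label("ERR1662533")
sample_label("ERR1662533")  # served from the cache, so not recorded again
print(len(calls))
```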

In the rule asm_map_trinitygg, the variable SAMPLE_MAP[wildcards.sample] and params.strandedness.

Cheers,
Raymond

lucventurini self-assigned this Aug 24, 2019
lucventurini added this to the 2.0 milestone Aug 24, 2019
@lucventurini
Collaborator

Dear @asdcid , thank you for reporting this. We are planning to retire daijin assemble soon, but I will try to fix the bugs you found as quickly as possible.

The problem most likely stems from the fact that I recently reorganised the configuration file (it had become sprawling, with a lot of duplicated values). Hopefully it should not take too long to fix.

@lucventurini
Collaborator

Dear @asdcid , I should have solved the small issues you reported. I will keep this report open until I have put proper tests for daijin assemble in place.

Many thanks for reporting this; we would have released with a buggy Snakefile otherwise.

@lucventurini
Collaborator

Dear @asdcid , I have now implemented a proper test for daijin assemble. While doing so, today I fixed a very large number of bugs in the pipeline.

Once the travis check completes successfully, I will merge back into the master branch and close the issue.

Thank you again for reporting and prodding me to clean up the code in this section.

@asdcid
Author

asdcid commented Sep 17, 2019

Thanks for your help.
However, I have another question. I tried to run mikado (with permissive mode) with trinity and scallop assembly results, but it seems that the BUSCO complete score in the final mikado result pick/mikado-permissive.loci.gff3 is pretty low (~30%, vs ~90% for original trinity or scallop assemblies). Do you have any idea about that?

Thank you.

@asdcid
Author

asdcid commented Sep 17, 2019

I think I found the answer. I am using daijin mikado, and it seems that neither blastx nor diamond was run.

Also, it seems that --use-diamond is always true in the configuration file, even if I set --use-blast in mikado configure.

The dag file for daijin mikado is attached:
dag.pdf

@lucventurini
Collaborator

Dear @asdcid, thank you again for your report. May I ask whether you specified one or more protein FASTA files during configuration? If they are missing, that would explain why mikado did not perform a blast run.

I will check and correct the bug regarding --use-blast as soon as possible.
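The two points above (no homology search without protein FASTA files, and the choice between DIAMOND and BLAST) can be sketched as a small decision function. This is an illustration under stated assumptions, not daijin's actual logic: the function `homology_search_tool` is hypothetical, the key names `blastx.prot_db` and `mikado.use_diamond` are taken from the configuration file posted earlier in this thread, and treating a missing `use_diamond` as false is an assumption.

```python
from typing import Any, Dict


def homology_search_tool(config: Dict[str, Any]) -> str:
    """Guess which protein search, if any, a run would perform."""
    # Ignore empty strings, which the generated file uses as placeholders.
    prot_db = [p for p in config.get("blastx", {}).get("prot_db", []) if p]
    if not prot_db:
        # Without protein databases there is nothing to search against.
        return "none"
    # Assumption: a missing use_diamond entry is treated as false.
    use_diamond = config.get("mikado", {}).get("use_diamond", False)
    return "diamond" if use_diamond else "blastx"


with_db = {
    "blastx": {"prot_db": ["Reference/Aedes_aegypti.fasta"]},
    "mikado": {"use_diamond": True},
}
without_db = {"blastx": {"prot_db": []}, "mikado": {"use_diamond": True}}

print(homology_search_tool(with_db))
print(homology_search_tool(without_db))
```

Checking the configuration this way before a run makes it easier to see whether a skipped search comes from an empty database list or from the DIAMOND/BLAST switch.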

@lucventurini
Collaborator

Dear @asdcid , unfortunately I cannot reproduce the bug regarding --use-blast with the latest version of the code. I just tried it and daijin correctly used BLAST+ instead of DIAMOND.

I am now testing it in Travis (see https://travis-ci.org/lucventurini/mikado/jobs/585974482), where I can confirm that the bug does not present itself.

Regarding your more concerning point:

However, I have another question. I tried to run mikado (with permissive mode) with trinity and scallop assembly results, but it seems that the BUSCO complete score in the final mikado result pick/mikado-permissive.loci.gff3 is pretty low (~30%, vs ~90% for original trinity or scallop assemblies). Do you have any idea about that?

This is indeed not great. Please let me know if adding BLAST datasets solves the issue. If it does not, I will create another ticket to investigate the matter.

lucventurini added a commit that referenced this issue Sep 17, 2019
* Solved a number of small bugs. Added files for testing (small fastq and very small PacBio reads).

* Solved outstanding bugs in the environments etc. Ready to implement the test for #209

* Now daijin assemble will be properly tested by the sample_data Snakefile. Also, Trinity runs will not pollute the disk, we will delete all temporary files.

* Solved a couple of small bugs, removed STARlong from the testing aligners as it was constantly crashing on Travis.

* This should fix Travis.
@lucventurini
Collaborator

Closing as now daijin performs as expected.
@asdcid , please let me know about BUSCO. If it is still not behaving properly, we will open another ticket.

lucventurini added this to Closed in Version 2 Oct 15, 2020
lucventurini added a commit to lucventurini/mikado that referenced this issue Feb 11, 2021