gatk CreateSequenceDictionary command issue #86

Closed
readline opened this issue Mar 25, 2021 · 10 comments

Comments

@readline

I'm preparing the CTAT mutation library with the official Singularity image. However, an error occurred:

`>singularity exec -e -B /gs9,/data,/home,/lscratch /path/to/ctat_mutations.v3.0.0.simg /usr/local/src/ctat-mutations/mutation_lib_prep/ctat-mutation-lib-integration.py --genome_lib_dir /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir
2021-03-24 22:53:13,315: Generating /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.dict
Using GATK jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.fa -O /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.dict VALIDATION_STRINGENCY=LENIENT
INFO 2021-03-25 02:53:15 CreateSequenceDictionary

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** CreateSequenceDictionary -R /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.fa -O /path/to/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.dict -VALIDATION_STRINGENCY LENIENT


ERROR: Invalid argument '-R'.

USAGE: CreateSequenceDictionary [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary

Creates a sequence dictionary for a reference sequence. This tool creates a sequence dictionary file (with ".dict"
extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
The output file contains a header but no SAMRecords, and the header contains only sequence records.

The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
Usage example:

java -jar picard.jar CreateSequenceDictionary \
R=reference.fasta \
O=reference.dict`

After I modified ctat-mutation-lib-integration.py so that the command uses the "R=reference.fasta O=reference.dict" format, it worked.
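
The workaround described above can be sketched in Python as follows. This is only an illustration, not the actual patch to ctat-mutation-lib-integration.py: the function name `build_legacy_dict_cmd` and the short file names are hypothetical, and the sketch assumes the script builds the command as a list and hands it to `subprocess.check_call`, as the traceback later in this thread suggests. It passes every argument in the KEY=VALUE form shown in the tool's usage message.

```python
def build_legacy_dict_cmd(ref_fasta, out_dict, gatk="gatk"):
    """Build a CreateSequenceDictionary command using only KEY=VALUE arguments.

    Hypothetical helper for illustration; the real script may be organized differently.
    """
    return [
        gatk,
        "CreateSequenceDictionary",
        f"R={ref_fasta}",                 # legacy form of the reference argument
        f"O={out_dict}",                  # legacy form of the output argument
        "VALIDATION_STRINGENCY=LENIENT",  # already in legacy form in the failing command
    ]


# Placeholder file names; substitute the paths inside ctat_genome_lib_build_dir.
cmd = build_legacy_dict_cmd("ref_genome.fa", "ref_genome.dict")
print(" ".join(cmd))
# prints: gatk CreateSequenceDictionary R=ref_genome.fa O=ref_genome.dict VALIDATION_STRINGENCY=LENIENT

# To actually run it inside the container:
#   import subprocess
#   subprocess.check_call(cmd)
```

The point of the sketch is consistency: no argument starts with a dash, so the command no longer mixes `-R`/`-O` with `VALIDATION_STRINGENCY=LENIENT`.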

@joshua-gould
Contributor

joshua-gould commented Mar 25, 2021 via email

@joshua-gould
Contributor

I updated the Singularity image. Please re-download it. Thanks for reporting.

@ConcettaDe4

Hi! I am running the mutation lib integration utility from the Singularity image ctat_mutations.v3.0.0.simg with the following command:

singularity exec -e -B /path/to/your/ctat_genome_lib_build_dir \
     ctat-mutations.simg \
     /usr/local/src/ctat-mutations/mutation_lib_prep/ctat-mutation-lib-integration.py \
     --genome_lib_dir /path/to/your/ctat_genome_lib_build_dir

Unfortunately, I got this error: ERROR: Invalid argument '-R'.

Do you have any suggestions for fixing the problem?
Thank you.

Concetta

@brianjohnhaas
Collaborator

brianjohnhaas commented May 13, 2021 via email

@ConcettaDe4

Here is the error message:

INFO:    Converting SIF file to temporary sandbox...
2021-05-13 15:28:14,332: Generating /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict
Using GATK jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/src/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.fa -O /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict VALIDATION_STRINGENCY=LENIENT
INFO    2021-05-13 15:28:20     CreateSequenceDictionary

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CreateSequenceDictionary -R /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.fa -O /home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict -VALIDATION_STRINGENCY LENIENT
**********


ERROR: Invalid argument '-R'.

USAGE: CreateSequenceDictionary [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary

Creates a sequence dictionary for a reference sequence.  This tool creates a sequence dictionary file (with ".dict"
extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools.
The output file contains a header but no SAMRecords, and the header contains only sequence records.

The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
Usage example:

java -jar picard.jar CreateSequenceDictionary \
R=reference.fasta \
O=reference.dict


Version: 4.1.9.0


Options:

--help
-h                            Displays options specific to this tool.

--stdhelp
-H                            Displays options specific to this tool AND options common to all Picard command line
                              tools.

--version                     Displays program version.

OUTPUT=File
O=File                        Output SAM file containing only the sequence dictionary. By default it will use the base
                              name of the input reference with the .dict extension  Default value: null.

GENOME_ASSEMBLY=String
AS=String                     Put into AS field of sequence dictionary entry if supplied  Default value: null.

URI=String
UR=String                     Put into UR field of sequence dictionary entry.  If not supplied, input reference file is
                              used  Default value: null.

SPECIES=String
SP=String                     Put into SP field of sequence dictionary entry  Default value: null.

TRUNCATE_NAMES_AT_WHITESPACE=Boolean
                              Make sequence name the first word from the > line in the fasta file.  By default the
                              entire contents of the > line is used, excluding leading and trailing whitespace.  Default
                              value: true. This option can be set to 'null' to clear the default value. Possible values:
                              {true, false}

NUM_SEQUENCES=Integer         Stop after writing this many sequences.  For testing.  Default value: 2147483647. This
                              option can be set to 'null' to clear the default value.

ALT_NAMES=File
AN=File                       Optional file containing the alternative names for the contigs. Tools may use this
                              information to consider different contig notations as identical (e.g: 'chr1' and '1'). The
                              alternative names will be put into the appropriate @AN annotation for each contig. No
                              header. First column is the original name, the second column is an alternative name. One
                              contig may have more than one alternative name.  Default value: null.

REFERENCE=File
R=File                        Input reference fasta or fasta.gz  Required.

Tool returned:
1
Traceback (most recent call last):
  File "/usr/local/src/ctat-mutations/mutation_lib_prep/ctat-mutation-lib-integration.py", line 66, in <module>
    subprocess.check_call(cmd)
  File "/opt/conda/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gatk', 'CreateSequenceDictionary', '-R', '/home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.fa', '-O', '/home/stefania/RNAseq_project/test_ctat_mutations/GRCh38_gencode_v22_CTAT_lib_Mar012021/ctat_genome_lib_build_dir/ref_genome.dict', 'VALIDATION_STRINGENCY=LENIENT']' returned non-zero exit status 4.
INFO:    Cleaning up image...
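
The command list in the traceback above combines dash-style arguments (`-R`, `-O`) with a KEY=VALUE argument (`VALIDATION_STRINGENCY=LENIENT`). Judging from the log, that mix appears to be why the tool rejects `-R`, although this is an inference from the output rather than something confirmed in this thread. A minimal Python sketch for spotting such a mix before launching the tool is shown below; the helper name `argument_styles` is hypothetical and the paths are shortened for readability.

```python
import re


def argument_styles(args):
    """Return the set of argument styles found in a list of tool arguments.

    'dash'   : arguments such as -R or --version
    'legacy' : arguments such as R=ref.fa or VALIDATION_STRINGENCY=LENIENT
    Plain values (for example file paths) are ignored.
    """
    styles = set()
    for arg in args:
        if arg.startswith("-"):
            styles.add("dash")
        elif re.match(r"[A-Z_]+=", arg):
            styles.add("legacy")
    return styles


# Arguments from the failing command, with paths shortened for illustration.
failing_args = ["-R", "ref_genome.fa", "-O", "ref_genome.dict",
                "VALIDATION_STRINGENCY=LENIENT"]

styles = argument_styles(failing_args)
if len(styles) > 1:
    print("Mixed argument styles:", sorted(styles))
# prints: Mixed argument styles: ['dash', 'legacy']
```

A check like this could run just before `subprocess.check_call(cmd)` so that an inconsistent command fails early with a clearer message.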

@brianjohnhaas
Collaborator

brianjohnhaas commented May 13, 2021 via email

@brianjohnhaas
Collaborator

brianjohnhaas commented May 13, 2021 via email

@brianjohnhaas
Collaborator

brianjohnhaas commented May 13, 2021 via email

@ConcettaDe4

Hi!
Sorry for my late reply. Now it works!

Thank you for your help.

Best,

Concetta

@brianjohnhaas
Collaborator

brianjohnhaas commented May 18, 2021 via email

4 participants