download failed #83

Closed
snayfach opened this issue May 19, 2022 · 12 comments

Comments

@snayfach
Collaborator

$ midas2 run_species --sample_name ${sample_name} -1 reads/${sample_name}_R1.fastq.gz --midasdb_name uhgg --midasdb_dir my_midasdb_uhgg --num_cores 8 my_midas2_output

1652982326.0: Species abundance estimation in subcommand run_species with args
1652982326.0: {
1652982326.0: "subcommand": "run_species",
1652982326.0: "force": false,
1652982326.0: "debug": false,
1652982326.0: "zzz_worker_mode": false,
1652982326.0: "batch_branch": "master",
1652982326.0: "batch_memory": 378880,
1652982326.0: "batch_vcpus": 48,
1652982326.0: "batch_queue": "pairani",
1652982326.0: "batch_ecr_image": "pairani:latest",
1652982326.0: "midas_outdir": "my_midas2_output",
1652982326.0: "sample_name": "sample1",
1652982326.0: "r1": "reads/sample1_R1.fastq.gz",
1652982326.0: "r2": null,
1652982326.0: "midasdb_name": "uhgg",
1652982326.0: "midasdb_dir": "my_midasdb_uhgg",
1652982326.0: "word_size": 28,
1652982326.0: "aln_mapid": null,
1652982326.0: "aln_cov": 0.75,
1652982326.0: "marker_reads": 2,
1652982326.0: "marker_covered": 2,
1652982326.0: "max_reads": null,
1652982326.0: "num_cores": 8
1652982326.0: }
1652982326.0: Create OUTPUT directory for sample1.
1652982326.0: 'rm -rf my_midas2_output/sample1/species'
1652982326.0: 'mkdir -p my_midas2_output/sample1/species'
1652982326.0: Create TEMP directory for sample1.
1652982326.0: 'rm -rf my_midas2_output/sample1/temp/species'
1652982326.0: 'mkdir -p my_midas2_output/sample1/temp/species'
1652982326.0: MIDAS2::fetch_midasdb_files::start
download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden
1652982332.0: Sleeping 4.433524636219189 seconds before retry 1 of <function download_reference at 0x155553e93440> with ('s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4', '/global/u1/s/snayfach/test/my_midasdb_uhgg'), {}.
download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden
1652982337.9: Sleeping 11.755280753849886 seconds before retry 2 of <function download_reference at 0x155553e93440> with ('s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4', '/global/u1/s/snayfach/test/my_midasdb_uhgg'), {}.
download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden
1652982351.2: Deleting untrustworthy outputs due to error. Specify --debug flag to keep.
Traceback (most recent call last):
File "/global/homes/s/snayfach/.conda/envs/midas2/bin/midas2", line 10, in <module>
sys.exit(main())
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
return subcommand_main(subcommand_args)
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/run_species.py", line 498, in main
run_species(args)
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/run_species.py", line 492, in run_species
raise error
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/run_species.py", line 443, in run_species
midas_db = MIDAS_DB(os.path.abspath(args.midasdb_dir), args.midasdb_name)
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 60, in __init__
self.local_toc = self.fetch_files("table_of_contents")
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 118, in fetch_files
return _fetch_file_from_s3((s3_path, local_path))
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 165, in _fetch_file_from_s3
return download_reference(s3_path, local_dir)
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 467, in wrapped_operation
return operation(*args, **kwargs)
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 643, in download_reference
command(f"set -o pipefail; aws s3 cp --only-show-errors --no-sign-request {ref_path} - | {uncompress_cmd} > {local_path}")
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 245, in command
return subprocess.run(cmd, shell=shell, **subproc_args)
File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'set -o pipefail; aws s3 cp --only-show-errors --no-sign-request s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 - | lz4 -dc > /global/u1/s/snayfach/test/my_midasdb_uhgg/genomes.tsv' returned non-zero exit status 1.
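
The log above shows the download being attempted three times, with a randomized and growing sleep between attempts (about 4.4 s, then 11.8 s) before the error is raised. A minimal sketch of that retry pattern follows. It is not the midas2 implementation: the decorator name `retry`, its parameters, and the delay formula are assumptions chosen only to illustrate jittered backoff. Only the inner name `wrapped_operation` is taken from the traceback.

```python
import random
import time
from functools import wraps


def retry(max_tries=3, base_delay=4.0, sleep=time.sleep, rng=random.random):
    """Retry a function, sleeping a randomized, growing delay between tries.

    Hypothetical helper for illustration; the delay formula is an assumption.
    """
    def decorator(operation):
        @wraps(operation)
        def wrapped_operation(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return operation(*args, **kwargs)
                except Exception:
                    if attempt == max_tries:
                        raise  # out of retries: surface the last error
                    # The delay doubles each attempt; the random factor adds jitter.
                    delay = base_delay * (2 ** (attempt - 1)) * (1.0 + rng())
                    print(f"Sleeping {delay} seconds before retry {attempt} "
                          f"of {operation.__name__}")
                    sleep(delay)
        return wrapped_operation
    return decorator
```

Retrying only helps with transient failures. A 403 Forbidden response, as in this log, usually points to a permissions problem on the server side, so every retry fails the same way.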

@zhaoc1
Contributor

zhaoc1 commented May 24, 2022

The server hosting the database just became public. It should work now.

@zhaoc1 zhaoc1 closed this as completed May 24, 2022
@snayfach
Collaborator Author

snayfach commented May 25, 2022

Please reopen. Getting same error:

download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden

@zhaoc1
Contributor

zhaoc1 commented May 25, 2022

Hmm, are you using the latest master branch (version 0.9.8)? The S3 path should no longer show up.

@zhaoc1 zhaoc1 reopened this May 25, 2022
@snayfach
Collaborator Author

Just installed the newest version.

$ midas2 database --list

uhgg 286997 genomes from 4644 species version 1.0

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
1653450519.0:  Sleeping 4.107558893345466 seconds before retry 1 of <function download_tarball at 0x155553c864d0> with ('http://midasdb.pollard.gladstone.org/gtdb/md5sum.json.tar.gz', '/global/cscratch1/sd/snayfach/test/midasdb_gtdb'), {}.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
1653450523.1:  Sleeping 15.292327766595005 seconds before retry 2 of <function download_tarball at 0x155553c864d0> with ('http://midasdb.pollard.gladstone.org/gtdb/md5sum.json.tar.gz', '/global/cscratch1/sd/snayfach/test/midasdb_gtdb'), {}.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Traceback (most recent call last):
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/bin/midas2", line 8, in <module>
    sys.exit(main())
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/database.py", line 140, in main
    list_midasdb(args)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/database.py", line 14, in list_midasdb
    midasdb = MIDAS_DB(os.path.join(args.midasdb_dir, f"midasdb_{dbname}"), dbname, 1)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/models/midasdb.py", line 106, in __init__
    self.md5sum = load_json(self.fetch_files("md5sum")) if self.has_md5sum else None
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/models/midasdb.py", line 158, in fetch_files
    return self.fetch_tarball(filename, list_of_species)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/models/midasdb.py", line 222, in fetch_tarball
    _fetched_file = _fetch_file_from_s3(self.construct_file_tuple(filename))
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/models/midasdb.py", line 318, in _fetch_file_from_s3
    return download_tarball(s3_path, local_dir)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 468, in wrapped_operation
    return operation(*args, **kwargs)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 662, in download_tarball
    command(f"set -o pipefail; wget -q -O- {ref_path} | {uncompress_cmd} {local_dir}")
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 246, in command
    return subprocess.run(cmd, shell=shell, **subproc_args)
  File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'set -o pipefail; wget -q -O- http://midasdb.pollard.gladstone.org/gtdb/md5sum.json.tar.gz | tar -xz -C /global/cscratch1/sd/snayfach/test/midasdb_gtdb' returned non-zero exit status 2.
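
The `gzip: stdin: unexpected end of file` message is what `tar -xz` reports when its piped input ends before a complete gzip stream arrives, which is consistent with `wget -q` receiving no usable response for that URL. The sketch below shows one way to fail with a clearer message by checking the payload before unpacking it. The function names are hypothetical, it works on bytes already in memory rather than a live download, and it is not part of midas2.

```python
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(payload: bytes) -> bool:
    """Return True if the payload starts with the gzip magic bytes."""
    return len(payload) > 2 and payload[:2] == GZIP_MAGIC


def list_tarball_members(payload: bytes) -> list:
    """List the member names of a .tar.gz payload, rejecting empty or non-gzip input."""
    if not looks_like_gzip(payload):
        raise ValueError(
            f"expected a gzip tarball, got {len(payload)} bytes that do not "
            "start with the gzip magic number (is the file on the server?)"
        )
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        return tar.getnames()
```

With a check like this, an empty response produces one explicit error instead of three retries of the same `tar` failure.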

@zhaoc1
Contributor

zhaoc1 commented May 25, 2022

Yep, only the UHGG reference genome database is on the GIDB server at the moment.

@snayfach
Collaborator Author

midas2 database --download --midasdb_name uhgg

Traceback (most recent call last):
File "/global/homes/s/snayfach/.conda/envs/midas2.0/bin/midas2", line 8, in <module>
sys.exit(main())
File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
return subcommand_main(subcommand_args)
File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/database.py", line 144, in main
download_midasdb(args)
File "/global/homes/s/snayfach/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/database.py", line 31, in download_midasdb
assert args.species is not None or args.species_list is not None, f"Need to provide --species or --species_list for download task."
AssertionError: Need to provide --species or --species_list for download task.

I'm going to wait until there's clean quickstart / install documentation to resume testing.

@zhaoc1
Contributor

zhaoc1 commented May 26, 2022

Try this:

midas2 database --download --midasdb_name uhgg --midasdb_dir midasdb_uhgg --species all 

Alternatively, here is the work-in-progress documentation page for downloading the database: https://midas2-tutorial.readthedocs.io/en/stable/download_midasdb.html

@snayfach
Collaborator Author

https://midas2-tutorial.readthedocs.io/en/stable/download_midasdb.html

This requires a large amount of data transfer and storage: 93 GB for MIDASDB-uhgg and 539 GB for MIDASDB-gtdb

We previously discussed versions of these databases that were much smaller. Was there any progress made on that?

@zhaoc1
Contributor

zhaoc1 commented May 26, 2022

This is the final size of the databases. For MIDAS 2.0, we recommend that users download the database for selected species rather than for all of them.
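
As a rough illustration of the difference between the two download modes, the sketch below builds the command line for either case. The `--species all` form and the existence of a `--species_list` option both appear earlier in this thread. The comma-separated value format passed to `--species_list` is an assumption, so check the documentation page linked above before relying on it. The helper name `build_download_command` is hypothetical.

```python
def build_download_command(midasdb_name, midasdb_dir, species="all"):
    """Build a `midas2 database --download` argument list.

    species: the string "all", or a list of species ID strings.
    """
    cmd = [
        "midas2", "database", "--download",
        "--midasdb_name", midasdb_name,
        "--midasdb_dir", midasdb_dir,
    ]
    if species == "all":
        # Full database: the thread quotes 93 GB for MIDASDB-uhgg.
        cmd += ["--species", "all"]
    else:
        # ASSUMPTION: --species_list takes comma-separated species IDs.
        cmd += ["--species_list", ",".join(species)]
    return cmd


# The species IDs here are placeholders, not real identifiers.
print(" ".join(build_download_command("uhgg", "midasdb_uhgg", ["ID1", "ID2"])))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the flag format has been confirmed.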

@snayfach
Collaborator Author

Do you need an AWS account, IAM credentials, and an IAM access key pair to download the database and run MIDAS 2.0?

@zhaoc1
Contributor

zhaoc1 commented May 26, 2022

No. We are in the process of migrating the database from AWS to the GIDB server, and only UHGG is ready for testing at the moment. Users don't need an AWS account to run MIDAS 2.0.

@zhaoc1
Contributor

zhaoc1 commented Jun 1, 2022

Both MIDAS DBs are live on the server now.

@snayfach snayfach closed this as completed Jul 2, 2022