🐛[BUG]: CorrDiff dataset is unreadable when downloaded via wget #431

Closed
gideonite opened this issue Apr 9, 2024 · 4 comments · May be fixed by #574

Comments

@gideonite

Version

latest

On which installation method(s) does this occur?

No response

Describe the issue

Following the dataset link in the CorrDiff README (e.g. this link), I selected the wget option and ran the command below:

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.1/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip

The result was a 98G zip archive

$ du -hc cwa_dataset_v1.zip
98G     cwa_dataset_v1.zip

When I tried to unzip, I got the following error

$ unzip cwa_dataset_v1.zip
Archive:  cwa_dataset_v1.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of cwa_dataset_v1.zip or
        cwa_dataset_v1.zip.zip, and cannot find cwa_dataset_v1.zip.ZIP, period.

I did a quick search and the top results indicate one of two possibilities:

  1. The file is not actually a zip file
  2. The file has been corrupted
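
One way to tell these two cases apart is to look at the file's signatures directly. The sketch below is not from the thread: it assumes an ordinary, non-empty, single-part zip archive, which starts with a local file header (`PK\x03\x04`) and ends with an end-of-central-directory record (`PK\x05\x06`) within the last 64 KiB or so. The function name `diagnose_zip` is hypothetical.

```python
from pathlib import Path

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # first bytes of an ordinary, non-empty zip archive
ZIP_EOCD = b"PK\x05\x06"          # end-of-central-directory signature
# The EOCD record is at least 22 bytes and may be followed by a comment
# of up to 65535 bytes, so it must start within this many bytes of the end.
EOCD_SEARCH_WINDOW = 22 + 65535


def diagnose_zip(path):
    """Return a short guess at why `path` may fail to unzip (hypothetical helper)."""
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        head = f.read(4)
        # Only read the tail, so this stays fast even for a ~98G file.
        f.seek(max(0, size - EOCD_SEARCH_WINDOW))
        tail = f.read()

    if head != ZIP_LOCAL_HEADER:
        # Case 1: e.g. an HTML error page or JSON response saved under a .zip name.
        return "not a zip file: first bytes are %r" % head
    if ZIP_EOCD not in tail:
        # Case 2: the archive starts correctly but its end is missing or damaged.
        return "truncated or corrupted: no end-of-central-directory record found"
    return "looks like a complete zip archive"
```

Calling `diagnose_zip("cwa_dataset_v1.zip")` would then point to one case or the other. A passing result only means both signatures are present; it does not verify the archive contents.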

I tried downloading the dataset again (~1 hr) but got the same error.

Pinging you as we discussed. Please let me know if I can provide any more info @nbren12

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response

@gideonite added the "? - Needs Triage" and "bug" labels on Apr 9, 2024
@mnabian self-assigned this on Apr 9, 2024
@NickGeneva added the "external" label on Apr 19, 2024
@mnabian
Collaborator

mnabian commented May 7, 2024

Please use this link: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/modulus/resources/modulus_datasets_cwa

Please note that the dataset can only be downloaded via the NGC CLI. A direct/wget download won't work.

@mnabian mnabian closed this as completed May 7, 2024
@gideonite
Author

Just ran the following command:

ngc registry resource download-version "nvidia/modulus/modulus_datasets_cwa:v1"

This results in a 467.8 GiB download, which I am now waiting to complete. Note that this does not match the documented compressed size of 97.65 GB. I guess what is available online is no longer compressed? Specifically, I see the following status bar:

⠼ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 0.7/467.8 GiB • Remaining: 1:22:16 • 101.6 MB/s • Elapsed: 0:00:09 • Total: 1 - Completed: 0 - Failed: 0
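
As a sanity check on the numbers in that status bar, the short sketch below recomputes the remaining time from the displayed total, progress, and rate. It is not part of the original comment, and it assumes the CLI uses binary units for size (1 GiB = 2**30 bytes) and decimal units for throughput (1 MB = 10**6 bytes). The function names are hypothetical.

```python
from datetime import timedelta

GIB = 2**30  # bytes per GiB (assumed unit of the size fields)
MB = 10**6   # bytes per MB (assumed unit of the rate field)


def remaining_time(total_gib, done_gib, rate_mb_per_s):
    """Estimate the remaining download time from status-bar values."""
    remaining_bytes = (total_gib - done_gib) * GIB
    return timedelta(seconds=int(remaining_bytes / (rate_mb_per_s * MB)))


def gib_to_gb(gib):
    """Convert binary gibibytes to decimal gigabytes."""
    return gib * GIB / 10**9


print(remaining_time(467.8, 0.7, 101.6))  # values taken from the status bar
print(round(gib_to_gb(467.8), 1))         # download size in decimal GB
```

Under these assumptions the estimate comes out to 1:22:16, which matches the "Remaining" field, and 467.8 GiB corresponds to about 502.3 GB in decimal units. Either way the download is several times larger than the documented 97.65 GB.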

@nbren12
Collaborator

nbren12 commented May 7, 2024 via email

@mnabian
Collaborator

mnabian commented May 7, 2024

@gideonite you are on the right track. The compressed size is 467.8 GB, as you see in the status bar. The UI shows the compressed size inaccurately, which is a known bug in the NGC catalog.

chychen added a commit to chychen/modulus that referenced this issue Jul 3, 2024
@chychen mentioned this issue Jul 3, 2024

4 participants