wget command retrieves index.html of all study directories before downloading study images #1012
Comments
We are hoping to improve downloads of many-file datasets such as this one on PhysioNet. For now I'd recommend using GCP to download the images.
I can confirm that wget will actually download the images, but only after it has downloaded ~500,000 index files or so. Clearly not ideal!
I hope there will be a way to download sub-packages instead of one file after another. Maybe you could pack all the files into 50 small zip files.
I'd suggest using GCP in that instance. You can use their command line tools to filter much more easily using wildcards in the
I started with the PhysioNet URL, but it downloads index.html files as mentioned in the issue. As per the suggested solution, I used GCP for the download, but it always adds a .gstmp extension to all the files (e.g. 174413ec-4ec4c1f7-34ea26b7-c5f994f8-79ef1962.dcm.gstmp). Is there a way to read these files or avoid this extension?
.gstmp means the download is incomplete. I believe it's a new feature of gsutil that allows downloads to be resumed.
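If that is right, a quick way to see whether a download has finished is to look for leftover .gstmp files before trying to read anything. A minimal sketch, assuming a POSIX shell with find available; list_incomplete and the ./mimic-cxr path are hypothetical names used only for illustration:

```shell
# List files that still carry the .gstmp suffix under a download directory.
# Assumption (from the comment above): a .gstmp suffix marks an unfinished download.
list_incomplete() {
    dir="${1:-.}"    # directory to scan; defaults to the current directory
    find "$dir" -type f -name '*.gstmp' | sort
}

# Example (hypothetical path): count unfinished files.
# list_incomplete ./mimic-cxr | wc -l
```

If the list is not empty, re-running the same gsutil command would presumably pick up where it left off, rather than renaming the files by hand.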
The command listed on the dataset page only downloads the index.html of each study directory, not the DICOM files.
wget -r -N -c -np --user <PHYSIONETUSERNAME> --ask-password https://alpha.physionet.org/files/mimic-cxr/2.0.0/
That is, the command is downloading
/physionet.org/files/mimic-cxr/2.0.0/files/p**/p******/s*******/index.html
instead of the two DICOM files under that directory
/physionet.org/files/mimic-cxr/2.0.0/files/p**/p******/s*******/*****.dcm
MIMIC-CXR-JPG has the same problem.
Am I missing something obvious?
(I interrupted the wget command after seeing all of the 200+ downloaded studies lacking image files)
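For context, wget's recursive retrieval over HTTP is breadth-first, so it fetches the index pages at each level before the files linked from them. One possible workaround, sketched below under assumptions, is to run the same command once per top-level folder so that each pass has far fewer index pages to get through before images start arriving. fetch_folders, DRY_RUN and the pXX folder name are hypothetical; the base URL and wget flags are the ones from the command above.

```shell
# Base URL taken from the command on the dataset page, plus the files/ subfolder.
BASE="https://alpha.physionet.org/files/mimic-cxr/2.0.0/files"

# Run (or just print) one wget per top-level folder.
# Usage: fetch_folders <PHYSIONETUSERNAME> <folder> [<folder> ...]
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
fetch_folders() {
    user="$1"
    shift
    for folder in "$@"; do
        url="$BASE/$folder/"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'wget -r -N -c -np --user %s --ask-password %s\n' "$user" "$url"
        else
            wget -r -N -c -np --user "$user" --ask-password "$url"
        fi
    done
}

# Example (pXX stands for one of the p** folders):
# DRY_RUN=0 fetch_folders <PHYSIONETUSERNAME> pXX
```

This does not stop wget from fetching index.html files; it only splits the work into smaller passes that can be interrupted and resumed separately.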