Built-in download of data #108

Open
jeblundell opened this Issue Jul 10, 2016 · 4 comments

Contributor

jeblundell commented Jul 10, 2016

Once #103 is finalised, I think it would be worth incorporating a download of the data from PhysioNetWorks (PNW). I suggest waiting until #103 is merged so that we can simply add the download as a build target.

On PNW there's an automated download command:
wget --user YOURUSERNAME --ask-password -A csv.gz -m -p -E -k -K -np http://MIMICURL

We can literally just indent that command and put "data:" before it, and then "make data" will fetch the data. I'd suggest we also include variables for where to store the data and (obviously) for the username. An alternative approach is to make each csv.gz file its own build target.
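As a rough sketch of what such a "make data" recipe might run, the wget command above can be wrapped in a shell function that takes the username, the storage directory, and the URL as parameters. The function name download_data and the DRY_RUN switch are hypothetical and not part of the project; -P is wget's standard option for choosing the download directory, and http://MIMICURL stays a placeholder.

```shell
#!/bin/sh
# Sketch only: download_data and DRY_RUN are illustrative names, not project code.
# Usage: download_data USERNAME DATADIR URL
download_data() {
  user=$1
  datadir=$2
  url=$3
  # Same flags as the PNW command, plus -P to set the target directory.
  set -- wget --user "$user" --ask-password -A csv.gz -m -p -E -k -K -np \
    -P "$datadir" "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it (handy for checking variables).
    echo "$*"
  else
    "$@"
  fi
}

# Example call (placeholder values):
# download_data YOURUSERNAME ./data http://MIMICURL
```

Passing the values as parameters rather than hard-coding them keeps the username and the URL out of the repository.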

I'd also suggest adding a "make verify" target, or modifying mimic-check, to run the MD5 checksums.
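A minimal sketch of what a "make verify" step could do, assuming GNU md5sum is available and that a checksum list in md5sum format sits alongside the data. The function name verify_checksums and the default file name checksums.md5 are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch only: verify_checksums and checksums.md5 are hypothetical names.
# Usage: verify_checksums DATADIR [CHECKSUM_FILE]
verify_checksums() {
  dir=$1
  sums=${2:-checksums.md5}
  # Run in a subshell so the caller's working directory is unchanged.
  # md5sum -c exits non-zero if any listed file is missing or does not match.
  ( cd "$dir" && md5sum -c "$sums" )
}

# Example call (placeholder path):
# verify_checksums ./data checksums.md5
```

This relies on md5sum from GNU coreutils, so it runs into the same cross-platform question as wget and gzip.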

Two issues:
(1) Is it alright to actually include the URL itself? I can't really see why not, as it's protected with a username/password, but I thought I'd check anyway.
(2) Again, we run into the cross-platform issue. Given that @alistairewj was pondering including a gzip binary, it might be worth considering including a wget binary too for this reason. I forget whether OS X comes with it as standard. I suppose an objection is that we're creeping various binaries into this, which is not ideal. Maybe a better approach is to only include wget or curl and download gzip etc. from official sources.
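One way to sidestep bundling binaries, sketched under the assumption that a POSIX shell is available: check at run time which downloader is already on the PATH and fail with a clear message otherwise. The function name pick_downloader is hypothetical; "command -v" is the standard POSIX way to test whether a command exists.

```shell
#!/bin/sh
# Sketch only: pick_downloader is an illustrative name, not project code.
# Prints the first available downloader, or returns 1 if none is found.
pick_downloader() {
  for tool in wget curl; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  echo "neither wget nor curl found on PATH" >&2
  return 1
}

# Example call:
# downloader=$(pick_downloader) || exit 1
```

Note that the wget flags above do not carry over to curl one-for-one, so a curl fallback would need its own download logic.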

Owner

tompollard commented Sep 15, 2016

I added a basic recipe for downloading MIMIC-III from PhysioNet with two variables - physionetuser and datadir.

I didn't do anything with the checksums or look into support for non-unix systems, so I'm leaving this issue open in case we want to deal with these things later.

Will this issue be resolved soon? I think I have a related problem. After running "make help", I'm having trouble running "make mimic datadir=~/my/path/to/data/". I keep getting the response "Unable to find ~/my/path/to/data/ADMISSIONS.csv - exiting before build".

On PhysioNet, there's no ADMISSIONS.csv, but there is an admissiondrug.csv and an admissionDx.csv.

Owner

tompollard commented Jun 19, 2017

@herroannekim there are several projects on PhysioNet, so please could you explain which project you are looking at? (e.g. provide the URL).
