Standardized filenames #45

Open
jsoma opened this issue Oct 7, 2014 · 7 comments

Comments

@jsoma
Contributor

jsoma commented Oct 7, 2014

It could be helpful to run with a consistent naming convention like country-datasource-YYYY-MM-DD.csv, e.g. liberia-case_reports-2014-09-29.csv - I think it might make sorting and browsing a little easier.

It might also help with #37 so that guinea-report-2014-09-29.pdf could sit next to guinea-report-2014-09-29.csv and you'd have a better idea what still needs to be digitized.
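A minimal sketch of how the proposed `country-datasource-YYYY-MM-DD` convention could be used from a script. The regular expression and the helper names `parse_name` and `undigitized` are assumptions made for illustration; they are not part of the repository.

```python
import re
from datetime import date

# Hypothetical pattern for the proposed country-datasource-YYYY-MM-DD.ext names.
FILENAME_RE = re.compile(
    r"^(?P<country>[a-z_]+)-(?P<source>[a-z_]+)-"
    r"(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return (country, source, date, ext), or None if the name does not match."""
    m = FILENAME_RE.match(filename)
    if m is None:
        return None
    return (m["country"], m["source"], date.fromisoformat(m["date"]), m["ext"])


def undigitized(filenames):
    """Return the PDF names that have no CSV with the same stem."""
    csv_stems = {f[:-4] for f in filenames if f.endswith(".csv")}
    return sorted(
        f for f in filenames if f.endswith(".pdf") and f[:-4] not in csv_stems
    )
```

With names like these, `undigitized` would list any PDF that still lacks a matching CSV, which is the check described for #37.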

@cmrivers
Owner

cmrivers commented Oct 7, 2014

Totally agree, wish I had standardized both the file names and the variable names. At this point though I'm worried that changing it would be burdensome for users, e.g. might break scripts.

Anyone else want to weigh in?

@chendaniely
Collaborator

> burdensome for users, e.g. might break scripts.

Do you know how many people are actually using the data, and how many would potentially be affected?

In the long run, it may actually be more beneficial to standardize the names, since sorting and tracking will be much easier when files are named consistently.

I'd say nicely named files are probably better for scripts anyway; compare that with trying to parse the current file names.

If you've ever worked with census data, the fixed file naming convention (including how many characters) is a godsend.

@chendaniely
Collaborator

you can always supply a renaming script :p
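A rough sketch of what such a renaming script might look like. The function name `apply_renames` and the idea of shipping an explicit old-to-new mapping are assumptions; the thread does not specify how the script would work.

```python
from pathlib import Path


def apply_renames(directory, mapping, dry_run=True):
    """Rename files in `directory` according to `mapping` (old name -> new name).

    Returns the (old, new) pairs that were renamed, or that would be
    renamed when dry_run is True.
    """
    directory = Path(directory)
    done = []
    for old, new in mapping.items():
        src, dst = directory / old, directory / new
        if not src.exists() or dst.exists():
            continue  # skip missing sources and never overwrite a file
        if not dry_run:
            src.rename(dst)
        done.append((old, new))
    return done
```

Defaulting to a dry run lets users see what would change before any of their local copies are touched.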

@cmrivers
Owner

cmrivers commented Oct 8, 2014

I don't know exactly how many people use them, but I don't think it's a trivial number.

I'm leaning towards standardized renaming, but I think we should leave this issue open until maybe Monday the 13th to give people time to comment. At the very least I think we can sed out the spaces in the filenames, since that annoys even me (and I'm the one who put them there...).
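For anyone who prefers Python to sed, a small sketch of the same idea. The helper name `despace` and the choice of underscore as the replacement are assumptions, not something decided in this thread.

```python
import re


def despace(name, replacement="_"):
    """Collapse each run of whitespace in a filename into one replacement character."""
    return re.sub(r"\s+", replacement, name.strip())
```

The function only computes the new name; the actual rename on disk is left to whatever script calls it.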

@chendaniely
Collaborator

May I suggest two-digit numbers (e.g., sept 01 rather than sept 1)?
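A quick illustration of why the padding matters: plain string sorting puts "sept 10" before "sept 2" unless the numbers are zero-padded. The helper `pad_numbers` is hypothetical.

```python
import re


def pad_numbers(name, width=2):
    """Zero-pad every run of digits in `name` to at least `width` characters."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name)


names = ["sept 10", "sept 2", "sept 1"]
unpadded_order = sorted(names)                     # lexicographic, not chronological
padded_order = sorted(pad_numbers(n) for n in names)
```

Numbers that are already at least two digits long, such as a four-digit year, are left unchanged.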

@donpdonp
Contributor

donpdonp commented Oct 8, 2014

Standardizing the filenames would help anyone who will be importing all the data into another tool.

/countries
  /liberia
    2014-09-29.csv
    2014-09-19.csv
  /sierra_leone
    2014-09-29.csv

If there are really different categories besides 'casedata', a third level of directories might work, too. The dates should follow ISO 8601.

A filename reorganization is a step along the way to getting all the CSV files into a single data source such as a SQLite database.
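A sketch of how the layout above could feed a single SQLite file, assuming a `<root>/<country>/<YYYY-MM-DD>.csv` tree and well-formed CSVs with a header row. Because the column names are not standardized yet, this stores one row per cell in a long-format table; the table name `cells` and the function `load_tree` are made up for illustration.

```python
import csv
import sqlite3
from datetime import date
from pathlib import Path


def load_tree(root, conn):
    """Load root/<country>/<YYYY-MM-DD>.csv files into one long-format table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells "
        "(country TEXT, report_date TEXT, row_num INTEGER, "
        "column_name TEXT, value TEXT)"
    )
    for path in sorted(Path(root).glob("*/*.csv")):
        country = path.parent.name
        # Raises ValueError if the file stem is not a valid ISO 8601 date.
        report_date = date.fromisoformat(path.stem).isoformat()
        with path.open(newline="") as fh:
            for row_num, row in enumerate(csv.DictReader(fh)):
                conn.executemany(
                    "INSERT INTO cells VALUES (?, ?, ?, ?, ?)",
                    [(country, report_date, row_num, col, val)
                     for col, val in row.items()],
                )
    conn.commit()
```

With the country and date carried by the path, neither has to be guessed from the file contents.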

@samccone
Contributor

samccone commented Oct 8, 2014

👍 @donpdonp
