Combined ApiConn pull requests #286

Merged
merged 31 commits into from May 23, 2017

NealHumphrey
Collaborator

This pull request combines code from #278, #283, and #284, then builds on it to standardize the method signatures of the ApiConn classes. It also replaces get_api_data.py with a new approach. This resolves issues #274, #187, #260, and #198. Issue #16 is addressed here but remains open, because we are still waiting to find out when opendata.dc.gov will update its 2017 building permits data set.

Overview of changes in this PR:

  • The base class has an _available_unique_data_ids list that every class must define. It must be a list even when there is only one id, so that classes producing one file behave consistently with those producing several.
  • The base class now has an output_paths @property that returns a dictionary mapping each id in _available_unique_data_ids to its absolute local output file path. Datestamping is built in automatically.
  • All files are saved to the same datestamped folder, with unique_data_id + ".csv" as the filename. This should simplify the code that updates manifest.csv to reflect updated data sources: each datestamped folder can be searched for the unique_data_ids it contains, and the load_data code can then update only the data for the files it finds.
  • Every ApiConn class should now have a get_data() method. It should be able to run without any mandatory parameters, and it must accept the following optional (user-supplied) parameters: unique_data_ids (so that not every data set has to be downloaded every time), sample (for easier testing, by downloading only a few rows), and output_type (normally 'csv', but can be 'stdout').
  • Cleaned up opendata.py so that new 'normal' data sources (those that don't need special processing) can be added just by adding them to the list, without needing to inherit from the base class.
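The bullets above describe a shared interface for the ApiConn classes. The following is a minimal Python sketch of that interface under stated assumptions: the class names BaseApiConn and OpenDataApiConn, the output_root folder, the YYYYMMDD datestamp format, the fetch_rows hook and the in-memory _sources rows are all hypothetical stand-ins, not the project's actual code. It shows _available_unique_data_ids as a list, output_paths as a property returning datestamped absolute paths, a get_data() that runs with no arguments, and a list-driven class where adding a 'normal' source is a single new entry.

```python
import csv
import os
import sys
import tempfile
from datetime import date


class BaseApiConn:
    """Sketch of a shared ApiConn base class (hypothetical names throughout)."""

    # Always a list, even for sources that produce only one file.
    _available_unique_data_ids = []

    def __init__(self, output_root="raw"):
        # Hypothetical root folder under which datestamped folders are created.
        self.output_root = output_root

    @property
    def output_paths(self):
        """Return {unique_data_id: absolute path to <id>.csv in today's folder}."""
        # Assumed datestamp format: YYYYMMDD.
        folder = os.path.abspath(
            os.path.join(self.output_root, date.today().strftime("%Y%m%d")))
        return {uid: os.path.join(folder, uid + ".csv")
                for uid in self._available_unique_data_ids}

    def fetch_rows(self, unique_data_id, sample):
        """Hypothetical hook: return a list of dict rows for one data set."""
        raise NotImplementedError

    def get_data(self, unique_data_ids=None, sample=False, output_type="csv"):
        """Fetch the requested data sets; every parameter is optional."""
        if unique_data_ids is None:
            unique_data_ids = self._available_unique_data_ids
        unknown = set(unique_data_ids) - set(self._available_unique_data_ids)
        if unknown:
            raise ValueError("unknown unique_data_ids: %s" % sorted(unknown))
        if output_type not in ("csv", "stdout"):
            raise ValueError("output_type must be 'csv' or 'stdout'")

        for uid in unique_data_ids:
            rows = self.fetch_rows(uid, sample)
            if not rows:
                continue  # nothing to write for this data set
            if output_type == "stdout":
                self._write(sys.stdout, rows)
                continue
            path = self.output_paths[uid]
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w", newline="") as f:
                self._write(f, rows)

    @staticmethod
    def _write(stream, rows):
        # The first row's keys become the CSV header.
        writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


class OpenDataApiConn(BaseApiConn):
    """'Normal' sources: adding one means adding one entry to _sources."""

    # Placeholder in-memory rows standing in for real downloads.
    _sources = {
        "crime_2016": [{"id": 1, "offense": "theft"},
                       {"id": 2, "offense": "burglary"}],
        "building_permits_2016": [{"id": 10, "permit_type": "new"}],
    }
    _available_unique_data_ids = list(_sources)

    def fetch_rows(self, unique_data_id, sample):
        rows = self._sources[unique_data_id]
        return rows[:1] if sample else rows  # sample=True keeps just one row


if __name__ == "__main__":
    # Usage: write a one-row sample of one data set into a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        conn = OpenDataApiConn(output_root=tmp)
        conn.get_data(unique_data_ids=["crime_2016"], sample=True)
        print(conn.output_paths["crime_2016"])
```

Keeping the file-writing logic in the base class means each subclass only has to say how to fetch rows for one unique_data_id; the datestamped folder and the <unique_data_id>.csv naming come for free.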

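Because every file lands in a datestamped folder as <unique_data_id>.csv, updating the manifest reduces to scanning one folder. The sketch below assumes the manifest can be treated as a simple mapping of unique_data_id to file path; the real manifest.csv has its own columns, which are not modeled here. find_unique_data_ids and update_manifest are hypothetical helpers that update only the entries whose files are found in the folder.

```python
import os
import tempfile


def find_unique_data_ids(datestamped_folder, known_ids):
    """Return {unique_data_id: path} for each known id whose <id>.csv exists."""
    found = {}
    for name in sorted(os.listdir(datestamped_folder)):
        stem, ext = os.path.splitext(name)
        if ext == ".csv" and stem in known_ids:
            found[stem] = os.path.join(datestamped_folder, name)
    return found


def update_manifest(manifest, datestamped_folder):
    """Return a copy of manifest with paths replaced for ids found in the folder.

    Simplification: manifest is a dict of unique_data_id -> filepath; ids that
    have no file in this folder keep their previous path.
    """
    updated = dict(manifest)
    updated.update(find_unique_data_ids(datestamped_folder, set(manifest)))
    return updated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Only crime_2017.csv was downloaded into this datestamped folder.
        with open(os.path.join(folder, "crime_2017.csv"), "w"):
            pass
        manifest = {"crime_2017": "old/crime_2017.csv", "tax": "old/tax.csv"}
        print(update_manifest(manifest, folder))
```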
@jkwening Can you review this?
@ajhalani Check this out; maybe you can do a code review as well? It may not look like it, because I had to rewrite a lot of code to work with the new approach to get_api_data.py, but your code was a very useful influence.

en9inerd and others added 27 commits May 9, 2017 22:58
…esults_to_csv from base class and adding optional parameters to choose which database to use (still working on that)
…same. changed wmata_distcalc.py so that the headers match that
…ent method that will play well with our future need to run all the api classes automatically.
… classes through their get_data method.

- A few edits to the DCHousing class to get it to work with the current system (rename method and switch to use consistent params as other get_data methods)
@NealHumphrey
Collaborator Author

@emkap01 Heads up: once this PR is approved, the crime, building permits, and tax data sets are all done with the 'write API code' step and need to move on to the 'cleaner/ingestion' part of the process.

…ethod

- Updated the get_data signature to include **kwargs, so that any additional arguments needed by only some ApiConn objects (e.g. the database choice `db`) can be ignored by the others.
- Some general cleanup of wmataapiconn as noted in the code review.
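The **kwargs change can be illustrated with a small Python sketch. The class names DCHousingApiConn and WmataApiConn, the "my_database" value and the run_all helper are hypothetical stand-ins, not the project's real code. Because every get_data accepts **kwargs, a runner can pass the same options to every ApiConn object: a class that uses db takes it as a named parameter, and the others silently absorb it. Without **kwargs, passing db to a class whose signature lacks it would raise a TypeError.

```python
class DCHousingApiConn:
    """Hypothetical source that needs no extra arguments."""

    def get_data(self, unique_data_ids=None, sample=False, output_type="csv",
                 **kwargs):
        # Extra keyword arguments such as db are accepted and ignored.
        return {"source": "DCHousingApiConn", "sample": sample,
                "ignored": sorted(kwargs)}


class WmataApiConn:
    """Hypothetical source that does use the extra db argument."""

    def get_data(self, unique_data_ids=None, sample=False, output_type="csv",
                 db=None, **kwargs):
        # db is consumed here; anything else in kwargs is still ignored.
        return {"source": "WmataApiConn", "sample": sample, "db": db}


def run_all(conns, **options):
    """Call get_data on every object with the same options."""
    return [conn.get_data(**options) for conn in conns]


if __name__ == "__main__":
    for result in run_all([DCHousingApiConn(), WmataApiConn()],
                          sample=True, db="my_database"):
        print(result)
```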
@ajhalani
Collaborator

👍


6 participants