Combined ApiConn pull requests #286

Merged
31 commits merged on May 23, 2017

7 participants
@NealHumphrey
Collaborator

NealHumphrey commented May 23, 2017

This pull request combines code from #278, #283, and #284, and then builds on it to standardize the method signatures of the ApiConn classes. It also replaces get_api_data.py with a new approach. This resolves issues #274, #187, #260, and #198. Issue #16 is included but still outstanding, because we are waiting to find out when opendata.dc.gov will update its 2017 building permits data set.

Overview of changes in this PR:

  • The base class has an _available_unique_data_ids list, which every subclass must define. Even if a class has only one id, it should still be a list, so that classes that produce one file behave the same way as classes that produce several.
  • The base class now has an @property called output_paths that returns a dictionary of unique_data_id: absolute_local_output_filepath for all of the _available_unique_data_ids. Datestamping is built in automatically.
  • All files are saved to the same datestamped folder, using their unique_data_id + ".csv" as the filename. This should simplify the code that updates manifest.csv to reflect updated data sources: every datestamped folder can be searched for the unique_data_ids it contains, and the load_data code can update only the data for the files it finds.
  • Every ApiConn class must now have a get_data() method. It should run without any mandatory parameters, and it must accept the following optional (user-supplied) params: unique_data_ids (so that not every data set has to be downloaded every time), sample (for easier testing, by downloading only a few rows), and output_type (normally 'csv', but it can be 'stdout').
  • Cleaned up the opendata.py class so that new 'normal' data sources (those that don't need special processing) no longer need their own class inheriting from the base class; they can be added just by adding them to the list.
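A minimal sketch of the base-class contract described above, assuming Python and only the standard library. The names _available_unique_data_ids, output_paths, get_data, unique_data_ids, sample, and output_type come from this PR; everything else is a hypothetical illustration, not the project's actual code: the base_folder default, the YYYYMMDD stamp format, the _write_rows/_dump helpers, the ids_in_folder helper for the manifest lookup, and the ExampleApiConn subclass with its fake rows.

```python
import csv
import os
import sys
from datetime import date
from typing import Dict, Iterable, List, Optional


class BaseApiConn:
    # Every subclass overrides this; keep it a list even when there is one id.
    _available_unique_data_ids: List[str] = []

    def __init__(self, base_folder: str = "data/raw") -> None:
        # "data/raw" is an assumed default, not the project's real path.
        self.base_folder = base_folder

    @property
    def output_paths(self) -> Dict[str, str]:
        """Map each available unique_data_id to <base>/<YYYYMMDD>/<id>.csv."""
        stamp = date.today().strftime("%Y%m%d")  # assumed datestamp format
        folder = os.path.abspath(os.path.join(self.base_folder, stamp))
        return {uid: os.path.join(folder, uid + ".csv")
                for uid in self._available_unique_data_ids}

    def _write_rows(self, uid: str, rows: List[Dict[str, str]],
                    output_type: str = "csv") -> None:
        """Write rows to the datestamped CSV, or to stdout for 'stdout'."""
        if not rows:
            return
        if output_type == "stdout":
            self._dump(sys.stdout, rows)
            return
        path = self.output_paths[uid]
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", newline="") as f:
            self._dump(f, rows)

    @staticmethod
    def _dump(stream, rows: List[Dict[str, str]]) -> None:
        writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    def get_data(self, unique_data_ids: Optional[Iterable[str]] = None,
                 sample: bool = False, output_type: str = "csv") -> None:
        raise NotImplementedError


def ids_in_folder(folder: str) -> List[str]:
    """List the unique_data_ids (CSV file stems) present in one datestamped folder."""
    return sorted(name[:-4] for name in os.listdir(folder)
                  if name.endswith(".csv"))


class ExampleApiConn(BaseApiConn):
    # Hypothetical ids and rows; a real class would call a remote API here.
    _available_unique_data_ids = ["example_a", "example_b"]

    def get_data(self, unique_data_ids=None, sample=False, output_type="csv"):
        for uid in unique_data_ids or self._available_unique_data_ids:
            rows = [{"row": str(i), "source": uid} for i in range(10)]
            if sample:
                rows = rows[:3]  # only a few rows for quick tests
            self._write_rows(uid, rows, output_type)
```

For instance, calling get_data(unique_data_ids=["example_a"], sample=True) on an ExampleApiConn writes only example_a.csv, with a header and three data rows, into today's datestamped folder, so ids_in_folder on that folder returns ["example_a"].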

@jkwening Can you review this?
@ajhalani Check this out; maybe you can do a code review as well? It may not look like it, because I had to rewrite a lot of code to work with the new approach to get_api_data.py, but your code was a very useful influence.

eng1nerd and others added some commits May 10, 2017

Made modifications discussed in pull request #259 EXCEPT FOR: using results_to_csv from base class and adding optional parameters to choose which database to use (still working on that)
Changed meta.json so the source names and SQL names in wmata are the same. Changed wmata_distcalc.py so that the headers match.
Refactored all the opendata classes into one that uses a new, consistent method that will play well with our future need to run all the API classes automatically.
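One way a single, list-driven opendata class could look, as a hedged sketch: the OPENDATA_SOURCES registry, its placeholder example.org URLs, and the injected fetch function are assumptions made so the example runs without network access, and none of it is the project's actual opendata.py. Writing to disk is omitted, so output_type is accepted only to keep the signature consistent with the other get_data methods.

```python
import csv
import io
from typing import Callable, Dict, List, Optional

# Hypothetical registry of 'normal' sources: unique_data_id -> CSV URL.
# The example.org URLs are placeholders, not real opendata.dc.gov endpoints.
OPENDATA_SOURCES: Dict[str, str] = {
    "crime_2017": "https://example.org/crime_2017.csv",
    "building_permits_2016": "https://example.org/building_permits_2016.csv",
}


class OpenDataApiConn:
    """One class serves every registered source; no per-source subclass."""

    def __init__(self, fetch: Callable[[str], str],
                 sources: Optional[Dict[str, str]] = None) -> None:
        # fetch(url) returns CSV text; injected so the sketch needs no network.
        self.fetch = fetch
        self.sources = dict(OPENDATA_SOURCES if sources is None else sources)

    @property
    def _available_unique_data_ids(self) -> List[str]:
        return list(self.sources)

    def get_data(self, unique_data_ids: Optional[List[str]] = None,
                 sample: bool = False,
                 output_type: str = "csv") -> Dict[str, List[Dict[str, str]]]:
        # output_type is accepted for a consistent signature; this sketch
        # returns parsed rows instead of writing files.
        result: Dict[str, List[Dict[str, str]]] = {}
        for uid in unique_data_ids or self._available_unique_data_ids:
            text = self.fetch(self.sources[uid])
            rows = list(csv.DictReader(io.StringIO(text)))
            result[uid] = rows[:5] if sample else rows
        return result
```

With this layout, adding a new 'normal' source is a one-line change to the registry rather than a new subclass.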
- Replaced get_api_data with a method that can handle all ApiConn classes through their get_data method.

- A few edits to the DCHousing class to get it to work with the current system (renamed a method and switched to the same params as the other get_data methods)
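A sketch of a driver that handles every ApiConn class through its get_data method. The function name run_all_api_conns, the stub classes, and the choice to log and skip a failing class rather than abort the whole run are hypothetical; they illustrate the idea and are not the contents of the project's get_api_data.py.

```python
import logging
from typing import Dict, Iterable, List, Optional, Sequence

logger = logging.getLogger(__name__)


def run_all_api_conns(api_classes: Sequence[type],
                      unique_data_ids: Optional[Iterable[str]] = None,
                      sample: bool = False,
                      output_type: str = "csv") -> Dict[str, List[str]]:
    """Call get_data() on each ApiConn class; return {class name: ids run}."""
    wanted = set(unique_data_ids) if unique_data_ids is not None else None
    ran: Dict[str, List[str]] = {}
    for cls in api_classes:
        available = list(cls._available_unique_data_ids)
        ids = available if wanted is None else [u for u in available if u in wanted]
        if not ids:
            continue  # this class owns none of the requested ids
        try:
            cls().get_data(unique_data_ids=ids, sample=sample,
                           output_type=output_type)
        except Exception:
            # Assumed policy: log the failure and move on to the next class.
            logger.exception("get_data failed for %s", cls.__name__)
            continue
        ran[cls.__name__] = ids
    return ran


# Hypothetical stand-ins for real ApiConn classes.
class StubPermitsApiConn:
    _available_unique_data_ids = ["permits_2016", "permits_2017"]

    def get_data(self, unique_data_ids=None, sample=False, output_type="csv"):
        print(type(self).__name__, unique_data_ids, sample, output_type)


class StubTaxApiConn(StubPermitsApiConn):
    _available_unique_data_ids = ["tax"]


class StubBrokenApiConn(StubPermitsApiConn):
    _available_unique_data_ids = ["broken"]

    def get_data(self, unique_data_ids=None, sample=False, output_type="csv"):
        raise RuntimeError("API unavailable")
```

Because every class follows the same get_data signature, the driver never needs per-class logic; filtering by _available_unique_data_ids lets a caller request a single data set without knowing which class produces it.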
@NealHumphrey

Collaborator Author

NealHumphrey commented May 23, 2017

@emkap01 Heads up: once this PR is approved, the crime, building permits, and tax data sets are done with the 'write api code' step and need to move on to the 'cleaner/ingestion' part of the process.

NealHumphrey referenced this pull request on May 23, 2017 in #152, "Access Census data via API instead of Fact Finder" (Closed; 0 of 3 tasks complete)

NealHumphrey added some commits May 23, 2017

- Updated the WmataApiConn object to work with the new get_api_data method

- Updated the get_data signature to include **kwargs, so that additional arguments needed only by some ApiConn objects (e.g. the database choice `db`) can be ignored by the others.
- Some general cleanup of wmataapiconn as noted in the code review.
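The effect of adding **kwargs to get_data can be sketched as follows. The two classes and the example_db value are hypothetical stand-ins, not the project's WmataApiConn or opendata classes; the point is that a caller can pass one shared set of arguments to every class, and classes with no use for db simply absorb it.

```python
from typing import Iterable, Optional


class WmataLikeApiConn:
    """Hypothetical class that needs a database choice."""
    _available_unique_data_ids = ["wmata_stops"]

    def get_data(self, unique_data_ids: Optional[Iterable[str]] = None,
                 sample: bool = False, output_type: str = "csv",
                 db: Optional[str] = None, **kwargs) -> str:
        # 'db' is consumed here; any other extras land in kwargs and are ignored.
        return f"wmata using db={db}"


class OpenDataLikeApiConn:
    """Hypothetical class that has no use for a database argument."""
    _available_unique_data_ids = ["crime"]

    def get_data(self, unique_data_ids: Optional[Iterable[str]] = None,
                 sample: bool = False, output_type: str = "csv",
                 **kwargs) -> str:
        # 'db' (if passed) is silently absorbed by **kwargs.
        return "opendata ignores db"


# One shared argument set works for both classes.
shared_args = {"sample": True, "db": "example_db"}
results = [cls().get_data(**shared_args)
           for cls in (WmataLikeApiConn, OpenDataLikeApiConn)]
```

Without **kwargs in OpenDataLikeApiConn.get_data, the same call would raise a TypeError for the unexpected keyword argument db.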
@ajhalani

Collaborator

ajhalani commented May 23, 2017

👍
