WIP: Make proj-addre one-to-many data and load into database. #497

Merged
merged 3 commits into from Aug 28, 2017

2 participants
@NealHumphrey
Collaborator

NealHumphrey commented Aug 27, 2017

  • Adds an "apiconn" object for the prescat. Currently it just loads the file from disk, but in the future it can be the class that checks an online folder for data newly uploaded by admins.

  • Adds a function to the base_project.py apiconn that uses the split address function to create a CSV file of the one-to-many mapping from projects to addresses.

  • Moves the split address function out of ingestion and into tools, since it's not used only in ingestion

  • Fixes a bug in the MarApiConn: adds a method that queries the API using an address string instead of a generic string, which helps eliminate false positives.

  • Changes references to find_location to use find_addr_string instead, to eliminate false positives.

  • Fixes a bug in the address splitting code that matched "and" inside names such as Marland or Randolph. The split string is now ' and '.

  • Adds proj_addre table to database

  • DHCD and DCHousing ApiConn methods also create _addre.csv files for upload

  • Entity resolution is now done via the one-to-many mapping of addresses: if any of the addresses match, the whole project is assumed to match.
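As a rough illustration of the address-splitting fix and the one-to-many output described above, here is a minimal sketch. The function names, column names, and sample data are hypothetical and are not the project's actual code; the only detail taken from the description is that the delimiter is ' and ' with surrounding spaces.

```python
import csv
import io


def split_addresses(raw):
    """Hypothetical stand-in for the project's split address function.

    Splitting on ' and ' (with spaces) keeps names that merely contain
    the letters "and", such as Randolph, in one piece.
    """
    return [part.strip() for part in raw.split(' and ') if part.strip()]


def project_address_rows(projects):
    """Expand (project_id, raw_address) pairs into one row per address."""
    rows = []
    for project_id, raw in projects:
        for address in split_addresses(raw):
            rows.append({'project_id': project_id, 'address': address})
    return rows


# Made-up sample data for illustration only.
projects = [
    ('P1', '100 Randolph St NW and 200 Main St NE'),
    ('P2', '50 Marland Ave'),
]
rows = project_address_rows(projects)

# Write the one-to-many rows as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['project_id', 'address'])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

A plain split on 'and' would instead cut "Randolph" into "R" and "olph", which is the kind of false split the change is meant to avoid.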

To test

  • Create the prescat_addre.csv file: from the python/housinginsights/sources/ folder, run python prescat.py (it still needs to be added to the options list in get_api_data.py, but should be ready to work there)
  • Add the table, from /python/scripts: python load_data.py docker --update-only proj_addre
  • Get DHCD and DCHousing data sets: python get_api_data.py with the code options edited to request the DCHousing and dhcd modules.
- Adds an "apiconn" object for the prescat. Currently this just loads the file from disk, but in the future this can be the class that checks for newly uploaded data in an online folder that admins can put data into

- Adds a function to the base_project.py apiconn that uses the split address function to create a csv file of the one-to-many project to addresses.
- Moves the split address function out of ingestion and into tools, since it's not used only in ingestion
- Fixes bug in the MarApiConn - adds a method to query api using address string instead of generic string. Helps eliminate false positives.
- Changes references to find_location to instead be find_addr_string to eliminate false positives

TODO
- Starts to add the one-to-many mapping to the create_project_subsidy_csv method for the dhcd and dchousing data sets. This is not complete, so it currently breaks that code. The newly created prescat_addre.csv file needs to be loaded into the database first, and then work on this method can resume.

@NealHumphrey NealHumphrey changed the title Make proj-addre one-to-many data and load into database. WIP: Make proj-addre one-to-many data and load into database. Aug 27, 2017

- Adds proj_addre table to the database.
- In the section of the DHCD and DCHousing apiconn that makes the _subsidy and _project tables, it also makes an _addre table
- Entity resolution in that section, which checks for an existing NLHIC_id in the database, now uses the _addre table. It first checks for an exact address match, then looks for a fuzzy match for that address in the MAR and checks for an exact mar_id match; if neither the address nor the mar_id matches, it assumes the address is new. If any of a project's addresses match any of the addresses in the existing database table, it assumes the whole project matches
- Adds the optional parameters to all of the ApiConn objects so that we can consistently pass database_choice to __init__, whether or not it is used
- Small fix to how address splitting occurs to solve error - @jkwening would be good to check logic.
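The matching order described in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `resolve_project`, the row keys, and the `mar_lookup` callable (standing in for the fuzzy MAR query) are all hypothetical.

```python
def resolve_project(new_addresses, existing_rows, mar_lookup):
    """Return the id of an existing project that shares an address, else None.

    new_addresses: address strings for the incoming project.
    existing_rows: dicts with hypothetical keys 'project_id', 'address', 'mar_id'.
    mar_lookup:    callable mapping an address string to a mar_id or None;
                   a stand-in for the fuzzy MAR lookup.
    """
    by_address = {row['address']: row['project_id'] for row in existing_rows}
    by_mar_id = {
        row['mar_id']: row['project_id']
        for row in existing_rows
        if row['mar_id'] is not None
    }

    for address in new_addresses:
        # 1. Exact address string match.
        if address in by_address:
            return by_address[address]
        # 2. Resolve the address via the MAR, then require an exact mar_id match.
        mar_id = mar_lookup(address)
        if mar_id is not None and mar_id in by_mar_id:
            return by_mar_id[mar_id]

    # 3. No address matched by either route: treat the project as new.
    return None


# Made-up data for illustration only.
existing = [{'project_id': 'EX1', 'address': '100 Main St NE', 'mar_id': 11}]
fake_mar = {'100 Main Street NE': 11}

matched = resolve_project(['5 Oak St NW', '100 Main Street NE'], existing, fake_mar.get)
unmatched = resolve_project(['9 Elm St SE'], existing, fake_mar.get)
```

Returning on the first hit reflects the "any address matches, so the whole project matches" assumption; a stricter rule (for example, requiring a majority of addresses to match) would need a different loop.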
@NealHumphrey

Collaborator Author

NealHumphrey commented Aug 27, 2017

Updated description to reflect progress w/ most recent commit

@NealHumphrey NealHumphrey requested a review from jkwening Aug 27, 2017

@jkwening
Collaborator

jkwening left a comment

Looks good.

@jkwening jkwening merged commit d7f2fbc into dev Aug 28, 2017
