Update project table #466

Merged
merged 13 commits into dev on Aug 15, 2017

2 participants
@NealHumphrey
Collaborator

NealHumphrey commented Aug 14, 2017

This branch contains code fixes needed to get all 3 project data sources loaded into the database.
This builds off of the PR #465 and includes all that code too. It also fixes a few bugs with that code.

It also switches the manifest.csv to use the new S3 bucket from the Code for DC AWS account, instead of the one from Neal's personal account that we had been using.

To test:
python load_data.py docker --remove-tables project --update-only project prescat_project prescat_subsidy dchousing_project dchousing_subsidy dhcd_dfd_properties_project dhcd_dfd_properties_subsidy

NealHumphrey added some commits Aug 12, 2017

When using the --update-only flag, if the removal of data matching the unique_data_id results in an empty table, delete the table before loading the data. This allows for the addition/removal of columns in a single table, as the code will recreate the table after it is deleted, using the current version of meta.json.

Note this will not work for tables that have more than one unique_data_id in them, because the table will never be empty
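
For illustration, the drop-if-empty behavior described above could look roughly like the sketch below. It is not the code in this branch: it uses an in-memory `sqlite3` database as a stand-in for the project database, and `remove_existing_data` and the sample rows are hypothetical names made up for the example.

```python
import sqlite3


def remove_existing_data(conn, table, unique_data_id):
    """Delete rows for one unique_data_id; drop the table if nothing is left.

    Returns True when the table was dropped, so the caller knows to recreate it.
    (Hypothetical helper, not the project's actual function.)
    """
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError("unexpected table name: {!r}".format(table))

    conn.execute(
        'DELETE FROM "{}" WHERE unique_data_id = ?'.format(table),
        (unique_data_id,),
    )
    cursor = conn.execute('SELECT COUNT(*) FROM "{}"'.format(table))
    remaining = cursor.fetchone()[0]
    cursor.close()

    dropped = remaining == 0
    if dropped:
        # Nothing else lives in this table, so it is safe to rebuild it from meta.json.
        conn.execute('DROP TABLE "{}"'.format(table))
    conn.commit()
    return dropped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (unique_data_id TEXT, name TEXT)")
conn.execute("CREATE TABLE subsidy (unique_data_id TEXT, name TEXT)")
conn.execute("INSERT INTO project VALUES ('prescat_project', 'A')")
conn.executemany(
    "INSERT INTO subsidy VALUES (?, ?)",
    [("prescat_subsidy", "B"), ("dchousing_subsidy", "C")],
)

dropped_project = remove_existing_data(conn, "project", "prescat_project")
dropped_subsidy = remove_existing_data(conn, "subsidy", "prescat_subsidy")
```

In this sketch the `subsidy` table survives because it still holds rows from a second unique_data_id, which is the limitation the note above points out.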

Also handles errors caused by missing data in the zone_facts table: if a data field is not found, we log the error but don't terminate, so that the load keeps going when this is run on a server.
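
A minimal sketch of the log-and-continue pattern, assuming the missing field surfaces as a `KeyError`. The function name `summarize_fields` and the field names are hypothetical and are not taken from the zone_facts code.

```python
import logging

logger = logging.getLogger("zone_facts_sketch")


def summarize_fields(rows, fields):
    """Sum each requested field across rows, skipping fields that are missing."""
    totals = {}
    for field in fields:
        try:
            totals[field] = sum(row[field] for row in rows)
        except KeyError:
            # Record the problem and move on so one bad field does not stop the run.
            logger.error("Field %r not found; skipping", field)
    return totals


# Hypothetical sample data: "not_a_field" is absent from every row.
rows = [{"population": 10, "crime_count": 2}, {"population": 5, "crime_count": 1}]
totals = summarize_fields(rows, ["population", "not_a_field", "crime_count"])
```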
Adds tool to explicitly remove a list of tables before rebuilding / updating the database.

This handles the situation where there are multiple unique_data_ids in a database table but we want to update the table structure without rebuilding the entire database.
Adds the MAR to the database
Merge remote-tracking branch 'jkwening/437-building-permit-update' into merge-projects-enhancements

# Conflicts:
#	python/housinginsights/ingestion/SQLWriter.py
Looks for a matching MAR ID for the records in the DHCD data set. Currently it uses a simple string-matching implementation; TODO is to add more advanced info
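
As a rough idea of what "simple string matching" can mean here, the sketch below normalizes addresses and does an exact lookup. The helper names, the normalization rules, and the sample addresses and IDs are all assumptions for illustration, not the matching code in this commit.

```python
def normalize_address(address):
    """Uppercase, strip punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in address.upper() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def find_mar_id(address, mar_lookup):
    """Return the MAR ID whose normalized address matches exactly, else None."""
    return mar_lookup.get(normalize_address(address))


# Hypothetical MAR rows: (address id, full address).
mar_records = [(1001, "1600 Example Street NW"), (1002, "200 Sample Ave. SE")]
mar_lookup = {normalize_address(addr): mar_id for mar_id, addr in mar_records}

match = find_mar_id("200 sample ave se", mar_lookup)
miss = find_mar_id("200 Sample Avenue SE", mar_lookup)
```

Exact matching of this kind misses variants such as "Ave" versus "Avenue", which is the sort of case more advanced matching would need to cover.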

Splits the DHCD data into 'project' and 'subsidy' tables per the same logic used for the DCHousing data set
Adds a second level test if no match is found in the MAR table of the database, by querying the MAR API
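
The two-level lookup can be sketched as below, with the database and API lookups passed in as plain callables. `lookup_mar_id` and the fake lookups are hypothetical stand-ins so the example runs without a database or network access.

```python
def lookup_mar_id(address, db_lookup, api_lookup):
    """Try the local MAR table first; only call the API when that finds nothing."""
    mar_id = db_lookup(address)
    if mar_id is not None:
        return mar_id, "database"
    mar_id = api_lookup(address)
    if mar_id is not None:
        return mar_id, "api"
    return None, None


# Hypothetical stand-ins for the MAR table and the MAR API.
local_table = {"100 EXAMPLE ST NW": 5001}
api_calls = []


def fake_db_lookup(address):
    return local_table.get(address)


def fake_api_lookup(address):
    api_calls.append(address)  # Track calls to show the API is only a fallback.
    return 5002 if address == "200 SAMPLE AVE SE" else None


first = lookup_mar_id("100 EXAMPLE ST NW", fake_db_lookup, fake_api_lookup)
second = lookup_mar_id("200 SAMPLE AVE SE", fake_db_lookup, fake_api_lookup)
third = lookup_mar_id("300 UNKNOWN RD", fake_db_lookup, fake_api_lookup)
```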

Refactors the methods off of the 'BaseApiConn' into a new 'ProjectBaseApiConn' so that the Project api connection can use the MarApiConn without causing circular import issues when the MarApiConn tries to load the BaseApiConn
Changes both DHCD and DCHousing apiconn objects to use the `ProjectBaseApiConn`
- Updates to use new Prescat project table from late July - adds 3 fields, renames Proj_zip field (changed case from Proj_Zip), updates manifest

- Updates all files in manifest to use the new S3 bucket on the Code for DC account: `housing-insights` (instead of the name without the dash, which was the bucket name on Neal's personal S3 account)
- Fixes project loading bug re: incomplete refactor of fields needed for rename-census-tracts
- Fixes bugs related to new ProjectBaseApiConn - split into separate file to avoid more circular import issues.
Fixes bug - couldn't download DHCD data if asking for both properties and projects tables due to missing if statement
Adds data to the 'program' and 'portfolio' fields - not as detailed as we will want, but gets these fields usable for now.
@jkwening
Collaborator

jkwening left a comment

Lines 259-263 in Cleaner.py were throwing a KeyError. I believe the issue is that the lat/lon and x/y coord lookups return a JSON object that has 'Table1' as the first-level key, so replacing the 'returnDataset' key value and removing the 'sourceOperation' validation fixed the error. Regarding the latter: it looks like you're trying to verify that an address lookup actually returns an address, but this will throw a KeyError if the lat/lon or x/y coord lookup returns an empty result, so you will need an additional condition for that to work correctly.

@NealHumphrey

Collaborator Author

NealHumphrey commented Aug 15, 2017

Hmm, so it looks like mar_api.find_location() returns a different format than the .reverse_lat_lon() and .reverse_geocode() methods, because the MAR API itself returns a different result format for each. I lifted that set of conditionals from my other code that only used .find_location().

This also means that this line, `row['mar_id'] = result['Table1'][0]["ADDRESS_ID"]`, would never have worked if the .find_location() method were implemented - I think it just never got called during the DCHousing data set since we always had lat/lon available.

For now I'll hotfix this by putting the return method underneath each conditional. But, we'll need to consider broader solutions as part of our enhanced dedupe code - and we might want to consider adding some wrapper methods to the MarApiConn that return a consistent format and handle things like finding intersections instead of addresses.
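
One possible shape for such a wrapper is sketched below. It assumes, based only on the discussion in this thread, that find_location() results nest 'Table1' under 'returnDataset' while the reverse lookups put 'Table1' at the top level. `extract_address_id` is a hypothetical helper, not an existing MarApiConn method, and the sample payloads are invented.

```python
def extract_address_id(result):
    """Return ADDRESS_ID from either response shape, or None when there is no match."""
    if not result:
        return None
    # Assumed find_location() shape: rows nested under 'returnDataset'.
    dataset = result.get("returnDataset")
    if dataset is None:
        # Assumed reverse-lookup shape: 'Table1' at the top level.
        dataset = result
    rows = dataset.get("Table1") or []
    if not rows:
        # Empty result: report "no match" instead of raising KeyError/IndexError.
        return None
    return rows[0].get("ADDRESS_ID")


# Invented payloads that mimic the two shapes described in this thread.
find_location_style = {
    "returnDataset": {"Table1": [{"ADDRESS_ID": 123}]},
    "sourceOperation": "placeholder value",
}
reverse_lookup_style = {"Table1": [{"ADDRESS_ID": 456}]}
empty_style = {"Table1": []}

ids = [
    extract_address_id(find_location_style),
    extract_address_id(reverse_lookup_style),
    extract_address_id(empty_style),
    extract_address_id(None),
]
```

Returning `None` for every no-match case lets the caller use a single `if mar_id is None` check regardless of which lookup produced the result.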

@NealHumphrey

Collaborator Author

NealHumphrey commented Aug 15, 2017

Made the change; tested it and it works on my machine. Going to merge.

@NealHumphrey NealHumphrey merged commit 7ea5073 into dev Aug 15, 2017
