Remove table during update #462

Merged
jkwening merged 3 commits into dev on Aug 15, 2017

NealHumphrey commented Aug 12, 2017

Provides two ways to drop a table (along with its associated sql manifest rows) when updating the database. This makes it possible to add or remove columns while updating a single table, instead of only being able to do so by rebuilding the whole database.

Also includes a few try/except blocks for the zone_facts table. Adds a logging.error call when a field cannot be found or calculated due to missing data, so rebuilding the database still works even if a necessary table (building_permits, census, etc.) is missing. This allows faster testing of things that apply only to certain sections of the database, because you can add 'skips' for any tables you're not currently working on.
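As a rough illustration of the log-and-continue pattern described above, here is a minimal sketch. The function names and the `calculators` mapping are hypothetical and are not taken from the project's code; the only thing drawn from the PR is the idea of calling logging.error and carrying on when a field's source data is missing.

```python
import logging

logger = logging.getLogger("zone_facts_sketch")


def calculate_field(field_name, calculators):
    """Return the computed value for field_name, or None if it could not be computed."""
    try:
        return calculators[field_name]()
    except Exception as err:  # e.g. a source table that was skipped or is missing
        logger.error("Could not calculate %s for zone_facts: %s", field_name, err)
        return None


def build_zone_facts(calculators):
    """Compute every field, keeping only the ones that succeed."""
    results = {}
    for name in calculators:
        value = calculate_field(name, calculators)
        if value is not None:
            results[name] = value
    return results
```

With this shape, one failing field produces a log entry while the remaining fields are still computed.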

NealHumphrey added some commits Aug 12, 2017

When using the --update-only flag, if the removal of data matching the unique_data_id results in an empty table, delete the table before loading the data. This allows for the addition/removal of columns in a single table, as the code will recreate the table after it is deleted, using the current version of meta.json.

Note that this will not work for tables that contain more than one unique_data_id, because the table will never be empty.

Also handles errors due to missing or not-found data in the zone_facts table: if a data field is not found, we want to log the error but not terminate the code, which matters when this is run on a server.
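The drop-if-empty behaviour in this commit can be sketched roughly as follows. This is an assumption-laden illustration using the standard library's sqlite3 module, not the project's actual code: the function name `remove_existing_data` and the table layout are hypothetical, and only the `unique_data_id` column name comes from the commit message.

```python
import sqlite3


def remove_existing_data(conn, table, unique_data_id):
    """Delete one data set's rows and drop the table if nothing is left.

    Returns True if the table was dropped, False otherwise.
    """
    # Table names cannot be bound as query parameters, so this sketch assumes
    # `table` comes from trusted configuration rather than user input.
    conn.execute(f'DELETE FROM "{table}" WHERE unique_data_id = ?', (unique_data_id,))
    remaining = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if remaining == 0:
        # An empty table can be dropped so the loader recreates it with the
        # current column definitions.
        conn.execute(f'DROP TABLE "{table}"')
        conn.commit()
        return True
    conn.commit()
    return False
```

A table holding several data sets returns False here after one of them is removed, which is the limitation noted in the commit message.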
Adds tool to explicitly remove a list of tables before rebuilding / updating the database.

This handles the situation where there are multiple unique_data_ids in a database table but we want to update the table structure without rebuilding the entire database.
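A minimal sketch of what explicitly removing a list of tables might look like, again using sqlite3 for illustration only. The function name `remove_tables`, the manifest table name, and the `destination_table` column are assumptions made for this example, not confirmed details of the project's schema.

```python
import sqlite3


def remove_tables(conn, tables, manifest_table="manifest"):
    """Drop each named table and delete the manifest rows that refer to it."""
    for table in tables:
        # IF EXISTS keeps the call harmless when a table was never created.
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(
            f'DELETE FROM "{manifest_table}" WHERE destination_table = ?',
            (table,),
        )
    conn.commit()
```

Because the table is dropped unconditionally, this works even when several unique_data_ids share it; the data sets then need to be reloaded afterwards.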

@NealHumphrey NealHumphrey requested a review from jkwening Aug 12, 2017

NealHumphrey commented Aug 12, 2017

To test:

Before running the command, check the 'crime' table and sql manifest table - there should be 3 crime data sets: 2015, 2016 and 2017.

From the /python/scripts folder, run this command:
python load_data.py docker --remove-tables crime --update-only crime_2016 crime_2017

What should happen:
This will first drop the crime table, and then update the database with the 2016 and 2017 crime data.
The sql manifest table will now have only two crime entries, one for each of these data sets, and the crime table will contain only the 2016 and 2017 data.

To restore your database to its original state, add the missing data set back in:
python load_data.py docker --update-only crime_2015
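For readers unfamiliar with how a command line like the ones above could be parsed, here is a hedged argparse sketch. It only mirrors the flag names shown in the commands; the parser layout, help text, and defaults are assumptions and may differ from what load_data.py actually does.

```python
import argparse


def build_parser():
    """Hypothetical parser accepting the arguments used in the test commands."""
    parser = argparse.ArgumentParser(description="Sketch of load_data.py arguments")
    parser.add_argument("database", help="which database to connect to, e.g. docker")
    parser.add_argument("--remove-tables", nargs="+", default=[], metavar="TABLE",
                        help="tables to drop before loading")
    parser.add_argument("--update-only", nargs="+", default=[], metavar="UNIQUE_DATA_ID",
                        help="data sets to reload")
    return parser


# Parse the same arguments as the first test command.
args = build_parser().parse_args(
    ["docker", "--remove-tables", "crime", "--update-only", "crime_2016", "crime_2017"]
)
```

Here `args.remove_tables` holds the table names and `args.update_only` holds the unique_data_ids to load.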

jkwening left a comment

Looks good - I went ahead and resolved the merge conflict. Approving this since the other PR has an issue that needs resolving.

@jkwening jkwening merged commit eb5d8b0 into dev Aug 15, 2017
