Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to Django 1.9 and store parameters as JSON object #822

Merged
merged 118 commits into from Feb 26, 2018

Conversation

hdoupe
Copy link
Collaborator

@hdoupe hdoupe commented Feb 6, 2018

This PR resolves #811 and implements the plan proposed in this comment.

Upgrading to Django 1.9 gives us access to the JSON column type which was included with Postgres 9.4 (the version used by PolicyBrain). Several important pieces of data that are either already JSON objects or are more usefully stored as JSON objects are:

  1. raw input from the GUI
  2. raw file input
  3. parsed and cleaned GUI input
  4. results from the model run

The goal of this PR is to store the raw input from the GUI as a JSON object.

See Django release notes for 1.8 and 1.9 for further context on the changes requireed for the Django update.

This PR is based off of changes in #814.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Feb 7, 2018

Commit 80d715a removes the jsonfield import. This library raises deprecation warnings for Django >= 1.9 and is incompatible with Django >= 1.0.

Even when using the jsonfields.fields.JSONField class, if you have a postgres database of version less than 9.4, then the underlying postgres column is saved as type text instead of json or jsonb. See rpkilby/jsonfield#140 for more information.

That means that you have to do add a new migration that specifies that you want to use django.contrib.postgres.fields.JSONField in order to actually change the underlying postgres type to json or jsonb. I did this in commit d647e32

This was tested with the following procedure:

  1. Duplicate the current production version of the postgres database
  2. Apply these migrations to the duplicated database
  3. Make sure that the data did not change
    Steps 1 and 2:
(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ createdb -T pb_prod_fixed pb_prod_fixed_jsonfields_to_text_fix
(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ echo $DATABASE_URL 
postgresql://localhost/pb_test_fixed_jsonfields_to_text_fix
(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ export DATABASE_URL=postgresql://localhost/pb_prod_fixed_jsonfields_to_text_fix
(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ python manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).

You have unapplied migrations; your app may not work properly until they are applied.
Run 'python manage.py migrate' to apply them.

February 07, 2018 - 18:34:14
Django version 1.9, using settings 'webapp.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
^C(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ python manage.py migrate
Operations to perform:
  Apply all migrations: account, sessions, admin, flatblocks, register, dynamic, sites, auth, taxbrain, contenttypes, btax
Running migrations:
  Rendering model states... DONE
  Applying account.0004_auto_20170416_1821... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying btax.0003_auto_20180205_2151... OK
  Applying dynamic.0017_auto_20180205_2004... OK
  Applying dynamic.0018_auto_20180205_2025... OK
  Applying sites.0002_alter_domain_unique... OK
  Applying taxbrain.0062_auto_20180118_2120... OK
  Applying taxbrain.0063_auto_20180118_2125... OK
  Applying taxbrain.0064_auto_20180118_2133... OK
  Applying taxbrain.0065_auto_20180118_2137... OK
  Applying taxbrain.0066_taxsaveinputs_ald_alimonyreceived_hc... OK
  Applying taxbrain.0067_auto_20180119_1513... OK
  Applying taxbrain.0068_auto_20180119_1515... OK
  Applying taxbrain.0069_auto_20180205_2004... OK
  Applying taxbrain.0070_auto_20180205_2025... OK

Step 3:
Script:

from sqlalchemy import create_engine
import pandas as pd
import numpy as np
import json
import traceback

def get_isnull_df(engine):
    isnull_query = """
    select url.id url_id, tsi.id id, url.unique_inputs_id unique_inputs_id,
    url.model_pk model_pk, tsi.tax_result is null as result_isnull, tsi.tax_result as tax_result,
    tsi.creation_date creation_date,
    url.taxcalc_vers taxcalc_vers, url.webapp_vers webapp_vers
    from taxbrain_taxsaveinputs tsi
    join taxbrain_outputurl url
    on tsi.id = url.unique_inputs_id
    where tsi.tax_result is not null
    and url.id %% 20 = 0
    order by url.id desc
    """
    print("running query...")
    isnull_df = pd.read_sql_query(isnull_query, con=engine)
    isnull_df.sort_values("unique_inputs_id", inplace=True)
    # isnull_df.to_csv('isnull_df.csv', index=False)
    print("returning...")
    return isnull_df

nonmigrated_engine = create_engine("postgresql://localhost/pb_prod_fixed")
migrated_engine = create_engine("postgresql://localhost/pb_prod_fixed_jsonfields_to_text_fix")

print("querying databases...")
nonmigrated_df = get_isnull_df(nonmigrated_engine)
migrated_df = get_isnull_df(migrated_engine)

print("quick assertion...")
nonmig_isnull, mig_isnull = nonmigrated_df.result_isnull.sum(), migrated_df.result_isnull.sum()
print("num not null for (nonmigrated, migrated): ",len(nonmigrated_df) - nonmig_isnull, len(migrated_df) - mig_isnull)
assert nonmig_isnull == mig_isnull

print("parsing json data...")
nonmigrated_df["topython_tax_result"] = nonmigrated_df.tax_result.apply(json.loads)
migrated_df["topython_tax_result"] = migrated_df.tax_result.apply(lambda x: json.loads(json.dumps(x)))
# print(nonmigrated_df[["tax_result", "topython_tax_result"]])
# print(migrated_df[["tax_result", "topython_tax_result"]])

print("starting assertion loop...")
for url_id, exp, act in zip(nonmigrated_df.url_id.values, nonmigrated_df.topython_tax_result.values, migrated_df.topython_tax_result.values):
    try:
        np.testing.assert_equal(act, exp)
        # print("passed: ", url_id)
    except AssertionError as ae:
        print("failed: ", url_id)
        traceback.print_exc()

Output:

(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ python results_same_after_migration.py 
querying databases...
running query...
returning...
running query...
returning...
quick assertion...
('num not null for (nonmigrated, migrated): ', 1334, 1334)
parsing json data...
starting assertion loop...
(aei_dropq) HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ 

I also checked the types in the database, pb_prod_fixed_jsonfields_to_text_fix to make sure that the type was changed from text to jsonb.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Feb 22, 2018

PR #822 is on the test app at http://ospc-taxes7.herokuapp.com/.

I encourage everyone to do their best to try to break the changes pushed by this PR. This is a really big release and a large portion of the TaxBrain internals were either changed or refactored.

Run 1533 is a good example for checking out the new parameter deprecation error. Note that only parameters that have changed since the previous version of PolicyBrain qualify for this message. The policy up to now has been to rename columns to what we thought was the parameter's new name.

Note that CCC is down on the test app until we figure out how to update the anaconda package.

Also, I am now having trouble loading runs submitted before 2016, but the problems described in #812 and #813 are resolved.

I will update the RELEASES.md file with a comprehensive list of the changes going into this release.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Feb 23, 2018

@MattHJensen @martinholmer commit f95a420 adds a document describing PolicyBrain's approach to parsing parameters and displaying deprecated parameters. I'm interested in any thoughts on this that you may have.

@martinholmer
Copy link
Contributor

@hdoupe said:

commit f95a420 adds a document describing PolicyBrain's approach to parsing parameters and displaying deprecated parameters. I'm interested in any thoughts on this that you may have.

Look reasonable to me.
Thanks for the massive amount of work you've done on this highly sensible improvement!

@MattHJensen
Copy link
Contributor

commit f95a420 adds a document describing PolicyBrain's approach to parsing parameters and displaying deprecated parameters. I'm interested in any thoughts on this that you may have.

This looks good to me, too.

@hdoupe
Copy link
Collaborator Author

hdoupe commented Feb 26, 2018

@martinholmer and @MattHJensen Thanks for the review. I plan to merge this and do a release as soon as I resolve some minor issues with CCC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants