Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constrain uniqueness of column name and move geocoding column creation #4151

Merged
merged 6 commits into from
Jul 19, 2023

Conversation

nllong
Copy link
Member

@nllong nllong commented Jul 18, 2023

Any background context you want to provide?

In #4146 there were two of the exact same columns (geocoded_county) created during the geocoding column creation. They were created within 0.003 seconds of each other. I attempted to recreate the issue but was unable as this was probably a race condition. To fix the issue, a db admin will need to remove the redundant column manually from the database. @RDmitchell - I did this for your LBNL 405 organization if you want to continue testing.

What's this PR do?

To prevent this from happening in the future 2 things were done:

  1. I moved the initial call of the geocoded column creation to before the celery_chain call. Therefore, if this was a parallelization race condition, then the columns should exist when the geocoding runs the get_or_create call.
  2. I added a constraint on the Column to enforce a unique name across the column_name, org, table_name, is_extra_data, and units_pint columns. Using the production database, there were no conflicts when running.

I debated on adding the units_pint to the constraint and ended up adding it because I can see the use case where there are two mapping profiles where one set of data are coming in as m2 and another is coming in as ft2; both of which should be allowed to exist. IF we remove units_pint then the production database will have to resolve some naming conflicts which exist due to old orgs not having the units_pint and defaulting to None. @haneslinger -- I know we discussed this constraint in the past but I don't remember the context.

Other items in the PR:

  • Update shapely to 2.0.1. It was segfaulting on WKT translation with Python 3.9 on osx
  • Remove unneeded print statements

How should this be manually tested?

  • verify unit tests
  • Add geocoding MapQuest key
  • With celery running (non-eager mode)
    • import a file with address (no lat/long) and verify additional geocoding fields (e.g., geocoded_neighborhood)
    • in the inventory page, select a few properties and then select geocode
    • via swagger, go to the geocode_by_ids and add in a property id and org.

What are the relevant tickets?

#4146

Screenshots (if appropriate)

@nllong nllong added the Bug label Jul 18, 2023
@nllong nllong requested a review from axelstudios July 18, 2023 17:19
@nllong nllong force-pushed the 4146-geocode-race-condition branch from f79bb10 to 4e21f3e Compare July 19, 2023 02:27
@axelstudios axelstudios merged commit 358552e into develop Jul 19, 2023
7 checks passed
@axelstudios axelstudios deleted the 4146-geocode-race-condition branch July 19, 2023 21:36
dhaley pushed a commit that referenced this pull request Jul 21, 2023
#4151)

* constrain uniqueness of column name and move geocoding column creation

* update shapely, remove unnecessary prints

* use geos-dev to build shapely

* try to fix chrome install

* do not constrain the units_pint column

* get_or_create for the salesforce tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants