Skip to content

Conversation

@lainsworth8801
Copy link
Contributor

@lainsworth8801 lainsworth8801 commented Nov 12, 2019

Issue #1480
Added leading zeros for postal codes under 5 digits;
Added leading zeros for postal codes with 4 digits extension;
Added test excel sheet with postal codes added (fake column name for user mapping purpose).

original uploaded:
Screen Shot 2019-11-12 at 10 29 31 AM

Map Data Property Saved
Screen Shot 2019-11-12 at 10 30 01 AM Screen Shot 2019-11-12 at 10 36 41 AM

@nllong
Copy link
Member

nllong commented Nov 12, 2019

@lainsworth8801 -- thanks for the PR. Can you add the issue ID into the PR description? That way we know what issue to test. Thanks!

Copy link
Member

@nllong nllong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. This seems like it should work. Few changes:

  • Add tests
  • update this branch with current develop

Then re-tag me to review.

Thanks!

@coveralls
Copy link

coveralls commented Nov 13, 2019

Coverage Status

Coverage increased (+0.03%) to 75.724% when pulling 86af8b5 on zip into 321745c on develop.

nllong
nllong previously approved these changes Nov 18, 2019
Copy link
Member

@nllong nllong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! I made a small change to support the owner postal code as well.

Is a zip code of 0 suppose to stay 0 or resolve to 00000?

image

Also, note that the mapping screen does not pad the zip codes; however, I think this is right because the user is seeing the raw data from the spreadsheet. Once they choose to map to postal code then it will get converted.

image

Also, do you think we should go back and update all the zip codes in the database with a migration (and rehash)? I think this is mostly a question for @adrian-lara, @axelstudios and/or @ClearlyEnergy.

if '-' in ts.postal_code:
self.assertEqual(len(ts.postal_code.split('-')[0]), 5)
self.assertEqual(len(ts.postal_code.split('-')[1]), 4)
self.assertEqual(ts.postal_code.split('-')[0].lstrip('0'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making these explicit checks would be nice, (i.e., something like self.assertEqual(ts.postal_code.split('-')[0].lstrip('0'), '05005'))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change that. Was thinking about doing a general format check instead of spot checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For whether '0' zip should stay '0' or be packed with 0s, my reason of packing it with 0s is to show users that this is supposed to be a zip code but it's 0. Although I'm not sure if this is a better idea than leaving it as it is which is just a single digit 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we want this to be packed, that is 0 -> 00000. Right now it doesn't appear to do that.

table_name, mapped_column_name, display_name, is_extra_data = mapping.get(raw_column_name)

# special postal case:
if mapped_column_name == 'postal_code':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good.

@nllong
Copy link
Member

nllong commented Nov 18, 2019 via email

if mapped_column_name in ['postal_code', 'owner_postal_code']:
if column_value:
if '-' in str(column_value):
postal = str(column_value).split('-')[0].zfill(5)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching the owner_postal_code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nllong I recalled wrong...the reason it's not packing 0 zip with 0s is because i checked column_value here before I convert. Will change that.

@adrian-lara
Copy link
Contributor

Also, do you think we should go back and update all the zip codes in the database with a migration (and rehash)? I think this is mostly a question for @adrian-lara, @axelstudios and/or @ClearlyEnergy.

I think a migration to update zip codes on pre-existing records and recalculating hash_object would probably be needed.

ext = str(column_value).split('-')[1].zfill(4)
column_value = postal + '-' + ext
column_value = str(column_value).zfill(5)
if '-' in str(column_value):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screen Shot 2019-11-18 at 11 04 44 AM

@axelstudios
Copy link
Member

Yeah, I like the idea of a migration to fix existing data and then rehash

@RDmitchell
Copy link

@axelstudios -- I would like testing the migration on existing data to be something that is in the Test column of the Dec 19 Github project. Not sure how to do that, but if a PR comes through for that migration, I will add it to the Test column.

@nllong
Copy link
Member

nllong commented Nov 19, 2019

@lainsworth8801 -- can you add in a migration for this? See this as an example: https://github.com/SEED-platform/seed/blob/develop/seed/migrations/0111_rehash.py
Thanks!

@nllong
Copy link
Member

nllong commented Dec 6, 2019

@lainsworth8801 -- can you add in a migration for this? See this as an example: https://github.com/SEED-platform/seed/blob/develop/seed/migrations/0111_rehash.py
Thanks!

Hey @lainsworth8801 -- can you implement this rehashing migration before we merge this PR?

@lainsworth8801
Copy link
Contributor Author

@nllong yes sorry was gonna work on this next

Copy link
Member

@nllong nllong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lainsworth8801 Thanks! Did you test running this migration on a dump of the existing production database?

@lainsworth8801
Copy link
Contributor Author

I only ran it on my local db which is a very small size db. i'll add a run on the production.

@nllong
Copy link
Member

nllong commented Dec 10, 2019

I only ran it on my local db which is a very small size db. i'll add a run on the production.

Thanks. Mainly concerned because you are updating the postal codes here and want to make sure that it works. What if postal code is a number/string/double/etc.

@adrian-lara
Copy link
Contributor

I think a migration to update zip codes on pre-existing records and recalculating hash_object would probably be needed.

Back tracking on this previous comment after conversations with Lin and a closer look at the proposed migration. I don't think an additional rehashing is needed after updating the postal_code values since the .save() is being run for each -State object. SEED already triggers an update of hashes given this command.

@nllong @axelstudios What do you guys think?

@nllong
Copy link
Member

nllong commented Dec 11, 2019

I think you are right @adrian-lara. Since we are calling .save(), then we only need to find the records that need to be updated. I think we should run with SQL (or Django)... something like:

from django.db.models import Q
PropertyState.objects.filter(Q(postal_code__iregex=r'\b\d{4,5}-\d{1,4}\b/') | Q(postal_code__iregex='\b\d{4,5}\b'))

I did not test the above command... but this would save iterating over all the records.


property_sql = (
"UPDATE seed_propertystate " +
"SET created = seed_propertyauditlog.created, updated = seed_propertyauditlog.created " +
Copy link
Contributor

@adrian-lara adrian-lara Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd only need to restore the updated field and not the created field here and in the tax lot query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we don't need to SET the created nor updated field in either of these... just the postal_code, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated my comment (meant to switch created and updated). I'd think we'd need to "revert" the updated field since changing postal_codes one at a time using save() will change the updated time for each record.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, I'm just assuming we wouldn't actually want the updated times to change, but I could definitely understand wanting those values to change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. I think it is okay if the updated times change as it really is a change.

@adrian-lara
Copy link
Contributor

I think you are right @adrian-lara. Since we are calling .save(), then we only need to find the records that need to be updated. I think we should run with SQL (or Django)... something like:

from django.db.models import Q
PropertyState.objects.filter(Q(postal_code__iregex=r'\b\d{4,5}-\d{1,4}\b/') | Q(postal_code__iregex='\b\d{4,5}\b'))

I did not test the above command... but this would save iterating over all the records.

Yeah something like that would help decrease the number of records getting updated.

@nllong nllong dismissed their stale review December 14, 2019 03:11

We need to handle the rehash better. See comments in the PR.

@nllong
Copy link
Member

nllong commented Dec 14, 2019

@lainsworth8801 -- did you have a chance to update this rehash method to filter down to only the fields that are going to be updated? Let me know if you can get to it otherwise I'll work on it.

@lainsworth8801
Copy link
Contributor Author

@nllong was trying to fix the Xcode version issue for timescales on Friday to work on this but didn’t get far. I’ll be back on Monday it could wait through the weekend. My apologies.

@nllong
Copy link
Member

nllong commented Dec 16, 2019

@lainsworth8801 I just updated the migration to fix the dependency error. I think this PR is really close can you update the migration to only find records that have 4 zip code characters and update only those records? As Adrian points out you only need to run .save() to have the hash get updated.

Note that you probably will need to recreate your local database since I merged in some other db migrations.

@lainsworth8801
Copy link
Contributor Author

@nllong definitely. Thanks nick.

@nllong nllong merged commit 1746131 into develop Dec 19, 2019
@nllong nllong deleted the zip branch December 19, 2019 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants