Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2226 lost data on import #2235

Merged
merged 6 commits into from
Jun 5, 2020
Merged

2226 lost data on import #2235

merged 6 commits into from
Jun 5, 2020

Conversation

adrian-lara
Copy link
Contributor

@adrian-lara adrian-lara commented May 28, 2020

Any background context you want to provide?

Data could potentially be lost on import (see attached issue).

What's this PR do?

Fixes a bug where multiple Column objects were being created for the same raw import file header.

This caused multiple ColumnMapping objects to be created when only one was expected in order to initially create incoming import records.

How should this be manually tested?

Import a file twice. The first time, don't map incoming columns to unit-aware SEED columns. The second time, map to unit-aware columns. See that the data persists throughout import.

Import a file once, but map twice. The first time mapping, don't map incoming columns to unit-aware SEED columns. The second time (by clicking "Back to Mapping"), map to unit-aware columns. See that the data persists throughout import.

What are the relevant tickets?

#2226

Screenshots (if appropriate)

@coveralls
Copy link

coveralls commented May 28, 2020

Coverage Status

Coverage decreased (-0.08%) to 76.556% when pulling 39d8d96 on 2226_lost_data_on_import into d68b456 on develop.

@adrian-lara
Copy link
Contributor Author

Leaving a note here - we should discuss if we should patch this to prod.

Copy link
Contributor

@macintoshpie macintoshpie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and it worked as expected, though I'm not totally clear on how this resolved the issue

Comment on lines 992 to 1007
all_from_cols = Column.objects.filter(
organization=organization,
table_name__in=[None, ''],
column_name=field['from_field'],
is_extra_data=False
)

ColumnMapping.objects.filter(column_raw__id__in=models.Subquery(all_from_cols.values('id'))).delete()
all_from_cols.delete()

from_org_col, _ = Column.objects.get_or_create(
organization=organization,
table_name__in=[None, ''],
column_name=field['from_field'],
is_extra_data=False # data from header rows in the files are NEVER extra data
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some comments in this area would be very helpful at a higher level. e.g. "Since multiple were found, we're deleting these for such and such reason"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - I'll do that.

organization=organization,
table_name__in=[None, ''],
column_name=field['from_field'],
is_extra_data=False # data from header rows in the files are NEVER extra data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that data from header rows are never extra data? I thought I could make up an arbitrary column in my csv/excel file and map it as extra data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of Column object this is referring to is a representation of the "raw" header row we use for the ColumnMapping model. The ColumnMapping model takes a "PropertyState" or "TaxLotState" Column object (see table_name attribute), and associates that to one of these "raw" Column objects to create the mapping.

The comment could be improved, so I'll do that for this line and the original line in the try block above here.

ColumnMapping.objects.filter(column_raw__id__in=models.Subquery(all_from_cols.values('id'))).delete()
all_from_cols.delete()

from_org_col, _ = Column.objects.get_or_create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity's sake, should this just be Column.objects.create since we deleted above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! Nice catch!

@adrian-lara
Copy link
Contributor Author

I tested and it worked as expected, though I'm not totally clear on how this resolved the issue

Short version: The units_pint Column attribute triggered an unexpected "create" when get_or_create'ing. This impacted downstream processes involving ColumnMappings.
See my comment here for the slightly longer version: #2226 (comment)

Copy link
Contributor

@macintoshpie macintoshpie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 🚀

@adrian-lara
Copy link
Contributor Author

Per our conversation, I'll be building a test to add more confirmation that this is working.

Copy link
Contributor

@macintoshpie macintoshpie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional test looks good 👍

@adrian-lara adrian-lara merged commit 7c539d7 into develop Jun 5, 2020
@adrian-lara adrian-lara deleted the 2226_lost_data_on_import branch June 5, 2020 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants