2226 lost data on import #2235

adrian-lara · 2020-05-28T19:15:33Z

Any background context you want to provide?

Data could potentially be lost on import (see attached issue).

What's this PR do?

Fixes a bug where multiple Column objects were being created for the same raw import file header.

As a result, multiple ColumnMapping objects were created where the import process expects exactly one when it first creates the incoming import records.

How should this be manually tested?

Import a file twice. The first time, don't map incoming columns to unit-aware SEED columns. The second time, map to unit-aware columns. See that the data persists throughout import.

Import a file once, but map twice. The first time mapping, don't map incoming columns to unit-aware SEED columns. The second time (by clicking "Back to Mapping"), map to unit-aware columns. See that the data persists throughout import.

What are the relevant tickets?

#2226

Screenshots (if appropriate)

coveralls · 2020-05-28T19:33:36Z

Coverage decreased (-0.08%) to 76.556% when pulling 39d8d96 on 2226_lost_data_on_import into d68b456 on develop.

adrian-lara · 2020-05-29T18:36:44Z

Leaving a note here - we should discuss if we should patch this to prod.

macintoshpie

I tested and it worked as expected, though I'm not totally clear on how this resolved the issue

macintoshpie · 2020-06-04T15:37:23Z

seed/models/columns.py

+                all_from_cols = Column.objects.filter(
+                    organization=organization,
+                    table_name__in=[None, ''],
+                    column_name=field['from_field'],
+                    is_extra_data=False
+                )
+
+                ColumnMapping.objects.filter(column_raw__id__in=models.Subquery(all_from_cols.values('id'))).delete()
+                all_from_cols.delete()
+
+                from_org_col, _ = Column.objects.get_or_create(
+                    organization=organization,
+                    table_name__in=[None, ''],
+                    column_name=field['from_field'],
+                    is_extra_data=False  # data from header rows in the files are NEVER extra data
+                )


I think some comments in this area would be very helpful at a higher level. e.g. "Since multiple were found, we're deleting these for such and such reason"

adrian-lara · Jun 4, 2020

Agree - I'll do that.

macintoshpie · 2020-06-04T15:38:18Z

seed/models/columns.py

+                    organization=organization,
+                    table_name__in=[None, ''],
+                    column_name=field['from_field'],
+                    is_extra_data=False  # data from header rows in the files are NEVER extra data


Is it true that data from header rows are never extra data? I thought I could make up an arbitrary column in my csv/excel file and map it as extra data

adrian-lara · Jun 4, 2020

The Column object this line refers to represents a "raw" header from the import file, which the ColumnMapping model uses. A ColumnMapping takes a "PropertyState" or "TaxLotState" Column object (see the table_name attribute) and associates it with one of these "raw" Column objects to create the mapping.

The comment could be improved, so I'll do that for this line and the original line in the try block above here.
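To make the distinction in this reply concrete, here is a minimal sketch of the two kinds of Column and how a mapping links them. It is not SEED's model code: the dataclasses, the `is_raw` helper, and the `column_mapped` field name are assumptions made for illustration. Only `table_name`, `is_extra_data`, and `column_raw` appear in the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    column_name: str
    # In the diff, a "raw" file-header column is matched by table_name in [None, ''].
    table_name: Optional[str]
    is_extra_data: bool = False

    @property
    def is_raw(self) -> bool:
        # Hypothetical helper, not part of SEED.
        return self.table_name in (None, "")


@dataclass(frozen=True)
class ColumnMapping:
    column_raw: Column     # the header exactly as it appears in the file
    column_mapped: Column  # assumed name for the PropertyState/TaxLotState side


# An arbitrary header from a CSV file, mapped to an extra-data field.
raw = Column("My Custom Header", table_name="")
mapped = Column("my_custom_header", table_name="PropertyState", is_extra_data=True)
mapping = ColumnMapping(column_raw=raw, column_mapped=mapped)
```

Under these assumptions, the extra-data flag sits on the mapped side, while the raw side always keeps `is_extra_data=False`, which is how both the reviewer's observation and the code comment can hold at once.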

macintoshpie · 2020-06-04T15:39:20Z

seed/models/columns.py

+                ColumnMapping.objects.filter(column_raw__id__in=models.Subquery(all_from_cols.values('id'))).delete()
+                all_from_cols.delete()
+
+                from_org_col, _ = Column.objects.get_or_create(


For clarity's sake, should this just be Column.objects.create since we deleted above?

adrian-lara · Jun 4, 2020

Yup! Nice catch!
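The dedupe step discussed in this thread can be sketched without Django as follows. This is an illustrative stand-in, not the code in seed/models/columns.py: rows are plain dicts, and `dedupe_raw_columns` is a hypothetical name. It follows the order in the diff (delete the mappings that point at the duplicate raw columns, delete the duplicates, then create a single replacement) and uses a plain create at the end, as suggested in the review.

```python
def dedupe_raw_columns(columns, mappings, organization, column_name):
    """Replace all matching raw columns with one fresh row (in-memory sketch)."""

    def is_matching_raw(col):
        return (
            col["organization"] == organization
            and col["table_name"] in (None, "")
            and col["column_name"] == column_name
            and not col["is_extra_data"]
        )

    next_id = max((col["id"] for col in columns), default=0) + 1
    stale_ids = {col["id"] for col in columns if is_matching_raw(col)}

    # Remove mappings first so nothing references a deleted raw column.
    mappings[:] = [m for m in mappings if m["column_raw_id"] not in stale_ids]
    columns[:] = [col for col in columns if col["id"] not in stale_ids]

    # Everything matching was just deleted, so a plain create is enough.
    fresh = {
        "id": next_id,
        "organization": organization,
        "table_name": "",
        "column_name": column_name,
        "is_extra_data": False,
    }
    columns.append(fresh)
    return fresh


columns = [
    {"id": 1, "organization": "org", "table_name": "", "column_name": "Site EUI", "is_extra_data": False},
    {"id": 2, "organization": "org", "table_name": None, "column_name": "Site EUI", "is_extra_data": False},
    {"id": 3, "organization": "org", "table_name": "PropertyState", "column_name": "site_eui", "is_extra_data": False},
]
mappings = [{"id": 10, "column_raw_id": 1}, {"id": 11, "column_raw_id": 2}]

fresh = dedupe_raw_columns(columns, mappings, "org", "Site EUI")
```

After the call, only the PropertyState column and the fresh raw column remain, and both stale mappings are gone.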

adrian-lara · 2020-06-04T17:16:53Z

> I tested and it worked as expected, though I'm not totally clear on how this resolved the issue

Short version: The units_pint Column attribute triggered an unexpected "create" when get_or_create'ing. This impacted downstream processes involving ColumnMappings.
See my comment here for the slightly longer version: #2226 (comment)
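The short version above can be illustrated with a toy, in-memory imitation of get-or-create semantics. This is only a sketch under assumptions: `ToyColumnTable` and `RawColumn` are hypothetical names, the header and unit strings are made-up sample values, and the exact code path in SEED is the one described in the linked issue comment, not this one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawColumn:
    column_name: str
    units_pint: Optional[str] = None


class ToyColumnTable:
    """In-memory stand-in for a table queried with get-or-create semantics."""

    def __init__(self):
        self.rows = []

    def get_or_create(self, **lookup):
        # A row is reused only if EVERY lookup field matches.
        for row in self.rows:
            if all(getattr(row, key) == value for key, value in lookup.items()):
                return row, False
        row = RawColumn(**lookup)
        self.rows.append(row)
        return row, True


# Lookup includes units_pint: the second, unit-aware pass misses the first row.
buggy = ToyColumnTable()
buggy.get_or_create(column_name="Site EUI", units_pint=None)
_, created_second = buggy.get_or_create(column_name="Site EUI", units_pint="kBtu/ft**2/year")
buggy_row_count = len(buggy.rows)

# Lookup ignores units_pint: the same raw column is found on both passes.
fixed = ToyColumnTable()
fixed.get_or_create(column_name="Site EUI")
_, created_again = fixed.get_or_create(column_name="Site EUI")
fixed_row_count = len(fixed.rows)
```

In this sketch the first table ends up with two rows for the same header, while the second keeps one. Django's real `get_or_create` also accepts a `defaults` argument for values that should be set on creation without taking part in the lookup; whether the fix uses it is not stated in this thread.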

macintoshpie

Looks good to me! 🚀

adrian-lara · 2020-06-04T23:44:55Z

Per our conversation, I'll be building a test to add more confirmation that this is working.

macintoshpie

The additional test looks good 👍

Adrian Lara added 2 commits May 28, 2020 12:50

during import, ignore units_pint during get_or_create for raw Column

c5f87c8

during import, raw Columns with the same name are deduped

1b1f907

adrian-lara force-pushed the 2226_lost_data_on_import branch from 30c529b to 1b1f907 May 28, 2020 19:31

adrian-lara requested review from nllong and axelstudios May 28, 2020 19:57

adrian-lara requested a review from macintoshpie June 2, 2020 22:45

macintoshpie reviewed Jun 4, 2020


Adrian Lara and others added 2 commits June 4, 2020 16:27

Clarify Column's _column_fields_to_columns dedup raw col objects logic

9d78506

Merge branch 'develop' into 2226_lost_data_on_import

07502b9

adrian-lara requested a review from macintoshpie June 4, 2020 22:40

macintoshpie approved these changes Jun 4, 2020


adrian-lara added the DO NOT MERGE label Jun 4, 2020

Test for data persistence when remapping w/ and w/o unit-aware fields

9cef37f

adrian-lara removed the DO NOT MERGE label Jun 5, 2020

adrian-lara requested a review from macintoshpie June 5, 2020 21:26

Merge branch 'develop' into 2226_lost_data_on_import

39d8d96

macintoshpie approved these changes Jun 5, 2020


adrian-lara merged commit 7c539d7 into develop Jun 5, 2020

adrian-lara deleted the 2226_lost_data_on_import branch June 5, 2020 22:18
