refactor(bsync): use tasks for importing #2144
Conversation
@@ -151,7 +155,7 @@ def do_checks(org_id, propertystate_ids, taxlotstate_ids, import_file_id=None):
        # specify the chord as an immutable with .si
        chord(tasks, interval=15)(finish_checking.si(progress_data.key))
    else:
-       finish_checking.s(progress_data.key)
+       progress_data.finish_with_success()
Could someone verify this is ok to do?
If all uploaded bsync files were bad, the front-end would spin forever on checking the progress of this because the backend would always say it was not started. I couldn't figure out why calling finish_checking.s(progress_data.key) wasn't setting the progress data to finished (is it not actually getting called for some reason?), so I just did it here.
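For what it's worth, `finish_checking.s(...)` only builds a Celery signature object; nothing runs until that signature is invoked (directly, or via `.delay()`/`.apply_async()`), which would explain why the progress data was never updated. A stdlib sketch of the same deferred-call idea, using `functools.partial` as a stand-in for a Celery signature and a hypothetical `finish_checking` function:

```python
from functools import partial

calls = []

def finish_checking(key):
    # hypothetical stand-in for the Celery task in the diff above
    calls.append(key)

# Analogous to finish_checking.s(progress_data.key): a .s() call only
# *builds* a deferred-call object; the task body does not run yet.
sig = partial(finish_checking, 'progress-key')
assert calls == []  # nothing has executed

# The task only runs once the signature is actually invoked --
# in Celery via .delay()/.apply_async(), or by calling it.
sig()
assert calls == ['progress-key']
```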
Nothing seems wrong with what you have; however, I am concerned with (1) always-eager vs not-always-eager, and (2) making sure that this is not orthogonal to #2055.
I just ran some tests locally:
Set CELERY_TASK_ALWAYS_EAGER = True and CELERY_TASK_EAGER_PROPAGATES = True without running celery, then uploaded a single invalid buildingsync file; it still worked as expected.
Using celery, it also worked. Let me know if these tests aren't sufficient.
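For reference, the settings combination used in the eager test above, written as a Django settings fragment (these are the standard Celery 4+ setting names under the CELERY_ namespace):

```python
# Run tasks synchronously in-process instead of dispatching to a worker,
# and re-raise task exceptions immediately rather than swallowing them.
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True
```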
Excellent. Thanks for verifying!
As a note, I ran into issue #1928 while testing. It only occurred once, when uploading a large (~250 buildings) file, and from the looks of it, it doesn't seem to be specific to this refactoring, but more generally an issue with timescaledb.
Comments on just the backend code. A few really small changes, otherwise this looks good so far 😄
with zipfile.ZipFile(file_, 'r', zipfile.ZIP_STORED) as openzip:
    filelist = openzip.infolist()
    for f in filelist:
        if '.xml' in f.filename and '__MACOSX' not in f.filename:
What's the __MACOSX condition for? From a quick Google, it looks like this is used to skip a folder that certain versions of macOS create in their zip files.
This is probably fine, but mostly just curious: does it contain .xml files as well? If so, is there any other identifier for BuildingSync files? Thinking about whether there are OS-specific files that need to be accounted for (Windows, Linux, etc.), or whether Mac is the only popular OS to do something like this.
I'm not sure what other OS-specific files there might be. I copy/pasted this from another place where we were parsing BSync zips, so I guess it's working so far. I can look further into it if you'd like.
I think it's fine. At worst, this catches a corner case.
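For background, the `__MACOSX/` entries are resource-fork copies that the macOS Finder adds when zipping, and their names typically mirror the real files (so a `._building1.xml` entry would otherwise match the `.xml` check). A self-contained sketch of the filter from the diff above, run against a synthetic in-memory archive:

```python
import io
import zipfile

# Build an in-memory zip resembling one produced by the macOS Finder:
# the real payload plus a resource-fork copy under __MACOSX/.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('building1.xml', '<BuildingSync/>')
    zf.writestr('__MACOSX/._building1.xml', 'resource fork data')
    zf.writestr('notes.txt', 'not an xml file')

# The same filter as in the diff: keep .xml members, skip __MACOSX entries
with zipfile.ZipFile(buf, 'r') as openzip:
    xml_files = [f.filename for f in openzip.infolist()
                 if '.xml' in f.filename and '__MACOSX' not in f.filename]

assert xml_files == ['building1.xml']
```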
seed/data_importer/tasks.py
-               new_chunk[key] = unidecode(v)
+               try:
+                   new_chunk[key] = unidecode(v)
+               except Exception:
I tried a few files (.xml and .zip), and I couldn't hit this except block.
What's the exception we're hoping to catch? Thinking we should be more specific and explicitly only except on that one.
EDIT:
Not sure how I missed it, but it looks like we're actually trying to convert unicode to ASCII here. I think I might have added this try/except block before realizing the zipfile's read method was returning bytes, and that's why I was getting an exception. I will remove the try/except because I'm calling .decode() on the bytestring earlier.
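That matches the symptoms: passing the raw bytes from `ZipFile.read()` into a str-expecting transliteration call would raise, while decoded text would not. If a guard were still wanted, a sketch of decoding up front and catching only the expected failure mode (the byte content here is hypothetical):

```python
# raw bytes as ZipFile.read() would return them (hypothetical content
# containing a UTF-8 encoded non-ASCII character)
raw = b'Stra\xc3\x9fe 42'

try:
    text = raw.decode('utf-8')  # decode once, up front
except UnicodeDecodeError:
    # be explicit about the one failure mode we expect,
    # rather than a bare `except Exception`
    text = raw.decode('utf-8', errors='replace')

assert text == 'Stra\u00dfe 42'
```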
}


def build_column_mapping(import_file):
I'm not sure I have enough context to review the paths, so just reviewing this method - this file looks good 👍
# TODO: get the custom mapping for the organization
try:
    with transaction.atomic():
        raw_property_states = PropertyState.objects.filter(id__in=ids).only('extra_data').iterator()
Nice! I haven't used this before, but looking into iterator() on a QuerySet, it sounds like this helps speed things up 👍
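For anyone unfamiliar: `iterator()` streams rows from the database cursor one chunk at a time instead of caching the whole QuerySet in memory, which matters for large imports. A rough stdlib analogy, with a hypothetical `fetch_rows` generator standing in for the cursor:

```python
def fetch_rows(n):
    # hypothetical stand-in for a DB cursor: yields one row at a time
    for i in range(n):
        yield {'id': i, 'extra_data': {}}

rows = fetch_rows(3)   # like QuerySet.iterator(): nothing materialized yet
first = next(rows)
assert first['id'] == 0

# the remaining rows stream one by one instead of living in a cached list
assert [r['id'] for r in rows] == [1, 2]
```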
p_status, property_state, property_view, messages = building_file.process(org.id, import_file.cycle)
if not p_status or len(messages.get('errors', [])) > 0:
    # failed to create the property, save the messages and skip this file
    progress_data.add_file_info(os.path.basename(filename), messages)
I like this pattern of saving the file info to the progress_data 👍
name: 'mapping_xml',
url: '/data/mapping_xml/{importfile_id:int}',
templateUrl: static_url + 'seed/partials/mapping_xml.html',
controller: 'mapping_controller',
Definitely good for now - just sharing some thoughts on reusing the mapping_controller.
If the xml-specific controller logic expands, we might want to break this out into its own controller, so we can avoid hitting the API for something like cycles, just to avoid bugs.
EDIT: I could see an argument for using this approach, since there's mapping preset logic we might want from the mapping_controller once we get this on develop. Definitely worth thinking about later
<tbody id="mapped-table">
  <tr ng-repeat="col in mappings">
    <td style="text-align: right;" ng-class="{'danger': col.is_duplicate || col.suggestion === ''}" ng-attr-id="mapped-row-type-{$:: $index $}">
      <select ng-model="col.suggestion_table_name" ng-change="updateInventoryTypeDropdown(); change(col)" ng-disabled="true">
I reviewed this thinking that this file is basically a copy of the existing mapping partial, except the ng-disabled="true" was added on these table items.
Given we're not sure what we'll need later, I think keeping all of this is fine.
Any background context you want to provide?
Previously, BSync files were imported at their own endpoint and parsed in-process (i.e. blocking).
What's this PR do?
Moves the BSync backend import flow to use the same flow as other files (i.e. background tasks). Additionally, it adds a mapping page for buildingsync files, though it doesn't currently do anything other than show the mapping from column name to an XPath.
How should this be manually tested?
Go through this flow with these different files from the directory seed/building_sync/tests/data:
ex_1.xml
valid_xml_ex1_ex2.zip
ex_1_no_street_address.xml
What are the relevant tickets?
#2137