Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(bsync): use tasks for importing #2144

Merged
merged 10 commits into from
Mar 31, 2020
Merged

Conversation

macintoshpie
Copy link
Contributor

@macintoshpie macintoshpie commented Mar 16, 2020

Any background context you want to provide?

Previously BSync files were imported at their own endpoint and parsed in-process (ie blocking).

What's this PR do?

Moves the BSync backend import flow to use the flow of other files (ie background tasks). Additionally it adds a mapping page for buildingsync files, though it doesn't currently do anything other than show the mapping from column name to an xpath.

How should this be manually tested?

Go through this flow with these different files from the directory seed/building_sync/tests/data:

  • a supported BuildingSync xml file (v2.0-pr1): ex_1.xml
  • a zip of supported BSync files: valid_xml_ex1_ex2.zip
  • a broken BSync file (e.g. missing address info): ex_1_no_street_address.xml
  1. Navigate to the upload data modal, and click 'Upload BuildingSync Data'.
  2. Select the file
  3. Click 'continue to mapping', and you should be directed to a mapping page
  4. It should show the set of fields from BuildingSync.BRICR_STRUCT and their XPath. Nothing should be editable on this page! Users are not allowed to change mappings currently
  5. After clicking "May Your Data", you should be redirected to the mapped properties page. If there was an invalid BSync file included, you should see a modal describing why it failed. If all files failed to be mapped (e.g. a zip with all invalid files, or a single invalid xml file), then you should not see the "save mapping" button, nor should you see any table (it should just say "no data").
  6. Go to your properties and the BuildingSync properties should be there.

What are the relevant tickets?

#2137

@@ -151,7 +155,7 @@ def do_checks(org_id, propertystate_ids, taxlotstate_ids, import_file_id=None):
# specify the chord as an immutable with .si
chord(tasks, interval=15)(finish_checking.si(progress_data.key))
else:
finish_checking.s(progress_data.key)
progress_data.finish_with_success()
Copy link
Contributor Author

@macintoshpie macintoshpie Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could someone verify this is ok to do?

If all uploaded bsync files were bad, the front-end would spin forever on checking the progress of this because the backend would always say it was not started. I couldn't figure out why calling finish_checking.s(progress_data.key) wasn't setting the progress data to finished (is it not actually getting called for some reason?), so I just did it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing seems wrong with what you have; however, I am concerned with 1) always eager vs not-always eager 2) making sure that this is not orthogonal to #2055

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran some tests locally:

Set CELERY_TASK_ALWAYS_EAGER = True CELERY_TASK_EAGER_PROPAGATES = True and not running celery then uploaded a single buildingsync file that was invalid and it still worked as expected.

Using celery it also worked. Let me know if these tests aren't sufficient

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent. Thanks for verifying!

@macintoshpie
Copy link
Contributor Author

As a note, I ran into issue #1928 while testing. It only occurred once when uploading a large (~250 buildings) file, and from the looks of it it doesn't seem to be specific to this refactoring, but more generally an issue with timescaledb.

@coveralls
Copy link

coveralls commented Mar 27, 2020

Coverage Status

Coverage increased (+0.06%) to 74.926% when pulling 0b680d9 on wip/bsync-upload-flow into 8a01627 on bricr-dev.

Copy link
Contributor

@adrian-lara adrian-lara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on just the backend code. A few really small changes, otherwise this looks good so far 😄

with zipfile.ZipFile(file_, 'r', zipfile.ZIP_STORED) as openzip:
filelist = openzip.infolist()
for f in filelist:
if '.xml' in f.filename and '__MACOSX' not in f.filename:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the __MACOSX condition for? Quick Google - it looks like this is used to avoid a file that certain versions of Mac's create in their zip files.

This is probably fine, but mostly just curious, is it a .xml file as well? If so, is there any other identifier for BuildingSync files? Thinking about if there were OS-specific files that need to be accounted for (Windows, Linux, etc.). Or, is Mac is the only popular OS to do something like this?

Copy link
Contributor Author

@macintoshpie macintoshpie Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what other OS-specific files there might be. I copy/pasted this from another place where we were parsing BSync zips so I guess it's working so far. I can look further into it if you'd like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. At worst, this catches a corner case.

new_chunk[key] = unidecode(v)
try:
new_chunk[key] = unidecode(v)
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a few files (.xml and .zip), and I couldn't hit this except block.

What's the exception we're hoping to catch? Thinking we should be more specific and explicitly only except on that one.

Copy link
Contributor Author

@macintoshpie macintoshpie Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT:
Not sure how I missed it but it looks like we're actually trying to convert unicode to ASCII here. I think I might have added this try/except block before realizing the zipfiles's read method was returning bytes and that's why I was getting an exception. I will remove the try except because I'm calling .decode() on the bytestring earlier

seed/data_importer/views.py Show resolved Hide resolved
}


def build_column_mapping(import_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have enough context to review the paths, so just reviewing this method - this file looks good 👍

# TODO: get the custom mapping for the organization
try:
with transaction.atomic():
raw_property_states = PropertyState.objects.filter(id__in=ids).only('extra_data').iterator()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I haven't used this before, but looking into iterator() on a QS, sounds like this helps speed things up 👍

p_status, property_state, property_view, messages = building_file.process(org.id, import_file.cycle)
if not p_status or len(messages.get('errors', [])) > 0:
# failed to create the property, save the messages and skip this file
progress_data.add_file_info(os.path.basename(filename), messages)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this pattern of saving the file info to the progress_data 👍

seed/data_importer/models.py Show resolved Hide resolved
Copy link
Contributor

@adrian-lara adrian-lara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good and things work as expected! 👍
Mapping page with everything disabled
Screen Shot 2020-03-31 at 2 34 55 PM

Mapping page when trying to map columns/parse the BuildingSync records fails for all records
Screen Shot 2020-03-31 at 2 35 50 PM

Mapping page when trying to map columns/parse the BuildingSync records fails for some records
Screen Shot 2020-03-31 at 2 36 33 PM

name: 'mapping_xml',
url: '/data/mapping_xml/{importfile_id:int}',
templateUrl: static_url + 'seed/partials/mapping_xml.html',
controller: 'mapping_controller',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely good for now - just sharing some thoughts on reusing the mapping_controller.

If the xml-specific controller logic expands, we might want to break this out to use it's own controller so we can avoid hitting the API for something like cycles just to avoid bugs.

EDIT: I could see an argument for using this approach, since there's mapping preset logic we might want from the mapping_controller once we get this on develop. Definitely worth thinking about later

<tbody id="mapped-table">
<tr ng-repeat="col in mappings">
<td style="text-align: right;" ng-class="{'danger': col.is_duplicate || col.suggestion === ''}" ng-attr-id="mapped-row-type-{$:: $index $}">
<select ng-model="col.suggestion_table_name" ng-change="updateInventoryTypeDropdown(); change(col)" ng-disabled="true">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this thinking that this file is basically a copy of the existing mapping partial except the ng-disabled="true" was added on these table items.

Given we're not sure what'll need later, so I think keeping all of this is fine.

@adrian-lara adrian-lara merged commit 3e360d0 into bricr-dev Mar 31, 2020
@adrian-lara adrian-lara deleted the wip/bsync-upload-flow branch March 31, 2020 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants