refactor(bsync): use tasks for importing #2144
Conversation
@@ -151,7 +155,7 @@ def do_checks(org_id, propertystate_ids, taxlotstate_ids, import_file_id=None):
        # specify the chord as an immutable with .si
        chord(tasks, interval=15)(finish_checking.si(progress_data.key))
    else:
-       finish_checking.s(progress_data.key)
+       progress_data.finish_with_success()
Could someone verify this is ok to do?
If all uploaded bsync files were bad, the front-end would spin forever on checking the progress of this because the backend would always say it was not started. I couldn't figure out why calling finish_checking.s(progress_data.key) wasn't setting the progress data to finished (is it not actually getting called for some reason?), so I just did it here.
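For what it's worth, `finish_checking.s(...)` only builds a Celery signature object; nothing runs until that signature is invoked (directly, or via `.delay()`/`.apply_async()`), which would explain why the progress data was never updated. A stdlib sketch of the same deferred-call idea, using `functools.partial` as a stand-in for a Celery signature and a hypothetical `finish_checking` function:

```python
from functools import partial

calls = []

def finish_checking(key):
    # hypothetical stand-in for the Celery task in the diff above
    calls.append(key)

# Analogous to finish_checking.s(progress_data.key): a .s() call only
# *builds* a deferred-call object; the task body does not run yet.
sig = partial(finish_checking, 'progress-key')
assert calls == []  # nothing has executed

# The task only runs once the signature is actually invoked --
# in Celery via .delay()/.apply_async(), or by calling it.
sig()
assert calls == ['progress-key']
```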
Nothing seems wrong with what you have; however, I am concerned with (1) always-eager vs not-always-eager, and (2) making sure that this is not orthogonal to #2055.
I just ran some tests locally:
Set CELERY_TASK_ALWAYS_EAGER = True and CELERY_TASK_EAGER_PROPAGATES = True without running celery, then uploaded a single invalid buildingsync file; it still worked as expected.
Using celery, it also worked. Let me know if these tests aren't sufficient.
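For reference, the settings combination used in the eager test above, written as a Django settings fragment (these are the standard Celery 4+ setting names under the CELERY_ namespace):

```python
# Run tasks synchronously in-process instead of dispatching to a worker,
# and re-raise task exceptions immediately rather than swallowing them.
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True
```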
Excellent. Thanks for verifying!
As a note, I ran into issue #1928 while testing. It only occurred once, when uploading a large (~250 buildings) file, and from the looks of it, it doesn't seem to be specific to this refactoring, but more generally an issue with timescaledb.
Comments on just the backend code. A few really small changes, otherwise this looks good so far 😄
with zipfile.ZipFile(file_, 'r', zipfile.ZIP_STORED) as openzip:
    filelist = openzip.infolist()
    for f in filelist:
        if '.xml' in f.filename and '__MACOSX' not in f.filename:
What's the __MACOSX condition for? From a quick Google, it looks like this is used to skip a folder that certain versions of macOS create in their zip files.
This is probably fine, but mostly just curious: does it contain .xml files as well? If so, is there any other identifier for BuildingSync files? Thinking about whether there are OS-specific files that need to be accounted for (Windows, Linux, etc.), or whether Mac is the only popular OS to do something like this.
I'm not sure what other OS-specific files there might be. I copy/pasted this from another place where we were parsing BSync zips, so I guess it's working so far. I can look further into it if you'd like.
I think it's fine. At worst, this catches a corner case.
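For background, the `__MACOSX/` entries are resource-fork copies that the macOS Finder adds when zipping, and their names typically mirror the real files (so a `._building1.xml` entry would otherwise match the `.xml` check). A self-contained sketch of the filter from the diff above, run against a synthetic in-memory archive:

```python
import io
import zipfile

# Build an in-memory zip resembling one produced by the macOS Finder:
# the real payload plus a resource-fork copy under __MACOSX/.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('building1.xml', '<BuildingSync/>')
    zf.writestr('__MACOSX/._building1.xml', 'resource fork data')
    zf.writestr('notes.txt', 'not an xml file')

# The same filter as in the diff: keep .xml members, skip __MACOSX entries
with zipfile.ZipFile(buf, 'r') as openzip:
    xml_files = [f.filename for f in openzip.infolist()
                 if '.xml' in f.filename and '__MACOSX' not in f.filename]

assert xml_files == ['building1.xml']
```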
seed/data_importer/tasks.py
-               new_chunk[key] = unidecode(v)
+               try:
+                   new_chunk[key] = unidecode(v)
+               except Exception:
I tried a few files (.xml and .zip), and I couldn't hit this except block.
What's the exception we're hoping to catch? Thinking we should be more specific and explicitly only except on that one.
EDIT:
Not sure how I missed it, but it looks like we're actually trying to convert unicode to ASCII here. I think I might have added this try/except block before realizing the zipfile's read method was returning bytes, and that's why I was getting an exception. I will remove the try/except because I'm calling .decode() on the bytestring earlier.
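That matches the symptoms: passing the raw bytes from `ZipFile.read()` into a str-expecting transliteration call would raise, while decoded text would not. If a guard were still wanted, a sketch of decoding up front and catching only the expected failure mode (the byte content here is hypothetical):

```python
# raw bytes as ZipFile.read() would return them (hypothetical content
# containing a UTF-8 encoded non-ASCII character)
raw = b'Stra\xc3\x9fe 42'

try:
    text = raw.decode('utf-8')  # decode once, up front
except UnicodeDecodeError:
    # be explicit about the one failure mode we expect,
    # rather than a bare `except Exception`
    text = raw.decode('utf-8', errors='replace')

assert text == 'Stra\u00dfe 42'
```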
}


def build_column_mapping(import_file):
I'm not sure I have enough context to review the paths, so just reviewing this method - this file looks good 👍
# TODO: get the custom mapping for the organization
try:
    with transaction.atomic():
        raw_property_states = PropertyState.objects.filter(id__in=ids).only('extra_data').iterator()
Nice! I haven't used this before, but looking into iterator() on a QuerySet, it sounds like this helps speed things up 👍
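For anyone unfamiliar: `iterator()` streams rows from the database cursor one chunk at a time instead of caching the whole QuerySet in memory, which matters for large imports. A rough stdlib analogy, with a hypothetical `fetch_rows` generator standing in for the cursor:

```python
def fetch_rows(n):
    # hypothetical stand-in for a DB cursor: yields one row at a time
    for i in range(n):
        yield {'id': i, 'extra_data': {}}

rows = fetch_rows(3)   # like QuerySet.iterator(): nothing materialized yet
first = next(rows)
assert first['id'] == 0

# the remaining rows stream one by one instead of living in a cached list
assert [r['id'] for r in rows] == [1, 2]
```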
p_status, property_state, property_view, messages = building_file.process(org.id, import_file.cycle)
if not p_status or len(messages.get('errors', [])) > 0:
    # failed to create the property, save the messages and skip this file
    progress_data.add_file_info(os.path.basename(filename), messages)
I like this pattern of saving the file info to the progress_data 👍
name: 'mapping_xml',
url: '/data/mapping_xml/{importfile_id:int}',
templateUrl: static_url + 'seed/partials/mapping_xml.html',
controller: 'mapping_controller',
Definitely good for now - just sharing some thoughts on reusing the mapping_controller.
If the xml-specific controller logic expands, we might want to break this out into its own controller, so we can avoid hitting the API for something like cycles, just to avoid bugs.
EDIT: I could see an argument for using this approach, since there's mapping preset logic we might want from the mapping_controller once we get this on develop. Definitely worth thinking about later
<tbody id="mapped-table">
  <tr ng-repeat="col in mappings">
    <td style="text-align: right;" ng-class="{'danger': col.is_duplicate || col.suggestion === ''}" ng-attr-id="mapped-row-type-{$:: $index $}">
      <select ng-model="col.suggestion_table_name" ng-change="updateInventoryTypeDropdown(); change(col)" ng-disabled="true">
I reviewed this thinking that this file is basically a copy of the existing mapping partial, except the ng-disabled="true" was added on these table items.
Given we're not sure what we'll need later, I think keeping all of this is fine.
Any background context you want to provide?
Previously, BSync files were imported at their own endpoint and parsed in-process (i.e. blocking).
What's this PR do?
Moves the BSync backend import flow to use the same flow as other files (i.e. background tasks). Additionally, it adds a mapping page for buildingsync files, though it doesn't currently do anything other than show the mapping from column name to an XPath.
How should this be manually tested?
Go through this flow with these different files from the directory seed/building_sync/tests/data:
ex_1.xml
valid_xml_ex1_ex2.zip
ex_1_no_street_address.xml
What are the relevant tickets?
#2137