Performance boost for bulk delete, and additional abstraction in utils #99

afrancis13 · 2015-08-05T23:00:10Z

I added bulk delete functionality a couple of weeks ago, but noticed it could be made significantly more efficient. Instead of serializing entire documents in batches, you can specify just the primary key used as the Elasticsearch document ID (often an integer) along with the operation type ('delete'), and delete that way (see streaming_bulk in https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/helpers/__init__.py).
Since things were getting a little cluttered inside update, I pulled some sub-functionality out into helper functions and added documentation. I'll comment on one other thing below.
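
To illustrate the delete-by-ID idea described above, here is a minimal sketch, not the code from this PR. The helper name `build_delete_actions` and its parameters are hypothetical; the `_op_type`, `_index`, `_type` and `_id` keys are the action metadata fields read by the elasticsearch-py bulk helpers.

```python
def build_delete_actions(primary_keys, index_name, doc_type=None):
    """Yield one lightweight bulk 'delete' action per primary key.

    Hypothetical helper: only the document ID and the operation type are
    sent, so no document has to be serialized.
    """
    for pk in primary_keys:
        action = {
            '_op_type': 'delete',  # tells the bulk helper to delete, not index
            '_index': index_name,
            '_id': pk,
        }
        if doc_type is not None:
            # Assumption: the cluster still uses mapping types.
            action['_type'] = doc_type
        yield action


# Usage sketch (assumes a reachable cluster and an Elasticsearch client `es`):
#
#     from elasticsearch.helpers import streaming_bulk
#     for ok, item in streaming_bulk(es, build_delete_actions(ids, 'my_index')):
#         if not ok:
#             print('Delete failed: {}'.format(item))
```

Because the actions come from a generator, the full list of IDs never has to be turned into documents in memory before being sent.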

afrancis13 · 2015-08-05T23:04:18Z

bungiesearch/utils.py

+def filter_model_items(index_instance, model_items, model_name, start_date, end_date):
+    ''' Filters the model items queryset based on start and end date.'''
+    if index_instance.updated_field is None:
+        logging.warning("No updated date field found for {} - not restricting with start and end date".format(model_name))


Haystack logs a warning here instead of raising a hard error, and I think we should follow suit. The problem with the previous ValueError was that it made the update functionality difficult to apply across varied types of model indices. For example, if you want to run update with start and end dates over all your model indices, you have to separate the ones with updated_field from the ones without. That is possible for users like us, but seems more cumbersome than necessary. Let me know what you think about this!
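
A sketch of how the warn-and-continue behaviour could look end to end. Only the signature and the warning come from the diff above; the rest of the body is an assumption, in particular the early return and the Django-style `__gte` / `__lte` lookups on `updated_field`.

```python
import logging


def filter_model_items(index_instance, model_items, model_name, start_date, end_date):
    '''Filters the model items queryset based on start and end date.'''
    if index_instance.updated_field is None:
        logging.warning("No updated date field found for {} - not restricting with start and end date".format(model_name))
        # Assumption: fall through with the unfiltered queryset rather than raising.
        return model_items
    if start_date:
        # Assumption: Django-style field lookup on the updated field.
        model_items = model_items.filter(**{'{}__gte'.format(index_instance.updated_field): start_date})
    if end_date:
        model_items = model_items.filter(**{'{}__lte'.format(index_instance.updated_field): end_date})
    return model_items
```

With this shape, a caller can loop over every model index with the same start and end dates, and indices without an updated field are simply processed in full.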

Yes, that's a valid point.

ChristopherRabotin · 2015-08-06T09:05:48Z

tests/core/test_bungiesearch.py

@@ -279,7 +279,10 @@ def test_time_indexing(self):
        except Exception as e:


While working on this test, can we remove the exception handling here and let the testing framework catch it? Instead of failing an assertion, it's probably better for the test to raise the exception. The function would then become:

def test_time_indexing(self):
    update_index(Article.objects.all(), 'Article', start_date=datetime.strftime(datetime.now(), '%Y-%m-%d %H:%M'))
    update_index(NoUpdatedField.objects.all(), 'NoUpdatedField', end_date=datetime.strftime(datetime.now(), '%Y-%m-%d'))

What do you think?

Yes, I think this would be a better way of writing this, now that you mention it.

Okay, I'll fix that up soon then.

afrancis13 reviewed Aug 5, 2015

Performance boost for bulk delete, and additional abstraction in utils

0d1d5c0

afrancis13 force-pushed the bulk_delete branch from b2952a6 to 0d1d5c0 Compare August 5, 2015 23:09

ChristopherRabotin reviewed Aug 6, 2015

ChristopherRabotin added a commit that referenced this pull request Aug 6, 2015

Merge pull request #99 from afrancis13/bulk_delete

4df82e7

Performance boost for bulk delete, and additional abstraction in utils

ChristopherRabotin merged commit 4df82e7 into ChristopherRabotin:master Aug 6, 2015

afrancis13 pushed a commit to afrancis13/bungiesearch that referenced this pull request Aug 6, 2015

Merge pull request ChristopherRabotin#99 from afrancis13/bulk_delete

0c5b4b6

Performance boost for bulk delete, and additional abstraction in utils
