Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a --nocommit arg to the update_index, clear_index and rebuild_index management command. #1090

Merged
merged 1 commit into from Jan 14, 2015

Conversation

andrewschoen
Copy link
Contributor

This is helpful when doing bulk indexing and you'd rather have the backend handle commits, not the management command. We had an instance where the amount of hard commits passed by update_index during a bulk index was causing stability issues with our solr cluster that was already doing soft and hard commits often.

I attempted to add this flag to clear_index and the --remove option of update_index, but it seems that solr was not respecting commit=false for those two code paths. I can look into it some more, but wasn't able to find a solution.

@domenkozar
Copy link
Contributor

@acdha anything missing here? Tests are failing on pypy, but that seems to be because of pypy upgrade on travis-ci side

@andrewschoen
Copy link
Contributor Author

@iElectric it looks like those tests are failing on all of the recent PRs

@domenkozar
Copy link
Contributor

indeed. cc @toastdriven

@acdha
Copy link
Contributor

acdha commented Oct 24, 2014

Yeah, Travis appears to have recently upgraded PyPy and since then every build has been failing.

@acdha
Copy link
Contributor

acdha commented Oct 24, 2014

I'm +1 on merging this

@andrewschoen
Copy link
Contributor Author

@acdha any clue why the test in this patch would fail? Add the same commit flag to --remove of update_index doesn't seem to work either.

https://gist.github.com/andrewschoen/30884fab2ec54057dc13

@acdha
Copy link
Contributor

acdha commented Oct 24, 2014

@andrewschoen I noticed that both the Whoosh and ElasticSearch backends do not appear to implement commit support for clear().

Whoosh always commits on update and remove:
https://github.com/toastdriven/django-haystack/blob/2972b2dcced6ca5b4ed75f2aa579da14c15eae04/haystack/backends/whoosh_backend.py#L218-L220

https://github.com/toastdriven/django-haystack/blob/2972b2dcced6ca5b4ed75f2aa579da14c15eae04/haystack/backends/whoosh_backend.py#L222-L235

I'm wondering whether this test should actually hit the backends or just do a mock to confirm that backend.clear is called with commit=False

@andrewschoen
Copy link
Contributor Author

@acdha Solr should support it. You're right, I could try to write a mocked test though.

@acdha
Copy link
Contributor

acdha commented Oct 24, 2014

@andrewschoen Solr definitely does support it – I was wondering whether alternative would be something like a Python warning when the ES/Whoosh/simple backends are called with a non-default commit value in case someone has built logic around commits not happening, but that seems a bit crazy since anything can trigger one.

@acdha
Copy link
Contributor

acdha commented Oct 24, 2014

For posterity, the test code works great but there's an unexpected feature in clear which calls self.conn.optimize, which does a commit:

https://github.com/toastdriven/django-haystack/blob/2972b2dcced6ca5b4ed75f2aa579da14c15eae04/haystack/backends/solr_backend.py#L107

I think we should just wrap that in if commit:
https://gist.github.com/d81613fdce534f2dd3ad

@andrewschoen andrewschoen changed the title Adds a --nocommit arg to the update_index management command. Adds a --nocommit arg to the update_index, clear_index and rebuild_index management command. Oct 24, 2014
@andrewschoen
Copy link
Contributor Author

This should now work for the update_index, clear_index and rebuild_index commands. I've tried to make the docs clear on what backends support the --nocommit flag.

@andrewschoen
Copy link
Contributor Author

@acdha Anything else I need to do to get this ready to merge?

@@ -231,9 +235,9 @@ def update_backend(self, label, using):
end = min(start + batch_size, total)

if self.workers == 0:
do_update(backend, index, qs, start, end, total, self.verbosity)
do_update(backend, index, qs, start, end, total, self.verbosity, self.commit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could commit (and verbosity) be passed as keyword args here and below to avoid having quite so many positional arguments? Other than that, this looks ready to go.

@andrewschoen
Copy link
Contributor Author

@acdha Thanks for the review. I've got those kwargs added to both do_update and do_remove now.

@domenkozar
Copy link
Contributor

Failures on travis do not seem to be related to this PR.

@domenkozar
Copy link
Contributor

@acdha @toastdriven anything left to do before merging?

@acdha
Copy link
Contributor

acdha commented Nov 5, 2014

I don't believe so

@domenkozar
Copy link
Contributor

@acdha let's merge it? cc @toastdriven

@domenkozar
Copy link
Contributor

Another friendly ping. PR has docs and tests.

acdha added a commit that referenced this pull request Jan 14, 2015
Adds a --nocommit arg to the update_index, clear_index and rebuild_index management command.
@acdha acdha merged commit 3249597 into django-haystack:master Jan 14, 2015
@domenkozar
Copy link
Contributor

🍻

acdha added a commit to acdha/django-haystack that referenced this pull request Jan 30, 2015
rebuild_index builds its option list by combining the options from
clear_index and update_index. This previously had a manual exclude list
for options which were present in both commands to avoid conflicts but
the nocommit option wasn't in that list.

This wasn't tested because our test suite uses call_command rather than
invoking the option parser directly.

This commit also adds tests to confirm that --nocommit will actually
pass commit=False to clear_index and update_index.

Closes django-haystack#1140
See django-haystack#1090
@viorels
Copy link

viorels commented Mar 6, 2015

This now crashes for some backends, for example xapian, as they do not understand a "commit" argument for the update command. i think "commit" should be passed as an argument only to Solr (who makes sense of it) and not the other backends.

https://github.com/django-haystack/django-haystack/blame/master/haystack/management/commands/update_index.py#L91

Should I open a new bug/patch for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants