New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add atomic/zero-downtime index rebuilding #358
Conversation
This resolves #314 |
Thanks for the Pull request, I will review it and add my concern into it. @saadmk11 Can you take a look as well? |
Looks like there is a similar PR #284 that does this as a part of it but has been stale for a long time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the pull request 💯 , added some small suggestions.
@safwanrahman I have not tested this locally yet will do it when I get some time.
I tried out the changes in that PR, and unless I misunderstood what was documented, it didn't do exactly what I was looking for. Regardless, yea, it's been stale for a while. |
Co-authored-by: Maksudul Haque <saad.mk112@gmail.com>
Co-authored-by: Maksudul Haque <saad.mk112@gmail.com>
Co-authored-by: Maksudul Haque <saad.mk112@gmail.com>
Co-authored-by: Maksudul Haque <saad.mk112@gmail.com>
Co-authored-by: Maksudul Haque <saad.mk112@gmail.com>
This seems like a good approach. Though I was thinking to different way to achieve it, but it will achieve the purpose with minimal changes. I will test it locally before mege. I have added a little fix over. |
First of all, thanks for tackling this one @oehrlein About the continuous updates to an index issue, I think we can mitigate it running combined your solution with populate command, this way: First we run search_index --rebuild --atomic One caveat is that items inserted during the migration will be inconsistent for a brief moment, the items will not be available for search until we run the populate command. Another caveat is that we will have to write documents 2 times in this can be an performance bottleneck for bigger clusters. |
@safwanrahman Sounds good, I appreciate it! |
@antunesleo No problem! I think for now (and considering it doesn't require any changes to the code), the combination of those two commands works well enough. Of course, it would be good to eventually try to have it similar to the article I linked in my original comment. Although, it's admittedly not a need of mine at the moment, so I won't be able to get around to it for the time being. |
Codecov Report
@@ Coverage Diff @@
## master #358 +/- ##
==========================================
- Coverage 67.85% 58.90% -8.96%
==========================================
Files 12 12
Lines 756 910 +154
==========================================
+ Hits 513 536 +23
- Misses 243 374 +131
Continue to review full report at Codecov.
|
@safwanrahman @saadmk11 I made the changes for all your guys' suggestions, so it should all be good to merge! |
Thanks a lot @oehrlein for your patience and patch that is very good design. @saadmk11 Can you test it with any application and check if it is working perfectly? I am being too busy with personal things so maybe it would be better if you can test it and let me know. I will merge it after you test it. |
@safwanrahman No problem, and thanks! @saadmk11 Let me know if anything needs to be changed! |
Hi @oehrlein, Thanks for updating the PR with the requested changes. 💯 I tried to test these changes locally (by running I am getting this error when I run the command: Traceback (most recent call last):
File "manage.py", line 21, in <module>
main()
File "manage.py", line 17, in main
execute_from_command_line(sys.argv)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django_elasticsearch_dsl/management/commands/search_index.py", line 262, in handle
self._rebuild(models, options)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django_elasticsearch_dsl/management/commands/search_index.py", line 242, in _rebuild
options
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/django_elasticsearch_dsl/management/commands/search_index.py", line 179, in _update_alias
es_conn.indices.update_aliases({"actions": alias_actions})
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 301, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/elasticsearch/client/indices.py", line 612, in update_aliases
"POST", "/_aliases", params=params, headers=headers, body=body
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/elasticsearch/transport.py", line 458, in perform_request
raise e
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/elasticsearch/transport.py", line 426, in perform_request
timeout=timeout,
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 287, in perform_request
self._raise_error(response.status, raw_data)
File "/home/saad/.virtualenvs/test_project/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 338, in _raise_error
status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'invalid_alias_name_exception', 'Invalid alias name [some_index_name], an index exists with the same name as the alias') (I think this is happening because we are trying to create an alias with the same name of the index. As the index name does not have any timestamp/uniqueness on the current implementation it is conflicting with the alias name) But If I delete the index on ElasticSearch and then run the rebuild command it works as expected because the newly created index name has timestamp/uniqueness according to the PR implementation. Please let me know If I need to add/update anything to test the changes of this PR. :) |
@saadmk11 No problem! Thanks for catching this error! I ran into it before, but forgot about it before my original commits. I made adjustments to fix the issue. Now, when we want an alias to have the name of an index that already exists, it deletes the existing index as well. Likewise, I made changes so that when we want an index to have the name of an alias that already exists, it deletes the indices pointed at the alias to free up the name. I also made adjustments to account for deleting indices that were created with aliases (after running In addition, I added additional stdout messages in case of doing something alias-specific, but only having regular indices, or vice-versa. For example, trying to delete regular indices after creating indices with an alias (after running Take a look at the changes when you get a chance, and let me know if you have any suggestions and/or think there should be changes! |
@saadmk11 Thanks for the kind words and the help + suggestions! I took your suggestion regarding the There's just 2 suggestions that I'm looking for your response for - everything else should be good! :) |
Thanks for adding this. I would like to know if @safwanrahman has any thoughts on this. |
@saadmk11 No problem! @safwanrahman Let us know! |
I need to take a deeper look into the PR again. i was on a long vacation recently, so it may take some days to catch things up. I am sorry about that. I will reply the queries once I read the PR. |
@safwanrahman Sounds good, and no worries! Hopefully the vacation was good :) |
@safwanrahman Any luck at being able to take a look at this? |
@safwanrahman Just wanted to follow up again - any luck at being able to take a look at this? |
We're really like to see this too @safwanrahman if you are able to review. Thank you for your time |
Hi, any news about the merge of this really useful feature? |
i found that this source code does not erase new indexes when an error occurs in process of creating alias |
Would be great to see this PR merged!! @oehrlein any chance you are considering merging this? |
@phpaulin I definitely would like to, but it's beyond me unfortunately. @safwanrahman @saadmk11 Any chance you guys will have time soon to look at this again and merge it? |
please add it @safwanrahman , it's very useful for production deployments |
Yes, it would be greatly helpful for our team also @safwanrahman . |
@oehrlein Thanks a ton for the PR. I have been quite busy for some months so could not verify to merge it. Finally, I got some time to check the code and it seems to be good from my end. I have merged it now! 💯 Thanks again for your patience. |
Incredible, thanks so much @safwanrahman and @oehrlein |
@safwanrahman No problem, and thanks for taking a look at it and merging it! @violuke No problem! |
@safwanrahman will the release be bumped to include this soon? i'd love to use this functionality! |
@oehrlein thank you for doing this, amazing work! is it possible to update the PR description with the correct management command names and also have the proper documentation for this new feature? (pending on the new version) |
@oehrlein Thank you for the PR. I have a lot of data to index, what will happen if there is new data during the indexing process? Will that data be stored in the new index (when use --use-alias) or not indexed? Because average time to populate takes about 20 min. |
@zillasaur Firstly, my apologies for the long delay to get back to you. I updated the PR description with the correct management commands. I also added documentation for the management commands and submitted a PR (#446) for it. |
@fizikurniawan Firstly, my apologies for the long delay to get back to you. To answer your question, I'm not sure. Ultimately, that's more of a question of how populating an Elasticsearch index works, considering populating an index with or without aliases works the same. |
These changes allow for atomic/zero-downtime index rebuilding using aliases.
There are two new management commands:
--use-alias
and--use-alias-keep-index
, both which only work with the--rebuild
command. If--use-alias-keep-index
is used with--use-alias
, the indices previously aliased will not be deleted. If--use-alias-keep-index
isn't used, the old indices will be deleted.Regarding naming, the defined index name in
documents.py
becomes the alias name, and indices take the alias name plus a suffix (hyphen + current datetime). I took example from this.2 big caveats:
documents.py
:For this to work, remember to add
from fnmatch import fnmatch
to your imports indocuments.py
. Also, replaceALIAS
with your alias name, or just make that argument one string, e.g.'my-alias-*'
.As noted in the example:
I didn't get around to writing tests or adding to the documentation. With all that being said, please let me know if you have questions and/or suggestions!
EDIT: Please note that caveat 2 has been taken care of in ce05715, so it is no longer an issue.