Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create New Index For BHA #34663

Merged
merged 11 commits into from
May 23, 2024
Merged

Create New Index For BHA #34663

merged 11 commits into from
May 23, 2024

Conversation

AmitPhulera
Copy link
Contributor

Product Description

This PR is cherrypicked from #34602 and commits were amended to be more chronological and were merged wherever necessary.
This PR extracts out the index creation part and adapter boilerplate from the mentioned PR.

I have updated the migrations to be run only on SaaS environments.

Feature Flag

NA

Safety Assurance

Tested migration locally by applying it and unapplying it, it worked fine.

Safety story

Does not alter anything in HQ just adds a new index that would be created when this will be deployed.

Automated test coverage

NA

QA Plan

NA

Migrations

  • The migrations in this code can be safely applied first independently of the code

Rollback instructions

The migrations in the PR should be rolled back before this PR is reverted.

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the reindex/migration Reindex or migration will be required during or before deploy label May 22, 2024
@AmitPhulera
Copy link
Contributor Author

An unrelated Bootstrap 5 test is failing on the PR.

Copy link
Contributor

@gherceg gherceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the simple commits! (Sounds like the failing test will need to be resolve on a separate PR)

Comment on lines +29 to +31
def _reverse(apps, schema_editor):
if settings.IS_SAAS_ENVIRONMENT:
DeleteOnlyIfIndexExists(index_runtime_name('case-search-bha-2024-05-10')).run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment on lines +30 to +33
doc_id, source = self.from_python(doc)
self._verify_doc_id(doc_id)
self._verify_doc_source(source)
self._index(doc_id, source, refresh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the "sub indices" framing you added below. If the parent can access the relationship in reverse, that might also be a nice way to fork migrations and writes, like the main index method could do something like this:

        doc_id, source = self.from_python(doc)
        self._verify_doc_id(doc_id)
        self._verify_doc_source(source)
        self._index(doc_id, source, refresh)
        for sub_index in self.get_sub_indices():
            sub_index._index(doc_id, source, refresh)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, it would have made sense if we would have to always multiplex the writes. The writes are conditional based on domain and if the multiplex settings are configured. So it would look something like -

        self._index(doc_id, source, refresh)
        for sub_index in self.get_sub_indices():
            if should_multiplex(source['domain']):
                sub_index._index(doc_id, source, refresh)
            

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like something like that - I was thinking the sub index should be the thing that decides whether to write or reject a doc, so that could be built in to its implementation of _index, or it could provide a is_relevant method or some such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make sense. Another thought, the current implementation can utilize bulk method to write in both the indices.
The sub_indices approach would require us to make multiple queries which should be fine I think but definitely would be cleaner and more segregated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's a good point - I do think bulk writes make sense too. That could be done by treating sub indices as contributors, something like:

bulk_payload = self._get_payload(doc_id, source)
for index in sub indices:
    payload = index._get_payload(doc_id, source) 
    if payload:
        bulk_payload.extend(payload)



def _create_bha_index_on_saas_envs(apps, schema_editor):
if settings.IS_SAAS_ENVIRONMENT:
Copy link
Contributor

@esoergel esoergel May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could've sworn India was also a SaaS environment, but it looks like this is actually just prod + staging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I also used to think that way until I stumbled upon it 😅

@AmitPhulera
Copy link
Contributor Author

@esoergel @gherceg Requesting a re-review, merged against master to fix the failing test.

@gherceg
Copy link
Contributor

gherceg commented May 22, 2024

Looks like there was a mismatch between using parent_index_name on the BHA adapter and parent_index_cname on the base adapter. I updated to use parent_index_name since the comment on the base adapter explains it is the cname.

@AmitPhulera
Copy link
Contributor Author

AmitPhulera commented May 23, 2024

Looks like there was a mismatch between using parent_index_name on the BHA adapter and parent_index_cname on the base adapter. I updated to use parent_index_name since the comment on the base adapter explains it is the cname.

Thanks @gherceg. I wanted to go with cname instead of index name because index name will change when we update index multiplexing settings and cname would remain same. I have updated the variable name to reflect the same.

Copy link
Member

@sravfeyn sravfeyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AmitPhulera
Copy link
Contributor Author

This has been rolled out on staging without any issues. Just to keep in mind to enable cluster routing before rolling this out on prod.

@AmitPhulera AmitPhulera merged commit 8d9ddd6 into master May 23, 2024
13 checks passed
@AmitPhulera AmitPhulera deleted the ap/operation-to-create-new-index branch May 23, 2024 08:10
@mkangia
Copy link
Contributor

mkangia commented May 27, 2024

All follow up steps were done on this as noted here,

  1. config updated
  2. services restarted
  3. backfilled data in next indices

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reindex/migration Reindex or migration will be required during or before deploy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants