Use 181 read embeddings with tda #373
Changes from all commits
e85e7be
883d598
016f6fb
2bfa827
57c8aa7
fe79bd9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| # CODEOWNERS file (from GitHub template at | ||
| # https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners) | ||
| # Each line is a file pattern followed by one or more owners. | ||
|
|
||
| ################################################################################ | ||
| # These owners will be the default owners for everything in the repo. This is commented | ||
| # out in favor of using a team as the default (see below). It is left here as a comment | ||
| # to indicate the primary expert for this code. | ||
| # * @adamshire123 | ||
|
|
||
| # Teams can be specified as code owners as well. Teams should be identified in | ||
| # the format @org/team-name. Teams must have explicit write access to the | ||
| # repository. | ||
| * @mitlibraries/dataeng | ||
|
|
||
| # We set the senior engineer in the team as the owner of the CODEOWNERS file as | ||
| # a layer of protection for unauthorized changes. | ||
| /.github/CODEOWNERS @ghukill |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| interactions: | ||
| - request: | ||
| body: null | ||
| headers: | ||
| content-type: | ||
| - application/json | ||
| user-agent: | ||
| - opensearch-py/2.8.0 (Python 3.12.11) | ||
| method: GET | ||
| uri: http://localhost:9200/_cat/aliases?format=json | ||
| response: | ||
| body: | ||
| string: '[{"alias":"all-current","index":"libguides-2025-12-11t16-36-09","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"},{"alias":"libguides","index":"libguides-2025-12-11t16-36-09","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"},{"alias":"all-current","index":"test-index-2025-12-11t16-58-08","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"},{"alias":"test-index","index":"test-index-2025-12-11t16-58-08","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"},{"alias":".kibana","index":".kibana_1","filter":"-","routing.index":"-","routing.search":"-","is_write_index":"-"}]' | ||
| headers: | ||
| content-length: | ||
| - '671' | ||
| content-type: | ||
| - application/json; charset=UTF-8 | ||
| status: | ||
| code: 200 | ||
| message: OK | ||
| version: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| interactions: | ||
| - request: | ||
| body: '{"update":{"_id":"i-am-not-found","_index":"test-index"}} | ||
|
|
||
| {"doc":{"timdex_record_id":"i-am-not-found","title":"Materials Science & Engineering | ||
| (UPDATED)"}} | ||
|
|
||
| ' | ||
| headers: | ||
| Content-Length: | ||
| - '156' | ||
| content-type: | ||
| - application/json | ||
| user-agent: | ||
| - opensearch-py/2.8.0 (Python 3.12.11) | ||
| method: POST | ||
| uri: http://localhost:9200/_bulk | ||
| response: | ||
| body: | ||
| string: '{"took":9,"errors":true,"items":[{"update":{"_index":"test-index-2025-12-11t16-58-08","_id":"i-am-not-found","status":404,"error":{"type":"document_missing_exception","reason":"[i-am-not-found]: | ||
| document missing","index":"test-index-2025-12-11t16-58-08","shard":"0","index_uuid":"in04_JvQS5qqCvUXeZta_g"}}}]}' | ||
| headers: | ||
| content-length: | ||
| - '308' | ||
| content-type: | ||
| - application/json; charset=UTF-8 | ||
| status: | ||
| code: 200 | ||
| message: OK | ||
| version: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| interactions: | ||
| - request: | ||
| body: '{"update":{"_id":"libguides:guides-175846","_index":"test-index"}} | ||
|
|
||
| {"doc":{"timdex_record_id":"libguides:guides-175846","title":"Materials Science | ||
| & Engineering (UPDATED)"}} | ||
|
|
||
| ' | ||
| headers: | ||
| Content-Length: | ||
| - '174' | ||
| content-type: | ||
| - application/json | ||
| user-agent: | ||
| - opensearch-py/2.8.0 (Python 3.12.11) | ||
| method: POST | ||
| uri: http://localhost:9200/_bulk | ||
| response: | ||
| body: | ||
| string: '{"took":7,"errors":false,"items":[{"update":{"_index":"test-index-2025-12-11t16-58-08","_id":"libguides:guides-175846","_version":4,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":7,"_primary_term":1,"status":200}}]}' | ||
| headers: | ||
| content-length: | ||
| - '245' | ||
| content-type: | ||
| - application/json; charset=UTF-8 | ||
| status: | ||
| code: 200 | ||
| message: OK | ||
| - request: | ||
| body: null | ||
| headers: | ||
| Content-Length: | ||
| - '0' | ||
| content-type: | ||
| - application/json | ||
| user-agent: | ||
| - opensearch-py/2.8.0 (Python 3.12.2) | ||
| method: POST | ||
| uri: http://localhost:9200/test-index/_refresh | ||
| response: | ||
| body: | ||
| string: '{"_shards":{"total":2,"successful":1,"failed":0}}' | ||
| headers: | ||
| content-length: | ||
| - '49' | ||
| content-type: | ||
| - application/json; charset=UTF-8 | ||
| status: | ||
| code: 200 | ||
| message: OK | ||
| version: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,7 @@ | |
| from tim.config import PRIMARY_ALIAS | ||
| from tim.errors import ( | ||
| AliasNotFoundError, | ||
| BulkOperationError, | ||
| IndexExistsError, | ||
| IndexNotFoundError, | ||
| ) | ||
|
|
@@ -532,3 +533,34 @@ def test_bulk_delete_logs_error_if_record_not_found( | |
| "Record to delete 'i-am-not-found' was not found in index 'test-index'." | ||
| in caplog.text | ||
| ) | ||
|
|
||
|
|
||
| @my_vcr.use_cassette("opensearch/bulk_update_updates_records.yaml") | ||
| def test_bulk_update_updates_records(test_opensearch_client): | ||
| updates = [ | ||
| { | ||
| "timdex_record_id": "libguides:guides-175846", | ||
| "title": "Materials Science & Engineering (UPDATED)", | ||
| } | ||
| ] | ||
| assert tim_os.bulk_update(test_opensearch_client, "test-index", iter(updates)) == { | ||
| "updated": 1, | ||
| "errors": 0, | ||
| "total": 1, | ||
| } | ||
|
|
||
|
|
||
| @my_vcr.use_cassette( | ||
| "opensearch/bulk_update_raises_bulk_operation_error_if_record_not_found.yaml" | ||
| ) | ||
| def test_bulk_update_raises_bulk_operation_error_if_record_not_found( | ||
| test_opensearch_client, | ||
| ): | ||
| updates = [ | ||
| { | ||
| "timdex_record_id": "i-am-not-found", | ||
| "title": "Materials Science & Engineering (UPDATED)", | ||
| } | ||
| ] | ||
| with pytest.raises(BulkOperationError): | ||
| tim_os.bulk_update(test_opensearch_client, "test-index", iter(updates)) | ||
|
Comment on lines +553 to +566
Contributor
Author
@ghukill Do you think this is the appropriate response? Essentially, once it hits an update for a record not found in the OpenSearch index, the This is the result of carrying over the pattern in The same pattern is currently implemented in
Contributor
I say we leave it for now. I'd prefer to exit eagerly and often with explicit reasons why at first. Perhaps we'll find that we're okay with most documents getting embeddings even if there are some errors. But I think in the early days, it'll be nice to know if any records we are trying to update don't exist. FWIW, I do think the work in USE-273 which will limit to
Contributor
Author
Hmm, though Is this an issue? And re:
Is there a change request for this? Are you suggesting that
Contributor
To your first question: I don't think it matters. We could move that log line into the Actually... that might be a nice option. Yes, I'd propose that! If the embeddings fail, it'd be nice to have a non-zero exit code. As for your second question, I just meant it'll be nice to know if we ever attempt to add an embedding for any record that doesn't exist. I don't think specific ones are needed. I do think that USE-273 might be a good time to revisit some of these things, when we see how the updated read methods change things (if at all).
Contributor
Author
Applied the change to exit the CLI with a non-zero exit code when the |
||
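For readers following the thread, the fail-fast behavior agreed on above can be illustrated with a minimal sketch. It is not the code from this PR: `tally_bulk_update` and `run` are hypothetical names, the `BulkOperationError` class here is a local stand-in for the one imported from `tim.errors`, and the input is the `items` list from a `_bulk` response shaped like the ones recorded in the cassettes. The sketch assumes that any item with a status of 400 or higher should abort the run.

```python
import sys


class BulkOperationError(Exception):
    """Local stand-in for tim.errors.BulkOperationError (assumption)."""


def tally_bulk_update(items):
    """Count successful updates in the `items` list of a _bulk response.

    Raises BulkOperationError on the first failed item, so the "errors"
    count in the returned summary is always 0 in this sketch.
    """
    result = {"updated": 0, "errors": 0, "total": 0}
    for item in items:
        update = item["update"]
        if update["status"] >= 400:
            raise BulkOperationError(
                f"Update failed for '{update['_id']}': {update['error']['type']}"
            )
        result["updated"] += 1
        result["total"] += 1
    return result


def run(items):
    """Hypothetical CLI body: return the process exit code."""
    try:
        print(tally_bulk_update(items))
    except BulkOperationError as exc:
        # Report the failure and signal it to the caller with a non-zero code.
        print(f"Bulk update failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

A real command would pass the return value of `run(...)` to `sys.exit`, so a missing record surfaces as a failed job rather than only as a log line.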