Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasti engine #50

Merged
merged 30 commits into from
Oct 18, 2023
Merged

Tasti engine #50

merged 30 commits into from
Oct 18, 2023

Conversation

ttt-77
Copy link
Collaborator

@ttt-77 ttt-77 commented Oct 10, 2023

No description provided.

ttt-77 and others added 15 commits September 27, 2023 01:15
# Conflicts:
#	aidb/engine/base_engine.py
# Conflicts:
#	aidb/query/query.py
#	aidb/utils/constants.py
#	aidb/vector_database/chroma_vector_database.py
#	aidb/vector_database/faiss_vector_database.py
#	aidb/vector_database/tasti.py
#	aidb/vector_database/vector_database.py
#	aidb/vector_database/weaviate_vector_database.py
#	tests/tasti_test/tasti_test.py
@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 10, 2023

Could you review this branch? @ddkang

aidb/config/config_types.py Outdated Show resolved Hide resolved
aidb/engine/tasti_engine.py Outdated Show resolved Hide resolved
@ddkang
Copy link
Owner

ddkang commented Oct 10, 2023

I think part of the reason this PR is a bit confusing to review is that it isn't connected to any downstream query. I think it would be best to merge this with an example of a downstream query optimization. I think there are a few options:

  • Implement a limit query optimization, where the records are searched by how close they are to cluster representatives that match the predicate
  • Implement the optimization where the inference services are ordered for a select query with a complex predicate
  • Wait for the approximate aggregation PR to be merged and implement control variates

What do you think? @ttt-77

@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 10, 2023

Yes, when I was writing this PR, I was not so clear about what should be return from TASTI engine and how it can be combined better with specific query. I checked previous AIDB code and TASTI paper many times. I think in previous code, the proxy scores are only used to order the inference service. Maybe I am missing something. I will consider the first and second options and determine whether any changes need to be made to the current PR

@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 10, 2023

And maybe I can also add a design document in Wiki page?

@ddkang
Copy link
Owner

ddkang commented Oct 10, 2023

A design doc sounds good. I'd like to see an actual implementation of 1, 2, or 3 in this PR, otherwise it's a bit hard to tell.

@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 10, 2023

I will do that

@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 13, 2023

@ddkang Could you please review the new commit? And here is the design document, https://github.com/ddkang/aidb-new/wiki/TASTI-Engine-%E2%80%90-Design-Document

aidb/engine/limit_engine.py Outdated Show resolved Hide resolved
aidb/engine/tasti_engine.py Outdated Show resolved Hide resolved
@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 15, 2023

This PR is ready for review @ddkang

aidb/query/query.py Outdated Show resolved Hide resolved
aidb/engine/engine.py Outdated Show resolved Hide resolved
aidb/engine/limit_engine.py Outdated Show resolved Hide resolved
aidb/engine/tasti_engine.py Show resolved Hide resolved
aidb/engine/base_engine.py Outdated Show resolved Hide resolved
aidb/engine/base_engine.py Outdated Show resolved Hide resolved
@akash17mittal
Copy link
Collaborator

@ttt-77 Just to be sure that the proxy score implementation and all is correct. I would also suggest that you compare the number of inference service calls for limit queries with these proxy scores and some adversarial proxy score (maybe you can test with (1 - proxy_score)). If the implementation is correct, number of inference service calls will be fewer in case of perfect proxy scores.

@ttt-77
Copy link
Collaborator Author

ttt-77 commented Oct 16, 2023

Just to be sure that the proxy score implementation and all is correct. I

Currently, the embedding in vector database are generated randomly. I think we can't test it now. Could you please move your suggestion to issue #54 ? We can check it later.

@akash17mittal
Copy link
Collaborator

Just to be sure that the proxy score implementation and all is correct. I

Currently, the embedding in vector database are generated randomly. I think we can't test it now. Could you please move your suggestion to issue #54 ? We can check it later.

Yes. We have jackson dataset with the embeddings. Surely, we can test it later. I mean from a code execution point of view, it is alright. But for proxy scores correctness etc, is there a better way to test rather than just reading code?

aidb/engine/base_engine.py Outdated Show resolved Hide resolved
aidb/engine/base_engine.py Outdated Show resolved Hide resolved
aidb/engine/base_engine.py Outdated Show resolved Hide resolved
akash17mittal
akash17mittal previously approved these changes Oct 18, 2023
@ddkang ddkang merged commit e134584 into main Oct 18, 2023
@ddkang ddkang deleted the tasti_engine branch October 18, 2023 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants