Implement the Vector Store Module #5
Labels: feature (New feature)
Comments
este6an13 added a commit that referenced this issue on Mar 22, 2024
- add funcs signatures to `vector_store` module
- remove unused imports from `llm_client` unit tests #34
Aliriio added a commit to Aliriio/reprebot that referenced this issue on Mar 25, 2024
- Implemented functions to load and chunk documents
- Developed setup function for vector database, allowing for persistence and metadata handling
- Added functionality to reset the vector database
- Configured retriever for accessing vectors from the database
este6an13 pushed a commit that referenced this issue on Mar 26, 2024
- Implemented functions to load and chunk documents
- Developed setup function for vector database, allowing for persistence and metadata handling
- Added functionality to reset the vector database
- Configured retriever for accessing vectors from the database
este6an13 added a commit that referenced this issue on Mar 27, 2024
- add `vector_db_path` to `setup_vector_database` function
- handle zero documents case
este6an13 added a commit that referenced this issue on Mar 28, 2024
- move `setup_retriever` to `vector_store` module
- decompose `vector_store` module into smaller functions
- add logic to load documents from `data` folder
- add `CONTEXT_DATA_PATH` constant to reuse it
- add `VECTOR_DATABASE_PATH` constant to reuse it
- add `context` to prompt messages
este6an13 added a commit that referenced this issue on Mar 28, 2024
- move `setup_retriever` to `vector_store` module
- decompose `vector_store` module into smaller functions
- add logic to load documents from `data` folder
- add `CONTEXT_DATA_PATH` constant to reuse it
- add `VECTOR_DATABASE_PATH` constant to reuse it
- add `context` to prompt messages
- skip llm client tests temporarily
este6an13 added a commit that referenced this issue on Mar 30, 2024
- add `VectorStoreConfig` type
- add `vector_store_config` param to `query`
- add `setup_full_retriever` function
- add `setup_embeddings` function
- re-enable llm client unit tests
este6an13 added a commit that referenced this issue on Apr 3, 2024
- the vector db was not working for me because I had a cached version that didn't include any document
- removing the `vectordb` folder and running the script again fixed the issue
- a `main.py` file with some example queries was added for users to try some questions
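The stale-cache fix described in this commit (delete the persisted folder, then rebuild) could be wrapped in a small reset helper. This is a minimal sketch using only the standard library; the function name `reset_vector_database` and its `vector_db_path` parameter are assumptions for illustration, not the project's confirmed signature.

```python
import shutil
from pathlib import Path


def reset_vector_database(vector_db_path: str) -> bool:
    """Delete the persisted vector store folder (hypothetical helper).

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    path = Path(vector_db_path)
    if not path.is_dir():
        # Nothing cached on disk, so the next run already starts from scratch.
        return False
    # Remove the folder and everything inside it; the setup script is then
    # expected to rebuild the store from the context documents.
    shutil.rmtree(path)
    return True
```

Calling it with the folder mentioned above (for example `reset_vector_database("vectordb")`) before re-running the setup script would force a rebuild rather than reusing an empty cached store.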
este6an13 added a commit that referenced this issue on Apr 4, 2024
- add `format_text` function to `faculty_secretary_faq` context builder script to remove extra spaces
- modify llm client prompt to ask the model to include URLs if it finds any #5
- explore `get_relevant_documents` function to get information about sources
- add more example queries to `main.py`
este6an13 added a commit that referenced this issue on Apr 4, 2024
- add `format_text` function to `faculty_secretary_faq` context builder script to remove extra spaces
- modify unit tests accordingly
- modify llm client prompt to ask the model to include URLs if it finds any #5
- explore `get_relevant_documents` function to get information about sources
- add more example queries to `main.py`
este6an13 added a commit that referenced this issue on Apr 7, 2024
- add `database` module to implement a simple `sqlite3` database
- add `_push_metadata` function to `vector_store` module
- note: this stores the vector db id of each document together with its filename, so that the two can be mapped to each other. That is useful for a CRUD over the vector store and the context data (get, add, delete, and update documents), with this SQL table acting as a simple map between the id and the filename. We also store the `group_id`, which corresponds to the context folder (e.g., `faculty_secretary_faq`)
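The id-to-filename map described in the note can be sketched with the standard `sqlite3` module. The table name `documents`, its column names, and the helper names `create_metadata_table`, `push_metadata`, and `get_id_by_filename` are assumptions made for this example; the project's actual schema is not shown in this issue.

```python
import sqlite3


def create_metadata_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) map between vector db ids and filenames."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id TEXT PRIMARY KEY,       -- id of the document in the vector db
            filename TEXT NOT NULL,    -- source file the document came from
            group_id TEXT NOT NULL     -- context folder the file lives in
        )
        """
    )
    conn.commit()


def push_metadata(conn: sqlite3.Connection, document_id: str,
                  filename: str, group_id: str) -> None:
    """Record one document so it can later be found by id or by filename."""
    conn.execute(
        "INSERT INTO documents (id, filename, group_id) VALUES (?, ?, ?)",
        (document_id, filename, group_id),
    )
    conn.commit()


def get_id_by_filename(conn: sqlite3.Connection, filename: str):
    """Return the vector db id stored for a filename, or None if unknown."""
    row = conn.execute(
        "SELECT id FROM documents WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the map in a separate SQL table means a filename lookup does not need to scan the vector store itself.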
este6an13 added a commit that referenced this issue on Apr 13, 2024
- add `document?filename` endpoint to api
- add `document?document_id` endpoint to api
- rename query endpoint to be `query?q`
este6an13 added a commit that referenced this issue on Apr 13, 2024
- add `delete_document` function
- add `update_document` function
- add `delete` api endpoint
- add `put` api endpoint
este6an13 added a commit that referenced this issue on Apr 14, 2024
- add `document_post` api endpoint
- add db `delete` operation
- add db `update_filename` operation
- add db `get_filename_by_id` operation
- add db `get_group_id_by_id` operation
- complete `delete_document` function
- complete `update_document` function
- add `_write_file` private function to `vector_store`
- add `add_document` function to `vector_store`
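The four db operations named in this commit might look roughly like the sketch below. It assumes a hypothetical `documents(id, filename, group_id)` table and a wrapper class `DocumentMap` invented for this example; only the operation names `delete`, `update_filename`, `get_filename_by_id`, and `get_group_id_by_id` come from the commit message.

```python
import sqlite3


class DocumentMap:
    """Thin wrapper around a hypothetical id/filename/group_id table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "id TEXT PRIMARY KEY, filename TEXT NOT NULL, group_id TEXT NOT NULL)"
        )

    def add(self, document_id: str, filename: str, group_id: str) -> None:
        """Insert one row (helper added so the example is self-contained)."""
        self.conn.execute(
            "INSERT INTO documents (id, filename, group_id) VALUES (?, ?, ?)",
            (document_id, filename, group_id),
        )
        self.conn.commit()

    def _fetch_value(self, column: str, document_id: str):
        # `column` is only ever a fixed string from the methods below,
        # never user input, so interpolating it into the SQL is safe here.
        row = self.conn.execute(
            f"SELECT {column} FROM documents WHERE id = ?", (document_id,)
        ).fetchone()
        return row[0] if row else None

    def get_filename_by_id(self, document_id: str):
        return self._fetch_value("filename", document_id)

    def get_group_id_by_id(self, document_id: str):
        return self._fetch_value("group_id", document_id)

    def update_filename(self, document_id: str, filename: str) -> bool:
        """Rename a document; returns False if the id is unknown."""
        cursor = self.conn.execute(
            "UPDATE documents SET filename = ? WHERE id = ?",
            (filename, document_id),
        )
        self.conn.commit()
        return cursor.rowcount == 1

    def delete(self, document_id: str) -> bool:
        """Remove a document's row; returns False if the id is unknown."""
        cursor = self.conn.execute(
            "DELETE FROM documents WHERE id = ?", (document_id,)
        )
        self.conn.commit()
        return cursor.rowcount == 1
```

Returning a boolean from `update_filename` and `delete` lets a caller such as `update_document` or `delete_document` tell an unknown id apart from a successful change before it touches the vector store or the files on disk.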
este6an13 added a commit that referenced this issue on Apr 18, 2024
- add `DocumentResponse` to format the responses of the vector store GET endpoints
- define the return values for the other endpoints of the api
este6an13 added a commit that referenced this issue on Apr 18, 2024
- create `crud` submodule inside `vector_store` module
- move crud operations to the new `crud` submodule
Goal
Implement a module that receives TXT or PDF documents that have already been processed and are ready to be embedded and added to the vector database. It should contain a function to chunk the documents and a function to reset the vector database.
The documents should be inserted with metadata, such as an ID, so that they can later be removed or updated.
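To make the goal concrete, here is a minimal sketch of chunking a document and attaching an ID as metadata. It uses plain fixed-size character windows with a small overlap as a stand-in for whatever splitter the module ends up using; `Chunk`, `chunk_document`, `chunk_size`, `overlap`, `document_id`, and `chunk_index` are all hypothetical names chosen for this example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One piece of a document plus the metadata needed to find it again."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, filename: str,
                   chunk_size: int = 200, overlap: int = 20) -> list:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    document_id = str(uuid.uuid4())  # shared by every chunk of this document
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(
            Chunk(
                text=text[start:start + chunk_size],
                metadata={
                    "document_id": document_id,
                    "filename": filename,
                    "chunk_index": len(chunks),
                },
            )
        )
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

Because every chunk carries the same `document_id`, removing or updating a document reduces to selecting all chunks whose metadata matches that ID. An empty document yields an empty list, which the caller can treat as the zero-documents case.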
Notes
In the future, some sort of interface could be implemented so that special users can see the contents of the database and create, update, or delete records, as in a CRUD application.