Implement the Vector Store Module #5

Closed
este6an13 opened this issue Mar 3, 2024 · 0 comments
Labels
feature New feature

este6an13 commented Mar 3, 2024

Goal

Implement a module that receives TXT or PDF documents that have already been processed and are ready to be embedded and added to the vector database. It should contain a function to chunk the documents and a function to reset the vector database.

The documents should be inserted with metadata, such as an ID, so that they can be removed or updated later.

Notes

In the future, some sort of interface could be implemented so that special users can see the contents of the database and update its records through CRUD operations.

este6an13 added the feature New feature label Mar 9, 2024
este6an13 added a commit that referenced this issue Mar 22, 2024
- add function signatures to `vector_store` module
- remove unused imports from `llm_client` unit tests #34
Aliriio added a commit to Aliriio/reprebot that referenced this issue Mar 25, 2024
- Implemented functions to load and chunk documents
- Developed setup function for vector database, allowing for persistence and metadata handling
- Added functionality to reset the vector database
- Configured retriever for accessing vectors from the database
este6an13 pushed a commit that referenced this issue Mar 26, 2024
- Implemented functions to load and chunk documents
- Developed setup function for vector database, allowing for persistence and metadata handling
- Added functionality to reset the vector database
- Configured retriever for accessing vectors from the database
este6an13 added a commit that referenced this issue Mar 27, 2024
- add `vector_db_path` to `setup_vector_database` function
- handle zero documents case
este6an13 added a commit that referenced this issue Mar 28, 2024
- move `setup_retriever` to `vector_store` module
- decompose `vector_store` module into smaller functions
- add logic to load documents from `data` folder
- add `CONTEXT_DATA_PATH` constant to reuse it
- add `VECTOR_DATABASE_PATH` constant to reuse it
- add `context` to prompt messages
este6an13 added a commit that referenced this issue Mar 28, 2024
- move `setup_retriever` to `vector_store` module
- decompose `vector_store` module into smaller functions
- add logic to load documents from `data` folder
- add `CONTEXT_DATA_PATH` constant to reuse it
- add `VECTOR_DATABASE_PATH` constant to reuse it
- add `context` to prompt messages
- skip llm client tests temporarily
este6an13 added a commit that referenced this issue Mar 30, 2024
- add `VectorStoreConfig` type
- add `vector_store_config` param to `query`
- add `setup_full_retriever` function
- add `setup_embeddings` function
- re-enable llm client unit tests
este6an13 added a commit that referenced this issue Apr 3, 2024
- the vector db was not working for me because I had a cached
  version that didn't include any document
- removing the `vectordb` folder and running the script again
  fixed the issue
- a `main.py` file with some example queries was added for
  users to try some questions
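
The manual fix described in this commit (removing the persisted `vectordb` folder so it gets rebuilt on the next run) could be wrapped in a small helper. This is only a sketch: `reset_vector_database` is a hypothetical name, and it assumes the vector database is persisted as a plain directory at `vector_db_path`.

```python
import shutil
from pathlib import Path


def reset_vector_database(vector_db_path) -> bool:
    """Delete the persisted vector database folder, if it exists.

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    path = Path(vector_db_path)
    if not path.is_dir():
        return False
    # Remove the folder and everything inside it; the next setup run
    # would then have to rebuild the database from the source documents.
    shutil.rmtree(path)
    return True
```

Calling it as `reset_vector_database("vectordb")` before running the script again would mirror the manual steps above.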
este6an13 added a commit that referenced this issue Apr 4, 2024
- add `format_text` function to `faculty_secretary_faq` context builder
  script to remove extra spaces
- modify llm client prompt to ask the model to include URLs if it finds
  any #5
- explore `get_relevant_documents` function to get information about
  sources
- add more example queries to `main.py`
este6an13 added a commit that referenced this issue Apr 4, 2024
- add `format_text` function to `faculty_secretary_faq` context builder
  script to remove extra spaces
- modify unit tests accordingly
- modify llm client prompt to ask the model to include URLs if it finds
  any #5
- explore `get_relevant_documents` function to get information about
  sources
- add more example queries to `main.py`
este6an13 reopened this Apr 4, 2024
este6an13 added a commit that referenced this issue Apr 7, 2024
- add `database` module to implement simple `sqlite3` database
- add `_push_metadata` function to `vector_store` module
- note: the purpose of this is to store the vector db id of
  each document together with its filename, so that the two
  can be mapped to each other. This will be useful for a CRUD
  over the vector store and the context data that lets us get,
  add, delete and update documents in the vector store, with
  this simple SQL table acting as a map between the id and
  the filename. We also store the `group_id`, which is the
  equivalent of the context folder (e.g.
  `faculty_secretary_faq`)
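
The mapping table described in the note could look roughly like the sketch below, using the standard `sqlite3` module. The table name `documents`, the column layout and all function signatures are assumptions made for illustration; only the idea of mapping the vector db id to a filename and a `group_id` comes from the commit message.

```python
import sqlite3


def connect(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the mapping table if it is missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id       TEXT PRIMARY KEY,  -- id of the document in the vector db
            filename TEXT NOT NULL,
            group_id TEXT NOT NULL      -- context folder name
        )
        """
    )
    return conn


def push_metadata(conn: sqlite3.Connection, doc_id: str, filename: str, group_id: str) -> None:
    # `with conn` commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO documents (id, filename, group_id) VALUES (?, ?, ?)",
            (doc_id, filename, group_id),
        )


def get_filename_by_id(conn: sqlite3.Connection, doc_id: str):
    row = conn.execute(
        "SELECT filename FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


def update_filename(conn: sqlite3.Connection, doc_id: str, filename: str) -> bool:
    with conn:
        cur = conn.execute(
            "UPDATE documents SET filename = ? WHERE id = ?", (filename, doc_id)
        )
    return cur.rowcount == 1  # True only if a row was actually updated


def delete(conn: sqlite3.Connection, doc_id: str) -> bool:
    with conn:
        cur = conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    return cur.rowcount == 1  # True only if a row was actually deleted
```

With a table like this, a delete or update request that arrives with only a filename or only an id can be resolved to the other value before touching the vector store.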
este6an13 added a commit that referenced this issue Apr 13, 2024
- add `document?filename` endpoint to api
- add `document?document_id` endpoint to api
- rename query endpoint to be `query?q`
este6an13 added a commit that referenced this issue Apr 13, 2024
- add `delete_document` function
- add `update_document` function
- add `delete` api endpoint
- add `put` api endpoint
este6an13 added a commit that referenced this issue Apr 14, 2024
- add `document_post` api endpoint
- add db `delete` operation
- add db `update_filename` operation
- add db `get_filename_by_id` operation
- add db `get_group_id_by_id` operation
- complete `delete_document` function
- complete `update_document` function
- add `_write_file` private function to `vector_store`
- add `add_document` function to `vector_store`
este6an13 added a commit that referenced this issue Apr 18, 2024
- add `DocumentResponse` to format vector store get endpoints responses
- define the return values for the other endpoints of the api
este6an13 added a commit that referenced this issue Apr 18, 2024
- create `crud` submodule inside `vector_store` module
- move crud operations to the new `crud` submodule