In [15]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [25]:
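from pathlib import Path
from dotenv import load_dotenv
import openai
from rag_diary.config import Config
from rag_diary.vectore_store_chromadb import ChromadbVectorStore

# `_dh` is IPython's directory history; its first entry is the folder the notebook started in.
current_folder = globals()['_dh'][0]

# Load the environment variables from the .env file one level above the notebook folder.
load_dotenv(Path(current_folder).parent / ".env")

# Build the config once and reuse it for the API key and the vector store settings.
config = Config()
openai.api_key = config.OPENAI_API_KEY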

## Rag diary Chromadb
#### Create a new rag_diary.ChromadbVectorStore

ChromadbVectorStore implements the rag_diary.VectorStore interface. Other rag_diary vector stores should work the same way.
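
To make "work the same way" concrete, here is a minimal sketch of what a shared vector store interface could look like. The method names are taken from the calls used in this notebook; the class bodies, the type hints, and the `InMemoryStore` backend are assumptions for illustration, not the actual rag_diary code.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class VectorStore(ABC):
    """Hypothetical shared interface (an assumption, not the real rag_diary class)."""

    @abstractmethod
    def add(self, document: str, metadata: dict[str, Any]) -> None:
        """Store one document together with its metadata."""

    @abstractmethod
    def add_multiple(self, documents: list[str], metadatas: list[dict[str, Any]]) -> None:
        """Store several documents; metadatas[i] belongs to documents[i]."""

    @abstractmethod
    def query_by_str(self, query: str) -> list[dict[str, Any]]:
        """Return stored records for a query string."""


class InMemoryStore(VectorStore):
    """Toy backend: no embeddings, records come back in insertion order."""

    def __init__(self) -> None:
        self._records: list[dict[str, Any]] = []

    def add(self, document, metadata):
        self._records.append({"document": document, **metadata})

    def add_multiple(self, documents, metadatas):
        for document, metadata in zip(documents, metadatas):
            self.add(document, metadata)

    def query_by_str(self, query):
        # A real backend would rank by similarity; this one just returns everything.
        return list(self._records)
```

Code written against `VectorStore` would then not need to know which backend it is talking to, which is the point of keeping the stores interchangeable.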

Define the name of the collection to retrieve. If the collection does not exist, you will get a ValueError warning and the VectorStore will then create a new collection. Chromadb persists the collection in a SQLite database stored locally at the path given by the .env variable rag_diary_vector_db_path.

In [ ]:
rag_vector_store = ChromadbVectorStore(db_path=config.rag_diary_vector_db_path, collection_name="rag_diary_jupyter")
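
The "try to retrieve, fall back to create" behaviour described above can be sketched roughly like this. `ToyClient` and `get_or_create` are hypothetical stand-ins written for this example; they are not the Chromadb API and not the actual rag_diary implementation.

```python
import warnings


class ToyClient:
    """Stand-in for a vector database client (hypothetical, NOT the Chromadb API)."""

    def __init__(self) -> None:
        self._collections = {}

    def get_collection(self, name):
        # Mimic a lookup that fails loudly when the collection is missing.
        if name not in self._collections:
            raise ValueError(f"Collection {name} does not exist.")
        return self._collections[name]

    def create_collection(self, name):
        self._collections[name] = []
        return self._collections[name]


def get_or_create(client, name):
    """Return the named collection, creating it if the lookup fails."""
    try:
        return client.get_collection(name)
    except ValueError as error:
        # Surface the failed lookup as a warning, then create the collection.
        warnings.warn(f"{error} Creating a new one.")
        return client.create_collection(name)
```

On the first call the lookup fails and a warning is emitted; later calls with the same name return the existing collection without a warning.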

### Let's add some test data to the persistent storage
This adds a single record with the text to embed and some metadata.

In [20]:
rag_vector_store.add("this is an entry about dogs", {"name": "dogs", "id":"123"})

/Users/tonail_/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:03<00:00, 21.2MiB/s]


Add multiple records at once to the db.

In [24]:
rag_vector_store.add_multiple(
    documents=[
        "this is a document about cats",
        "Robots are destined to rule the world of man"
    ],
    metadatas=[
        {"name": "cats", "id": "122"},
        {"name": "robots", "id": "0101010101010101010"}
    ]
)
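
The documents and metadatas lists are matched by position. Judging from the query output later in this notebook, each stored record carries the document text, a generated `_id`, and the metadata fields side by side. A rough sketch of that pairing, where `build_records` is a hypothetical helper and not part of rag_diary:

```python
import uuid


def build_records(documents, metadatas):
    """Pair each document with its metadata and give it a generated `_id`."""
    if len(documents) != len(metadatas):
        raise ValueError("documents and metadatas must have the same length")
    return [
        # The metadata keys are merged next to the document text (an assumption
        # based on the shape of the query results shown in this notebook).
        {"document": document, "_id": str(uuid.uuid4()), **metadata}
        for document, metadata in zip(documents, metadatas)
    ]
```

Note that the generated `_id` is separate from any `id` you put in the metadata yourself.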

### Here is the fun part.
Let us query the vector db. This implementation uses a local embedding model (all-MiniLM-L6-v2, as the download log above shows). Queries return results ordered by cosine similarity, as a list of Records.

In [60]:
data = rag_vector_store.query_by_str("robots are cool")
data

Number of requested results 10 is greater than number of elements in index 3, updating n_results = 3


[{'document': 'Robots are destined to rule the world of man',
  '_id': 'e5768791-e7a4-4483-bbe4-906550cf3e25',
  'id': '0101010101010101010',
  'name': 'robots'},
 {'document': 'this is an entry about dogs',
  '_id': '9355166b-c9c9-4d84-888a-9fcfd95ccdc2',
  'id': '123',
  'name': 'dogs'},
 {'document': 'this is a document about cats',
  '_id': 'b927d8f6-1714-48b6-891a-49922f539ef7',
  'id': '122',
  'name': 'cats'}]
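
To get a feel for why the robots entry comes first, here is a small sketch of ranking by cosine similarity. The 2-D vectors are made up for the example and stand in for real embeddings; `cosine_similarity` and `rank` are illustrative helpers, not what Chromadb runs internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vector, records, n_results=10):
    """Return up to n_results records, most similar first."""
    # Never ask for more results than there are records,
    # similar to the n_results adjustment in the log message above.
    n_results = min(n_results, len(records))
    ordered = sorted(
        records,
        key=lambda record: cosine_similarity(query_vector, record["vector"]),
        reverse=True,
    )
    return ordered[:n_results]


# Made-up vectors: the first axis is "machine-like", the second "animal-like".
records = [
    {"name": "dogs", "vector": [0.2, 0.9]},
    {"name": "cats", "vector": [0.1, 1.0]},
    {"name": "robots", "vector": [1.0, 0.1]},
]
query = [0.9, 0.2]  # pretend embedding of "robots are cool"

print([record["name"] for record in rank(query, records)])
# ['robots', 'dogs', 'cats']
```

With real embeddings the vectors have hundreds of dimensions, but the idea is the same: the closer the direction of a document vector is to the query vector, the higher it ranks.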