# Overview 

The main objective of this notebook is to:

- Demonstrate how to update an existing vector database with new embeddings.

## Setup

In [1]:
# Third-party integrations (Ollama, Chroma, PDF loader) come from langchain_community
from langchain_community.llms import Ollama
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [2]:
model_name = "mistral"
llm = Ollama(
    model=model_name,
    callbacks=[StreamingStdOutCallbackHandler()],
)
ollama_emb = OllamaEmbeddings(
    model=model_name,
)

In [3]:
input_file = "../data/DNG.pdf"
loader = PyPDFLoader(input_file)
pages = loader.load_and_split()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
all_splits = text_splitter.split_documents(pages)

chroma_db = Chroma.from_documents(
    all_splits,
    ollama_emb,
    persist_directory="./chroma_db",  # This is where the database will be stored
)
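
To make the `chunk_size` and `chunk_overlap` arguments above concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph breaks and newlines rather than at fixed offsets, and `split_with_overlap` is a hypothetical helper, not part of LangChain.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window repeats the last chunk_overlap characters of the previous one.
    Hypothetical helper for illustration only.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.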

The database does not yet contain anything about `ornl-next`, so we do not expect a meaningful answer to the query below.

In [4]:
from langchain.chains import RetrievalQA

target_source_chunks = 4
retriever = chroma_db.as_retriever(
    search_kwargs={"k": target_source_chunks},
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
)
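
With `chain_type="stuff"`, all retrieved chunks (up to `k`, here 4) are placed into a single prompt together with the question. The sketch below illustrates the idea only; `build_stuff_prompt`, the placeholder chunks, and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Join every retrieved chunk into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for what the retriever would return.
retrieved = ["chunk one text", "chunk two text"]
prompt = build_stuff_prompt("what is ornl-next?", retrieved)
print(prompt)
```

Because the model only sees what is in this prompt, it cannot answer a question whose subject appears in none of the retrieved chunks.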

In [5]:
qa.invoke("what is ornl-next?")

 I cannot directly answer that question with the given context as "ornl-next" is not mentioned in it. The context provided describes the steps for creating an artifact using DNG (Data Narrative Graph), and the possible options for artifact types include Capability and Story, among others.

{'query': 'what is ornl-next?',
 'result': ' I cannot directly answer that question with the given context as "ornl-next" is not mentioned in it. The context provided describes the steps for creating an artifact using DNG (Data Narrative Graph), and the possible options for artifact types include Capability and Story, among others.'}

Now let's append a document that does explain what `ornl-next` is.

In [6]:
input_file = "../data/mantid-fork.pdf"
loader = PyPDFLoader(input_file)
pages = loader.load_and_split()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
all_splits = text_splitter.split_documents(pages)

In [7]:
chroma_db.add_documents(all_splits)

['fc487964-e147-11ee-9268-8e5edf19c0b3',
 'fc4879e6-e147-11ee-9268-8e5edf19c0b3',
 'fc487a04-e147-11ee-9268-8e5edf19c0b3',
 'fc487a18-e147-11ee-9268-8e5edf19c0b3',
 'fc487a2c-e147-11ee-9268-8e5edf19c0b3',
 'fc487a36-e147-11ee-9268-8e5edf19c0b3',
 'fc487a4a-e147-11ee-9268-8e5edf19c0b3',
 'fc487a5e-e147-11ee-9268-8e5edf19c0b3',
 'fc487a72-e147-11ee-9268-8e5edf19c0b3',
 'fc487a7c-e147-11ee-9268-8e5edf19c0b3',
 'fc487a90-e147-11ee-9268-8e5edf19c0b3',
 'fc487aa4-e147-11ee-9268-8e5edf19c0b3',
 'fc487aae-e147-11ee-9268-8e5edf19c0b3']
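
`add_documents` returns one ID per stored chunk, 13 in this run. As a small sanity check, the standard library can parse these values; the sketch below uses the first three IDs copied from the output above and suggests they are unique version-1 UUIDs generated in sequence on the same machine. Keeping the list is presumably useful if these chunks need to be updated or removed later.

```python
import uuid

# First three IDs copied from the add_documents output above.
returned_ids = [
    "fc487964-e147-11ee-9268-8e5edf19c0b3",
    "fc4879e6-e147-11ee-9268-8e5edf19c0b3",
    "fc487a04-e147-11ee-9268-8e5edf19c0b3",
]

parsed = [uuid.UUID(value) for value in returned_ids]

versions = {u.version for u in parsed}      # UUID version of each ID
nodes = {u.node for u in parsed}            # node field (same generator if identical)
timestamps = [u.time for u in parsed]       # 60-bit timestamp embedded in each ID

all_unique = len(set(returned_ids)) == len(returned_ids)
is_increasing = timestamps == sorted(timestamps)

print(versions, len(nodes), all_unique, is_increasing)
```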

In [8]:
qa.invoke("what is ornl-next?")

 Ornl-next is a branch of the mantid project on GitHub that contains the latest development changes before they are merged into the stable ornal branch. It's where new features and bug fixes are being worked on, and it's recommended to merge release-next into ornl-next every night to prevent divergence with upstream mantid development.

{'query': 'what is ornl-next?',
 'result': " Ornl-next is a branch of the mantid project on GitHub that contains the latest development changes before they are merged into the stable ornal branch. It's where new features and bug fixes are being worked on, and it's recommended to merge release-next into ornl-next every night to prevent divergence with upstream mantid development."}
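
The second query succeeds without rebuilding `retriever` or `qa`, which suggests both still point at the same `chroma_db` object that received the new chunks. The toy store below sketches that behavior in plain Python. Everything in it is an assumption made for illustration: `ToyVectorStore` is a hypothetical stand-in for Chroma, `embed` uses word counts instead of Ollama embeddings, and the document texts are made-up placeholders.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Minimal in-memory store: append texts, then search by similarity."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add_documents(self, texts: list[str]) -> list[int]:
        """Append texts to the existing entries and return their positions as IDs."""
        start = len(self._entries)
        self._entries.extend((text, embed(text)) for text in texts)
        return list(range(start, len(self._entries)))

    def search(self, query: str, k: int = 4) -> list[tuple[float, str]]:
        """Return the k highest-scoring (score, text) pairs, relevant or not."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self._entries]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


store = ToyVectorStore()
store.add_documents([
    "Create an artifact and choose its type",
    "Artifact types include capability and story",
])
before = store.search("what is ornl-next?", k=1)

# Append to the same store object; nothing else is rebuilt.
new_ids = store.add_documents(["ornl-next is a development branch of mantid"])
after = store.search("what is ornl-next?", k=1)

print(before)
print(after)
```

Before the append, the best match has a similarity of zero yet is still returned, which mirrors the first query: the retriever always hands back its `k` nearest chunks, and it is up to the model to notice they do not answer the question.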