### Packages

In [39]:
# !pip install langchain langchain_community langchain_openai pymupdf chromadb tiktoken

### Imports
Note: the chat models in `langchain_community` are deprecated; use the `langchain_openai` package instead. I'm still using `langchain_community.chat_models.ChatOpenAI` because my `langchain_openai` installation is corrupted.

In [40]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.chat_models import ChatOpenAI #Deprecated
# from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import create_retrieval_chain
import tiktoken
from IPython.display import display, Markdown

### Load documents

In [41]:
local_path = "docs/fpc-manual.pdf"

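# Load the local PDF file; stop here if no path is set,
# otherwise `data` would be undefined in the preview below
if local_path:
    loader = PyMuPDFLoader(local_path)
    data = loader.load()
else:
    raise ValueError("Set local_path to a PDF file")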

# Preview 1st page
data[0]



### Split documents into small chunks to be stored in the vector store

In [42]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=50)
chunks = text_splitter.split_documents(data)

In [43]:
# View the 1st chunk
chunks[0]
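
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to cut on separators such as paragraph and line breaks, so its chunks are usually shorter than `chunk_size`. The helper name `split_with_overlap` is hypothetical and only illustrates the two parameters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive splitter: each chunk repeats the last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in the next chunk, which helps retrieval when the answer straddles two chunks.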



### Store chunk embeddings into Chroma vector store

In [44]:
embedding_model = "nomic-embed-text"
collection = "localbot-rag"

vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model=embedding_model, show_progress=True),
    collection_name=collection
)

OllamaEmbeddings: 100%|██████████| 253/253 [00:14<00:00, 17.70it/s]


### Create a retriever to fetch contextually pertinent data from ChromaDB

In [45]:
retriever = vector_db.as_retriever()
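
Conceptually, the retriever embeds the query and returns the stored chunks whose embeddings are closest to it. The sketch below illustrates that idea with plain Python and made-up 3-dimensional vectors. It is an assumption-laden toy, not Chroma's implementation: cosine similarity is used here for illustration (the metric Chroma applies depends on the collection's configuration), `top_k` and `toy_store` are hypothetical names, and `k=4` mirrors what I believe is the default number of chunks returned by `as_retriever()`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, embedding) pairs with invented values
toy_store = [
    ("cooking temperatures", [0.9, 0.1, 0.0]),
    ("stock rotation", [0.1, 0.9, 0.0]),
    ("hand washing", [0.0, 0.2, 0.9]),
]

results = top_k([1.0, 0.0, 0.1], toy_store, k=2)
print(results)  # ['cooking temperatures', 'stock rotation']
```

With the real retriever, the number of returned chunks can be changed with `vector_db.as_retriever(search_kwargs={"k": 2})`.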

### Create a chatbot based on Llama 3.1 8B running locally

The following code assumes that Ollama is up and running a local Llama 3.1 8B model.

Create a LangChain `ChatOpenAI` instance that points at your own local Llama 3.1 model. The 8B version runs pretty well on a decent personal computer. On a modest computer, pull and use a smaller model like [Gemma 2 2B](https://ollama.com/library/gemma2).<br>
You can still use a hosted OpenAI GPT model if you want. In that case, use your API key and do not provide a base URL.
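
The switch between a local and a hosted model could be sketched as a small helper that builds the constructor arguments. The helper name `chat_model_kwargs` and the `OPENAI_API_KEY` environment variable lookup are my assumptions, not part of the notebook; the base URL and the placeholder key are the ones used in the cell below.

```python
import os


def chat_model_kwargs(model: str, use_local: bool = True) -> dict:
    """Build keyword arguments for ChatOpenAI (hypothetical helper)."""
    kwargs = {"model": model, "temperature": 0}
    if use_local:
        # Ollama exposes an OpenAI-compatible endpoint; the key is a placeholder
        kwargs["base_url"] = "http://localhost:11434/v1"
        kwargs["api_key"] = "NA"
    else:
        # Hosted OpenAI: real key, no base URL (assumes the key is in the environment)
        kwargs["api_key"] = os.environ.get("OPENAI_API_KEY", "")
    return kwargs


local_kwargs = chat_model_kwargs("llama3.1:8b")
print(local_kwargs)
```

The resulting dictionary would then be passed as `ChatOpenAI(**local_kwargs)`.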

In [46]:
local_model = "llama3.1:8b"
# local_model = "gemma2:2b"
llm = ChatOpenAI(
    model=local_model,
    temperature=0,
    base_url="http://localhost:11434/v1",
    api_key="NA"
)

prompt_template = """
You are an assistant for question-answering tasks.
Use the following context to answer the question.
If you don't know the answer, say that you don't know.

<context>
    {context}
</context>

Question: {input}
"""

prompt = ChatPromptTemplate.from_template(prompt_template)

# Create the retrieval chain: the retrieved chunks fill {context}, then the LLM answers
doc_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever, doc_chain)
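
What "stuffing" means here can be sketched in plain Python: the text of every retrieved chunk is joined and substituted for `{context}`, and the question for `{input}`. This is only an illustration under assumptions. The helper `stuff_prompt` is hypothetical, the two chunk strings are invented toy data, and the blank-line separator is what I understand to be the default of `create_stuff_documents_chain`.

```python
# Same template as in the cell above
prompt_template = """
You are an assistant for question-answering tasks.
Use the following context to answer the question.
If you don't know the answer, say that you don't know.

<context>
    {context}
</context>

Question: {input}
"""


def stuff_prompt(template, chunk_texts, question, separator="\n\n"):
    """Join all chunk texts and fill the template's two placeholders."""
    context = separator.join(chunk_texts)
    return template.format(context=context, input=question)


# Invented chunk texts standing in for the retriever's output
toy_chunks = [
    "Poultry must be cooked to 165°F for 15 seconds.",
    "FIFO stands for First In First Out.",
]

final_prompt = stuff_prompt(prompt_template, toy_chunks, "What is FIFO in food safety?")
print(final_prompt)
```

Because every retrieved chunk ends up in a single prompt, the prompt size grows with the number and size of the chunks, which is why the token estimates below are dominated by the context.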

### Question answering and token usage

#### Functions to estimate token usage
tiktoken is OpenAI's tokenizer library. It encodes text strings into tokens, so it can be used to estimate the cost of API calls when the encoding name for the model is known. Its encodings target OpenAI language models such as GPT-3.5.
Llama 3.1 uses a different tokenizer, so the counts tiktoken gives here are only a rough estimate of token usage.

In [47]:
def estimate_tokens(text):
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")  # Replace with your OpenAI LLM's encoding
    return len(encoding.encode(text))

def token_usage(query, response):
    """
    Estimates the token usage for a given query and its response.
    Args:
        query (str): The original query.
        response (dict): A dictionary containing the response and context information.
    Returns:
        None: Prints the token counts to the console.
    """
    
    # Count prompt tokens: query tokens + tokens of the context retrieved from ChromaDB
    # (the instructions in the prompt template itself are not counted)
    query_tokens = estimate_tokens(query)

    context = "\n".join([document.page_content for document in response['context']])
    context_tokens = estimate_tokens(context)

    prompt_tokens = query_tokens + context_tokens
    
    # Count response tokens
    response_tokens = estimate_tokens(response['answer'])

    print(f"Prompt Tokens: {prompt_tokens}")
    print(f"Completion Tokens: {response_tokens}")
    print(f"Total Tokens: {prompt_tokens + response_tokens}")
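
If tiktoken is not installed, a cruder estimate is still possible. The sketch below relies on the common rule of thumb that one token is about four characters of English text; both the ratio and the helper name `rough_token_estimate` are assumptions, and the result is even less precise than the tiktoken count.

```python
import math


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count from the character count (rule of thumb)."""
    return math.ceil(len(text) / chars_per_token)


question = "What is FIFO in food safety?"
estimate = rough_token_estimate(question)
print(f"Approximate tokens: {estimate}")
```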

#### Query 1: Getting the minimum safe cooking temperature for chicken

In [48]:

user_query = "What is the minimum safe cooking temperature for chicken?"

response = retrieval_chain.invoke({"input": user_query})

display(Markdown(response['answer']))

token_usage(user_query, response)


OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 43.81it/s]


According to the context, the minimum safe cooking temperature for poultry (which includes chicken) is 165°F (for 15 seconds).

Prompt Tokens: 1402
Completion Tokens: 27
Total Tokens: 1429


#### Query 2: FIFO in food safety

In [49]:
user_query = "What is FIFO in food safety?"

response = retrieval_chain.invoke({"input": user_query})

display(Markdown(response['answer']))

token_usage(user_query, response)


OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 20.94it/s]


The acronym FIFO means "First In First Out", which is a method of stock rotation that prevents waste of food products and ensures quality by moving the oldest stock to the front and the newly received stock to the back.

Prompt Tokens: 1426
Completion Tokens: 43
Total Tokens: 1469


#### Query 3: Summarization

In [50]:
user_query = "Summarize the document in context"

response = retrieval_chain.invoke({"input": user_query})

display(Markdown(response['answer']))

token_usage(user_query, response)


OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 30.00it/s]


The document appears to be a guide for food establishments in New York City, specifically regarding foodborne illnesses. It mentions that certain menu items can only be listed on a menu or menu board for less than 30 days in a calendar year.

It then describes a common foodborne illness with the following characteristics:

* Onset time: 1-7 days
* Type of illness: Infection
* Symptoms: Abdominal pain, diarrhea (bloody stools), and fever

The document provides control measures to prevent this illness, which include:

* Practicing good personal hygiene, especially hand washing after using the toilet
* Avoiding bare hands contact with ready-to-eat foods
* Cooling foods rapidly to 41°F or below
* Avoiding cross contamination
* Eliminating flies from the facility
* Cleaning and sanitizing all surfaces

Overall, the document aims to educate food establishments on how to prevent and control common foodborne illnesses.

Prompt Tokens: 336
Completion Tokens: 191
Total Tokens: 527
