<a href="https://colab.research.google.com/github/Ayman-Abdeen/testllm/blob/main/ollama_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!python --version

Python 3.10.12


In [None]:
!/usr/local/cuda/bin/nvcc --version

!nvidia-smi

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
/bin/bash: line 1: nvidia-smi: command not found


In [None]:
# One-time setup, run in a terminal: install Ollama, start the server, pull both models
#curl -fsSL https://ollama.com/install.sh | sh
#ollama serve & ollama run llama3
#ollama pull nomic-embed-text

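Before loading any data it can help to confirm that the Ollama server from the setup commands above is actually reachable. The following is a minimal sketch, assuming the server listens on Ollama's default address `http://localhost:11434` and exposes the `/api/tags` model-listing endpoint; the helper name `list_ollama_models` is made up for this example.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the model names the local Ollama server reports, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return None
    return [model["name"] for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama server not reachable; start it with `ollama serve`.")
else:
    print("Available models:", models)
```

If the server is running, the printed list should match what `ollama list` shows.
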
In [None]:
!pip install -q langchain langchain-community langchain-text-splitters chromadb

In [None]:
from langchain_community.document_loaders import PyPDFLoader, TextLoader

In [None]:
local_path = "/content/Data.txt"

# Load the local text file into LangChain documents
if local_path:
  loader = TextLoader(file_path=local_path)
  data = loader.load()
else:
  print("Set local_path to a text file")

In [None]:
len(data)

1

In [None]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED       
nomic-embed-text:latest	0a109f422b47	274 MB	18 seconds ago	
llama3:latest          	365c0bd3c000	4.7 GB	18 minutes ago	


In [None]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [None]:
# Split and chunk
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=30,
    length_function=len,
    is_separator_regex=False
    )
chunks = text_splitter.split_documents(data)

In [None]:
len(chunks)

470

In [None]:
print(chunks[0].page_content)

UAE achieves 30% of Operation 300bn’s target since its 2021 launch, with the
industrial sector’s contribution to GDP reaching a projected AED 197 billion.
UAE industrial exports increase by 17% since the launch of Operation 300bn in
2021, reaching projected AED 187 billion in 2023.
AED 9.3bn value of import substitution projects.
The value spent by National In County Value (ICV) companies in 2023 on local products
and services increased by 25% to reach AED 67 billion, redirected to the national
economy.
Introduce 1,400 products that can be manufactured locally through the offtake project
during the 1st and 2nd editions of MIITE, with a total value of AED 120 billion. Of these,
51 percent of the product offtake opportunities, equivalent to AED 62 billion, have been
awarded.
During the 3rd edition of MIITE, introduce new offtake products worth AED 20 billion,
bringing the total value of products targeted for localization to AED 140 billion

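The `chunk_size=1000` and `chunk_overlap=30` settings used above mean that consecutive chunks share a short run of text, so a sentence cut at a boundary still appears whole in at least one chunk. The sketch below illustrates only that overlap idea with fixed-size character windows. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, and `split_with_overlap` is a hypothetical helper, not part of LangChain.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Slice text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece starts with the last three characters of the previous one, which is the role the 30-character overlap plays in the notebook.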

In [None]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
    persist_directory="./chroma_db",
)

OllamaEmbeddings: 100%|██████████| 470/470 [12:43<00:00,  1.62s/it]

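Once the chunks are embedded, retrieval comes down to comparing a query vector against the stored vectors and keeping the closest ones. The sketch below shows that ranking step with cosine similarity over made-up 3-dimensional vectors. It is illustrative only: real `nomic-embed-text` vectors have far more dimensions, and the distance metric and index Chroma actually uses depend on the collection's configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": (text, embedding) pairs with invented values.
toy_index = [
    ("industrial exports", [0.9, 0.1, 0.0]),
    ("climate policy", [0.0, 0.2, 0.9]),
    ("manufacturing growth", [0.8, 0.3, 0.1]),
]
nearest = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(nearest)
# ['industrial exports', 'manufacturing growth']
```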

In [None]:
# Reload the persisted collection from disk (same embedding model as at indexing time)
vector_db1 = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
    )

In [None]:
# Retrieval

In [None]:
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [None]:
# LLM from Ollama
local_model = "llama3"
llm = ChatOllama(model=local_model)

In [None]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

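The query prompt above asks the model for alternative phrasings separated by newlines. A multi-query retriever then searches once per phrasing and keeps the unique union of the results. The sketch below walks through those two steps in plain Python, under stated assumptions: `parse_question_variants`, `unique_union`, and `toy_search` are hypothetical names, and `toy_search` is a keyword lookup standing in for a real vector-store query.

```python
def parse_question_variants(llm_output):
    """Split newline-separated LLM output into non-empty, trimmed questions."""
    lines = [line.strip() for line in llm_output.splitlines()]
    return [line for line in lines if line]


def unique_union(result_lists):
    """Merge several result lists, dropping duplicates but keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def toy_search(question):
    """Hypothetical stand-in for a vector search: match on a keyword in the question."""
    corpus = {
        "exports": ["chunk A", "chunk B"],
        "industry": ["chunk B", "chunk C"],
    }
    lowered = question.lower()
    return [c for key, found in corpus.items() if key in lowered for c in found]


fake_llm_output = "How did exports change?\n\n  What happened to industry output?  \n"
variants = parse_question_variants(fake_llm_output)
retrieved = unique_union(toy_search(q) for q in variants)
print(variants)
print(retrieved)
# ['How did exports change?', 'What happened to industry output?']
# ['chunk A', 'chunk B', 'chunk C']
```

"chunk B" matches both phrasings but appears only once in the merged result, so the final prompt does not carry duplicated context.
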
In [None]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [None]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

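In the chain above, the same input string is sent to the retriever (to fill `{context}`) and passed through unchanged (to fill `{question}`) before the prompt reaches the model. The sketch below mimics that data flow with plain string formatting. It is an approximation rather than LangChain's implementation: `build_prompt` and `toy_retrieve` are hypothetical names, and joining the chunk texts with blank lines is one common choice, whereas the notebook hands the retriever's document list to the prompt as-is.

```python
TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def format_docs(docs):
    """Join retrieved chunk texts into one context string, separated by blank lines."""
    return "\n\n".join(docs)


def build_prompt(question, retrieve):
    """Fan the question out to the retriever and to the template, as the chain does."""
    context = format_docs(retrieve(question))
    return TEMPLATE.format(context=context, question=question)


def toy_retrieve(question):
    """Hypothetical retriever that returns fixed chunk texts for any question."""
    return ["first retrieved chunk", "second retrieved chunk"]


prompt_text = build_prompt("What is Operation 300bn?", toy_retrieve)
print(prompt_text)
```

The printed text is what a model would receive: the instruction line, the retrieved chunks, then the question.
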
In [None]:
chain.invoke(input("Question: "))

OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.34s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 13.23it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  3.44it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  8.24it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.10it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.93it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  6.34it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  8.45it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  4.57it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 12.68it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  4.33it/s]


"Based on the provided context, it appears to be related to business, economy, and government initiatives in the United Arab Emirates (UAE). The documents mention various topics such as:\n\n* Artificial Intelligence (AI) development\n* Economic growth and competitiveness\n* Industrial sector expansion\n* Innovation and technology adoption\n* Climate change and sustainability\n* Entrepreneurship and investment opportunities\n\nIt seems to be a collection of reports, speeches, and statistics related to the UAE's economic and industrial development."

In [None]:
chain.invoke('Hi, who are you ?')

OllamaEmbeddings: 100%|██████████| 1/1 [00:01<00:00,  1.16s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 12.46it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.77it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  8.21it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  7.98it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  6.08it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  7.69it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  8.44it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  3.59it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  1.40it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  1.19it/s]
