# Ollama PDF RAG Notebook


In [3]:
# !pip install unstructured langchain
# !pip install unstructured[all-docs]
# !pip install chromadb -U
# !pip install langchain-text-splitters
# !pip install tokenizers -U
# !pip install -U langchain-ollama
# !ollama pull nomic-embed-text

# Delete an Ollama model
# !ollama rm nomic-embed-text

# Reference: https://github.com/tonykipkemboi/ollama_pdf_rag
# YouTube: https://www.youtube.com/watch?v=ztBJqzBU5kc&ab_channel=TonyKipkemboi

## Import Libraries & Load PDF

In [None]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers import MultiQueryRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
# Jupyter-specific imports
from IPython.display import display, Markdown

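from pathlib import Path

local_path = r"frai-2-1425713.pdf"

# Load the local PDF only if the file actually exists
if Path(local_path).is_file():
    loader = UnstructuredPDFLoader(local_path)
    data = loader.load()
    # Preview the extracted text (by default the loader returns the whole PDF as one Document)
    print(data[0].page_content)
else:
    print("Please provide a correct path to a PDF file")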

In collaboration with McKinsey & Company

The Global Cooperation Barometer 2024

I N S I G H T R E P O R T

J A N U A R Y 2 0 2 4

Images: Getty Images

The Global Cooperation Barometer 2024

Contents

Foreword

3

About the Global Cooperation Barometer

4

Executive summary

6

Introduction: The state of global cooperation

7

Five pillars of global cooperation

9

Pillar 1 Trade and capital

9

Pillar 2 Innovation and technology

11

Pillar 3 Climate and natural capital

13

Pillar 4 Health and wellness

15

Pillar 5 Peace and security

17

Conclusion: Towards a more cooperative future

19

Appendix: Sources and methodology

20

Contributors

22

Endnotes

23

Disclaimer This document is published by the World Economic Forum as a contribution to a project, insight area or interaction. The findings, interpretations and conclusions expressed herein are a result of a collaborative process facilitated and endorsed by the World Economic Forum but whose results do not necessarily represent

## Vector Embedding

In [5]:
# !ollama pull nomic-embed-text
# !ollama pull llama3.2:latest
# References to https://ollama.com/
!ollama list

NAME                       ID              SIZE      MODIFIED       
nomic-embed-text:latest    0a109f422b47    274 MB    35 minutes ago    
llama3.2:latest            a80c4f17acd5    2.0 GB    25 hours ago      


### Split text into chunks

In [6]:
# Split and chunk
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(data)
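
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunks are usually shorter than `chunk_size` and the overlap is approximate. The function name `sliding_window_chunks` and the sample text are made up for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample: 2500 characters of repeating digits
sample = "".join(str(i % 10) for i in range(2500))
demo_chunks = sliding_window_chunks(sample)
print([len(c) for c in demo_chunks])
# The last 200 characters of one chunk reappear at the start of the next
print(demo_chunks[0][-200:] == demo_chunks[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.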

### Create vector database

In [7]:
#Create vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|██████████| 98/98 [03:22<00:00,  2.06s/it]
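
Once every chunk has an embedding, retrieval comes down to finding the stored vectors closest to the query vector. The sketch below illustrates that idea with cosine similarity on hand-written three-dimensional vectors. It is an assumption-laden toy: real `nomic-embed-text` vectors have far more dimensions, the metric Chroma uses depends on how the collection is configured, and `cosine_similarity`, `top_k`, and `toy_store` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical (text, embedding) pairs standing in for embedded chunks
toy_store = [
    ("trade and capital", [0.9, 0.1, 0.0]),
    ("climate and natural capital", [0.1, 0.9, 0.1]),
    ("peace and security", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.3, 0.0]  # made-up query embedding

nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
```

A vector database does the same ranking, but with an index that avoids comparing the query against every stored vector.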


## Set up LLM and Retrieval

In [8]:
# LLM from Ollama
local_model = "llama3.2:latest"
llm = ChatOllama(model=local_model)

In [9]:
# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables= ["question"],
    template='''
    You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question:  {question}'''
)

# Set up retriever
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT,
)

## Create chain

In [10]:
# RAG prompt
template = '''Answer the question based ONLY on the following context:
            {context}
            Question: {question}
        '''
prompt = ChatPromptTemplate.from_template(template)

In [11]:
# Create chain
chain = ({"context": retriever, "question": RunnablePassthrough()}
         | prompt
         | llm
         | StrOutputParser()
)
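
The `MultiQueryRetriever` feeding the chain above has the LLM rephrase the question, searches the vector store once per rephrasing, and merges the hits. The sketch below illustrates that merge step as a de-duplicated union. It is a plain-Python approximation, not LangChain's code: `unique_union`, `multi_query_retrieve`, `toy_rewrite`, `toy_search`, and `toy_index` are hypothetical stand-ins for the LLM call and the vector search.

```python
def unique_union(result_lists):
    """Merge several result lists, keeping the first occurrence of each item in order."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_retrieve(question, rewrite, search):
    """Rewrite the question, search once per rewrite, and merge the hits."""
    queries = rewrite(question)
    return unique_union([search(query) for query in queries])


# Hypothetical stand-ins: a fixed set of rewrites and a lookup table instead of a vector search
toy_index = {
    "main idea": ["chunk A", "chunk B"],
    "central theme": ["chunk B", "chunk C"],
    "key message": ["chunk A", "chunk D"],
}


def toy_rewrite(question):
    return list(toy_index)


def toy_search(query):
    return toy_index[query]


merged_chunks = multi_query_retrieve("What is the main idea?", toy_rewrite, toy_search)
print(merged_chunks)
```

Each rephrasing can land in a slightly different neighbourhood of the embedding space, so the union tends to cover more relevant chunks than a single query would.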

In [12]:
def chat_with_pdf(question):
    """
    Run the RAG chain on a question and render the answer as Markdown.
    """
    display(Markdown(chain.invoke(question)))

chat_with_pdf("What is the main idea of this document?")


OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.04s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.04s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.06s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.05s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.06s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.04s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.05s/it]


The main idea of this document appears to be an overview or introduction to the "Global Cooperation Barometer", which seems to be a report or study on global cooperation in various areas such as trade, innovation, climate change, health, and peace. The document provides context, methodology, and potential implications for promoting cooperation and addressing global challenges.
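
To see roughly what the model receives before it writes an answer like the one above, the sketch below fills the RAG template with plain `str.format`. It is an illustration under assumptions: the chunk texts are invented, `build_prompt` is a hypothetical helper, and joining chunk texts with blank lines is one common formatting choice rather than what the chain in this notebook does (it passes the retriever's output to the prompt as-is).

```python
RAG_TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def build_prompt(chunk_texts, question):
    """Join retrieved chunk texts and substitute them, with the question, into the template."""
    context = "\n\n".join(chunk_texts)
    return RAG_TEMPLATE.format(context=context, question=question)


# Invented chunk texts standing in for retrieved passages
demo_prompt = build_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is the main idea of this document?",
)
print(demo_prompt)
```

Printing the filled-in prompt like this is a quick way to check whether a weak answer comes from poor retrieval or from the model itself.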

## Clean up Vector Database

In [13]:
# Delete the "local-rag" collection created above
vector_db.delete_collection()