## Ingesting PDF

In [4]:
import os
from langchain_community.document_loaders import UnstructuredPDFLoader
#from langchain_community.document_loaders import OnlinePDFLoader
from pprint import PrettyPrinter

In [7]:
local_path = "WEF_The_Global_Cooperation_Barometer_2024.pdf"

loader = UnstructuredPDFLoader(file_path=os.path.join("data", local_path), mode="elements")
data = loader.load()

In [10]:
data[:5]

[Document(page_content='In collaboration with McKinsey & Company', metadata={'source': 'data/WEF_The_Global_Cooperation_Barometer_2024.pdf', 'coordinates': {'points': ((42.5197, 41.71035900000004), (42.5197, 68.06415900000013), (163.33510240000004, 68.06415900000013), (163.33510240000004, 41.71035900000004)), 'system': 'PixelSpace', 'layout_width': 595.276, 'layout_height': 841.89}, 'file_directory': 'data', 'filename': 'WEF_The_Global_Cooperation_Barometer_2024.pdf', 'languages': ['eng'], 'last_modified': '2024-04-20T23:36:01', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'Header'}),
 Document(page_content='The Global Cooperation Barometer 2024', metadata={'source': 'data/WEF_The_Global_Cooperation_Barometer_2024.pdf', 'coordinates': {'points': ((40.6063, 103.48309999999992), (40.6063, 161.47109999999998), (351.51829999999995, 161.47109999999998), (351.51829999999995, 103.48309999999992)), 'system': 'PixelSpace', 'layout_width': 595.276, 'layout_height': 841.89}, 'file

## Web scraping

In [11]:
from unstructured.partition.html import partition_html

url = "https://www.linkedin.com/pulse/what-ollama-how-run-use-open-source-llms-locally-like-sarfaraz-ahmed-qolaf#:~:text=Ollama%20stands%20out%20as%20a,platform%20to%20advance%20your%20endeavors."
#url = "https://www.spiegel.de/netzwelt/kuenstliche-intelligenz-finnische-roesterei-entwickelt-kaffee-mit-ki-unterstuetzung-a-61adaaff-10ea-4a33-bd13-c10362588156"
elements = partition_html(url=url)

print("\n".join([str(el) for el in elements]))

What is Ollama? How to Run and Use Open Source LLMs Locally like Llama2, Mistral, Gemma, and More.
Report this article
Sarfaraz Ahmed
Sarfaraz Ahmed
Generative AI Engineer | MLOPs | AI Agents & Automation | 1x Azure Certified
Published Apr 3, 2024
+ Follow
Introduction
In the ever-evolving landscape of artificial intelligence, the introduction of Ollama marks a significant leap towards democratizing AI technology. Ollama is not just another AI tool; it's a gateway to harnessing the immense capabilities of large language models directly on your local machine. This article delves into what Ollama is, its core purposes, and the myriad advantages and benefits it brings to the table for researchers, developers, and data scientists alike.
What is Ollama?
Ollama is an advanced AI tool designed to enable users to set up and execute large language models like Llama 2 locally. This innovative tool caters to a broad spectrum of users, from seasoned AI professionals to enthusiasts eager to explore

In [12]:
from langchain_core.documents import Document

#elements = chunk_by_title(elements)

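data = []
for element in elements:
    metadata = element.metadata.to_dict()
    # Build a fresh dict per element: reusing one dict would make every
    # Document point at the same metadata object.
    meta = {}
    if "page_number" in metadata:  # may be absent for some HTML elements
        meta["page_number"] = metadata["page_number"]
    # del metadata["languages"]
    # metadata["source"] = metadata["filename"]
    data.append(Document(page_content=element.text, metadata=meta))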
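
Metadata dicts are mutable, so each `Document` needs its own. The following is a minimal sketch with plain dicts (no LangChain types; `build_shared` and `build_fresh` are hypothetical helper names) of the difference between reusing one dict and creating one per element:

```python
def build_shared(pages):
    """Reuse a single dict: every list entry refers to the same object."""
    meta = {}
    out = []
    for page in pages:
        meta["page_number"] = page
        out.append(meta)  # appends a reference, not a copy
    return out


def build_fresh(pages):
    """Create a new dict per item, so entries stay independent."""
    return [{"page_number": page} for page in pages]


shared = build_shared([1, 2, 3])
fresh = build_fresh([1, 2, 3])

print([m["page_number"] for m in shared])  # [3, 3, 3]
print([m["page_number"] for m in fresh])   # [1, 2, 3]
```

With the shared dict, the last assignment wins for every entry, which is why page numbers can silently collapse to a single value.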

## Vector Embeddings

In [None]:
# start ollama server
#!ollama serve

In [18]:
# download to:  ~/.ollama/models
#!ollama pull nomic-embed-text
#!ollama pull mistral
#!ollama pull llama3
!ollama list

NAME                   	ID          	SIZE  	MODIFIED       
llama3:latest          	a6990ed6be41	4.7 GB	16 seconds ago	
nomic-embed-text:latest	0a109f422b47	274 MB	4 weeks ago   	


In [23]:
#!ollama help

In [16]:
#!ollama rm mistral:latest

deleted 'mistral:latest'


In [19]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [20]:
# Split and chunk 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
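
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one tries to split on separators such as paragraphs and sentences first); `chunk_text` is a hypothetical helper that only illustrates how the two parameters interact.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, so a sentence cut at a chunk boundary still appears partly in both neighbours.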

In [55]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks, 
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|██████████| 36/36 [00:00<00:00, 52.28it/s]
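
Conceptually, the vector store keeps one embedding per chunk and returns the chunks closest to the query embedding. The sketch below uses hand-written three-dimensional toy vectors and cosine similarity as one common choice. The metric and index Chroma actually uses depend on the collection configuration and are not shown here, and `top_k` and `toy_store` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings"; a real model such as nomic-embed-text returns far longer vectors.
toy_store = [
    ("chunk about Ollama", [1.0, 0.0, 0.0]),
    ("chunk about trade", [0.0, 1.0, 0.0]),
    ("chunk about local LLMs", [0.9, 0.1, 0.0]),
]

hits = top_k([1.0, 0.05, 0.0], toy_store, k=2)
print(hits)
```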


## Retrieval

In [56]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [57]:
# LLM from Ollama (the model must be available locally, e.g. via `ollama pull mistral`)
local_model = "mistral"

llm = ChatOllama(model=local_model)

In [58]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
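
The prompt above asks for alternative questions separated by newlines. A multi-query approach then runs one search per alternative and merges the unique results. The sketch below shows that idea with stand-ins only: `fake_output` plays the role of the LLM reply and `fake_index` the role of the vector store. It is an illustration under those assumptions, not the implementation of `MultiQueryRetriever`.

```python
def parse_alternatives(llm_output):
    """Split the LLM reply into non-empty, stripped lines (one query each)."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def multi_query_retrieve(llm_output, search):
    """Run search() for every alternative query and merge unique hits in order."""
    seen = set()
    merged = []
    for query in parse_alternatives(llm_output):
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical stand-ins for the vector store and the LLM reply.
fake_index = {
    "What is Ollama?": ["doc A", "doc B"],
    "What does Ollama do?": ["doc B", "doc C"],
}
fake_output = "What is Ollama?\n\nWhat does Ollama do?\n"

merged_docs = multi_query_retrieve(fake_output, lambda q: fake_index.get(q, []))
print(merged_docs)  # ['doc A', 'doc B', 'doc C']
```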

In [59]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [60]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
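
In this chain the retriever's list of documents is substituted into `{context}` as-is, so the prompt receives the string form of the whole list, metadata included. A common refinement is to join only the page contents first. The sketch below shows that formatting step in plain Python; `Doc`, `format_docs` and `build_prompt` are hypothetical stand-ins, not LangChain classes.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs(docs):
    """Join the text of all retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def build_prompt(docs, question):
    """Fill the RAG template with the formatted context and the question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


demo_prompt = build_prompt(
    [Doc("Ollama runs LLMs locally."), Doc("It supports Llama 2.")],
    "What is Ollama used for?",
)
print(demo_prompt)
```

In the LangChain chain, a function like `format_docs` could presumably be piped after the retriever (`retriever | format_docs`) to get the same effect.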

In [63]:
#chain.invoke(input(""))
#response = chain.invoke("Please summarize the article in German. Use a maximum of 1000 characters for your summary.")
response = chain.invoke("What is Ollama used for? Please answer in German and use a maximum of 1000 characters.")

PrettyPrinter().pprint(response)

OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  3.43it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 62.83it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 59.71it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 71.14it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 62.57it/s]


(' In German: "Ollama ist ein open source Projekt, das es ermöglicht, LLMs '
 'lokal zu betreiben, wie z.B. Llama2, Mistral und Gemma. (Ollama is an open '
 'source project that enables running LLMs locally, such as Llama2, Mistral '
 'and Gemma.)"\n'
 'Please note that the context does not provide enough information to '
 'determine the specific use or function of each individual LLM mentioned.')


In [22]:
response = chain.invoke("What are the 5 pillars of global cooperation?")

PrettyPrinter().pprint(response)

OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  3.36it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 49.24it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 66.57it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 64.18it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 57.24it/s]


' According to the World Economic Forum (WEF), the five pillars of global cooperation, as outlined in the Global Cooperation Barometer 2024 report, are:\n\n1. Multilateralism and Diplomacy: This involves working through international organizations, such as the United Nations and regional bodies, to address global challenges together. It includes negotiating treaties and agreements, maintaining peace and security, and promoting dialogue between nations.\n2. Trade and Economic Interdependence: This refers to the interconnectedness of global economies and the benefits derived from international trade, investment, and financial flows. It involves cooperation in areas such as setting international standards, facilitating cross-border commerce, and promoting economic growth.\n3. Science, Technology, and Innovation: This pillar focuses on collaboration in the fields of science, technology, engineering, and mathematics (STEM) to address common challenges and drive shared progress. It includes 

In [23]:
# Delete the "local-rag" collection from the vector store
vector_db.delete_collection()

In [4]:
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3")

query = """
You are an agile coach in a business meeting. 
In the meeting each attendee is assigned a two-minute slot for giving an update to the other team members.
Your task is to motivate the speaker to respect the two-minute rule in a friendly but slightly pushy way.
Depending on the remaining time, {remaining_time}, you should provide a different response. 
The remaining time is a number between 0 and 120 seconds. Please limit your response to 30 characters.
Please also output the remaining time in seconds (remaining time in seconds: ).
"""
prompt = ChatPromptTemplate.from_template(query)

chain = prompt | llm | StrOutputParser()

response = chain.invoke({"remaining_time": "120"})
print(response)
response

Let's get started! 

Remaining time in seconds: 120


Your speaker, I'm excited to hear from you! Can we make sure to keep it brief and sharp like a Scrum sprint? Let's aim for that two-minute mark. What's your update in two minutes or less?


(If remaining time is 60-119 seconds) Remaining time in seconds: 60


Hey, I love the enthusiasm! To ensure everyone gets a chance to share, can we wrap up those thoughts within the next minute? You're doing great!


(If remaining time is below 60 seconds) Remaining time in seconds: 30


Alright, almost there! Let's get that update wrapped up quickly. We've got just 30 seconds left. Go for it!


(Last 15-29 seconds) Remaining time in seconds: 15


Almost done! Take a deep breath and finish strong. You can do it in just 15 seconds!


(Final 14 seconds or less) Remaining time in seconds: 5


Last chance! Quick summary, please?


"Let's get started! \n\nRemaining time in seconds: 120\n\n\nYour speaker, I'm excited to hear from you! Can we make sure to keep it brief and sharp like a Scrum sprint? Let's aim for that two-minute mark. What's your update in two minutes or less?\n\n\n(If remaining time is 60-119 seconds) Remaining time in seconds: 60\n\n\nHey, I love the enthusiasm! To ensure everyone gets a chance to share, can we wrap up those thoughts within the next minute? You're doing great!\n\n\n(If remaining time is below 60 seconds) Remaining time in seconds: 30\n\n\nAlright, almost there! Let's get that update wrapped up quickly. We've got just 30 seconds left. Go for it!\n\n\n(Last 15-29 seconds) Remaining time in seconds: 15\n\n\nAlmost done! Take a deep breath and finish strong. You can do it in just 15 seconds!\n\n\n(Final 14 seconds or less) Remaining time in seconds: 5\n\n\nLast chance! Quick summary, please?"

In [41]:
import time
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Setup ChatOllama
# No format="json" here: the prompt asks for a short plain-text reply, not JSON
llm = ChatOllama(model="llama3", temperature=1)

query = """
In a business meeting each attendee is assigned a two-minute slot for an update to other team members.
You are a funny and slightly sarcastic cowboy. Your task is to motivate the speaker to respect the two-minute rule in a funny but slightly ironic way.
Depending on the {remaining_time} you should provide a different response, i.e. the less time left, the more urgent the response. 
The remaining time is a number between 0 and 120 seconds. 
Please also output the remaining time in seconds for example as 'remaining time: 10 seconds'.
Please limit your response to a maximum of 50 characters.
"""
prompt = ChatPromptTemplate.from_template(query)

chain = prompt | llm | StrOutputParser()

def run_api_call_loop(duration):
    start_time = time.time()
    while time.time() - start_time < duration:
        remaining_time = 120 - int(time.time() - start_time) % 120
        result = chain.invoke({"remaining_time": remaining_time})
        print(result)
        time.sleep(20)  # Wait 20 seconds before the next API call
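
The countdown arithmetic can be checked without calling the model or sleeping. Note that `120 - int(elapsed) % 120` wraps back to 120 once `elapsed` reaches 120, which matters only if the loop runs longer than one slot. The sketch below is an alternative that clamps at zero; `remaining_seconds` and `urgency` are hypothetical helpers, and the thresholds are arbitrary choices for illustration.

```python
def remaining_seconds(elapsed, slot=120):
    """Whole seconds left in the slot, never below zero."""
    return max(0, slot - int(elapsed))


def urgency(remaining):
    """Map the remaining time to a coarse urgency label (thresholds are arbitrary)."""
    if remaining > 60:
        return "relaxed"
    if remaining > 15:
        return "hurry"
    return "wrap up now"


for elapsed in (0, 45.7, 110, 130):
    left = remaining_seconds(elapsed)
    print(f"elapsed={elapsed}s remaining={left}s urgency={urgency(left)}")
```

Passing a label such as `urgency(left)` into the prompt alongside the number is one way to make the model's tone depend less on how it interprets the raw value.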

In [42]:
duration = 120  # Duration in seconds
run_api_call_loop(duration)

Partner! I reckon it's high time we got this meeting back on track. Here are my responses:

120 seconds remaining:
"Aw, shucks! You're like a wild mustang, gotta corral that update in two minutes or less! Don't make me lasso ya!"

90 seconds remaining:
"Hold up, partner! Time's a-fadin'! Get to the point, or I'll rustle you outta here!"

60 seconds remaining:
"Giddy up, speaker! You're runnin' out of time like a cowpoke without a horse! Wrap it up in 30 seconds or less, yeehaw!"

45 seconds remaining:
"Time's tickin', partner! Don't make me have to say it again: TWO MINUTES OR LESS! Get to the point before I rope you into silence!"

20 seconds remaining:
"Aww, shucks! You're fixin' to go over time like a runaway stagecoach! Cut to the chase, pronto! We ain't got all day, y'all!"

0 seconds remaining:
"Whoa, partner! Time's up! You're done! (Just kiddin', but seriously, wrap it up already!)"
Partner! I've got just what you need.

At 2 minutes:
"Hey, don't get lassoed by the clock! Stay 