# Running LLMs Locally using Ollama
Learn how to utilize Ollama in your Python, LangChain, and LlamaIndex applications

- https://ai.gopubby.com/running-llms-locally-using-ollama-f17197f60450

## Ollama Setup

In [1]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [2]:
import subprocess

# Run the command as a background process
subprocess.Popen(["nohup", "ollama", "serve"], stdout=open("nohup.out", "w"), stderr=subprocess.STDOUT)

<Popen: returncode: None args: ['nohup', 'ollama', 'serve']>
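
Because `ollama serve` is launched in the background, the next cell can run before the server is ready to accept requests. The sketch below polls the address reported by the installer (`127.0.0.1:11434`) until it answers. The helper name `wait_for_ollama` and its timeout values are illustrative assumptions, not part of Ollama.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(base_url="http://127.0.0.1:11434", timeout=30.0, interval=0.5):
    """Return True once the server answers with HTTP 200, False on timeout.

    Hypothetical helper: the name and default values are assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The root endpoint of a running Ollama server replies with 200
            with urllib.request.urlopen(base_url, timeout=2.0) as reply:
                if reply.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not up yet (connection refused, timeout, ...): retry
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_ollama()` before `ollama pull` gives a clear `False` if the background process failed to start, in which case `nohup.out` is the place to look.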

In [3]:
%%capture
!ollama pull llama3.2

In [4]:
%%capture
!pip install -q ollama

In [10]:
!ls /root/.ollama/models/manifests/registry.ollama.ai/library/

deepseek-r1  llama3.2


## Using Ollama as a Chatbot
You can use Ollama to generate a response by using the generate() method:

In [11]:
import ollama

response = ollama.generate(
    model="llama3.2:latest",
    prompt="Who is Steve Jobs?"
)
print(response["response"])

Steve Jobs (1955-2011) was a visionary entrepreneur, inventor, and designer who co-founded Apple Inc. and Pixar Animation Studios. He is widely recognized as one of the most innovative and successful business leaders of all time.

Early Life and Education:

Jobs was born in San Francisco, California, to two University of Wisconsin graduate students, Joanne Schieble and Abdulfattah "John" Jandali. He was adopted by Paul and Clara Jobs, a machinist and an accountant, respectively, who raised him in Mountain View, California.

Jobs showed an early interest in electronics and design, attending lectures at Hewlett-Packard (HP) while still in high school. After dropping out of Reed College in Portland, Oregon, he attended meetings of calligraphy classes for students without paying, which sparked his interest in the subject.

Career:

1. Apple Inc. (1976-1985): Jobs co-founded Apple with Steve Wozniak and Ronald Wayne. The company's first product was the Apple I computer, followed by the Appl

In [12]:
from IPython.display import Markdown, display

display(Markdown(response["response"]))

Steve Jobs (1955-2011) was a visionary entrepreneur, inventor, and designer who co-founded Apple Inc. and Pixar Animation Studios. He is widely recognized as one of the most innovative and successful business leaders of all time.

Early Life and Education:

Jobs was born in San Francisco, California, to two University of Wisconsin graduate students, Joanne Schieble and Abdulfattah "John" Jandali. He was adopted by Paul and Clara Jobs, a machinist and an accountant, respectively, who raised him in Mountain View, California.

Jobs showed an early interest in electronics and design, attending lectures at Hewlett-Packard (HP) while still in high school. After dropping out of Reed College in Portland, Oregon, he attended meetings of calligraphy classes for students without paying, which sparked his interest in the subject.

Career:

1. Apple Inc. (1976-1985): Jobs co-founded Apple with Steve Wozniak and Ronald Wayne. The company's first product was the Apple I computer, followed by the Apple II, which revolutionized the personal computer industry.
2. Pixar Animation Studios (1986-2006): After a power struggle at Apple, Jobs acquired Pixar from Lucasfilm for $5 million in 1986. He played a crucial role in revamping the studio's animation software and producing blockbuster films like Toy Story and Finding Nemo.
3. Return to Apple Inc. (1997-2011): In 1997, Jobs returned to Apple as interim CEO after serving on the company's board of directors. Under his leadership, Apple revolutionized the technology industry with innovative products like the iMac, iPod, iPhone, and iPad.

Notable Contributions:

* Revitalizing the personal computer industry
* Revolutionizing the music industry with the iPod
* Popularizing smartphones with the iPhone
* Transforming the tablet computing market with the iPad
* Pioneering user-friendly design principles

Awards and Legacy:

Jobs received numerous awards for his contributions to technology, design, and entrepreneurship. He was a pioneer of innovation and design thinking, inspiring generations of entrepreneurs, designers, and inventors.

In 2011, Steve Jobs passed away after a long battle with pancreatic cancer. His legacy continues to shape the world of technology, design, and entertainment, leaving behind a lasting impact on humanity.
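
The `ollama` package talks to the local server over HTTP, so the same request can be made without it. The following is a minimal sketch using only the standard library against the server's `/api/generate` endpoint. The function names are hypothetical, and the assumption is that the server is listening on the default address shown during installation.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3.2:latest",
                           base_url="http://127.0.0.1:11434"):
    """Build the POST request for Ollama's /api/generate endpoint."""
    # stream=False asks for one JSON object instead of a stream of chunks
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as reply:
        return json.loads(reply.read())["response"]
```

With the server running, `generate("Who is Steve Jobs?")` should return the same kind of text as `response["response"]` above.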

In [14]:
response = ollama.generate(
    model="llama3.2:latest",
    prompt="Define LLMs, RAG (Retrieval Augmented Generation), and LLM Agents"
)
display(Markdown(response["response"]))

Here are definitions for each of the requested terms:

1. **LLMs (Large Language Models)**: A type of artificial intelligence (AI) model that uses natural language processing (NLP) to understand and generate human-like text. LLMs are typically trained on vast amounts of text data, which enables them to learn patterns and relationships within language. They can be used for a variety of tasks such as language translation, question answering, text summarization, and more.

2. **RAG (Retrieval Augmented Generation)**: A technique used in Natural Language Processing (NLP) that leverages Large Language Models (LLMs) to improve the performance of various NLP tasks. RAG involves using an LLM as a retriever to search for relevant information from a large corpus, and then generating text based on the retrieved information. This approach can significantly improve the accuracy and efficiency of tasks such as question answering, text classification, and more.

3. **LLM Agents**: An emerging concept in Artificial Intelligence (AI) that involves using Large Language Models (LLMs) to create autonomous agents capable of interacting with humans and their environment. LLM agents are designed to learn from their interactions and adapt to new situations, enabling them to perform complex tasks such as conversation management, decision-making, and more.

In essence, LLM agents are a type of AI system that uses LLMs to enable intelligent behavior, such as:

* Understanding natural language input
* Generating human-like responses
* Learning from user interactions
* Adapting to new situations

The use of RAG techniques can further enhance the capabilities of these LLM agents by improving their ability to retrieve and generate relevant information.

In [15]:
messages = ''

while True:
    question = input("Ask a question: ")
    if question.lower() == "quit":
        break

    # Add user input to the messages history
    messages += f'You: {question}\n'

    response = ollama.generate(model = 'llama3.2:latest',
                               prompt = messages)    
    print(response['response'])

    # Add Ollama's response to the messages history
    messages += f'Ollama: {response["response"]}\n'

Ask a question:  What is the capital city of France?


The capital city of France is Paris.


Ask a question:  When was it founded and give a brief history about it?


Thank you for the confirmation, Ollama!

Paris, the capital city of France, has a rich history dating back to the 3rd century BC. Here's a brief overview:

* Ancient times: The area now known as Paris was inhabited by the Celtic tribe known as the Parisii.
* Roman era (52 BC): The Romans conquered the area and founded the city of Lutetia Parisiorum, which they named after their leader, Lutetia. This was an important center for trade and commerce.
* Middle Ages (5th-15th centuries): After the fall of the Roman Empire, Paris became a strategic location for various European powers, including the Franks and the Carolingians. The city was an important center of learning and culture during this period.
* Renaissance and Enlightenment (16th-18th centuries): During the Renaissance, Paris became a hub for art, literature, and music, with famous figures like Leonardo da Vinci, Michelangelo, and Voltaire calling the city home. The Enlightenment also had a significant impact on Paris, with thinker

Ask a question:  quit


Ollama also provides another method for conversing with the model: **chat()**. While **generate()** is designed for single queries or isolated prompts, **chat()** is designed for ongoing conversations. Maintaining the conversation works much like what you have just seen, but instead of accumulating the previous exchanges in a string, you now store them in a list of role-tagged messages, like this:

In [16]:
messages = []

while True:
    question = input("Ask a question: ")
    if question.lower() == "quit":
        break
    
    # add user input to the messages history
    messages.append({"role": "user", "content": question})

    # use the ollama.chat() function to maintain conversation context
    response = ollama.chat(
        model='llama3.2:latest',
        messages = messages
    )

    # print Ollama's response
    print(f"Ollama: {response['message']['content']}\n")

    # add assistant's response to the messages history
    messages.append({"role": "assistant", "content": response['message']['content']})

Ask a question:  State Newton's Laws of Motion


Ollama: Sir Isaac Newton formulated three fundamental laws of motion, which describe the relationship between a body and the forces acting upon it. Here are Newton's Laws of Motion:

**Newton's First Law of Motion (Law of Inertia)**

* An object at rest will remain at rest, unless acted upon by an external force.
* An object in motion will continue to move with a constant velocity, unless acted upon by an external force.

In other words, an object will maintain its state of motion unless a force is applied to it. If no forces are acting on the object, it will remain at rest or continue moving at a constant speed.

**Newton's Second Law of Motion (Law of Acceleration)**

* The acceleration of an object is directly proportional to the net force acting upon it.
* F = ma, where:
 + F is the net force applied to the object
 + m is the mass of the object
 + a is the acceleration produced in the object

In simpler terms, the more massive an object is, the less it will accelerate when a given 

Ask a question:  How is the third law applied in jet engines?


Ollama: The Third Law of Motion (Law of Action and Reaction) plays a crucial role in the design and operation of jet engines.

**The Principle of Jet Engine Operation**

In a jet engine, hot gases are expelled out of the back of the engine at high velocity, which produces a forward thrust. The basic principle is based on Newton's Third Law:

* The exhaust gases (object B) push against the engine's nozzle (object A)
* The engine's nozzle pushes back against the exhaust gases with an equal and opposite force

**How it Works**

Here's a simplified explanation of how this works:

1. **Fuel is burned**: Fuel is injected into the combustion chamber, where it is ignited by a spark or flame.
2. **Hot gases are produced**: The fuel burns rapidly, producing hot gases that expand and push against the engine's walls.
3. **Nozzle expansion**: These hot gases expand through a nozzle, which accelerates them to high velocities (up to 500 m/s).
4. **Exhaust gases exit**: The exhaust gases exit the back

Ask a question:  quit
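
The loop above resends the entire `messages` list on every turn, so the prompt keeps growing in a long session. One possible way to bound it is to send only the most recent exchanges. This is a sketch under assumptions: `chat_turn`, `max_turns`, and the injected `ask` callable are hypothetical names, and a stand-in model replaces `ollama.chat()` so the example runs without a server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(messages: List[Message], question: str,
              ask: Callable[[List[Message]], str], max_turns: int = 10) -> str:
    """Record the question, query the model on a bounded window, record the reply."""
    messages.append({"role": "user", "content": question})
    # Keep the last `max_turns` user/assistant pairs plus the new question,
    # so the window always starts on a user message
    window = messages[-(2 * max_turns + 1):]
    reply = ask(window)
    messages.append({"role": "assistant", "content": reply})
    return reply


# In the notebook, `ask` could wrap the real client, for example:
#   ask = lambda msgs: ollama.chat(model="llama3.2:latest",
#                                  messages=msgs)["message"]["content"]

# Stand-in model that only reports how many messages it received
def echo_model(window: List[Message]) -> str:
    return f"(saw {len(window)} messages)"


history: List[Message] = []
replies = [chat_turn(history, q, echo_model, max_turns=1) for q in ("a", "b", "c")]
print(replies)
```

The full history is still kept in `history`; only the window passed to the model is capped, which trades some long-range context for a bounded prompt size.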


## Using Ollama in LangChain
Ollama can also be used in a LangChain application.

> **LangChain** is a framework for developing applications powered by large language models (LLMs). It provides a comprehensive set of tools and abstractions that facilitate the integration of LLMs into various applications, enabling developers to build complex, multi-functional systems that leverage natural language understanding and generation capabilities.

To use Ollama in a LangChain app, use `pip` to install the following library:

In [18]:
%%capture
!pip install langchain_community

You can now ask a question using the `invoke()` method:

In [19]:
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.2:latest")

# print(llm.invoke("Who is Steve Jobs"))
display(Markdown(llm.invoke("Who is Steve Jobs")))


Steve Jobs (1955-2011) was a visionary entrepreneur, inventor, and designer who co-founded Apple Inc. He is widely recognized as one of the most innovative and successful business leaders of the last century.

Early Life and Education:

Steve Jobs was born on February 24, 1955, in San Francisco, California, to two University of Wisconsin graduate students, Joanne Schieble and Abdulfattah "John" Jandali. He was adopted by Paul and Clara Jobs, a machinist and an accountant, respectively, who raised him in Mountain View, California.

Jobs showed an early interest in electronics and design, attending lectures at Hewlett-Packard (HP) while still in high school. After graduating from Homestead High School in 1972, he attended Reed College in Portland, Oregon, but dropped out after one semester due to the financial burden on his parents.

Career:

In 1974, Jobs met Steve Wozniak, a fellow electronics enthusiast and engineer at Hewlett-Packard (HP). The two began designing and building personal computers together, including the Apple I and Apple II. In April 1976, they founded Apple Computer in Jobs' parents' garage.

Under Jobs' leadership, Apple introduced the Macintosh computer in 1984, which popularized the graphical user interface (GUI) and revolutionized personal computing. The company also released the iPod (2001), iPhone (2007), and iPad (2010), all of which became incredibly successful products that transformed the music, phone, and tablet industries.

In 1985, Jobs left Apple after a power struggle with John Sculley, who had been appointed CEO by the board. During his absence from Apple, Jobs acquired Pixar Animation Studios from Lucasfilm for $5 million and served as its CEO until it was acquired by Disney in 2006 for $7.4 billion.

Return to Apple:

In 1997, Apple acquired NeXT, a company co-founded by Jobs after he left Apple. As part of the acquisition, Jobs returned to Apple as an advisor and eventually took over as interim CEO in August 1997. Under his leadership, Apple introduced the iMac (1998), iPod (2001), and iPhone (2007), which revitalized the company's fortunes.

Legacy:

Steve Jobs is widely regarded as a visionary and innovative leader who transformed the way people interact with technology. His focus on design, simplicity, and user experience has influenced countless products and companies around the world.

Some of his most notable achievements include:

1. Revolutionizing personal computing with the Macintosh computer.
2. Popularizing the music industry with the iPod.
3. Redefining the smartphone market with the iPhone.
4. Pioneering tablet computers with the iPad.
5. Building Pixar Animation Studios into a successful and innovative film studio.

Awards and Recognition:

Jobs received numerous awards and recognition for his contributions to technology, design, and innovation, including:

1. National Medal of Technology (1985)
2. National Academy of Engineering (1990)
3. Inducted into the California Hall of Fame (2007)
4. Time Magazine's Person of the Year (2007)

Personal Life:

Steve Jobs was married twice: first to Chrisann Brennan, with whom he had a daughter, Lisa, in 1978; and then to Laurene Powell Jobs, whom he married in 1991.

Health Issues and Death:

In 2003, Jobs was diagnosed with pancreatic cancer. He underwent surgery and was placed on a diet of pureed foods for six months, but the cancer returned in 2009. After undergoing liver transplant surgery, Jobs resigned as CEO of Apple in August 2011 due to his failing health.

Steve Jobs passed away on October 5, 2011, at the age of 56, surrounded by his family and loved ones.

His legacy continues to inspire innovation, design, and entrepreneurship around the world.

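The `Ollama` class in `langchain_community` is deprecated in favor of `OllamaLLM` from the dedicated `langchain_ollama` package, so install that package and use it instead:

In [21]: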
%%capture
!pip install langchain_ollama

In [22]:
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="llama3.2:latest")

# print(llm.invoke("Who is Steve Jobs"))
display(Markdown(llm.invoke("Who is Steve Jobs")))

Steve Jobs (1955-2011) was a visionary entrepreneur, inventor, and designer who co-founded Apple Inc. and Pixar Animation Studios. He is widely recognized as one of the most innovative and successful business leaders of all time.

Early Life and Education:

Jobs was born in San Francisco, California, to two University of Wisconsin graduate students. His biological parents were Joan Baez's adoptive parents, Paul and Clara Jobs, but he had a half-sister, Mona Simpson, who is also an author. When Steve was 12 years old, his father died from complications related to a liver transplant.

In 1972, Jobs met Steve Wozniak, with whom he would later co-found Apple Computer. Jobs attended Reed College in Portland, Oregon, but dropped out after one semester due to the financial burden on his parents.

Career:

1. **Apple Computer (1976-1985)**: With Wozniak, Jobs founded Apple in 1976 and introduced the Apple I, one of the first personal computers on the market.
2. **Pixar Animation Studios (1986-2006)**: Jobs acquired Pixar Animation Studios from Lucasfilm in 1986 for $5 million. Under his leadership, Pixar produced some of its most iconic films, including Toy Story (1995) and Finding Nemo (2003).
3. **Return to Apple (1997-2011)**: After a falling out with then-CEO John Sculley, Jobs returned to Apple in 1997 as interim CEO. He led the company's resurgence with innovative products like the iMac, iPod, iPhone, and iPad.

Innovations and Legacy:

Jobs was known for his passion for design, simplicity, and user experience. His innovations include:

1. **The Macintosh computer**: Introduced the graphical user interface (GUI) to the masses.
2. **iPod**: Revolutionized the portable music player market.
3. **iPhone**: Popularized the multi-touch smartphone concept.
4. **iPad**: Popularized the tablet computer market.

Personal Life and Death:

Jobs was known for his intense focus on work and personal struggles with pancreatic cancer. He passed away on October 5, 2011, at the age of 56.

Awards and Recognition:

* National Medal of Technology (1985)
* Inducted into the California Hall of Fame (2007)
* Inducted into the Museum of Modern Art's (MoMA) Design Collection (2008)

Steve Jobs' legacy continues to shape the tech industry, inspiring future generations of innovators and entrepreneurs.

Of course, in a LangChain application you can always chain together various components, such as **PromptTemplate**, **LLM**, and **StrOutputParser**:

In [25]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import OllamaLLM

template = '''
Translate the following English sentence to Jamaican Patois:
English: {text}
Jamaican Patois:
'''

prompt = PromptTemplate(
    template = template,
    input_variables = ['text']
)

llm = OllamaLLM(model="llama3.2:latest")

chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "The flight has been rescheduled"}))

Befo' flight a-gwaan, ya get di flight nuh. (Note: This is a common way to phrase it in Jamaican Patois.)

Alternatively, you could say:

"Flight a-go change" or "Flight a-go shift"

However, the most informal and colloquial way would be:

"Befo' flight a-gwaan, ya get di new date yah."


In [26]:
template = '''
Translate the following English sentence to Jamaican Patwah:
English: {text}
Jamaican Patwah:
'''

prompt = PromptTemplate(
    template = template,
    input_variables = ['text']
)

llm = OllamaLLM(model="llama3.2:latest")

chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "The flight has been rescheduled"}))

I can help you with a translation, but please note that Jamaican Patois is a complex and nuanced language, and my translation might not be perfect. Here's an attempt at translating the sentence:

"Flight nuh go happen like dat; di flight get rescheduled."

Breakdown:

* "Nuh" means "not" or "no"
* "go" is used to express negation
* "dat" means "that" (this is a colloquial usage)
* "flight" remains the same, as it's a loanword from English
* "get" is used instead of "has been"
* "rescheduled" remains the same, but "nuh" could be added to make it sound more like Patois

Keep in mind that Patois has many variations and dialects, so this translation might not be universally accepted.
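
The `prompt | llm | StrOutputParser()` expression in the two cells above composes three steps so that each one's output becomes the next one's input. The toy sketch below illustrates that idea with plain Python. It is not LangChain's implementation: the `Step` class and the fake model are hypothetical stand-ins chosen so the example runs without any library or server.

```python
class Step:
    """Toy stand-in for a chainable component (not LangChain's Runnable)."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its result to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


template = "English: {text}\nJamaican Patois:"

# 1. Fill the template from a dict of variables
fill_prompt = Step(lambda variables: template.format(**variables))
# 2. Fake model: echoes the first prompt line, padded with whitespace
fake_llm = Step(lambda prompt_text: "  REPLY TO: " + prompt_text.splitlines()[0] + "  ")
# 3. Output parser: normalise the model output to a clean string
to_str = Step(lambda output: str(output).strip())

toy_chain = fill_prompt | fake_llm | to_str
print(toy_chain.invoke({"text": "Hello"}))
```

Swapping any single step, such as a different template or model, leaves the rest of the chain untouched, which is the main appeal of this composition style.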


## Using Ollama with LlamaIndex
Apart from using Ollama as a standalone model or as an LLM component in LangChain, you can also use Ollama with LlamaIndex.

> **LlamaIndex** is a framework designed to facilitate the integration of large language models (LLMs) with various types of data sources.

In [31]:
%%capture
!pip install llama-index-llms-ollama
!pip install llama_index
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-huggingface

For background, see the previous article on using LlamaIndex with local Hugging Face models:

- [Article](https://ai.gopubby.com/retrieval-augmented-generation-rag-for-document-based-question-answering-c3f5f939f886?source=post_page-----f17197f60450--------------------------------)

### Preparing the Documents

#### Loading the Documents

In [34]:
from llama_index.core import SimpleDirectoryReader

loader = SimpleDirectoryReader(
    input_dir="./documents",
    recursive=True,
    required_exts=[".pdf"],
)

# loads the documents
documents = loader.load_data()

In [35]:
len(documents)

45

In [40]:
display(Markdown(documents[0].text))

Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.edu
Łukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗ ‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-
to-German translation task, improving over the existing best results, including
ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task,
our model establishes a new single-model state-of-the-art BLEU score of 41.8 after
training for 3.5 days on eight GPUs, a small fraction of the training costs of the
best models from the literature. We show that the Transformer generalizes well to
other tasks by applying it successfully to English constituency parsing both with
large and limited training data.
∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started
the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and
has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head
attention and the parameter-free position representation and became the other person involved in nearly every
detail. Niki designed, implemented, tuned and evaluated countless model variants in our original codebase and
tensor2tensor. Llion also experimented with novel model variants, was responsible for our initial codebase, and
efficient inference and visualizations. Lukasz and Aidan spent countless long days designing various parts of and
implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating
our research.
†Work performed while at Google Brain.
‡Work performed while at Google Research.
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
arXiv:1706.03762v7  [cs.CL]  2 Aug 2023

### Using an Embedding Model
Once the documents are loaded, the next step is to perform embedding to convert the text data into vector representations. This allows for more efficient querying, similarity search, and further processing of the documents.
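
To make "vector representations" and "similarity search" concrete, here is a toy sketch with hand-written three-dimensional vectors. The vectors and chunk labels are invented for illustration. Real embeddings from `BAAI/bge-small-en-v1.5` have 384 dimensions and are produced by the model, not typed by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of a query and two document chunks
query_vector = [0.9, 0.1, 0.0]
chunk_vectors = {
    "attention paper": [0.8, 0.2, 0.1],
    "cooking recipe": [0.0, 0.1, 0.9],
}

# Rank chunks by similarity to the query, most similar first
ranked = sorted(
    chunk_vectors,
    key=lambda name: cosine_similarity(query_vector, chunk_vectors[name]),
    reverse=True,
)
print(ranked)
```

A vector index performs this kind of ranking over the embeddings of all document chunks and hands the best matches to the LLM as context.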

In [41]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")


### Indexing the Document
You can now index the documents with the embedding model. The `VectorStoreIndex` class builds the index, and its storage context then saves the vector embeddings to disk:

In [43]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    embed_model = embedding_model,
)

# save the index in the current directory
index.storage_context.persist(persist_dir="./vectore_docs")

### Loading the Index
Once the index is persisted to disk, you can load it into memory using the StorageContext class and the `load_index_from_storage()` function:

In [45]:
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

storage_context = StorageContext.from_defaults(persist_dir="./vectore_docs")
index = load_index_from_storage(storage_context,
                                embed_model = embedding_model)
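
Building and loading are often combined so that the expensive embedding step runs only once. The sketch below shows that pattern in isolation. `load_or_build` and `demo_build` are hypothetical helpers, and the `load` and `build` callables are stand-ins for the LlamaIndex calls shown above.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(persist_dir: str, load: Callable[[str], T],
                  build: Callable[[str], T]) -> T:
    """Reuse a persisted index if the directory has content, else build it."""
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load(persist_dir)
    return build(persist_dir)


# With LlamaIndex, the callables could wrap the cells above, for example:
#   load  -> load_index_from_storage(StorageContext.from_defaults(persist_dir=d),
#                                    embed_model=embedding_model)
#   build -> VectorStoreIndex.from_documents(...) followed by
#            index.storage_context.persist(persist_dir=d)

# Stand-in build step that just writes a placeholder file
def demo_build(directory: str) -> str:
    Path(directory).mkdir(parents=True)
    (Path(directory) / "placeholder.json").write_text("{}")
    return "built"


with tempfile.TemporaryDirectory() as tmp:
    store = str(Path(tmp) / "index")
    first = load_or_build(store, load=lambda d: "loaded", build=demo_build)
    second = load_or_build(store, load=lambda d: "loaded", build=demo_build)

print(first, second)
```

The first call finds no persisted data and builds; the second finds the directory populated and loads instead.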

### Using an LLM for Querying

In [46]:
from llama_index.llms.ollama import Ollama

# use Ollama as the LLM; the Ollama server selects the compute device itself,
# so no torch device detection is needed here
llm = Ollama(model="llama3.2:latest")
query_engine = index.as_query_engine(llm=llm)

You can now ask questions about the documents using Ollama:

In [48]:
while True:
    question = input("Question: ")
    if question.lower() == "quit": break
    # print(query_engine.query(question).response)
    display(Markdown(query_engine.query(question).response))

Question:  Define Transformers and Attention Mechanism


Transformers are a type of neural network architecture that primarily consists of stacked self-attention mechanisms and point-wise, fully connected layers for both the encoder and decoder.

The attention mechanism is a function that maps a query and a set of key-value pairs to an output. It computes a weighted sum of the values based on the similarity between the query and keys. This similarity is determined by the dot product of the query and keys, which are then normalized and applied to a softmax function to produce weights for each value. These weights are then used to compute the final output as a weighted sum of the values.

In simpler terms, attention allows the model to focus on specific parts of the input data that are relevant to the task at hand, while ignoring less important information. This is particularly useful in situations where the relationships between different parts of the input data are long-range and need to be modeled.

Question:  Give the model variants of DeepSeek-R1


The model variants of DeepSeek-R1 are:

DeepSeek-R1-Zero
DeepSeek-R1
DeepSeek-R1-Distill-Qwen-1.5B
DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Qwen-14B
DeepSeek-R1-Distill-Qwen-32B
DeepSeek-R1-Distill-Llama-8B
DeepSeek-R1-Distill-Llama-70B

Question:  Give a summary about the DeepSeek-R1 paper


DeepSeek-R1 is an AI model that has shown significant improvements in accuracy, particularly in STEM-related questions and long-context-dependent QA tasks. It excels in document analysis and fact-based queries, outperforming its predecessor DeepSeek-V3 on certain benchmarks. The model's strengths are attributed to large-scale reinforcement learning training, which enhances its reasoning capabilities. The paper also highlights the importance of instruction-following data and human priors in designing the cold-start data for the model.

Question:  Define LLMs, RAG, and LLM Agents


Language Models (LLMs) are a type of artificial intelligence designed to capture vast amounts of knowledge about the world. They enable models to answer questions without accessing external sources, by leveraging their vast knowledge base.

Retrieval Augmented Generation (RAG) is a framework that provides an integration with both llama-index and Langchain, enabling developers to easily integrate RAG into their standard workflow. It is designed for settings where reference answers may not be available, and aims to estimate different proxies for correctness beyond the usefulness of retrieved passages.

LLM Agents are LLMs that use retrieval augmented strategies in combination with other models, or are only available through APIs. They often require significant tuning, as the overall performance will be affected by various factors such as the retrieval model, considered corpus, LM, or prompt formulation.

Question:  quit


## PDF Summarization with LlamaIndex and Ollama

In [52]:
from llama_index.llms.ollama import Ollama

In [None]:
import os
from dataclasses import dataclass
from typing import List, Dict, Optional
from pathlib import Path

import ollama
from llama_index.core import SimpleDirectoryReader, Document, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

@dataclass
class SummaryResult:
    """Data class to store summary results for a single document"""
    filename: str
    stuff_summary: Optional[str] = None
    map_reduce_summary: Optional[str] = None
    refine_summary: Optional[str] = None

class PDFSummarizer:
    def __init__(
        self, 
        model_name: str = "llama3.2:latest", 
        embedding_model_name: str = "BAAI/bge-small-en-v1.5"
    ):
        """
        Initialize the PDF Summarizer with Ollama LLM and HuggingFace embeddings
        
        Args:
            model_name (str): Name of the Ollama language model to use
            embedding_model_name (str): Name of the HuggingFace embedding model to use
        """
        # Configure Ollama Language Model
        self.llm = Ollama(
            model=model_name,
            request_timeout=300.0
        )
        
        # Configure HuggingFace Embedding
        self.embed_model = HuggingFaceEmbedding(
            model_name=embedding_model_name
        )
        
        # Update global settings
        Settings.llm = self.llm
        Settings.embed_model = self.embed_model
        Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=128)

    def load_single_pdf(self, pdf_path: str) -> Document:
        """
        Load a single PDF document
        
        Args:
            pdf_path (str): Path to the PDF file
            
        Returns:
            Document: Loaded document
        """
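        reader = SimpleDirectoryReader(
            input_files=[pdf_path]
        )
        documents = reader.load_data()
        # The default PDF reader typically returns one Document per page, so
        # documents[0] would hold only the first page. Merge all pages instead.
        full_text = "\n\n".join(doc.get_content() for doc in documents)
        return Document(text=full_text, metadata={"file_name": Path(pdf_path).name})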

    def summarize_stuff(self, document: Document, prompt: Optional[str] = None) -> str:
        """
        Summarize a single document using the Stuff method
        
        Args:
            document (Document): Document to summarize
            prompt (str, optional): Custom summarization prompt
            
        Returns:
            str: Summary of the document
        """
        from llama_index.core import get_response_synthesizer
        
        default_prompt = (
            "Please provide a comprehensive summary of this document. "
            "Focus on the main points, key findings, and essential information. "
            "The summary should be well-structured and easy to understand."
        )
        
        summarization_prompt = prompt or default_prompt
        
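        # "stuff" is not a LlamaIndex response mode. "simple_summarize" is the
        # closest match: it packs the text into a single LLM call, truncating
        # whatever does not fit in the context window.
        response_synthesizer = get_response_synthesizer(
            response_mode="simple_summarize"
        )
        
        # Pass the instruction as the query (a plain string is not a valid
        # prompt template) and the document text as the content to summarize.
        response = response_synthesizer.get_response(
            summarization_prompt, [document.get_content()]
        )
        return str(response)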

    def summarize_map_reduce(self, document: Document, prompt: Optional[str] = None) -> str:
        """
        Summarize a single document using the MapReduce method
        
        Args:
            document (Document): Document to summarize
            prompt (str, optional): Custom summarization prompt
            
        Returns:
            str: Summary of the document
        """
        from llama_index.core import get_response_synthesizer
        
        default_prompt = (
            "Analyze this section of the document and provide a clear summary. "
            "Highlight the key points and maintain the context."
        )
        
        summarization_prompt = prompt or default_prompt
        
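        # "compact" is a variant of refine, not map-reduce. "tree_summarize"
        # is closer: it summarizes groups of chunks, then summarizes those
        # summaries until a single one remains.
        response_synthesizer = get_response_synthesizer(
            response_mode="tree_summarize"
        )
        
        # Pass the instruction as the query (a plain string is not a valid
        # prompt template) and the document text as the content to summarize.
        response = response_synthesizer.get_response(
            summarization_prompt, [document.get_content()]
        )
        return str(response)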

    def summarize_refine(self, document: Document, prompt: Optional[str] = None) -> str:
        """
        Summarize a single document using the Refine method
        
        Args:
            document (Document): Document to summarize
            prompt (str, optional): Custom summarization prompt
            
        Returns:
            str: Summary of the document
        """
        from llama_index.core import get_response_synthesizer
        
        default_prompt = (
            "Read through this document and create a refined summary. "
            "Start with the main points and progressively add important details."
        )
        
        summarization_prompt = prompt or default_prompt
        
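        response_synthesizer = get_response_synthesizer(
            response_mode="refine"
        )
        
        # synthesize() expects a query plus retrieved nodes, not a list of
        # Documents, and a plain string is not a valid prompt template.
        # Pass the instruction as the query and the text as the content.
        response = response_synthesizer.get_response(
            summarization_prompt, [document.get_content()]
        )
        return str(response)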

    def summarize_pdf(self, pdf_path: str, methods: Optional[List[str]] = None) -> SummaryResult:
        """
        Summarize a single PDF using specified methods
        
        Args:
            pdf_path (str): Path to the PDF file
            methods (List[str], optional): List of summarization methods to use
                                        ('stuff', 'map_reduce', 'refine')
                                        
        Returns:
            SummaryResult: Object containing summaries for the document
        """
        if methods is None:
            methods = ['stuff', 'map_reduce', 'refine']
            
        document = self.load_single_pdf(pdf_path)
        filename = Path(pdf_path).name
        
        result = SummaryResult(filename=filename)
        
        if 'stuff' in methods:
            result.stuff_summary = self.summarize_stuff(document)
        if 'map_reduce' in methods:
            result.map_reduce_summary = self.summarize_map_reduce(document)
        if 'refine' in methods:
            result.refine_summary = self.summarize_refine(document)
            
        return result

    def summarize_directory(self, 
                          pdf_directory: str, 
                          methods: List[str] = None) -> Dict[str, SummaryResult]:
        """
        Summarize all PDFs in a directory individually
        
        Args:
            pdf_directory (str): Directory containing PDF files
            methods (List[str], optional): List of summarization methods to use
            
        Returns:
            Dict[str, SummaryResult]: Dictionary mapping filenames to their summaries
        """
        # Sort so that files are processed in a predictable order
        pdf_files = sorted(Path(pdf_directory).glob("*.pdf"))
        results = {}
        
        for pdf_file in pdf_files:
            try:
                result = self.summarize_pdf(str(pdf_file), methods)
                results[result.filename] = result
            except Exception as e:
                print(f"Error processing {pdf_file}: {str(e)}")
                
        return results

def save_summaries(results: Dict[str, SummaryResult], output_dir: str):
    """
    Save summaries to text files in the specified directory
    
    Args:
        results (Dict[str, SummaryResult]): Dictionary of summary results
        output_dir (str): Directory to save the summaries
    """
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    
    for filename, result in results.items():
        base_name = Path(filename).stem
        summaries = {
            "stuff": result.stuff_summary,
            "map_reduce": result.map_reduce_summary,
            "refine": result.refine_summary,
        }
        
        for method, summary in summaries.items():
            if summary:
                # Write as UTF-8 explicitly so non-ASCII model output is saved
                # the same way on every platform
                summary_file = output_path / f"{base_name}_{method}.txt"
                summary_file.write_text(summary, encoding="utf-8")
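
The `chunk_size=512, chunk_overlap=128` settings passed to `SentenceSplitter` control how a long document is cut into pieces before it reaches the model. The sketch below is only a rough illustration of the overlap idea: it works on a list of words, whereas `SentenceSplitter` counts tokens and tries to respect sentence boundaries. The helper name `split_with_overlap` is hypothetical and not part of LlamaIndex.

```python
def split_with_overlap(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of chunk_size that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Illustrative input: 1000 placeholder "words"
words = [f"w{i}" for i in range(1000)]
chunks = split_with_overlap(words, chunk_size=512, chunk_overlap=128)

print(len(chunks))                           # number of windows produced
print(chunks[0][-128:] == chunks[1][:128])   # consecutive windows share 128 words
```

The overlap means a sentence that straddles a chunk boundary is still seen whole by at least one LLM call, at the cost of processing some text twice.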

In [None]:
def main():
    # Example usage
    pdf_directory = "./documents"  # Directory containing PDF files
    output_directory = "./summaries"  # Directory to save summaries
    
    # Create summarizer instance with HuggingFace embeddings
    summarizer = PDFSummarizer(
        model_name="llama3.2:latest",
        embedding_model_name="BAAI/bge-small-en-v1.5"  # HuggingFace embedding model
    )
    
    # Specify which methods to use
    methods = ['stuff', 'map_reduce', 'refine']
    
    # Process all PDFs in the directory
    results = summarizer.summarize_directory(pdf_directory, methods)
    
    # Save summaries to files
    save_summaries(results, output_directory)
    
    # Print summaries to console
    for filename, result in results.items():
        print(f"\n=== Summaries for {filename} ===")
        
        if result.stuff_summary:
            print("\n--- Stuff Method Summary ---")
            print(result.stuff_summary)
            
        if result.map_reduce_summary:
            print("\n--- MapReduce Method Summary ---")
            print(result.map_reduce_summary)
            
        if result.refine_summary:
            print("\n--- Refine Method Summary ---")
            print(result.refine_summary)

if __name__ == "__main__":
    main()
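
The three method names used above (`stuff`, `map_reduce`, `refine`) refer to general summarization strategies that differ mainly in how many LLM calls they make and what each call sees. The following library-free sketch illustrates that difference under simplifying assumptions: `CountingLLM` is a hypothetical stand-in for a model that only records the prompts it receives, and the prompt wording is made up for illustration. It is not how LlamaIndex implements its response modes.

```python
from typing import Callable, List


def stuff(chunks: List[str], llm: Callable[[str], str]) -> str:
    # One call: every chunk goes into a single prompt
    return llm("Summarize:\n" + "\n".join(chunks))


def map_reduce(chunks: List[str], llm: Callable[[str], str]) -> str:
    # Map: summarize each chunk on its own
    partial_summaries = [llm("Summarize:\n" + chunk) for chunk in chunks]
    # Reduce: summarize the partial summaries
    return llm("Combine these summaries:\n" + "\n".join(partial_summaries))


def refine(chunks: List[str], llm: Callable[[str], str]) -> str:
    # Start from the first chunk, then revise the running summary chunk by chunk
    summary = llm("Summarize:\n" + chunks[0])
    for chunk in chunks[1:]:
        summary = llm(f"Existing summary:\n{summary}\nRefine it using:\n{chunk}")
    return summary


class CountingLLM:
    """Hypothetical model stand-in: records prompts and returns a short tag."""

    def __init__(self):
        self.prompts = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"summary#{len(self.prompts)}"


chunks = ["part one", "part two", "part three"]
call_counts = {}
for name, strategy in [("stuff", stuff), ("map_reduce", map_reduce), ("refine", refine)]:
    llm = CountingLLM()
    strategy(chunks, llm)
    call_counts[name] = len(llm.prompts)

print(call_counts)  # {'stuff': 1, 'map_reduce': 4, 'refine': 3}
```

With a real model, the single-call approach is the cheapest but is limited by the context window, while the other two trade extra calls for the ability to handle longer documents.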