# 🧠 Notebook 1.0: Introduction to RAG (local with Ollama)

Welcome to the first module of the **AI Agents with RAG** course!

**Objectives**
- ✅ Understand what a RAG system is
- ✅ Understand the difference between a “vanilla” LLM and RAG
- ✅ Build your first **local** RAG with **LangChain + Ollama + Chroma**

---


## 0.0 🧩 Prerequisites: Install Ollama **(from Terminal/PowerShell)**

Run these steps **only once**, before opening the notebook:

- From Terminal/PowerShell
- Inside your working environment

---

### ➜ **macOS (Homebrew)**

In [4]:
# bash
#!brew install --cask ollama
!ollama --version

ollama version is 0.5.7


### ➜ **Linux**

In [7]:
#bash
#!curl -fsSL https://ollama.com/install.sh | sh

# Start the service in the current session:
#!ollama serve

### ➜ **Windows (PowerShell)**

Install Ollama with the installer from https://ollama.com/download. If PowerShell does not recognize `ollama`, add it to PATH for the current session:

In [12]:
# powershell / terminal
#$env:Path += ";C:\Users\<YOUR_USER>\AppData\Local\Programs\Ollama"
#ollama --version

If `ollama --version` prints a version number, Ollama is installed correctly 🎉

## 1.0 📚 What is RAG?

**RAG = Retrieval Augmented Generation**

It is a technique that allows an LLM to **answer based on external documents**, instead of relying solely on what it learned during training.

Useful when:
- you want *accurate* answers grounded in your documents
- you want to reduce hallucinations
- you need up-to-date information

---

A standard LLM **does not know your documents**, unless they were in its training set.

Therefore it may:
- invent citations
- hallucinate dates
- produce generic answers

---

### üîç What RAG adds

**RAG pipeline:**
1. Split documents into chunks
2. Convert chunks into embeddings
3. Save in a vector database
4. When you ask a question → retrieve the most relevant chunks
5. Pass question + retrieved chunks to the LLM
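The five steps above can be sketched in plain Python. This is a toy version built on loud assumptions. A "chunk" is one sentence. The "embedding" is a bag-of-words count, not a neural embedding like `nomic-embed-text`. The "vector database" is an ordinary list, and relevance is the number of words a chunk shares with the question. All function names are hypothetical; the notebook's real pipeline uses LangChain, Ollama and Chroma.

```python
import re
from collections import Counter

# Toy stand-ins (assumptions, not the notebook's real components):
# chunk = one sentence, embedding = bag-of-words counts, vector DB = a list.


def split_into_chunks(text: str) -> list[str]:
    """Step 1: naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text: str) -> Counter:
    """Step 2: lowercase word counts instead of a dense neural embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_store(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Step 3: keep every chunk next to its 'embedding'."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(store: list[tuple[str, Counter]], question: str, k: int = 1) -> list[str]:
    """Step 4: rank chunks by how many words they share with the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: sum((item[1] & q).values()), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 5: combine the retrieved chunks and the question into one LLM prompt."""
    return "DOCUMENTS:\n" + "\n".join(context) + f"\n\nQuestion: {question}"


text = ("TechCorp's sales department uses Salesforce as its CRM. "
        "The vacation policy provides 25 days per year for each employee.")
store = build_store(split_into_chunks(text))
question = "How many vacation days per year?"
prompt = build_prompt(question, retrieve(store, question))
print(prompt)
```

Running it prints a prompt whose DOCUMENTS section contains only the vacation-policy sentence. That is exactly what step 5 hands to the LLM.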

---

In [17]:
# lighter, recommended for less powerful laptops
!ollama pull llama3.2:3b
# higher quality, requires more powerful machines
#!ollama pull llama3.1:8b

# Embedding model for retrieval
!ollama pull nomic-embed-text

### 2.1 📁 Load documents (txt / pdf / web page…)

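Load any text you want to query.

In this notebook, the knowledge base is a small in-memory list of sample sentences (`docs`), defined a few cells below. In a real project you would read this text from files on disk.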
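If you want to query your own files instead of the sample list, a minimal sketch for plain `.txt` files could look like this. `load_text_files` is a hypothetical helper. It returns records shaped like a LangChain `Document`, with `page_content` plus a `source` in `metadata`. LangChain also ships loaders such as `TextLoader` in `langchain_community.document_loaders`. PDFs or web pages would need a different loader.

```python
import tempfile
from pathlib import Path


def load_text_files(folder: str) -> list[dict]:
    """Read every *.txt file in `folder` (sorted by name) into a record shaped
    like a LangChain Document: page_content + metadata["source"]."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        records.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": path.name},
        })
    return records


# Demo in a throwaway folder (file names are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "policy.txt").write_text(
        "The vacation policy provides 25 days per year for each employee.", encoding="utf-8")
    Path(tmp, "notes.md").write_text("Not a .txt file, so it is skipped.", encoding="utf-8")
    loaded = load_text_files(tmp)

texts = [record["page_content"] for record in loaded]
print(loaded[0]["metadata"])  # {'source': 'policy.txt'}
```

The resulting `texts` list has the same shape as `docs`, so it could take its place as the knowledge base.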

---

In [21]:
!pip -q install langchain langchain-community langchain-text-splitters chromadb ollama

Choose chunk size and overlap so that each chunk is a coherent, self-contained portion of text.

Typical values:
- **chunk_size = 500–1500** characters
- **chunk_overlap = 50–150** characters
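To see how the two parameters interact, here is a simplified sliding-window splitter. Each chunk is at most `chunk_size` characters long, and consecutive chunks share `chunk_overlap` characters. This is an illustration only, and `sliding_window_chunks` is hypothetical. `RecursiveCharacterTextSplitter` also tries to cut at separators such as blank lines, newlines and spaces, so its chunks will not match these exactly.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters; consecutive
    windows share `chunk_overlap` characters. Simplified illustration only."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = sliding_window_chunks(sample, chunk_size=12, chunk_overlap=4)
for c in chunks:
    print(repr(c))
```

Each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. A larger overlap therefore produces more chunks (and more embeddings) for the same text.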

In [24]:
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3.2:3b", temperature=0)
print(llm.invoke("Just say: ok, I'm ready!").content)


Ok, I'm ready!


This creates the vector database where all embeddings will be stored.

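Chroma writes its data to a folder only when you pass a `persist_directory`. The example in this notebook omits it, so the index lives in memory and is rebuilt every time the notebook restarts.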
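A toy illustration of what "persisting" buys you: the store is written to a file, the in-memory copy is discarded (as on a restart), and the data is read back. `save_store`/`load_store` and the 3-number vector are made up for the example and have nothing to do with Chroma's on-disk format. With LangChain's `Chroma`, the equivalent is to pass `persist_directory="..."` to `Chroma.from_documents(...)`. You can later reopen the store with `Chroma(persist_directory="...", embedding_function=embeddings)`.

```python
import json
import tempfile
from pathlib import Path


def save_store(entries: list[dict], path: Path) -> None:
    """Write (text, vector) records to a JSON file (hypothetical toy format)."""
    path.write_text(json.dumps(entries), encoding="utf-8")


def load_store(path: Path) -> list[dict]:
    """Read the records back, e.g. after the notebook kernel restarts."""
    return json.loads(path.read_text(encoding="utf-8"))


# Made-up record: a chunk of text plus an invented 3-number "embedding"
entries = [{"text": "The cloud infrastructure is based on AWS.", "vector": [0.1, 0.7, 0.2]}]
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "store.json"
    save_store(entries, db_file)
    del entries  # simulate losing the in-memory data on a restart
    reloaded = load_store(db_file)

print(reloaded[0]["text"])
```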

## 2.4 üîç Build the retriever

Retrieves the **k most relevant chunks**.

Typical values: `k = 3` or `4`
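A small sketch of what "the k most relevant chunks" means, assuming each chunk already has an embedding vector. The 3-dimensional vectors are invented for illustration; real embeddings from `nomic-embed-text` have hundreds of dimensions. Cosine similarity serves as the relevance score here, while the vector store in the notebook computes its own distance internally.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Invented 3-dimensional "embeddings" (real ones have hundreds of dimensions)
store = [
    ("vacation policy", [0.9, 0.1, 0.0]),
    ("cloud on AWS", [0.0, 0.2, 0.9]),
    ("sales CRM", [0.1, 0.9, 0.1]),
    ("24/7 support", [0.2, 0.6, 0.3]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of "How many vacation days?"
print(top_k(query, store, k=1))
print(top_k(query, store, k=3))
```

With `k=1`, only the vacation chunk is returned. With `k=3`, two weakly related chunks are added as well. This is why a larger `k` can add noise to the context passed to the LLM.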

In [28]:
# Here is our "external source of information" (it could be text from PDFs, Word documents, etc.)
docs = [
    "TechCorp is a company that develops software for the healthcare sector.",
    "TechCorp's sales department uses Salesforce as its CRM.",
    "The cloud infrastructure is based on AWS.",
    "TechCorp offers 24/7 technical support for enterprise clients.",
    "The vacation policy provides 25 days per year for each employee.",
    "The mobile application is developed in Flutter and updated monthly."
]

Here we combine:
- the retriever (which extracts the right documents)
- the LLM (which generates the final answer)
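The combination can be sketched without LangChain to show roughly what `chain_type="stuff"` does. All retrieved chunks are "stuffed" into a single prompt, which is then sent to the LLM in one call. The template mirrors `qa_prompt` from the RetrievalQA cell further down. `fake_llm` and the two retriever functions are hypothetical stand-ins for `llm.invoke(prompt).content` and the real retriever. The blank-line separator between chunks is an assumption for illustration.

```python
# Same wording as the qa_prompt template used in the RetrievalQA cell
QA_TEMPLATE = (
    "You are an assistant that answers ONLY using the information in the DOCUMENTS.\n"
    "If the answer is not present, clearly say: 'Not present in the documents.'\n\n"
    "DOCUMENTS:\n{context}\n\n"
    "Question: {question}\n"
    "Concise answer in English:"
)


def stuff_prompt(question: str, chunks: list[str]) -> str:
    """'Stuff' every retrieved chunk into a single prompt (blank line between chunks)."""
    return QA_TEMPLATE.format(context="\n\n".join(chunks), question=question)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for llm.invoke(prompt).content."""
    return "25 days." if "25 days" in prompt else "Not present in the documents."


def vacation_retriever(question: str) -> list[str]:
    """Hypothetical retriever that always returns the vacation chunk."""
    return ["The vacation policy provides 25 days per year for each employee."]


def cloud_retriever(question: str) -> list[str]:
    """Hypothetical retriever that always returns the cloud chunk."""
    return ["The cloud infrastructure is based on AWS."]


def answer(question, retrieve_fn, llm_fn):
    """Retriever -> prompt -> LLM, in one call."""
    return llm_fn(stuff_prompt(question, retrieve_fn(question)))


print(answer("How many vacation days?", vacation_retriever, fake_llm))
print(answer("Who is the CEO?", cloud_retriever, fake_llm))
```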

In [32]:
from langchain_community.chat_models import ChatOllama

# Pick the model you pulled earlier (e.g. llama3.2:3b or llama3.1:8b)
llm = ChatOllama(model="llama3.2:3b", temperature=0)

question = "How many days off can one take at TechCorp?"
response = llm.invoke(question).content
print("Vanilla LLM:", response)

Vanilla LLM: I don't have specific information about the policies of TechCorp, as I'm a large language model, I don't have access to that kind of data. However, I can provide some general information about typical vacation and leave policies in the tech industry.

In many companies, including tech corporations, employees are typically entitled to a certain number of paid vacation days, sick leave, and personal days per year. The exact number of days off can vary widely depending on factors such as job title, location, and company size.

Some common policies include:

* Annual vacation time: 10-20 days
* Sick leave: 5-10 days
* Personal days: 5-10 days
* Holidays: 8-12 paid holidays per year

It's worth noting that these are just general estimates, and actual policies can vary significantly from company to company. If you're interested in knowing the specific policies of TechCorp, I would recommend checking your employee handbook or speaking with HR directly.


> *`AttributeError: 'ChatOllama' object has no attribute 'predict'`*

- With modern APIs, use `.invoke()` (as in the examples).

In [42]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# 1) Wrap each raw string in a Document (with a source label), then split into chunks
documents = [
    Document(page_content=text, metadata={"source": f"Doc_{i+1}"})
    for i, text in enumerate(docs)
]

# Splitter
splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
splitted = splitter.split_documents(documents)

# 2) Embeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# 3) Vector store
vectorstore = Chroma.from_documents(splitted, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4) RetrievalQA
qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are an assistant that answers ONLY using the information in the DOCUMENTS.\n"
        "If the answer is not present, clearly say: 'Not present in the documents.'\n\n"
        "DOCUMENTS:\n{context}\n\n"
        "Question: {question}\n"
        "Concise answer in English:"
    ),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,                           # the ChatOllama model defined earlier
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": qa_prompt},
)



## 6. ⚖️ Compare: LLM vs RAG

In [44]:
questions = [
    "How many vacation days does TechCorp's policy provide?",
    "What is the language used for the mobile app?",
    "Who provides the cloud infrastructure?",
    "What tool does the sales department use?"
]

for q in questions:
    print(f"\nQuestion: {q}")
    # Vanilla LLM (no documents)
    print("→ LLM Vanilla:", llm.invoke(q).content)

    # RAG (retrieval + generation)
    out = qa_chain.invoke({"query": q})
    print("→ RAG:", out["result"])

    # Show the sources found by the retriever
    for i, s in enumerate(out.get("source_documents", []), 1):
        print(f"   [{i}] {s.metadata.get('source','?')} | {s.page_content[:80]}...")



Question: How many vacation days does TechCorp's policy provide?
→ LLM Vanilla: I don't have information about TechCorp's specific vacation day policy. If you're looking for details on a company's benefits or policies, I recommend checking their official website or HR department for the most accurate and up-to-date information.
→ RAG: 25 days.
   [1] Doc_5 | The vacation policy provides 25 days per year for each employee....
   [2] Doc_4 | TechCorp offers 24/7 technical support for enterprise clients....
   [3] Doc_1 | TechCorp is a company that develops software for the healthcare sector....

Question: What is the language used for the mobile app?
→ LLM Vanilla: I can't determine which specific mobile app you are referring to. There are many different apps with various programming languages used in their development. If you could provide more information about the app, such as its name or category, I may be able to help you better.
→ RAG: Flutter.
   [1] Doc_6 | The mobile ap

## 7. 🧪 Exercises

- Add a new document to the knowledge base and ask the question again.
- Change the chunking parameters and evaluate the impact on the answers.
- Try another Ollama model (e.g. `llama3.1:8b`) with different temperature settings.


## 8. ✅ Conclusions & Next Steps

You've built your first RAG system! 🎉

- ✅ Understood the difference between a vanilla LLM and RAG
- ✅ Implemented a local RAG with Ollama (LLM + embeddings) and Chroma
- ✅ Tested real questions on company data and displayed the source of each retrieved chunk

📌 Next step: build your own, richer company dataset and implement source citation in the answers.


## 9. 🧰 Troubleshooting

>**Windows: `ollama` not recognized in the notebook**

Make sure it works in PowerShell; if needed, temporarily add it to PATH (see "Prerequisites" section).

>**Ollama connection error** (*connection refused* / *Failed to connect*)

- Start the service and verify:
    - macOS/Win: `ollama --version`, then `ollama list`
    - Linux: `ollama serve` (if not already running)
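A quick way to check from Python whether the server is up, assuming Ollama listens on its default address `http://localhost:11434`. Change `base_url` if you have set `OLLAMA_HOST`. The check calls the `/api/tags` endpoint, which lists the locally installed models. `ollama_is_running` is a hypothetical helper built only on the standard library.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """True if an Ollama server answers GET {base_url}/api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, connection refused and timeouts;
        # ValueError covers a malformed base_url
        return False


print("Ollama reachable:", ollama_is_running())
```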


>**model not found**  
- Run `ollama pull llama3.2:3b` (LLM) and `ollama pull nomic-embed-text` (embeddings), then check with `ollama list`.
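To see which of the required models are missing, you can compare the names returned by `/api/tags` with the ones the notebook needs. `missing_models` is a hypothetical helper, and the example payload is trimmed to the `name` field. An untagged name such as `nomic-embed-text` is treated as `nomic-embed-text:latest`, mirroring how Ollama lists a model pulled without a tag.

```python
def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Names in `required` that are absent from an /api/tags-style response.
    An untagged name such as 'nomic-embed-text' is treated as 'nomic-embed-text:latest'."""
    def normalize(name: str) -> str:
        return name if ":" in name else f"{name}:latest"

    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


# Example payload shaped like Ollama's /api/tags response, trimmed to the "name" field
example = {"models": [{"name": "llama3.2:3b"}, {"name": "nomic-embed-text:latest"}]}
print(missing_models(example, ["llama3.2:3b", "nomic-embed-text", "llama3.1:8b"]))
```

To check a live server, pass in the parsed response, e.g. `json.load(urllib.request.urlopen("http://localhost:11434/api/tags"))`.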

>**ImportError** (`langchain_text_splitters` or similar)  
- Reinstall the packages:

In [None]:
!pip install -q langchain langchain-community langchain-text-splitters chromadb ollama


>**RAG responds in a generic/incorrect way**  
1. Increase or decrease `k` (e.g. 3→4)
2. Review `chunk_size`/`chunk_overlap`
3. Print the retrieved documents to see what the model is actually reading:

In [None]:
q = "How many days off can be taken at TechCorp?"
for i, d in enumerate(retriever.invoke(q), 1):  # .invoke() replaces the deprecated get_relevant_documents()
    print(f"[{i}] {d.metadata['source']} | {d.page_content}")