# Build with Haystack 2.0.x

<a target="_blank" href="https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/gemma_chat_rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" width="200" alt="Open In Colab"/>
</a>

<img src="https://huggingface.co/blog/assets/gemma/Gemma-logo-small.png" width="200" style="display:inline;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;



<img src="https://meetcody.ai/wp-content/webp-express/webp-images/doc-root/wp-content/uploads/2023/07/LlamaCover-1-1151x648.png"  width="280" style="display:inline;">

<img src="https://haystack.deepset.ai/images/haystack-ogimage.png" width="280" style="display:inline;">



We will see how to build a RAG-based system with the [Haystack LLM framework](https://haystack.deepset.ai/).




## Installation

In [1]:
!pip install haystack-ai==2.2.4 "transformers>=4.43.1" sentence-transformers accelerate bitsandbytes # 2.1.2

Collecting requests (from haystack-ai==2.2.4)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
Installing collected packages: requests
  Attempting uninstall: requests
    Found existing installation: requests 2.3.0
    Uninstalling requests-2.3.0:
      Successfully uninstalled requests-2.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
wikipedia 1.3.1 requires requests==2.3.0, but you have requests 2.32.3 which is incompatible.
Successfully installed requests-2.32.3


## Authorization

- You need a Hugging Face account.
- You need to accept Google's conditions at https://huggingface.co/google/gemma-7b-it and wait for authorization.
- You need to accept Meta's conditions at https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct and wait for authorization.

In [2]:
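# Option 1: prompt for the token
#import getpass, os
#os.environ["HF_API_TOKEN"] = getpass.getpass("Your Hugging Face token: ")

# Option 2: interactive login
#from huggingface_hub import notebook_login
#notebook_login()

# Option 3: use Colab secrets (grant this notebook access to the secret first)
import os
from google.colab import userdata
HF_API_TOKEN = userdata.get('HF_TOKEN')
# Assumption: the Hugging Face generators read the token from the HF_API_TOKEN environment variable by default
os.environ["HF_API_TOKEN"] = HF_API_TOKEN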

<b>Main classes of the Haystack framework:</b>

- HuggingFaceTGIGenerator (generation only, short interactions): designed for text generation, not for chat. To use these LLMs for chat, use HuggingFaceTGIChatGenerator instead.
- HuggingFaceTGIChatGenerator (for conversations): enables chat completion using chat-based LLMs hosted on the Hugging Face Hub.
- HuggingFaceLocalGenerator: provides an interface to generate text using a Hugging Face model that runs locally.

<b>LLMs:</b>
- google/gemma-1.1-2b-it ---> OK
- meta-llama/Meta-Llama-3-8B-Instruct ---> OK



## Chat without a knowledge base

First, we call the model using the free Hugging Face Inference API with the `HuggingFaceTGIChatGenerator`.

(We might also load it in Colab using the `HuggingFaceLocalChatGenerator` in a quantized version).

In [3]:
from haystack.components.generators.chat import HuggingFaceTGIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

#  "google/gemma-2b-it", "google/gemma-1.1-2b-it"

def build_chat_generator(model="google/gemma-1.1-2b-it", max_new_tokens=350):
  """Create a chat generator that calls the model through the free Hugging Face Inference API."""
  return HuggingFaceTGIChatGenerator(
    model=model,
    generation_kwargs={"max_new_tokens": max_new_tokens})


In [4]:
# google/gemma-1.1-2b-it also works
chatGenerator = build_chat_generator(model="meta-llama/Meta-Llama-3-8B-Instruct")
chatGenerator.warm_up()  # Loads the tokenizer; if no URL is provided, also checks that the model is deployed on the free tier of the HF Inference API


Now we can chat with the model:

In [5]:
userQuestions = ["Who is the Ecuador's President", "Quién es el presidente actual de Ecuador"]
messages = []
for msg in userQuestions:
  messages.append(ChatMessage.from_user(msg))
  response = chatGenerator.run(messages=messages)
  assistant_resp = response['replies'][0]
  print("🤖 "+assistant_resp.content)
  messages.append(assistant_resp)


# To chat with the model interactively, use the following code instead:
#messages = []

#while True:
#  msg = input("Enter your message or Q to exit\n🧑 ")
#  if msg=="Q":
#    break
#  messages.append(ChatMessage.from_user(msg))
#  response = chatGenerator.run(messages=messages)
#  assistant_resp = response['replies'][0]
#  print("🤖 "+assistant_resp.content)
#  messages.append(assistant_resp)

🤖 assistant

As of my knowledge cutoff in 2021, the President of Ecuador is Guillermo Lasso. He is a businessman and politician who has been serving as the President of Ecuador since May 24, 2021.
🤖 assistant

Actualmente, el presidente de Ecuador es Guillermo Lasso.




## Chat with a knowledge base (RAG)

### Create the knowledge base from Wikipedia

In [6]:
#!pip install wikipedia
!pip install wikipedia==1.3.1

Collecting requests==2.3.0 (from wikipedia==1.3.1)
  Using cached requests-2.3.0-py2.py3-none-any.whl.metadata (25 kB)
Using cached requests-2.3.0-py2.py3-none-any.whl (452 kB)
Installing collected packages: requests
  Attempting uninstall: requests
    Found existing installation: requests 2.32.3
    Uninstalling requests-2.32.3:
      Successfully uninstalled requests-2.32.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 1.25.0 requires requests>=2.27.1, but you have requests 2.3.0 which is incompatible.
cachecontrol 0.14.1 requires requests>=2.16.0, but you have requests 2.3.0 which is incompatible.
google-api-core 2.19.2 requires requests<3.0.0.dev0,>=2.18.0, but you have requests 2.3.0 which is incompatible.
google-cloud-bigquery 3.25.0 requires requests<3.0.0dev,>=2.21.0, but you have requests 2.3.0 which is incompatible.
google-cloud-stor

### Load data from Wikipedia

In [7]:
sources = """2024 Ecuadorian conflict
Daniel Noboa
José Adolfo Macías Villamar
Los Choneros
Crime in Ecuador
Ecuadorian security crisis
""".strip().split("\n")  # strip() drops the trailing newline, which would otherwise yield an empty title and a KeyError: 'query'

In [8]:
from IPython.display import Image
from pprint import pprint
import rich
import random

In [9]:
# Get the content from Wikipedia:
import wikipedia
from haystack.dataclasses import Document  # Haystack 2.0 uses data classes so that components can exchange data in a simple, modular way as it flows through pipelines

raw_docs = []

for title in sources:
    print(title)
    page = wikipedia.page(title=title, auto_suggest=False)
    doc = Document(content=page.content, meta={"title": page.title, "url": page.url})
    raw_docs.append(doc)

2024 Ecuadorian conflict
Daniel Noboa
José Adolfo Macías Villamar
Los Choneros
Crime in Ecuador
Ecuadorian security crisis


### Indexing Pipeline (of the knowledge base)

In [None]:
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

In [None]:
document_store = InMemoryDocumentStore()

In [None]:
#!pip install mistral-haystack

# Docs: https://docs.mistral.ai/api/

# To request the API KEY: https://console.mistral.ai/
# Free

#List of embedders of Haystack: https://docs.haystack.deepset.ai/docs/embedders
# https://www.sbert.net/docs/sentence_transformer/pretrained_models.html

In [None]:
# Definition of the pipeline:

# Option 1 (not used):
#from haystack.components.embedders import HuggingFaceAPIDocumentEmbedder, HuggingFaceAPITextEmbedder
#embedder = HuggingFaceAPIDocumentEmbedder(api_type="serverless_inference_api",
#                                          api_params={"model": "sentence-transformers/multi-qa-distilbert-cos-v1"})

# Option 2 (not used): Error code: 429 - {'message': 'Requests rate limit exceeded'}
#from haystack_integrations.components.embedders.mistral.document_embedder import MistralDocumentEmbedder
#from haystack.utils import Secret
#embedder = MistralDocumentEmbedder(api_key=Secret.from_token(MISTRAL_API_KEY), model="open-mistral-nemo-2407")

# Option 3 (used): Sentence Transformers model running locally
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
embedder = SentenceTransformersDocumentEmbedder(model="distiluse-base-multilingual-cased-v1")

indexing = Pipeline()
indexing.add_component("cleaner", DocumentCleaner())
indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=3))
indexing.add_component("embedder", embedder)
indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))

indexing.connect("cleaner", "splitter")
indexing.connect("splitter", "embedder")
# Write the embedder's output so that the stored chunks carry their embeddings
indexing.connect("embedder.documents", "writer.documents")

In [None]:
indexing.run({"cleaner":{"documents":raw_docs}}) # Run the pipeline with the content extracted from Wikipedia
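
To make the splitter settings more concrete, here is a minimal plain-Python sketch of grouping text into chunks of three sentences, which is roughly what `split_by='sentence'` with `split_length=3` asks for. The helper name `chunk_sentences` and the naive regex-based sentence detection are assumptions made for illustration; this is not Haystack's implementation, which may detect sentence boundaries differently.

```python
import re

def chunk_sentences(text: str, split_length: int = 3) -> list[str]:
    """Group sentences into chunks of `split_length` (hypothetical helper, not Haystack code)."""
    # Naive sentence detection: split after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + split_length])
        for i in range(0, len(sentences), split_length)
    ]

sample = "One. Two. Three. Four. Five."
chunks = chunk_sentences(sample)
print(chunks)
```

Smaller chunks tend to give the retriever more focused passages, at the cost of less surrounding context per passage.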

In [None]:
n_docs = document_store.count_documents()  # Number of documents in the document store

# Fetch all documents once and convert each to a dictionary (metadata is flattened into top-level keys)
docs_dict = [doc.to_dict() for doc in document_store.filter_documents()]

import pandas as pd

data = pd.DataFrame(docs_dict)
data.sample(5)
data2 = data[['id', 'content', 'embedding', 'title', 'url', 'source_id']]
data2.to_csv("corpusSE.csv", index=False)

### RAG Pipeline

In [None]:
from haystack.components.builders import PromptBuilder

prompt_template = """
<start_of_turn>user
Using the information contained in the context, give a comprehensive answer to the question.
If the answer is contained in the context, also report the source URL.
If the answer cannot be deduced from the context, do not give an answer.

Context:
  {% for doc in documents %}
  {{ doc.content }} URL:{{ doc.meta['url'] }}
  {% endfor %};
  Question: {{query}}<end_of_turn>

<start_of_turn>model
"""
prompt_builder = PromptBuilder(template=prompt_template)
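
To see roughly what the LLM receives, the following sketch assembles the same prompt by hand for two made-up documents. The function `render_prompt`, the sample passages, and the `example.org` URLs are hypothetical; `PromptBuilder` renders the Jinja template itself, so this only approximates the result and whitespace may differ.

```python
def render_prompt(documents: list[dict], query: str) -> str:
    """Hand-rolled approximation of the rendered template (hypothetical helper)."""
    # One line per retrieved document: its content followed by its source URL.
    context = "\n".join(f"  {doc['content']} URL:{doc['url']}" for doc in documents)
    return (
        "<start_of_turn>user\n"
        "Using the information contained in the context, give a comprehensive answer to the question.\n"
        "If the answer is contained in the context, also report the source URL.\n"
        "If the answer cannot be deduced from the context, do not give an answer.\n\n"
        f"Context:\n{context}\n"
        f"  Question: {query}<end_of_turn>\n\n"
        "<start_of_turn>model\n"
    )

docs = [
    {"content": "First made-up passage.", "url": "https://example.org/a"},
    {"content": "Second made-up passage.", "url": "https://example.org/b"},
]
prompt = render_prompt(docs, "What is this about?")
print(prompt)
```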

Here, we use the `HuggingFaceTGIGenerator` since it is not a chat setting and we don't envision multi-turn conversations but just RAG.

See the documentation: https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator

In [None]:
from haystack.components.generators import HuggingFaceTGIGenerator
# Other models: "google/gemma-7b-it", "mistralai/Mistral-7B-v0.1"

def build_rag_generator(model="google/gemma-1.1-2b-it", max_new_tokens=500):
  """Create a text generator that calls the model through the Hugging Face Inference API."""
  return HuggingFaceTGIGenerator(
    model=model,
    generation_kwargs={"max_new_tokens": max_new_tokens})

generatorRAG = build_rag_generator(model="meta-llama/Meta-Llama-3-8B-Instruct")

In [None]:
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=5))
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("llm", generatorRAG)

rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "llm.prompt")
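
The retriever in this pipeline ranks chunks with BM25, a keyword-based scoring function. As a rough illustration of the idea, the sketch below scores a toy corpus with the common Okapi BM25 formula. It is not the code behind `InMemoryBM25Retriever`; the function name `bm25_scores`, the whitespace tokenizer, the sample sentences, and the parameters `k1=1.5` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document in `corpus` against `query` with Okapi BM25 (toy sketch)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight than terms found in many documents.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency is dampened and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

corpus = [
    "red apples grow on trees",
    "green pears grow on trees too",
    "boats sail on the sea",
]
scores = bm25_scores("red apples", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best, scores)
```

Because BM25 only matches shared words, a question phrased with different vocabulary than the source text can miss relevant chunks.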

### Let's ask some questions!

In [None]:
def get_generative_answer(query):

  results = rag.run({
      "retriever": {"query": query},
      "prompt_builder": {"query": query}
    }
  )

  answer = results["llm"]["replies"][0]
  rich.print(answer)

In [None]:
get_generative_answer("What happened in Ecuador on 9 January 2024?")


In [None]:
get_generative_answer("Who is the president of Ecuador?")

This is a simple demo.
We can improve the RAG Pipeline using better retrieval techniques: Embedding Retrieval, Hybrid Retrieval...
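
As one possible illustration of the hybrid idea, rankings from a keyword retriever and an embedding retriever can be merged with reciprocal rank fusion (RRF). The sketch below is plain Python with made-up document ids; the function name `reciprocal_rank_fusion` and the constant `k=60` are assumptions, and the code is not tied to any Haystack component.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking (toy sketch)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list receive a larger contribution.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]    # e.g. from a BM25 retriever
embedding_ranking = ["doc_b", "doc_d", "doc_a"]  # e.g. from an embedding retriever
fused = reciprocal_rank_fusion([keyword_ranking, embedding_ranking])
print(fused)
```

Documents that appear high in both lists rise to the top, while documents found by only one retriever are still kept.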

(*Notebook by [Stefano Fiorucci](https://github.com/anakin87)*)