[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aiembassy/workshop-rag-haystack/blob/master/notebooks/01-using-llms.ipynb)

In [None]:
!pip install "haystack-ai" \
    "huggingface-hub" \
    "transformers[torch]"

# Using LLMs in Haystack

One benefit of routing all LLM interactions through a framework like Haystack is that it provides a unified interface for different models, so you can switch between them with only minimal code changes.
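To make the idea of a unified interface concrete, here is a minimal sketch in plain Python. It assumes only what the cells in this notebook show: a generator exposes `run(prompt=...)` and returns a dictionary with a `"replies"` list. `TextGenerator`, `EchoGenerator` and `ask` are hypothetical names introduced for illustration; they are not part of Haystack.

```python
from typing import Any, Dict, List, Protocol


class TextGenerator(Protocol):
    """Hypothetical name for the call shape the generators in this notebook share."""

    def run(self, prompt: str) -> Dict[str, Any]: ...


class EchoGenerator:
    """Stand-in generator for illustration only; it is not a Haystack component."""

    def run(self, prompt: str) -> Dict[str, List[str]]:
        # A real generator would call a model here; this one just echoes the prompt.
        return {"replies": [f"echo: {prompt}"]}


def ask(generator: TextGenerator, prompt: str) -> str:
    """Send a prompt to any generator with a compatible run() and return the first reply."""
    response = generator.run(prompt=prompt)
    return response["replies"][0]


answer = ask(EchoGenerator(), "Which continent has the most countries?")
print(answer)
```

Because `ask` depends only on the call shape, the generators built later in this notebook could be passed to it in place of `EchoGenerator` without touching the function.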

Selecting the right LLM for a task is a challenge in itself. Hosted models from OpenAI or Anthropic are easy to start with, but they cannot be used to build a fully private setup, because every prompt is sent to a third-party API. HuggingFace hosts a hub of permissively licensed models that can be used for free, and its Serverless Inference API is a good way to try them out.

**Hint:** please have a look at the list of the [warm text generation models](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending&inference=warm), so you don't have to wait to spin up a new instance.

In [None]:
from haystack.utils import Secret
from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "microsoft/Phi-3.5-mini-instruct"},
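    # Read the token from the HF_TOKEN environment variable;
    # Secret.from_token() expects the token value itself, not a variable name.
    token=Secret.from_env_var("HF_TOKEN"),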
)

In [None]:
response = generator.run(prompt="Which continent has the most countries?")
print(response["replies"][0])
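The cell above indexes `response["replies"][0]` directly, which fails with an `IndexError` if the list is empty. The sketch below shows a slightly more defensive way to read the result. `first_reply` is a hypothetical helper, and `sample_response` is a hand-written dictionary that only mimics the shape of a generator result; it is not real model output.

```python
from typing import Any, Dict, List


def first_reply(response: Dict[str, List[Any]]) -> str:
    """Return the first reply with surrounding whitespace removed, or raise if there is none."""
    replies = response.get("replies", [])
    if not replies:
        raise ValueError("The generator returned no replies")
    return str(replies[0]).strip()


# Hand-written stand-in for a generator result (illustrative values only).
sample_response = {
    "replies": [" Africa has the most countries. "],
    "meta": [{"finish_reason": "stop"}],
}

cleaned = first_reply(sample_response)
print(cleaned)
```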

## Local models

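It is also possible to run HuggingFace models inside the application process with `HuggingFaceLocalGenerator`. This is convenient for experiments, but not recommended for production use, since the model then competes with the application for memory and compute.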

In [None]:
from haystack.components.generators import HuggingFaceLocalGenerator

local_generator = HuggingFaceLocalGenerator(
    model="distilbert/distilgpt2",  # A very small generative model, quick to download
    # Optional for public models; read from HF_TOKEN if it is set
    token=Secret.from_env_var("HF_TOKEN", strict=False),
)
local_generator.warm_up()

In [None]:
response = local_generator.run(prompt="Which continent has the most countries?")
print(response["replies"][0])

Other tools are designed specifically for serving LLMs, such as Ollama or llama.cpp, and both can be used in Haystack applications:

- [`OllamaGenerator`](https://docs.haystack.deepset.ai/docs/ollamagenerator)
- [`LlamaCppGenerator`](https://docs.haystack.deepset.ai/docs/llamacppgenerator)
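
As a rough sketch of how one of these integrations might be wired in, the function below builds an `OllamaGenerator`. It rests on several assumptions that the text above does not state: the `ollama-haystack` package is installed, an Ollama server is listening on `http://localhost:11434`, and the chosen model has already been pulled. `build_ollama_generator` and the default model name `llama3.2` are illustrative choices only.

```python
def build_ollama_generator(model: str = "llama3.2", url: str = "http://localhost:11434"):
    """Create an OllamaGenerator; assumes ollama-haystack is installed and a server is running."""
    # Imported lazily so this cell can be defined even when the integration is not installed.
    from haystack_integrations.components.generators.ollama import OllamaGenerator

    return OllamaGenerator(model=model, url=url)


# Usage (requires a running Ollama server with the model available):
# generator = build_ollama_generator()
# response = generator.run(prompt="Which continent has the most countries?")
# print(response["replies"][0])
```

The call shape is the same `run(prompt=...)` used by the HuggingFace generators earlier in this notebook, so only the construction step changes.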
