## Connect and compare LLMs

- Connect to the Groq API and choose a model
- Connect to the Gemini API and choose a model
- Connect to the OpenAI API and choose gpt-4o-mini

### Create a prompt and inject a short text snippet of your choice
- The LLM should use the injected information to answer a question.

### Compare the outputs
- Use the same method for every model.
- Do you see differences?
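One way to apply the same method to every model is a small harness that sends one prompt to a set of callables and collects the answers. This is a minimal sketch: `compare_models` and the stub backends are hypothetical names introduced here, and each stub would be replaced by a function that calls the real provider.

```python
from typing import Callable, Dict


def compare_models(prompt: str, backends: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every backend and collect the answers by name."""
    answers = {}
    for name, ask in backends.items():
        try:
            answers[name] = ask(prompt)
        except Exception as exc:  # keep going if one provider fails
            answers[name] = f"ERROR: {exc}"
    return answers


# Stand-in backends so the sketch runs without API keys; replace each
# lambda with a function that calls the real provider.
demo_backends = {
    "model_a": lambda p: "stub answer A to: " + p,
    "model_b": lambda p: "stub answer B to: " + p,
}

for name, answer in compare_models("What is RAG?", demo_backends).items():
    print(f"--- {name} ---")
    print(answer)
```

Catching the exception per backend means a missing key or rate limit for one provider does not hide the answers from the others.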

**Tip:** To inject the snippet into the prompt, you can use either string concatenation or Python string formatting.
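Both options can be sketched as follows. The snippet text, the question, the template wording, and the helper names `build_prompt_concat` and `build_prompt_format` are made up for illustration; the two helpers are meant to produce the same prompt string.

```python
# Hypothetical snippet and question, chosen only for illustration.
SNIPPET = "The workshop takes place in room B-204 on Thursday at 14:00."
QUESTION = "Where and when does the workshop take place?"

# Template for the formatter variant; {context} and {question} are placeholders.
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def build_prompt_concat(context: str, question: str) -> str:
    """Build the prompt with plain string concatenation."""
    return (
        "Answer the question using only the context below.\n\n"
        + "Context:\n" + context + "\n\n"
        + "Question: " + question
    )


def build_prompt_format(context: str, question: str) -> str:
    """Build the same prompt with str.format on a reusable template."""
    return TEMPLATE.format(context=context, question=question)


prompt = build_prompt_format(SNIPPET, QUESTION)
print(prompt)
```

The template variant tends to be easier to reuse across models, because the wording lives in one place and only the snippet and question change.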

In [4]:
! pip install google-generativeai
! pip install openai
! pip install groq
! pip install dotenv
import google.generativeai as genai
from openai import OpenAI
from groq import Groq
import os
from dotenv import load_dotenv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Collecting dotenv
  Downloading dotenv-0.9.9-py2.py3-none-any.whl.metadata (279 bytes)
Collecting python-dotenv (from dotenv)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading do

In [5]:
load_dotenv()
# Read the API keys using the variable names defined in the .env file
google_api_key = os.getenv("GOOGLE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")
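`os.getenv` returns `None` when a variable is not set, so a missing key only surfaces later as an authentication error. A small check right after loading can catch that early. This is a sketch; `missing_keys` and `REQUIRED` are hypothetical names, not part of `dotenv`.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


REQUIRED = ["GOOGLE_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY"]

missing = missing_keys(REQUIRED)
if missing:
    # Report only the variable names, never the key values themselves.
    print("Missing in environment/.env:", ", ".join(missing))
else:
    print("All API keys found.")
```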

In [12]:
from dotenv import load_dotenv
import os

load_dotenv(dotenv_path="env/.env")  # important: point to the .env file explicitly
google_api_key = os.getenv("GOOGLE_API_KEY")




In [15]:
import google.generativeai as genai

genai.configure(api_key=google_api_key)

models = genai.list_models()
for model in models:
    print(model.name, "-", model.supported_generation_methods)


models/chat-bison-001 - ['generateMessage', 'countMessageTokens']
models/text-bison-001 - ['generateText', 'countTextTokens', 'createTunedTextModel']
models/embedding-gecko-001 - ['embedText', 'countTextTokens']
models/gemini-1.0-pro-vision-latest - ['generateContent', 'countTokens']
models/gemini-pro-vision - ['generateContent', 'countTokens']
models/gemini-1.5-pro-latest - ['generateContent', 'countTokens']
models/gemini-1.5-pro-001 - ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-pro-002 - ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-pro - ['generateContent', 'countTokens']
models/gemini-1.5-flash-latest - ['generateContent', 'countTokens']
models/gemini-1.5-flash-001 - ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-flash-001-tuning - ['generateContent', 'countTokens', 'createTunedModel']
models/gemini-1.5-flash - ['generateContent', 'countTokens']
models/gemini-1.5-flash-002 - ['generateContent
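The listing mixes embedding, legacy text, and chat models. To narrow it down to models usable with `generate_content`, the entries can be filtered on `supported_generation_methods`. This is a sketch: `models_supporting` is a hypothetical helper, and the stand-in objects copy two entries from the listing above so the code runs offline.

```python
from types import SimpleNamespace


def models_supporting(models, method="generateContent"):
    """Return names of models whose supported_generation_methods include `method`."""
    return [m.name for m in models if method in m.supported_generation_methods]


# Stand-in objects mirroring two entries from the listing above, so the
# sketch runs offline; with the real API, pass genai.list_models() instead.
sample = [
    SimpleNamespace(name="models/embedding-gecko-001",
                    supported_generation_methods=["embedText", "countTextTokens"]),
    SimpleNamespace(name="models/gemini-1.5-pro",
                    supported_generation_methods=["generateContent", "countTokens"]),
]

print(models_supporting(sample))  # prints ['models/gemini-1.5-pro']
```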

## Google

https://ai.google.dev/gemini-api/docs/quickstart?hl=de&lang=python
examples: https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/System_instructions.ipynb?hl=de#scrollTo=WxiIfsbA0WdH

In [16]:
import google.generativeai as genai
from dotenv import load_dotenv
import os

load_dotenv(dotenv_path="env/.env")
google_api_key = os.getenv("GOOGLE_API_KEY")

genai.configure(api_key=google_api_key)

model = genai.GenerativeModel("models/gemini-1.5-pro")  # important: use the full model name
response = model.generate_content("What is Retrieval-Augmented Generation?")

In [17]:

print(response.text)

Retrieval-Augmented Generation (RAG) is a technique in natural language processing (NLP) that combines the strengths of information retrieval (IR) systems with the generative capabilities of large language models (LLMs).  Instead of relying solely on the knowledge encoded within the LLM's parameters, RAG allows the model to access and process external information relevant to a given prompt, leading to more accurate, up-to-date, and comprehensive responses.

Here's a breakdown of how it works:

1. **Retrieval:** Given a user prompt, a relevant document or set of documents is retrieved from an external knowledge base. This knowledge base can be anything from a specific dataset, a collection of web pages, or even a codebase.  The retrieval process typically involves techniques like keyword search, semantic search, or embedding-based similarity matching.

2. **Augmentation:**  The retrieved document(s) are then used to augment the prompt provided to the LLM. This can be done in several way
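For the comparison it helps to wrap the Gemini call in a function that always returns a string. The sketch below assumes that accessing `response.text` can raise `ValueError` when the response contains no text part (for example after a safety block); `ask_gemini` and the offline stand-in are hypothetical names introduced here.

```python
from types import SimpleNamespace


def ask_gemini(model, prompt: str) -> str:
    """Call model.generate_content and return the text, or a placeholder if none is available."""
    response = model.generate_content(prompt)
    try:
        return response.text
    except ValueError as exc:
        # Assumption: the SDK raises ValueError when the response has no text part.
        return f"[no text returned: {exc}]"


# Offline stand-in with the same generate_content(...).text shape; with the
# real SDK, pass the genai.GenerativeModel instance created above.
fake_model = SimpleNamespace(
    generate_content=lambda prompt: SimpleNamespace(text="stub answer to: " + prompt)
)

print(ask_gemini(fake_model, "What is Retrieval-Augmented Generation?"))
```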

## OpenAI

In [20]:
import openai
from dotenv import load_dotenv
import os

load_dotenv(dotenv_path="env/.env")
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o",  # or "gpt-3.5-turbo"
    messages=[
        {
            "role": "user",
            "content": "What are the benefits of using retrieval-augmented generation?",
        }
    ]
)




In [22]:
print(response.choices[0].message.content)

Retrieval-augmented generation (RAG) combines retrieval techniques with generative models to produce more accurate and contextually relevant outputs. Here are the benefits of using RAG:

1. **Improved Relevance**: By integrating retrieval mechanisms, RAG leverages external knowledge bases to enhance the contextual relevance of responses. This reduces the risk of generating inaccurate or outdated information, as the model can access up-to-date or domain-specific data.

2. **Increased Accuracy**: By accessing a large body of documents or datasets during generation, RAG models can produce more factually accurate results. This is particularly valuable for tasks requiring precise information or real-time data.

3. **Reduced Memory Load**: Generative models, on their own, require large memory and computational resources to store extensive knowledge. RAG shifts part of this requirement to external retrieval systems, making the overall architecture more efficient and scalable.

4. **Enhanced F
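The OpenAI call above and the Groq call in the next section both go through the same `chat.completions.create` interface, so one helper can serve both clients. This is a minimal sketch: `ask_chat`, the optional `system` parameter, and the fake client are hypothetical and exist only so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Optional


def ask_chat(client, model: str, prompt: str, system: Optional[str] = None) -> str:
    """Send one user prompt through an OpenAI-style chat.completions API and return the reply text."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


def _fake_create(model, messages):
    # Echo the last message so the sketch runs without network access.
    reply = SimpleNamespace(content=f"[{model}] {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

print(ask_chat(fake_client, "gpt-4o", "What are the benefits of RAG?"))
```

With real credentials, the `client` argument would be the `openai.OpenAI(...)` instance from the cell above, or the same class constructed with Groq's `base_url`.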

## Groq
https://console.groq.com/docs/quickstart

goal: llama-3.3-70b-versatile


In [23]:
import openai
import os
from dotenv import load_dotenv

load_dotenv(dotenv_path="env/.env")
groq_api_key = os.getenv("GROQ_API_KEY")

client = openai.OpenAI(
    api_key=groq_api_key,
    base_url="https://api.groq.com/openai/v1"  # important: Groq exposes an OpenAI-compatible API
)

response = client.chat.completions.create(
    model="llama3-8b-8192",  # alternatively: "llama3-70b-8192" or "mixtral-8x7b-32768"
    messages=[
        {
            "role": "user",
            "content": "What are the key limitations of retrieval-augmented generation?",
        }
    ]
)

print(response.choices[0].message.content)


Retrieval-augmented generation (RAG) is a type of language model that combines the strengths of traditional sequence-to-sequence models with the capabilities of retrieval-based models. While RAG has shown promising results in various natural language processing tasks, it also has some key limitations. Here are some of the main limitations of RAG:

1. **Dependence on stored knowledge**: RAG relies heavily on the quality and diversity of the stored knowledge base. If the knowledge base is limited or biased, it can negatively impact the performance of the model.
2. **Oversmoothing**: RAG can suffer from oversmoothing, where the generated text becomes too similar to the retrieved text, resulting in a loss of creativity and diversity.
3. **Limited contextual understanding**: RAG models may not fully understand the contextual nuances of the input text, which can lead to inaccuracies or incorrect outputs.
4. **Inability to handle out-of-vocabulary (OOV) words**: RAG models can struggle with 