<a href="https://colab.research.google.com/github/OvertheSkyy/iskobot-rag-system/blob/main/groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Groq

Welcome to Groq! 🚀 At Groq, we've developed the world's first Language Processing Unit™, or LPU. The Groq LPU has a deterministic, single-core streaming architecture that sets the standard for GenAI inference speed, with predictable and repeatable performance for any given workload.

Beyond the architecture, our software is designed to empower developers like you with the tools you need to create innovative, powerful AI applications. With Groq as your engine, you can:

* Achieve uncompromised low latency and performance for real-time AI and HPC inference 🔥
* Know the exact performance and compute time for any given workload 🔮
* Take advantage of our cutting-edge technology to stay ahead of the competition 💪

Want more Groq? Check out our [website](https://groq.com) for more resources and join our [Discord community](https://discord.gg/JvNsBDKeCG) to connect with our developers!

## Setup

If you're opening this notebook on Colab, you will probably need to install LlamaIndex and its Groq integration 🦙.

In [4]:
!pip install llama-index-llms-groq

In [5]:
!pip install llama-index

In [6]:
from llama_index.llms.groq import Groq
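To record which versions the notebook actually ran against (handy when LlamaIndex or the Groq integration is updated), you can query the installed distributions. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is our own and is not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names match the ones passed to pip install above
for dist in ("llama-index", "llama-index-llms-groq"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```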

In [1]:
from google.colab import userdata

In [9]:
# Create a Groq API key at https://console.groq.com/keys, then save it in Colab's Secrets panel as GROQ_API_KEY
llm = Groq(model="llama-3.1-70b-versatile", api_key=userdata.get("GROQ_API_KEY"))
llm.model

'llama-3.1-70b-versatile'
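The cell above relies on Colab's secrets manager. If you also run this notebook outside Colab, a small helper can fall back to an environment variable. This is a sketch: the helper name `get_groq_api_key` is hypothetical, and the only Colab call it makes is `userdata.get`, the same one used above (inside Colab it raises if the secret is missing or notebook access was not granted).

```python
import os
from typing import Optional


def get_groq_api_key(name: str = "GROQ_API_KEY") -> Optional[str]:
    """Read the Groq API key from Colab secrets, falling back to an environment variable."""
    try:
        # This import only succeeds inside Google Colab
        from google.colab import userdata
    except ImportError:
        # Outside Colab (local Jupyter, scripts): returns None if the variable is unset
        return os.environ.get(name)
    return userdata.get(name)
```

You would then pass `api_key=get_groq_api_key()` to `Groq(...)` instead of calling `userdata.get` directly.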

The models available on Groq are listed in the [Groq documentation](https://console.groq.com/docs/models).
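The model list can also be fetched programmatically. Groq exposes an OpenAI-compatible REST API; the sketch below assumes its models endpoint at `https://api.groq.com/openai/v1/models` returns an OpenAI-style `{"data": [{"id": ...}, ...]}` listing. `parse_model_ids` and `fetch_model_ids` are hypothetical helpers, and the custom `User-Agent` header is only a precaution.

```python
import json
import urllib.request

# Assumption: Groq's OpenAI-compatible models endpoint
GROQ_MODELS_URL = "https://api.groq.com/openai/v1/models"


def parse_model_ids(body: dict) -> list[str]:
    """Extract sorted model IDs from an OpenAI-style {"data": [{"id": ...}]} listing."""
    return sorted(item["id"] for item in body.get("data", []))


def fetch_model_ids(api_key: str) -> list[str]:
    """Query the models endpoint (network call) and return the available model IDs."""
    request = urllib.request.Request(
        GROQ_MODELS_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Precaution: some API gateways reject urllib's default User-Agent
            "User-Agent": "groq-notebook-example",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_model_ids(json.load(resp))
```

In Colab, `print(fetch_model_ids(userdata.get("GROQ_API_KEY")))` would print the IDs. The list may include non-chat models (for example speech-to-text), so not every ID is suitable for `Groq(model=...)`.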

#### Call `complete` with a prompt

In [10]:
response = llm.complete("What is Retrieval Augmented Generation?")

In [11]:
print(response)

Retrieval Augmented Generation (RAG) is a type of natural language processing (NLP) model that combines the strengths of retrieval-based and generation-based approaches to produce more accurate and informative text. The main idea behind RAG is to use a retrieval system to fetch relevant information from a large database or knowledge base, and then use a generation model to produce text based on the retrieved information.

In traditional generation-based models, the model is trained to generate text from scratch based on a given prompt or input. However, these models often struggle to produce accurate and informative text, especially when dealing with complex or domain-specific topics.

Retrieval-based models, on the other hand, rely on a large database or knowledge base to retrieve relevant information and present it to the user. However, these models often lack the ability to generate coherent and fluent text.

RAG models address these limitations by combining the strengths of both ap
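The answer above describes RAG as two steps: retrieve relevant passages, then generate text from them. The toy sketch below illustrates both steps with only the standard library. Keyword overlap stands in for a real retriever (typically an embedding model plus a vector index), the documents are made-up examples, and this is not meant to describe how Iskobot itself is built.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by the number of tokens they share with the query."""
    query_counts = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        # Counter & Counter keeps the minimum count of each shared token
        return sum((query_counts & Counter(tokenize(doc))).values())

    # sorted() is stable (also with reverse=True), so ties keep their original order
    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Augment the question with the retrieved passages before generation."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base; a real system would index its own documents
docs = [
    "The registrar's office opens at 8 AM on weekdays.",
    "Enrollment for the first semester starts in August.",
    "The main library is on the second floor of the admin building.",
]
query = "When does enrollment start?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
print(prompt)
# Generation step, using the same method as above:
# response = llm.complete(prompt)
```

Only `retrieve` would change if you swapped in an embedding-based retriever; the prompt assembly and the `llm.complete` call stay the same.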

#### Call `chat` with a list of messages

In [12]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a university assistant chatbot named Iskobot."
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)

In [13]:
print(resp)

assistant: Hello, I'm Iskobot, your university assistant chatbot. I'm here to help you with any questions or information you need about the university. How can I assist you today?
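`llm.chat` sends the whole message list on every call; the model does not remember earlier calls on its own. Because Groq's API is OpenAI-compatible, a conversation maps onto a list of role/content objects. The sketch below assumes that OpenAI-style request shape to show how the history grows across turns. `to_payload` is a hypothetical helper, and this is not necessarily the exact body the LlamaIndex integration sends.

```python
import json


def to_payload(model: str, messages: list[tuple[str, str]]) -> dict:
    """Build an OpenAI-style chat completion request body from (role, content) pairs."""
    return {
        "model": model,
        "messages": [{"role": role, "content": content} for role, content in messages],
    }


history = [
    ("system", "You are a university assistant chatbot named Iskobot."),
    ("user", "What is your name?"),
]
# After each reply, append it so the next request carries the whole conversation
history.append(("assistant", "Hello, I'm Iskobot, your university assistant chatbot."))
history.append(("user", "What can you help me with?"))

print(json.dumps(to_payload("llama-3.1-70b-versatile", history), indent=2))
```

With LlamaIndex objects, the same pattern is `messages.append(resp.message)`, followed by the next `ChatMessage(role="user", content=...)` and another `llm.chat(messages)` call.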


### Streaming

Using the `stream_complete` method:

In [14]:
response = llm.stream_complete("What is Retrieval Augmented Generation?")

In [15]:
for r in response:
    print(r.delta, end="")

Retrieval Augmented Generation (RAG) is a type of natural language processing (NLP) model that combines the strengths of both retrieval-based and generation-based approaches to produce more accurate and informative text outputs.

In traditional generation-based models, the model generates text from scratch based on the input prompt or context. However, these models can struggle to produce accurate and up-to-date information, especially when dealing with rare or domain-specific topics.

Retrieval-based models, on the other hand, rely on a large database or knowledge graph to retrieve relevant information and provide answers to user queries. However, these models can struggle to generate coherent and fluent text, especially when the retrieved information is fragmented or incomplete.

RAG models address these limitations by combining the strengths of both approaches. Here's how it works:

1. **Retrieval**: The model first retrieves a set of relevant documents or passages from a large data
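Each streamed item carries only the newly generated text in `.delta`, which is why the cell above prints with `end=""`. The stream is a generator and can be consumed only once, so if you also need the complete answer (for logging, or to add to a chat history), collect the deltas while printing. This sketch uses a simulated stream; `Chunk` and `fake_stream` are hypothetical stand-ins for LlamaIndex's response objects.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Chunk:
    """Stand-in for one streamed response; only `.delta` is modeled."""
    delta: Optional[str]


def fake_stream(text: str, size: int = 8) -> Iterator[Chunk]:
    """Yield `text` in fixed-size pieces to mimic token-by-token streaming."""
    for start in range(0, len(text), size):
        yield Chunk(delta=text[start:start + size])


def consume(stream: Iterable[Chunk]) -> str:
    """Print each delta as it arrives and return the full accumulated text."""
    parts = []
    for chunk in stream:
        delta = chunk.delta or ""  # treat a missing delta as empty text
        print(delta, end="", flush=True)
        parts.append(delta)
    print()
    return "".join(parts)


full_text = consume(fake_stream("Retrieval Augmented Generation (RAG) pairs retrieval with generation."))
```

With the real model, `consume(llm.stream_complete("What is Retrieval Augmented Generation?"))` works the same way, since the streamed objects also expose `.delta`.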

Using the `stream_chat` method:

In [16]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a university assistant chatbot named Iskobot."
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.stream_chat(messages)

In [17]:
for r in resp:
    print(r.delta, end="")

Hello, I'm Iskobot, your university assistant chatbot. I'm here to help answer your questions and provide information about university life, courses, and more. How can I assist you today?
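Streaming is also a convenient way to check the latency claims at the top of this notebook against your own workload. The rough sketch below uses `time.perf_counter` to measure the time to the first streamed chunk and the total time. `measure_stream` and `fake_stream` are hypothetical helpers; the timings are end-to-end and include network round trips and queueing, not only compute on the LPU.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(start_stream: Callable[[], Iterable[str]]) -> Tuple[str, float, float]:
    """Time a text stream: returns (full_text, seconds_to_first_chunk, total_seconds).

    `start_stream` is called inside the timer, so sending the request is
    measured whether it happens at call time or on the first iteration.
    """
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for delta in start_stream():
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(delta)
    total = time.perf_counter() - start
    # An empty stream has no first chunk; report the total time instead
    return "".join(parts), (first_chunk_at if first_chunk_at is not None else total), total


def fake_stream():
    """Hypothetical stand-in for a model stream: two chunks with a short delay."""
    for piece in ("Hello, ", "I'm Iskobot."):
        time.sleep(0.01)
        yield piece


text, ttft, total = measure_stream(fake_stream)
print(f"first chunk after {ttft:.3f}s, full reply after {total:.3f}s")
```

Against the real model, you could call `measure_stream(lambda: (r.delta or "" for r in llm.stream_chat(messages)))`. The lambda defers the `stream_chat` call until the timer is running.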