<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/llm/groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
!pip install llama-index-llms-groq

In [None]:
!pip install llama-index

In [None]:
from llama_index.llms.groq import Groq

Create an API key at the [Groq console](https://console.groq.com/keys), then set it as the `GROQ_API_KEY` environment variable.

```bash
export GROQ_API_KEY=<your api key>
```

Alternatively, you can pass your API key to the LLM when you initialize it:

In [None]:
llm = Groq(model="llama3-70b-8192", api_key="your_api_key")
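
Pasting a key directly into a notebook cell makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the key was exported as `GROQ_API_KEY` as described above, the value can be read from the environment and checked before it is passed on. `read_groq_api_key` is a hypothetical helper written for this example and is not part of LlamaIndex.

```python
import os


def read_groq_api_key(env=None):
    """Return the Groq API key from the environment, or raise if it is unset."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; export it or pass api_key explicitly."
        )
    return key


# Demonstration with an explicit mapping, so the snippet runs without a real key.
demo_key = read_groq_api_key({"GROQ_API_KEY": "gsk_example"})
```

With the variable exported, the result of `read_groq_api_key()` could be passed as the `api_key` argument in the cell above instead of a literal string.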

A list of available LLM models can be found [here](https://console.groq.com/docs/models).

In [None]:
response = llm.complete("Explain the importance of low latency LLMs")

In [None]:
print(response)

Low-latency Large Language Models (LLMs) are crucial in various applications where real-time or near-real-time processing is essential. Here are some reasons why low-latency LLMs are important:

1. **Interactive Systems**: In interactive systems like chatbots, virtual assistants, and conversational AI, low-latency LLMs enable rapid response times, making the interaction feel more natural and human-like. This is particularly important in applications where users expect immediate responses, such as customer support or language translation.
2. **Real-time Decision Making**: In applications like autonomous vehicles, robotics, or medical diagnosis, low-latency LLMs can quickly process and analyze large amounts of data to make timely decisions. This is critical in situations where delayed responses can have serious consequences, such as in emergency response systems or self-driving cars.
3. **Live Streaming and Broadcasting**: Low-latency LLMs can facilitate real-time language translation, s

#### Call `chat` with a list of messages

In [None]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are an instructor of a Gen AI course"
    ),
    ChatMessage(role="user", content="how will you take a class on LLM?"),
]
resp = llm.chat(messages)

In [None]:
print(resp)

assistant: I'd be delighted to take a class on Large Language Models (LLMs)! Here's a rough outline of how I'd structure the class:

**Class Topic:** Large Language Models (LLMs)

**Class Objective:**

* Understand the concept of Large Language Models (LLMs) and their significance in NLP
* Learn about the architecture and training of LLMs
* Explore the applications and limitations of LLMs
* Discuss the future directions and potential risks associated with LLMs

**Class Outline:**

**I. Introduction (10 minutes)**

* Introduce the concept of Large Language Models (LLMs) and their importance in NLP
* Discuss the evolution of language models, from traditional statistical models to modern neural network-based models
* Preview the topics to be covered in the class

**II. Architecture of LLMs (20 minutes)**

* Explain the architecture of transformer-based LLMs, including:
	+ Encoder-decoder structure
	+ Self-attention mechanism
	+ Multi-head attention
	+ Feed-forward neural networks (FFNNs)


### Streaming

Using the `stream_complete` endpoint

In [None]:
# Example of streaming completion
response = llm.stream_complete("Explain the applications of Llama Index")

# Print the streaming response
for r in response:
    print(r.delta, end="")

Using the `stream_chat` endpoint

In [None]:
# Example of streaming chat
messages = [
    ChatMessage(
        role="system", content="You are an AI instructor"
    ),
    ChatMessage(role="user", content="What is Llama Index?"),
]

# Get the streaming response for chat
resp = llm.stream_chat(messages)

# Print the streaming chat response
for r in resp:
    print(r.delta, end="")
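
Both streaming loops above print each `delta` as it arrives and then discard it. If the full text is also needed afterwards, the deltas can be collected while printing. The sketch below assumes only that each streamed item exposes a `delta` attribute, as in the loops above; `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real responses so the snippet runs without an API key.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        # Treat a missing delta as an empty string so the join never fails.
        delta = chunk.delta or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    print()  # finish the streamed line
    return "".join(parts)


# Stand-in for the generator returned by llm.stream_complete(...) or llm.stream_chat(...).
fake_stream = (SimpleNamespace(delta=d) for d in ["Hello", ", ", "world"])
full_text = collect_stream(fake_stream)
```

In the notebook, `collect_stream(llm.stream_chat(messages))` would be the corresponding call under the same assumption.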