Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `HuggingFaceH4/zephyr-7b-beta` used about 8GB of VRAM. RAM usage spiked to 14GB while the model loaded, then settled at around 5GB. I used a T4 instance for this notebook.
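As a rough sanity check on those numbers, here is a back-of-the-envelope sketch of how much memory the weights alone take at different precisions. The 7e9 parameter count is an assumed round figure for a "7B" model, not a measured value, and the helper name `weight_memory_gb` is just for illustration. Real usage is higher than the weights alone because of activations, the KV cache, and CUDA overhead.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # assumption: round figure for a "7B" model

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

The 16-bit estimate lands in the same range as the RAM spike seen while loading, though that link is my inference and was not measured separately.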

In [None]:
!pip install llama-index transformers accelerate bitsandbytes

Collecting llama-index
  Downloading llama_index-0.8.53.post3-py3-none-any.whl (794 kB)
Collecting transformers
  Downloading transformers-4.34.1-py3-none-any.whl (7.7 MB)
Collecting accelerate
  Downloading accelerate-0.24.0-py3-none-any.whl (260 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.1-py3-none-any.whl (92.6 MB)
Collecting aiostream<0.6.0,>=0.5.2 (from llama-index)
  Downloading aiostream-0.5.2-py3-none-any.whl (39 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from ll

## Setup

### Data

In [None]:
from llama_index.readers import BeautifulSoupWebReader

url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"

documents = BeautifulSoupWebReader().load_data([url])

### LLM

This should run on a free-tier T4 instance.

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


def messages_to_prompt(messages):
  prompt = ""
  for message in messages:
    if message.role == 'system':
      prompt += f"<|system|>\n{message.content}</s>\n"
    elif message.role == 'user':
      prompt += f"<|user|>\n{message.content}</s>\n"
    elif message.role == 'assistant':
      prompt += f"<|assistant|>\n{message.content}</s>\n"

  # ensure we start with a system prompt, insert blank if needed
  if not prompt.startswith("<|system|>\n"):
    prompt = "<|system|>\n</s>\n" + prompt

  # add final assistant prompt
  prompt = prompt + "<|assistant|>\n"

  return prompt


llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)
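To see what the Zephyr chat format produced by `messages_to_prompt` looks like, here is a small standalone sketch. The `Message` dataclass is a hypothetical stand-in for the chat message objects (it only has the `.role` and `.content` attributes the function reads), and the function body is a condensed version of the one above.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message with .role and .content."""
    role: str
    content: str


def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        # Each known role becomes a "<|role|>" header followed by the content
        if message.role in ("system", "user", "assistant"):
            prompt += f"<|{message.role}|>\n{message.content}</s>\n"

    # Ensure the prompt starts with a (possibly blank) system block
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # End with an open assistant header so the model writes the reply
    return prompt + "<|assistant|>\n"


prompt = messages_to_prompt([Message("user", "Hello!")])
print(prompt)
```

With only a user message, the printed prompt starts with an empty system block, then the user turn, then the open `<|assistant|>` header.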


In [None]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")


[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


### Index Setup

In [None]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

### Helpful Imports / Logging

In [None]:
from llama_index.response.notebook_utils import display_response

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [None]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)



**`Final Response:`** In the given context, OpenAI and Meta differ in their approach to using AI tools. OpenAI presents its AI products as productivity tools, while Meta is using AI for entertainment purposes. OpenAI's latest updates for ChatGPT, such as the ability to interact via voice and upload images, make the tool more versatile and useful for various tasks. On the other hand, Meta has revealed its own uses for generative AI and voices, including 28 personality-driven chatbots for its messaging apps. These chatbots are based on popular celebrities like Charli D’Amelio, Dwyane Wade, and Paris Hilton, and are intended for entertainment purposes. While both companies are using AI, OpenAI's focus is on productivity, while Meta's focus is on entertainment.

### Refine

In [None]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

**`Final Response:`** In terms of their approach to AI tools, OpenAI and Meta differ in their focus and intended use cases. While OpenAI presents its AI products, such as ChatGPT, as productivity tools with practical applications, Meta is exploring the entertainment value of AI through the creation of personality-driven chatbots for use in its messaging apps. OpenAI's recent updates to ChatGPT, such as the addition of voice and image capabilities, are presented as ways to make the tool more useful and powerful, while Meta's AI characters are being positioned as social networking entities with Facebook pages, Instagram accounts, and the potential to create Reels. As Meta continues to place its AI characters on every major surface of its products, feeds that were once defined by human connections may become partially synthetic social networks, raising questions about personalization, engagement, and entertainment value.

### Tree Summarize

In [None]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

**`Final Response:`** OpenAI and Meta both are developing AI tools, but they differ in their approach and intended use cases. OpenAI presents its AI products as productivity tools, while Meta is focusing on entertainment and social networking applications. OpenAI's latest updates for ChatGPT, such as voice and image capabilities, are aimed at making the tool more useful and engaging, while Meta is using AI to create personality-driven chatbots for its messaging apps. Both companies are exploring the potential of AI-generated content, but OpenAI's focus is on productivity, while Meta's is on entertainment and social networking.
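For intuition about how the three response modes differ, here is a toy sketch of the general strategies. This is not LlamaIndex's actual implementation: `CountingLLM`, `pack`, `refine`, `compact`, and `tree_summarize` are all hypothetical names, the "LLM" just counts calls, and chunks are packed by character count rather than tokens.

```python
class CountingLLM:
    """Hypothetical stand-in for an LLM: counts calls, returns a dummy summary."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return f"summary#{self.calls}"


def pack(chunks, max_chars):
    """Greedily merge consecutive chunks while they fit in max_chars."""
    packed, current = [], ""
    for chunk in chunks:
        candidate = f"{current} {chunk}".strip()
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def refine(chunks, llm):
    """One LLM call per chunk, each refining the previous answer."""
    answer = llm(chunks[0])
    for chunk in chunks[1:]:
        answer = llm(answer + " " + chunk)
    return answer


def compact(chunks, llm, max_chars):
    """Pack chunks into fewer prompts first, then refine over those."""
    return refine(pack(chunks, max_chars), llm)


def tree_summarize(chunks, llm, fan_in=2):
    """Summarize groups of chunks bottom-up until one answer remains."""
    level = list(chunks)
    while True:
        level = [llm(" ".join(level[i:i + fan_in]))
                 for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


chunks = ["aaaa", "bbbb", "cccc", "dddd"]
for name, run in [
    ("refine", lambda llm: refine(chunks, llm)),
    ("compact", lambda llm: compact(chunks, llm, max_chars=9)),
    ("tree_summarize", lambda llm: tree_summarize(chunks, llm)),
]:
    llm = CountingLLM()
    run(llm)
    print(f"{name}: {llm.calls} LLM calls")
```

The call counts matter with a slow local model: fewer, fuller prompts usually mean less waiting.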

## Router Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

### Single Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=False
)

response = query_engine.query("What was mentioned about Meta?")

display_response(response)

**`Final Response:`** Meta, a tech company, is building LLMs (large language models) and has revealed the creation of 28 personality-driven chatbots based on popular celebrities. These chatbots will be used in their messaging apps, and Meta plans to place its AI characters on every major surface of its products, including Facebook pages and Instagram accounts. The article suggests that this technology is new enough that celebrities are not yet entrusting their entire personas to Meta, but rather giving people a taste of what it's like to talk to AI versions of themselves before delivering the real thing. The potential for these chatbots seems to have more passing novelty value, but the article raises questions about how many hours people would spend talking to a digital version of Taylor Swift this year, and how much they would pay for the privilege. When the digital versions of celebrities are introduced, the potential seems very real.

### Multi Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

response = query_engine.query("What was mentioned about Meta? Summarize with any other companies mentioned in the entire document.")

display_response(response)

**`Final Response:`** In the given context, it is mentioned that Meta (formerly Facebook) is building large language models (LLMs) and has created 28 personality-driven chatbots for its messaging apps, featuring celebrities such as Charli D’Amelio, Dwyane Wade, and Snoop Dogg. Meta plans to place its AI characters on all major surfaces of its products, including Facebook and Instagram. Other companies mentioned in the article include OpenAI, which is updating its AI language model ChatGPT with voice and image capabilities, Google, which provides the Google assistant, and YouTube, which may create AI Drake fakes. The article raises questions about the potential benefits and drawbacks of these developments, including the possibility of synthetic social networks and the potential for AI to replace human connections.

## SubQuestion Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")

display_response(response)

Generated 5 sub questions.
[vector_search] Q: What information is provided about Meta in the given document?
[vector_search] A: The given document provides information about Meta's efforts in developing artificial intelligence and voices. It mentions that Meta is building LLMs (large language models) and has revealed 28 personality-driven chatbots to be used in their messaging apps. Celebrities such as Charli D’Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, Tom Brady, and Paris Hilton have lent their voices to these chatbots. The document also mentions that Meta plans to place these AI characters on every major surface of its products, including Facebook pages and Instagram accounts, and users will message them in the same inbox that they message their friends and family. The potential of these AI characters is discussed, and the document raises questions about whether they will feel personalized, engaging, and entertaining or 

**`Final Response:`** In the given document, Meta is discussed in relation to its development of AI characters and chatbots for entertainment purposes. The document mentions that Meta is building large language models (LLMs) and has revealed 28 personality-driven chatbots, voiced by celebrities like Snoop Dogg and Charli D’Amelio, to be used in their messaging apps. These AI characters will be integrated into Meta's social media platforms, including Facebook pages and Instagram accounts, and users will be able to message them in the same inbox as their friends and family. The document raises questions about the potential impact of these AI characters on user engagement and experience.

In contrast, the discussion about OpenAI in the same document focuses on its latest updates to ChatGPT, which include the addition of voice and image capabilities. OpenAI is presenting these updates as productivity tools, and the article suggests that the company's LLMs have potential entertainment value as well. The article also touches on the emotional implications of these updates, as the synthetic social network may lead to the rise of AI companions that are smarter, more patient, and more available than human companions. Overall, the discussion about Meta is more focused on entertainment and celebrity

## SQL Query Engine

Here, we download and use a sample SQLite database with 11 tables containing information about music, playlists, and customers. We limit the query engine to a few of those tables for this test.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!curl https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip -o /content/chinook.zip
!unzip /content/chinook.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0  1120k      0 --:--:-- --:--:-- --:--:-- 1121k
Archive:  /content/chinook.zip
  inflating: chinook.db              


In [None]:
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, select, column

engine = create_engine("sqlite:////content/chinook.db")

In [None]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(engine)

In [None]:
from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    service_context=service_context
)

In [None]:
response = query_engine.query("What are some albums? Limit to 5.")

display_response(response)

**`Final Response:`** Based on the SQL query results, some albums with their corresponding track names, durations, and prices are:

1. "Battlestar Galactica: The Story So Far" with a total duration of 2622250 milliseconds (approximately 43 minutes and 42 seconds) and a price of $1.99.
2. "Occupation / Precipice" with a total duration of 5286953 milliseconds (approximately 88 minutes and 23 seconds) and a price of $1.99.
3. "Exodus, Pt. 1" with a total duration of 2621708 milliseconds (approximately 43 minutes and 48 seconds) and a price of $1.99.
4. "Exodus, Pt. 2" with a total duration of 2618000 milliseconds (approximately 43 minutes and 48 seconds) and a price of $1.99.
5. "Collaborators" with a total duration of 26266

In [None]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

**`Final Response:`** Based on the SQL query results, some popular artists across different genres include those in the Rock genre with 1297 tracks, followed by Latin with 579 tracks, Metal with 374 tracks, Alternative & Punk with 332 tracks, and Jazz with 130 tracks. These genres and their respective track counts provide insight into the current trends and preferences in the music industry, and offer a diverse range of artists to explore.

This last query should require a more complex join.

In [None]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

**`Final Response:`** Some popular tracks by the legendary rock band AC/DC that you might want to check out are "For Those About To Rock (We Salute You)", "Put The Finger On You", and "Let's Get It Up". These songs are part of their discography and have been crowd favorites at their live shows.

In [None]:
print(response.metadata['sql_query'])

SELECT tracks.Name FROM tracks JOIN albums ON tracks.AlbumId = albums.AlbumId JOIN artists ON albums.ArtistId = artists.ArtistId WHERE artists.Name = 'AC/DC' LIMIT 3;
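To check that a generated query like this is valid SQL and does what it claims, you can run it by hand. The sketch below uses an in-memory SQLite database with a cut-down version of the three tables. The schema only includes the columns the query touches, and the rows (including `Album A`, `Other Artist`, and `Unrelated Track`) are illustrative placeholders, not the real Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal, assumed subset of the Chinook schema: only the columns the query uses
conn.executescript("""
CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);

INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Other Artist');
INSERT INTO albums  VALUES (1, 'Album A', 1), (2, 'Album B', 2);
INSERT INTO tracks  VALUES
    (1, 'For Those About To Rock (We Salute You)', 1),
    (2, 'Put The Finger On You', 1),
    (3, 'Let''s Get It Up', 1),
    (4, 'Unrelated Track', 2);
""")

# The query generated by the LLM, copied from the output above
sql = (
    "SELECT tracks.Name FROM tracks "
    "JOIN albums ON tracks.AlbumId = albums.AlbumId "
    "JOIN artists ON albums.ArtistId = artists.ArtistId "
    "WHERE artists.Name = 'AC/DC' LIMIT 3;"
)

rows = [name for (name,) in conn.execute(sql)]
print(rows)
```

Note that the query has no `ORDER BY`, so which three tracks come back is up to the database.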


## Programs

Depending on the LLM, you will have to test with either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`.

In [None]:
from typing import List
from pydantic import BaseModel

from llama_index.program import OpenAIPydanticProgram, LLMTextCompletionProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [None]:
from llama_index.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

In [None]:
output = program(movie_name="The Shining")

ValidationError: ignored

In [None]:
print(output)
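The traceback is elided above, so the exact cause of the `ValidationError` is unknown. One common cause with smaller local models (an assumption here) is that the text completion is not clean JSON matching the `Album` schema. Here is a minimal stdlib-only sketch of that kind of check. `parse_album` is a hypothetical helper standing in for what a Pydantic parser does, and the sample strings are made up.

```python
import json


def parse_album(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' and check the Album shape."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM output")

    # json.JSONDecodeError is a subclass of ValueError
    data = json.loads(raw[start:end + 1])

    for key, expected in (("name", str), ("artist", str), ("songs", list)):
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")

    for song in data["songs"]:
        if (
            not isinstance(song, dict)
            or not isinstance(song.get("title"), str)
            or not isinstance(song.get("length_seconds"), int)
        ):
            raise ValueError(f"malformed song entry: {song!r}")
    return data


# Made-up examples of a usable and an unusable completion
good = 'Sure! {"name": "Overlook", "artist": "X", "songs": [{"title": "Room 237", "length_seconds": 200}]}'
bad = "Here is an album inspired by the movie: Overlook, by X."

print(parse_album(good)["name"])
try:
    parse_album(bad)
except ValueError as err:
    print(f"validation failed: {err}")
```

Wrapping the `program(...)` call in a similar `try`/`except` would also keep the following `print(output)` cell from failing on an undefined name.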

## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [None]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

Some tool inputs are hallucinated, which causes issues with the responses. A better system prompt or better tool descriptions would likely help.

In [None]:
response = agent.chat("Hello!")
print(response)

Thought: (Implicit) I can answer without any more tools!
Response: Hello! According to the information provided, both OpenAI and Meta are working on developing large language models (LLMs) and adding voices to them. However, their focus and intended use of these models differ. OpenAI is presenting its LLMs as productivity tools, while Meta is using them for entertainment purposes, specifically in its messaging apps. Meta has revealed 28 personality-driven chatbots, featuring celebrities like Charli D’Amelio, Dwyane Wade, and Paris Hilton, which are intended to be used for voice chats. While both companies are exploring the potential of LLMs, OpenAI is focusing on productivity, while Meta is focusing on entertainment.
Hello! According to the information provided, both OpenAI and Meta are working on developing large language models (LLMs) and adding voices to them. However, their focus and intended use of these models differ. OpenAI is presenting its LLMs as productivi

In [None]:
response = agent.chat("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")
print(response)

Thought: I need to use a tool to help me answer this question.
Action: vector_search
Action Input: {'text': 'What was mentioned about Meta? How Does it differ from how OpenAI is talked about?'}
Observation: In the given context, it is mentioned that both OpenAI and Meta are developing large language models (LLMs) and adding voices to them. However, the focus and purpose of their efforts differ. OpenAI is presenting its products as productivity tools, while Meta is using LLMs for entertainment purposes, specifically in its messaging apps. Meta has also revealed 28 personality-driven chatbots, featuring celebrities like Charli D’Amelio, Dwyane Wade, and Paris Hilton, which are intended to be used for voice chats. The article suggests that this technology is still in its early stages, and it remains to be seen how popular these chatbots will be. Overall, while both companies are exploring the potential of LLMs, OpenAI is focusing on productivity, while Meta is 