Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `mistralai/Mistral-7B-Instruct-v0.1` uses about 12GB of VRAM and 8.5GB of RAM. I used a T4 High-RAM instance for this notebook.

In [None]:
!pip install git+https://github.com/run-llama/llama_index

Collecting git+https://github.com/run-llama/llama_index
  Cloning https://github.com/run-llama/llama_index to /tmp/pip-req-build-hi1motme
  Running command git clone --filter=blob:none --quiet https://github.com/run-llama/llama_index /tmp/pip-req-build-hi1motme
  Resolved https://github.com/run-llama/llama_index to commit 8b427a7bffa8d359927d162d3ddeb553f93585d7
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [None]:
!pip install transformers accelerate bitsandbytes



## Setup

### Data

In [None]:
from llama_index.readers import BeautifulSoupWebReader

url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"

documents = BeautifulSoupWebReader().load_data([url])

### LLM

This should run on a free-tier T4 instance.

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    query_wrapper_prompt=PromptTemplate("<s>[INST] {query_str} [/INST] </s>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},
    device_map="auto",
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
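To make the `query_wrapper_prompt`, `context_window`, and `max_new_tokens` settings above a bit more concrete, here is a minimal sketch in plain Python. It assumes the wrapper is filled in with `str.format`-style substitution and that the generation budget is reserved out of the context window. The helper names are hypothetical and not part of llama_index.

```python
# Values copied from the HuggingFaceLLM configuration above.
QUERY_WRAPPER = "<s>[INST] {query_str} [/INST] </s>\n"
CONTEXT_WINDOW = 3900
MAX_NEW_TOKENS = 256


def wrap_query(query_str: str) -> str:
    """Hypothetical helper: place the query inside the instruction tags."""
    return QUERY_WRAPPER.format(query_str=query_str)


def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt once the output budget is set aside (assumed policy)."""
    return context_window - max_new_tokens


wrapped = wrap_query("How do OpenAI and Meta differ on AI tools?")
budget = prompt_token_budget(CONTEXT_WINDOW, MAX_NEW_TOKENS)

print(repr(wrapped))
print(budget)  # 3900 - 256 = 3644
```

Under these assumptions, retrieved context plus the question has to fit in roughly 3644 tokens.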


In [None]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")

### Index Setup

In [None]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

### Helpful Imports / Logging

In [None]:
from llama_index.response.notebook_utils import display_response

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [None]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** OpenAI and Meta differ in their approach to AI tools. OpenAI tends to present its products as productivity tools, simple utilities for getting things done. On the other hand, Meta is in the entertainment business and is building LLMs, and on Wednesday the company revealed that it has found its own uses for generative AI and voices. OpenAI's ChatGPT is a language model that can be interacted with via voice and can be used for a variety of tasks, including answering questions and providing daily affirmations. Meta, on the other hand, is building personality-driven chatbots that can be used in its messaging apps, including characters voiced by celebrities. While OpenAI's ChatGPT is focused on productivity, Meta's chatbots are focused on entertainment and social interaction.

### Refine

In [None]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** OpenAI and Meta differ in their approach to AI tools. OpenAI tends to present its products as productivity tools, simple utilities for getting things done, while Meta is in the entertainment business. However, Meta is also building LLMs and has found its own uses for generative AI and voices. On the other hand, OpenAI's latest updates for ChatGPT include a voice feature that lets you interact with its large language model via voice, and the app feels much more powerful as a mobile app. The voices are earnest, upbeat, and more dynamic than what we are used to with Alexa or the Google assistant. OpenAI's ChatGPT also has a feature that lets you upload images and ask questions about them. The result is that a tool which was already useful for lots of things suddenly became useful for much more.

In addition to this, Meta is also exploring new ways to integrate AI into its social networking products. For example, the company is planning to place its AI characters on every major surface of its products, including Facebook pages and Instagram accounts. These characters will be able to be messaged in the same inbox as friends and family, and it's possible that they will also be able to make Reels

### Tree Summarize

In [None]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** OpenAI and Meta differ in their approach to AI tools. OpenAI tends to present its products as productivity tools, simple utilities for getting things done. On the other hand, Meta is in the entertainment business and is building LLMs, and on Wednesday the company revealed that it has found its own uses for generative AI and voices. OpenAI's ChatGPT is a language model that can be interacted with via voice and can be used for a variety of tasks, including answering questions and providing daily affirmations. Meta, on the other hand, is building personality-driven chatbots that can be used in its messaging apps, including characters voiced by celebrities. While OpenAI's ChatGPT is focused on productivity, Meta's chatbots are focused on entertainment and social networking.
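
The three response modes differ mainly in how retrieved chunks are fed to the LLM. The sketch below is a simplified illustration of that idea, not llama_index's implementation: `CountingLLM` is a stand-in for the real model, chunk packing is done by character count rather than tokens, and all function names are hypothetical.

```python
from typing import Callable, List

LLM = Callable[[str], str]


class CountingLLM:
    """Stand-in for a real model: records how many times it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer after call {self.calls}"


def refine(chunks: List[str], query: str, llm: LLM) -> str:
    """One call per chunk; each call sees the previous answer plus one new chunk."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Query: {query}\nExisting answer: {answer}\nNew context: {chunk}")
    return answer


def pack(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily merge neighbouring chunks while they fit in max_chars."""
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def compact(chunks: List[str], query: str, llm: LLM, max_chars: int) -> str:
    """Pack chunks first, then refine over the (fewer) packed chunks."""
    return refine(pack(chunks, max_chars), query, llm)


def tree_summarize(chunks: List[str], query: str, llm: LLM, fan_in: int = 2) -> str:
    """Summarize groups of chunks bottom-up until a single answer remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = list(chunks)
    while True:
        level = [
            llm(f"Query: {query}\nContext: " + "\n".join(level[i : i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


chunks = ["aaaa", "bbbb", "cccc", "dddd"]  # placeholder text chunks
runs = [
    ("refine", lambda llm: refine(chunks, "q", llm)),
    ("compact", lambda llm: compact(chunks, "q", llm, max_chars=9)),
    ("tree_summarize", lambda llm: tree_summarize(chunks, "q", llm)),
]
calls = {}
for name, run in runs:
    llm = CountingLLM()
    run(llm)
    calls[name] = llm.calls

print(calls)
```

With a slow local model the number of LLM calls dominates latency. The repeated `Setting pad_token_id` lines in the outputs above appear to be emitted once per generation call, so they give a rough count for each mode.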

## Router Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)
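
The tool descriptions matter because a router asks the LLM to pick a tool based on them and then has to parse the reply. The following is a rough sketch of that flow under stated assumptions: the prompt wording, `parse_choice`, and `fake_selector` are all hypothetical stand-ins, not llama_index's selector code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def build_selector_prompt(tools: List[Tool], query: str) -> str:
    """Number the tool descriptions so the model can answer with a single index."""
    choices = "\n".join(f"({i}) {t.description}" for i, t in enumerate(tools, start=1))
    return f"Choices:\n{choices}\n\nQuestion: {query}\nAnswer with the number of the best choice."


def parse_choice(raw: str, num_tools: int) -> int:
    """Turn the model's reply into a zero-based index, or fail loudly."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if not digits or not 1 <= int(digits) <= num_tools:
        raise ValueError(f"Could not parse a tool choice from: {raw!r}")
    return int(digits) - 1


def route(tools: List[Tool], query: str, selector_llm: Callable[[str], str]) -> str:
    """Ask the selector for a choice, then forward the query to the chosen tool."""
    raw = selector_llm(build_selector_prompt(tools, query))
    return tools[parse_choice(raw, len(tools))].run(query)


tools = [
    Tool("vector_search", "Useful for searching for specific facts.", lambda q: f"[vector_search] {q}"),
    Tool("summary", "Useful for summarizing an entire document.", lambda q: f"[summary] {q}"),
]


def fake_selector(prompt: str) -> str:
    """Stand-in for the LLM: always picks the second choice."""
    return "2"


result = route(tools, "Summarize the article.", fake_selector)
print(result)
```

If the model's reply cannot be parsed into a valid choice, a router like this has nothing to dispatch to, which is why smaller models can be less reliable here.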

### Single Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=False
)

response = query_engine.query("What was mentioned about Meta?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


OutputParserException: ignored

### Multi Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

response = query_engine.query("What was mentioned about Meta? Summarize with any other companies mentioned in the entire document.")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Meta was announced to be integrating AI chatbots into its messaging apps, which will be based on the personalities and likenesses of celebrities, including MrBeast, Dwyane Wade, Kendall Jenner, and Paris Hilton. The chatbots are designed for entertainment rather than productivity. The shift towards a more synthetic social network could have implications for how people engage with each other on these platforms. Other companies mentioned in the entire document include OpenAI, Taylor Swift and other celebrities like Charli D'Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, Tom Brady, and Paris Hilton.

## SubQuestion Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Generated 4 sub questions.
[vector_search] Q: Vector search for 'Meta'

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[vector_search] A: 
Based on the provided context information, a vector search for "Meta" would reveal the following:

* Meta, the company formerly known as Facebook, is actively researching and developing artificial intelligence (AI) technologies.
* Meta has announced a new feature for its messaging apps that uses AI-generated imagery and personality-driven chatbots based on the voices of celebrities.
* Meta has also announced the release of 28 chatbot characters based on the voices of popular celebrities.
* The chatbot feature is designed to be a productivity tool that simplifies various tasks, while the chatbot characters are meant to entertain users in various ways.
* The technology is new, and the company is using this feature as a testing ground before fully incorporating AI chatbots in its social networking platforms.
* The technology has the potential to revolutionize social networking by creating a partially synthetic social network that could be more eng

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[summary] A: 
The Verge has recently reported on a development in artificial intelligence (AI) and its impact on consumer internet. OpenAI, a leading AI company, announced the latest updates to its ChatGPT, a language model that can now interact with users via voice. The app, which previously required typing, is now more mobile and allows users to chat with it while walking around. The voice feature, which is currently only available to paying users, has the potential to give ChatGPT a hint of personality and has already been used to provide emotional support.

On the other hand, Meta, another technology company, also revealed its plans to integrate generative AI and voices into its products. The company announced the development of 28 personality-driven chatbots based on celebrity voices, with a brief and often cringeworthy description. While this may have limited appeal, it represents a new chapter in social networking and raises questions about how people may i

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[vector_search] A: 
This article, titled "The synthetic social network is coming", discusses the latest updates to OpenAI's chatbot service, which now includes voice functionality. This feature allows users to interact with the chatbot using speech, giving the bot a more human-like personality. The article also talks about Meta's AI characters, which are being developed by the company. These characters are based on the personalities of celebrities and are used in Meta's messaging apps. The article suggests that the use of AI in social networking is evolving and that this technology is likely to become more prominent in the future.
[summary] Q: Summary of OpenAI mentions

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[summary] A: 
OpenAI, a leading company in the development of Artificial Intelligence (AI), recently announced new updates and features for its language model, ChatGPT. These updates include a voice feature that allows users to interact with the app via voice, as well as the ability to upload images and ask questions about them. These updates make the app much more powerful and engaging, with users experiencing the app in a more natural and human way. The voice feature gives ChatGPT a hint of personality, making it seem more empathetic, patient and available to its users. This is the first step in a new era of consumer internet, where AI-powered chatbots and companions become more prevalent and accessible to users, providing emotional support and companionship to those who may be lonely, isolated, or on the margins. Meanwhile, OpenAI's competitors such as Meta, are also developing AI chatbots and voices, with the company launching 28 personality-driven chatbots t

**`Final Response:`** Meta is mentioned as a technology company that is actively researching and developing artificial intelligence (AI) technologies. The company recently announced a new feature for its messaging apps that uses AI-generated imagery and personality-driven chatbots based on the voices of celebrities. The chatbot

## SQL Query Engine

Here, we download and use a sample SQLite database with 11 tables containing information about music, playlists, and customers. We will limit the query engine to a few of those tables for this test.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!curl -o /content/chinook.zip https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip
!unzip /content/chinook.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0  1000k      0 --:--:-- --:--:-- --:--:-- 1001k
Archive:  /content/chinook.zip
  inflating: chinook.db              


In [None]:
from sqlalchemy import create_engine

engine = create_engine("sqlite:////content/chinook.db")

In [None]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(engine)

In [None]:
from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    service_context=service_context
)

In [None]:
response = query_engine.query("What are some albums? Limit it to 5.")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Here are five popular albums that might interest you: 
1. War by Pink Floyd 
2. Lost, Season 2 by The Lumineers 
3. Sir Neville Marriner: A Celebration by Sir Neville Marriner 
4. Out Of Exile by Drake 
5. Instant Karma: The Amnesty International Campaign to Save Darfur by The Police

In [None]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** 1. Aerosmith
2. AC/DC
3. Alice In Chains
4. Anthrax
5. Black Sabbath

This last query needs a more complex join across the `artists`, `albums`, and `tracks` tables.

In [None]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


NotImplementedError: ignored

In [None]:
print(response.metadata['sql_query'])
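
For reference, here is roughly the kind of SQL the AC/DC question calls for, sketched with the standard-library `sqlite3` module against a tiny in-memory database. The table and column names are assumed to mirror the Chinook sample (`artists`, `albums`, `tracks`), and the rows are placeholder data, not the real database contents.

```python
import sqlite3

# Tiny in-memory stand-in for chinook.db; schema is an assumption, rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId  INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId  INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);

    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Aerosmith');
    INSERT INTO albums  VALUES (1, 'Album A', 1), (2, 'Album B', 2);
    INSERT INTO tracks  VALUES (1, 'Track 1', 1), (2, 'Track 2', 1),
                               (3, 'Track 3', 1), (4, 'Track 4', 1),
                               (5, 'Track 5', 2);
    """
)

# tracks -> albums -> artists: two joins are needed to filter tracks by artist name.
sql = """
SELECT tracks.Name
FROM tracks
JOIN albums  ON tracks.AlbumId  = albums.AlbumId
JOIN artists ON albums.ArtistId = artists.ArtistId
WHERE artists.Name = ?
ORDER BY tracks.TrackId
LIMIT 3
"""
rows = [name for (name,) in conn.execute(sql, ("AC/DC",))]
print(rows)
```

Comparing a hand-written query like this with the generated SQL is one way to judge how well the model handles multi-table questions.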

## Programs

Depending on the LLM, you will have to test with either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`.

In [None]:
from typing import List
from pydantic import BaseModel

from llama_index.program import OpenAIPydanticProgram, LLMTextCompletionProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [None]:
from llama_index.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

In [None]:
output = program(movie_name="The Shining")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


ValidationError: ignored

In [None]:
print(output)
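
A text-completion program depends on the model emitting text that can be parsed into the `Album` schema. The sketch below shows that parse-and-validate step with only the standard library, using dataclasses as a stand-in for the pydantic models above. The `extract_json` and `parse_album` helpers and the sample model output are hypothetical, and this is not how `PydanticOutputParser` is actually implemented.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Song:
    title: str
    length_seconds: int


@dataclass
class Album:
    name: str
    artist: str
    songs: List[Song]


def extract_json(text: str) -> str:
    """Return the outermost {...} span, ignoring chatter around it."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return text[start : end + 1]


def parse_album(text: str) -> Album:
    """Parse model output into an Album; raises if fields are missing or mistyped."""
    data = json.loads(extract_json(text))
    songs = [
        Song(title=str(s["title"]), length_seconds=int(s["length_seconds"]))
        for s in data["songs"]
    ]
    return Album(name=str(data["name"]), artist=str(data["artist"]), songs=songs)


# Hypothetical model output: valid JSON wrapped in conversational text.
raw_output = (
    "Sure! Here is the album:\n"
    '{"name": "Overlook", "artist": "The Caretakers", '
    '"songs": [{"title": "Room 237", "length_seconds": 214}]}\n'
    "Hope this helps."
)
album = parse_album(raw_output)
print(album)
```

A reply that omits a field, such as `{"name": "x"}`, fails in `parse_album` with a `KeyError`, which is loosely analogous to a schema validation failure.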

## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [None]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

It seems tool usage is pretty flaky.

In [None]:
response = agent.chat("Hello!")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


ValueError: ignored

In [None]:
response = agent.chat("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Thought: I need to use a tool to help me answer the question.
Action: summary
Action Input: {'text': 'What was mentioned about Meta? How Does it differ from how OpenAI is talked about? '}

TypeError: ignored
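
One thing visible in the logged step is that the `Action Input` uses Python-style single quotes, which strict JSON parsing rejects. The sketch below parses such a step with a JSON-first, `ast.literal_eval` fallback. It is a hypothetical parser written for illustration, not llama_index's ReAct output parser, and the truncated traceback does not show whether this is related to the `TypeError`.

```python
import ast
import json
import re
from typing import Dict, Tuple

# The step text as logged above.
step = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: summary\n"
    "Action Input: {'text': 'What was mentioned about Meta? "
    "How Does it differ from how OpenAI is talked about? '}\n"
)


def parse_react_step(text: str) -> Tuple[str, Dict[str, object]]:
    """Extract the tool name and its keyword arguments from one ReAct step."""
    match = re.search(r"Action:\s*(\S+)\s*\nAction Input:\s*(\{.*\})", text, re.DOTALL)
    if match is None:
        raise ValueError("no Action / Action Input pair found")
    tool_name, raw_args = match.group(1), match.group(2)
    try:
        args = json.loads(raw_args)  # strict JSON requires double quotes
    except json.JSONDecodeError:
        args = ast.literal_eval(raw_args)  # accepts a Python-literal dict
    if not isinstance(args, dict):
        raise ValueError("Action Input is not a dictionary")
    return tool_name, args


tool_name, args = parse_react_step(step)
print(tool_name, args)
```

Even when parsing succeeds, the argument name the model chose (`text` here) still has to match what the tool accepts, so a tolerant parser only covers part of the problem.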