
Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `mistralai/Mistral-7B-Instruct-v0.1` uses about 6GB of VRAM and 8.5GB of RAM. I used a T4 instance for this notebook.
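As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the parameter count and the bit width. The sketch below assumes roughly 7.24 billion parameters for Mistral 7B (an approximate figure that is not stated in this notebook) and ignores quantization metadata, activations, the KV cache, and CUDA overhead, which plausibly make up the rest of the observed usage.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: ~7.24e9 parameters for Mistral 7B (approximate figure).
MISTRAL_7B_PARAMS = 7.24e9

for bits in (16, 8, 4):
    gib = weight_memory_gib(MISTRAL_7B_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gib:.1f} GiB")
```

Under that assumption the 4-bit weights come to roughly 3.4 GiB, so a sizeable part of the ~6GB reported above would be runtime overhead rather than weights.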

In [None]:
!pip install llama-index transformers accelerate bitsandbytes



## Setup

### Data

In [None]:
from llama_index.readers import BeautifulSoupWebReader

url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"

documents = BeautifulSoupWebReader().load_data([url])

### LLM

This configuration should run on a free-tier Colab T4 instance.

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    query_wrapper_prompt=PromptTemplate("<s>[INST] {query_str} [/INST] </s>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"do_sample": True, "top_k": 5},
    device_map="auto",
)

ModuleNotFoundError: ignored
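To make two of the settings above more concrete, here is a minimal sketch in plain Python (no model required). It shows what the `query_wrapper_prompt` template produces for a query, and how many tokens remain for the prompt once `max_new_tokens` is reserved out of `context_window`. The helper names `wrap_query` and `prompt_token_budget` are hypothetical, and the budget is a simplification of whatever bookkeeping the library actually does.

```python
# Values copied from the HuggingFaceLLM configuration above.
QUERY_WRAPPER = "<s>[INST] {query_str} [/INST] </s>\n"
CONTEXT_WINDOW = 3900
MAX_NEW_TOKENS = 256


def wrap_query(query_str: str) -> str:
    """Fill the instruction template with the user's query (hypothetical helper)."""
    return QUERY_WRAPPER.format(query_str=query_str)


def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt once room for the reply is reserved."""
    return context_window - max_new_tokens


print(wrap_query("How do OpenAI and Meta differ on AI tools?"))
print(prompt_token_budget(CONTEXT_WINDOW, MAX_NEW_TOKENS))
```

With the values used here, 3644 tokens are left for the instructions, retrieved context, and question combined.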

In [None]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")

### Index Setup

In [None]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

### Helpful Imports / Logging

In [None]:
from llama_index.response.notebook_utils import display_response

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [None]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** OpenAI and Meta differ in a few key ways when it comes to AI tools. OpenAI tends to present its products as productivity tools, simple utilities for getting things done. On the other hand, Meta is in the entertainment business and is building LLMs for its messaging apps. OpenAI's ChatGPT, the large language model, has a voice feature that is currently only available to ChatGPT Plus subscribers, but free users will have access to it in the future. This feature is a big step for AI, as it gives the model a hint of personality. In contrast, Meta has unveiled 28 personality-driven chatbots to be used in its messaging apps, which are inspired by real-life celebrities. While these chatbots may have some novelty value, it remains to be seen how much time users will spend interacting with them. Additionally, Meta is planning to place its AI characters on every major surface of its products, including Facebook pages, Instagram accounts, and Reels. This could fundamentally change the way we think about social networking.

### Refine

In [None]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** OpenAI and Meta differ on AI tools in terms of how they approach their AI development and deployment. OpenAI tends to present its products as productivity tools that are designed to help users get things done efficiently. They focus on the practical use of AI, such as language modeling and image recognition, to improve productivity in various tasks. ChatGPT, one of their products, for example, allows users to interact with a large language model via voice, making it useful for things like language translation and answering questions.

Meta, on the other hand, is in the entertainment business and approaches AI development with a focus on entertainment and user experience. They have found various uses for generative AI, including creating celebrity chatbots for their messaging apps. These chatbots are designed to entertain users and enhance their messaging experience. They have created AI chatbots for celebrities such as MrBeast, Taylor Swift, Charli D’Amelio, Snoop Dogg, and many others. The AI chatbots are designed to be used in social media, and they have their own pages and accounts, and users can message them in the same inbox that they message their friends and family.

Overall, both OpenAI

### Tree Summarize

In [None]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** OpenAI and Meta differ on the approach they take with AI tools. OpenAI tends to present their products as productivity tools, simple utilities for getting things done. On the other hand, Meta, which is in the entertainment business, is building LLMs and has found its own uses for generative AI and voices.

OpenAI's ChatGPT, for example, is designed to be a companion that is warm, empathetic, and available. Users can interact with its large language model via voice or upload images and ask questions. The app is still in the early stages, but it has the potential to be a synthetic companion that provides emotional support and companionship.

Meta, on the other hand, has introduced its own AI-powered chatbots. These chatbots are designed to entertain users and are based on the personalities of celebrities such as Charli D'Amelio, Dwyane Wade, and Snoop Dogg. While the chatbots are an intermediate step, they represent the potential for a digital version of popular celebrities to provide entertainment and emotional support to users.
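The three response modes above differ mainly in how the retrieved chunks are fed to the LLM, which also determines how many generation calls each query triggers. The following is a simplified sketch of the general idea in plain Python, not LlamaIndex's implementation: `fake_llm` is a hypothetical stand-in for the model, the prompts are made up, and a character budget stands in for the token budget.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # hypothetical stand-in for a real model call


def refine(llm: LLM, question: str, chunks: List[str]) -> str:
    """One LLM call per chunk; each call revises the previous answer."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Question: {question}\nExisting answer: {answer}\nContext: {chunk}")
    return answer


def compact(llm: LLM, question: str, chunks: List[str], max_chars: int) -> str:
    """Pack chunks into as few prompts as fit the budget, then refine over those."""
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        if current and len(current) + len(chunk) + 1 > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = f"{current}\n{chunk}" if current else chunk
    if current:
        packed.append(current)
    return refine(llm, question, packed)


def tree_summarize(llm: LLM, question: str, chunks: List[str], fanout: int = 2) -> str:
    """Answer over groups of chunks, then over groups of answers, until one remains."""
    level = list(chunks)
    while True:
        level = [
            llm(f"Question: {question}\nContext: " + "\n".join(level[i : i + fanout]))
            for i in range(0, len(level), fanout)
        ]
        if len(level) <= 1:
            return level[0] if level else ""


def make_counting_llm() -> Tuple[LLM, List[str]]:
    """Return a fake LLM plus a list that records every prompt it receives."""
    prompts: List[str] = []

    def fake_llm(prompt: str) -> str:
        prompts.append(prompt)
        return f"answer after call {len(prompts)}"

    return fake_llm, prompts


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]  # toy "retrieved" chunks
for mode in ("refine", "compact", "tree_summarize"):
    llm, prompts = make_counting_llm()
    if mode == "refine":
        refine(llm, "q", chunks)
    elif mode == "compact":
        compact(llm, "q", chunks, max_chars=100)
    else:
        tree_summarize(llm, "q", chunks)
    print(mode, "LLM calls:", len(prompts))
```

In this toy setup, packing reduces the number of calls compared with plain refine, which is the usual motivation for making compact the default.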

## Router Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

### Single Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=False
)

response = query_engine.query("What was mentioned about Meta?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Meta, the parent company of Facebook, announced 28 personality-driven chatbots on Wednesday that will be used in its messaging apps. These chatbots, which are based on generative AI and voices, have been designed to entertain users and provide personalized interactions. They are based on characters voiced by popular celebrities such as Charli D'Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, Tom Brady, and Paris Hilton. Meta plans to place these chatbots prominently in its products, and it is not clear yet how successful they will be.

### Multi Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

response = query_engine.query("What was mentioned about Meta? Summarize with any other companies mentioned in the entire document.")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


TypeError: ignored

## SubQuestion Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


TypeError: ignored

## SQL Query Engine

Here, we download a sample SQLite database with 11 tables containing information about music, playlists, and customers. For this test, we restrict the query engine to a few of those tables.

In [None]:
import locale

# Colab workaround: force UTF-8 so the `!` shell commands below do not fail with a locale error.
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!curl -o /content/chinook.zip https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip
!unzip /content/chinook.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0  1215k      0 --:--:-- --:--:-- --:--:-- 1218k
Archive:  /content/chinook.zip
  inflating: chinook.db              


In [None]:
from sqlalchemy import create_engine

engine = create_engine("sqlite:////content/chinook.db")

In [None]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(engine)

In [None]:
from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    service_context=service_context
)

In [None]:
response = query_engine.query("What are some albums?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** 1. "Black Sabbath" by Black Sabbath
2. "Led Zeppelin I" by Led Zeppelin
3. "Megadeth" by Megadeth
4. "Motley Crue" by Motley Crue
5. "Nirvana" by Nirvana
6. "Pantera" by Pantera
7. "Slayer" by Slayer
8. "Tool" by Tool
9. "The Who" by The Who
10. "Viktor Frankl" by Viktor Frankl

In [None]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


NotImplementedError: ignored

This last query requires a more complex join across the `tracks`, `albums`, and `artists` tables.

In [None]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Here are three tracks from the band AC/DC: "For Those About to Rock (We Salute You)", "Put The Finger On You", and "Let's Get It Up".

In [None]:
print(response.metadata['sql_query'])

SELECT t.Name
FROM tracks t
JOIN albums a ON t.AlbumId = a.AlbumId
JOIN artists ar ON a.ArtistId = ar.ArtistId
WHERE ar.Name = 'AC/DC'
LIMIT 3
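To see what the generated join does without downloading the database, the same SQL can be run against a tiny in-memory SQLite database using only the standard library. The schema below is a cut-down assumption that keeps only the columns the query touches, and the second artist, album, and track are hypothetical rows added to show that the `WHERE` clause filters them out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);

    -- Toy rows; 'Other Artist' and its album and track are hypothetical.
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Other Artist');
    INSERT INTO albums  VALUES (1, 'For Those About To Rock We Salute You', 1),
                               (2, 'Other Album', 2);
    INSERT INTO tracks  VALUES (1, 'For Those About To Rock (We Salute You)', 1),
                               (2, 'Other Track', 2),
                               (3, 'Put The Finger On You', 1),
                               (4, 'Let''s Get It Up', 1);
    """
)

# The SQL produced by the query engine in the cell above.
GENERATED_SQL = """
SELECT t.Name
FROM tracks t
JOIN albums a ON t.AlbumId = a.AlbumId
JOIN artists ar ON a.ArtistId = ar.ArtistId
WHERE ar.Name = 'AC/DC'
LIMIT 3
"""

track_names = [name for (name,) in conn.execute(GENERATED_SQL)]
print(track_names)
```

Because the query has no `ORDER BY`, which three tracks are returned from the full database is not guaranteed by SQL itself.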


## Programs

Depending on the LLM, you will have to test with either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`.

In [None]:
from typing import List
from pydantic import BaseModel

from llama_index.program import OpenAIPydanticProgram, LLMTextCompletionProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [None]:
from llama_index.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

In [None]:
output = program(movie_name="The Shining")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [None]:
print(output)

name='The Shining' artist='Jack Nicholson' songs=[Song(title='The Shining', length_seconds=300), Song(title='All Work and No Play', length_seconds=240), Song(title='Red Rum', length_seconds=450), Song(title='The Overlook Hotel', length_seconds=270)]
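Conceptually, a text-completion program like the one above asks the model for JSON and then validates that JSON against the schema. The sketch below illustrates that step with the standard library only, swapping Pydantic for plain dataclasses so it runs without extra dependencies. `parse_album` and the raw completion string are hypothetical, and this is not how `PydanticOutputParser` is implemented.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Song:
    title: str
    length_seconds: int


@dataclass
class Album:
    name: str
    artist: str
    songs: List[Song]


def parse_album(llm_text: str) -> Album:
    """Take the span from the first '{' to the last '}' and build an Album from it."""
    start, end = llm_text.find("{"), llm_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(llm_text[start : end + 1])
    songs = [
        Song(title=str(s["title"]), length_seconds=int(s["length_seconds"]))
        for s in data["songs"]
    ]
    return Album(name=str(data["name"]), artist=str(data["artist"]), songs=songs)


# Hypothetical raw completion; real model output will vary.
raw = (
    "Sure! Here is the album:\n"
    '{"name": "The Shining", "artist": "Jack Nicholson", '
    '"songs": [{"title": "Red Rum", "length_seconds": 450}]}'
)
album = parse_album(raw)
print(album)
```

Smaller local models often wrap the JSON in extra chatter, which is why the sketch searches for the braces instead of parsing the whole string.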


## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [None]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

Tool usage seems pretty flaky with this model.

In [None]:
response = agent.chat("Hello!")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


ValueError: ignored

In [None]:
response = agent.chat("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Thought: I need to use a tool to help me answer the question.
Action: summary
Action Input: {'text': 'What was mentioned about Meta? How Does it differ from how OpenAI is talked about? '}

TypeError: ignored
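The agent step printed above follows a Thought / Action / Action Input layout, and the agent has to turn that text back into a tool name and arguments. Below is a minimal sketch of such a parsing step using only the standard library. It is not LlamaIndex's actual parser and is not offered as a diagnosis of the `TypeError`; `parse_react_step` is a hypothetical helper. It does show one thing worth noticing: the model emitted a single-quoted dict, which is a valid Python literal but not valid JSON.

```python
import ast
import json
import re
from typing import Any, Dict, Tuple

STEP_PATTERN = re.compile(
    r"Action:\s*(?P<action>[^\n]+)\s*\nAction Input:\s*(?P<input>\{.*\})",
    re.DOTALL,
)


def parse_react_step(text: str) -> Tuple[str, Dict[str, Any]]:
    """Extract the tool name and its arguments from one ReAct-style step."""
    match = STEP_PATTERN.search(text)
    if match is None:
        raise ValueError("no Action / Action Input pair found")
    action = match.group("action").strip()
    raw_input = match.group("input")
    try:
        args = json.loads(raw_input)
    except json.JSONDecodeError:
        # Single-quoted dicts are valid Python literals but not valid JSON.
        args = ast.literal_eval(raw_input)
    if not isinstance(args, dict):
        raise ValueError("Action Input is not a dict")
    return action, args


# The step text as printed by the agent above.
step = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: summary\n"
    "Action Input: {'text': 'What was mentioned about Meta? "
    "How Does it differ from how OpenAI is talked about? '}\n"
)
action, args = parse_react_step(step)
print(action, args)
```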