<a href="https://colab.research.google.com/github/EdBerg21/AI-Professional-Prompts/blob/main/BAHAIzephyr_7b_beta_feature_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `HuggingFaceH4/zephyr-7b-beta` uses about 8GB of VRAM. RAM usage spiked to 14GB while the model was loading, then settled around 5GB. I used a T4 instance for this notebook.
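
As a rough sanity check on those figures, the weights-only footprint can be estimated from the parameter count and the bits stored per weight. The sketch below assumes about 7.24 billion parameters (an approximate figure that this notebook does not state) and ignores quantization metadata, activations, the KV cache, and CUDA overhead, so observed usage will be higher than these estimates.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# ASSUMPTION: ~7.24e9 parameters; the exact count is not given in this notebook.
N_PARAMS = 7.24e9


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Return the weights-only footprint in GiB (no activations, KV cache, or overhead)."""
    return n_params * bits_per_weight / 8 / 2**30


for label, bits in [("fp16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label:>5}: {weight_gib(N_PARAMS, bits):5.1f} GiB")
```

The gap between the 4-bit estimate and the roughly 8GB of VRAM reported above is the part this calculation deliberately leaves out.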

In [1]:
!pip install llama-index transformers accelerate bitsandbytes

Collecting llama-index
  Downloading llama_index-0.9.21-py3-none-any.whl (15.7 MB)
Collecting accelerate
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.3.post2-py3-none-any.whl (92.6 MB)
Collecting beautifulsoup4<5.0.0,>=4.12.2 (from llama-index)
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting dataclasses-json (from llama-index)
  Downloading dataclasses_json-0.6.3-py3-none-any.whl (28 kB)
Collecting deprec

## Setup

### Data

In [2]:
from llama_index.readers import BeautifulSoupWebReader

# url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"
url = "https://news.bahai.org/story/1708/"

documents = BeautifulSoupWebReader().load_data([url])

### LLM

This should run on a free-tier T4 instance.

In [3]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


def messages_to_prompt(messages):
  prompt = ""
  for message in messages:
    if message.role == 'system':
      prompt += f"<|system|>\n{message.content}</s>\n"
    elif message.role == 'user':
      prompt += f"<|user|>\n{message.content}</s>\n"
    elif message.role == 'assistant':
      prompt += f"<|assistant|>\n{message.content}</s>\n"

  # ensure we start with a system prompt, insert blank if needed
  if not prompt.startswith("<|system|>\n"):
    prompt = "<|system|>\n</s>\n" + prompt

  # add final assistant prompt
  prompt = prompt + "<|assistant|>\n"

  return prompt


llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)

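
To see what the chat formatting produces, the following standalone sketch re-implements `messages_to_prompt` against a hypothetical `Message` dataclass with plain string roles. The dataclass is an assumption made so the example runs without llama-index installed; it is not the library's own message type.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a plain-string role plus content."""
    role: str
    content: str


def messages_to_prompt(messages):
    """Render messages in the Zephyr chat format used in the cell above."""
    prompt = ""
    for message in messages:
        if message.role in ("system", "user", "assistant"):
            prompt += f"<|{message.role}|>\n{message.content}</s>\n"

    # Ensure the prompt starts with a system block, inserting a blank one if needed.
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # End with an open assistant turn for the model to complete.
    return prompt + "<|assistant|>\n"


rendered = messages_to_prompt([Message("user", "What happens at Westminster?")])
print(rendered)
```

For a single user message, the rendered text has the same shape as the `query_wrapper_prompt` template: a blank system block, the user turn, and an open assistant turn.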

In [4]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")


### Index Setup

In [5]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [6]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

### Helpful Imports / Logging

In [7]:
from llama_index.response.notebook_utils import display_response

In [8]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [9]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("What happens at Westminster?")

display_response(response)



**`Final Response:`** The context information provided describes a special reception at Portcullis House in Westminster that commemorated the centennial anniversary of the National Spiritual Assembly of the Bahá’ís of the United Kingdom. The gathering explored efforts to foster social harmony, with a focus on the important role of cohesive relationships among individuals, communities, and institutions in cultivating a more unified society. The event was hosted by the All-Party Parliamentary Group on the Bahá’í Faith and brought together government officials, civil society leaders, journalists, and representatives of diverse faith communities. The reception featured artistic performances, as well as presentations about the efforts of the Bahá’ís of the United Kingdom to contribute to social progress. Overall, the event highlighted the principle of oneness as a foundational force in shaping these relationships and the critical role of interactions between youth and Bahá’í educational agencies in unlocking the immense potential within young people to contribute to the betterment of society.

### Refine

In [10]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("What was the role of Shirin Taherzadeh at Westminster?")

display_response(response)

**`Final Response:`** At the special reception held at Portcullis House in Westminster, Shirin Taherzadeh, a member of the National Spiritual Assembly, emphasized the importance of the principle of oneness in shaping cohesive relationships among individuals, communities, and institutions. Drawing on Bahá’u’lláh’s teachings, she explained that this unity is not uniformity but a celebration of diversity, essential to the fabric of a peaceful society. Her role at the event was to highlight the foundational force of the principle of oneness in promoting a shared identity that sees all people as members of one human family, as the Bahá’í community in the United Kingdom has been doing for over a century. This collaborative spirit fosters a profound sense of belonging that fuels the youth's desire to contribute to the needs of their neighborhood, nurturing personal growth and fostering a stronger, more cohesive community. These ideas were explored in a video produced for the occasion by the Bahá’í Office of Public Affairs. The reception also featured artistic performances and presentations about the efforts of the Bahá’ís of the United Kingdom to contribute to social progress.

### Tree Summarize

In [11]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("How Bahá’ís celebrate the centennial anniversary of the National Spiritual Assembly of the Bahá’ís of the United Kingdom?")

display_response(response)

**`Final Response:`** The Bahá’ís in the United Kingdom celebrated the centennial anniversary of the National Spiritual Assembly of the Bahá’ís of the United Kingdom with a special reception at Portcullis House in Westminster. The gathering, hosted by the All-Party Parliamentary Group on the Bahá’í Faith, explored efforts to foster social harmony and highlighted the principle of oneness as a foundational force in shaping cohesive relationships among individuals, communities, and institutions. The event featured artistic performances, presentations about the contributions of the Bahá’ís of the United Kingdom to social progress, and heartfelt contributions from youth engaged in Bahá’í moral educational programs. The reception was enriched by a video produced by the Bahá’í Office of Public Affairs, and attendees gained insight into the collaborative spirit fostered by these initiatives, which nurtures personal growth and strengthens communities. Patrick O’Mara, Secretary of the National Assembly of the United Kingdom, reflected on the community's dedication to promoting a shared identity that sees all people as members of one human family and learning to contribute insights from experiences relevant to the challenges facing society through collaborative endeavors with fellow citizens.
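
The three response modes differ mainly in how retrieved chunks are fed to the LLM. The sketch below is a conceptual illustration only, not llama-index's implementation: `fake_llm` is a hypothetical stub that records prompts, and chunk size is measured in characters rather than tokens. It shows why `refine` makes one call per chunk, `compact` makes fewer calls by packing chunks first, and `tree_summarize` summarizes bottom-up.

```python
from typing import Callable, List

calls: List[str] = []  # records every prompt sent to the stub LLM


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it only records the prompt."""
    calls.append(prompt)
    return f"answer#{len(calls)}"


def refine(chunks: List[str], llm: Callable[[str], str]) -> str:
    """One call per chunk: each call refines the previous answer with a new chunk."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"existing answer: {answer}\nnew context: {chunk}")
    return answer


def pack(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily concatenate chunks so each packed chunk stays within max_chars."""
    packed, current = [], ""
    for chunk in chunks:
        if current and len(current) + len(chunk) + 1 > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = f"{current}\n{chunk}" if current else chunk
    if current:
        packed.append(current)
    return packed


def compact(chunks: List[str], llm: Callable[[str], str], max_chars: int) -> str:
    """Pack chunks first, then refine over the (fewer) packed chunks."""
    return refine(pack(chunks, max_chars), llm)


def tree_summarize(chunks: List[str], llm: Callable[[str], str], max_chars: int) -> str:
    """Summarize packed chunks, then summarize the summaries until one text remains."""
    texts = pack(chunks, max_chars)
    while len(texts) > 1:
        summaries = [llm(f"summarize: {t}") for t in texts]
        packed = pack(summaries, max_chars)
        # Guard for this toy: if packing cannot shrink the list, merge everything.
        texts = packed if len(packed) < len(summaries) else ["\n".join(summaries)]
    return llm(f"answer from: {texts[0]}")


def count_calls(fn, *args) -> int:
    """Run fn and report how many stub LLM calls it made."""
    calls.clear()
    fn(*args)
    return len(calls)


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 40]
print("refine:", count_calls(refine, chunks, fake_llm))
print("compact:", count_calls(compact, chunks, fake_llm, 100))
print("tree_summarize:", count_calls(tree_summarize, chunks, fake_llm, 100))
```

With a slow local model, the number of calls is what dominates latency, which is one reason `compact` is a sensible default.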

## Router Query Engine

The queries from this section onward ask about Meta and OpenAI, topics covered by the commented-out Verge URL in the Data cell rather than by the Bahá’í news story loaded above. The responses shown appear to come from an earlier run against that article.

In [18]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

### Single Selector

In [19]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=False
)

response = query_engine.query("What was mentioned about Meta?")

display_response(response)

**`Final Response:`** Meta (formerly known as Facebook) is building large language models (LLMs) and has revealed 28 personality-driven chatbots based on popular celebrities to be used in their messaging apps. These chatbots, voiced by celebrities such as Charli D’Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, Tom Brady, and Paris Hilton, will be placed on every major surface of its products, including Facebook pages and Instagram accounts, and users will be able to message them in the same inbox they use to message their friends and family. This information suggests that Meta is exploring the use of AI characters in their social networking platforms, in addition to their traditional focus on entertainment.

### Multi Selector

In [20]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

response = query_engine.query("What was mentioned about Meta? Summarize with any other companies mentioned in the entire document.")

display_response(response)

**`Final Response:`** In the given context, Meta (formerly known as Facebook) is revealed to be building LLMs (large language models) and creating AI characters for use in their messaging apps. These characters, voiced by celebrities such as Charli D'Amelio, Snoop Dogg, and Tom Brady, are intended to provide personality-driven chatbots for users to interact with. This development is presented as a step towards a partially synthetic social network, where feeds may become defined by AI characters rather than human connections. The article also mentions OpenAI, a company known for its AI language model ChatGPT, which has added voice functionality and the ability to interact with images. The article suggests that these developments could lead to a new era in the consumer internet, with AI becoming a more prominent feature in social networking and potentially replacing some human interactions with synthetic companions.
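
The difference between the single and multi selector can be illustrated with a toy router. In the sketch below, a naive keyword-overlap score stands in for the LLM that actually reads the tool descriptions and picks among them; the `Tool` dataclass, `score`, and `select` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tool:
    """Hypothetical stand-in for a query engine tool: a name plus a description."""
    name: str
    description: str


def score(query: str, tool: Tool) -> int:
    """Count how many lowercase words the query shares with the tool description."""
    q = set(query.lower().replace("?", " ").replace(".", " ").split())
    d = set(tool.description.lower().replace(".", " ").split())
    return len(q & d)


def select(query: str, tools: List[Tool], multi: bool) -> List[str]:
    """Return the best tool name, or every tool with a non-zero score when multi=True."""
    ranked = sorted(tools, key=lambda t: score(query, t), reverse=True)
    if multi:
        chosen = [t.name for t in ranked if score(query, t) > 0]
        return chosen or [ranked[0].name]
    return [ranked[0].name]


tools = [
    Tool("vector_search", "Useful for searching for specific facts."),
    Tool("summary", "Useful for summarizing an entire document."),
]

single = select("Which specific facts mention Meta?", tools, multi=False)
multiple = select(
    "List specific facts about Meta, then summarize the entire document.",
    tools,
    multi=True,
)
print(single, multiple)
```

Because selection depends entirely on the descriptions, vague or overlapping tool descriptions make routing unreliable, whether the selector is a keyword match or an LLM.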

## SubQuestion Query Engine

In [21]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [24]:
import nest_asyncio
nest_asyncio.apply()

In [25]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")

display_response(response)

ValueError: ignored

## SQL Query Engine

Here, we download and use a sample SQLite database with 11 tables containing information about music, playlists, and customers. We limit this test to a few of those tables.

In [26]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [47]:
# Use -o to write to an explicit path; -O takes no argument and saves under the remote file name.
!curl https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip -o /content/chinook.zip
!unzip /content/chinook.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0  2310k      0 --:--:-- --:--:-- --:--:-- 2313k
Archive:  /content/chinook.zip
  inflating: chinook.db


In [54]:
from sqlalchemy import create_engine

# Four slashes: "sqlite:///" followed by the absolute path "/content/chinook.db".
# With only three slashes, SQLAlchemy resolves "content/chinook.db" relative to
# the working directory, and opening the database fails.
engine = create_engine("sqlite:////content/chinook.db")

In [52]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(engine)

In [30]:
from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    service_context=service_context
)

In [31]:
response = query_engine.query("What are some albums? Limit to 5.")

display_response(response)

**`Final Response:`** Based on the SQL query provided, it seems that the input question is asking for the names of some albums, with a limit of 5. However, the SQL query provided is not actually selecting the names of the albums, but rather checking if a specific song (Rolling in the Deep) appears on a specific album (21 by Adele). The SQL response indicates that the song does not appear on any albums in the database, as the result is an empty tuple. Therefore, there are no albums to provide as a response to the input question.

In order to actually select the names of some albums, a different SQL query would be needed. Here's an example:

SQL: SELECT DISTINCT albums.Title FROM albums LIMIT 5

This query will select the titles of up to 5 distinct albums from the albums table. The DISTINCT keyword ensures that only unique album titles are selected, and the LIMIT keyword limits the number of results returned.

Alternatively, you could modify the original input question to ask for the names of some songs instead, and then use a similar SQL query to select the song titles. Here's an example:

SQL: SELECT DISTINCT

In [32]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

**`Final Response:`** Unfortunately, based on the given SQL query and response, it seems like there are no artists with more than one track on the album "Purple Rain" with the exception of the title track, which is listed as "Untitled" in the database. Therefore, I'm afraid I'm unable to provide you with a list of five artists in response to your input question. I apologize for any inconvenience this may cause.

This last query requires a more complex join.

In [33]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

**`Final Response:`** Unfortunately, based on the provided SQL query, it seems that the input question was mistakenly copied and pasted instead of the correct query for the artist AC/DC. The SQL query provided returns the count of distinct albums for the artist Ed Sheeran, which is currently 0. As a result, there are no tracks to synthesize from this query. Please provide a correct SQL query for the artist AC/DC, and I will be happy to synthesize a response for you.

In [34]:
print(response.metadata['sql_query'])

SELECT COUNT(DISTINCT albums.AlbumId) FROM albums WHERE albums.ArtistId = (SELECT ArtistId FROM artists WHERE Name = 'Ed Sheeran')
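
For comparison, here is a sketch of the kind of join the AC/DC question calls for, run against a tiny in-memory SQLite database. The `albums.AlbumId`, `albums.ArtistId`, and `artists.Name` columns appear in the generated SQL above; `tracks.Name`, `tracks.AlbumId`, and `tracks.TrackId` are assumptions about the schema, and the rows are illustrative placeholders rather than data from the Chinook database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    -- Illustrative rows only; not copied from the Chinook database.
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Other Artist');
    INSERT INTO albums  VALUES (10, 'Album A', 1), (20, 'Album B', 2);
    INSERT INTO tracks  VALUES (100, 'Track 1', 10), (101, 'Track 2', 10),
                               (102, 'Track 3', 10), (103, 'Track 4', 10),
                               (200, 'Unrelated', 20);
    """
)

# tracks -> albums -> artists, filtered by artist name and capped at three rows.
JOIN_SQL = """
SELECT tracks.Name
FROM tracks
JOIN albums  ON tracks.AlbumId = albums.AlbumId
JOIN artists ON albums.ArtistId = artists.ArtistId
WHERE artists.Name = ?
ORDER BY tracks.TrackId
LIMIT 3
"""

rows = [name for (name,) in conn.execute(JOIN_SQL, ("AC/DC",))]
print(rows)
```

Comparing a hand-written query like this with `response.metadata['sql_query']` makes it easy to spot when the model has answered a different question, as it did here.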


## Programs

Depending on the LLM, you will have to test with either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`.

In [35]:
from typing import List
from pydantic import BaseModel

from llama_index.program import OpenAIPydanticProgram, LLMTextCompletionProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [36]:
from llama_index.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

In [37]:
output = program(movie_name="The Shining")

ValidationError: ignored

In [38]:
print(output)

NameError: ignored
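
A structured-output program only succeeds when the model's text can be parsed into the target schema, which is a common failure point for smaller models. The sketch below shows that parsing step in isolation. It uses stdlib dataclasses as a stand-in for the Pydantic models so it runs without extra dependencies, and both `parse_album` and the sample model output are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Song:
    title: str
    length_seconds: int


@dataclass
class Album:
    name: str
    artist: str
    songs: List[Song]


def parse_album(llm_text: str) -> Album:
    """Extract the outermost {...} span from llm_text and build an Album from it."""
    start, end = llm_text.find("{"), llm_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(llm_text[start : end + 1])
    songs = [Song(str(s["title"]), int(s["length_seconds"])) for s in data["songs"]]
    return Album(name=str(data["name"]), artist=str(data["artist"]), songs=songs)


# Hypothetical model output: chatty text wrapped around a JSON payload.
sample = (
    "Sure! Here is the album:\n"
    '{"name": "Overlook", "artist": "The Caretakers", '
    '"songs": [{"title": "Room 237", "length_seconds": 237}]}\n'
    "Hope that helps."
)
album = parse_album(sample)
print(album)
```

Printing the raw completion before parsing is usually the quickest way to see whether a failure comes from missing fields, wrong types, or text that is not JSON at all.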

## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [39]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

The model sometimes hallucinates tool inputs, which causes issues with responses. A better system prompt or more specific tool descriptions would likely help.
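
One way to contain hallucinated tool calls is to validate each step before executing it. The sketch below parses a ReAct-style `Action` / `Action Input` pair and rejects unknown tool names or unparseable inputs. It is a defensive illustration under stated assumptions, not llama-index's own output parser; `STEP_RE` and `parse_action` are hypothetical names.

```python
import ast
import json
import re
from typing import Optional, Set, Tuple

# Matches "Action: <tool>" followed on the next line by "Action Input: {...}".
STEP_RE = re.compile(
    r"Action:\s*(?P<tool>[^\n]+)\s*\nAction Input:\s*(?P<args>\{.*\})", re.DOTALL
)


def parse_action(text: str, known_tools: Set[str]) -> Optional[Tuple[str, dict]]:
    """Return (tool_name, kwargs) if text contains a valid action, else None."""
    match = STEP_RE.search(text)
    if match is None:
        return None
    tool = match.group("tool").strip()
    if tool not in known_tools:
        return None  # the model named a tool that does not exist
    raw = match.group("args")
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        try:
            args = ast.literal_eval(raw)  # tolerate Python-style single quotes
        except (ValueError, SyntaxError):
            return None
    return (tool, args) if isinstance(args, dict) else None


step = (
    "Thought: I need to use a tool to help me answer this question.\n"
    "Action: vector_search\n"
    "Action Input: {'input': 'meta openai'}\n"
)
parsed = parse_action(step, {"vector_search", "summary"})
print(parsed)
```

Returning `None` instead of raising lets the calling loop decide whether to retry the step or fall back to a direct answer.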

In [40]:
response = agent.chat("Hello!")
print(response)

ValueError: ignored

In [41]:
response = agent.chat("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")
print(response)

Thought: I need to use a tool to help me answer this question.
Action: vector_search
Action Input: {'input': 'meta openai'}
Observation: In the given context, Meta and OpenAI are both companies exploring the use of generative AI and voices. Meta has revealed the launch of 28 personality-driven chatbots featuring celebrities such as Charli D'Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, Tom Brady, and Paris Hilton. OpenAI, on the other hand, has added voice functionality to its popular AI tool, ChatGPT, which allows users to interact with the large language model via voice. Both companies are exploring the potential of AI in social networking and entertainment, with OpenAI presenting its products as productivity tools and Meta as an entertainment business.
Thought: I can answer without using any more tools.
Response: In summary, both Meta and OpenAI are involved in the development of generative AI and voices, but Meta's focus is