<a href="https://colab.research.google.com/github/ArthurVanSchendel/RAG_wimbledon/blob/main/Copy_of_zephyr_7b_alpha_feature_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `HuggingFaceH4/zephyr-7b-alpha` used about 8 GB of VRAM. RAM usage spiked to 14 GB while the model loaded, then settled around 5 GB. I used a T4 instance for this notebook.
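
As a rough cross-check of these numbers, the weight footprint can be estimated from the parameter count and the bits stored per parameter. The sketch below assumes roughly 7.24 billion parameters (an approximate figure for a Mistral-7B-sized model, not stated in this notebook) and counts weights only. Quantization constants, activations, the KV cache, and CUDA overhead come on top, which is consistent with the observed VRAM use being well above the 4-bit estimate.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~7.24e9 parameters (approximate count for a Mistral-7B-sized model).
NUM_PARAMS = 7.24e9

for label, bits in [("float16", 16), ("int8", 8), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_memory_gb(NUM_PARAMS, bits):5.2f} GB")
```

The float16 estimate (about 14.5 GB) is in the same range as the RAM spike seen while the unquantized checkpoint is being loaded.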

In [None]:
!pip install llama-index transformers accelerate bitsandbytes

Collecting llama-index
  Downloading llama_index-0.8.67-py3-none-any.whl (859 kB)
Collecting transformers
  Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
Collecting accelerate
  Downloading accelerate-0.24.1-py3-none-any.whl (261 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.2.post2-py3-none-any.whl (92.6 MB)
Collecting aiostream<0.6.0,>=0.5.2 (from llama-index)
  Downloading aiostream-0.5.2-py3-none-any.whl (39 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from lla

In [None]:
import os

import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output

from llama_index import (
    KnowledgeGraphIndex,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import SimpleGraphStore


from llama_index.llms import HuggingFaceLLM
from IPython.display import Markdown, display


# define LLM
# NOTE: `llm` is the quantized Zephyr model created in the HuggingFaceLLM cell
# further down; run that cell first, otherwise `llm` is undefined here.
#llm = HuggingFaceLLM(model_name="lmsys/fastchat-t5-3b-v1.0")
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
)

from llama_index.readers import BeautifulSoupWebReader

#url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"
#url = "https://en.wikipedia.org/wiki/Tennis"

#url = "https://www.itjungle.com/2023/10/16/take-a-progressive-approach-to-devops/"

#documents = BeautifulSoupWebReader().load_data([url])

#
url1 = "https://en.wikipedia.org/wiki/Australian_Open"
url2 = "https://en.wikipedia.org/wiki/French_Open"
url3 = "https://en.wikipedia.org/wiki/Wimbledon_Championships"
url4 = "https://en.wikipedia.org/wiki/US_Open_(tennis)"
url5 = "https://en.wikipedia.org/wiki/Novak_Djokovic"
url6 = "https://en.wikipedia.org/wiki/Rafael_Nadal"
url7 = "https://en.wikipedia.org/wiki/Roger_Federer"
url8 = "https://en.wikipedia.org/wiki/Pete_Sampras"
url9 = "https://en.wikipedia.org/wiki/Stan_Wawrinka"

url11 = "https://en.wikipedia.org/wiki/Bitcoin"
url13 = "https://en.wikipedia.org/wiki/Blockchain"
url14 = "https://en.wikipedia.org/wiki/Distributed_ledger"
url15 = "https://en.wikipedia.org/wiki/Consensus_(computer_science)"
url16 = "https://en.wikipedia.org/wiki/Proof_of_work"

# url12 pointed at the same Bitcoin page as url11, so it is dropped to avoid indexing the page twice.
documents = BeautifulSoupWebReader().load_data([url11, url13, url14, url15, url16])



In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


def messages_to_prompt(messages):
  prompt = ""
  for message in messages:
    if message.role == 'system':
      prompt += f"<|system|>\n{message.content}</s>\n"
    elif message.role == 'user':
      prompt += f"<|user|>\n{message.content}</s>\n"
    elif message.role == 'assistant':
      prompt += f"<|assistant|>\n{message.content}</s>\n"

  # ensure we start with a system prompt, insert blank if needed
  if not prompt.startswith("<|system|>\n"):
    prompt = "<|system|>\n</s>\n" + prompt

  # add final assistant prompt
  prompt = prompt + "<|assistant|>\n"

  return prompt


llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
    context_window=3900,
    max_new_tokens=1024,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)
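
To see the exact text this template sends to the model, the formatter can be exercised on its own, without loading any weights. The sketch below re-implements the same formatting logic with a small stand-in `Message` dataclass (hypothetical; LlamaIndex passes its own chat message objects).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Stand-in for a chat message; only `role` and `content` are used."""

    role: str
    content: str


def zephyr_prompt(messages: List[Message]) -> str:
    """Format messages with Zephyr's <|role|> ... </s> chat markers."""
    prompt = ""
    for message in messages:
        if message.role in ("system", "user", "assistant"):
            prompt += f"<|{message.role}|>\n{message.content}</s>\n"
    # Start with a (possibly empty) system turn, as in the cell above.
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt
    # Leave an open assistant turn for the model to complete.
    return prompt + "<|assistant|>\n"


print(zephyr_prompt([Message("user", "Who won Wimbledon in 2010?")]))
```

For a single user message this yields the same string as the `query_wrapper_prompt` template passed to `HuggingFaceLLM`.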


In [None]:
#space_name = "llamaindex"
#edge_types, rel_prop_names = ["relationship"], [
#    "relationship"
#]  # default; can be omitted when creating from an empty KG
#tags = ["entity"]  # default; can be omitted when creating from an empty KG

#graph_store = NebulaGraphStore(
#    space_name=space_name,
#    edge_types=edge_types,
#    rel_prop_names=rel_prop_names,
#    tags=tags,
#)

graph_store = SimpleGraphStore()


storage_context = StorageContext.from_defaults(graph_store=graph_store)



In [None]:
# NOTE: can take a while!
#kg_index = KnowledgeGraphIndex.from_documents(
#    documents,
#    max_triplets_per_chunk=7,
#    storage_context=storage_context,
#    service_context=service_context,
#)

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=7,
    service_context=service_context,
    include_embeddings=True,
)



1. (Australian Open, is, Wikipedia page)
2. (Australian Open, has, 75 languages)
3. (Australian Open, has, history)
4. (Australian Open, has, Open era)
5. (Australian Open, has, Melbourne Park expansion)
6. (Australian Open, has, current courts)
7. (Australian Open, has, ranking points)
8. (Australian Open, has, prize money and trophies)
9. (Australian Open, has, champions)
10. (Australian Open, has, former champions)
11. (Australian Open, has, current champions)
12. (Australian Open, has, most recent finals)
13. (Australian Open, has, records)
14. (Australian Open, has, media coverage and attendance)
15. (Australian Open, has, see also)
16. (Australian Open, has, notes)
17. (Australian Open, has, references)
18. (Australian Open, has, external links)
1. Australian Open, is, annual tennis tournament held in Melbourne
2. Australian Open, has, redirects to, article about the tennis tournament
3. Australian Open, has, redirects to, disambiguation page
4. Australian Open, has, redirects to
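
The extraction output above mixes well-formed `(subject, predicate, object)` lines with lines that lack parentheses or a third element. As a rough illustration of why only some of them are usable as graph edges, here is a small standalone parser. The regular expression and the adjacency-dict layout are illustrative assumptions, not LlamaIndex's internal implementation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]

# Matches "(subject, predicate, object)"; the object may itself contain commas.
TRIPLET_RE = re.compile(r"\(([^,()]+),\s*([^,()]+),\s*([^()]+)\)")


def parse_triplets(text: str) -> List[Triplet]:
    """Return the well-formed triplets found in `text`, at most one per line."""
    triplets = []
    for line in text.splitlines():
        match = TRIPLET_RE.search(line)
        if match:
            subj, pred, obj = (part.strip() for part in match.groups())
            triplets.append((subj, pred, obj))
    return triplets


def to_graph(triplets: List[Triplet]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (predicate, object) pairs under their subject."""
    graph = defaultdict(list)
    for subj, pred, obj in triplets:
        graph[subj].append((pred, obj))
    return dict(graph)


sample = """\
1. (Australian Open, is, Wikipedia page)
2. (Australian Open, has, 75 languages)
1. Australian Open, is, annual tennis tournament held in Melbourne
4. Australian Open, has, redirects to
"""
print(to_graph(parse_triplets(sample)))
```

Only the two parenthesized lines survive; the unparenthesized ones are skipped rather than guessed at.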

In [None]:
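# display_response is needed here; it is otherwise only imported later in the notebook.
from llama_index.response.notebook_utils import display_response

query_engine = kg_index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "What is the longest tennis match ever played at a Grand Slam? Where was this match played, and when?",
)
display_response(response)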

**`Final Response:`** The longest tennis match ever played in Grand Chelem is the 2010 Wimbledon men's singles match between John Isner and Nicolas Mahut. The match lasted 11 hours and 5 minutes over the course of three days (June 22–24) and ended in a 6–4, 3–6, 6–7(7–9), 7–6(10–8), 38–36 victory for Isner.

In [None]:
from llama_index import KnowledgeGraphIndex

# space_name, edge_types, rel_prop_names and tags are NebulaGraphStore settings
# (commented out above); they are undefined here and not needed with SimpleGraphStore.
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    service_context=service_context,
    include_embeddings=True,
)

In [None]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

In [None]:
from llama_index.query_engine import KnowledgeGraphQueryEngine

from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore

# query_engine = KnowledgeGraphQueryEngine(
#     storage_context=storage_context,
#     service_context=service_context,
#     llm=llm,
#     verbose=True,
# )

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

# kg_tool = QueryEngineTool(
#     kg_index.as_query_engine(
#     include_text=True,
#     response_mode="tree_summarize",
#     embedding_mode="hybrid",
#     similarity_top_k=5,
#     ),
#     metadata=ToolMetadata(
#         name="knowledge_graph",
#         description="Useful for searching precise and factual information."
#     )
# )

In [None]:
from llama_index.query_engine import RouterQueryEngine

sd_query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

#response = query_engine.query("Who is considered to be the best tennis player of all time, and why? And what are his strengths and weaknesses?")
response = sd_query_engine.query("Which football club is Rafael Nadal supporting?")

display_response(response)

**`Final Response:`** The given context information does not provide any information about Rafael Nadal's support for a football club. The information provided is about the French Open and Wimbledon, and the prize money and rankings points for these tournaments.

In [None]:
!pip install llama-index transformers accelerate bitsandbytes

Collecting llama-index
  Downloading llama_index-0.8.54-py3-none-any.whl (795 kB)
Collecting transformers
  Downloading transformers-4.34.1-py3-none-any.whl (7.7 MB)
Collecting accelerate
  Downloading accelerate-0.24.0-py3-none-any.whl (260 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.1-py3-none-any.whl (92.6 MB)
Collecting aiostream<0.6.0,>=0.5.2 (from llama-index)
  Downloading aiostream-0.5.2-py3-none-any.whl (39 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from llama-in

In [None]:
!pip freeze > requirements.txt

## Setup

### Data

In [None]:
from llama_index.readers import BeautifulSoupWebReader

#url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"
#url = "https://en.wikipedia.org/wiki/Tennis"

#url = "https://www.itjungle.com/2023/10/16/take-a-progressive-approach-to-devops/"

#documents = BeautifulSoupWebReader().load_data([url])

#
url1 = "https://en.wikipedia.org/wiki/Australian_Open"
url2 = "https://en.wikipedia.org/wiki/French_Open"
url3 = "https://en.wikipedia.org/wiki/Wimbledon_Championships"
url4 = "https://en.wikipedia.org/wiki/US_Open_(tennis)"
url5 = "https://en.wikipedia.org/wiki/Novak_Djokovic"
url6 = "https://en.wikipedia.org/wiki/Rafael_Nadal"
url7 = "https://en.wikipedia.org/wiki/Roger_Federer"
url8 = "https://en.wikipedia.org/wiki/Pete_Sampras"
url9 = "https://en.wikipedia.org/wiki/Stan_Wawrinka"

documents = BeautifulSoupWebReader().load_data([url1, url2, url3, url4]) #, url5, url6, url7, url8, url9

### LLM

This should run on a T4 instance on the Colab free tier.

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


def messages_to_prompt(messages):
  prompt = ""
  for message in messages:
    if message.role == 'system':
      prompt += f"<|system|>\n{message.content}</s>\n"
    elif message.role == 'user':
      prompt += f"<|user|>\n{message.content}</s>\n"
    elif message.role == 'assistant':
      prompt += f"<|assistant|>\n{message.content}</s>\n"

  # ensure we start with a system prompt, insert blank if needed
  if not prompt.startswith("<|system|>\n"):
    prompt = "<|system|>\n</s>\n" + prompt

  # add final assistant prompt
  prompt = prompt + "<|assistant|>\n"

  return prompt


llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
    context_window=3900,
    max_new_tokens=1024,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)


KeyboardInterrupt: ignored

In [None]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")



### Index Setup

In [None]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

### Helpful Imports / Logging

In [None]:
from llama_index.response.notebook_utils import display_response

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [None]:
query_engine = vector_index.as_query_engine(response_mode="compact")

#response = query_engine.query("How do OpenAI and Meta differ on AI tools?")
#response = query_engine.query("Who is considered to be the best tennis player of all time, and why?")
#query = "Can you write a blog article based on the following press article in a thousand words showing that a step-by-step approach is better for DevOps. Remove the references to Amdahl. Write in a personal blog style. Make it light-hearted and humorous."
query = "How many times did Djokovic lose against Nadal?"
response = query_engine.query(query)
display_response(response)

**`Final Response:`** Djokovic has lost 29 times against Nadal in their 59 matches.

### Refine

In [None]:
query_engine = vector_index.as_query_engine(response_mode="refine")

#query = "Can you write a blog article based on the following press article in a thousand words showing that a step-by-step approach is better for DevOps. Remove the references to Amdahl. Write in a personal blog style. Make it light-hearted and humorous."

response = query_engine.query(query)

display_response(response)

**`Final Response:`** As of the time of writing this answer, Novak Djokovic has won against Rafael Nadal 30 times and lost against Nadal 29 times in their head-to-head matches. However, in terms of their Grand Slam record, Nadal leads 11-7, while Djokovic leads 29-30 overall. Their matches on clay are particularly notable, with Nadal winning 20 out of 28 matches, while Djokovic has a better record on hard courts, winning 20 out of 27 matches. They have played a record 18 Grand Slam matches and a joint-record nine Grand Slam tournament finals (tied with Nadal-Federer). Nadal leads on clay 20-8, while Djokovic leads on hard courts 20-7, and they are tied on grass 2-2. Their rivalry is considered one of the greatest in tennis history, with Djokovic having defeated Nadal seven consecutive times, doing so twice, and two times consecutively on clay. However, Nadal has won their last three meetings in 2012, including the French Open final, and defeated Djokovic in the 2013 US Open Final. In their most recent match, Nadal beat Djokovic in the 2018 Rome semifinals.

### Tree Summarize

In [None]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query(query)
print("response: ", response)
display_response(response)

response:  Djokovic has won 29 times against Nadal and lost 30 times.


**`Final Response:`** Djokovic has won 29 times against Nadal and lost 30 times.
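
The three response modes differ mainly in how the retrieved chunks are fed to the LLM. As a rough mental model (a simplified sketch, not LlamaIndex's implementation): `refine` walks the chunks one at a time and updates a running answer, `compact` does the same after first packing as many chunks as fit into each prompt, and `tree_summarize` summarizes groups of chunks and then summarizes the summaries. The `CountingLLM` class and the `fan_out` parameter below are hypothetical stand-ins used only to count calls.

```python
from typing import Callable, List


class CountingLLM:
    """Fake LLM that records how many times it was called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer-{self.calls}"


def refine(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Sequentially update one running answer, one LLM call per chunk."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Existing answer: {answer}\nNew context: {chunk}")
    return answer


def tree_summarize(chunks: List[str], llm: Callable[[str], str], fan_out: int = 2) -> str:
    """Summarize groups of `fan_out` texts, then repeat on the summaries."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        level = [
            llm("\n".join(level[i : i + fan_out]))
            for i in range(0, len(level), fan_out)
        ]
        if len(level) == 1:
            return level[0]


chunks = ["chunk A", "chunk B", "chunk C", "chunk D"]
refine_llm, tree_llm = CountingLLM(), CountingLLM()
refine(chunks, refine_llm)
tree_summarize(chunks, tree_llm)
print(refine_llm.calls, tree_llm.calls)  # prints: 4 3
```

In this toy setup each mode sees the evidence in a different order and grouping, which is one plausible reason the three answers above disagree on the head-to-head numbers.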

## Router Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

### Single Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [summary_tool],
    service_context=service_context,
    select_multi=False
)
query = "Can you write a humorous blog article of approximately one thousand words, using the content of the given press article? Please do not mention Amdahl's Law."
response = query_engine.query(query)

display_response(response)

**`Final Response:`** I'm incapable of writing a humorous blog article, but i can provide you with a parody of the given press article.

title: "take a progressive approach to devops... or else!"

devops is the latest buzzword in the tech industry, and it's all the rage these days. but what exactly is devops, and why should you care? well, let me tell you, my dear reader, that devops is the future of software development, and if you don't embrace it, you're going to be left behind.

devops is a combination of two words: development and operations. it's a methodology that aims to bridge the gap between these two departments, which have traditionally been at odds with each other. in the past, developers would write code and throw it over the wall to the operations team, who would then deploy it to production. this often led to conflicts, as the operations team would complain that the code was not ready for production, and the developers would argue that the operations team was being too picky.

devops aims to solve this problem by having developers and operations work together from the very beginning. this means that developers will have to learn how to write code that is easy to deploy and operate, and the operations team will have to learn how to work with developers to ensure that the code is ready for production.

the benefits of devops are numerous. for one, it can significantly reduce the time it takes to deploy new features and fix bugs. this is because devops allows for continuous integration and continuous delivery, which means that code can be deployed to production as soon as it's ready, without having to go through a lengthy approval process.

another benefit of devops is that it can significantly reduce the cost of software development. by having developers and operations work together, there is less waste and duplication of effort. this can lead to significant cost savings, as well as a more efficient and streamlined development process.

however, there are some challenges to implementing devops. one of the biggest challenges is getting everyone on board. devops requires a significant cultural shift, as it requires developers and operations to work together in a way that they may not be used to. this can be difficult, as both departments have their own unique ways of working, and it can be challenging to get everyone to buy into the new methodology.

another challenge is that devops requires a significant investment in time and resources. this can be a significant barrier for smaller companies, who may not have the resources to invest in devops. however, the benefits of devops are significant, and it's worth the investment for companies that are serious about staying competitive in the tech industry.

so, how can you take a progressive approach to devops? here are some tips:

1. start small: don't try to implement devops across your entire organization all at once. instead, start with a small pilot project, and gradually expand to other areas of your organization.

2. involve everyone: devops requires a significant cultural shift, and it's important that everyone is involved in the process. this means involving developers, operations, and other stakeholders in the planning and implementation process.

3. focus on communication: communication is key to the success of devops. make sure that everyone is on the same page, and that there is open and honest communication between developers and operations.

4. automate as much as possible: automation is a key component of devops. by automating as much of the development and deployment process as possible, you can significantly reduce the time and resources required to deploy new features and fix bugs.

5. measure your success: it's important to measure your success with devops. this will help you identify areas where you can improve, and will also help you demonstrate the benefits of devops to stakeholders.

in conclusion, devops is the future of software development, and it's important that you take a progressive approach to implementing it. by starting small, involving everyone, focusing on communication, automating as much as possible, and measuring your success, you can significantly reduce the time and resources required to deploy new features and fix bugs, and can significantly improve the efficiency and streamlinedness of your development process. so, what are you waiting for? embrace devops today!

note: this article is a parody and does not represent the views or opinions of the author.

### Multi Selector

In [None]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

#response = query_engine.query("Who is considered to be the best tennis player of all time, and why? And what are his strengths and weaknesses?")
response = query_engine.query("Who is considered to be the best tennis player of all time, and why?")

display_response(response)

**`Final Response:`** According to the given context information, there are several tennis pundits and analysts who have classified Novak Djokovic as one of the greatest tennis players of all time. Rafael Nadal, who is also considered to be one of the greatest players, has praised Djokovic's peak level of performance and stated that he is the best player of all time in 2011. In 2016, Nadal reiterated this and said that Djokovic has better numbers than him and is the best in history. Pete Sampras, who was considered to be the greatest male tennis player of all time before Djokovic, also stated in 2021 that Djokovic's consistency, winning the majors, and finishing number one for seven years make him the greatest of all time. Tennis coach Nick Bollettieri has also praised Djokovic as the "most complete player ever" and "the most perfect player of all time." However, opinions and preferences are subjective, and some may have different opinions on who is the best tennis player of all time.
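
Conceptually, the router shows each tool's `description` to a selector, which picks one tool (`select_multi=False`) or several (`select_multi=True`) before the query is run. In LlamaIndex the selector is typically driven by the LLM. The sketch below swaps in a crude word-overlap score purely so the routing step can be run without a model; the function name and scoring rule are hypothetical.

```python
import re
from typing import Dict, List, Set


def _words(text: str) -> Set[str]:
    """Lowercased alphabetic words in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tools(query: str, descriptions: Dict[str, str], select_multi: bool) -> List[str]:
    """Rank tools by word overlap between the query and each description."""
    query_words = _words(query)
    scores = {
        name: len(query_words & _words(desc)) for name, desc in descriptions.items()
    }
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    if select_multi:
        # Keep every tool with some overlap; fall back to the best one.
        return [name for name in ranked if scores[name] > 0] or ranked[:1]
    return ranked[:1]


tools = {
    "vector_search": "Useful for searching for specific facts.",
    "summary": "Useful for summarizing an entire document.",
}
print(select_tools("Give me specific facts about the entire document", tools, select_multi=True))
print(select_tools("Please give a summary of the entire document", tools, select_multi=False))
```

The point of the sketch is the dependence on descriptions: vague or overlapping tool descriptions give any selector, including an LLM, little to distinguish the tools by.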

## SubQuestion Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query(query)

display_response(response)



ValueError: ignored

## SQL Query Engine

Here, we download a sample SQLite database (Chinook) with 11 tables holding data about music, playlists, and customers. For this test, we restrict the query engine to a few of those tables.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# -o writes to the given path; -O takes no argument, so the path was being parsed as a second URL.
!curl -L https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip -o /content/chinook.zip
!unzip /content/chinook.zip

Archive:  /content/chinook.zip
  inflating: chinook.db


In [None]:
from sqlalchemy import create_engine

engine = create_engine("sqlite:////content/chinook.db")

In [None]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(engine)

In [None]:
from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    service_context=service_context
)

In [None]:
response = query_engine.query("What are some albums? Limit to 5.")

display_response(response)

**`Final Response:`** Based on the query results, some albums are:
1. Koyaanisqatsi (Soundtrack from the Motion Picture) by Philip Glass Ensemble (AlbumId: 347)
2. Mozart: Chamber Music by Nash Ensemble (AlbumId: 346)
3. Monteverdi: L'Orfeo by C. Monteverdi, Nigel Rogers - Chiaroscuro; London Baroque; London Cornett & Sackbu (AlbumId: 345)
4. Schubert: The Late String Quartets & String Quintet (3 CD's) by Emerson String Quartet (AlbumId: 344)
5. Respighi:Pines of Rome by Eugene Ormandy (AlbumId: 343)

In [None]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

**`Final Response:`** Some artists, limited to 5, are: A Cor Do Som, AC/DC, Aaron Copland & London Symphony Orchestra, Aaron Goldberg, and Academy of St. Martin in the Fields & Sir Neville Marriner.

This last query requires a more complex join across the `tracks`, `albums`, and `artists` tables.

In [None]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

**`Final Response:`** Some tracks from the artist AC/DC that we'll be discussing today are "Bad Boy Boogie," "Breaking The Rules," and "C.O.D." These songs are from different albums, but they all showcase the iconic sound of AC/DC.

In [None]:
print(response.metadata['sql_query'])

SELECT tracks.Name
FROM tracks
INNER JOIN albums ON tracks.AlbumId = albums.AlbumId
INNER JOIN artists ON albums.ArtistId = artists.ArtistId
WHERE artists.Name = 'AC/DC'
GROUP BY tracks.Name
ORDER BY tracks.Name ASC
LIMIT 3;
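
Because the engine exposes the generated SQL, it can be checked independently of the LLM. The sketch below runs the same query against a tiny in-memory SQLite database. The three tables keep only the columns the query touches, and the rows are a small hand-made sample (assumptions for illustration, not the full Chinook data).

```python
import sqlite3

# Minimal stand-in schema: only the columns the generated query touches.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums VALUES (10, 'Let There Be Rock', 1), (20, 'Balls to the Wall', 2);
    INSERT INTO tracks VALUES
        (100, 'Go Down', 10),
        (101, 'Dog Eat Dog', 10),
        (102, 'Bad Boy Boogie', 10),
        (103, 'Overdose', 10),
        (104, 'Balls to the Wall', 20);
    """
)

# The query as reported in response.metadata['sql_query'] above.
generated_sql = """
SELECT tracks.Name
FROM tracks
INNER JOIN albums ON tracks.AlbumId = albums.AlbumId
INNER JOIN artists ON albums.ArtistId = artists.ArtistId
WHERE artists.Name = 'AC/DC'
GROUP BY tracks.Name
ORDER BY tracks.Name ASC
LIMIT 3;
"""

rows = [name for (name,) in conn.execute(generated_sql)]
print(rows)
```

The other artist's track is filtered out by the join, and `LIMIT 3` drops the fourth AC/DC track. The `GROUP BY` here only deduplicates track names; `SELECT DISTINCT` would express the same intent.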


## Programs

Depending on the LLM, you will have to use either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`. OpenAI models can use the former; other LLMs, such as the local Zephyr model here, use the latter.

In [None]:
from typing import List
from pydantic import BaseModel

from llama_index.program import OpenAIPydanticProgram, LLMTextCompletionProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [None]:
from llama_index.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

In [None]:
output = program(movie_name="The Shining")

In [None]:
print(output)

name='The Shining Soundtrack' artist='Wendy Carlos' songs=[Song(title='Main Title', length_seconds=2), Song(title='The Shining', length_seconds=10), Song(title='The Maze', length_seconds=12), Song(title='The Redrum', length_seconds=10), Song(title='The Maze (Reprise)', length_seconds=6), Song(title='The Shining (End Title)', length_seconds=10)]
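
The printed object is the result of parsing the LLM's raw text completion into the `Album` schema. A stdlib-only sketch of that general idea follows: locate the JSON object in the completion, load it, and build typed objects, rejecting implausible values. It uses dataclasses instead of pydantic so it runs without extra dependencies; `extract_json`, `parse_album`, and the sample completion are hypothetical and do not mirror `PydanticOutputParser`'s internals.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Song:
    title: str
    length_seconds: int


@dataclass
class Album:
    name: str
    artist: str
    songs: List[Song]


def extract_json(text: str) -> str:
    """Return the substring from the first '{' to the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in completion")
    return text[start : end + 1]


def parse_album(completion: str) -> Album:
    """Build an Album from a completion that contains one JSON object."""
    data = json.loads(extract_json(completion))
    songs = [Song(str(s["title"]), int(s["length_seconds"])) for s in data["songs"]]
    for song in songs:
        if song.length_seconds <= 0:
            raise ValueError(f"implausible length for {song.title!r}")
    return Album(str(data["name"]), str(data["artist"]), songs)


# Hypothetical completion with chatter around the JSON payload.
completion = (
    'Sure! {"name": "Demo", "artist": "Someone", '
    '"songs": [{"title": "Intro", "length_seconds": 95}]} Enjoy.'
)
album = parse_album(completion)
print(album)
```

A stricter range check would also flag values such as the 2-second and 10-second tracks in the output above, which parse correctly but are unlikely to be real song lengths.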


## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [None]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

The model hallucinates some tool inputs (for example, the `num_beams` argument in the traces below), which degrades the responses. A better system prompt or more precise tool descriptions would likely help.
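
One cheap guard against hallucinated tool arguments is to check each `Action Input` against the parameters the tool actually accepts before calling it. The sketch below parses one step in the format the agent prints and reports unexpected keys. The `ALLOWED_PARAMS` table is an assumption for illustration (the real schema depends on the tool), and this check is not part of `ReActAgent`.

```python
import ast
import re
from typing import Dict, Set, Tuple

# Assumption: each tool accepts a single free-text parameter named "input".
ALLOWED_PARAMS: Dict[str, Set[str]] = {
    "vector_search": {"input"},
    "summary": {"input"},
}

STEP_RE = re.compile(r"Action:\s*(\S+)\s*\nAction Input:\s*(\{.*\})")


def check_step(step_text: str) -> Tuple[str, Set[str]]:
    """Return (tool_name, unexpected_argument_names) for one ReAct step."""
    match = STEP_RE.search(step_text)
    if match is None:
        raise ValueError("no Action / Action Input pair found")
    tool, raw_args = match.group(1), match.group(2)
    args = ast.literal_eval(raw_args)  # the trace prints a Python dict literal
    return tool, set(args) - ALLOWED_PARAMS.get(tool, set())


step = """Thought: I need to use a tool to help me answer the question.
Action: vector_search
Action Input: {'text': 'Meta and OpenAI', 'num_beams': 5}"""
tool, unexpected = check_step(step)
print(tool, sorted(unexpected))
```

Under the assumed schema, both `text` and `num_beams` are reported as unexpected, so the call could be rejected or retried instead of being passed to the tool.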

In [None]:
response = agent.chat("Hello!")
print(response)

Thought: I am designed to help with a variety of tasks.
Action: vector_search
Action Input: {'text': 'Hello!', 'num_beams': 5}
Observation: This query is not related to the given context information. The query provided is a hypothetical example of a query that could be made to a language model, and does not have any relevance to the article discussed.
Response: The input provided is not related to the given context information.
The input provided is not related to the given context information.


In [None]:
response = agent.chat("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")
print(response)

Thought: I need to use a tool to help me answer the question.
Action: vector_search
Action Input: {'text': 'Meta and OpenAI', 'num_beams': 5}
Observation: In the given context, the query "Meta and OpenAI" refers to two companies that are building LLMs (language learning models) and using generative AI and voices. Meta is in the entertainment business and has unveiled 28 personality-driven chatbots to be used in its messaging apps, while OpenAI announced updates for ChatGPT, including the ability to interact with it via voice and upload images for questions. Both companies are exploring the potential of synthetic companions that can offer coaching, tutoring, therapy, and entertainment, and the rise of a new era in the consumer internet.
Response: In terms of how they differ, Meta's focus is on using generative AI and voices to create personality-driven chatbots for its messaging apps, while OpenAI's focus is on improving its ChatGPT model a