In [1]:
! python --version

Python 3.12.2


## Subaru Forester vs Subaru Outback

In this demo, we explore answering complex queries by decomposing them into simpler sub-queries. 

We will compare and contrast these two popular Subaru models.

## Install the required packages
- `%%capture` is used to suppress the output of the installation commands.

In [2]:
%%capture
%pip install llama-index-readers-file pymupdf
%pip install llama-index-vector-stores-postgres
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-bedrock
%pip install llama-index-llms-lmstudio
%pip install llama-index-embeddings-bedrock
%pip install psycopg2-binary
%pip install ipywidgets
%pip install SQLAlchemy
%pip install python-dotenv

In [3]:
import nest_asyncio

nest_asyncio.apply()

## Import the required libraries
- `load_dotenv` loads environment variables from the `.env` file. These hold the AWS credentials needed to reach a more capable generator model in Bedrock.
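Because the remote setup reads several AWS variables straight from `os.environ`, a missing entry in `.env` only surfaces later as a `KeyError`. A minimal sketch of an early check follows; the variable names are the ones this notebook reads, while `missing_env_vars` is a hypothetical helper and not part of `python-dotenv`.

```python
import os

# Variables that the remote (Bedrock) setup in this notebook reads from the environment.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_DEFAULT_REGION",
)


def missing_env_vars(required=REQUIRED_AWS_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}")
```

Calling it right after `load_dotenv` gives a readable message before any Bedrock client is constructed.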


In [4]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from dotenv import load_dotenv
load_dotenv(verbose=True, dotenv_path=".env")

True

## Setup the Retriever and Generator models
- Pass `mode="local"` to the `setup_models` function to use the local LMStudio models.
- Pass `mode="remote"` to the `setup_models` function to use AWS Bedrock.

In [5]:
from llama_index.core import Settings
from llama_index.llms.bedrock import Bedrock
from llama_index.llms.lmstudio import LMStudio
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os

def setup_models(mode="local"):
    if mode == "local":
        # Setup Retriever model
        embedding_model = "BAAI/bge-large-en-v1.5" # "BAAI/bge-base-en-v1.5"
        print(f"Setting up local Retriever model (embedding: {embedding_model})...")
        Settings.embed_model = HuggingFaceEmbedding(model_name=embedding_model)

        Settings.chunk_size = 512
        Settings.chunk_overlap = 20
        
        # Setup Generator model
        llm_model = "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF"
        print(f"Setting up local Generator model (main LLM: {llm_model})...")
        Settings.llm = LMStudio(
            model_name=llm_model,
            base_url="http://localhost:1234/v1",
            temperature=0,
            request_timeout=120,
        )
    elif mode == "remote":
        # Setup Retriever model
        embedding_model = "cohere.embed-multilingual-v3"
        print(f"Setting up remote Retriever model (embedding: {embedding_model})...")
        Settings.embed_model = BedrockEmbedding(
            model_name=embedding_model,
            region_name=os.environ["AWS_DEFAULT_REGION"],
        )
        Settings.chunk_size = 1024
        Settings.chunk_overlap = 20
                
        # Setup Generator model
        llm_model = "anthropic.claude-3-sonnet-20240229-v1:0"
        print(f"Setting up remote Generator model (main LLM: {llm_model})...")
        Settings.llm = Bedrock(
            model=llm_model,
            aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
            aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
            aws_session_token=os.environ["AWS_SESSION_TOKEN"],
            region_name=os.environ["AWS_DEFAULT_REGION"],
            request_timeout=120,
        )

    else:
        raise ValueError(f"Unknown mode: {mode}")
    
setup_models(mode="remote") # <=== Change this to "local" or "remote"

text_embedding = Settings.embed_model.get_text_embedding("Once upon a time, there was a cat.")
print(text_embedding[:5])
print(f"Embedding length: {len(text_embedding)}")
vector_size = len(text_embedding)


Setting up remote Retriever model (embedding: cohere.embed-multilingual-v3)...
Setting up remote Generator model (main LLM: anthropic.claude-3-sonnet-20240229-v1:0)...
[-0.043518066, -0.010955811, -0.00032567978, 0.0057792664, -0.016540527]
Embedding length: 1024


## Set up the PgVector extension in PostgreSQL
- The code below drops and recreates the database on every run to guarantee a clean start. Do not do this in production.
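Outside a demo, a less destructive option is to create the database only when it is missing. The sketch below assumes a psycopg2-style connection with `autocommit` enabled (PostgreSQL does not allow `CREATE DATABASE` inside a transaction). `ensure_database` is a hypothetical helper, not part of psycopg2 or LlamaIndex.

```python
import re


def ensure_database(conn, db_name):
    """Create db_name only if it does not exist yet. Returns True if it was created.

    conn is assumed to be a psycopg2-style connection with autocommit enabled.
    """
    # Identifiers cannot be passed as query parameters, so restrict the name
    # to a conservative pattern before interpolating it into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError(f"Unsafe database name: {db_name!r}")

    with conn.cursor() as cur:
        # pg_database is the system catalog that lists existing databases.
        cur.execute("SELECT 1 FROM pg_database WHERE datname = %s", (db_name,))
        if cur.fetchone() is not None:
            return False
        cur.execute(f"CREATE DATABASE {db_name};")
        return True
```

With this approach existing tables and embeddings survive a re-run, so the ingestion step would also need to skip documents that are already indexed.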

In [6]:
import psycopg2
import nest_asyncio

try:
    pg_pw = "mysecretpassword"
    pg_db = "vector_store"
    connection_string = f"postgresql://postgres:{pg_pw}@localhost:5432"
    db_name = pg_db
    conn = psycopg2.connect(connection_string)
    conn.autocommit = True

    with conn.cursor() as c:
        c.execute(f"DROP DATABASE IF EXISTS {db_name} WITH (FORCE);")
        c.execute(f"CREATE DATABASE {db_name};")

    conn.commit()
    conn.close()
    
    nest_asyncio.apply()
    
except Exception as e:
    print(e)
    

In [7]:
from IPython.display import Markdown
from llama_index.core import SimpleDirectoryReader
from sqlalchemy import make_url
from llama_index.core import VectorStoreIndex
from llama_index.core import StorageContext
from llama_index.vector_stores.postgres import PGVectorStore

def simple_RAG(vector_size):
    """
    Simple Retrieval Augmented Generation (RAG) using Llama Index.
    """
    BASE_DIR = "./data/Subaru/"

    url = make_url(connection_string)
    print(f"Url {url}")
    
    vector_store = PGVectorStore.from_params(
        database=db_name,
        host=url.host,
        password=url.password,
        port=url.port,
        user=url.username,
        table_name="basic_rag",
        embed_dim=vector_size
    )

    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    
    nodes = ingest_directory(BASE_DIR)
    
    print(f"Number of nodes: {len(nodes)}")

    index = VectorStoreIndex.from_documents(nodes, storage_context=storage_context, show_progress=True)
    return index

def advanced_RAG(vector_size, input_file):
    """
    Build a Retrieval Augmented Generation (RAG) index from a single document using Llama Index.
    """

    print(f"Ingesting document: {input_file}...")
    url = make_url(connection_string)
    print(f"Url {url}")
    
    vector_store = PGVectorStore.from_params(
        database=db_name,
        host=url.host,
        password=url.password,
        port=url.port,
        user=url.username,
        table_name="advanced_rag",
        embed_dim=vector_size
    )

    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    
    nodes = ingest_document(input_file)
    
    print(f"Number of nodes: {len(nodes)}")

    index = VectorStoreIndex.from_documents(nodes, storage_context=storage_context, show_progress=True)
    return index

def ingest_document(input_file):
    """
    Ingest a document into the vector store. 
    """
    reader = SimpleDirectoryReader(input_files=[input_file])
    return reader.load_data(show_progress=True)

def ingest_directory(directory):
    """
    Ingest documents from a directory into the vector store. 
    """
    reader = SimpleDirectoryReader(input_dir=directory)
    return reader.load_data(show_progress=True)

def display_markdown(question, response, display_nodes=False):
    """
    Display a question and response in markdown format.
    """
    nodes = []
    if display_nodes:
        nodes = response.source_nodes
    return Markdown(
f"""
## Question:
{question}

## Answer:
{response.response}

## Num Nodes:
{len(nodes)}
""")

## Using naive LlamaIndex RAG
- Build a single index containing both models' brochures, and set up the query engine with `similarity_top_k=3`.
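To make `similarity_top_k=3` concrete, here is a plain-Python sketch of top-k retrieval by cosine similarity. It illustrates the idea only and is not how `PGVectorStore` is implemented; the chunk labels and two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=3):
    """Return the k (score, text) pairs most similar to query_vec.

    chunks is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embeddings in this notebook have 1024 dimensions.
    chunks = [
        ("Outback cargo", [1.0, 0.0]),
        ("Forester cargo", [0.9, 0.1]),
        ("Outback colours", [0.0, 1.0]),
        ("Forester colours", [0.1, 0.9]),
    ]
    for score, text in top_k([1.0, 0.0], chunks, k=2):
        print(f"{score:.3f}  {text}")
```

With one shared index and a small `k`, nothing guarantees that the retrieved chunks cover both brochures, which may be why a comparison question can end up answered from one model's material only.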

In [8]:
index = simple_RAG(vector_size=vector_size)
query_engine = index.as_query_engine(similarity_top_k=3, verbose=True)

Url postgresql://postgres:***@localhost:5432


Loading files:   0%|          | 0/2 [00:00<?, ?file/s]invalid pdf header: b'-----'
incorrect startxref pointer(1)
Loading files: 100%|██████████| 2/2 [00:03<00:00,  1.58s/file]

Number of nodes: 119





Parsing nodes:   0%|          | 0/119 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/119 [00:00<?, ?it/s]

In [9]:
question = "Compare and contrast the size of the cargo space of both models."
response = query_engine.query(question)

display_markdown(question, response, display_nodes=True)


## Question:
Compare and contrast the size of the cargo space of both models.

## Answer:
The context information does not provide details to directly compare the cargo space size between different Subaru Outback models. However, it does mention that the Outback has a large, 522L flat cargo space that can easily accommodate a family's gear. This suggests a spacious cargo area across the Outback lineup, designed to meet the needs of those seeking versatility and the ability to carry substantial amounts of cargo for their adventures and getaways.

## Num Nodes:
3


In [10]:
question = "I want to know the colour options for both models."
response = query_engine.query(question)

display_markdown(question, response, display_nodes=True)


## Question:
I want to know the colour options for both models.

## Answer:
For the Subaru Outback, the available exterior color options are:

- Crystal White Pearl
- Ice Silver Metallic  
- Cashmere Gold Opal (not available on Outback AWD Sport)
- Brilliant Bronze Metallic (not available on Outback AWD Sport)
- Autumn Green Metallic
- Crimson Red Pearl (not available on Outback AWD Sport)
- Sapphire Blue Pearl
- Magnetite Grey Metallic
- Crystal Black Silica

For the Subaru Forester, the available exterior color options are:

- Crystal White Pearl
- Ice Silver Metallic
- Brilliant Bronze Metallic
- Autumn Green Metallic
- Magnetite Grey Metallic
- Horizon Blue Pearl
- Sapphire Blue Pearl
- Cascade Green Silica
- Crimson Red Pearl
- Crystal Black Silica

Both models offer a range of popular color choices, allowing customers to select a shade that suits their personal preference and style.

## Num Nodes:
3


## Using Sub Question decomposition method
- Given the same question, `Compare and contrast the size of the cargo space of both models.`, the sub-question engine answers more clearly and completely than the naive RAG method. Also notice the sub-questions generated by the system.
- Below, we build two indices, one for each model, so that they can be compared and contrasted more effectively.
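The flow can be sketched without an LLM: split the question into one sub-question per tool, answer each against its own source, then combine the answers. In the real `SubQuestionQueryEngine` both the decomposition and the final synthesis are done by the LLM; the `decompose` function, the string-joining synthesis, and the canned tool answers below are stand-ins for illustration.

```python
def decompose(question, tool_names):
    """Stand-in for the LLM step: emit one scoped sub-question per tool."""
    return [(name, f"{question} [scope: {name}]") for name in tool_names]


def sub_question_query(question, tools):
    """Answer a question by fanning it out to every tool.

    tools maps a tool name to a callable that takes a question string
    and returns an answer string.
    """
    sub_questions = decompose(question, list(tools))
    sub_answers = [(name, tools[name](sub_q)) for name, sub_q in sub_questions]
    # Stand-in for the LLM synthesis step: join the per-tool answers.
    return "\n".join(f"{name}: {answer}" for name, answer in sub_answers)


if __name__ == "__main__":
    # Canned answers standing in for the two per-model query engines.
    toy_tools = {
        "Subaru Outback": lambda q: "answer retrieved from the Outback index",
        "Subaru Forester": lambda q: "answer retrieved from the Forester index",
    }
    print(sub_question_query("How large is the cargo space?", toy_tools))
```

The key design point is that each sub-question is answered against exactly one source, so every model contributes its own retrieved context to the final answer.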

In [11]:
forester_index = advanced_RAG(vector_size=vector_size, input_file="./data/Subaru/Subaru-Forester-brochure.pdf")
outback_index = advanced_RAG(vector_size=vector_size, input_file="./data/Subaru/Subaru-Outback-brochure.pdf")

Ingesting document: ./data/Subaru/Subaru-Forester-brochure.pdf...
Url postgresql://postgres:***@localhost:5432


Loading files:   0%|          | 0/1 [00:00<?, ?file/s]invalid pdf header: b'-----'
incorrect startxref pointer(1)
Loading files: 100%|██████████| 1/1 [00:01<00:00,  1.78s/file]

Number of nodes: 56





Parsing nodes:   0%|          | 0/56 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/56 [00:00<?, ?it/s]

Ingesting document: ./data/Subaru/Subaru-Outback-brochure.pdf...
Url postgresql://postgres:***@localhost:5432


Loading files: 100%|██████████| 1/1 [00:01<00:00,  1.29s/file]

Number of nodes: 63





Parsing nodes:   0%|          | 0/63 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/63 [00:00<?, ?it/s]

In [12]:
outback_engine = outback_index.as_query_engine(similarity_top_k=3, verbose=True)
forester_engine = forester_index.as_query_engine(similarity_top_k=3, verbose=True)

query_engine_tools = [
    QueryEngineTool(
        query_engine=outback_engine,
        metadata=ToolMetadata(
            name="Subaru Outback",
            description=(
                "Provides information about Subaru Outback"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=forester_engine,
        metadata=ToolMetadata(
            name="Subaru Forester",
            description=(
                "Provides information about Subaru Forester"
            ),
        ),
    ),
]

s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    verbose=True
)

In [13]:
question = "Compare and contrast the size of the cargo space of both models."
response = s_engine.query(question)

display_markdown(question, response, display_nodes=True)

Generated 2 sub questions.
[Subaru Outback] Q: What is the cargo space size of the Subaru Outback?
[Subaru Forester] Q: What is the cargo space size of the Subaru Forester?
[Subaru Forester] A: The Subaru Forester has a cargo volume of 498 liters with the rear seats up, and 1,060 liters (or up to 1,779 liters to the ceiling on some models) with the rear seats folded down.
[Subaru Outback] A: The Subaru Outback has a cargo space of 522 liters with the rear seats up, and 1,267 liters with the rear seats folded down. Additionally, the total cargo volume extends up to the ceiling, providing 1,783 liters of space on some models and 1,711 liters on others.


## Question:
Compare and contrast the size of the cargo space of both models.

## Answer:
The Subaru Outback offers slightly more cargo space than the Forester. With the rear seats up, the Outback has 522 liters of space compared to 498 liters in the Forester. When the rear seats are folded down, the Outback provides 1,267 liters of space, while the Forester offers 1,060 liters. However, the total cargo volume to the ceiling is larger in some Forester models at 1,779 liters, compared to 1,783 liters in certain Outback variants and 1,711 liters in others. Overall, both vehicles offer ample cargo capacity, with the Outback having a slight edge in most configurations, but the Forester potentially providing more total volume in specific models.

## Num Nodes:
8


In [14]:
question = "I want to know the colour options for both models."
response = s_engine.query(question)

display_markdown(question, response, display_nodes=True)

Generated 2 sub questions.
[Subaru Outback] Q: What are the color options for the Subaru Outback?
[Subaru Forester] Q: What are the color options for the Subaru Forester?
[Subaru Outback] A: The Subaru Outback comes in a range of popular exterior color options, including Crystal White Pearl, Ice Silver Metallic, Cashmere Gold Opal, Brilliant Bronze Metallic, Autumn Green Metallic, Crimson Red Pearl, Sapphire Blue Pearl, Magnetite Grey Metallic, and Crystal Black Silica. However, it's worth noting that not all color options are available on all models, and some colors like Brilliant Bronze Metallic, Crimson Red Pearl, and Cashmere Gold Opal are not offered on the Outback AWD Sport trim.
[Subaru Forester] A: Based on the information provided, the available exterior color options for the Subaru Forester are:

- Crystal White Pearl
- Ice Silver Metallic  
- Brilliant Bronze Metallic
- Autumn


## Question:
I want to know the colour options for both models.

## Answer:
The color options available for the Subaru Outback and Subaru Forester models are:

Outback color options:
- Crystal White Pearl
- Ice Silver Metallic
- Cashmere Gold Opal
- Brilliant Bronze Metallic
- Autumn Green Metallic  
- Crimson Red Pearl
- Sapphire Blue Pearl
- Magnetite Grey Metallic
- Crystal Black Silica

Forester color options:
- Crystal White Pearl
- Ice Silver Metallic
- Brilliant Bronze Metallic
- Autumn Green Metallic
- Magnetite Grey Metallic
- Horizon Blue Pearl
- Sapphire Blue Pearl
- Cascade Green Silica
- Crimson Red Pearl
- Crystal Black Silica

Both models share several color options like Crystal White Pearl, Ice Silver Metallic, Brilliant Bronze Metallic, Autumn Green Metallic, Crimson Red Pearl, Sapphire Blue Pearl, Magnetite Grey Metallic, and Crystal Black Silica. However, the Forester has some exclusive colors like Horizon Blue Pearl and Cascade Green Silica, while the Outback offers Cashmere Gold Opal as a unique option.

## Num Nodes:
8


## Handling missing data
- Here the question asks about the Toyota Corolla, but the indexed brochures only cover the Subaru Outback and Forester.
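When no tool matches the question, the decomposition step may raise an exception instead of returning an answer. A minimal sketch of a wrapper that turns this into a value the caller can branch on is shown here; `safe_query` is a hypothetical helper, and it assumes only that the engine exposes a `query(question)` method.

```python
def safe_query(engine, question):
    """Return (response, None) on success or (None, error_message) on failure."""
    try:
        return engine.query(question), None
    except Exception as exc:  # the exact exception type raised is not assumed here
        return None, f"Cannot answer the question: {exc}"


if __name__ == "__main__":
    class _FailingEngine:
        """Stub that mimics an engine with no matching tool."""

        def query(self, question):
            raise ValueError("No valid JSON found in output")

    response, error = safe_query(_FailingEngine(), "What about the Toyota Corolla?")
    print(response, error)
```

Returning the error as data keeps a batch of questions running even when one of them falls outside the indexed documents.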

In [15]:
question = "What are the key specifications for the Toyota Corolla?"

try:
    response = s_engine.query(question)
    display_markdown(question, response, display_nodes=True)
except Exception as e:
    print(f"Cannot answer the question: {e}")

Cannot answer the question: No valid JSON found in output: Since the provided tools do not contain any information about the Toyota Corolla, it is not possible to answer the user question using these tools. The output would be an empty JSON object:

```json
{}
```


In [16]:
question = "Which model has better fuel efficiency?"
response = s_engine.query(question)

display_markdown(question, response, display_nodes=True)

Generated 2 sub questions.
[Subaru Outback] Q: What is the fuel efficiency of the Subaru Outback?
[Subaru Forester] Q: What is the fuel efficiency of the Subaru Forester?
[Subaru Forester] A: The Subaru Forester has impressive fuel efficiency ratings. For the petrol variants, the combined fuel consumption is 7.4L/100km. The hybrid variants are even more fuel-efficient, with a combined fuel consumption of just 6.7L/100km. The urban and extra urban figures are also provided, showcasing the Forester's excellent fuel economy across different driving conditions.
[Subaru Outback] A: According to the specifications provided, the fuel efficiency (fuel consumption) of the Subaru Outback models is as follows:

Outback AWD:
Combined: 7.3 L/100km
Urban: 9.3 L/100km  
Extra Urban: 6.2 L/100km

Outback AWD Sport and Outback AWD Touring (non-turbo models):
Same fuel efficiency as the Outback AWD above.


## Question:
Which model has better fuel efficiency?

## Answer:
Based on the fuel efficiency figures provided, the Subaru Forester models have better overall fuel efficiency compared to the Subaru Outback models. The Forester's combined fuel consumption of 7.4L/100km for the petrol variants and an impressive 6.7L/100km for the hybrid variants is lower than the Outback's combined fuel consumption figures, which range from 7.3L/100km for the non-turbo models to 9.0L/100km for the turbocharged variants. Therefore, the Subaru Forester emerges as the more fuel-efficient model between the two.

## Num Nodes:
8
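Since the final answer is generated text, it can be worth cross-checking its numeric claims against the sub-answers. The sketch below ranks the combined consumption figures quoted in the sub-answers above (lower L/100km means better efficiency); the dictionary labels are shorthand chosen for this example, and the turbo Outback figure is left out because it does not appear in the visible sub-answer.

```python
# Combined fuel consumption in L/100km, copied from the sub-answers above.
combined_l_per_100km = {
    "Forester petrol": 7.4,
    "Forester hybrid": 6.7,
    "Outback AWD (non-turbo)": 7.3,
}


def rank_by_consumption(figures):
    """Sort variants from most to least fuel-efficient (lowest consumption first)."""
    return sorted(figures.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for variant, consumption in rank_by_consumption(combined_l_per_100km):
        print(f"{consumption:.1f} L/100km  {variant}")
```

On these figures the hybrid Forester ranks first, but the non-turbo Outback (7.3) is slightly ahead of the petrol Forester (7.4), so the blanket conclusion in the generated answer holds for the hybrid variant rather than for every Forester.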


In [17]:
question = "Compare and contrast both models and which would I buy for myself, assuming I love going outdoors and have a lot of gear to carry?"
response = s_engine.query(question)

display_markdown(question, response, display_nodes=True)

Generated 6 sub questions.
[Subaru Outback] Q: What are the key features and specifications of the Subaru Outback?
[Subaru Forester] Q: What are the key features and specifications of the Subaru Forester?
[Subaru Outback] Q: What is the cargo capacity and storage space of the Subaru Outback?
[Subaru Forester] Q: What is the cargo capacity and storage space of the Subaru Forester?
[Subaru Outback] Q: What are the off-road capabilities and ground clearance of the Subaru Outback?
[Subaru Forester] Q: What are the off-road capabilities and ground clearance of the Subaru Forester?
[Subaru Forester] A: The Subaru Forester comes with a range of impressive features and specifications across its various trim levels. Some of the key highlights include:

- Symmetrical All-Wheel Drive system standard across the range
- 5-


## Question:
Compare and contrast both models and which would I buy for myself, assuming I love going outdoors and have a lot of gear to carry?

## Answer:
Both the Subaru Outback and Forester are excellent choices for outdoor enthusiasts who need ample cargo space and off-road capabilities. However, if I had to choose one based on the provided information, the Subaru Outback would be the better option for someone who loves going outdoors and has a lot of gear to carry.

The Outback offers a larger cargo capacity of 522L compared to the Forester's 498L with the rear seats up. Additionally, the Outback's flat cargo area and convenient features like pull-out tie-down points, a 12V power supply, and a hands-free powered rear tailgate make it easier to load and secure gear. The integrated roof rails on certain Outback models also provide additional storage space for bulky items.

While the Forester has a slightly higher ground clearance of 220mm compared to the Outback's 213mm, both vehicles are equipped with Symmetrical All-Wheel Drive and the X-Mode system, ensuring excellent off-road capabilities. However, the Outback's larger size and longer wheelbase may provide better stability on rough terrain.

Ultimately, the Subaru Outback's combination of generous cargo capacity, versatile storage solutions, and robust off-road capabilities make it the more suitable choice for someone who frequently ventures outdoors with a lot of gear. Its spacious interior and practical features would make it an ideal companion for outdoor adventures and camping trips.

## Num Nodes:
24
