<a href="https://colab.research.google.com/github/amitsangani/llama_index/blob/main/Llama_Index_chatbot_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [8]:
%pip install llama-index


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [10]:
# download files
!mkdir -p data
!wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip
!unzip data/UBER.zip -d data


--2023-04-14 21:04:41--  https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.18, 2620:100:6020:18::a27d:4012
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/948jr9cfs7fgj99/UBER.zip [following]
--2023-04-14 21:04:41--  https://www.dropbox.com/s/dl/948jr9cfs7fgj99/UBER.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://ucb3cd3b60080506d3debfbaa70e.dl.dropboxusercontent.com/cd/0/get/B6Kg-nkXRmePDsIqPbNUzP_f0kBwc5Uhdb6xnMCFe-_4Ll4osC6hcNpyGROsHL4d-dL8QQSuJygJWdQpkNXSU44wqSv3V8yDp3HGQORjJW8UUEEVgOqQghcqfxIBt7l-zRffs2MKbVs6iszmqQYRRmVdDdhkwxOmGgADMDJEc1wDqA/file?dl=1# [following]
--2023-04-14 21:04:41--  https://ucb3cd3b60080506d3debfbaa70e.dl.dropboxusercontent.com/cd/0/get/B6Kg-nkXRmePDsIqPbNUzP_f0kBwc5Uhdb6xnMCFe-_4Ll4osC6hcNpyGROsHL4d-dL8QQSuJygJWdQpkNXSU44

In [11]:
# set text wrapping
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)


In [12]:
import os
from getpass import getpass
# prompt for the key at runtime; never commit a real API key to a notebook
os.environ['OPENAI_API_KEY'] = getpass("OpenAI API key: ")

In [14]:
from llama_index import download_loader, GPTSimpleVectorIndex, ServiceContext
from pathlib import Path


In [15]:
years = [2022, 2021, 2020, 2019]


In [16]:
UnstructuredReader = download_loader("UnstructuredReader", refresh_cache=True)


In [17]:
loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
    year_docs = loader.load_data(file=Path(f'./data/UBER/UBER_{year}.html'), split_documents=False)
    # attach the filing year as metadata to each document
    for d in year_docs:
        d.extra_info = {"year": year}
    doc_set[year] = year_docs
    all_docs.extend(year_docs)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [23]:
len(all_docs)

4

In [24]:
# initialize simple vector indices + global vector index
# NOTE: don't run this cell if the indices are already loaded! 
index_set = {}
service_context = ServiceContext.from_defaults(chunk_size_limit=512)
for year in years:
    cur_index = GPTSimpleVectorIndex.from_documents(doc_set[year], service_context=service_context)
    index_set[year] = cur_index
    cur_index.save_to_disk(f'index_{year}.json')

In [25]:
# Load indices from disk
index_set = {}
for year in years:
    cur_index = GPTSimpleVectorIndex.load_from_disk(f'index_{year}.json')
    index_set[year] = cur_index
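
The note in the index-building cell warns against rebuilding indices that already exist. One way to make that check explicit is sketched below, assuming the `index_{year}.json` file naming used above. `missing_index_years` is a hypothetical helper written for illustration; it is not part of llama_index.

```python
from pathlib import Path

def missing_index_years(years, directory="."):
    """Return the years whose index_{year}.json file is absent from `directory`."""
    base = Path(directory)
    return [year for year in years if not (base / f"index_{year}.json").exists()]

# Same list of years as defined earlier in the notebook
years = [2022, 2021, 2020, 2019]

# Only the years listed here would need to be rebuilt; the rest can be loaded from disk
to_build = missing_index_years(years)
print(to_build)
```

If `to_build` is empty, the build cell can be skipped and the load cell run directly.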

In [27]:
from llama_index import GPTListIndex, LLMPredictor
from langchain import OpenAI
from llama_index.indices.composability import ComposableGraph


In [28]:
# set summary text for each doc
index_summaries = [f"UBER 10-k Filing for {year} fiscal year" for year in years]


In [29]:
# set number of output tokens
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)


In [30]:
# define a list index over the vector indices
# allows us to synthesize information across each index
graph = ComposableGraph.from_indices(
    GPTListIndex, 
    [index_set[y] for y in years], 
    index_summaries=index_summaries,
    service_context=service_context
)


In [31]:
graph.save_to_disk('10k_graph.json')


In [32]:
graph = ComposableGraph.load_from_disk('10k_graph.json', service_context=service_context)


In [33]:
from langchain.agents import Tool
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent

from llama_index.langchain_helpers.agents import LlamaToolkit, create_llama_chat_agent, IndexToolConfig, GraphToolConfig


In [35]:
# define a decompose transform
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
    llm_predictor, verbose=True
)

# define query configs for graph 
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 1,
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
            "verbose": True
        }
    },
]
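
The `similarity_top_k` setting above controls how many of the most similar chunks are retrieved for a query. The toy sketch below illustrates the general idea of top-k retrieval by cosine similarity. It is a simplified stand-in, not llama_index's internal implementation, and the 2-D vectors are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to query_vec, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (hypothetical values, not real model output)
chunks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], chunks, k=1))
```

With `k=1` only the single closest chunk is passed on, which keeps prompts short but can miss relevant context. Larger values trade prompt size for recall.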

In [36]:
# define toolkit
index_configs = []
for y in sorted(years):
    tool_config = IndexToolConfig(
        index=index_set[y], 
        name=f"Vector Index {y}",
        description=f"useful for when you want to answer queries about the {y} SEC 10-K for Uber",
        index_query_kwargs={"similarity_top_k": 3},
        tool_kwargs={"return_direct": True}
    )
    index_configs.append(tool_config)
    
# graph config
graph_config = GraphToolConfig(
    graph=graph,
    name="Graph Index",
    description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber.",
    query_configs=query_configs,
    tool_kwargs={"return_direct": True}
)

toolkit = LlamaToolkit(
    index_configs=index_configs,
    graph_configs=[graph_config]
)

In [37]:
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI(temperature=0)
agent_chain = create_llama_chat_agent(
    toolkit,
    llm,
    memory=memory,
    verbose=True
)

In [38]:
agent_chain.run(input="hi, I am Amit")



> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No
AI: Hi Amit, nice to meet you! How can I help you today?

> Finished chain.


'Hi Amit, nice to meet you! How can I help you today?'

In [39]:
agent_chain.run(input="What were some of the biggest risk factors in 2022 for Uber?")




> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: Vector Index 2022
Action Input: Risk Factors
Observation:

Risk Factors Summary

The following are some of the risks that could have an adverse effect on our business, financial condition, operating results, or prospects in 2022:

1. Drivers being classified as employees, workers or quasi-employees instead of independent contractors.
2. High competition in the mobility, delivery, and logistics industries, with well-established and low-cost alternatives that have been available for decades, low barriers to entry, low switching costs, and well-capitalized competitors in nearly every major geographic region.
3. Lowering of fares or service fees to remain competitive in certain markets.
4. Significant losses since inception, including in the United States and other major markets, and the expectation of increasing operating expenses in the foreseeable future.
5. Inab

'\n\nRisk Factors Summary\n\nThe following are some of the risks that could have an adverse effect on our business, financial condition, operating results, or prospects in 2022:\n\n1. Drivers being classified as employees, workers or quasi-employees instead of independent contractors.\n2. High competition in the mobility, delivery, and logistics industries, with well-established and low-cost alternatives that have been available for decades, low barriers to entry, low switching costs, and well-capitalized competitors in nearly every major geographic region.\n3. Lowering of fares or service fees to remain competitive in certain markets.\n4. Significant losses since inception, including in the United States and other major markets, and the expectation of increasing operating expenses in the foreseeable future.\n5. Inability to attract or maintain a critical mass of Drivers, consumers, merchants, Shippers, and Carriers.\n6. Dependence on retaining and attracting high-quality personnel, an

In [40]:
agent_chain.run(input="Who are the top executives at Uber?")




> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No
AI: The top executives at Uber are Dara Khosrowshahi, CEO; Nelson Chai, CFO; Tony West, Chief Legal Officer; Barney Harford, Chief Operating Officer; and Thuan Pham, Chief Technology Officer.

> Finished chain.


'The top executives at Uber are Dara Khosrowshahi, CEO; Nelson Chai, CFO; Tony West, Chief Legal Officer; Barney Harford, Chief Operating Officer; and Thuan Pham, Chief Technology Officer.'

In [41]:
agent_chain.run(input="Who are the top executives at Uber in 2022?")




> Entering new AgentExecutor chain...
AI: In 2022, the top executives at Uber are expected to be Dara Khosrowshahi, CEO; Nelson Chai, CFO; Tony West, Chief Legal Officer; Barney Harford, Chief Operating Officer; and Thuan Pham, Chief Technology Officer.

> Finished chain.


'In 2022, the top executives at Uber are expected to be Dara Khosrowshahi, CEO; Nelson Chai, CFO; Tony West, Chief Legal Officer; Barney Harford, Chief Operating Officer; and Thuan Pham, Chief Technology Officer.'

In [42]:
agent_chain.run(input="What does executive compensation at Uber look like?")



> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: Vector Index 2022
Action Input: Executive Compensation
Observation:

Executive compensation is the total compensation package offered to executives and key personnel of a company. This package typically includes salary, bonuses, stock options, and other benefits. In 2022, Uber will likely continue to rely heavily on equity as a component of executive compensation to attract and retain key personnel. However, if the value of the underlying stock does not increase commensurate with expectations or consistent with Uber's historical stock price growth, the equity incentives may not be as effective as in the past. To ensure that executive compensation remains competitive and attractive, the Compensation Committee has incorporated employee retention metrics into the compensation packages of our most senior executives. In addition, the company has adapted to a new way 

"\n\nExecutive compensation is the total compensation package offered to executives and key personnel of a company. This package typically includes salary, bonuses, stock options, and other benefits. In 2022, Uber will likely continue to rely heavily on equity as a component of executive compensation to attract and retain key personnel. However, if the value of the underlying stock does not increase commensurate with expectations or consistent with Uber's historical stock price growth, the equity incentives may not be as effective as in the past. To ensure that executive compensation remains competitive and attractive, the Compensation Committee has incorporated employee retention metrics into the compensation packages of our most senior executives. In addition, the company has adapted to a new way of working by shifting to a hybrid model where employees have flexibility to work from home, and has taken steps to ensure employee engagement and development through surveys, feedback, and 

In [43]:
cross_query_str = (
    "Compare/contrast the risk factors described in the Uber 10-K across years. Give answer in bullet points."
)

agent_chain.run(input=cross_query_str)




> Entering new AgentExecutor chain...
AI: 

The following are some of the key risk factors that have been identified in Uber's SEC 10-K filings across the years:

2019: 
• Drivers being classified as employees, workers or quasi-employees instead of independent contractors.
• High competition in the mobility, delivery, and logistics industries, with well-established and low-cost alternatives that have been available for decades, low barriers to entry, low switching costs, and well-capitalized competitors in nearly every major geographic region.
• Lowering of fares or service fees to remain competitive in certain markets.
• Significant losses since inception, including in the United States and other major markets, and the expectation of increasing operating expenses in the foreseeable future.

2020: 
• Drivers being classified as employees, workers or quasi-employees instead of independent contractors.
• High competition in the mobility, delivery, and logistics in

"The following are some of the key risk factors that have been identified in Uber's SEC 10-K filings across the years:\n\n2019: \n• Drivers being classified as employees, workers or quasi-employees instead of independent contractors.\n• High competition in the mobility, delivery, and logistics industries, with well-established and low-cost alternatives that have been available for decades, low barriers to entry, low switching costs, and well-capitalized competitors in nearly every major geographic region.\n• Lowering of fares or service fees to remain competitive in certain markets.\n• Significant losses since inception, including in the United States and other major markets, and the expectation of increasing operating expenses in the foreseeable future.\n\n2020: \n• Drivers being classified as employees, workers or quasi-employees instead of independent contractors.\n• High competition in the mobility, delivery, and logistics industries, with well-established and low-cost alternatives

In [44]:
# reinitialize agent
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI(temperature=0)
agent_chain = create_llama_chat_agent(
    toolkit,
    llm,
    memory=memory,
)


In [45]:
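while True:
    text_input = input("User: ")
    # an empty line or "exit"/"quit" ends the chat without a KeyboardInterrupt
    if text_input.strip().lower() in {"", "exit", "quit"}:
        break
    response = agent_chain.run(input=text_input)
    print(f'Agent: {response}')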


User: Amit 
Agent: Hi Amit, how can I help you?
User: How is Uber's stock performing in 2022 and what are the risks going forward?
Agent: 

Uber's stock performance and risks are largely dependent on the global economic conditions and the impact of the COVID-19 pandemic. The pandemic has had a significant impact on Uber's business, resulting in a net loss of $9.1 billion in 2022, including a pre-tax unrealized loss on debt and equity securities of $7.0 billion. Additionally, Uber's Adjusted EBITDA was $1.7 billion, growing $2.5 billion compared to 2021. The pandemic has also caused Uber to suspend its shared rides offering globally and offer "leave at door" delivery options for Delivery offerings. 

In November 2022, Uber issued 72 shares of its common stock to holders of Careem Convertible Notes who elected to convert the balance of such notes to common stock at a conversion price of $55 per share. The terms of certain of Uber's outstanding debt instruments restrict its ability to pay

KeyboardInterrupt: ignored