In [1]:
%pip install llama-index
%pip install llama-parse

Collecting llama-index
  Downloading llama_index-0.12.13-py3-none-any.whl.metadata (12 kB)
Collecting llama-index-agent-openai<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_agent_openai-0.4.2-py3-none-any.whl.metadata (727 bytes)
Collecting llama-index-cli<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_cli-0.4.0-py3-none-any.whl.metadata (1.5 kB)
Collecting llama-index-core<0.13.0,>=0.12.13 (from llama-index)
  Downloading llama_index_core-0.12.13-py3-none-any.whl.metadata (2.5 kB)
Collecting llama-index-embeddings-openai<0.4.0,>=0.3.0 (from llama-index)
  Downloading llama_index_embeddings_openai-0.3.1-py3-none-any.whl.metadata (684 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.4.0 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.6.4-py3-none-any.whl.metadata (3.6 kB)
Collecting llama-index-llms-openai<0.4.0,>=0.3.0 (from llama-index)
  Downloading llama_index_llms_openai-0.3.14-py3-none-any.whl.metadata (3.3 kB)
Collec

In [1]:
import nest_asyncio

from llama_index.llms.openai import OpenAI
from llama_index.core import VectorStoreIndex
from IPython.display import Image, Markdown
from dotenv import load_dotenv
from llama_parse import LlamaParse

from llama_index.core.node_parser import MarkdownElementNodeParser

load_dotenv()


True

In [2]:
# llama-parse is async-first; running its async code inside a notebook requires nest_asyncio
nest_asyncio.apply()

In [3]:
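import os

# load_dotenv() above already copies .env values into os.environ; fail early if the key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")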

llm_o1 = OpenAI(model="o1-mini")
llm_gpt4o_mini = OpenAI(model="gpt-4o-mini")
llm_o1_preview = OpenAI(model="o1-preview")

In [4]:
parser = LlamaParse(
    api_key=os.getenv("LLAMA_KEY"),
    result_type="markdown",
)

excel_files = [
    "/Users/jeffreyjeyachandren/Desktop/tpm_assessment_insurance_proj/data/Financial_Report.xlsx",
    "/Users/jeffreyjeyachandren/Desktop/tpm_assessment_insurance_proj/data/Financial_Report (1).xlsx",
    "/Users/jeffreyjeyachandren/Desktop/tpm_assessment_insurance_proj/data/Financial_Report (2).xlsx",
    "/Users/jeffreyjeyachandren/Desktop/tpm_assessment_insurance_proj/data/Financial_Report (3).xlsx",
]

# Initialize an empty list to store all documents
all_documents = []

# Load each Excel file and extend the documents list
for file_path in excel_files:
    documents = parser.load_data(file_path)
    all_documents.extend(documents)


Error while parsing the file '/content/sample_data/Financial_Report.xlsx': [Errno 2] No such file or directory: '/content/sample_data/Financial_Report.xlsx'
Error while parsing the file '/content/sample_data/Financial_Report (1).xlsx': [Errno 2] No such file or directory: '/content/sample_data/Financial_Report (1).xlsx'
Error while parsing the file '/content/sample_data/Financial_Report (2).xlsx': [Errno 2] No such file or directory: '/content/sample_data/Financial_Report (2).xlsx'
Error while parsing the file '/content/sample_data/Financial_Report (3).xlsx': [Errno 2] No such file or directory: '/content/sample_data/Financial_Report (3).xlsx'
Error while parsing the file '/content/sample_data/Financial_Report.xlsx': [Errno 2] No such file or directory: '/content/sample_data/Financial_Report.xlsx'
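
The errors above are all `[Errno 2] No such file or directory`, so the paths did not resolve on the machine where the cell ran. A minimal sketch of a pre-flight check that splits a list of paths into those that exist and those that do not, so only the former are handed to the parser. The helper name `split_existing` is hypothetical and not part of llama-parse.

```python
from pathlib import Path


def split_existing(paths):
    """Partition paths into (found, missing) by whether each is a regular file."""
    found, missing = [], []
    for p in paths:
        (found if Path(p).is_file() else missing).append(p)
    return found, missing


# Hypothetical usage with the notebook's list of workbooks:
# found, missing = split_existing(excel_files)
# if missing:
#     print("Skipping missing files:", missing)
# for file_path in found:
#     all_documents.extend(parser.load_data(file_path))
```

Checking up front makes a wrong working directory or a stale absolute path visible before any parsing requests are sent.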


In [7]:
len(all_documents)


123

In [8]:
node_parser = MarkdownElementNodeParser(llm=llm_gpt4o_mini, num_workers=4)


In [9]:
nodes = node_parser.get_nodes_from_documents(all_documents)


1it [00:00, 458.24it/s]
1it [00:00, 996.51it/s]
1it [00:00, 7825.19it/s]
1it [00:00, 7206.71it/s]
1it [00:00, 6452.78it/s]
1it [00:00, 9383.23it/s]
1it [00:00, 2904.64it/s]
1it [00:00, 4691.62it/s]
1it [00:00, 719.31it/s]
1it [00:00, 2091.92it/s]


In [10]:
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)


In [11]:
len(nodes), len(base_nodes), len(objects)


(30, 10, 10)

In [12]:
print(objects[3].get_content())


This table presents the Consolidated Statement of Comprehensive Income (Loss) for a company over three years, detailing net income and various components of other comprehensive income, including unrealized gains and losses on investments, changes in benefit plan assets, and foreign currency translation adjustments.,
with the following table title:
Consolidated Statement of Comprehensive Income (Loss) - USD ($) $ in Millions,
with the following columns:
- Net income: Represents the company's profit for the period.
- Other comprehensive income (loss): Includes various gains and losses not included in net income.
- Comprehensive income (loss): Total income including net income and other comprehensive income.



In [13]:
# dump both indexed tables and page text into the vector index
recursive_index = VectorStoreIndex(nodes=base_nodes + objects, llm=llm_gpt4o_mini)

recursive_query_engine_gpt4o_mini = recursive_index.as_query_engine(
    similarity_top_k=5, llm=llm_gpt4o_mini
)
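
`similarity_top_k=5` asks the retriever to return the five nodes whose embeddings are closest to the query embedding. The sketch below illustrates that idea in plain Python on toy 2-D vectors, assuming cosine similarity as the scoring function. It is not LlamaIndex's implementation, and the names `cosine_similarity`, `top_k`, and `toy_docs` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings: document 0 points the same way as the query, document 1 is orthogonal.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k([1.0, 0.0], toy_docs, k=2)
print(ranked)
```

A larger `k` gives the LLM more context to synthesize from, at the cost of more tokens and a higher chance of including loosely related nodes.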

In [16]:
query = "What are the causes driving the largest amount of losses?"

response_recursive_gpt4o_mini = recursive_query_engine_gpt4o_mini.query(query)

In [18]:
print("----------------------RESPONSE WITH GPT4O-MINI----------------------")
display(Markdown(f"{response_recursive_gpt4o_mini}"))

----------------------RESPONSE WITH GPT4O-MINI----------------------


The largest amount of losses can be attributed to several factors, including significant unrealized losses on investment securities, particularly those classified as having no credit losses recognized. Additionally, changes in benefit plan assets and obligations, as well as foreign currency translation adjustments, have contributed to the overall comprehensive income (loss). The fluctuations in these areas can lead to substantial impacts on the company's financial performance, as reflected in the other comprehensive income (loss) figures.
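
To judge how well the answer is grounded, it can help to list every retrieved node with its similarity score rather than only the first one. A small sketch, assuming each item in `source_nodes` exposes a `score` attribute (possibly `None`) and a `get_content()` method; the helper name `summarize_sources` is hypothetical.

```python
def summarize_sources(source_nodes, max_chars=80):
    """Return one 'rank. score | preview' line per retrieved node."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        score = getattr(item, "score", None)
        score_text = "n/a" if score is None else f"{score:.3f}"
        # Collapse whitespace so multi-line tables fit on a single preview line.
        preview = " ".join(item.get_content().split())[:max_chars]
        lines.append(f"{rank}. {score_text} | {preview}")
    return lines


# Hypothetical usage in the notebook:
# print("\n".join(summarize_sources(response_recursive_gpt4o_mini.source_nodes)))
```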

In [20]:
print(response_recursive_gpt4o_mini.source_nodes[0].get_content())


This table presents the Consolidated Statement of Comprehensive Income (Loss) for a company over three years, detailing net income and various components of other comprehensive income, including unrealized gains and losses on investments, changes in benefit plan assets, and foreign currency translation adjustments.,
with the following table title:
Consolidated Statement of Comprehensive Income (Loss) - USD ($) $ in Millions,
with the following columns:
- Net income: Represents the company's profit for the period.
- Other comprehensive income (loss): Includes various gains and losses not included in net income.
- Comprehensive income (loss): Total income including net income and other comprehensive income.

|Consolidated Statement of Comprehensive Income (Loss) - USD ($) $ in Millions|12 Months Ended|             |             .1|
|---|---|---|---|
| |Dec. 31, 2023  |Dec. 31, 2022|Dec. 31, 2021|
|Statement of Comprehensive Income [Abstract]|               |             |             |
|