<h1> Equity IQ </h1>
An AI-powered multi-agent financial system using CrewAI/AutoGen and LlamaIndex to automate financial data extraction, analysis, and summarization, enabling more efficient and informed investment decisions.


1. Installing all dependencies

In [None]:
!pip install crewai
!pip install langchain_groq
!pip install crewai_tools
!pip install langchain_huggingface

!pip install llama_index.llms.groq
!pip install llama-index-embeddings-huggingface
!pip install llama-parse
!pip install llama-index-llms-langchain
!pip install llama_index.evaluation
!pip install ragas

Collecting crewai
  Downloading crewai-0.114.0-py3-none-any.whl.metadata (33 kB)
Collecting appdirs>=1.4.4 (from crewai)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting auth0-python>=4.7.1 (from crewai)
  Downloading auth0_python-4.9.0-py3-none-any.whl.metadata (9.0 kB)
Collecting chromadb>=0.5.23 (from crewai)
  Downloading chromadb-1.0.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting instructor>=1.3.3 (from crewai)
  Downloading instructor-1.7.9-py3-none-any.whl.metadata (22 kB)
Collecting json-repair>=0.25.2 (from crewai)
  Downloading json_repair-0.41.1-py3-none-any.whl.metadata (11 kB)
Collecting json5>=0.10.0 (from crewai)
  Downloading json5-0.12.0-py3-none-any.whl.metadata (36 kB)
Collecting jsonref>=1.1.0 (from crewai)
  Downloading jsonref-1.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting litellm==1.60.2 (from crewai)
  Downloading litellm-1.60.2-py3-none-any.whl.metadata (36 kB)
Collecting opentelemetry-e

In [None]:
import os
import json
import csv


from crewai import Agent, Task, Crew

from langchain_groq import ChatGroq

For this project, we use the Llama 3.3 70B Versatile model, accessed through the Groq API. It supports tool calling and offers a good balance between performance and compute cost. <br/><br/>
We define a global base_llm model.

In [None]:
from google.colab import userdata

# Read the key from Colab secrets (add your own GROQ_API_KEY there) instead of hard-coding it
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")

base_llm = ChatGroq(
    model_name="groq/llama-3.3-70b-versatile",
    temperature=0.0,
    max_tokens=2000
)


In [None]:
import os
os.makedirs('data', exist_ok=True)


!wget "https://s23.q4cdn.com/407969754/files/doc_financials/2019/ar/Uber-Technologies-Inc-2019-Annual-Report.pdf" -O data/uber_10k.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf" -O data/apple_2023.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf" -O data/apple_2022.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf" -O data/apple_2021.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2020/ar/_10-K-2020-(As-Filed).pdf" -O data/apple_2020.pdf
!wget "https://www.dropbox.com/scl/fi/i6vk884ggtq382mu3whfz/apple_2019_10k.pdf?rlkey=eudxh3muxh7kop43ov4bgaj5i&dl=1" -O data/apple_2019.pdf

# download Tesla
!wget "https://ir.tesla.com/_flysystem/s3/sec/000162828024002390/tsla-20231231-gen.pdf" -O data/tesla_2023.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf" -O data/tesla_2022.pdf
!wget "https://www.dropbox.com/scl/fi/ptk83fmye7lqr7pz9r6dm/tesla_2021_10k.pdf?rlkey=24kxixeajbw9nru1sd6tg3bye&dl=1" -O data/tesla_2021.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459021004599/tsla-10k_20201231-gen.pdf" -O data/tesla_2020.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459020004475/tsla-10k_20191231-gen_0.pdf" -O data/tesla_2019.pdf


--2025-04-19 17:34:18--  https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 68.70.205.3, 68.70.205.4, 68.70.205.1, ...
Connecting to s2.q4cdn.com (s2.q4cdn.com)|68.70.205.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 714094 (697K) [application/pdf]
Saving to: ‘data/apple_2023.pdf’


2025-04-19 17:34:18 (54.2 MB/s) - ‘data/apple_2023.pdf’ saved [714094/714094]

--2025-04-19 17:34:18--  https://ir.tesla.com/_flysystem/s3/sec/000162828024002390/tsla-20231231-gen.pdf
Resolving ir.tesla.com (ir.tesla.com)... 23.219.8.88, 2600:1408:9000:684::700, 2600:1408:9000:695::700
Connecting to ir.tesla.com (ir.tesla.com)|23.219.8.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/pdf]
Saving to: ‘data/tesla_2023.pdf’

data/tesla_2023.pdf     [ <=>                ] 961.50K  --.-KB/s    in 0.01s   

2025-04-19 17:34:19 (66.6 MB/s) - ‘data/tesl

<h2> RAG Agent </h2>
The first step in our pipeline is the RAG Agent. Based on the user's selection, the appropriate report is ingested.
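
As a rough illustration of how a user's selection could be resolved to a report, the sketch below maps a company name and fiscal year to a file under `data/`, following the `<company>_<year>.pdf` naming used by the download cell. The `report_path` helper and the `AVAILABLE_REPORTS` table are hypothetical and not part of the notebook; the ingestion cell later in the notebook loads everything in `data/` regardless of selection.

```python
from pathlib import Path

DATA_DIR = Path("data")

# Companies and fiscal years that follow the <company>_<year>.pdf naming
# of the download cell (assumption: uber_10k.pdf is left out because it
# does not follow that pattern).
AVAILABLE_REPORTS = {
    "apple": range(2019, 2024),
    "tesla": range(2019, 2024),
}


def report_path(company: str, year: int) -> Path:
    """Return the expected PDF path for a company and year (hypothetical helper)."""
    key = company.strip().lower()
    if key not in AVAILABLE_REPORTS:
        raise ValueError(f"Unknown company: {company!r}")
    if year not in AVAILABLE_REPORTS[key]:
        raise ValueError(f"No {year} report downloaded for {company!r}")
    return DATA_DIR / f"{key}_{year}.pdf"


print(report_path("Tesla", 2023).as_posix())
```

Validating the selection up front keeps a typo in the company name from silently producing an empty retrieval result later on.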
<h3> Model </h3>

The RAG Agent uses the gemma2-9b-it model, served through Groq's OpenAI-compatible endpoint, with the temperature set to 0 to make responses as deterministic as possible.

<h3> Embeddings </h3>
The RAG Agent uses BAAI's bge-small-en-v1.5 embeddings for document retrieval.
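
Retrieval over embeddings comes down to ranking document chunks by how close their vectors are to the query vector. The sketch below shows that idea with cosine similarity on hand-made three-dimensional vectors; the chunk names and numbers are invented for illustration, and the real pipeline relies on LlamaIndex and the bge-small-en-v1.5 model rather than on this code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=5):
    """Return the ids of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy embeddings (made up); real embeddings have far more dimensions.
toy_chunks = {
    "risk_factors": [0.9, 0.1, 0.0],
    "mission": [0.1, 0.9, 0.0],
    "energy_storage": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

print(top_k(query, toy_chunks, k=2))  # ['risk_factors', 'mission']
```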

In [None]:
!rm -rf vector_store
!pip install llama-index-readers-file

Collecting llama-index-readers-file
  Downloading llama_index_readers_file-0.4.7-py3-none-any.whl.metadata (5.4 kB)
Collecting striprtf<0.0.27,>=0.0.26 (from llama-index-readers-file)
  Downloading striprtf-0.0.26-py3-none-any.whl.metadata (2.1 kB)
Downloading llama_index_readers_file-0.4.7-py3-none-any.whl (40 kB)
Downloading striprtf-0.0.26-py3-none-any.whl (6.9 kB)
Installing collected packages: striprtf, llama-index-readers-file
Successfully installed llama-index-readers-file-0.4.7 striprtf-0.0.26


In [None]:
from crewai_tools import PDFSearchTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from langchain_openai import ChatOpenAI
from llama_index.core import StorageContext, load_index_from_storage

import os

chat_llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=os.environ['GROQ_API_KEY'],
    model="gemma2-9b-it",
    temperature=0,
)

VECTOR_STORE_DIR = "vector_store"
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

def createVectorStore():
  """Load the persisted index if present, otherwise build it from data/; return a query engine."""

  if os.path.exists(VECTOR_STORE_DIR):
    # Rebuild the storage context from the persisted files
    storage_context = StorageContext.from_defaults(persist_dir=VECTOR_STORE_DIR)

    # Load the index with the same embedding model it was built with
    index = load_index_from_storage(storage_context, embed_model=embed_model)
    print("Vector store loaded successfully!")

  else:
    # Read every document in the data directory
    documents = SimpleDirectoryReader("data").load_data()

    # Embed the documents with bge-small-en-v1.5 and build the vector index
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

    # Persist the index so later runs can skip re-embedding
    os.makedirs(VECTOR_STORE_DIR, exist_ok=True)
    index.storage_context.persist(persist_dir=VECTOR_STORE_DIR)
    print("Vector store created successfully!")
    print("Vector store saved to:", VECTOR_STORE_DIR)

  # Query engine: retrieve the 5 most similar nodes, then let chat_llm write the answer.
  # chat_llm already carries the Groq API key, so no separate key argument is passed here.
  query_engine = index.as_query_engine(similarity_top_k=5, llm=chat_llm)

  return query_engine


vector_store = createVectorStore()


Vector store loaded successfully!


### Querying the vector store retrieves the top-k semantically matching nodes; the LLM then generates the final response from that context
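
To make the second half of that flow concrete, the sketch below shows one way retrieved chunks can be combined with the question into a single prompt for the LLM. The `build_prompt` helper and its wording are illustrative assumptions; LlamaIndex uses its own internal prompt templates, which this does not reproduce.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved text chunks and the user question into one prompt (hypothetical helper)."""
    # Number each chunk so the answer can be traced back to its source
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Answer the question using only the context above.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks stand in for the text of the retrieved nodes
prompt = build_prompt(
    "What are the risk factors associated with Tesla?",
    ["<text of retrieved node 1>", "<text of retrieved node 2>"],
)
print(prompt)
```

Because the model only sees what is placed in the prompt, the value of `similarity_top_k` directly controls how much of the report it can draw on.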

In [None]:
vector_store.query("What are the risk factors associated with Tesla?").response

/usr/local/lib/python3.11/dist-packages/llama_index/llms/langchain/utils.py:51: PydanticDeprecatedSince20: The `schema` method is deprecated; use `model_json_schema` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  for req_key in LC_MessageClass.schema().get("required"):


"Tesla's success depends on consumer demand for electric vehicles, which is influenced by factors like public perception of electric vehicles, charging infrastructure availability, and competition from other vehicle types.  \n\nThe automotive industry is known for its cyclical nature and volatility, and Tesla's sales could be affected by economic downturns or shifts in consumer preferences.  \n\nAdditionally, Tesla's reliance on lithium-ion batteries and raw materials like lithium, nickel, and cobalt exposes them to price fluctuations and supply chain disruptions. \n\n\nTesla also operates in a highly competitive market and faces risks related to government regulations, economic incentives, and consumer concerns about the company's future viability. \n"

## Evaluate RAG Database
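
The evaluation cell reads `ragas_dataset.json` into a column-oriented dictionary before handing it to RAGAS. The sketch below checks such a dictionary for the columns the three metrics are expected to need. The `user_input` column appears in the results table; the other three names (`retrieved_contexts`, `response`, `reference`) are assumptions based on the RAGAS 0.2 naming scheme, and `validate_ragas_columns` is a hypothetical helper, not a RAGAS function.

```python
# Columns assumed to be required by Faithfulness, LLMContextPrecisionWithReference
# and LLMContextRecall (only user_input is confirmed by the notebook output).
REQUIRED_COLUMNS = ("user_input", "retrieved_contexts", "response", "reference")


def validate_ragas_columns(data):
    """Return a list of problems in a dict-of-lists dataset; an empty list means it looks fine."""
    problems = []
    for col in REQUIRED_COLUMNS:
        if col not in data:
            problems.append(f"missing column: {col}")

    # Every column must describe the same number of rows
    lengths = {col: len(values) for col, values in data.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns have different lengths: {lengths}")

    # Each row's retrieved_contexts should be a list of strings
    for i, ctx in enumerate(data.get("retrieved_contexts", [])):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"row {i}: retrieved_contexts must be a list of strings")
    return problems


# One-row example with placeholder text
example = {
    "user_input": ["What is Tesla's mission?"],
    "retrieved_contexts": [["<retrieved chunk text>"]],
    "response": ["<answer produced by the query engine>"],
    "reference": ["<ground-truth answer>"],
}
print(validate_ragas_columns(example))  # []
```

Running a check like this before the evaluation can save a long run that would otherwise fail partway through on a malformed row.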

In [None]:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    Faithfulness,
    ResponseRelevancy,
    LLMContextPrecisionWithReference,
    LLMContextRecall,
)
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
import os
from google.colab import userdata

# Judge LLM for RAGAS: Llama 3 70B served through Groq's OpenAI-compatible endpoint
chat_llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=os.environ["GROQ_API_KEY"],
    model="llama3-70b-8192",
    temperature=0
)

# Wrap it for RAGAS
evaluator_llm = LangchainLLMWrapper(chat_llm)

# Load the evaluation dataset (column-oriented JSON)
with open('ragas_dataset.json', 'r') as f:
    json_data = json.load(f)

# Create a HuggingFace Dataset
dataset = Dataset.from_dict(json_data)

# Evaluate using RAGAS
results = evaluate(
    dataset=dataset,
    metrics=[
        Faithfulness(llm=evaluator_llm),
        LLMContextPrecisionWithReference(llm=evaluator_llm),
        LLMContextRecall(llm=evaluator_llm),
    ],
    batch_size=5
)

# Print the aggregate scores, then the per-question results
print(results)
print(results.to_pandas())

# Save to CSV
results.to_pandas().to_csv("groq_llama3_ragas_eval.csv", index=False)


Evaluating:   0%|          | 0/120 [00:00<?, ?it/s]

Batch 1/24:   0%|          | 0/5 [00:00<?, ?it/s]

ERROR:ragas.executor:Exception raised in Job[15]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[17]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[26]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[33]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[38]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[36]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[45]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[56]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[60]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[63]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[64]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[71]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[72]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[81]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[83]: TimeoutError()
ERROR:ragas.executor:Exce

{'faithfulness': 0.8428, 'llm_context_precision_with_reference': 0.8717, 'context_recall': 0.9067}
                                           user_input  \
0    What are the risk factors associated with Tesla?   
1                            What is Tesla’s mission?   
2          What business segments does Tesla operate?   
3   What products and services are included in Tes...   
4   What products and services are included in Tes...   
5   Which electric vehicle models does Tesla manuf...   
6     What details are provided about the Cybertruck?   
7        What information is given on the Tesla Semi?   
8      What energy storage products does Tesla offer?   
9          What is Powerwall and what is it used for?   
10  What is Megapack and what applications does it...   
11    What solar energy offerings does Tesla provide?   
12           What is Solar Roof and how does it work?   
13  What proprietary lithium‑ion battery cell tech...   
14  How does Tesla describe its Full Self‑Driv