# Automotive Equity Research: A Multi-Step Agentic Workflow

<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services-demo/blob/main/examples/extract/automotive_sector_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates an end‑to‑end agentic workflow using LlamaExtract and the LlamaIndex event‑driven workflow framework for automotive sector analysis.

In this workflow, we:
1. **Extract** key financial metrics from Q2 2024 earnings reports for Tesla and Ford.
2. **Generate** a preliminary financial model summary for each company using an LLM.
3. **Cross‑reference** Tesla's metrics with Ford's data to produce a final equity research memo.
4. **Output** the memo as structured JSON.

This workflow is designed for equity research analysts and investment professionals.

In [1]:
llama_api_key = ""
openai_api_key = ""
openai_api_url = ""
project_id=""
organization_id=""

In [2]:
import os
import urllib.request

# Create the directory if it doesn't exist
os.makedirs("data/automotive_sector_analysis", exist_ok=True)

# Download the Tesla Q2 earnings PDF
tesla_url = "https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q2-2024-Update.pdf"
tesla_path = "data/automotive_sector_analysis/tesla_q2_earnings.pdf"
urllib.request.urlretrieve(tesla_url, tesla_path)

# Download the Ford Q2 earnings PDF
ford_url = "https://s205.q4cdn.com/882619693/files/doc_financials/2024/q2/Q2-2024-Ford-Earnings-Press-Release.pdf"
ford_path = "data/automotive_sector_analysis/ford_q2_earnings_press_release.pdf"
urllib.request.urlretrieve(ford_url, ford_path)

('data/automotive_sector_analysis/ford_q2_earnings_press_release.pdf',
 <http.client.HTTPMessage at 0x1fc80063230>)

## Define the Output Schema

We define a schema to represent the final equity research memo. This includes the company name, a summary of the financial model, a comparative analysis, and an overall recommendation.

In [3]:
from pydantic import BaseModel, Field, ConfigDict
from typing import Optional


class RawFinancials(BaseModel):
    revenue: Optional[float] = Field(
        None, description="Extracted revenue (in million USD)"
    )
    operating_income: Optional[float] = Field(
        None, description="Extracted operating income (in million USD)"
    )
    eps: Optional[float] = Field(None, description="Extracted earnings per share")
    # Add more metrics as needed
    model_config = ConfigDict(extra='forbid')


class InitialFinancialDataOutput(BaseModel):
    company_name: str = Field(
        ..., description="Company name as extracted from the earnings deck"
    )
    ticker: str = Field(..., description="Stock ticker symbol")
    report_date: str = Field(..., description="Date of the earnings deck/report")
    raw_financials: RawFinancials = Field(
        ..., description="Structured raw financial metrics"
    )
    narrative: Optional[str] = Field(
        None, description="Additional narrative content (if any)"
    )
    model_config = ConfigDict(extra='forbid')

In [4]:
from pydantic import BaseModel, Field
from typing import List, Dict


# Define the structured output schema for each company's financial model
class FinancialModelOutput(BaseModel):
    revenue_projection: float = Field(
        ..., description="Projected revenue for next year (in million USD)"
    )
    operating_income_projection: float = Field(
        ..., description="Projected operating income for next year (in million USD)"
    )
    growth_rate: float = Field(..., description="Expected revenue growth rate (%)")
    discount_rate: float = Field(
        ..., description="Discount rate (%) used for valuation"
    )
    terminal_growth_rate: float = Field(
        ..., description="Terminal growth rate (%) used in the model"
    )
    valuation_estimate: float = Field(
        ..., description="Estimated enterprise value (in million USD)"
    )
    key_assumptions: str = Field(
        ..., description="Key assumptions such as tax rate, CAPEX ratio, etc."
    )
    summary: str = Field(
        ..., description="A brief summary of the preliminary financial model analysis."
    )
    model_config = ConfigDict(extra='forbid')


class ComparativeAnalysisOutput(BaseModel):
    comparative_analysis: str = Field(
        ..., description="Comparative analysis between Company A and Company B"
    )
    overall_recommendation: str = Field(
        ..., description="Overall investment recommendation with rationale"
    )
    model_config = ConfigDict(extra='forbid')


# Define the final equity research memo schema, which aggregates the outputs for Company A and B
class FinalEquityResearchMemoOutput(BaseModel):
    company_a_model: FinancialModelOutput = Field(
        ..., description="Financial model summary for Company A"
    )
    company_b_model: FinancialModelOutput = Field(
        ..., description="Financial model summary for Company B"
    )
    comparative_analysis: ComparativeAnalysisOutput = Field(
        ..., description="Comparative analysis between Company A and Company B"
    )
    model_config = ConfigDict(extra='forbid')


print("FinalEquityResearchMemoOutput schema defined.")

FinalEquityResearchMemoOutput schema defined.


## Initialize the Extraction Agent

We create (or replace) an extraction agent using our automotive sector analysis schema.

In [5]:
from dotenv import load_dotenv
from llama_cloud_services import LlamaExtract
from llama_cloud.core.api_error import ApiError
from llama_cloud import ExtractConfig


llama_extract = LlamaExtract(
    project_id=project_id,
    organization_id=organization_id,
    api_key=llama_api_key
)

try:
    existing_agent = llama_extract.get_agent(name="automotive-sector-analysis")
    if existing_agent:
        llama_extract.delete_agent(existing_agent.id)
except ApiError as e:
    if e.status_code == 404:
        pass
    else:
        raise

extract_config = ExtractConfig(
    extraction_mode="BALANCED"
    # extraction_mode="MULTIMODAL"
)

agent = llama_extract.create_agent(
    name="automotive-sector-analysis",
    data_schema=InitialFinancialDataOutput,
    config=extract_config,
)
print("Automotive sector analysis extraction agent created.")

ValueError: The API key is required.

## Define the Workflow

This workflow analyzes Q2 2024 earnings reports for two major automotive companies:

- **Tesla (TSLA)**: Focus on electric vehicles, energy storage, and regulatory credits
- **Ford (F)**: Traditional automotive manufacturer with growing EV segment

Key metrics extracted and analyzed:
- Revenue and revenue projections
- Operating income
- Growth rates
- Valuation estimates
- Key business segment performance

In this workflow, the steps are:
1. **parse_transcript:** Extract text (with page citations) from the earnings call transcript PDF.
2. **load_modeling_data:** Load financial modeling assumptions from a text file.
3. **generate_financial_model:** Generate a preliminary financial model summary using an LLM.
4. **load_comparable_data:** **Extract** comparable financial metrics from a PDF file (Company B).
5. **cross_reference:** Compare Company A’s metrics with Company B’s data using the LLM.
6. **output_final_memo:** Assemble the final equity research memo and output it as JSON.

In [6]:
from openai import AsyncAzureOpenAI
import os
load_dotenv(override=True)

def create_response_format(schema):
    response_format = {
        "type": "json_schema",
        "json_schema": {
            "name": "json_output_schema",
            "schema": {**schema},
            "strict": True
        }
    }
    return response_format

class AzureChatLLM:
    def __init__(self, deployment_name: str, api_version: str = "2025-01-01-preview"):
        self.client = AsyncAzureOpenAI(
            api_key=openai_api_key,
            azure_endpoint=openai_api_url,
            api_version=api_version,
        )
        self.deployment_name = deployment_name

    async def astructured_predict(self, output_class, prompt_template, **kwargs):
        user_message = prompt_template.format(**kwargs)
        messages = [{"role": "user", "content": user_message}]

        response = await self.client.chat.completions.create(
            model=self.deployment_name,
            messages=messages,
            response_format=create_response_format(output_class.model_json_schema())
            # response_format=output_class
        )

        content = response.choices[0].message.content
        return output_class.model_validate_json(content)


In [21]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
from llama_index.llms.openai import OpenAI
from llama_index.core.llms.llm import LLM
from llama_index.core.prompts import ChatPromptTemplate


# Define custom events for each step
class DeckAParseEvent(Event):
    deck_content: InitialFinancialDataOutput


class DeckBParseEvent(Event):
    deck_content: InitialFinancialDataOutput


class CompanyModelEvent(Event):
    model_output: FinancialModelOutput


class ComparableDataLoadEvent(Event):
    company_a_output: FinancialModelOutput
    company_b_output: FinancialModelOutput


class LogEvent(Event):
    msg: str
    delta: bool = False


class AutomotiveSectorAnalysisWorkflow(Workflow):
    """
    Workflow to generate an equity research memo for automotive sector analysis.
    """

    def __init__(
        self,
        agent: LlamaExtract,
        modeling_path: str,
        llm: Optional[LLM] = None,
        **kwargs
    ):
        super().__init__(**kwargs)
        self.agent = agent
        self.llm = llm or AzureChatLLM(deployment_name="gpt-4.1")
        # Load financial modeling assumptions from file
        with open(modeling_path, "r") as f:
            self.modeling_data = f.read()
        # Instead of loading comparable data from a text file, we load from a PDF

    async def _parse_deck(self, ctx: Context, deck_path) -> InitialFinancialDataOutput:
        extraction_result = await self.agent.aextract(deck_path)
        initial_output = extraction_result.data  # expected to be a string
        ctx.write_event_to_stream(LogEvent(msg="Transcript parsed successfully."))
        return initial_output

    @step
    async def parse_deck_a(self, ctx: Context, ev: StartEvent) -> DeckAParseEvent:
        initial_output = await self._parse_deck(ctx, ev.deck_path_a)
        await ctx.set("initial_output_a", initial_output)
        return DeckAParseEvent(deck_content=initial_output)

    @step
    async def parse_deck_b(self, ctx: Context, ev: StartEvent) -> DeckBParseEvent:
        initial_output = await self._parse_deck(ctx, ev.deck_path_b)
        await ctx.set("initial_output_b", initial_output)
        return DeckBParseEvent(deck_content=initial_output)

    async def _generate_financial_model(
        self, ctx: Context, financial_data: InitialFinancialDataOutput
    ) -> FinancialModelOutput:
        prompt_str = """
    You are an expert financial analyst.
    Using the following raw financial data from an earnings deck and financial modeling assumptions,
    refine the data to produce a financial model summary. Adjust the assumptions based on the company-specific context.
    Please use the most recent quarter's financial data from the earnings deck.

    Raw Financial Data:
    {raw_data}
    Financial Modeling Assumptions:
    {assumptions}

    Return your output as JSON conforming to the FinancialModelOutput schema.
    You MUST make sure all fields are filled in the output JSON.

    """
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])
        refined_model = await self.llm.astructured_predict(
            FinancialModelOutput,
            prompt,
            raw_data=financial_data.model_dump_json(),
            assumptions=self.modeling_data,
        )
        return refined_model

    @step
    async def refine_financial_model_company_a(
        self, ctx: Context, ev: DeckAParseEvent
    ) -> CompanyModelEvent:
        print("deck content A", ev.deck_content)
        refined_model = await self._generate_financial_model(ctx, ev.deck_content)
        print("refined_model A", refined_model)
        print(type(refined_model))
        await ctx.set("CompanyAModelEvent", refined_model)
        return CompanyModelEvent(model_output=refined_model)

    @step
    async def refine_financial_model_company_b(
        self, ctx: Context, ev: DeckBParseEvent
    ) -> CompanyModelEvent:
        print("deck content B", ev.deck_content)
        refined_model = await self._generate_financial_model(ctx, ev.deck_content)
        print("refined_model B", refined_model)
        print(type(refined_model))
        await ctx.set("CompanyBModelEvent", refined_model)
        return CompanyModelEvent(model_output=refined_model)

    @step
    async def cross_reference_models(
        self, ctx: Context, ev: CompanyModelEvent
    ) -> StopEvent:
        # Assume CompanyAModelEvent and CompanyBModelEvent are stored in the context
        company_a_model = await ctx.get("CompanyAModelEvent", default=None)
        company_b_model = await ctx.get("CompanyBModelEvent", default=None)
        if company_a_model is None or company_b_model is None:
            return

        prompt_str = """
    You are an expert investment analyst.
    Compare the following refined financial models for Company A and Company B.
    Based on this comparison, provide a specific investment recommendation for Tesla (Company A).
    Focus your analysis on:
    1. Key differences in revenue projections, operating income, and growth rates
    2. Valuation estimates and their implications
    3. Clear recommendation for Tesla with supporting rationale
    Return your analysis as plain text.
    Company A Model:
    {company_a_model}
    Company B Model:
    {company_b_model}
    """
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])
        comp_analysis = await self.llm.astructured_predict(
            ComparativeAnalysisOutput,
            prompt,
            company_a_model=company_a_model.model_dump_json(),
            company_b_model=company_b_model.model_dump_json(),
        )
        final_memo = FinalEquityResearchMemoOutput(
            company_a_model=company_a_model,
            company_b_model=company_b_model,
            comparative_analysis=comp_analysis,
        )
        return StopEvent(result={"memo": final_memo})

## Running the Workflow

Now we run the workflow with the pre-loaded modeling assumptions and the deck from both companies.

In [22]:
import nest_asyncio

nest_asyncio.apply()

In [23]:
modeling_path = "./data/automotive_sector_analysis/modeling_assumptions.txt"
workflow = AutomotiveSectorAnalysisWorkflow(
    agent=agent, modeling_path=modeling_path, verbose=True, timeout=240
)

#### Visualize the Workflow

![](data/automotive_sector_analysis/workflow_img.png)

In [30]:
result = await workflow.run(
    deck_path_a=deck_path_a,
    deck_path_b=deck_path_b,
)
final_memo = result["memo"]
print("\n********Final Equity Research Memo:********\n", final_memo)

Running step parse_deck_a


Uploading files:   0%|                                                                           | 0/1 [00:00<?, ?it/s]

Running step parse_deck_b



Uploading files:   0%|                                                                           | 0/1 [00:00<?, ?it/s][A
Uploading files:   0%|                                                                           | 0/1 [00:00<?, ?it/s]


WorkflowRuntimeError: Error in step 'parse_deck_a': Use SourceText to provide filename when uploading bytes or file-like objects.

In [15]:
final_memo.comparative_analysis

ComparativeAnalysisOutput(comparative_analysis="1. Revenue Projections, Operating Income, and Growth Rates:\n- Tesla (Company A) has a projected next-year revenue of $28.05B and operating income of $1.76B, with a 10% annual revenue growth rate for the next 5 years. Ford (Company B) projects a notably higher next-year revenue at $52.58B and operating income of $3.08B, also with 10% annual growth.\n- Both companies assume similar stabilization for operating margins and growth rates (10% near-term; 5% longer-term).\n- Tesla operates at a smaller revenue base but higher growth in some segments (notably Energy Storage), while Ford's higher revenue/operating income reflects its larger legacy business but possibly slower long-term transformation.\n\n2. Valuation Estimates and Implications:\n- Tesla’s model yields a valuation estimate of $795B, far exceeding Ford's $60B valuation (even though Ford’s revenue and operating income are higher in nominal terms).\n- This results in a much higher val