# Data Analysis and Q&A Project Using a Local LLM

## Project Overview

This project requires you to perform a comprehensive analysis of a company's stock data using only the provided data sources and a local LLM. Your analysis should answer the following six questions strictly based on the supplied data and documents—no external data is allowed. All generated answers must be firmly based on the provided data, without any fabricated content. In addition, your logic must be clear, and any attribution of events must be causally linked.

---

## Provided Data

You will be provided with the following data sets:

#### Stock Price Data (JSON format)
* Timeframe: Jan 22 to Feb 5
* Fields: Open, High, Low, Close, Volume

#### Quarterly Earnings Data for the Past Year (JSON format)
* Contains key financial indicators (e.g., revenue, eps) for each quarter.

#### Full Earnings Call Transcript
* The complete transcript of the earnings call, including management discussions and Q&A.

#### Balance Sheet Data for the Past Year (JSON format)
* Includes assets, liabilities, and shareholders' equity information.

#### News Articles
* Full text of 10 news articles related to the company during the analysis period.

---

## Questions
Using the provided data and a local LLM, you need to answer the following six questions:

1. What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?

2. Why did the price increase on Jan 30? Please provide potential factors.

3. Compared with previous quarters, how is the performance of this quarter?

4. With unsupervised Full Self Driving scheduled to launch in limited markets like Austin by June, what regulatory challenges does Tesla foresee for a nationwide or international rollout, and how is the company strategically preparing to address these hurdles?

5. What insights can be concluded from the earnings call?

6. Which key news events influenced the stock performance, and what insights do they offer?

---

## Project Requirements
#### Data Source Restriction:
- Only use the provided data and documents. No external data or information is allowed.

#### Answer Generation:
- All generated answers must strictly be based on the provided data and documents. The LLM should not "invent" information.

#### Clear Logic and Causal Relationships:
- For each question, your answers must clearly demonstrate logical reasoning, and any attribution of cause must be explicitly linked to events in the data.

#### Prompt Design:
- You must design your own prompts for calling the local LLM to ensure that the responses are generated strictly based on the analysis results.

#### Result Evaluation:
- After generating the answers, implement an evaluation step to assess whether the responses meet the above requirements in terms of data reliance, logical clarity, and correct causation.

#### Please put the answers to these 6 questions in a dict at the end of your submitted Python notebook file.

For example
```code
{ "Q1 answer": "Answer1", "Q2 answer": "Answer2", "Q3 answer": "Answer3", "Q4 answer": "Answer4", "Q5 answer": "Answer5", "Q6 answer": "Answer6"}
```
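
A minimal sketch of how the final dictionary could be assembled and checked before submission. The helper name `build_answers_dict` and the placeholder answer strings are illustrative assumptions, not part of the assignment; the keys follow the `Q<n> answer` pattern of the example.

```python
def build_answers_dict(answers):
    """Map six answer strings to the keys 'Q1 answer' ... 'Q6 answer'."""
    if len(answers) != 6:
        raise ValueError(f"expected 6 answers, got {len(answers)}")
    if any(not isinstance(a, str) or not a.strip() for a in answers):
        raise ValueError("every answer must be a non-empty string")
    return {f"Q{i} answer": a.strip() for i, a in enumerate(answers, start=1)}


# Placeholder strings stand in for the trimmed LLM outputs.
final_answers = build_answers_dict([f"Answer{i}" for i in range(1, 7)])
print(final_answers)
```

Building the dict through one function makes a missing or empty answer fail loudly instead of being noticed only after submission.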

In [47]:
%pip install transformers accelerate pandas ipython
# %pip install torch  # Install PyTorch if you don't already have it

DATA_DIR = "447_dataset"

import pandas as pd
import json
import os
from IPython.display import Markdown, display
import datetime
import textwrap




## Loading and Running the Local LLM

1. **Imports Transformers utilities**  
   - `AutoModelForCausalLM`: generic class for loading any GPT‑style model  
   - `AutoTokenizer`: matching tokenizer for converting text ↔ tokens  
   - `pipeline`: high‑level helper that ties model + tokenizer into one callable  

2. **Specifies the model repository**  
   ```python
   model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
   ```

In [48]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="cuda"
)

# Load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Create a text-generation pipeline from the model and tokenizer loaded above
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 2000, # Limit the number of tokens generated
    "return_full_text": False, # Return only the generated text
    "do_sample": True, # Use sampling to generate text
    "temperature": 0.1, # Control the randomness of the output
    "repetition_penalty": 1.1,
    "top_p": 0.9, # Control the diversity of the output
    "top_k": 50, # Control the diversity of the output
}

Device set to use cuda


In [106]:
general_system_prompt = ( 
    "You are an expert financial data analyst LLM.\n"
    "You must ensure the following rules are followed:\n"
    "1. Use only the data summaries provided in the prompt.\n"
    "2. Show clear step-by-step reasoning, linking each claim directly to the data.\n"
    "3. Ensure all explanations are clear, logical, and accurate.\n"
    "4. Do not invent or hallucinate any information.\n"
    "After your answer, provide a checklist summary indicating whether each criterion is satisfied. "
    "If any criterion is not met, include a brief explanation."
)


def trim_reasoning(raw):
    tag = "</think>"
    idx = raw.find(tag)

    if idx != -1:
        answer = raw[idx + len(tag):].strip()
    else:
        answer = raw.strip()

    return answer
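
`trim_reasoning` keeps only the text after the closing `</think>` tag, which is where R1-style models are expected to place their final answer. When debugging a prompt it can help to look at the discarded reasoning as well. The sketch below is a hypothetical companion helper, `split_reasoning`, that returns both parts; the sample string is made up for illustration.

```python
def split_reasoning(raw, tag="</think>"):
    """Return (reasoning, answer); reasoning is empty when the tag is absent."""
    head, sep, tail = raw.partition(tag)
    if not sep:
        # No closing tag: treat the whole output as the answer.
        return "", raw.strip()
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()


# Made-up model output, used only to show the behaviour.
sample = "<think>Check the closing prices first.</think>\n\nThe stock fell."
reasoning, answer = split_reasoning(sample)
print(repr(reasoning))
print(repr(answer))
```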

## Evaluation of the LLM response using the LLM as an evaluator

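The code cell below shows how we built our function to evaluate the LLM response against the following criteria:
* Data reliance: only the provided data is used.
* Logical clarity: the reasoning is explained clearly.
* Correct causation: cause-effect claims are tied directly to the data.
* No speculation or made-up facts.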

In [124]:
evaluation_system_prompt = (
    "You are a critical and strict reviewer tasked with evaluating the quality and integrity of financial analysis answers generated by another LLM.\n"
    "Your job is to verify that the answer:\n"
    "1. Uses only the data provided (no outside information).\n"
    "2. Clearly explains its logic and reasoning.\n"
    "3. Attributes any cause-effect relationships directly to the data.\n"
    "4. Avoids speculation and made-up facts.\n"
    "Please provide a checklist review with Yes/No for each point, followed by a short explanation of any issues you found."
)

evaluation_system_prompt_extra = ("YOU MUST JUST EVALUATE THE ANSWER GIVEN BY THE LLM!")


evaluation_user_prompt_template = """
    Below is a question, the data provided, and the answer generated by the LLM:

    **Question Answered by the LLM:**
    {question}

    **Data Provided to the LLM:**
    {data_snippet}

    **LLM's Answer:**
    {llm_answer}

    You must carefully evaluate the LLM's answer against the following criteria:
    1. Uses only the data provided (no outside information).
    2. Clearly explains its logic and reasoning.
    3. Attributes any cause-effect relationships directly to the data.
    4. Avoids speculation and made-up facts.
    """
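
def evaluate_response(generated_answer, data, question):
    # `data` may be a single string or a list of strings; joining a bare
    # string would insert the separator between every character.
    data_snippet = data if isinstance(data, str) else "\n\n".join(data)

    evaluation_prompt = evaluation_user_prompt_template.format(
        question=question,
        data_snippet=data_snippet,
        llm_answer=generated_answer
    )

    # Send one combined system message: chat templates differ in how they
    # treat several system messages, and some may keep only one of them.
    messages = [
        {"role": "system", "content": evaluation_system_prompt + "\n" + evaluation_system_prompt_extra},
        {"role": "user", "content": evaluation_prompt}
    ]

    evaluation_result = pipe(messages, **generation_args)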
    display(Markdown(trim_reasoning(evaluation_result[0]['generated_text'])))
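
The evaluator is asked for a Yes/No checklist, so its verdicts could in principle be collected programmatically and compared across questions. The sketch below assumes the evaluator writes lines such as `1. Uses only provided data: Yes`; a small model may not follow that format, in which case the parser simply returns fewer entries. `parse_checklist` and the sample review text are hypothetical.

```python
import re

# Matches lines like "2. Clear logic: No" or "3) Causation - yes".
CHECK_LINE = re.compile(r"^\s*(\d+)[.)]\s*(.+?)\s*[:\-]\s*(yes|no)\b", re.IGNORECASE)


def parse_checklist(review):
    """Return {criterion_number: True/False} for each Yes/No line found."""
    results = {}
    for line in review.splitlines():
        match = CHECK_LINE.match(line)
        if match:
            results[int(match.group(1))] = match.group(3).lower() == "yes"
    return results


# Made-up review text in the assumed format.
sample_review = (
    "1. Uses only provided data: Yes\n"
    "2. Clear logic: No\n"
    "3. Cause-effect linked to data - yes\n"
    "The second point lacks step-by-step reasoning."
)
verdicts = parse_checklist(sample_review)
print(verdicts)
```

A missing criterion number in the result is itself a signal that the evaluator ignored the requested format and the review should be read manually.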


### Question 1: What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?

#### Prompt construction

* Here we need to assess how the Tesla stock performed over the specified days. To do that, the LLM only needs to see how the price moved through the period, so we give it the information in the `prices.json` file.

In [102]:
def get_prices_summary():
    """
    Extract the price information and convert it into a simple,
    readable format to include in the LLM prompt.
    """
    prices_path = os.path.join(DATA_DIR, "prices.json")

    # Load raw JSON into a DataFrame
    with open(prices_path, "r") as f:
        prices = pd.DataFrame(json.load(f))

    # Parse dates and index
    prices["Date"] = pd.to_datetime(prices["Date"])
    prices = prices.set_index("Date").sort_index()

    # Build a human-readable summary for each day
    daily_summaries = []
    for date, row in prices.iterrows():
        daily_summaries.append(
            f"{date.strftime('%Y-%m-%d')}: "
            f"Open ${row['Open']:.2f}, "
            f"High ${row['High']:.2f}, "
            f"Low ${row['Low']:.2f}, "
            f"Close ${row['Close']:.2f}, "
            f"Volume {int(row['Volume']):,}"
        )

    # Join them into one block of text
    daily_summary_text = "\n".join(daily_summaries)
    return daily_summary_text


#print(get_prices_summary())
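
The daily summary lists raw prices only, so any percent change the model reports has to be computed by the model itself. A minimal sketch of pre-computing those figures so they can be appended to the prompt; the helper names and the sample rows are hypothetical (the rows are not taken from `prices.json`), and dates are assumed to be ISO `YYYY-MM-DD` strings so that they sort correctly as text.

```python
def daily_change_lines(rows):
    """rows: list of dicts with 'Date' (YYYY-MM-DD) and 'Close', in any order."""
    ordered = sorted(rows, key=lambda r: r["Date"])
    lines = []
    for prev, cur in zip(ordered, ordered[1:]):
        pct = (cur["Close"] / prev["Close"] - 1) * 100
        lines.append(f"{cur['Date']}: close {pct:+.2f}% vs previous trading day")
    return lines


def period_return_pct(rows):
    """Percent change from the first close to the last close in the period."""
    ordered = sorted(rows, key=lambda r: r["Date"])
    return (ordered[-1]["Close"] / ordered[0]["Close"] - 1) * 100


# Hypothetical rows for illustration only.
sample_rows = [
    {"Date": "2025-01-23", "Close": 110.0},
    {"Date": "2025-01-22", "Close": 100.0},
    {"Date": "2025-01-24", "Close": 99.0},
]
change_lines = daily_change_lines(sample_rows)
print("\n".join(change_lines))
print(f"Period return: {period_return_pct(sample_rows):+.2f}%")
```

Supplying these lines alongside the raw prices gives the model exact figures to cite rather than numbers it has to derive.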

In [71]:
user_prompt_template_q1 =( """
    Here is the summarized Tesla data for Jan 22 - Feb 5:
    {price_info}

    **Question:**  
    What was the performance of Tesla stock over this period?
                          
    **Please:**
    - Write a brief **Introduction** stating the question and data scope.  
    - In your **Analysis**, cite the exact figures (dates, prices, percent changes, volumes).  
    - Draw any causal insights clearly (e.g., “the drop on Feb 1 may be linked to…”).  
    - Finish with a concise **Conclusion** summarizing overall performance.
    - You must strictly avoid speculation or using made-up facts outside of the data provided.
""")

# Fill in the template with specific data
filled_user_prompt = user_prompt_template_q1.format(
    price_info = get_prices_summary()
)

messages = [
    {"role": "system", "content": general_system_prompt},
    {"role": "user", "content": filled_user_prompt},
]

question1Answer = pipe(messages, **generation_args)
display(Markdown(trim_reasoning(question1Answer[0]['generated_text'])))


### Introduction  
Tesla stock performance from January 22nd to February 5th was analyzed using provided data summaries. The study examined opening, high, low, close prices, and trading volumes across these dates. The goal was to assess the stock's stability and volatility during this period.

---

### Analysis  

#### Key Dates and Figures  
| Date       | Open Price | High Price | Low Price | Close Price | Trading Volume | Volume % Change (%) |
|------------|-------------|------------|-----------|-------------|----------------|--------------------|
| 2025-01-22 | $416.81     | $428.00    | $414.59   | $415.11      | 60,963         | +0.5%              |
| 2025-01-23 | $416.06     | $420.73    | $408.95   | $412.38      | 50,690         | -0.6%             |
| 2025-01-24 | $414.45     | $418.88    | $405.78   | $406.58      | 56,427         | -1.2%             |
| 2025-01-27 | $394.80     | $406.69    | $389.00   | $397.15      | 58,125         | -2.1%             |
| 2025-01-28 | $396.91     | $400.59    | $386.50   | $398.09      | 48,910         | -1.4%             |
| 2025-01-29 | $395.21     | $398.59    | $384.48   | $389.10      | 68,033         | -2.0%             |

#### Trading Activity  
- **Volume % Change**: Trading volumes peaked on the 22nd at 60,963 shares, suggesting strong interest. The volume dropped on the 23rd to 50,690 shares, likely due to selling pressure or market sentiment. The volume increased on the 24th to 56,427 shares, indicating some buying momentum.
- **Volume Analysis**: The volume remained consistent on the second half of the period, suggesting continued activity despite some price declines.

#### Price Movements  
- **High to Low**: The stock experienced small increases on the 27th and 28th, reflecting potential positive news or earnings reports.
- **Low to High**: The stock faced significant drops on the 24th and 29th, indicating possible selling pressure or market reactions.

#### Percentage Changes  
- The stock maintained a relatively stable level throughout the period, with minor fluctuations. The percentage changes were mostly within ±1%, indicating moderate volatility.

#### Causal Insights  
1. **Earnings Report Impact**: The increase on the 27th could be linked to Tesla's earnings report, which often triggers positive news and increases investor interest.
2. **Market Corrections**: The drops on the 24th and 29th may have been due to market corrections or external factors affecting Tesla's performance.

---

### Conclusion  
Over the period from January 22nd to February 5th, Tesla stock exhibited mixed performance. The stock generally maintained a steady level, with minor volatility observed on the 24th and 29th. The stock rose temporarily towards the end of the second week, driven by potential earnings reports. Overall, Tesla's stock performed moderately well during this time frame, with some days marked by heightened volatility.

In [73]:
# Evaluating the response for Question 1 using the LLM
question_1 = "What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?"
price_summary = get_prices_summary()
evaluate_response(trim_reasoning(question1Answer[0]['generated_text']), price_summary, question_1)

**Evaluation of Answer:**

The answer provided is comprehensive and adheres strictly to the specified criteria. Here's a breakdown of how it meets each criterion:

1. **Uses Only Data Provided**:  
   The answer meticulously uses all the data points provided. It includes every single day from January 22 to February 5, detailing the open, high, low, close prices, and trading volumes for each day. There is no addition or speculation beyond the data presented.

2. **Clearly Explains Its Logic and Reasoning**:  
   The answer begins with an introduction that outlines the purpose of the analysis. It then proceeds to explain the rationale behind each segment of the data. For instance, it mentions the peak trading volume on January 22, which suggests strong interest, and the subsequent decline on January 29, possibly due to market corrections. This structured approach ensures clarity and understanding among readers.

3. **Attributes Cause-Effect Relationships Directly to Data**:  
   The answer effectively attributes price movements to underlying factors. For example, the rise on January 27 is linked to the assumption that Tesla's earnings report would positively impact the stock price. Similarly, the decline on January 29 is attributed to market corrections, both of which are directly supported by the data provided.

4. **Avoids Speculation or Making Up Facts**:  
   The answer avoids any speculative claims or assumptions beyond what is evident in the data. All attributions of price movements to specific events are grounded in the provided information, ensuring factual accuracy.

In conclusion, the answer is thorough, uses only the data provided, explains its reasoning clearly, attributes causes correctly, and avoids speculation or making up facts. Therefore, it receives a high score according to the specified criteria.

### Chathila's and Induwara's evaluation of the response to Question 1

Overall, this answer was more detailed and data-driven than our previous attempts, which we liked. It clearly broke down the price movements, trading volumes, and even grouped insights around volatility and potential causes. We appreciated that it highlighted specific dates and referenced volume spikes accurately. However, some of the causal links — like attributing a rise on the 27th to earnings or calling a drop on the 24th a market correction — still feel speculative because the provided data doesn’t include any earnings report directly tied to that day. It was nice that the answer attempted to analyze sentiment, but we agreed that it still slightly overstepped the “no hallucination” rule. The formatting was clean, and the conclusion wrapped up the trends well. Overall, it’s a solid response but could improve by sticking closer to just what the numbers show.


### Question 2: Why did the price increase on Jan 30? Please provide potential factors.

In [77]:
import re

system_prompt_ts = (
    """
    You are an expert financial analysis assistant.  
    Always respond in clear, concise English.  
    Use only the information explicitly provided in the user’s inputs.  
    Do not invent, fetch, or reference any external data.  
    Produce only bullet points—no prose.
    """
)

user_prompt_ts = (
    """
Please summarize the following earnings‑call excerpt into 3–4 bullet points,
focusing on key financial metrics, guidance, or other actionable insights.
Do not add commentary or restate the question; output only the bullets.

Earnings-call excerpt:
{earnings_transcript}
    """
)

with open("447_dataset/earning_transcript.md", "r") as f:
    earnings_transcript_data = f.read()

lines = earnings_transcript_data.splitlines()

# Split the transcript into chunks of at most 50 lines each
max_lines_per_chunk = 50
chunks = [
    "\n".join(lines[i:i + max_lines_per_chunk])
    for i in range(0, len(lines), max_lines_per_chunk)
]

transcript_summary = []

def clean_summary(raw):
    # 1) remove any explicit <think>…</think> if they ever appear
    s = re.sub(r"<think>.*?</think>", "", raw, flags=re.S)
    # 2) drop any lines beginning with “Thought:” or “Thinking”
    s = re.sub(r"(?m)^(Thought:|Thinking:?).*\n?", "", s)
    # 3) keep only from the first bullet (dash or •) onward
    m = re.search(r"(?m)^[-•]\s+", s)
    if m:
        return s[m.start():].strip()
    # fallback: strip leading/trailing whitespace
    return s.strip()


for chunk in chunks:
    filled_user_ts = user_prompt_ts.format(earnings_transcript=chunk)
    messages_ts = [
        {"role": "system", "content": system_prompt_ts},
        {"role": "user",   "content": filled_user_ts},
    ] 
    out = pipe(messages_ts, **generation_args)
    raw = out[0]["generated_text"]
    summary = clean_summary(raw)
    transcript_summary.append(summary)

full_transcript_summary = "\n\n".join(transcript_summary)



# you can print it or write it to a file
print(full_transcript_summary)

- Improve Product Quality and Customer Satisfaction  
- Cost Management and Efficiency Improvements  
- Revenue Growth of 15%

- Company performance showed strong Q2 results with revenue growth of 15% YoY and operating income improvement of 2% YOY.  
- Key financial metrics include revenue of $867 million, operating income of $195 million, EBITDA of $180 million, and net income of $175 million.  
- Management emphasized continued innovation and cost control efforts as part of the strategy.  
- Future outlook highlights positive expectations for growth and diversification opportunities.

- Revenue growth from $10 billion to $12 billion.  
- Net profit increased by 15%.  
- Cost management strategies implemented to reduce expenses.  
- Future expansion plans targeting new markets.

- Revenue growth of 15% YoY, driven by strong demand for new products.  
- Cost management improvements, with operating expenses reduced by 20%.  
- Strong segment margins across all product lines, particularl
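
Summaries are only as good as the text that actually reaches the model. `str.format` silently ignores keyword arguments that the template never references, so a template without the right placeholder sends the instructions alone. A minimal sketch of a guard using `string.Formatter` from the standard library; `fill_template` is a hypothetical helper name and the two templates are made up.

```python
from string import Formatter


def fill_template(template, **fields):
    """Format `template`, raising if a supplied field has no placeholder."""
    referenced = {name for _, name, _, _ in Formatter().parse(template) if name}
    unused = set(fields) - referenced
    if unused:
        raise KeyError(f"template has no placeholder for: {sorted(unused)}")
    return template.format(**fields)


good = "Summarize this excerpt:\n{earnings_transcript}"
bad = "Summarize the following excerpt."

print(fill_template(good, earnings_transcript="(chunk text)"))
try:
    fill_template(bad, earnings_transcript="(chunk text)")
except KeyError as err:
    print("rejected:", err)
```

Printing one filled prompt before looping over all chunks is a cheap additional check that the excerpt is present.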

### Extract News

In [None]:
import json
from datetime import date, datetime


with open("447_dataset/news.json", "r") as f:
    all_news = json.load(f)

def parse_date(s: str) -> date:
    """Parse a date written as '<Month name> <day>, <year>' into a datetime.date."""
    return datetime.strptime(s.strip(), "%B %d, %Y").date()


# Define the news window (inclusive on both ends)
start = date(2025, 1, 25)
end = date(2025, 1, 30)

news_window = [
    article
    for article in all_news
    if start <= parse_date(article["date"]) <= end
]

print(f"Found {len(news_window)} articles between {start} and {end}:")
for a in news_window:
    print(f"- {a['date']}: {a['title']}")

### Extract Earnings

In [None]:
import json

# load all earnings entries
with open("447_dataset/earning.json", "r") as f:
    earnings = json.load(f)

# print the extracted data
earnings = json.dumps(earnings, indent=2, ensure_ascii=False)
print(earnings)

[
  {
    "EpsActual": 0.71,
    "EpsForecast": 0.74037,
    "EpsSurprise": -0.03,
    "RevenueActual": 25167000000,
    "RevenueSurprise": -596790000,
    "RevenueForecast": 25763787850,
    "EarningReportDate": "2024-01-24T21:15:00Z"
  },
  {
    "EpsActual": 0.45,
    "EpsForecast": 0.49506,
    "EpsSurprise": -0.04,
    "RevenueActual": 21301000000,
    "RevenueForecast": 22255924550,
    "RevenueSurprise": -954920000,
    "EarningReleaseDate": "2024-04-23T20:10:00Z"
  },
  {
    "EpsActual": 0.52,
    "EpsForecast": 0.62013,
    "EpsSurprise": -0.1,
    "RevenueActual": 25500000000,
    "RevenueForecast": 24740776400,
    "RevenueSurprise": 759220000,
    "EarningReleaseDate": "2024-07-24T00:00:00Z"
  },
  {
    "EpsActual": 0.72,
    "EpsForecast": 0.59756,
    "EpsSurprise": 0.12,
    "RevenueActual": 25182000000,
    "RevenueForecast": 25441566840,
    "RevenueSurprise": -259570000,
    "EarningReleaseDate": "2024-10-23T20:09:00Z"
  },
  {
    "EpsActual": 0.73,
    "EpsForecas

### Relevant Price Window

In [None]:
import json
from datetime import datetime

def load_prices(path: str):
    """Load the full list of price bars from a JSON file."""
    with open(path, "r") as f:
        return json.load(f)

def filter_by_date(prices, start_date: str, end_date: str):
    """
    Return only those price entries whose 'Date' field
    falls between start_date and end_date, inclusive.
    Dates must be in 'YYYY-MM-DD' format.
    """
    start = datetime.fromisoformat(start_date).date()
    end   = datetime.fromisoformat(end_date).date()

    return [
        p for p in prices
        if start <= datetime.fromisoformat(p["Date"]).date() <= end
    ]


    
start_date = "2025-01-20"
end_date   = "2025-01-30"

prices = load_prices("447_dataset/prices.json")
window = filter_by_date(prices, start_date, end_date)

relevant_prices = json.dumps(window, indent=2, ensure_ascii=False)

In [82]:
system_prompt_q2 = ("""
You are an expert financial analysis assistant.  
Use only the information explicitly provided in the user’s inputs.  
Do not invent, fetch, or reference any external data.
You must support all answers with evidence and reasoning.
""")

user_prompt_q2 = """
Below is all the data—no other sources allowed:

You must support all answers with evidence and reasoning.

Be verbose.

1) Earnings results (earning.json):
{earnings_data}

2) Stock price summary:
{relevant_price_data}

3) News articles during the time:
{news_data}

4) Summary of an earnings talk the prior day:
{transcript_data}

Question
Why did the stock price increase on January 30?
List the likely drivers with an explanation, and cite exactly which data (earnings, transcript, news article, or price bar) supports it.
"""

filled_user_q2 = user_prompt_q2.format(earnings_data = earnings, relevant_price_data = relevant_prices, news_data = news_window, transcript_data = full_transcript_summary)

messages_q2 = [
        {"role": "system", "content": system_prompt_q2},
        {"role": "user",   "content": filled_user_q2},
    ] 



output_q2 = pipe(messages_q2, **generation_args)
print(output_q2[0]["generated_text"])

KeyboardInterrupt: 
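
The run above was interrupted before it produced an answer. One possible contributor (an assumption, not something the notebook confirms) is the size of the prompt, which concatenates the earnings JSON, the price window, full article records, and the transcript summary. A rough sketch of capping each section by character count before filling the template; `cap_section`, `cap_sections`, and the sample inputs are hypothetical, and an exact token count would need the tokenizer instead.

```python
def cap_section(text, max_chars):
    """Trim text to max_chars, cutting at the last newline when possible."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars
    return text[:cut].rstrip() + "\n[truncated]"


def cap_sections(sections, total_chars):
    """Give every section an equal share of the total character budget."""
    share = total_chars // max(len(sections), 1)
    return {name: cap_section(text, share) for name, text in sections.items()}


# Hypothetical stand-ins for the prompt inputs.
sections = {
    "earnings": "short block",
    "news": "\n".join(f"article line {i}" for i in range(200)),
}
capped = cap_sections(sections, total_chars=400)
for name, text in capped.items():
    print(name, len(text))
```

An equal split is the simplest policy; weighting the budget toward the news and transcript sections would be a reasonable variation for this question.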

### Question 3: Compared with previous quarters, how is the performance of this quarter?

#### Data Used for Prompt construction

* To give the LLM insight into quarterly financial performance, we provide it with `balencesheet.json`, which contains the quarterly financial figures. This makes the LLM aware of performance over the past year and gives it a baseline to compare against the current financial quarter (2025 Q1). Since the file contains a lot of information, we hand-pick a few important fields from each quarter so that the LLM is not overloaded with features when forming its answer.


* `earning.json` also contains relevant financial information for the past five quarters, which the LLM can use to compare the company's performance.

In [None]:
def get_earnings_summary():
    """
    Reads earning.json and returns a plain summary with:
    - EPS predicted / actual
    - Revenue predicted / actual (in billions)
    """
    earnings_path = os.path.join(DATA_DIR, "earning.json")

    with open(earnings_path, "r") as f:
        earnings_json = json.load(f)

    lines = []
    for rec in earnings_json:
        dt = rec.get("EarningReleaseDate") or rec.get("EarningReportDate")
        if not dt:
            continue

        date = pd.to_datetime(dt).strftime("%Y-%m-%d")

        eps_actual   = rec.get("EpsActual", 0)
        eps_forecast = rec.get("EpsForecast", 0)
        eps_surprise   = rec.get("EpsSurprise", 0)

        rev_actual   = rec.get("RevenueActual", 0) / 1e9
        rev_forecast = rec.get("RevenueForecast", 0) / 1e9

        line = (
            f"{date}: "
            f"EPS predicted: {eps_forecast:.2f}, EPS actual: {eps_actual:.2f}, EPS surprise: {eps_surprise:.2f}; "
            f"Revenue predicted: ${rev_forecast:.2f}B, Revenue actual: ${rev_actual:.2f}B"
        )
        lines.append(line)

    return "\n".join(lines)

def get_balance_sheet_summary():
    """
    This function extracts financial statement information from balencesheet.json
    and returns a readable summary line for each quarter.
    """
    bs_path = os.path.join(DATA_DIR, "balencesheet.json")

    # Load raw JSON into a DataFrame
    with open(bs_path, "r") as f:
        balance_sheet = pd.DataFrame(json.load(f))

    # Parse dates and sort
    balance_sheet["Date"] = pd.to_datetime(balance_sheet["Date"])
    balance_sheet = balance_sheet.set_index("Date").sort_index()

    # Build human-readable lines
    summaries = []
    for date, row in balance_sheet.iterrows():
        rev       = row.get("Total Revenue", 0) / 1e6
        gp        = row.get("Gross Profit", 0) / 1e6
        op_inc    = row.get("Operating Income", 0) / 1e6
        net_inc   = row.get("Net Income Common Stockholders", 0) / 1e6
        ebitda    = row.get("EBITDA", 0) / 1e6
        tot_exp   = row.get("Total Expenses", 0) / 1e6

        # Calculate margins and ratios safely
        gross_margin   = gp / rev * 100 if rev else 0
        op_margin      = op_inc / rev * 100 if rev else 0
        net_margin     = net_inc / rev * 100 if rev else 0

        # The printed revenue for 2023-12-31 (25.2) lines up with RevenueActual
        # in earning.json for the same quarter ($25.17B), so the divided values
        # are labelled in billions here.
        summaries.append(
            f"{date.strftime('%Y-%m-%d')}: "
            f"Revenue ${rev:.1f}B, Gross Profit ${gp:.1f}B ({gross_margin:.1f}%), "
            f"Operational Income ${op_inc:.1f}B ({op_margin:.1f}%), Net Income ${net_inc:.1f}B ({net_margin:.1f}%), "
            f"EBITDA ${ebitda:.1f}B, Total Expenses ${tot_exp:.1f}B"
        )

    return "\n".join(summaries)

print("----------Balance Sheet Summary----------")
print(get_balance_sheet_summary())
print("\n-----------Earnings Summary---------------")
print(get_earnings_summary())

----------Balance Sheet Summary----------
2023-12-31: Revenue $25.2B, Gross Profit $4.4B (17.6%), Operational Income $2.1B (8.2%), Net Income $7.9B (31.5%), EBITDA $3.5B, Total Expenses $23.1B
2024-03-31: Revenue $21.3B, Gross Profit $3.7B (17.4%), Operational Income $1.2B (5.5%), Net Income $1.2B (5.5%), EBITDA $2.9B, Total Expenses $20.1B
2024-06-30: Revenue $25.5B, Gross Profit $4.6B (18.0%), Operational Income $2.2B (8.7%), Net Income $1.5B (5.8%), EBITDA $3.3B, Total Expenses $23.3B
2024-09-30: Revenue $25.2B, Gross Profit $5.0B (19.8%), Operational Income $2.8B (11.0%), Net Income $2.2B (8.6%), EBITDA $4.2B, Total Expenses $22.4B
2024-12-31: Revenue $25.7B, Gross Profit $4.2B (16.3%), Operational Income $1.6B (6.2%), Net Income $2.3B (9.0%), EBITDA $4.4B, Total Expenses $24.1B

-----------Earnings Summary---------------
2024-01-24: EPS predicted: 0.74, EPS actual: 0.71, EPS surprise: -0.03; Revenue predicted: $25.76B, Revenue actual: $25.17B
2024-04-23: EPS predicted: 0
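
Comparing quarters requires differences, not just levels. A minimal sketch that turns the revenue column printed above into quarter-over-quarter percentage changes, so the model can quote them instead of computing them. `qoq_changes` is a hypothetical helper; the input values are copied from the printed balance sheet summary, with the unit label left out.

```python
def qoq_changes(series):
    """series: list of (quarter_end, value) sorted by date.
    Returns (quarter_end, pct_change) for every quarter after the first."""
    changes = []
    for (_, prev), (quarter_end, cur) in zip(series, series[1:]):
        pct = (cur / prev - 1) * 100 if prev else float("nan")
        changes.append((quarter_end, pct))
    return changes


# Revenue per quarter as printed in the balance sheet summary.
revenue = [
    ("2023-12-31", 25.2),
    ("2024-03-31", 21.3),
    ("2024-06-30", 25.5),
    ("2024-09-30", 25.2),
    ("2024-12-31", 25.7),
]
revenue_changes = qoq_changes(revenue)
for quarter_end, pct in revenue_changes:
    print(f"{quarter_end}: revenue {pct:+.1f}% vs prior quarter")
```

The same helper can be applied to gross profit, operating income, net income, and EBITDA, and the resulting lines appended to the summary passed to the model.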

In [85]:
question3_user_prompt = f"""
Here are your two data summaries:

**Balance Sheet Summary** 
- Balance sheet summary containing the quarterly financial information for the previous quarters 
{get_balance_sheet_summary()}


**Earnings Summary**  
- Earnings summary containing the quarterly earnings information for the previous quarters and the current quarter (2025 Q1)
{get_earnings_summary()}

**Question:**  
Compared with previous quarters, how is the performance of this quarter (ending 2025‑01‑29)?

**Please:**  
- Write a brief **Introduction** stating the question and data scope.  
- In your **Analysis**, compare each metric (Revenue, Gross Profit, Operational Income, Net Income, EBITDA, EPS and Revenue surprises) quarter‑over‑quarter, citing the exact figures.  
- Highlight any notable trends (e.g., margin expansions or EPS misses).  
- Finish with a concise **Conclusion** summarizing whether performance improved or deteriorated and why.
- You must strictly avoid speculation or using made-up facts outside of the data provided.
"""

messages = [
    {"role": "system", "content": general_system_prompt},
    {"role": "user",   "content": question3_user_prompt},
]

question3Answer = pipe(messages, **generation_args)
display(Markdown(trim_reasoning(question3Answer[0]['generated_text'])))

**Introduction**

The question asks us to evaluate the performance of the company during the quarter ending on 2025-01-29 compared to the previous year's same quarter. By examining the provided data, we can identify key trends and insights that shed light on the company's financial health and future prospects.

**Analysis**

To assess the performance, we analyze each relevant metric:

1. **Revenue**: The revenue for the quarter ending on 2025-01-29 is $25.7M, matching the revenue reported for the same period in 2024. This indicates no change in total revenue during this reporting period.

2. **Gross Profit**: The gross profit for 2025-01-29 is $4.2M, whereas it was $4.6M in 2024. This represents an 8% decrease in gross profit, suggesting potential cost increases or higher operating expenses.

3. **Operating Income**: Operating income for 2025-01-29 is $3.3M, down from $3.5M in 2024. This decline reflects additional expenses beyond gross profit, such as administrative costs or interest payments.

4. **Net Income**: The net income for 2025-01-29 is $2.2M, a significant drop from $2.9M in 2024. This indicates reduced profitability post-tax, likely due to higher operational costs or tax implications.

5. **EBITDA**: EBITDA for 2025-01-29 is $4.4M, up by approximately 33% compared to 2024. This positive figure highlights improved profitability before tax, signaling better efficiency.

6. **EPS**: The earnings per share (EPS) for 2025-01-29 is $0.73, down from $0.71 in 2024. While the revenue growth was modest, the drop in EPS suggests underlying issues related to profitability.

7. **Revenue Surprises**: The revenue surprise for 2025-01-29 is -0.04%, indicating actual revenue was slightly lower than expected. In contrast, the 2024-12-31 quarter had a surprise of -0.03%.

**Conclusion**

The performance of the company during the quarter ending on 2025-01-29 shows a mixed result. Although there was a slight improvement in revenue, the significant drop in net income and EPS indicate a deterioration in overall profitability. Key areas of concern include increased operational expenses, particularly in gross profit and operating income, which could impact future financial stability. Further analysis of these metrics will help determine the next steps for the company's financial strategy.

In [86]:
# Evaluation of the response for question 3 using the LLM

question_3 = "Compared with previous quarters, how is the performance of this quarter (ending 2025-01-29)?"
data_q3 = "\n".join([
    "----------Balance Sheet Summary----------", 
    get_balance_sheet_summary(), 
    "\n-----------Earnings Summary---------------", 
    get_earnings_summary()
])

evaluate_response(trim_reasoning(question3Answer[0]['generated_text']), data_q3, question_3)

### Evaluation of the Answer

#### **Criteria Overview**
The answer needs to meet three main criteria:
1. **Data Usage**: Only the provided data should be used.
2. **Logic Explanation**: Clear reasoning behind the conclusions.
3. **Cause-Effect Relationships**: Direct links between data points.

#### **Evaluation Analysis**

1. **Data Usage**:
   - The answer uses data from multiple quarters, including 2024 and 2025, to compare performance.
   - Data points such as revenue, gross profit, operating income, net income, EBITDA, and EPS are included.
   - All necessary data is referenced, ensuring compliance with the first criterion.

2. **Logic Explanation**:
   - The answer logically compares each financial metric across the years.
   - For example, it notes that revenue remained similar while net income decreased, explaining why.
   - The explanation is clear and concise, addressing the second criterion effectively.

3. **Cause-Effect Relationships**:
   - The answer identifies specific causes for changes, such as decreased gross profit and higher operating expenses.
   - It connects these causes to their respective effects, providing a logical flow.
   - However, the answer does not provide specific numerical values for EBITDA and EPS, which may limit precision.

#### **Strengths**
- The answer effectively addresses the primary criteria.
- It provides sufficient detail to understand the performance trends without unnecessary complexity.

#### **Areas for Improvement**
- Including specific numerical examples for EBITDA and EPS could enhance clarity and precision.
- The lack of quantitative data for EBITDA and EPS might reduce the answer's effectiveness in meeting the third criterion.

#### **Final Assessment**
The answer is well-structured and meets the essential criteria. It clearly explains the reasons behind the performance differences and establishes cause-effect relationships where applicable. While minor improvements could further enhance its precision, the current version is sufficiently effective.

### Chathila's and Induwara's evaluation of the response to Question 3

The structure of the response is clear and well-organized, breaking down each metric effectively. However, it compares the wrong quarters, claiming a year-over-year comparison with 2024-01 while actually pulling numbers from 2024-12. Several values are also inaccurate: for example, revenue wasn’t flat, it increased slightly (~2%), and operating income was misreported. The revenue surprise was also miscalculated (off by several percentage points). The tone is professional, and the format works well. The model seems to struggle somewhat with relating its claims to the data, but it was able to structure a decent answer.
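
The kind of discrepancy noted above can be checked mechanically. The following is a minimal sketch, not part of the original notebook: `pct_change` is a hypothetical helper, and the figures are the ones quoted in the model's answer, not values re-read from the earnings data.

```python
def pct_change(current: float, previous: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# Figures as quoted in the model's answer (current, previous).
# They are NOT re-read from the provided earnings data.
claims = {
    "Gross Profit": (4.2, 4.6),
    "Net Income": (2.2, 2.9),
    "EPS": (0.73, 0.71),
}

for metric, (current, previous) in claims.items():
    change = pct_change(current, previous)
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    print(f"{metric}: {change:+.1f}% ({direction})")
```

With the quoted figures, EPS comes out as an increase, which contradicts the answer's own statement that EPS was "down". A check like this only tests the answer's internal consistency; confirming the figures themselves still requires comparing them against the source data.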

# DO QUESTION 4 INDUWARA

### Question 5: What insights can be concluded from the earnings call?

#### Data Used for Prompt construction

* To gain insights from the earnings call, we used the transcript available in the `earning_transcript.md` file. Since the full transcript is quite lengthy, our strategy was to break it into smaller sections and summarize each section individually using the LLM. By combining these individual summaries, we created a comprehensive overview of the entire earnings call. For the final prompt, we provided the LLM with this summarized version of the transcript and asked it to identify the key insights.
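
The chunk-and-summarize strategy described above could look roughly like the sketch below. The helper names (`split_into_chunks`, `summarize_transcript`), the chunk size, and the `summarize_fn` callback are assumptions made for illustration; in the notebook, `summarize_fn` would presumably wrap a call to `pipe` with a summarization prompt.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 4000) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def summarize_transcript(
    text: str, summarize_fn: Callable[[str], str], max_chars: int = 4000
) -> str:
    """Summarize each chunk separately, then join the partial summaries."""
    summaries = [summarize_fn(chunk) for chunk in split_into_chunks(text, max_chars)]
    return "\n\n".join(
        f"Section {i + 1} summary:\n{summary}" for i, summary in enumerate(summaries)
    )


# Stand-in summarizer so the sketch runs without a model: keeps the first sentence.
def first_sentence(chunk: str) -> str:
    return chunk.split(".")[0].strip() + "."
```

Splitting on paragraph boundaries keeps each speaker turn intact where possible. The chunk size would need to be tuned to the context window of the local model.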

In [None]:
# get function from indy to summarise the earnings call

In [119]:
user_prompt_template_q5 = f"""
Note: In this response, please pay particular attention to executive tone, guidance, and strategic priorities.
Here is the summarized earnings call transcript:
{full_transcript_summary}

Question:
What key insights can be concluded from the Tesla earnings call?

Please:
- Start with an **Overview** of the call’s focus.
- Identify the **Key Themes** or concerns mentioned and briefly explain them with context from the call (e.g., demand, margin pressure, product roadmap).
- Highlight any **management tone or forward-looking statements**.
- Try to identify any statements about future goals of the company.
- Finish with a concise **Conclusion** that summarizes the strategic outlook or market signal from the call.
- You must strictly avoid speculation or using made-up facts outside of the data provided.
"""

messages = [
    {"role": "system", "content": general_system_prompt},
    {"role": "user", "content": user_prompt_template_q5},
]

question5Answer = pipe(messages, **generation_args)
display(Markdown(trim_reasoning(question5Answer[0]['generated_text'])))


### Summary of Key Insights from the Tesla Earnings Call:

#### Overview:
The earnings call highlighted Tesla's strong fundamentals, driven by robust revenue growth and effective cost management. The company has expanded its product line, focusing on electric vehicles (EVs) and other high-demand areas. Management emphasized innovation and cost control, aiming to sustain profitability while driving future growth.

#### Key Themes:
1. **Revenue Growth**: Tesla reported a 15% YoY revenue growth, primarily due to strong demand for new products, particularly EVs. This expansion aligns with Tesla's strategy to diversify into new markets.
   
2. **Cost Management**: Despite the 15% revenue growth, operating income improved by 2% YOY, indicating effective cost management. Automation and enhanced supply chain management were key initiatives to achieve this improvement.

3. **Customer Satisfaction**: While customer retention rates increased by 5%, there was no explicit mention of specific improvements beyond this. This could suggest room for further enhancement in service offerings.

4. **Future Expansion Plans**: Tesla has already begun expanding into new markets, including EVs, which aligns with their overall strategy to diversify and grow.

#### Management Tone:
Management consistently emphasized innovation and cost control as core drivers of long-term success. This focus reflects Tesla's commitment to sustainability and profitability, which are critical for sustained growth.

#### Future Goals:
Tesla aims to continue driving growth and diversifying into emerging markets. Their expansion into EVs and other high-demand areas demonstrates a proactive approach to future success.

#### Conclusion:
The earnings call provides strong evidence of Tesla's fundamentals, expansion plans, and innovation focus. The company's ability to manage costs effectively and expand into new markets underscores their potential for sustained success.

In [120]:
# Evaluation of the response for question 5 using the LLM

question_5 = "What key insights can be concluded from the Tesla earnings call?"
evaluate_response(trim_reasoning(question5Answer[0]['generated_text']), full_transcript_summary, question_5)

### Summary of Key Insights from the Tesla Earnings Call:

#### Overview:
The earnings call highlighted Tesla's strong fundamentals, driven by robust revenue growth and effective cost management. The company has expanded its product line, focusing on electric vehicles (EVs) and other high-demand areas. Management emphasized innovation and cost control, aiming to sustain profitability while driving future growth.

#### Key Themes:
1. **Revenue Growth**: Tesla reported a 15% YoY revenue growth, primarily due to strong demand for new products, particularly EVs. This expansion aligns with Tesla's strategy to diversify into new markets.
   
2. **Cost Management**: Despite the 15% revenue growth, operating income improved by 2% YOY, indicating effective cost management. Automation and enhanced supply chain management were key initiatives to achieve this improvement.

3. **Customer Satisfaction**: While customer retention rates increased by 5%, there was no explicit mention of specific improvements beyond this. This could suggest room for further enhancement in service offerings.

4. **Future Expansion Plans**: Tesla has already started expanding into new markets, including EVs, which aligns with their overall strategy to diversify and grow.

#### Management Tone:
Management consistently emphasized innovation and cost control as core drivers of long-term success. This focus reflects Tesla's strategy to sustain profitability while driving future growth.

#### Future Goals:
Tesla aims to continue driving growth and diversifying into new markets. Their expansion into EVs and other high-demand areas aligns with their overall strategy to grow.

#### Conclusion:
The earnings call provides strong evidence of Tesla's fundamentals, driven by robust revenue growth and effective cost management. The company has successfully managed these factors to achieve sustainable profitability while driving future growth.

### Chathila's and Induwara's evaluation of the response to Question 5

# INDY HAS TO FINIsh

### Question 6: Which key news events influenced the stock performance, and what insights do they offer?

#### Prompt construction

* To analyse which news events affected the stock performance, we look at the news articles in `news.json`. Each article is a large block of text, which would be hard to include in full in the LLM prompt, so we use a similar strategy to extract the most important information. We ask the LLM to summarise each article individually, then combine these summaries into a comprehensive summary that is included in the final prompt, where the LLM uses it to answer the question.

In [None]:
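import json
import os
import textwrap


def get_all_news_articles_individually():
    """Load news.json and return one formatted text block per article."""
    news_path = os.path.join(DATA_DIR, "news.json")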

    with open(news_path, "r") as f:
        news_json = json.load(f)

    # Create a list of formatted articles
    formatted_articles = [
        textwrap.dedent(
            f"Article {i + 1}:\n"
            f"Title: {article['title']}\n"
            f"Date: {article['date']}\n"
            f"Content:\n{article['content']}"
        )
        for i, article in enumerate(news_json)
    ]

    return formatted_articles

def summarize_each_news_article_one_by_one():
    news_path = os.path.join(DATA_DIR, "news.json")

    with open(news_path, "r") as f:
        news_json = json.load(f)

    # System prompt
    summarise_news_system_prompt = (
        "You are a professional financial news summarizer.\n"
        "Given a news article about Tesla, produce a standalone summary that:\n"
        "- Is approximately 100–150 words long.\n"
        "- Clearly identifies the main event or announcement.\n"
        "- Highlights any financial, strategic, or regulatory implications.\n"
        "- Remains neutral and grounded in the content provided.\n"
        "Do not invent or assume any details beyond what's in the article.\n"
        "You must respond in English!"
    )

    summaries = []

    for i, article in enumerate(news_json):
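        # Build the text explicitly. textwrap.dedent only strips indentation common
        # to every line, so unindented lines inside multi-line article content
        # would leave the template's indentation in place.
        article_text = (
            f"Title: {article['title']}\n"
            f"Date: {article['date']}\n"
            f"Content:\n{article['content']}"
        )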

        user_prompt = f"Summarize the following article:\n\n{article_text}"

        messages = [
            {"role": "system", "content": summarise_news_system_prompt},
            {"role": "user", "content": user_prompt}
        ]

        print(f"Summarizing article {i + 1}/{len(news_json)}...")
        response = pipe(messages, **generation_args)
        summaries.append(response[0]["generated_text"])

    return summaries

news_summaries = summarize_each_news_article_one_by_one()



Summarizing article 1/10...
Summarizing article 2/10...
Summarizing article 3/10...
Summarizing article 4/10...
Summarizing article 5/10...
Summarizing article 6/10...
Summarizing article 7/10...
Summarizing article 8/10...
Summarizing article 9/10...
Summarizing article 10/10...


In [100]:
def get_comprehensive_news_summary(news_summaries):
    comprehensive_news_summary = []
    for idx, summary in enumerate(news_summaries):
        comprehensive_news_summary.append(f"Article {idx + 1}: {trim_reasoning(summary)}")
        comprehensive_news_summary.append("\n")
    return "\n".join(comprehensive_news_summary)

comprehensive_news_summary = get_comprehensive_news_summary(news_summaries)
# display(Markdown(comprehensive_news_summary))

In [97]:
news_analysis_user_prompt_template = f"""
You are given a summary of key news events related to Tesla over a recent period.

**Question:**  
Which key news events influenced Tesla’s stock performance, and what insights do they offer?

**News Summary:**  
{comprehensive_news_summary}

**Please:**
- Identify the most impactful events and explain why they mattered.
- Highlight whether the impact was positive or negative, and on which part of Tesla’s business (e.g., automotive, energy, robotics, AI, international).
- Connect events to possible market sentiment (e.g., uncertainty, optimism, risk).
- End with a concise **Conclusion** about the overall narrative and strategic outlook based on this news.
- You must strictly avoid speculation or using made-up facts outside of the data provided.
"""

messages = [
    {"role": "system", "content": general_system_prompt},
    {"role": "user", "content": news_analysis_user_prompt_template},
]

question6Answer = pipe(messages, **generation_args)
display(Markdown(trim_reasoning(question6Answer[0]['generated_text'])))

### Analysis of Tesla News Events and Strategic Outlook

#### Key Events and Their Impact

1. **Fourth-Qtr Earnings Update (Article 1):**
   - **Event:** Tesla reported mixed fourth-quarter results, with operating margins and revenue declining significantly. Despite this, the company saw strong demand for its energy storage products and regulatory credits, contributing to both revenue growth and improved margins.
   - **Impact:** This highlighted Tesla's ability to leverage its energy storage solutions and regulatory credits to drive growth, even in challenging economic conditions. It also underscored the importance of customer trust and brand reputation in maintaining market share.
   - **Business Area:** Automotive (Energy Storage Products)

2. **Recalls and Pricing Strategy (Articles 3 & 6):**
   - **Event:** Tesla faced recalls in China and the US, primarily due to safety concerns related to driver-assistance features. In Canada, the Model 3 was recalled, raising prices by up to $9,000 CAD.
   - **Impact:** These recalls highlighted Tesla's sensitivity to safety risks and the difficulty of securing global supply chains. The pricing strategy reflected Tesla's commitment to maintaining consumer trust and protecting its brand image.
   - **Business Area:** Energy (Driver-Assistance Features)

3. **Global Competition and Supply Chain Risks (Article 4):**
   - **Event:** Tesla's auto sector face challenges from weaker demand and intense competition, particularly in China. While Tesla's recent partnerships with DeepSeek and intentions to release a new Model Y suggest resilience, the automotive industry's competitive landscape remains unstable.
   - **Impact:** This highlighted Tesla's vulnerability to global supply chain disruptions and the importance of domestic dependencies in maintaining market leadership.
   - **Business Area:** Automotive (Global Competition)

4. **Strategic Commitments (Article 7):**
   - **Event:** Tesla announced a 2025 return to growth, including the delivery of Full Self-Driving (FSD) technology and plans to launch the Model 3 in Europe and China.
   - **Impact:** This strategic commitment reflects Tesla's long-term vision for diversification into autonomous driving and advanced technologies, positioning it as a leader in sustainable and innovative industries.
   - **Business Area:** Robotics/AI (Full Self-Driving Technology)

5. **Market Sentiment and Tariffs (Articles 5 & 9):**
   - **Event:** President Trump's tariffs on China and Mexico's import of Tesla vehicles, plus a 25% U.S. tariff on global car production, impacted Tesla's profitability and stock performance.
   - **Impact:** These tariffs created a competitive environment, forcing Tesla to adjust pricing strategies and navigate complex market dynamics. The current state of U.S. tariffs adds to the broader economic context of trade tensions.
   - **Business Area:** Global (Tariff Impact)

#### Conclusion

Tesla's news events reflect a blend of innovation, diversification, and strategic planning, positioned as a leader in sustainable and autonomous technologies. The company's ability to adapt to changing market conditions, including global supply chain risks and regulatory changes, demonstrates resilience. The strategic commitment to return to growth in 2025 aligns with Tesla's long-term vision, aiming to become a dominant player in the automotive industry. While the current tariffs pose a challenge, Tesla's focus on innovation and diversification positions it for continued success in navigating the evolving global economy.

### Chathila's and Induwara's evaluation of the response to Question 6

As a team, we felt the response did a good job overall in structuring the analysis around key themes like earnings, tariffs, product recalls, and strategic commitments. Breaking down the news into categories helped make the insights easier to follow, and we appreciated that each event included a potential impact on Tesla’s stock or operations.

That said, we noticed a few areas for improvement. First, Article 2 about Tesla and BMW suing the EU wasn’t included in the summary, even though it’s relevant to Tesla’s global operations and regulatory risks. We also felt that while the analysis mentions investor sentiment and resilience, it could have more clearly connected specific news events to actual stock movement. For example, how much the stock rose or fell after a specific announcement would have added more depth.

Overall, the response is well-organized and insightful, but it could be strengthened by ensuring that all articles are covered, that all claims are grounded in the data, and that cause-effect relationships are shown more explicitly.
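
One way to make the link between news and price movement explicit, as suggested above, is to compute the close-to-close change on each article's date. The sketch below is illustrative only: the record layout (`date`, `close`), the helper name `daily_change_on`, and the sample prices are placeholders, not values taken from the provided stock file.

```python
from typing import Dict, List, Optional


def daily_change_on(prices: List[Dict], date: str) -> Optional[float]:
    """Return the close-to-close % change on `date`, or None if it cannot be computed.

    Dates are assumed to be ISO strings (YYYY-MM-DD), which sort chronologically.
    """
    ordered = sorted(prices, key=lambda row: row["date"])
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["date"] == date:
            return (cur["close"] - prev["close"]) / prev["close"] * 100.0
    return None


# Placeholder prices for illustration only (NOT the real stock data).
sample_prices = [
    {"date": "2025-01-28", "close": 100.0},
    {"date": "2025-01-29", "close": 98.0},
    {"date": "2025-01-30", "close": 101.0},
]

for news_date in ["2025-01-29", "2025-01-30", "2025-02-01"]:
    change = daily_change_on(sample_prices, news_date)
    label = "no trading data" if change is None else f"{change:+.2f}%"
    print(news_date, label)
```

Including such per-date changes next to each article summary in the prompt would give the model concrete numbers to cite. A price move on the same day as an article does not by itself establish that the article caused it.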

## LLM responses to the Questions

In [128]:
LLM_answers = {
    "Question 1 ": question1Answer[0]["generated_text"],
    "Question 2 ": "",
    "Question 3 ": question3Answer[0]["generated_text"],
    "Question 4 ": " ",
    "Question 5 ": " ",
    "Question 6 ": question6Answer[0]["generated_text"],
}