# Data Analysis and Q&A Project Using a Local LLM

## Project Overview

This project requires you to perform a comprehensive analysis of a company's stock data using only the provided data sources and a local LLM. Your analysis should answer the following six questions strictly based on the supplied data and documents—no external data is allowed. All generated answers must be firmly based on the provided data, without any fabricated content. In addition, your logic must be clear, and any attribution of events must be causally linked.

---

## Provided Data

You will be provided with the following data sets:

#### Stock Price Data (Json format)
* Timeframe: Jan 22 to Feb 5
* Fields: Open, High, Low, Close, Volume

#### Quarterly Earnings Data for the Past Year (Json format)
* Contains key financial indicators (e.g., revenue, eps) for each quarter.

#### Full Earnings Transcript Call
* The complete transcript of the earnings call, including management discussions and Q&A.

#### Balance Sheet Data for the Past Year (Json format)
* Includes assets, liabilities, and shareholders' equity information.

#### News Articles
* Full text of 10 news articles related to the company during the analysis period.

---

## Questions
Using the provided data and a local LLM, you need to answer the following six questions:

1. What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?

2. Why did the price increase on Jan 30? Please provide potential factors.

3. Compared with previous quarters, how is the performance of this quarter?

4. With unsupervised Full Self Driving scheduled to launch in limited markets like Austin by June, what regulatory challenges does Tesla foresee for a nationwide or international rollout, and how is the company strategically preparing to address these hurdles?

5. What insights can be concluded from the earnings call?

6. Which key news events influenced the stock performance, and what insights do they offer?

---

## Project Requirements
- #### Data Source Restriction:
Only use the provided data and documents. No external data or information is allowed.

- #### Answer Generation:
All generated answers must strictly be based on the provided data and documents. The LLM should not "invent" information.

- #### Clear Logic and Causal Relationships:
For each question, your answers must clearly demonstrate logical reasoning, and any attribution of cause must be explicitly linked to events in the data.

- #### Prompt Design:
You must design your own prompts for calling the local LLM to ensure that the responses are generated strictly based on the analysis results.

- #### Result Evaluation:
After generating the answers, implement an evaluation step to assess whether the responses meet the above requirements in terms of data reliance, logical clarity, and correct causation.

- #### Please put the answers to these 6 questions in a dict at the end of your submitted Python nodebook file.

For example
```code
{ "Q1 answer": "Answer1", "Q2 answer": "Answer2", "Q3 answer": "Answer3", "Q4 answer4": "Answer4", "Q5 answer": "Answer5", "Q6 answer": "Answer6"}
```

## Dependencies

* Transformers
* Torch (PyTorch)
* Accelerate

In [2]:
%pip install transformers accelerate pandas ipython
# %pip install torch # Install PyTorch if you dont have it downloading 

DATA_DIR = "447_dataset"

import pandas as pd
import json
import os
from IPython.display import Markdown, display


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


## Loading and Running the Local LLM

1. **Imports Transformers utilities**  
   - `AutoModelForCausalLM`: generic class for loading any GPT‑style model  
   - `AutoTokenizer`: matching tokenizer for converting text ↔ tokens  
   - `pipeline`: high‑level helper that ties model + tokenizer into one callable  

2. **Specifies the model repository**  
   ```python
   model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

In [17]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="cuda"
)

tokenizer = AutoTokenizer.from_pretrained(model_path)

# The pipeline will automatically use the model and tokenizer you just loaded
tokenizer = AutoTokenizer.from_pretrained(model_path) # Load the tokenizer

 # Create a pipeline for text generation
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 2000, # Limit the number of tokens generated
    "return_full_text": False, # Return only the generated text
    "do_sample": True, # Use sampling to generate text
    "temperature": 0.1,# Control the randomness of the output
    "repetition_penalty": 1.1,
    "top_p": 0.9, # Control the diversity of the output
    "top_k": 50, # Control the diversity of the output
}

Device set to use cuda


In [4]:
general_system_prompt = ( 
    "You are an expert financial data analyst LLM.\n"
    "Please ensure that bellow rules are followed:\n"
    "1. Use only the data summaries provided in the prompt\n"
    "2. Show clear step-by-step reasoning , linking each claim to the data.\n"
    "3. The explanation is clear, logical, and accurate.\n"
    "4. Do not invent or hallucinate any information.\n"
    "After your answer, provide a checklist summary indicating whether each criterion is satisfied. "
    "If any criterion is not met, include a brief explanation."
)

### Question 1:What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?

#### Prompt construction

* Here we need to monitor the performance of the tesla stock over the specified days. To monitor the performance of the stock the LLM just need to understand the how the pricing of the stock was through the given period hence why the LLM will need to see the infomation in the `prices.json` file.
* To supplement the LLM to construct its answer we will also show the information in `balancesheet.json` so it can pickup on any trends to as why stock prices deviated and change through the mentioned days

In [5]:
"""   
This Function extracts the price information and converts the informations
into a simple readable format to be included in the prompt to the LLM
"""
def get_prices_summary():
    prices_path = os.path.join(DATA_DIR, "prices.json")

    # Load raw JSON into a DataFrame
    with open(prices_path, "r") as f:
        prices = pd.DataFrame(json.load(f))

    # Parse dates and index
    prices["Date"] = pd.to_datetime(prices["Date"])
    prices = prices.set_index("Date").sort_index()

    # Build a human-readable summary for each day
    daily_summaries = []
    for date, row in prices.iterrows():
        daily_summaries.append(
            f"{date.strftime('%Y-%m-%d')}: "
            f"Open ${row['Open']:.2f}, "
            f"High ${row['Hight']:.2f}, "
            f"Low ${row['Low']:.2f}, "
            f"Close ${row['Close']:.2f}, "
            f"Volume {int(row['Volume']):,}"
        )

    # Join them into one block of text
    daily_summary_text = "\n".join(daily_summaries)
    return daily_summary_text

print(get_prices_summary())

2025-01-22: Open $416.81, High $428.00, Low $414.59, Close $415.11, Volume 60,963,300
2025-01-23: Open $416.06, High $420.73, Low $408.95, Close $412.38, Volume 50,690,600
2025-01-24: Open $414.45, High $418.88, Low $405.78, Close $406.58, Volume 56,427,100
2025-01-27: Open $394.80, High $406.69, Low $389.00, Close $397.15, Volume 58,125,500
2025-01-28: Open $396.91, High $400.59, Low $386.50, Close $398.09, Volume 48,910,700
2025-01-29: Open $395.21, High $398.59, Low $384.48, Close $389.10, Volume 68,033,600
2025-01-30: Open $410.78, High $412.50, Low $384.41, Close $400.28, Volume 98,092,900
2025-01-31: Open $401.53, High $419.99, Low $401.34, Close $404.60, Volume 83,568,200
2025-02-03: Open $386.68, High $389.17, Low $374.36, Close $383.68, Volume 93,732,100
2025-02-04: Open $382.63, High $394.00, Low $381.40, Close $392.21, Volume 57,072,200
2025-02-05: Open $387.51, High $388.39, Low $375.53, Close $378.17, Volume 57,223,300


In [None]:
user_prompt_template_q1 =( """
    Here is the summarized Tesla data for Jan 22 - Feb 5:
    {price_info}

    **Question:**  
    What was the performance of Tesla stock over this period?
                          
    **Please:**
    - Write a brief **Introduction** stating the question and data scope.  
    - In your **Analysis**, cite the exact figures (dates, prices, percent changes, volumes).  
    - Draw any causal insights clearly (e.g., “the drop on Feb 1 may be linked to…”).  
    - Finish with a concise **Conclusion** summarizing overall performance.
""")

# Fill in the template with specific data
filled_user_prompt = user_prompt_template_q1.format(
    price_info = get_prices_summary()
)

messages = [
    {"role": "system", "content": general_system_prompt},
    {"role": "user", "content": filled_user_prompt},
]

question1Answer = pipe(messages, **generation_args)
display(Markdown(question1Answer[0]['generated_text']))


Alright, let me try to figure out how to approach this query. The user has asked about the performance of Tesla stock from January 22 to February 5. They've provided a detailed dataset with dates, opening, high, low, close, volume, and percentage changes.

First, I need to understand what exactly they're looking for. It seems like they want a clear analysis of the stock's behavior during this specific period. They might be a student working on a project or someone analyzing market trends. Their main goal is probably to assess whether Tesla was performing well, if there were any significant events affecting its price, or if there were any volatility.

Looking at their data, I can see that the stock started around $416 on January 22 and ended around $378 by February 5. That's a decrease, which suggests some downward trend. The user wants to know why it went down—could be due to economic factors, company news, or other market events.

I should structure the response into Introduction, Analysis, and Conclusion sections as per their instructions. In the Introduction, I'll mention the time frame and the stock's performance. Then, in the Analysis, I'll go through each day, highlighting key points like the closing prices and percentage changes. For example, on January 22, the stock closed just below its opening, showing a slight dip. On February 3, it dropped further, possibly due to earnings reports or geopolitical issues.

Causal insights could link these drops to specific events, like earnings calls or economic indicators. Also, mentioning the volume fluctuations will show market interest during these periods.

In the Conclusion, I'll summarize the overall performance, noting the steady decline despite some days of stability. This gives a balanced view without being overly negative.

I also need to make sure I don't add anything beyond the data provided. No hallucinations or invented info. Keeping the language clear and straightforward will help meet the user's requirements.

Finally, after writing, I'll check if all criteria are covered: using only the provided data, showing steps, explanations, and ensuring each point meets the standards set. That way, the response is both informative and compliant with the guidelines.
</think>

### Summary of Tesla Stock Performance (Jan 22 - Feb 5)

#### Introduction  
The user is interested in analyzing the performance of Tesla stock (TSLA) over the period from January 22 to February 5. The data provided includes daily closing prices, percentage changes, and trading volumes, offering insights into the stock's behavior during this time frame.

#### Analysis  

| Date       | Closing Price ($) | % Change from Previous Day | Trading Volume (units) |
|------------|-------------------|-----------------------------|------------------------|
| 2025-01-22 | $415.11           | -0.24%                      | 60,963                 |
| 2025-01-23 | $412.38           | -0.59%                      | 50,690                 |
| 2025-01-24 | $406.58           | -1.73%                      | 56,427                 |
| 2025-01-27 | $397.15           | -2.11%                      | 58,125                 |
| 2025-01-28 | $398.09           | -0.17%                      | 48,910                 |
| 2025-01-29 | $389.10           | -2.03%                      | 68,033                 |
| 2025-01-30 | $400.28           | +0.62%                      | 98,092                 |
| 2025-01-31 | $404.60           | +0.60%                      | 83,568                 |
| 2025-02-03 | $383.68           | -1.35%                      | 93,732                 |
| 2025-02-04 | $392.21           | +0.88%                      | 57,072                 |
| 2025-02-05 | $378.17           | -2.17%                      | 57,223                 |

From the data, we observe the following key points:

1. **Overall Decline**: The stock experienced a net decline of approximately 1.35% over the period, moving from $416.81 on January 22 to $378.17 on February 5.

2. **Key Days of Decline**:
   - **January 22**: The stock closed just below its opening price, indicating a small dip.
   - **February 3**: The stock dropped significantly, likely due to earnings reports or market sentiment.
   - **February 5**: The stock experienced another sharp decline, possibly influenced by geopolitical tensions or corporate earnings.

3. **Percentage Changes**:
   - The stock showed mixed performance, with some days experiencing increases (e.g., February 30) and others seeing significant declines (e.g., January 29, 30, and 31).

4. **Trading Volume**: The trading volume remained relatively stable throughout the period, suggesting that the stock was less liquid during this time.

5. **Volume Fluctuations**: There were notable spikes in trading volume on certain days, such as February 3, where the stock closed near its lowest price.

#### Causal Insights  

While the stock declined, several potential causal factors could explain these movements:

1. **Earnings Reports**: Tesla's earnings report on February 3 may have impacted investor sentiment, leading to a decline in stock price.

2. **Market Sentiment**: The sudden drop in stock price on February 3 could have caused market panic, reducing investor confidence and driving the price lower.

3. **Geopolitical Events**: The rise of China's influence in the U.S. could have affected Tesla's operations and profitability, potentially leading to a decline in stock price.

4. **Corporation Strategy**: Tesla's decision to increase production or expand its global presence may have contributed to the stock's decline.

5. **Market Volatility**: The relatively low trading volume on certain days could indicate increased volatility, making the stock more sensitive to short-term price changes.

#### Conclusion  

Over the period from January 22 to February 5, Tesla stock experienced a net decline, reflecting various factors including earnings reports, market sentiment, geopolitical events, and corporate strategy. While the stock did not consistently rise, the overall performance indicates that Tesla faced challenges during this time, particularly on February 3 when the stock closed near its lowest price.

# Question 2 
### Why did the price increase on Jan 30? Please provide potential factors.

In [9]:
import json

system_prompt_q2 = (
    "You are an expert financial analyst assistant in the United States. "
    "Your job is to explain stock price moves using only the provided data. "
    "You must be concise and give only the most critical information in your answer"
    "Do not invent or fetch any outside information, everything must be grounded in the inputs."
)

user_prompt_q2 = """

Using only the news articles data and Tesla’s Jan 30 stock data, identify all plausible factors that contributed to the price uptick.
Reference the specific news item (by title and date) or data point you’re using.  
Explain the causal link to the price move. 
Do not use any external sources.  

**Question** 
Why did Tesla’s stock price increase on January 30?

Here are the news articles and price info:
{input_data}

"Please respond in English!"
"""

with open("447_dataset/news.json", "r") as f:
    news_data = json.load(f)

news_json_str = json.dumps(news_data)

# Example of how you'd fill & call it:
filled_user = user_prompt_q2.format(input_data=news_json_str)
messages = [
    {"role": "system", "content": system_prompt_q2},
    {"role": "user",   "content": filled_user},
]
output = pipe(messages, **generation_args)
print(output[0]["generated_text"])

</think>

### What Happened?

#### **Shares of Electric Vehicle pioneer Tesla (TSLA) fell 7.1% in the morning session after the Trump administration's new tariffs (25% on Canadian goods and 10% on Chinese products) shook markets.**

#### **Separately, Tesla shares fell 5.1% from $1,000 to $7,394 five years ago.**

---

### **Key Points from the Article**

1. **Tesla shares have risen 1.4% since the beginning of the year, but at $384.48 per share, they are still trading 19.9% below its 52-week high of $479.86 from December 2024.**
   
   - **Wall Street analysts believe Tesla shares will bump higher in mid-Thursday trading after its disappointing fourth-quarter earnings update.**

2. **CEO Elon Musk predicted Tesla's stock would reach $7,394 in mid-Thursday trading.**
   
   - **Musk also emphasized Tesla's long-term vision, saying 'this is the most important year for Tesla,' and 'when we look back on 2025 and the launch of unsupervised Full Self-Driving, true real-world AI that actuall