# Data Analysis and Q&A Project Using a Local LLM

## Project Overview

This project requires you to perform a comprehensive analysis of a company's stock data using only the provided data sources and a local LLM. Your analysis should answer the following six questions strictly based on the supplied data and documents—no external data is allowed. All generated answers must be firmly based on the provided data, without any fabricated content. In addition, your logic must be clear, and any attribution of events must be causally linked.

---

## Provided Data

You will be provided with the following data sets:

#### Stock Price Data (JSON format)
* Timeframe: Jan 22 to Feb 5
* Fields: Open, High, Low, Close, Volume

#### Quarterly Earnings Data for the Past Year (JSON format)
* Contains key financial indicators (e.g., revenue, EPS) for each quarter.

#### Full Earnings Call Transcript
* The complete transcript of the earnings call, including management discussions and Q&A.

#### Balance Sheet Data for the Past Year (JSON format)
* Includes assets, liabilities, and shareholders' equity information.

#### News Articles
* Full text of 10 news articles related to the company during the analysis period.
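
A minimal sketch of loading the JSON inputs, assuming they all sit in one folder. The folder `447_dataset` and the file names `news.json`, `prices.json`, and `balancesheet.json` appear elsewhere in this notebook; `earnings.json` and the helpers `load_json_dataset` and `load_all` are hypothetical names used only for illustration.

```python
import json
from pathlib import Path


def load_json_dataset(path):
    """Load one JSON file and return the parsed object (list or dict)."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def load_all(data_dir, names):
    """Load every file listed in `names`, skipping files that are missing.

    Returns a dict keyed by file stem, e.g. {"news": [...], "prices": [...]}.
    """
    data_dir = Path(data_dir)
    loaded = {}
    for name in names:
        file_path = data_dir / name
        if file_path.exists():
            loaded[file_path.stem] = load_json_dataset(file_path)
    return loaded


# Example call (file names are assumptions, adjust to the provided dataset):
# datasets = load_all("447_dataset", ["prices.json", "earnings.json",
#                                     "balancesheet.json", "news.json"])
```

Keeping every source in one dictionary makes it easier to pass only the relevant subset to each prompt, which supports the "provided data only" restriction.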

---

## Questions
Using the provided data and a local LLM, you need to answer the following six questions:

1. What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?

2. Why did the price increase on Jan 30? Please provide potential factors.

3. Compared with previous quarters, how is the performance of this quarter?

4. With unsupervised Full Self Driving scheduled to launch in limited markets like Austin by June, what regulatory challenges does Tesla foresee for a nationwide or international rollout, and how is the company strategically preparing to address these hurdles?

5. What insights can be concluded from the earnings call?

6. Which key news events influenced the stock performance, and what insights do they offer?

---

## Project Requirements
- #### Data Source Restriction:
Only use the provided data and documents. No external data or information is allowed.

- #### Answer Generation:
All generated answers must strictly be based on the provided data and documents. The LLM should not "invent" information.

- #### Clear Logic and Causal Relationships:
For each question, your answers must clearly demonstrate logical reasoning, and any attribution of cause must be explicitly linked to events in the data.

- #### Prompt Design:
You must design your own prompts for calling the local LLM to ensure that the responses are generated strictly based on the analysis results.

- #### Result Evaluation:
After generating the answers, implement an evaluation step to assess whether the responses meet the above requirements in terms of data reliance, logical clarity, and correct causation.

- #### Please put the answers to these 6 questions in a dict at the end of your submitted Python notebook file.

For example
```code
{"Q1 answer": "Answer1", "Q2 answer": "Answer2", "Q3 answer": "Answer3", "Q4 answer": "Answer4", "Q5 answer": "Answer5", "Q6 answer": "Answer6"}
```
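
The Result Evaluation requirement can be met in many ways. As one hedged illustration, the sketch below applies two simple checks: that the final dict has all six expected keys, and that every number quoted in an answer also appears in the source text given to the LLM. The helper names are hypothetical. The number check is only a heuristic: it compares digits as written (so `389.1` and `389.10` do not match), and derived figures such as a computed percentage change will be flagged for manual review rather than proven wrong.

```python
import re

REQUIRED_KEYS = [f"Q{i} answer" for i in range(1, 7)]
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return the set of numeric strings in `text`, ignoring thousands separators."""
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)  # 98,092,900 -> 98092900
    return set(NUMBER_RE.findall(text))


def ungrounded_numbers(answer, source_text):
    """Numbers that appear in the answer but nowhere in the source text."""
    return sorted(extract_numbers(answer) - extract_numbers(source_text))


def missing_keys(answers):
    """Required answer keys that are absent or empty in the final dict."""
    return [k for k in REQUIRED_KEYS if not str(answers.get(k, "")).strip()]
```

A flagged number is a prompt to re-read the answer against the data, not an automatic failure; logical clarity and causal attribution still need a separate review, for example a second LLM pass with its own prompt.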

## Dependencies

* Transformers
* Torch (PyTorch)
* Accelerate

In [5]:
%pip install transformers torch accelerate

Note: you may need to restart the kernel to use updated packages.


## Loading and Running the Local LLM

1. **Imports Transformers utilities**  
   - `AutoModelForCausalLM`: generic class for loading any GPT‑style model  
   - `AutoTokenizer`: matching tokenizer for converting text ↔ tokens  
   - `pipeline`: high‑level helper that ties model + tokenizer into one callable  

2. **Specifies the model repository**  
   ```python
   model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
   ```

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_path = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="cuda"
)

# Load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Create a text-generation pipeline from the model and tokenizer loaded above
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 1200,     # Limit the number of tokens generated
    "return_full_text": False,  # Return only the newly generated text, not the prompt
    "do_sample": True,          # Use sampling to generate text
    "temperature": 0.1,         # Low temperature keeps the output close to deterministic
    "repetition_penalty": 1.1,  # Mildly discourage repeated phrases
    "top_p": 0.9,               # Nucleus sampling: keep the top 90% of probability mass
    "top_k": 50,                # Sample only from the 50 most likely tokens
}

ImportError: dlopen(/Users/indy/Desktop/ECE447_Python/ECE447/lib/python3.13/site-packages/torch/_C.cpython-313-darwin.so, 0x0002): Library not loaded: @rpath/libtorch_cpu.dylib
  Referenced from: <5A57F560-9327-31F6-83C1-A26553C563DC> /Users/indy/Desktop/ECE447_Python/ECE447/lib/python3.13/site-packages/torch/lib/libtorch_python.dylib
  Reason: tried: '/Users/indy/Desktop/ECE447_Python/ECE447/lib/python3.13/site-packages/torch/lib/libtorch_cpu.dylib' (no such file), '/Users/runner/work/_temp/anaconda/envs/wheel_py313/lib/libtorch_cpu.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/_temp/anaconda/envs/wheel_py313/lib/libtorch_cpu.dylib' (no such file), '/Users/indy/Desktop/ECE447_Python/ECE447/lib/python3.13/site-packages/torch/lib/libtorch_cpu.dylib' (no such file), '/Users/runner/work/_temp/anaconda/envs/wheel_py313/lib/libtorch_cpu.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/_temp/anaconda/envs/wheel_py313/lib/libtorch_cpu.dylib' (no such file), '/Users/indy/Desktop/ECE447_Python/ECE447/lib/python3.13/site-packages/torch/lib/libtorch_cpu.dylib' (no such file), '/opt/homebrew/lib/libtorch_cpu.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libtorch_cpu.dylib' (no such file), '/opt/homebrew/lib/libtorch_cpu.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/libtorch_cpu.dylib' (no such file)
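
The traceback above suggests that the PyTorch install in this macOS (`darwin`) environment failed to load, so reinstalling `torch` there is a prerequisite for the cell to run. Separately, `device_map="cuda"` assumes an NVIDIA GPU, which a Mac does not provide. The sketch below is one possible way to choose a device; `pick_device` and `pick_dtype_name` are hypothetical helpers, written without importing `torch` so they can be read and tested on their own.

```python
def pick_device(cuda_available, mps_available):
    """Return a device string, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def pick_dtype_name(device):
    """Use float16 on accelerators and float32 on CPU, where half precision is often slow."""
    return "float16" if device in ("cuda", "mps") else "float32"


# Intended use once torch imports correctly (an assumption about this setup):
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
#   dtype = getattr(torch, pick_dtype_name(device))
#   model = AutoModelForCausalLM.from_pretrained(
#       model_path, torch_dtype=dtype, device_map=device)
```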

### Question 1: What is the performance of the Tesla stock during this period (Jan 22 to Feb 5)?

#### Prompt construction

* Here we need to assess how Tesla's stock performed over the specified days. To do so, the LLM needs to see how the stock was priced throughout the period, so the prompt must include the information in the `prices.json` file.
* To help the LLM construct its answer, we will also include the information in `balancesheet.json` so it can pick up on any trends that may explain why the stock price changed over these days.
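
Small models tend to make arithmetic mistakes, so it may help to compute the headline figures in Python and hand them to the LLM alongside the raw rows. The sketch below assumes `prices.json` parses to a list of daily dicts with the fields listed under Provided Data (`Open`, `High`, `Low`, `Close`, `Volume`) plus a date field; the key name `Date`, its ISO `YYYY-MM-DD` format, and the helper `summarize_prices` are assumptions, not confirmed by the dataset.

```python
def summarize_prices(records, date_key="Date"):
    """Compute simple period statistics from a list of daily OHLCV dicts."""
    if not records:
        raise ValueError("records must contain at least one trading day")
    # Sorting by the date string is only chronological for ISO-formatted dates
    rows = sorted(records, key=lambda r: r[date_key])
    first, last = rows[0], rows[-1]
    high_row = max(rows, key=lambda r: r["High"])
    low_row = min(rows, key=lambda r: r["Low"])
    change = last["Close"] - first["Open"]
    return {
        "start_date": first[date_key],
        "end_date": last[date_key],
        "period_open": first["Open"],
        "period_close": last["Close"],
        "change": round(change, 2),
        "pct_change": round(100 * change / first["Open"], 2),
        "period_high": high_row["High"],
        "period_high_date": high_row[date_key],
        "period_low": low_row["Low"],
        "period_low_date": low_row[date_key],
        "avg_volume": round(sum(r["Volume"] for r in rows) / len(rows)),
    }
```

The resulting dict can be serialized with `json.dumps` and placed in the user prompt, so the model explains figures that were computed from the data rather than estimating them.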

In [6]:
system_prompt = (
    "You are a professional financial analyst with deep expertise in stocks, bonds, mutual funds, and derivatives. "
    "Your responses should be data-driven, professionally rigorous, and provide clear, step-by-step explanations. "
    "Include cautionary advice regarding potential risks, but do not offer direct investment advice."
)

# Define the user prompt template
user_prompt_template = (
    "Based on the following financial data, please analyze the company's financial health and provide insights on potential risks and opportunities for future growth.\n\n"
    "Company Name: {CompanyName}\n"
    "Time Period: {TimePeriod}\n"
    "Key Financial Data:\n"
    "- Revenue: {Revenue} million USD\n"
    "- Net Income: {NetIncome} million USD\n"
    "- Debt-to-Equity Ratio: {DebtEquityRatio}%\n"
    "- Earnings Per Share (EPS): {EPS}\n\n"
    "Answer the following questions:\n"
    "1. How strong is the company's profitability? Please explain the main factors.\n"
    "2. Is the company's debt level sustainable? Are there any financial risks?\n"
    "3. What potential risks and opportunities do you foresee based on the current data?\n\n"
    "Please detail your analysis process and provide clear conclusions and recommendations."
)

# Fill in the template with specific data
filled_user_prompt = user_prompt_template.format(
    CompanyName="ABC Corporation",
    TimePeriod="Q1 2023",
    Revenue="5000",
    NetIncome="800",
    DebtEquityRatio="45",
    EPS="1.2"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "system", "content": "Make your final answer show the most critical information the user requires. Do not use Markdown in your answer."},
    {"role": "user", "content": filled_user_prompt},
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

Okay, so I need to help ABC Corporation by analyzing their financial health. They provided some key data: revenue of 5 billion USD, net income of 800 million, a debt-to-equity ratio of 45%, and EPS of 1.2. Let me break this down step by step.

First, looking at profitability. The company has a high revenue but only an 8% net income compared to revenue. That seems like a big loss. But wait, they have a lot of debt, which could mean higher interest expenses. Maybe that's why their net income is low despite having good margins. So, while they're making money, it's not healthy because they're spending too much on debt.

Next, the debt level. A 45% debt-to-equity ratio is pretty high. That means they owe more than they have equity. High debt can lead to financial risks like default, which would hurt their reputation and possibly limit their growth. Also, if they take on more debt, interest payments might eat into profits, reducing future earnings.

Looking at EPS, it's 1.2, which is decent 
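
The output above reads like the model's reasoning trace and appears to stop before a final answer. DeepSeek-R1 distilled models typically emit their reasoning first and close it with a `</think>` tag before the answer, so a small post-processing step can separate the two. The helper `split_reasoning` below is a hypothetical sketch under that assumption.

```python
def split_reasoning(generated_text, close_tag="</think>"):
    """Split model output into (reasoning, answer) at the last closing tag.

    If the tag is missing (for example, generation hit max_new_tokens while
    the model was still reasoning), the answer is returned as an empty string.
    """
    head, sep, tail = generated_text.rpartition(close_tag)
    if not sep:
        return generated_text.strip(), ""
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()
```

An empty answer is a signal to raise `max_new_tokens` or shorten the prompt, and only the answer part should go into the final answers dict.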

### Question 2: Why did the price increase on Jan 30? Please provide potential factors.

In [None]:
import json

system_prompt_q2 = (
    "You are an expert financial analyst. "
    "All output must be in English. "
    "Use only the provided inputs—do not fetch or invent anything. "
    "Do not restate the question or repeat the prompts. Answer directly and concisely."
)

user_prompt_q2 = """

Please respond in English.

Using only the news articles data and Tesla’s January 30, 2025 stock data, identify all plausible factors that contributed to the price uptick.
Reference the specific news item (by title and date) or data point you’re using.  
Explain the causal link to the price move. 
Do not use any external sources.  

**Question** 
Why did Tesla’s stock price increase on January 30?

Here are the news articles and price info:
{input_data}

Please respond in English!
"""

with open("447_dataset/news.json", "r") as f:
    all_news = json.load(f)

jan30_news = []

for article in all_news:
    if article.get("date") == "January 30, 2025":
        jan30_news.append(article)

jan30_news_json = json.dumps(jan30_news, ensure_ascii=False)


# Example of how you'd fill & call it:
filled_user = user_prompt_q2.format(input_data=jan30_news_json)
messages = [
    {"role": "system", "content": system_prompt_q2},
    {"role": "user",   "content": filled_user},
]
output = pipe(messages, **generation_args)
print(output[0]["generated_text"])

NameError: name 'pipe' is not defined
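
The `NameError` indicates that the cell defining `pipe` did not run successfully in this session, so that cell has to succeed first. Note also that the prompt refers to Tesla's January 30 stock data, while the cell above passes only the news articles. A minimal sketch of building a combined input follows; `build_q2_input` is a hypothetical helper, and the price date key `Date`, its format, and the assumption that the price list is in chronological order are not confirmed by the dataset.

```python
import json


def build_q2_input(all_news, prices, news_date, price_date,
                   news_date_key="date", price_date_key="Date"):
    """Return a JSON string with the day's news and the relevant price rows."""
    news = [a for a in all_news if a.get(news_date_key) == news_date]

    price_rows = []
    for i, row in enumerate(prices):
        if row.get(price_date_key) == price_date:
            # Include the previous row so the model can see the day-over-day move
            price_rows = prices[max(i - 1, 0): i + 1]
            break

    return json.dumps({"news": news, "prices": price_rows},
                      ensure_ascii=False, indent=2)


# Example call (the price date format is an assumption):
# input_data = build_q2_input(all_news, prices, "January 30, 2025", "2025-01-30")
```

Passing the prior trading day's row lets the model ground the word "increase" in two actual closing prices instead of inferring it from the news text.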

In [11]:
print(len(output))

1
