# 1. Exploratory Data Analysis (EDA)

**Objective:** To understand the fundamental characteristics of our selected assets: Tesla (TSLA), S&P 500 ETF (SPY), and Vanguard Total Bond Market ETF (BND). This phase involves fetching, cleaning, and visualizing the data to uncover trends, volatility patterns, and statistical properties that will inform our modeling strategy.

**Stakeholder Insight:** This initial analysis is crucial for risk assessment. It helps us visually confirm the roles these assets are expected to play in a portfolio: TSLA for high growth, SPY for broad market exposure and diversification, and BND for stability.

## 1.1. Setup

Import necessary libraries and configure the environment. We will use functions from our `src` package to ensure consistency with the production pipeline.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import sys
import os
import seaborn as sns
import logging
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Ensure the 'src' directory is in the Python path to allow for modular imports
sys.path.append(os.path.abspath(os.path.join('..', 'src')))

from data_ingestion import get_data, check_stationarity, perform_eda_and_risk_analysis
from config import TICKERS, REPORTS_DIR

# Configure plots for better visualization and consistency
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (15, 7)
logging.basicConfig(level=logging.INFO)

## 1.2. Data Ingestion and Preparation
Here, we fetch the historical data for the defined tickers and save the processed DataFrame for use in subsequent analysis steps. This ensures data consistency across all notebooks.

In [None]:
# Fetch live or synthetic data for the required assets from the data ingestion module
all_data = get_data(TICKERS)

# Create the processed data directory if it doesn't exist
os.makedirs('../data/processed', exist_ok=True)
# Save the cleaned and prepared DataFrame to a CSV file
all_data.to_csv('../data/processed/all_data.csv')

print("Data successfully ingested and saved to ../data/processed/all_data.csv")
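
Later notebooks need to read this file back with its date index intact. The sketch below shows one way to do that round trip. It assumes `get_data` returns a DataFrame with a `DatetimeIndex` and one price column per ticker; that layout, the synthetic prices, and the temporary path are illustrative assumptions rather than details confirmed by the `src` package.

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for the DataFrame returned by get_data (assumed layout:
# a DatetimeIndex and one price column per ticker). Values are made up.
dates = pd.bdate_range("2024-01-01", periods=5)
prices = pd.DataFrame(
    {
        "TSLA": [250.0, 252.5, 248.0, 255.0, 260.0],
        "SPY": [470.0, 471.0, 469.5, 472.0, 473.5],
        "BND": [73.0, 73.1, 73.0, 73.2, 73.1],
    },
    index=dates,
)
prices.index.name = "Date"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "all_data.csv")
    prices.to_csv(csv_path)
    # index_col=0 and parse_dates=True restore the DatetimeIndex on reload;
    # without them the dates come back as a plain string column.
    reloaded = pd.read_csv(csv_path, index_col=0, parse_dates=True)

print(type(reloaded.index).__name__)
print(reloaded.head())
```

Without `parse_dates=True`, time-based operations in later notebooks, such as resampling or date slicing, would have to convert the index first.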

## 1.3. Volatility Analysis: Daily Returns
This section focuses on analyzing the volatility of each asset. We visualize the distribution of daily returns and track how volatility changes over time using a rolling standard deviation.

In [None]:
# Create the reports directory to save analysis artifacts
os.makedirs(REPORTS_DIR, exist_ok=True)

for asset in all_data.columns:
    print(f"\n--- Volatility Analysis for {asset} ---")
    
    # Calculate daily returns
    daily_returns = all_data[asset].pct_change().dropna()
    
    # Visualize daily returns distribution to check for normality and fat tails
    plt.figure()
    daily_returns.hist(bins=50)
    plt.title(f'Daily Returns Distribution for {asset}')
    plt.xlabel('Daily Return')
    plt.ylabel('Frequency')
    plt.savefig(f'{REPORTS_DIR}/{asset}_daily_returns.png')
    plt.show()
    
    # Visualize rolling volatility to observe how risk changes over time
    rolling_std = daily_returns.rolling(window=20).std()
    plt.figure()
    rolling_std.plot()
    plt.title(f'20-Day Rolling Volatility for {asset}')
    plt.xlabel('Date')
    plt.ylabel('Volatility')
    plt.savefig(f'{REPORTS_DIR}/{asset}_rolling_volatility.png')
    plt.show()
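
The rolling standard deviation above is expressed in daily units. Volatility is often quoted on an annualized basis, so the sketch below scales the same 20-day rolling figure by the square root of the number of trading days in a year. The value 252, the helper name `rolling_annualized_volatility`, and the synthetic price series are assumptions made for illustration and are not part of the `src` package.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumption: common convention for US markets


def rolling_annualized_volatility(prices: pd.Series, window: int = 20) -> pd.Series:
    """Rolling std of daily returns, scaled to an annual figure."""
    daily_returns = prices.pct_change().dropna()
    daily_vol = daily_returns.rolling(window=window).std()
    return daily_vol * np.sqrt(TRADING_DAYS_PER_YEAR)


# Synthetic price path with roughly 1% daily volatility (illustrative only).
rng = np.random.default_rng(seed=0)
synthetic_returns = rng.normal(loc=0.0005, scale=0.01, size=500)
synthetic_prices = pd.Series(
    100 * np.cumprod(1 + synthetic_returns),
    index=pd.bdate_range("2022-01-03", periods=500),
)

annualized_vol = rolling_annualized_volatility(synthetic_prices)
print(annualized_vol.dropna().describe())
```

The first `window - 1` values are `NaN` because the rolling window is not yet full, which is why the plots start slightly after the first date in the data.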

### Stakeholder Insight

The plots show:
*   **TSLA:** Extreme volatility and exponential growth, confirming its status as a high-risk, high-reward asset.
*   **SPY:** Steady, consistent growth that mirrors the overall US market.
*   **BND:** Relative price stability, making it an effective hedge against equity market volatility.
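
The histograms give a visual impression of fat tails; a numeric check can back that up. The sketch below computes skewness and excess kurtosis with pandas, where an excess kurtosis well above zero suggests heavier tails than a normal distribution. The helper name `tail_summary` and both synthetic series are hypothetical; no results for TSLA, SPY, or BND are implied.

```python
import numpy as np
import pandas as pd


def tail_summary(daily_returns: pd.Series) -> dict:
    """Return skewness and excess kurtosis (normal distribution = 0)."""
    return {
        "skew": float(daily_returns.skew()),
        "excess_kurtosis": float(daily_returns.kurt()),
    }


# Two synthetic return series for comparison (illustrative only):
# one drawn from a normal distribution, one from a heavy-tailed Student-t.
rng = np.random.default_rng(seed=42)
normal_returns = pd.Series(rng.normal(loc=0.0, scale=0.01, size=5000))
fat_tailed_returns = pd.Series(0.01 * rng.standard_t(df=3, size=5000))

normal_stats = tail_summary(normal_returns)
fat_tailed_stats = tail_summary(fat_tailed_returns)

print("normal:    ", normal_stats)
print("fat-tailed:", fat_tailed_stats)
```

In the notebook, the same function could be applied to each asset's `daily_returns` inside the loop above.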

## 1.4. Data Stationarity Test
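Many time series models assume a stationary input; ARIMA, for example, differences the series (the "I" term) until it is approximately stationary. Here, we run the Augmented Dickey-Fuller (ADF) test on each asset. The null hypothesis of the test is that the series has a unit root (is non-stationary), so a small p-value is evidence in favor of stationarity.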

In [None]:
for asset in all_data.columns:
    print(f"\n--- Stationarity Test for {asset} ---")
    # Check for stationarity, a critical assumption for many time series models
    check_stationarity(all_data, asset)

## 1.5. Conclusion and Next Steps
This section summarizes the key findings from the EDA and outlines the next steps in the pipeline, which involve using the prepared data for forecasting and portfolio optimization.

In [None]:
print("\n--- EDA and Data Ingestion Complete ---")
print("Key findings:")
print("- The data has been successfully fetched, cleaned, and stored.")
print("- Daily returns and rolling volatility have been visualized to understand risk dynamics.")
print("- Stationarity tests have been performed for each asset, providing insights for time series modeling.")
print("All visualizations and reports have been saved to the 'reports' directory.")
print("\nNext Steps: The prepared data is ready to be used for the Time Series Forecasting stage, where we will build predictive models.")