# Task 1: Preprocess and Explore the Data
## Time Series Forecasting for Portfolio Management Optimization
### Guide Me in Finance (GMF) Investments

**Analysis Period:** July 1, 2015 to December 31, 2024
**Assets:** TSLA, BND, SPY

This notebook implements the complete Task 1 workflow:
1. Load historical price data for TSLA, BND, and SPY
2. Clean the data and engineer features
3. Calculate risk and return metrics, including ADF and KPSS stationarity tests
4. Run exploratory data analysis: static Matplotlib plots, interactive Plotly dashboards, and a text report
5. Compare per-asset statistics and correlations under an equal-weight allocation
6. Assess data quality (missing values and extreme returns)

## Setup and Imports

First, let's set up our environment and import all necessary libraries.

In [None]:
#!/usr/bin/env python3
"""
Task 1: Preprocess and Explore the Data
Time Series Forecasting for Portfolio Management Optimization
Guide Me in Finance (GMF) Investments
"""

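import os
import sys
import warnings
from datetime import datetime

import pandas as pd  # used for the returns DataFrame in Step 5

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')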

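# Add the src directory to the import path. __file__ is not defined inside a
# notebook, so resolve the path from the current working directory instead.
sys.path.append(os.path.join(os.getcwd(), 'src'))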

from data_loader import FinancialDataLoader
from preprocessing import FinancialDataPreprocessor
from financial_metrics import FinancialMetricsCalculator
from eda import FinancialEDA

print("✓ All required libraries imported successfully!")

## Step 1: Loading Financial Data

We'll load historical financial data for TSLA, BND, and SPY using the `yfinance` library.

In [None]:
# Create output directories
os.makedirs('data', exist_ok=True)
os.makedirs('results', exist_ok=True)
os.makedirs('results/plots', exist_ok=True)

# Analysis parameters
start_date = "2015-07-01"
end_date = "2024-12-31"
assets = ["TSLA", "BND", "SPY"]

print(f"Analysis Period: {start_date} to {end_date}")
print(f"Assets: {', '.join(assets)}")

# Load data
loader = FinancialDataLoader(start_date=start_date, end_date=end_date)
raw_data = {}

for asset in assets:
    print(f"Loading data for {asset}...")
    data = loader.fetch_asset_data(asset)
    if data is not None:
        raw_data[asset] = data
        print(f"  ✓ {asset}: {len(data)} records loaded")
    else:
        print(f"  ✗ Failed to load {asset}")

print(f"\nTotal assets loaded: {len(raw_data)}")
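
One detail worth checking in the loader: yfinance treats the `end` argument of its download calls as exclusive, so a request ending on `2024-12-31` stops at the last trading day before that date. Whether `FinancialDataLoader` already compensates for this is not shown in this notebook. The helper below (`inclusive_end` is a hypothetical name, not part of the project) is a minimal sketch of shifting the bound forward by one day.

```python
from datetime import date, timedelta


def inclusive_end(end_date: str) -> str:
    """Return the day after end_date (YYYY-MM-DD), for use as an exclusive upper bound."""
    return (date.fromisoformat(end_date) + timedelta(days=1)).isoformat()


print(inclusive_end("2024-12-31"))  # 2025-01-01
```

Passing the shifted date as the upper bound would make December 31, 2024 part of the requested window, assuming the loader forwards `end_date` unchanged.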

## Step 2: Data Preprocessing

Now we'll clean, preprocess, and engineer features for our financial data.

In [None]:
# Preprocess data
preprocessor = FinancialDataPreprocessor()
processed_data = preprocessor.preprocess_asset_data(raw_data)

for asset, data in processed_data.items():
    print(f"  ✓ {asset}: Preprocessed {len(data)} records")
    print(f"    Features: {len(data.columns)}")
    print(f"    Date range: {data.index.min().strftime('%Y-%m-%d')} to {data.index.max().strftime('%Y-%m-%d')}")
    print()
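
The `Daily_Return` column used in the following steps is produced inside `FinancialDataPreprocessor`, whose implementation is not shown here. A common definition is the simple percentage change of the closing price, r_t = P_t / P_{t-1} - 1. The sketch below assumes that definition and uses plain lists so it runs without the project's modules; `simple_returns` is a hypothetical name.

```python
def simple_returns(closes):
    """Simple daily returns r_t = P_t / P_{t-1} - 1; one element shorter than the input."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


# Illustrative prices only, not market data
closes = [100.0, 102.0, 99.96, 104.958]
print([round(r, 4) for r in simple_returns(closes)])  # [0.02, -0.02, 0.05]
```

The first observation has no predecessor, which is why the notebook calls `.dropna()` on `Daily_Return` before computing any statistics.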

## Step 3: Calculating Financial Metrics

For each asset we compute summary statistics of daily returns, risk measures (VaR, Sharpe ratio, Sortino ratio, and maximum drawdown), and ADF and KPSS stationarity tests.

In [None]:
# Calculate comprehensive metrics
metrics_calc = FinancialMetricsCalculator()
all_metrics = {}

for asset, data in processed_data.items():
    if 'Daily_Return' in data.columns:
        returns = data['Daily_Return'].dropna()
        if len(returns) > 0:
            print(f"Calculating metrics for {asset}...")
            
            # Basic statistics
            basic_stats = {
                'Mean_Return': returns.mean(),
                'Std_Return': returns.std(),
                'Skewness': returns.skew(),
                'Kurtosis': returns.kurtosis(),
                'Min_Return': returns.min(),
                'Max_Return': returns.max()
            }
            
            # Risk metrics
            var_metrics = metrics_calc.calculate_var(returns)
            sharpe_metrics = metrics_calc.calculate_sharpe_ratio(returns)
            max_drawdown_metrics = metrics_calc.calculate_maximum_drawdown(data['Close'])
            sortino_metrics = metrics_calc.calculate_sortino_ratio(returns)
            
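            # Stationarity tests. The null hypotheses are opposite: ADF assumes a
            # unit root (non-stationary), KPSS assumes stationarity.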
            stationarity_tests = {}
            try:
                stationarity_tests['ADF'] = metrics_calc.test_stationarity(returns, 'adf')
                stationarity_tests['KPSS'] = metrics_calc.test_stationarity(returns, 'kpss')
            except Exception as e:
                print(f"    Warning: Stationarity tests failed for {asset}: {e}")
            
            # Compile all metrics
            all_metrics[asset] = {
                'Basic_Statistics': basic_stats,
                'VaR': var_metrics,
                'Sharpe_Ratio': sharpe_metrics,
                'Maximum_Drawdown': max_drawdown_metrics,
                'Sortino_Ratio': sortino_metrics,
                'Stationarity_Tests': stationarity_tests
            }
            
            print(f"    ✓ {asset} metrics calculated successfully")
            print(f"    Sharpe Ratio: {sharpe_metrics.get('Sharpe_Ratio', 'N/A')}")
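            # Apply ':.2f' only when a value is present; formatting the 'N/A'
            # fallback string as a float would raise ValueError
            max_dd = max_drawdown_metrics.get('Max_Drawdown_Pct')
            print(f"    Max Drawdown: {max_dd:.2f}%" if max_dd is not None else "    Max Drawdown: N/A")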
            print()
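
The methods called on `FinancialMetricsCalculator` live in `src/financial_metrics.py` and are not reproduced in this notebook. As a reference point, the sketch below implements textbook versions of the same measures using only the standard library. It assumes historical (empirical-quantile) VaR, 252 trading days per year, and a zero risk-free rate by default. All function names are hypothetical, and the project's own implementations may differ, for example by using parametric VaR.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def historical_var(returns, tail=0.05):
    """Historical VaR as a positive loss fraction: the empirical tail quantile, sign-flipped."""
    ordered = sorted(returns)
    idx = min(int(tail * len(ordered)), len(ordered) - 1)
    return -ordered[idx]


def sharpe_ratio(returns, risk_free_annual=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    daily_rf = risk_free_annual / TRADING_DAYS
    excess = [r - daily_rf for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * statistics.mean(excess) / sd


def sortino_ratio(returns, risk_free_annual=0.0):
    """Annualized Sortino ratio: like Sharpe, but penalizing only negative excess returns."""
    daily_rf = risk_free_annual / TRADING_DAYS
    excess = [r - daily_rf for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * statistics.mean(excess) / downside


def max_drawdown(prices):
    """Largest peak-to-trough decline as a negative fraction (-0.25 means -25%)."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1.0)
    return worst


# Illustrative inputs only, not market data
sample_returns = [0.01, -0.01, 0.02, 0.0]
sample_prices = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
print(f"Sharpe:  {sharpe_ratio(sample_returns):.3f}")
print(f"Sortino: {sortino_ratio(sample_returns):.3f}")
print(f"Max DD:  {max_drawdown(sample_prices):.2%}")
```

Note that maximum drawdown is computed from prices rather than returns, which matches the notebook passing `data['Close']` to `calculate_maximum_drawdown`.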

## Step 4: Comprehensive Exploratory Data Analysis

We'll generate both static Matplotlib plots and interactive Plotly dashboards with zoom/pan controls.

In [None]:
# Initialize EDA
eda = FinancialEDA()

print("Generating comprehensive EDA visualizations...")

# Static Matplotlib plots
print("\n1. Static Matplotlib Plots:")
eda.create_price_analysis_plots(processed_data, save_path='results/plots/')
eda.create_return_distribution_plots(processed_data, save_path='results/plots/')
eda.create_correlation_analysis(processed_data, save_path='results/plots/')
eda.create_trend_and_seasonality_analysis(processed_data, save_path='results/plots/')
eda.create_volatility_clustering_analysis(processed_data, save_path='results/plots/')
eda.create_outlier_analysis(processed_data, save_path='results/plots/')
eda.create_statistical_tests_analysis(processed_data, save_path='results/plots/')
eda.create_risk_metrics_summary(all_metrics, save_path='results/plots/')

print("✓ All static plots generated successfully!")

## Interactive Plotly Dashboards

Now we'll create interactive visualizations with zoom/pan controls, range sliders, and hover tooltips.

In [None]:
# Interactive Plotly visualizations with zoom/pan controls
print("\n2. Interactive Plotly Dashboards:")

print("  - Interactive price analysis dashboard...")
eda.create_interactive_price_analysis(processed_data, save_path='results/plots/')

print("  - Interactive correlation analysis...")
eda.create_interactive_correlation_analysis(processed_data, save_path='results/plots/')

print("  - Interactive outlier analysis dashboard...")
eda.create_interactive_outlier_analysis(processed_data, save_path='results/plots/')

print("  - Interactive trend and seasonality analysis...")
eda.create_interactive_trend_analysis(processed_data, save_path='results/plots/')

print("  - Interactive risk metrics dashboard...")
eda.create_interactive_risk_metrics(all_metrics, save_path='results/plots/')

print("✓ All interactive dashboards generated successfully!")

## Comprehensive EDA Report

Generate a detailed text report with actionable insights and recommendations.

In [None]:
# Generate comprehensive EDA report
print("\nGenerating comprehensive EDA report...")
eda_report = eda.generate_eda_report(processed_data, all_metrics, save_path='results/')
print("✓ EDA report generation completed!")

# Display first 500 characters of the report
print("\nReport Preview:")
print("=" * 80)
print(eda_report[:500] + "...")
print("=" * 80)
print(f"\nFull report saved to: results/eda_report.txt ({len(eda_report)} characters)")

## Step 5: Advanced Portfolio Analysis

Set equal weights, then compare annualized per-asset statistics and pairwise return correlations between the assets.

In [None]:
# Calculate portfolio-level metrics
portfolio_returns = {}
for asset, data in processed_data.items():
    if 'Daily_Return' in data.columns:
        portfolio_returns[asset] = data['Daily_Return'].dropna()

if len(portfolio_returns) > 1:
    # Equal-weight portfolio analysis
    equal_weights = {asset: 1.0/len(portfolio_returns) for asset in portfolio_returns.keys()}
    
    print("Equal-Weight Portfolio Analysis:")
    for asset, weight in equal_weights.items():
        print(f"  {asset}: {weight:.1%}")
    
    # Portfolio risk analysis
    print("\nPortfolio Risk Analysis:")
    
    # Create returns DataFrame for portfolio calculations
    returns_df = pd.DataFrame(portfolio_returns)
    
    # Calculate portfolio statistics
    portfolio_mean = returns_df.mean()
    portfolio_std = returns_df.std()
    
    print("  Individual Asset Statistics:")
    for asset in returns_df.columns:
        annual_return = portfolio_mean[asset] * 252
        annual_vol = portfolio_std[asset] * (252 ** 0.5)
        # Sharpe ratio computed with a zero risk-free rate
        sharpe = annual_return / annual_vol if annual_vol > 0 else 0
        print(f"    {asset}: Return={annual_return:.2%}, Vol={annual_vol:.2%}, Sharpe={sharpe:.3f}")
    
    # Correlation insights
    correlation_matrix = returns_df.corr()
    print("\n  Correlation Matrix:")
    for i, asset1 in enumerate(correlation_matrix.columns):
        for j, asset2 in enumerate(correlation_matrix.columns):
            if i < j:  # Avoid duplicate pairs
                corr_value = correlation_matrix.loc[asset1, asset2]
                print(f"    {asset1} vs {asset2}: {corr_value:.4f}")
    
    print("\n  Note: Full portfolio optimization will be implemented in Task 4")
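
The cell above reports statistics asset by asset. A portfolio-level figure requires combining the return series with the weights first. The sketch below does this under two stated assumptions: the series are aligned on the same dates, and the portfolio is rebalanced daily back to fixed weights. `portfolio_stats` is a hypothetical helper, written with the standard library so it runs on its own.

```python
import math
import statistics


def portfolio_stats(returns_by_asset, weights, periods=252):
    """Annualized return and volatility of a fixed-weight, daily-rebalanced portfolio.

    returns_by_asset: dict of asset -> list of daily returns on the same dates.
    weights: dict of asset -> weight, expected to sum to 1.
    """
    assets = list(returns_by_asset)
    n = len(returns_by_asset[assets[0]])
    if any(len(returns_by_asset[a]) != n for a in assets):
        raise ValueError("return series must be aligned to the same dates")
    # Portfolio return on each day is the weighted sum of the asset returns
    daily = [sum(weights[a] * returns_by_asset[a][t] for a in assets) for t in range(n)]
    annual_return = statistics.mean(daily) * periods
    annual_vol = statistics.stdev(daily) * math.sqrt(periods)
    return annual_return, annual_vol


# Illustrative returns only, not market data
toy_returns = {
    "A": [0.010, -0.020, 0.015, 0.000],
    "B": [-0.005, 0.010, -0.005, 0.002],
}
toy_weights = {asset: 1.0 / len(toy_returns) for asset in toy_returns}
ret, vol = portfolio_stats(toy_returns, toy_weights)
print(f"Return={ret:.2%}, Vol={vol:.2%}")
```

With the `returns_df` and `equal_weights` objects from the cell above, a pandas equivalent of the weighted sum would be along the lines of `returns_df.dropna().mul(pd.Series(equal_weights)).sum(axis=1)`. Because the assets are imperfectly correlated, the resulting volatility is generally lower than the weighted average of the individual volatilities.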

## Step 6: Data Quality Assessment

Check each asset's record count, date coverage, missing values, and extreme returns (|Z| > 3).

In [None]:
# Data quality assessment
for asset, data in processed_data.items():
    print(f"\n{asset} Data Quality:")
    print(f"  - Total Records: {len(data):,}")
    print(f"  - Date Range: {data.index.min().strftime('%Y-%m-%d')} to {data.index.max().strftime('%Y-%m-%d')}")
    print(f"  - Missing Values: {data.isnull().sum().sum():,}")
    
    if 'Daily_Return' in data.columns:
        returns = data['Daily_Return'].dropna()
        print(f"  - Valid Returns: {len(returns):,}")
        print(f"  - Return Range: {returns.min():.4f} to {returns.max():.4f}")
        
        # Check for extreme values
        z_scores = abs((returns - returns.mean()) / returns.std())
        extreme_returns = len(returns[z_scores > 3])
        print(f"  - Extreme Returns (|Z| > 3): {extreme_returns} ({extreme_returns/len(returns)*100:.1f}%)")
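
The share of |Z| > 3 observations is easier to interpret against a benchmark. Under a normal distribution only about 0.27% of observations fall more than three standard deviations from the mean, so a materially higher share points to fat tails. The sketch below computes that baseline with the standard library; `normal_tail_share` is a hypothetical name.

```python
import math


def normal_tail_share(z):
    """Two-sided probability P(|Z| > z) for a standard normal variable."""
    return math.erfc(z / math.sqrt(2.0))


print(f"{normal_tail_share(3.0):.2%}")  # 0.27%
```

Comparing each asset's reported percentage with this value gives a quick read on how far its return distribution departs from normality in the tails.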

## Task 1 Completion Summary

Let's summarize what we've accomplished and outline the next steps.

In [None]:
# Summary and next steps
print("\n" + "="*80)
print("TASK 1 COMPLETION SUMMARY")
print("="*80)

print("✓ Data Loading: Historical financial data loaded for all assets")
print("✓ Data Preprocessing: Data cleaned, missing values handled, features engineered")
print("✓ Financial Metrics: Comprehensive risk and return metrics calculated")
print("✓ EDA: Advanced exploratory analysis with 8 visualization categories")
print("✓ Interactive Visualizations: Plotly dashboards with zoom/pan controls")
print("✓ Statistical Tests: Stationarity, normality, and autocorrelation tests performed")
print("✓ Portfolio Analysis: Basic portfolio metrics and correlation analysis")
print("✓ Data Quality: Comprehensive data quality assessment completed")

print("\nOutput files saved to:")
print("  - Data: data/")
print("  - Results: results/")
print("  - Plots: results/plots/ (13 visualization files: 8 static, 5 interactive)")
print("  - Report: results/eda_report.txt")

print("\nGenerated Visualizations:")
print("  Static Matplotlib Plots: 8 files")
print("  Interactive Plotly Dashboards: 5 HTML files with zoom/pan controls")

print("\nInteractive Features:")
print("  - Zoom in/out with mouse wheel or zoom tools")
print("  - Pan across charts by clicking and dragging")
print("  - Range sliders for time-based navigation")
print("  - Hover tooltips with detailed information")
print("  - Time range selection buttons (1M, 3M, 6M, 1Y, All)")
print("  - Metric visibility toggles")
print("  - Export to HTML for web sharing")

print("\nNext Steps:")
print("  - Task 2: Develop Time Series Forecasting Models (ARIMA/SARIMA + LSTM)")
print("  - Task 3: Forecast Future Market Trends (6-12 months)")
print("  - Task 4: Optimize Portfolio Based on Forecast (MPT + Efficient Frontier)")
print("  - Task 5: Strategy Backtesting (Performance Validation)")

print("\n" + "="*80)
print("TASK 1 COMPLETED SUCCESSFULLY!")
print("="*80)

## Interactive Features Demo

The cell below lists the interactive features available in the Plotly dashboards and the HTML files they are saved to:

In [None]:
# Display information about interactive features
print("Interactive Dashboard Features:")
print("=" * 50)

features = [
    "🔍 Zoom Controls: Use mouse wheel or zoom tools to zoom in/out",
    "🖱️ Pan Controls: Click and drag to pan across charts",
    "📊 Range Sliders: Navigate through time periods with range sliders",
    "ℹ️ Hover Tooltips: Detailed information on hover",
    "⏰ Time Buttons: Quick time range selection (1M, 3M, 6M, 1Y, All)",
    "👁️ Visibility Toggles: Show/hide specific metrics",
    "💾 Export Options: Save as HTML for web sharing",
    "📱 Responsive Design: Works on desktop and mobile devices"
]

for feature in features:
    print(f"  {feature}")

print("\nHTML files generated:")
html_files = [
    "interactive_price_analysis.html",
    "interactive_correlation_analysis.html",
    "interactive_outlier_analysis.html",
    "interactive_trend_analysis.html",
    "interactive_risk_metrics.html"
]

for i, file in enumerate(html_files, 1):
    print(f"  {i}. {file}")

print("\nOpen these HTML files in any web browser to explore the interactive dashboards!")
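
The range slider and time-range buttons listed above correspond to Plotly's `xaxis.rangeslider` and `xaxis.rangeselector` layout settings. How `FinancialEDA` configures them is not shown in this notebook. The sketch below builds a layout dictionary that would produce the 1M, 3M, 6M, 1Y, and All buttons. It is kept as a plain dictionary so it runs without Plotly installed, and `time_axis_layout` is a hypothetical name.

```python
def time_axis_layout():
    """Layout settings for a date axis with range-selector buttons and a range slider."""
    buttons = [
        dict(count=1, label="1M", step="month", stepmode="backward"),
        dict(count=3, label="3M", step="month", stepmode="backward"),
        dict(count=6, label="6M", step="month", stepmode="backward"),
        dict(count=1, label="1Y", step="year", stepmode="backward"),
        dict(label="All", step="all"),
    ]
    return dict(
        xaxis=dict(
            type="date",
            rangeselector=dict(buttons=buttons),
            rangeslider=dict(visible=True),
        ),
        hovermode="x unified",  # one shared tooltip per x position
    )


layout = time_axis_layout()
print([b["label"] for b in layout["xaxis"]["rangeselector"]["buttons"]])
```

With Plotly available, the dictionary can be applied to an existing figure via `fig.update_layout(**layout)`.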