# Phase 3: Analysis & Machine Learning (Microsoft Corp)

## Objective
In this phase, we connect to the processed SQL database, perform Exploratory Data Analysis (EDA) on the financial fundamentals, and build a Machine Learning model to forecast future Earnings Per Share (EPS) Growth.

**Methodology Update:**
To ensure statistical rigor, we predict the **EPS Growth Rate**, which is closer to stationary, rather than raw EPS levels, which trend over time (non-stationary). We also evaluate the model against a **Naive Baseline** that predicts 0% growth.

In [None]:
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Database path and ticker symbol
DB_PATH = 'finnhub_data.db'
SYMBOL = 'MSFT'

## 1. Data Extraction
We query the `v_model_features` view. Note that `target_eps_growth` is the percentage change for the *next* quarter.
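
The definition of `v_model_features` is not shown here, so the following is only a sketch of how a next-quarter growth target of this kind could be aligned with the current quarter's row in pandas. The toy EPS values and the column name `eps` are illustrative assumptions, not data from the database.

```python
import pandas as pd

# Toy quarterly EPS values (illustrative only, not MSFT data).
frame = pd.DataFrame({
    "period": pd.to_datetime(["2023-03-31", "2023-06-30", "2023-09-30", "2023-12-31"]),
    "eps": [2.00, 2.20, 2.42, 2.00],
})

# pct_change() gives growth from the previous quarter into the current one;
# shift(-1) moves it up one row so each row holds the growth into the NEXT quarter.
frame["target_eps_growth"] = frame["eps"].pct_change().shift(-1)

# The most recent quarter has no following quarter, so its target is NaN
# and would be removed by dropna().
print(frame)
```

Under this alignment, every feature on a row is known at the end of that quarter, while the target is only known one quarter later.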

In [None]:
query = """
SELECT period, current_eps, target_eps_growth, sales_growth_yoy, eps_momentum_qoq, net_margin, total_debt_to_equity
FROM v_model_features
WHERE symbol = ?
ORDER BY period ASC
"""

# Close the connection even if the query fails
conn = sqlite3.connect(DB_PATH)
try:
    df = pd.read_sql_query(query, conn, params=(SYMBOL,))
finally:
    conn.close()

df['period'] = pd.to_datetime(df['period'])
# Drop rows with any missing feature or target value
df_clean = df.dropna()

print(f"Records available for ML: {len(df_clean)}")
df_clean.tail()

## 2. Exploratory Data Analysis (EDA)
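We plot the target variable (EPS growth rate) over time as a visual stationarity check: the series should fluctuate around a stable mean without a persistent trend.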
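
A plot is only an informal check. As a rough numeric companion, one could compare the mean and standard deviation of the first and second halves of a series; large differences hint at non-stationarity. This sketch uses a hypothetical helper `split_half_summary` and synthetic data (not the MSFT series), and it is not a substitute for a formal test such as Augmented Dickey-Fuller.

```python
import numpy as np

def split_half_summary(series):
    """Return (mean, std) for the first and second half of a 1-D series."""
    values = np.asarray(series, dtype=float)
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (first.mean(), first.std()), (second.mean(), second.std())

# Synthetic example: a level that compounds at a noisy ~3% per period.
rng = np.random.default_rng(0)
rates = rng.normal(0.03, 0.02, size=40)
level = np.cumprod(1 + rates)            # trending, like raw EPS
growth = level[1:] / level[:-1] - 1      # period-over-period growth rate

(level_m1, _), (level_m2, _) = split_half_summary(level)
(growth_m1, _), (growth_m2, _) = split_half_summary(growth)

print(f"Level mean shift:  {level_m2 - level_m1:.3f}")
print(f"Growth mean shift: {growth_m2 - growth_m1:.3f}")
```

In this synthetic case the level's mean drifts between halves while the growth rate's mean stays roughly constant, which is the pattern the plot is meant to reveal.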

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(df_clean['period'], df_clean['target_eps_growth'], label='EPS Growth Rate')
plt.axhline(0, color='red', linestyle='--')
plt.title('Stationarity Check: EPS Growth Rate over Time')
plt.ylabel('Growth Rate')
plt.legend()
plt.show()

## 3. Machine Learning: Growth Forecasting
**Features (X):** Sales Growth (YoY), EPS Momentum (QoQ), Net Margin, Total Debt/Equity.
**Target (Y):** Next-Quarter EPS Growth Rate.
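
Because the rows are ordered by period, the train/test split should be chronological rather than shuffled, so that the model is never trained on quarters that come after the ones it is tested on. A minimal sketch of that idea, using a hypothetical helper `chronological_split` and placeholder period labels:

```python
def chronological_split(rows, train_fraction=0.8):
    """Split an ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_idx = int(len(rows) * train_fraction)
    return rows[:split_idx], rows[split_idx:]

# Ten ordered periods (placeholder labels, oldest first).
quarters = [f"Q{i}" for i in range(1, 11)]
train, test = chronological_split(quarters)

# Every training period precedes every test period.
print("Last train period:", train[-1])
print("First test period:", test[0])
```

With only a few dozen quarterly records, a 20% test set holds just a handful of points, so any metric computed on it should be read as a rough indication.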

In [None]:
features = ['sales_growth_yoy', 'eps_momentum_qoq', 'net_margin', 'total_debt_to_equity']
target = 'target_eps_growth'

X = df_clean[features]
y = df_clean[target]

# Time-Series Split (80% Train, 20% Test)
split_idx = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

# Metadata for reconstruction
current_eps_test = df_clean['current_eps'].iloc[split_idx:]

# Model Training
model = LinearRegression()
model.fit(X_train, y_train)
y_pred_lr = model.predict(X_test)

# Naive Baseline (Predict 0% growth)
y_pred_naive = np.zeros(len(y_test))

## 4. Evaluation (Reconstructed EPS)
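We convert growth rates back to EPS in dollars (current EPS × (1 + growth)) and compute the Mean Absolute Percentage Error (MAPE) against the actual next-quarter EPS.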
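
Since actual and predicted EPS are both the same current EPS scaled by (1 + growth), the current EPS cancels in the percentage error, leaving |g − ĝ| / |1 + g| for each quarter (assuming current EPS is non-zero). The toy numbers below are assumptions for illustration, not MSFT data.

```python
import numpy as np

# Illustrative values only.
current_eps = np.array([2.0, 2.5])
actual_growth = np.array([0.10, -0.20])
predicted_growth = np.array([0.05, 0.00])

# Reconstruct dollar EPS from growth rates.
actual_eps = current_eps * (1 + actual_growth)
predicted_eps = current_eps * (1 + predicted_growth)

# MAPE computed on reconstructed dollar values.
mape_dollars = np.mean(np.abs((actual_eps - predicted_eps) / actual_eps)) * 100

# Same quantity computed from growth rates alone (current EPS cancels).
mape_growth = np.mean(
    np.abs((actual_growth - predicted_growth) / (1 + actual_growth))
) * 100

print(f"MAPE from reconstructed EPS: {mape_dollars:.2f}%")
print(f"MAPE from growth rates only: {mape_growth:.2f}%")
```

One consequence is that the naive 0%-growth baseline has a MAPE equal to the mean of |g / (1 + g)|, so it is hard to beat whenever quarter-to-quarter growth is small.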

In [None]:
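def mean_absolute_percentage_error(y_true, y_pred):
    """Return MAPE in percent; undefined if any y_true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100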

# Reconstruct
actual_future_eps = current_eps_test * (1 + y_test)
pred_future_eps_lr = current_eps_test * (1 + y_pred_lr)
pred_future_eps_naive = current_eps_test * (1 + y_pred_naive)

mape_lr = mean_absolute_percentage_error(actual_future_eps, pred_future_eps_lr)
mape_naive = mean_absolute_percentage_error(actual_future_eps, pred_future_eps_naive)

print(f"Naive Baseline MAPE: {mape_naive:.2f}%")
print(f"Linear Regression MAPE: {mape_lr:.2f}%")

## 5. Conclusion
The Linear Regression model (MAPE ~45%) failed to beat the Naive Baseline (MAPE ~8%). This result suggests that next-quarter EPS growth for MSFT is not well explained by a linear combination of the fundamental ratios in this dataset, at least on this test split.