# 💼 Financial & HR Intelligence: Full Analysis
**Author:** Hely Camargo | **Stack:** Python · Statsmodels · Scikit-learn · Plotly

## 🎯 Business Context
This analysis answers two critical C-Suite questions:
- **CFO:** What is the expected risk and return of the portfolio in 12 months?
- **CHRO:** Is there pay equity? What drives employee turnover?

**HR Dataset:** IBM Watson Analytics, 1,470 employees | **Financial Data:** Yahoo Finance, 5 years

## 📦 Dependency Installation

In [None]:
!pip install pandas numpy plotly statsmodels scikit-learn yfinance scipy pmdarima matplotlib --quiet

## 📂 Data Loading with QA

In [None]:
import pandas as pd, numpy as np, warnings
warnings.filterwarnings("ignore")

# Load financial data
prices = pd.read_csv("../output/financial_clean.csv", index_col=0, parse_dates=True)
print(f"Shape: {prices.shape}")
print(prices.tail(3))
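
The loading cell prints the shape and the last rows but does not run explicit quality checks. A minimal sketch of what such checks could look like is shown below; the helper name `qa_report` and the tiny demo frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def qa_report(prices: pd.DataFrame) -> dict:
    """Return basic quality indicators for a date-indexed price table."""
    return {
        "rows": len(prices),
        "missing_per_column": prices.isna().sum().to_dict(),
        "duplicate_dates": int(prices.index.duplicated().sum()),
        "index_sorted": bool(prices.index.is_monotonic_increasing),
        "non_positive_prices": int((prices <= 0).sum().sum()),
    }

# Made-up frame with one missing value and one duplicated date, for illustration only.
demo = pd.DataFrame(
    {"AAA": [10.0, 10.5, None, 11.0], "BBB": [20.0, 19.5, 19.8, 20.1]},
    index=pd.to_datetime(["2024-01-31", "2024-02-29", "2024-02-29", "2024-03-31"]),
)
report = qa_report(demo)
print(report)
```

Calling `qa_report(prices)` on the real table would surface gaps or duplicated dates before they reach the ARIMA and Monte Carlo steps.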

## 📈 Financial EDA: Price Evolution

In [None]:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(14,8), facecolor="#061a40")
colors = ["#b9d6f2","#0353a4","#006daa","#4caf82"]
for ax, (ticker, color) in zip(axes.flat, zip(prices.columns, colors)):
    ax.plot(prices.index, prices[ticker], color=color, linewidth=2)
    ax.set_title(ticker, color="#e8f4fd", fontsize=13, fontweight="bold")
    ax.set_facecolor("#003559"); ax.tick_params(colors="#7ba7c9")
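    # Matplotlib does not parse CSS-style "rgba(...)" strings; pass an RGBA tuple in the 0-1 range
    for spine in ax.spines.values(): spine.set_edgecolor((185/255, 214/255, 242/255, 0.2))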
plt.suptitle("Stock Price History (5Y)", color="#e8f4fd", fontsize=15, fontweight="bold")
plt.tight_layout(); plt.show()

## 🔮 ARIMA Model: 12-Month Forecast

The ARIMA model (AutoRegressive Integrated Moving Average) captures trend and autocorrelation in financial time series. We apply the **ADF test** to check stationarity and use **`auto_arima`** to select the autoregressive and moving-average orders (p, q) automatically; the differencing order d is fixed at 1 in the code below.

In [None]:
from pmdarima import auto_arima
from statsmodels.tsa.stattools import adfuller

ticker = "AAPL"
series = prices[ticker].dropna()

# ADF Test
adf = adfuller(series)
print(f"ADF p-value: {adf[1]:.4f} - {'Stationary' if adf[1]<0.05 else 'Non-stationary'}")

# auto_arima
model = auto_arima(series, d=1, stepwise=True, suppress_warnings=True, error_action="ignore")
print(f"Best order: {model.order}")

# Forecast
fc, ci = model.predict(12, return_conf_int=True, alpha=0.05)
print(pd.DataFrame({"forecast":fc.round(2),"lower":ci[:,0].round(2),"upper":ci[:,1].round(2)}).head())
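
To see why the forecast interval widens with the horizon, it can help to look at the simplest model compatible with d=1: a random walk with drift, i.e. ARIMA(0,1,0) with a constant. This is only an illustrative sketch, not necessarily the order `auto_arima` selects, and the inputs (`last_value`, `drift`, `sigma`) are hypothetical. Under this model the h-step forecast is the last value plus h times the drift, and the interval half-width grows with the square root of h (parameter uncertainty is ignored here).

```python
import math

def random_walk_forecast(last_value, drift, sigma, horizon, z=1.96):
    """Point forecast and approximate 95% interval for a random walk with drift.

    Assumes drift and sigma are known, so parameter uncertainty is ignored.
    """
    rows = []
    for h in range(1, horizon + 1):
        point = last_value + h * drift
        half_width = z * sigma * math.sqrt(h)
        rows.append((h, point, point - half_width, point + half_width))
    return rows

# Hypothetical inputs, not estimated from the notebook's data.
for h, point, lower, upper in random_walk_forecast(100.0, 1.0, 5.0, 12):
    print(f"h={h:2d} forecast={point:7.2f} lower={lower:7.2f} upper={upper:7.2f}")
```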

## 🎲 Monte Carlo Simulation

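Monte Carlo simulates thousands of possible portfolio trajectories, drawing asset returns from a multivariate normal distribution so that correlations between assets are preserved. **VaR 95%** is the loss threshold that the simulated 12-month return stays above in 95% of trajectories. It is not a hard maximum: the worst 5% of outcomes fall below it.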

In [None]:
import numpy as np
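# Assumes prices are sampled monthly, so each row-to-row change is one monthly return
monthly_ret = prices.pct_change().dropna()
mu, cov = monthly_ret.mean().values, monthly_ret.cov().values
weights = np.ones(len(mu))/len(mu)  # equal weights across all tickers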
finals = []
np.random.seed(42)
for _ in range(5000):
    cum = 1.0
    for __ in range(12):
        r = weights @ np.random.multivariate_normal(mu, cov)
        cum *= (1+r)
    finals.append(cum)
finals = np.array(finals)
var95 = np.percentile(finals, 5) - 1
print(f"VaR 95%: {var95:.2%}")
print(f"Median return: {np.median(finals)-1:.2%}")
print(f"% Positive: {(finals>1).mean():.1%}")

import matplotlib.pyplot as plt
plt.figure(figsize=(10,5), facecolor="#061a40")
plt.hist(finals-1, bins=80, color="#0353a4", alpha=0.7)
plt.axvline(var95, color="#e05252", linestyle="--", label=f"VaR 95%: {var95:.1%}")
plt.title("Monte Carlo Distribution", color="#e8f4fd"); plt.legend()
plt.gca().set_facecolor("#003559"); plt.show()
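
The nested loop above can also be written in vectorized form, which tends to be faster for large simulation counts. The sketch below is one possible way to do it, using hypothetical two-asset inputs (`mu`, `cov`, `weights`) rather than the notebook's data. It uses NumPy's `default_rng`, so its random draws will not match those produced by `np.random.seed(42)` in the cell above.

```python
import numpy as np

def simulate_portfolio(mu, cov, weights, n_sims=5000, n_periods=12, seed=42):
    """Simulate compounded portfolio growth factors under multivariate-normal returns."""
    rng = np.random.default_rng(seed)
    # draws has shape (n_sims, n_periods, n_assets)
    draws = rng.multivariate_normal(mu, cov, size=(n_sims, n_periods))
    period_returns = draws @ weights  # shape (n_sims, n_periods)
    return np.prod(1.0 + period_returns, axis=1)

# Hypothetical monthly mean returns and covariance for two assets.
mu = np.array([0.010, 0.006])
cov = np.array([[0.0040, 0.0012],
                [0.0012, 0.0025]])
weights = np.array([0.5, 0.5])

finals = simulate_portfolio(mu, cov, weights)
var95 = np.percentile(finals, 5) - 1  # 5th percentile of the 12-period return
print(f"VaR 95%: {var95:.2%}")
print(f"Median return: {np.median(finals) - 1:.2%}")
```

Reading VaR off the 5th percentile makes the definition concrete: 5% of simulated outcomes are worse than `var95`, and 95% are better.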

## 👥 IBM Watson HR Dataset: Loading and EDA

In [None]:
df_hr = pd.read_csv("../data/WA_Fn-UseC_-HR-Employee-Attrition.csv")
df_hr["Attrition_num"] = (df_hr["Attrition"]=="Yes").astype(int)
print(f"Shape: {df_hr.shape}")
print(df_hr["Attrition"].value_counts())
print(f"\nAttrition rate: {df_hr['Attrition_num'].mean():.1%}")

## ⚠️ Attrition Analysis

The **attrition rate** measures what percentage of employees leave the organization. A high rate implies high replacement costs (~$15,000 USD per employee in tech roles).

In [None]:
import matplotlib.pyplot as plt
dept_att = df_hr.groupby("Department")["Attrition_num"].mean().sort_values()
fig, ax = plt.subplots(figsize=(8,4), facecolor="#061a40")
colors = ["#4caf82" if v<0.13 else ("#f0a500" if v<0.20 else "#e05252") for v in dept_att]
ax.barh(dept_att.index, dept_att*100, color=colors)
ax.axvline(13, color="#b9d6f2", linestyle="--", label="Benchmark 13%")
ax.set_xlabel("Attrition %", color="#b9d6f2"); ax.set_title("Attrition by Department", color="#e8f4fd")
ax.set_facecolor("#003559"); ax.tick_params(colors="#7ba7c9"); ax.legend()
plt.tight_layout(); plt.show()
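
The replacement-cost figure quoted in this section can be turned into a rough order-of-magnitude estimate. The sketch below takes the 1,470 headcount and the roughly 16% attrition rate this notebook reports, and applies the ~$15,000 per-leaver cost to every leaver, which is a simplifying assumption since that figure is cited for tech roles only.

```python
headcount = 1470           # employees in the HR dataset
attrition_rate = 0.16      # approximate rate reported in this notebook
cost_per_leaver = 15_000   # USD, rough replacement cost cited in this section

leavers = round(headcount * attrition_rate)
replacement_cost = leavers * cost_per_leaver
print(f"Estimated leavers: {leavers}")
print(f"Estimated replacement cost: ${replacement_cost:,.0f}")
```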

## ⚖️ Pay Gap and t-Test

We use the **Student's t-test** (α = 0.05) to determine whether salary differences between genders are statistically significant or attributable to chance.

In [None]:
from scipy import stats
males = df_hr[df_hr["Gender"]=="Male"]["MonthlyIncome"].dropna()
females = df_hr[df_hr["Gender"]=="Female"]["MonthlyIncome"].dropna()
t_stat, p_val = stats.ttest_ind(males, females)
gap_pct = (males.mean() - females.mean()) / males.mean() * 100
print(f"Male avg salary: ${males.mean():,.0f}")
print(f"Female avg salary: ${females.mean():,.0f}")
print(f"Gap: {gap_pct:.1f}%")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f} - {'SIGNIFICANT' if p_val<0.05 else 'NOT significant'} (alpha=0.05)")
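
`stats.ttest_ind` with its default settings runs the classic Student's test, which pools the two sample variances. If the groups have clearly different variances, Welch's variant (available in SciPy as `ttest_ind(..., equal_var=False)`) may be the safer choice. The sketch below computes both statistics by hand on hypothetical income samples, only to show how the two formulas differ; it does not compute p-values.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical monthly incomes, for illustration only.
group_a = [4200, 5100, 6100, 7300, 9800, 12500]
group_b = [4000, 4900, 5200, 6800]
print(f"Student t: {student_t(group_a, group_b):.3f}")
print(f"Welch t:   {welch_t(group_a, group_b):.3f}")
```

When both groups have the same size the two statistics coincide; they diverge as sample sizes and variances become more unequal.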

## 🤖 Attrition Predictive Model

We fit a **Logistic Regression** to predict which employees are most likely to leave. We set `class_weight='balanced'` to handle class imbalance (16% attrition vs 84% retention), so that errors on the minority class weigh more during training.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler

features = ["Age","MonthlyIncome","TotalWorkingYears","JobLevel",
            "JobSatisfaction","EnvironmentSatisfaction","YearsAtCompany"]
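X = df_hr[features].fillna(0)
y = df_hr["Attrition_num"]
# Split first, then fit the scaler on the training rows only to avoid leaking test statistics
X_tr,X_te,y_tr,y_te = train_test_split(X,y,test_size=0.25,random_state=42,stratify=y)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)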
lr = LogisticRegression(class_weight="balanced",max_iter=1000)
lr.fit(X_tr,y_tr)
print(classification_report(y_te, lr.predict(X_te)))
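
For intuition on what `class_weight="balanced"` does: scikit-learn documents the weights as `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that formula in plain Python. The class counts are an assumption (237 leavers out of 1,470 rows, consistent with the ~16% rate mentioned above); in the notebook the weights are derived from the training split, so the exact values will differ slightly.

```python
def balanced_class_weights(class_counts):
    """Weights per class following n_samples / (n_classes * count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count) for label, count in class_counts.items()}

# Assumed counts: 1,233 stayers (0) and 237 leavers (1).
counts = {0: 1233, 1: 237}
class_weights = balanced_class_weights(counts)
for label, w in class_weights.items():
    print(f"class {label}: weight {w:.3f}, weighted total {w * counts[label]:.1f}")
```

After weighting, both classes contribute the same total weight to the loss, which is what lets the model pay attention to the minority class.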

## 📋 Executive Summary: LinkedIn Findings

## 🎯 Key Findings

**CFO Perspective:**
- ARIMA projects growth for AAPL (the ticker modeled above) with a 95% confidence interval
- VaR 95%: downside quantified as the 5th percentile of simulated 12-month portfolio returns
- Monte Carlo: majority of simulations end positive

**CHRO Perspective:**
- Sales: 20.6% attrition, 7.6 pts above benchmark (13%)
- Pay gap not statistically significant (p>0.05)
- OverTime is flagged as the main attrition predictor, although the logistic regression above does not include it among its features

**Stack:** Python · Statsmodels · Scikit-learn · Plotly · yfinance