# 💼 Financial and HR Intelligence: Full Analysis
**Author:** Hely Camargo | **Stack:** Python · Statsmodels · Scikit-learn · Plotly

## 🎯 Business Context
This analysis answers two critical questions for the C-suite:
- **CFO:** What are the risk and expected return of the portfolio over 12 months?
- **CHRO:** Is there pay equity? What drives turnover?

**HR dataset:** IBM Watson Analytics, 1,470 employees | **Financial data:** Yahoo Finance, 5 years

## 📦 Installing Dependencies

In [None]:
!pip install pandas numpy plotly statsmodels scikit-learn yfinance scipy pmdarima matplotlib --quiet

## 📂 Data Loading with QA

In [None]:
import pandas as pd, numpy as np, warnings
warnings.filterwarnings("ignore")

# Load financial data
prices = pd.read_csv("../output/financial_clean.csv", index_col=0, parse_dates=True)
print(f"Shape: {prices.shape}")
print(prices.tail(3))
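
The heading mentions QA, but the cell above only prints the shape and the last rows. Below is a minimal sketch of checks that could follow the load, assuming a date-indexed price table. The helper name `qa_report` and the toy frame are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def qa_report(prices: pd.DataFrame) -> dict:
    """Basic sanity checks for a date-indexed price table (hypothetical helper)."""
    return {
        "rows": len(prices),
        "missing_per_column": prices.isna().sum().to_dict(),
        "duplicate_dates": int(prices.index.duplicated().sum()),
        "index_sorted": bool(prices.index.is_monotonic_increasing),
        "non_positive_prices": int((prices <= 0).sum().sum()),
    }

# Toy data standing in for financial_clean.csv (illustrative values only)
idx = pd.date_range("2024-01-01", periods=4, freq="D")
toy = pd.DataFrame(
    {"AAA": [10.0, 10.5, np.nan, 11.0], "BBB": [20.0, 19.5, 19.8, 20.1]},
    index=idx,
)
report = qa_report(toy)
print(report)
```

On the real data the same call would be `qa_report(prices)`; any non-zero count for missing values, duplicate dates, or non-positive prices is worth resolving before modelling.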

## 📈 Financial EDA: Price History

In [None]:
import matplotlib.pyplot as plt
# 2x2 grid, one panel per ticker (zip stops after the first four columns)
fig, axes = plt.subplots(2, 2, figsize=(14,8), facecolor="#061a40")
colors = ["#b9d6f2","#0353a4","#006daa","#4caf82"]
for ax, (ticker, color) in zip(axes.flat, zip(prices.columns, colors)):
    ax.plot(prices.index, prices[ticker], color=color, linewidth=2)
    ax.set_title(ticker, color="#e8f4fd", fontsize=13, fontweight="bold")
    ax.set_facecolor("#003559"); ax.tick_params(colors="#7ba7c9")
    # Matplotlib expects RGBA as a tuple of 0-1 floats, not a CSS "rgba(...)" string
    for spine in ax.spines.values(): spine.set_edgecolor((185/255, 214/255, 242/255, 0.2))
plt.suptitle("Stock Price History (5Y)", color="#e8f4fd", fontsize=15, fontweight="bold")
plt.tight_layout(); plt.show()

## 🔮 ARIMA Model: 12-Month Forecast

The ARIMA model captures trend and autocorrelation in financial time series. We apply the **ADF test** to check stationarity and **auto_arima** to select the (p, d, q) parameters automatically.

In [None]:
from pmdarima import auto_arima
from statsmodels.tsa.stattools import adfuller

ticker = "AAPL"
series = prices[ticker].dropna()

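# ADF test on price levels (null hypothesis: the series has a unit root)
adf = adfuller(series)
print(f"ADF p-value: {adf[1]:.4f} - {'Stationary' if adf[1]<0.05 else 'Non-stationary'}")

# auto_arima searches p and q; first differencing is fixed here (d=1)
model = auto_arima(series, d=1, stepwise=True, suppress_warnings=True, error_action="ignore")
print(f"Best order: {model.order}")

# Forecast 12 periods ahead (months, assuming `prices` is sampled monthly) with a 95% interval
fc, ci = model.predict(12, return_conf_int=True, alpha=0.05)
fc = np.asarray(fc)  # predict may return a pandas Series or an ndarray depending on the pmdarima version
print(pd.DataFrame({"forecast":fc.round(2),"lower":ci[:,0].round(2),"upper":ci[:,1].round(2)}).head())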
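
Price levels often behave like a random walk, which is the usual motivation for fixing `d=1`. The sketch below illustrates this on simulated data (not the notebook's series): the lag-1 autocorrelation of a random walk is close to 1, while that of its first difference is close to 0. The helper `lag1_autocorr` is a hypothetical name introduced here.

```python
import numpy as np

def lag1_autocorr(x: np.ndarray) -> float:
    """Sample correlation between x[t] and x[t-1]."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

rng = np.random.default_rng(0)
shocks = rng.normal(0.0, 1.0, size=500)
walk = 100.0 + np.cumsum(shocks)   # simulated random-walk "price"
diffed = np.diff(walk)             # first difference, the transform d=1 applies

acf_levels = lag1_autocorr(walk)
acf_diff = lag1_autocorr(diffed)
print(f"lag-1 autocorrelation, levels: {acf_levels:.3f}")
print(f"lag-1 autocorrelation, first difference: {acf_diff:.3f}")
```

This is only an intuition aid; on real data the ADF p-value printed above remains the formal check.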

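## 🎲 Monte Carlo Simulation

Monte Carlo simulates thousands of possible portfolio paths, drawing monthly returns from a multivariate normal distribution. The **95% VaR** is the 12-month loss that is exceeded in only 5% of the simulated scenarios. It is a threshold rather than a maximum: losses inside that 5% tail can be larger.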

In [None]:
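import numpy as np
# Simple returns per period (monthly, assuming `prices` holds month-end prices)
monthly_ret = prices.pct_change().dropna()
mu, cov = monthly_ret.mean().values, monthly_ret.cov().values
# Equal weights across however many tickers `prices` contains
weights = np.ones(len(mu)) / len(mu)
finals = []
np.random.seed(42)
for _ in range(5000):       # 5,000 simulated paths
    cum = 1.0
    for __ in range(12):    # 12 monthly steps per path
        r = weights @ np.random.multivariate_normal(mu, cov)
        cum *= (1+r)
    finals.append(cum)
finals = np.array(finals)
# 5th percentile of the simulated 12-month return
var95 = np.percentile(finals, 5) - 1
print(f"VaR 95%: {var95:.2%}")
print(f"Median return: {np.median(finals)-1:.2%}")
print(f"% Positive: {(finals>1).mean():.1%}")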

import matplotlib.pyplot as plt
plt.figure(figsize=(10,5), facecolor="#061a40")
plt.hist(finals-1, bins=80, color="#0353a4", alpha=0.7)
plt.axvline(var95, color="#e05252", linestyle="--", label=f"VaR 95%: {var95:.1%}")
plt.title("Monte Carlo Distribution", color="#e8f4fd"); plt.legend()
plt.gca().set_facecolor("#003559"); plt.show()
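
The nested loop above can also be written in vectorized form. The sketch below uses made-up `mu` and `cov` values for two assets (assumptions for illustration, not estimates from the notebook's data). It also reports expected shortfall, the mean outcome in the worst 5% of paths, which complements VaR.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed monthly mean returns and covariance for two assets (illustrative only)
mu = np.array([0.010, 0.006])
cov = np.array([[0.0025, 0.0006],
                [0.0006, 0.0016]])
weights = np.array([0.5, 0.5])
n_paths, n_months = 5000, 12

# Draw every return at once: shape (n_paths, n_months, n_assets)
draws = rng.multivariate_normal(mu, cov, size=(n_paths, n_months))
port_ret = draws @ weights                  # portfolio return per path and month
finals = np.prod(1.0 + port_ret, axis=1)    # cumulative growth factor per path

cutoff = np.percentile(finals, 5)
var95 = cutoff - 1                          # 5th percentile of the 12-month return
es95 = finals[finals <= cutoff].mean() - 1  # mean return across the worst 5% of paths
print(f"VaR 95%: {var95:.2%} | Expected shortfall 95%: {es95:.2%}")
```

Expected shortfall is never better than VaR at the same level, so reporting both shows how heavy the tail is beyond the threshold.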

## 👥 IBM Watson HR Dataset: Loading and EDA

In [None]:
df_hr = pd.read_csv("../data/WA_Fn-UseC_-HR-Employee-Attrition.csv")
df_hr["Attrition_num"] = (df_hr["Attrition"]=="Yes").astype(int)
print(f"Shape: {df_hr.shape}")
print(df_hr["Attrition"].value_counts())
print(f"\nAttrition rate: {df_hr['Attrition_num'].mean():.1%}")

## ⚠️ Attrition Analysis

The **turnover rate** measures the percentage of employees who leave the organization. High turnover implies high replacement costs (about USD 15,000 per employee in technology).

In [None]:
import matplotlib.pyplot as plt
dept_att = df_hr.groupby("Department")["Attrition_num"].mean().sort_values()
fig, ax = plt.subplots(figsize=(8,4), facecolor="#061a40")
colors = ["#4caf82" if v<0.13 else ("#f0a500" if v<0.20 else "#e05252") for v in dept_att]
ax.barh(dept_att.index, dept_att*100, color=colors)
ax.axvline(13, color="#b9d6f2", linestyle="--", label="Benchmark 13%")
ax.set_xlabel("Attrition %", color="#b9d6f2"); ax.set_title("Attrition by Department", color="#e8f4fd")
ax.set_facecolor("#003559"); ax.tick_params(colors="#7ba7c9"); ax.legend()
plt.tight_layout(); plt.show()

## ⚖️ Pay Gap and t-Test

We use **Student's t-test** (α = 0.05) to determine whether salary differences between genders are statistically significant.

In [None]:
from scipy import stats
males = df_hr[df_hr["Gender"]=="Male"]["MonthlyIncome"].dropna()
females = df_hr[df_hr["Gender"]=="Female"]["MonthlyIncome"].dropna()
t_stat, p_val = stats.ttest_ind(males, females)
gap_pct = (males.mean() - females.mean()) / males.mean() * 100
print(f"Male avg salary: ${males.mean():,.0f}")
print(f"Female avg salary: ${females.mean():,.0f}")
print(f"Gap: {gap_pct:.1f}%")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f} - {'SIGNIFICANT' if p_val<0.05 else 'NOT significant'} (alpha=0.05)")
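
A p-value alone does not say how large the gap is. As a complement, the sketch below computes Cohen's d (a standardized effect size) and the Welch t statistic, which does not assume equal variances. The samples are synthetic stand-ins rather than the IBM data, and the function names are hypothetical.

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

def welch_t(a: np.ndarray, b: np.ndarray) -> float:
    """t statistic using each group's own variance (Welch)."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return float((a.mean() - b.mean()) / se)

# Synthetic monthly incomes (illustrative only)
group_a = np.array([5200.0, 6100.0, 4800.0, 7300.0, 5900.0])
group_b = np.array([5000.0, 6400.0, 4700.0, 7100.0, 6000.0])
d = cohens_d(group_a, group_b)
t = welch_t(group_a, group_b)
print(f"Cohen's d: {d:.3f} | Welch t: {t:.3f}")
```

With the notebook's data, the inputs would be the `males` and `females` series converted with `.to_numpy()`. In SciPy, the Welch variant of the test is `stats.ttest_ind(males, females, equal_var=False)`.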

## 🤖 Predictive Attrition Model

**Logistic regression** to predict which employees are most likely to leave. We use class_weight='balanced' to handle class imbalance.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler

features = ["Age","MonthlyIncome","TotalWorkingYears","JobLevel",
            "JobSatisfaction","EnvironmentSatisfaction","YearsAtCompany"]
X = df_hr[features].fillna(0)
y = df_hr["Attrition_num"]
# Split first, then fit the scaler on the training fold only, so no test statistics leak into training
X_tr,X_te,y_tr,y_te = train_test_split(X,y,test_size=0.25,random_state=42,stratify=y)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)
lr = LogisticRegression(class_weight="balanced",max_iter=1000)
lr.fit(X_tr,y_tr)
print(classification_report(y_te, lr.predict(X_te)))
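
For reference, scikit-learn documents `class_weight='balanced'` as weighting each class by `n_samples / (n_classes * np.bincount(y))`. Below is a worked example on a toy label vector; the 84/16 split is made up for illustration and is not the dataset's actual class count.

```python
import numpy as np

# Toy labels: 84 stayers (0) and 16 leavers (1), illustrative only
y = np.array([0] * 84 + [1] * 16)

counts = np.bincount(y)                           # samples per class
class_weights = len(y) / (len(counts) * counts)   # n_samples / (n_classes * counts)

for label, (n, w) in enumerate(zip(counts, class_weights)):
    print(f"class {label}: n={n}, weight={w:.3f}, weighted total={n * w:.1f}")
# class 0 gets weight 100/168 ≈ 0.595, class 1 gets 100/32 = 3.125
```

After weighting, both classes contribute the same total weight (50 each here), which is why the minority class is no longer dominated in the loss.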

## 📋 Executive Summary: Findings for LinkedIn

## 🎯 Key Findings

**CFO perspective:**
- ARIMA projects growth with a 95% confidence interval (the model above is fitted on AAPL)
- 95% VaR: downside risk quantified as the 5th percentile of simulated 12-month returns
- Monte Carlo: most simulations end with a positive return

**CHRO perspective:**
- Sales: 20.6% attrition, 7.6 points above the benchmark (13%)
- Pay gap is not statistically significant (p > 0.05)
- Overtime is the main predictor of turnover

**Stack:** Python · Statsmodels · Scikit-learn · Plotly · yfinance