# 📊 Notebook 3: Analyses and Conclusion (Telecom X)
This notebook loads the **final model exported** in Notebook 2 and produces the analysis report:
- performance on the test set,
- factors that most influence churn (post-OHE Permutation Importance plus the model's importances/coefficients),
- retention recommendations and next steps.

## 1) Loading artifacts

In [None]:

import os, joblib, numpy as np, pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

MODEL_PATH = "artefatos/modelo_churn_telecomx.pkl"
ARTEFATOS_PATH = "artefatos/artefatos_modelagem.pkl"
TESTSET_PATH = "artefatos/conjunto_teste.pkl"

if not (os.path.exists(MODEL_PATH) and os.path.exists(ARTEFATOS_PATH) and os.path.exists(TESTSET_PATH)):
    raise FileNotFoundError("Expected artifacts not found. Run Notebook 2 through the export step.")

best_pipe = joblib.load(MODEL_PATH)
artefatos = joblib.load(ARTEFATOS_PATH)
teste = joblib.load(TESTSET_PATH)

X_test, y_test = teste["X_test"], teste["y_test"]

feature_names = artefatos.get("feature_names")
importances_df = artefatos.get("importances_df")
coefs_df = artefatos.get("coefs_df")

print("Origin of the best model:", artefatos.get("origem_melhor_modelo"))


## 2) Performance on the test set

In [None]:

y_pred = best_pipe.predict(X_test)
print(classification_report(y_test, y_pred, digits=3))


## 3) Confusion matrix

In [None]:

import matplotlib.pyplot as plt
import numpy as np
cm = confusion_matrix(y_test, y_pred)

fig = plt.figure()
plt.imshow(cm, interpolation='nearest')
plt.title('Confusion Matrix: Final Model')
plt.xlabel('Predicted')
plt.ylabel('Actual')
for (i, j), val in np.ndenumerate(cm):
    plt.text(j, i, int(val), ha='center', va='center')
plt.show()


## 4) Permutation Importance (post-OHE, default)

In [None]:

from sklearn.inspection import permutation_importance

# Automatically detect the name of the preprocessing step
pre_step_name = None
for candidate in ["pre", "preprocess"]:
    if candidate in best_pipe.named_steps:
        pre_step_name = candidate
        break
if pre_step_name is None:
    raise RuntimeError("Preprocessing step not found in the pipeline ('pre' or 'preprocess').")

pre = best_pipe.named_steps[pre_step_name]
model = best_pipe.named_steps["model"]

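# Transform X_test into the post-OHE space
X_test_trans = pre.transform(X_test)
# permutation_importance expects a dense array or DataFrame, so densify sparse OHE output
if hasattr(X_test_trans, "toarray"):
    X_test_trans = X_test_trans.toarray()

# Extract post-OHE feature names (assumes the numeric transformer comes first, then "cat")
ohe = pre.named_transformers_["cat"]
num_names = np.array(pre.transformers_[0][2], dtype=object)  # numeric columns used in the ColumnTransformer
cat_cols_ = pre.transformers_[1][2]                           # categorical columns used in the ColumnTransformer
cat_names = ohe.get_feature_names_out(cat_cols_)
feature_names_perm = np.concatenate([num_names, cat_names])

# Permutation importance of the final estimator (on transformed data)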
perm = permutation_importance(
    model, X_test_trans, y_test,
    n_repeats=10, random_state=42, scoring="f1", n_jobs=-1
)

perm_df = (
    pd.DataFrame({
        "feature": feature_names_perm,
        "importance_mean": perm.importances_mean,
        "importance_std": perm.importances_std
    })
    .sort_values("importance_mean", ascending=False)
    .reset_index(drop=True)
)
display(perm_df.head(20))


## 5) Model factors (importances/coefficients)

In [None]:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# If importances_df/coefs_df were not saved, try to rebuild them quickly
if feature_names is None:
    try:
        feature_names = feature_names_perm  # already computed above
    except NameError:
        feature_names = None

if importances_df is None and isinstance(best_pipe.named_steps['model'], RandomForestClassifier) and feature_names is not None:
    importances = best_pipe.named_steps['model'].feature_importances_
    importances_df = (pd.DataFrame({'feature': feature_names, 'importance': importances})
                      .sort_values('importance', ascending=False)
                      .reset_index(drop=True))

if coefs_df is None and isinstance(best_pipe.named_steps['model'], LogisticRegression) and feature_names is not None:
    coefs = best_pipe.named_steps['model'].coef_.ravel()
    coefs_df = (pd.DataFrame({'feature': feature_names, 'coef': coefs})
                .assign(abs_coef=lambda d: d['coef'].abs())
                .sort_values('abs_coef', ascending=False)
                .reset_index(drop=True))

# Show whichever model-specific ranking is available
if importances_df is not None:
    display(importances_df.head(20))
elif coefs_df is not None:
    display(coefs_df.head(20))
else:
    print("No importance/coefficient data available.")


## 6) Charts (Top-15 factors)

In [None]:

import matplotlib.pyplot as plt

def plot_top(df, value_col, label_col="feature", top=15, title="Ranking"):
    if df is None or df.empty:
        print("Nothing to plot:", title)
        return
    top_df = df.nlargest(top, value_col).iloc[::-1]
    fig = plt.figure()
    plt.barh(top_df[label_col], top_df[value_col])
    plt.title(title)
    plt.xlabel(value_col)
    plt.ylabel(label_col)
    plt.tight_layout()
    plt.show()

# Permutation Importance
plot_top(perm_df, "importance_mean", title="Top-15 (Permutation Importance)")

# Model-specific rankings (if available)
if 'importances_df' in globals() and importances_df is not None:
    plot_top(importances_df, "importance", title="Top-15 Importances (Random Forest)")
if 'coefs_df' in globals() and coefs_df is not None:
    if "abs_coef" not in coefs_df.columns and "coef" in coefs_df.columns:
        coefs_df = coefs_df.assign(abs_coef=lambda d: d["coef"].abs())
    plot_top(coefs_df, "abs_coef", title="Top-15 |Coef| (Logistic Regression)")


## 7) Executive tables (metrics & comparisons)

In [None]:

metrics_tabela = artefatos.get("metrics_tabela")
if metrics_tabela is not None:
    display(metrics_tabela)
else:
    print("Aggregated metrics table not found in the artifacts (optional).")


## 8) Conclusions and retention recommendations

### 8.1 Performance highlights
- **Final model**: `{{ origem_melhor_modelo }}` (from GridSearchCV or Holdout)
- **Summary**: Comment on the **F1**, **Precision**, and **Recall** of the *churn* class (1) on the test set.

### 8.2 Main factors associated with churn
- Use the **Permutation Importance** (above) as the primary basis.
- List the **Top-5/10 factors** and explain why each may raise or lower churn propensity.
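
When listing the top factors, the post-OHE ranking can scatter one original variable across several dummy columns. A minimal sketch of one way to roll those columns back up to their parent variable follows. The helper `group_importances`, the feature names, and the importance values are all hypothetical, and summing permutation importances across dummies is only a rough heuristic, not an exact decomposition.

```python
from collections import defaultdict


def group_importances(feature_names, importances, categorical_cols):
    """Sum post-OHE importances back onto the original column names (rough heuristic)."""
    totals = defaultdict(float)
    # Try longer column names first so a short name cannot capture a longer one by prefix.
    ordered = sorted(categorical_cols, key=len, reverse=True)
    for name, value in zip(feature_names, importances):
        # OneHotEncoder names dummies as "<column>_<category>"; numeric columns keep their name.
        parent = next((c for c in ordered if name.startswith(c + "_")), name)
        totals[parent] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical names and values, for illustration only (not results from this notebook).
names = [
    "tenure",
    "Contract_Month-to-month",
    "Contract_Two year",
    "InternetService_Fiber optic",
    "MonthlyCharges",
]
values = [0.040, 0.030, 0.015, 0.020, 0.010]

ranking = group_importances(names, values, ["Contract", "InternetService"])
for feature, total in ranking:
    print(f"{feature}: {total:.3f}")
```

In this notebook the inputs would presumably be `perm_df["feature"]`, `perm_df["importance_mean"]`, and the categorical column list `cat_cols_` from section 4.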

### 8.3 Retention recommendations
- **Proactive segmentation**: prioritize customers with high predicted risk.
- **Targeted offers**: benefits/plans that best fit the risk profile.
- **Customer experience**: review friction points (support, billing, quality).
- **Threshold adjustment**: align the decision threshold with the cost of FN/FP.
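
The threshold adjustment in the last item can be sketched as a simple cost search. The example below is an illustration under stated assumptions: the 5:1 cost ratio between a missed churner (FN) and an unnecessary retention offer (FP) is made up, the labels and probabilities are toy data, and `expected_cost` and `best_threshold` are hypothetical helpers.

```python
def expected_cost(y_true, y_proba, threshold, cost_fn=5.0, cost_fp=1.0):
    """Total error cost when churn is predicted for probabilities >= threshold."""
    fn = sum(1 for y, p in zip(y_true, y_proba) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_proba) if y == 0 and p >= threshold)
    return cost_fn * fn + cost_fp * fp


def best_threshold(y_true, y_proba, cost_fn=5.0, cost_fp=1.0):
    """Scan thresholds 0.01..0.99 and return the first one with the lowest cost."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, y_proba, t, cost_fn, cost_fp),
    )


# Toy labels and predicted churn probabilities (illustrative only).
y_true_demo = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
y_proba_demo = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.65, 0.80, 0.90]

t_star = best_threshold(y_true_demo, y_proba_demo)
cost_default = expected_cost(y_true_demo, y_proba_demo, 0.5)
cost_tuned = expected_cost(y_true_demo, y_proba_demo, t_star)
print("default 0.50 cost:", cost_default)
print("tuned threshold:", t_star, "cost:", cost_tuned)
```

With the real model, the probabilities would come from something like `best_pipe.predict_proba(X_test)[:, 1]`, assuming the final estimator exposes `predict_proba`. The threshold should ideally be chosen on a validation split rather than the test set, so the reported test metrics stay unbiased.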

### 8.4 Next steps
- Validate **ROC-AUC** and **PR-AUC**; run additional **cross-validation**.
- **Calibrate** probabilities (e.g., `CalibratedClassifierCV`) and apply **threshold tuning**.
- Enrich the dataset with new relevant variables.
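
The first two next steps could look roughly like the sketch below. It uses synthetic data from `make_classification` and a plain `LogisticRegression` as stand-ins, since the notebook's real features and final estimator are not reproduced here; the 75/25 class balance is an assumption, not a property of the Telecom X data.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem standing in for the churn data (assumption).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

# Platt scaling (sigmoid) fitted with 5-fold cross-validation on the training split.
calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="sigmoid", cv=5)
calibrated.fit(X_tr, y_tr)
proba = calibrated.predict_proba(X_te)[:, 1]

roc_auc = roc_auc_score(y_te, proba)           # ranking quality across all thresholds
pr_auc = average_precision_score(y_te, proba)  # PR-AUC estimate, more sensitive to class imbalance
brier = brier_score_loss(y_te, proba)          # lower means better-calibrated probabilities

print(f"ROC-AUC: {roc_auc:.3f} | PR-AUC: {pr_auc:.3f} | Brier: {brier:.3f}")
```

Applied to this project, the same metrics would be computed on `y_test` with the probabilities from `best_pipe`, and the calibrator would wrap the pipeline's own estimator instead of the stand-in model.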
