# 💳 Credit Risk Classification Project

This project aims to predict whether a customer is a **good** or **bad payer** based on financial and personal information, using the **Naive Bayes** algorithm.

📁 Dataset: `Credit.csv`
📁 New credit: `NovoCredit.csv`

## 📦 Importing Libraries


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix



## 📊 1. Reading the Dataset

In [None]:

credit = pd.read_csv("Credit.csv")
print(credit.head())



## 🔍 2. Analysis and Encoding of Categorical Variables

In [None]:

object_cols = credit.select_dtypes(include='object').columns
print("Categorical columns (object):")
print(object_cols)

#📘 Identifies the categorical (object-typed) columns

In [None]:

cols_to_encode = [
    'checking_status', 'credit_history', 'purpose', 'savings_status',
    'employment', 'personal_status', 'other_parties', 'property_magnitude',
    'other_payment_plans', 'housing', 'job', 'own_telephone',
    'foreign_worker', 'class'
]

#📘 Some columns were left out to avoid unnecessary processing; they could instead be one-hot encoded to retain their information
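
As a rough illustration of the one-hot encoding mentioned in the note above, the sketch below uses `pandas.get_dummies` on a toy frame. The category values and the `duration` column are hypothetical and are not read from `Credit.csv`; only the `housing` column name comes from the project.

```python
import pandas as pd

# Toy frame with hypothetical values; only the "housing" column name
# is taken from the project, everything else is illustrative.
toy = pd.DataFrame({
    "housing": ["own", "rent", "own", "for free"],
    "duration": [6, 48, 12, 42],
})

# One-hot encode the categorical column: each category becomes its own
# indicator column, while numeric columns pass through unchanged.
toy_encoded = pd.get_dummies(toy, columns=["housing"])
print(toy_encoded.columns.tolist())
```

Unlike `LabelEncoder`, this representation does not impose an artificial order on the categories, at the cost of adding one column per category.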


In [None]:

encoders = {}

for col in cols_to_encode:
    le = LabelEncoder()
    credit[col] = le.fit_transform(credit[col])
    encoders[col] = le

print(list(enumerate(le.classes_)))  # Mapping of the last encoded column ('class')

#📘 Encode categorical columns with LabelEncoder and store the encoders for future use

## 🧩 3. Creating Predictor and Class Arrays


In [None]:

previsores = credit.iloc[:, 0:20].values
classe = credit.iloc[:, 20].values



## ✂️ 4. Splitting the Dataset into Training and Test Sets

In [None]:

X_train, X_test, y_train, y_test = train_test_split(previsores, classe, test_size=0.3, random_state=42)

#📘 Splits data into training (70%) and testing (30%) in a random and reproducible way
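
To make the "random and reproducible" point concrete, here is a small sketch on a hypothetical 10-row array (not the credit data). It suggests that repeating the call with the same `random_state` yields the same partition, and shows the 70/30 proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the predictors and the class.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Two independent calls with the same seed.
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

print(a_train.shape, a_test.shape)    # sizes of the training and test parts
print(np.array_equal(a_test, b_test))  # whether both calls selected the same rows
```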

## 🤖 5. Naive Bayes Model Training

In [None]:

naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)



## 📈 6. Model Validation and Metrics

In [None]:

previsoes = naive_bayes.predict(X_test)
matrizconfusao = confusion_matrix(y_test, previsoes)
print("Confusion matrix:")
print(matrizconfusao)

acuracia = accuracy_score(y_test, previsoes)
print(f"Accuracy: {acuracia:.2f}")

#📘 Generates predictions on the test set, calculates the confusion matrix, and displays model accuracy
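
The relationship between the two metrics can be checked by hand: accuracy is the sum of the confusion matrix diagonal divided by the total number of samples. The sketch below uses hypothetical labels and predictions, assuming the encoding 0 = bad and 1 = good; they are not results from this model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions (assumed: 0 = bad, 1 = good).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 1])

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Correct predictions sit on the diagonal.
manual_accuracy = np.trace(cm) / cm.sum()

print(cm)
print(manual_accuracy, accuracy_score(y_true, y_pred))
```

The off-diagonal cells separate the two kinds of error (a bad payer predicted as good, and the reverse), which accuracy alone does not distinguish.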


## 🆕 7. Prediction for a New Record

In [None]:

novo_credit = pd.read_csv('NovoCredit.csv')
print(novo_credit.shape)

for col in cols_to_encode[:-1]:
    le2 = encoders[col]
    novo_credit[col] = le2.transform(novo_credit[col])

novo_credit_array = novo_credit.iloc[:, 0:20].values
nova_previsao = naive_bayes.predict(novo_credit_array)
print(f"Prediction for the new credit: {nova_previsao}")

#📘 Loads new data, applies already trained encoders and predicts credit risk
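
Reusing the fitted encoders matters because a fitted `LabelEncoder` always maps a known category to the same integer, and it rejects categories it did not see during fitting. The sketch below illustrates both behaviours with hypothetical category values, not values read from the CSV files.

```python
from sklearn.preprocessing import LabelEncoder

# Fit on hypothetical training categories.
le = LabelEncoder()
le.fit(["own", "rent", "for free"])

# Known categories receive the integers assigned at fit time
# (classes are stored in sorted order).
codes = le.transform(["rent", "own"])
print(list(le.classes_), codes)

# A category absent from the training data raises ValueError,
# so new records may need cleaning before they are transformed.
try:
    le.transform(["company housing"])
    unseen_ok = True
except ValueError:
    unseen_ok = False
print(unseen_ok)
```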


## ✅ Conclusion

In this project, we applied a **Credit Risk Classification** model with the **Naive Bayes** algorithm, using financial and personal data from customers.

🔧 **Steps developed:**
- Analysis and encoding of categorical variables with `LabelEncoder`
- Separation of data into training and testing
- Training of the Naive Bayes model
- Evaluation with **confusion matrix** and **accuracy**
- Prediction on new data (practical simulation with `NovoCredit.csv`)

🎯 **Objective**: predict whether a customer will be a **good** or **bad payer**, providing an automated basis for decision-making in credit granting.

This project is a solid foundation for real credit analysis applications and can be expanded with new algorithms, cross-validation techniques, and additional metrics.
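
As one possible starting point for the cross-validation and additional metrics mentioned above, the sketch below runs 5-fold cross-validation with `cross_validate` and reports precision, recall, and F1 alongside accuracy. It uses synthetic data from `make_classification` as a stand-in for `previsores` and `classe`, so the numbers it prints say nothing about the credit model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 20 predictors and the binary class.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Evaluate the same classifier on 5 folds with several metrics at once.
scores = cross_validate(
    GaussianNB(), X, y, cv=5,
    scoring=["accuracy", "precision", "recall", "f1"],
)

# Keep only the per-metric test scores and average them across folds.
mean_scores = {
    name: values.mean()
    for name, values in scores.items()
    if name.startswith("test_")
}
print(mean_scores)
```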
