# Credit Scoring

## 1. Introduction
This notebook builds a credit scoring model that predicts whether a loan applicant is a good or bad credit risk. It loads a credit scoring dataset from `credit_scoring.csv`, preprocesses and explores the data, and trains a Random Forest classifier.

## 2. Data Loading and Initial Exploration

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('credit_scoring.csv')

# Display the first few rows
df.head()

In [None]:
# Get a summary of the dataframe
df.info()

## 3. Data Cleaning and Preprocessing

In [None]:
# Recode the target 'Status' from 1 = good, 2 = bad to 0 = good, 1 = bad.
df['Status'] = df['Status'].replace({1: 0, 2: 1})

# Check for missing values
df.isnull().sum()

The dataset appears clean: `isnull().sum()` reports no missing values.
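
The missing-value check above can be condensed into a small report that lists only the columns with gaps. This is a minimal sketch: the helper name `missing_report` and the toy frame are hypothetical stand-ins, not part of the original notebook or of `credit_scoring.csv`.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages for columns that have gaps."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy frame standing in for credit_scoring.csv
toy = pd.DataFrame({
    "Status": [0, 1, 0, 1],
    "Income": [100.0, np.nan, 80.0, 120.0],
    "Age": [30, 45, 27, 52],
})
report = missing_report(toy)
print(report)
```

On the real data, `missing_report(df)` would return an empty frame if there are no `NaN` values. Note that `isnull()` only detects `NaN`/`None`; missing values stored as placeholder codes would not be counted.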

## 4. Exploratory Data Analysis (EDA)

In [None]:
# Target variable distribution
sns.countplot(x='Status', data=df)
plt.title('Distribution of Credit Status (0: Good, 1: Bad)')
plt.show()
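
The count plot shows the class balance visually; the same information as numbers can make it easier to judge how imbalanced the target is. A minimal sketch, assuming a binary target coded 0/1 as above (the helper `class_balance` and the toy series are hypothetical):

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.DataFrame:
    """Return the count and share of each class in the target column."""
    counts = target.value_counts().sort_index()
    return pd.DataFrame({
        "count": counts,
        "share": (counts / counts.sum()).round(3),
    })


# Hypothetical toy target: three good (0) applicants and one bad (1)
toy_status = pd.Series([0, 0, 0, 1], name="Status")
balance = class_balance(toy_status)
print(balance)
```

In the notebook this would be called as `class_balance(df['Status'])`. A strongly uneven split is a reason to look beyond accuracy when evaluating the model.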

In [None]:
# Correlation matrix
plt.figure(figsize=(14, 12))
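# numeric_only=True avoids an error in pandas >= 2.0 if non-numeric columns are present
correlation_matrix = df.corr(numeric_only=True)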
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()
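
A full heatmap can be hard to read when there are many features. One option is to rank the features by the absolute value of their correlation with the target. The sketch below assumes the target column is named `Status`; the helper `correlation_with_target` and the synthetic columns are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd


def correlation_with_target(frame: pd.DataFrame, target: str = "Status") -> pd.Series:
    """Return Pearson correlations with the target, ordered by absolute value."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic data: one column that tracks the target closely and one pure-noise column
rng = np.random.default_rng(0)
status = rng.integers(0, 2, size=200)
toy = pd.DataFrame({
    "Status": status,
    "strong_signal": status + rng.normal(0, 0.1, size=200),
    "noise": rng.normal(0, 1, size=200),
})
ranked = correlation_with_target(toy)
print(ranked)
```

Pearson correlation only captures linear association, so a feature with a low value here may still be useful to a tree-based model.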

## 5. Model Building and Training

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Define features and target
X = df.drop('Status', axis=1)
y = df['Status']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Train a Random Forest Classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
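
A single train/test split gives one estimate of performance, which can vary with the random seed. Stratified k-fold cross-validation is one way to get a more stable picture. This is a sketch, not part of the original notebook: it uses synthetic data from `make_classification` as a stand-in for `X` and `y`, and the 5-fold setup is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, mildly imbalanced data standing in for the credit features and target
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=42,
)

# Same model settings as the notebook; folds preserve the class ratio
clf = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

With the notebook's data, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train`, leaving the test set untouched for the final evaluation.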

## 6. Model Evaluation

In [None]:
# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('\nClassification Report:')
print(classification_report(y_test, y_pred))
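# Plot the confusion matrix (rows: actual class, columns: predicted class)
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='g')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()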
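
Accuracy depends on the default 0.5 decision threshold, whereas a credit score is usually read as a probability. ROC AUC, computed from predicted probabilities, is one threshold-independent complement to the metrics above. The sketch below is an illustration on synthetic data from `make_classification`, not a result for the notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit data (class 1 plays the role of "bad")
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)

# Column 1 of predict_proba holds the probability of class 1
bad_proba = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, bad_proba)
print(f"ROC AUC: {auc:.3f}")
```

In the notebook, the equivalent call would be `roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])`.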

## 7. Conclusion
The Random Forest model provides a baseline for a credit scoring system. The classification report and confusion matrix show how well it separates good from bad credit risks on the held-out 20% test set; if the classes are imbalanced, precision and recall for the bad class (1) are more informative than overall accuracy. Hyperparameter tuning and feature engineering may improve the model's predictive power.
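
As one possible starting point for tuning, a small grid search over the forest's size and depth could look like the sketch below. The grid values, the 3-fold setup, and the `roc_auc` scoring choice are assumptions for illustration, and the data is synthetic rather than the notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=42,
)

# Deliberately small, hypothetical grid to keep the search fast
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated ROC AUC: {search.best_score_:.3f}")
```

With the notebook's data, the search would be fitted on `X_train` and `y_train`, and `search.best_estimator_` evaluated once on the test set.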