# German Credit Risk

![credit risk](https://images.unsplash.com/photo-1553729459-efe14ef6055d?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=800&q=80)

## Table of Contents
* [Introduction](#intro)
    * [About the Data](#about_data)
* [Required Modules and Data](#required)

## <a id="intro"></a>Introduction

Credit risk prediction is how financial institutions and other lenders estimate the likelihood that a borrower will default on a loan. Managing this risk matters because defaults directly affect a lender's financial performance and stability. Accurate predictions help lenders decide whether to grant a loan, set an appropriate interest rate, assess their overall risk exposure, and plan how to mitigate it. The growing availability of data and advances in analytics have made these predictions more precise and efficient, which is why they are now an integral part of credit risk management.

Within this context, this project analyses loan data from a German bank and builds a predictive model that classifies future credit applicants as good or bad, so the bank can make an informed decision about approving or rejecting credit.

## <a id="about_data"></a>About the Data

The dataset used in this project is the [Statlog (German Credit Data) Data Set](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29) from UCI. It contains 20 attributes and 1000 instances. Its categorical values are stored as codes (for example `A11`), so they must be translated into readable labels during preprocessing.

The attribute descriptions, including the meaning of each code, are available in the [dataset documentation](https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc).

## <a id="required"></a>Required Modules and Data

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

import random

seed = 42
random.seed(seed)
np.random.seed(seed)
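
Seeding the global generators, as in the cell above, is meant to make random draws repeatable between runs. A minimal sketch of that effect follows; the helper `draw_sample` is a hypothetical name introduced only for this illustration.

```python
import random

import numpy as np


def draw_sample(seed: int) -> tuple:
    """Reseed both global generators, then draw a few numbers from each."""
    random.seed(seed)
    np.random.seed(seed)
    py_draws = [random.random() for _ in range(3)]
    np_draws = np.random.rand(3)
    return py_draws, np_draws


first_py, first_np = draw_sample(42)
second_py, second_np = draw_sample(42)

# With the same seed, both generators repeat the same sequence.
same_py = first_py == second_py
same_np = np.array_equal(first_np, second_np)
print(same_py, same_np)
```

Scikit-learn estimators also accept a `random_state` argument, so passing the seed explicitly to each model is a common alternative to relying on the global NumPy state.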

In [2]:
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
    delimiter=r"\s+",
    header=None
)

df.columns = [
    "status_checking_account",
    "duration",
    "credit_history",
    "purpose",
    "credit_amount",
    "savings",
    "employed_since",
    "pct_installment_rate",
    "personal_status_and_sex",
    "other_debtors_guarantors",
    "present_residence_since",
    "property",
    "age",
    "other_installment_plans",
    "housing",
    "existing_credits",
    "job",
    "maintenance_people",
    "telephone",
    "foreign_worker",
    "good_or_bad"
]
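
The raw file has no header row and separates fields with spaces, which is why the cell above passes a whitespace regular expression and `header=None`. The sketch below shows the same parsing on an in-memory string instead of the UCI URL. The three-column excerpt and its uneven spacing are assumptions made for illustration, and `sep` is used here as the equivalent of `delimiter`.

```python
import io

import pandas as pd

# Illustrative three-column excerpt; fields are separated by uneven runs of spaces.
sample_text = "A11  6 1169\nA12 48   5951\nA14 12 2096\n"

sample = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",      # any run of whitespace counts as one separator
    header=None,     # the file carries no column names
    names=["status_checking_account", "duration", "credit_amount"],
)

print(sample.shape)
print(sample.dtypes)
```

Passing `names=` at read time is an alternative to assigning `df.columns` afterwards; both give the same result.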

In [3]:
df.head()

| | status_checking_account | duration | credit_history | purpose | credit_amount | savings | employed_since | pct_installment_rate | personal_status_and_sex | other_debtors_guarantors | ... | property | age | other_installment_plans | housing | existing_credits | job | maintenance_people | telephone | foreign_worker | good_or_bad |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A11 | 6 | A34 | A43 | 1169 | A65 | A75 | 4 | A93 | A101 | ... | A121 | 67 | A143 | A152 | 2 | A173 | 1 | A192 | A201 | 1 |
| 1 | A12 | 48 | A32 | A43 | 5951 | A61 | A73 | 2 | A92 | A101 | ... | A121 | 22 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 2 |
| 2 | A14 | 12 | A34 | A46 | 2096 | A61 | A74 | 2 | A93 | A101 | ... | A121 | 49 | A143 | A152 | 1 | A172 | 2 | A191 | A201 | 1 |
| 3 | A11 | 42 | A32 | A42 | 7882 | A61 | A74 | 2 | A93 | A103 | ... | A122 | 45 | A143 | A153 | 1 | A173 | 2 | A191 | A201 | 1 |
| 4 | A11 | 24 | A33 | A40 | 4870 | A61 | A73 | 3 | A93 | A101 | ... | A124 | 53 | A143 | A153 | 2 | A173 | 2 | A191 | A201 | 2 |


As we can see, some features contain encoded values that need to be translated into meaningful labels for our purposes.
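
The translation can be done with a dictionary and `Series.map`. The sketch below uses a small hand-made series rather than the full dataset; the code `"A99"` is a hypothetical unknown value, included to show that `map` turns any code missing from the dictionary into `NaN`, which makes gaps in a mapping easy to detect.

```python
import pandas as pd

# Hand-made sample; "A99" is a hypothetical code that the mapping does not cover.
codes = pd.Series(["A11", "A12", "A14", "A99"], name="status_checking_account")

mapping = {
    "A11": "< 0 DM",
    "A12": ">= 0 & < 200 DM",
    "A13": ">= 200 DM",
    "A14": "No Account",
}

decoded = codes.map(mapping)

# Codes absent from the dictionary become NaN, so they can be listed explicitly.
unmapped = codes[decoded.isna()].unique().tolist()

print(decoded.tolist())
print(unmapped)
```

Running a check like this after each translation helps catch a mistyped key before missing values reach the model.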

In [4]:
df.columns.values

array(['status_checking_account', 'duration', 'credit_history', 'purpose',
       'credit_amount', 'savings', 'employed_since',
       'pct_installment_rate', 'personal_status_and_sex',
       'other_debtors_guarantors', 'present_residence_since', 'property',
       'age', 'other_installment_plans', 'housing', 'existing_credits',
       'job', 'maintenance_people', 'telephone', 'foreign_worker',
       'good_or_bad'], dtype=object)

In [6]:
def status_checking_account(df: pd.DataFrame):
    mapping = {
        "A11": "< 0 DM",
        "A12": ">= 0 & < 200 DM",
        "A13": ">= 200 DM",
        "A14": "No Account"
    }

    df["status_checking_account"] = df["status_checking_account"].map(mapping)

def credit_history(df: pd.DataFrame):
    mapping = {
        "A30": "None Taken",
        "A31": "Paid Duly This Bank",
        "A32": "Paid Duly Existing",
        "A33": "Delayed in the Past",
        "A34": "Critical Account"
    }

    df["credit_history"] = df["credit_history"].map(mapping)

def purpose(df: pd.DataFrame):
    mapping = {
        "A40": "Car (new)",
        "A41": "Car (used)",
        "A42": "Furniture/Equipment",
        "A43": "Radio/TV",
        "A44": "Domestic Appliances",
        "A45": "Repairs",
        "A46": "Education",
        "A47": "Vacation",
        "A48": "Retraining",
        "A49": "Business",
        "A410": "Others"
    }

    df["purpose"] = df["purpose"].map(mapping)

def savings(df: pd.DataFrame):
    mapping = {
        "A61" : "< 100 DM",
        "A62" : ">= 100 DM & < 500 DM",
        "A63" : ">= 500 DM & < 1000 DM",
        "A64" : ">= 1000 DM",
        "A65" : "No Account"
    }

    df["savings"] = df["savings"].map(mapping)

def employed_since(df: pd.DataFrame):
    mapping = {
        "A71" : "Unemployed",
        "A72" : "< 1 year",
        "A73" : ">= 1 year & < 4 years",
        "A74" : ">= 4 years & < 7 years",
        "A75" : ">= 7 years"
    }

    df["employed_since"] = df["employed_since"].map(mapping)

def personal_status_and_sex(df: pd.DataFrame):
    mapping = {
        "A91" : "Male - Divorced/Separated",
        "A92" : "Female - Divorced/Separated/Married",
        "A93" : "Male - Single",
        "A94" : "Male - Married/Widowed",
        "A95" : "Female - Single"
    }

    df["personal_status_and_sex"] = df["personal_status_and_sex"].map(mapping)

def other_debtors_guarantors(df: pd.DataFrame):
    mapping = {
        "A101" : "None",
        "A102" : "Co-applicant",
        "A103" : "Guarantor"
    }

    df["other_debtors_guarantors"] = df["other_debtors_guarantors"].map(mapping)

def property(df: pd.DataFrame):
    mapping = {
        "A121" : "Real Estate",
        "A122" : "if not A121 : building society savings agreement/life insurance",
        "A123" : "if not A121/A122 : car or other, not in attribute 6",
        "A124" : "unknown / no property"
    }

    df["property"] = df["property"].map(mapping)