# Credit Prediction

## Context

### Credit risk
Credit risk is the probable risk of loss resulting from a borrower's failure to repay a loan or meet contractual obligations. If a company offers credit to its clients, there is a risk that some of them may not pay their invoices.

### Types of Credit Risk
* Good risk: an investment that one believes is likely to be profitable. The term most often refers to a loan made to a creditworthy person or company. Good risks are considered exceptionally likely to be repaid.
* Bad risk: a loan that is unlikely to be repaid because of bad credit history, insufficient income, or some other reason. A bad risk increases the risk to the lender and the likelihood of default on the part of the borrower.

### Objective:
Based on the attributes, classify a person as a good or bad credit risk.

### Dataset Description:
The dataset, prepared by Prof. Hofmann, contains 1000 entries with 20 independent variables (7 numerical, 13 categorical) and 1 target variable. Each entry represents a person who takes credit from a bank, and each person is classified as a good or bad credit risk according to the set of attributes. The attributes are:

### Features
* Status of existing checking account, in Deutsche Mark.
* Duration in months
* Credit history (credits taken, paid back duly, delays, critical accounts)
* Purpose of the credit (car, television,...)
* Credit amount
* Status of savings account/bonds, in Deutsche Mark.
* Present employment, in number of years.
* Installment rate in percentage of disposable income
* Personal status (married, single,...) and sex
* Other debtors / guarantors
* Present residence since X years
* Property (e.g. real estate)
* Age in years
* Other installment plans (banks, stores)
* Housing (rent, own,...)
* Number of existing credits at this bank
* Job
* Number of people being liable to provide maintenance for
* Telephone (yes, no)
* Foreign worker (yes, no)

### Target
* Credit risk: `good` (grant credit) or `bad` (deny credit)

```python
from xautoml.util.datasets import openml_task

# Load the training split of OpenML task 31 (the credit-g dataset)
X_train, y_train = openml_task(31, 0, train=True)
X_train
```

## Start the Model Building

You load the dataset into an AutoML tool you found on the internet to create a predictive model. Once the optimization starts, the AutoML tool tests various candidate models and evaluates how good each one is. Meanwhile, you have to wait for the program to finish its optimization.
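
A minimal sketch of that optimization loop, using the simplest possible strategy, plain random search: sample a candidate configuration, score it, and rank all candidates by score. The search space, `sample_candidate`, `random_search`, and `toy_score` are hypothetical illustrations; auto-sklearn itself uses Bayesian optimization (SMAC) with meta-learning and builds an ensemble of the best candidates, so this is not its actual algorithm.

```python
import random

# Hypothetical search space: a model type plus one hyperparameter per model.
SEARCH_SPACE = {
    "random_forest": {"max_depth": [2, 4, 8, 16]},
    "knn": {"n_neighbors": [1, 5, 15, 31]},
}


def sample_candidate(rng):
    """Draw one random (model type, hyperparameters) configuration."""
    model = rng.choice(sorted(SEARCH_SPACE))
    params = {name: rng.choice(values) for name, values in SEARCH_SPACE[model].items()}
    return model, params


def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random candidates; return (score, candidate) pairs, best first.

    `evaluate` maps a candidate to a validation score where higher is better.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        candidate = sample_candidate(rng)
        results.append((evaluate(candidate), candidate))
    # Sort on the score only, so the candidates themselves are never compared
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


def toy_score(candidate):
    """Hypothetical stand-in for 'train the candidate and measure validation accuracy'."""
    model, params = candidate
    if model == "random_forest":
        return 0.70 + params["max_depth"] / 200
    return 0.68 + params["n_neighbors"] / 1000


ranked = random_search(toy_score, n_trials=20)
best_score, best_candidate = ranked[0]
print(best_candidate, round(best_score, 3))
```

In a real tool, `toy_score` would be replaced by fitting each candidate on training data and measuring its accuracy on validation data, which is why the search takes minutes rather than milliseconds.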

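```python
import pickle

import autosklearn.classification
from autosklearn.metrics import accuracy

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=900,  # total search budget in seconds (15 minutes)
    per_run_time_limit=10,  # time limit in seconds for fitting a single candidate
    tmp_folder='/opt/xautoml/autosklearn/output/',
    max_models_on_disc=None,  # keep every evaluated model on disk
    delete_tmp_folder_after_terminate=False,  # keep the run history for later inspection
    metric=accuracy  # optimize for accuracy
)
automl.fit(X_train, y_train, dataset_name='credit-g')

# Persist the fitted AutoML object so the slow search need not be repeated
with open('/opt/xautoml/autosklearn/output/autosklearn.pkl', 'wb') as f:
    pickle.dump(automl, f)
```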

```python
import pickle

# Reload the previously fitted AutoML object instead of rerunning the search
with open('/opt/xautoml/autosklearn/output/autosklearn.pkl', 'rb') as f:
    automl = pickle.load(f)
```

After waiting for 15 minutes, you are presented with the following results:

### The score of the Final Model

Internally, the AutoML tool uses a metric to determine how good a candidate is, for example the fraction of correct predictions (accuracy). After the optimization, you want to test how good the model actually is before using it on real loan applicants. Therefore, you have held out part of the dataset, which you will now use to test how good the best model actually is:

```python
from sklearn.metrics import accuracy_score

# Load the held-out test split, which the AutoML tool never saw during the search
X_test, y_test = openml_task(31, 0, test=True)

predictions = automl.predict(X_test)
accuracy_score(y_test, predictions)
```

This means the generated model correctly classifies that fraction of new applicants it has never seen before.
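
Accuracy only means something relative to a baseline. credit-g is imbalanced (700 of the 1000 applicants are labelled `good`), so a trivial model that always predicts `good` already reaches roughly 70% accuracy. The sketch below computes accuracy by hand, with the same definition as `sklearn.metrics.accuracy_score` under its default arguments, together with that majority-class baseline. The helper names and the toy labels are hypothetical, chosen only to mirror the 70/30 class ratio.

```python
from collections import Counter


def simple_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    return simple_accuracy(y_test, [majority_label] * len(y_test))


# Hypothetical toy labels with the same 70/30 good/bad ratio as credit-g
toy_y_train = ["good"] * 7 + ["bad"] * 3
toy_y_test = ["good", "good", "bad", "good", "bad", "good", "good", "bad", "good", "good"]
toy_y_pred = ["good", "good", "bad", "good", "good", "good", "bad", "bad", "good", "good"]

print(simple_accuracy(toy_y_test, toy_y_pred))  # 0.8
print(majority_baseline_accuracy(toy_y_train, toy_y_test))  # 0.7
```

A model is only useful if its test accuracy is clearly above the baseline; the toy predictor here beats it by 10 percentage points.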


### View the Models found by auto-sklearn

Besides the raw performance, the tool also tells you which models performed best:

```python
automl.leaderboard()
```

With this information, you can decide whether you actually want to use the generated model.
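
The leaderboard's `ensemble_weight` column reflects that auto-sklearn's final model is typically not a single candidate but a weighted ensemble of several of them, whose predicted class probabilities are averaged using those weights. A minimal sketch of such a weighted soft vote follows; the function name, model names, weights, and probabilities are all hypothetical, and this is an illustration rather than auto-sklearn's internal code.

```python
def weighted_soft_vote(probabilities, weights):
    """Combine per-model class probabilities into one ensemble prediction per sample.

    probabilities: model name -> list of {label: probability} dicts, one per sample
    weights: model name -> ensemble weight (non-negative, summing to 1)
    """
    names = list(weights)
    n_samples = len(probabilities[names[0]])
    predictions = []
    for i in range(n_samples):
        combined = {}
        for name in names:
            # Each model contributes its probabilities scaled by its weight
            for label, p in probabilities[name][i].items():
                combined[label] = combined.get(label, 0.0) + weights[name] * p
        # Predict the label with the highest weighted-average probability
        predictions.append(max(combined, key=combined.get))
    return predictions


# Hypothetical ensemble of two candidates and their predictions for two applicants
toy_weights = {"random_forest": 0.6, "knn": 0.4}
toy_probabilities = {
    "random_forest": [{"good": 0.8, "bad": 0.2}, {"good": 0.3, "bad": 0.7}],
    "knn": [{"good": 0.4, "bad": 0.6}, {"good": 0.6, "bad": 0.4}],
}
print(weighted_soft_vote(toy_probabilities, toy_weights))  # ['good', 'bad']
```

The second applicant is classified as `bad` (0.58 vs. 0.42) even though the lower-weighted `knn` candidate alone would say `good`, which is why the individual leaderboard entries can disagree with the final model.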

## Load the Same Results in XAutoML

```python
import pickle

from xautoml.main import XAutoML
from xautoml.adapter import import_auto_sklearn
from xautoml.util.datasets import openml_task

# Reload the fitted auto-sklearn object and the held-out test data
with open('/opt/xautoml/autosklearn/output/autosklearn.pkl', 'rb') as f:
    automl = pickle.load(f)

X_test, y_test = openml_task(31, 0, test=True)

# Import the auto-sklearn optimization results into XAutoML and display them
rh = import_auto_sklearn(automl)
main = XAutoML(rh, X_test, y_test)

main
```


