## Context
The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann. In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes. The link to the original dataset can be found below.

## Content
It is almost impossible to understand the original dataset due to its complicated system of categories and symbols, so I wrote a small Python script to convert it into a readable CSV file. Several columns are ignored because, in my opinion, either they are not important or their descriptions are obscure. The selected attributes are:

- Age (numeric)
- Sex (text: male, female)
- Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
- Housing (text: own, rent, or free)
- Saving accounts (text: little, moderate, quite rich, rich)
- Checking account (text: little, moderate, rich; levels of the account balance in DM, Deutsche Mark)
- Credit amount (numeric, in DM)
- Duration (numeric, in months)
- Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
- Risk (target value: good or bad risk)
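The attribute list stores Job as integer codes and Saving accounts as ordered text levels. A minimal sketch of decoding both for readability, assuming the column names used in the CSV; the helper `decode_attributes` and the constants `JOB_LABELS` and `SAVING_LEVELS` are hypothetical, and treating the saving levels as ordered (little < moderate < quite rich < rich) is an interpretation of the list rather than something the dataset enforces:

```python
import pandas as pd

# Job codes as listed in the attribute description (hypothetical constant name)
JOB_LABELS = {
    0: "unskilled and non-resident",
    1: "unskilled and resident",
    2: "skilled",
    3: "highly skilled",
}

# Saving-account levels from lowest to highest, as listed in the description
SAVING_LEVELS = ["little", "moderate", "quite rich", "rich"]


def decode_attributes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with readable Job labels and an ordered Saving accounts column."""
    out = df.copy()
    out["Job"] = out["Job"].map(JOB_LABELS)
    # Missing values, and any value outside SAVING_LEVELS, become NaN
    out["Saving accounts"] = pd.Categorical(
        out["Saving accounts"], categories=SAVING_LEVELS, ordered=True
    )
    return out


# Two rows shaped like the first rows of the CSV
sample = pd.DataFrame({"Job": [2, 1], "Saving accounts": [None, "little"]})
decoded = decode_attributes(sample)
print(decoded)
```

Keeping the decoded labels in a separate copy leaves the original integer codes available for models that expect numeric input.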

## Importing the required libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Loading the datasets

In [2]:
# installing gdown to download files from Google Drive

!pip install gdown



In [3]:

# importing gdown
import gdown


In [4]:
# Load the datasets from Google Drive

# Load the training data (german_credit_data_with_target.xls)
# https://drive.google.com/file/d/1qmvv7yUKxzIVCqKHTYLFf55uKWfGHo72/view?usp=sharing

file1_id = "1qmvv7yUKxzIVCqKHTYLFf55uKWfGHo72"  # Google Drive file ID
# local output file
output_file1 = "german_credit_data_with_target.csv"
# download the file
gdown.download(id=file1_id, output=output_file1, quiet=False)

# index_col=0: use the leading unnamed column as the row index
df_train = pd.read_csv(output_file1, index_col=0)

Downloading...
From: https://drive.google.com/uc?id=1qmvv7yUKxzIVCqKHTYLFf55uKWfGHo72
To: /content/german_credit_data_with_target.csv
100%|██████████| 53.4k/53.4k [00:00<00:00, 16.3MB/s]
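Re-running the cell downloads the file again every time. A minimal sketch of a cached loader that only calls `gdown.download` when the output file is missing; the helper name `load_drive_csv` is hypothetical, and it assumes the same `gdown.download(id=..., output=..., quiet=...)` call used above:

```python
import os

import pandas as pd


def load_drive_csv(file_id: str, output: str) -> pd.DataFrame:
    """Download a Google Drive file with gdown unless it is already on disk, then read it."""
    if not os.path.exists(output):
        # Imported lazily so cached runs work even where gdown is not installed
        import gdown

        gdown.download(id=file_id, output=output, quiet=False)
    # index_col=0: the leading unnamed column becomes the row index, not a feature
    return pd.read_csv(output, index_col=0)
```

Usage in this notebook would be `df_train = load_drive_csv(file1_id, output_file1)`. Because the first column is used as the index, it is not counted among the data columns, which is why `df_train.shape` reports 10 columns.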


In [5]:
df_train.head()

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose,Risk
0,67,male,2,own,,little,1169,6,radio/TV,good
1,22,female,2,own,little,moderate,5951,48,radio/TV,bad
2,49,male,1,own,little,,2096,12,education,good
3,45,male,2,free,little,little,7882,42,furniture/equipment,good
4,53,male,2,free,little,little,4870,24,car,bad
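The empty cells in the output above (Saving accounts in row 0, Checking account in row 2) are read as missing values. A minimal sketch on an inline copy of the first three rows shown above (not the full file) that counts missing values per column and, as one possible option rather than the author's method, treats "missing" as its own `unknown` category:

```python
import io

import pandas as pd

# First three rows as displayed by df_train.head()
sample_csv = """,Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose,Risk
0,67,male,2,own,,little,1169,6,radio/TV,good
1,22,female,2,own,little,moderate,5951,48,radio/TV,bad
2,49,male,1,own,little,,2096,12,education,good
"""
sample = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Empty fields are parsed as NaN; count them per column
missing_counts = sample.isna().sum()
print(missing_counts[missing_counts > 0])

# Optional: keep missing accounts as an explicit category instead of dropping rows
filled = sample.fillna({"Saving accounts": "unknown", "Checking account": "unknown"})
print(filled[["Saving accounts", "Checking account"]])
```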


In [6]:
df_train.shape

(1000, 10)
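Of these 10 columns, 9 are attributes and `Risk` is the target. A minimal sketch of separating them, assuming `bad` is treated as the positive class (label 1); that encoding is a modelling choice, not something the dataset prescribes, and `split_features_target` is a hypothetical helper:

```python
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str = "Risk"):
    """Split a frame into features X and a 0/1 target y, with 'bad' risk encoded as 1."""
    X = df.drop(columns=target)
    y = (df[target] == "bad").astype(int)
    return X, y


# Small frame with a subset of the real columns
sample = pd.DataFrame(
    {"Age": [67, 22], "Credit amount": [1169, 5951], "Risk": ["good", "bad"]}
)
X, y = split_features_target(sample)
print(X.columns.tolist(), y.tolist())
```

Applied to `df_train`, this would leave 9 feature columns, the same count as `df_test`.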

In [7]:
# Load the test data (german_credit_data.xls), which has no Risk column
# https://drive.google.com/file/d/1iY9l5AuTRbr3_GyT_DJcJ1CiqHs9p1EF/view?usp=sharing

file2_id = "1iY9l5AuTRbr3_GyT_DJcJ1CiqHs9p1EF"  # Google Drive file ID
# local output file
output_file2 = "german_credit_data_test.csv"
# download the file
gdown.download(id=file2_id, output=output_file2, quiet=False)

df_test = pd.read_csv(output_file2, index_col=0)

Downloading...
From: https://drive.google.com/uc?id=1iY9l5AuTRbr3_GyT_DJcJ1CiqHs9p1EF
To: /content/german_credit_data_test.csv
100%|██████████| 49.7k/49.7k [00:00<00:00, 20.7MB/s]


In [8]:
df_test.head()

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose
0,67,male,2,own,,little,1169,6,radio/TV
1,22,female,2,own,little,moderate,5951,48,radio/TV
2,49,male,1,own,little,,2096,12,education
3,45,male,2,free,little,little,7882,42,furniture/equipment
4,53,male,2,free,little,little,4870,24,car


In [9]:
df_test.shape

(1000, 9)
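The five test rows shown above are identical to the first five rows of `df_train` apart from the missing `Risk` column, and both frames have 1000 rows. A minimal sketch for checking whether that holds for the whole file; `is_unlabelled_copy` is a hypothetical helper built on `DataFrame.equals`, which compares shape, index, column labels, values and dtypes and treats NaNs in the same position as equal:

```python
import pandas as pd


def is_unlabelled_copy(train: pd.DataFrame, test: pd.DataFrame, target: str = "Risk") -> bool:
    """True if `test` holds exactly the rows of `train` with the target column removed."""
    return train.drop(columns=target).equals(test)
```

Usage: `is_unlabelled_copy(df_train, df_test)`. If it returns `True`, `df_test` adds no rows beyond `df_train` and, lacking labels, cannot serve as an independent hold-out set; a held-out split of `df_train` would then be needed to measure model performance.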

In [10]:
# Analysis