# German Credit Data Exploration

## Conceptual description

| Variable                | Description                                              | Type                 | Values                                                                          |
| ----------------------- | -------------------------------------------------------- | -------------------- | ------------------------------------------------------------------------------- |
| credit-risk             | Binary label of credit behavior                          | Binary               | `good`, `bad`                                                                   |
| checking\_status        | Status of the applicant's checking account               | Ordinal categorical  | `<0`, `0<=X<200`, `>=200`, `no checking`                                        |
| duration                | Credit duration in months                                | Numeric (discrete)   | 6, 12, 24, 48, 60, ...                                                          |
| credit\_history         | Applicant's previous credit history                      | Nominal categorical  | `critical/other existing credit`, `existing paid`, `delayed previously`, ...    |
| purpose                 | Declared purpose of the credit                           | Nominal categorical  | `radio/tv`, `education`, `furniture/equipment`, `new car`, `business`, ...      |
| credit\_amount          | Requested credit amount                                  | Numeric (continuous) | 250 – 18424 (e.g., 1169, 5951, 9055, ...)                                       |
| savings\_status         | Declared savings level                                   | Ordinal categorical  | `<100`, `100<=X<500`, `500<=X<1000`, `>=1000`, `no known savings`               |
| employment              | Time in current employment                               | Ordinal categorical  | `<1`, `1<=X<4`, `4<=X<7`, `>=7`, `unemployed`                                   |
| installment\_commitment | Percentage of income allocated to the monthly installment | Ordinal categorical  | 1, 2, 3, 4                                                                      |
| other\_parties          | Presence of a co-applicant or guarantor                  | Nominal categorical  | `none`, `guarantor`, `co applicant`                                             |
| residence\_since        | Years of residence at the current address                | Numeric (discrete)   | 1, 2, 3, 4                                                                      |
| property\_magnitude     | Type of property or collateral declared                  | Nominal categorical  | `real estate`, `life insurance`, `car`, `no known property`                     |
| age                     | Applicant's age                                          | Numeric (continuous) | 19 – 75 (e.g., 22, 45, 53, ...)                                                 |
| other\_payment\_plans   | Other payment plans available                            | Nominal categorical  | `none`, `bank`, `stores`                                                        |
| housing                 | Housing arrangement                                      | Nominal categorical  | `own`, `rent`, `for free`                                                       |
| existing\_credits       | Number of existing credits                               | Numeric (discrete)   | 1, 2, 3, 4                                                                      |
| job                     | Type of occupation                                       | Nominal categorical  | `skilled`, `unskilled resident`, `high qualif/self emp/mgmt`, `unemp/unskilled` |
| num\_dependents         | Number of dependents                                     | Numeric (discrete)   | 1, 2                                                                            |
| own\_telephone          | Whether the applicant has a telephone                    | Binary               | `yes`, `none`                                                                   |
| foreign\_worker         | Whether the applicant is a foreign worker                | Binary               | `yes`, `no`                                                                     |
| sex                     | Applicant's gender                                       | Nominal categorical  | `male`, `female`                                                                |
| marital\_status         | Applicant's marital status                               | Nominal categorical  | `single`, `div/dep/mar`, `div/sep`, `mar/wid`                                   |


## Requirements

In [51]:
import os
import polars as pl

current_path = os.getcwd()
data_path = os.path.join(current_path, '..', '..', 'data')
filename = 'uci_german_credit.csv'
data_file_path = os.path.join(data_path, filename)
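
Because the path above is built from the current working directory, it depends on where the notebook is launched. The following is a minimal sketch of a `pathlib` variant that fails early when the file is missing. The helper name `resolve_data_file` and the variable `checked_path` are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def resolve_data_file(base_dir: Path, filename: str) -> Path:
    """Return the absolute path to `filename` under `base_dir`.

    Hypothetical helper: raises FileNotFoundError early instead of letting
    the CSV reader fail later with a less specific message.
    """
    candidate = (base_dir / filename).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f'Data file not found: {candidate}')
    return candidate


# Same layout as the cell above: the data folder sits two levels up.
data_dir = Path.cwd() / '..' / '..' / 'data'
try:
    checked_path = resolve_data_file(data_dir, 'uci_german_credit.csv')
    print(f'Using data file: {checked_path}')
except FileNotFoundError as err:
    print(err)
```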

## Data view

In [52]:
df = pl.read_csv(source=data_file_path)

In [53]:
df

credit-risk,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,other_parties,residence_since,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker,sex,marital_status
str,str,i64,str,str,i64,str,str,i64,str,i64,str,i64,str,str,i64,str,i64,str,str,str,str
"good","<0",6,"critical/other existing credit","radio/tv",1169,"no known savings",">=7",4,"none",4,"real estate",67,"none","own",2,"skilled",1,"yes","yes","male","single"
"bad","0<=X<200",48,"existing paid","radio/tv",5951,"<100","1<=X<4",2,"none",2,"real estate",22,"none","own",1,"skilled",1,"none","yes","female","div/dep/mar"
"good","no checking",12,"critical/other existing credit","education",2096,"<100","4<=X<7",2,"none",3,"real estate",49,"none","own",1,"unskilled resident",2,"none","yes","male","single"
"good","<0",42,"existing paid","furniture/equipment",7882,"<100","4<=X<7",2,"guarantor",4,"life insurance",45,"none","for free",1,"skilled",2,"none","yes","male","single"
"bad","<0",24,"delayed previously","new car",4870,"<100","1<=X<4",3,"none",4,"no known property",53,"none","for free",2,"skilled",2,"none","yes","male","single"
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"good","no checking",12,"existing paid","furniture/equipment",1736,"<100","4<=X<7",3,"none",4,"real estate",31,"none","own",1,"unskilled resident",1,"none","yes","female","div/dep/mar"
"good","<0",30,"existing paid","used car",3857,"<100","1<=X<4",4,"none",4,"life insurance",40,"none","own",1,"high qualif/self emp/mgmt",1,"yes","yes","male","div/sep"
"good","no checking",12,"existing paid","radio/tv",804,"<100",">=7",4,"none",4,"car",38,"none","own",1,"skilled",1,"none","yes","male","single"
"bad","<0",45,"existing paid","radio/tv",1845,"<100","1<=X<4",4,"none",4,"no known property",23,"none","for free",1,"skilled",1,"yes","yes","male","single"


In [54]:
df.shape

(1000, 22)

## Data types

In [55]:
print('Data types:\n')

for col, dtype in zip(df.columns, df.dtypes):
    print(f'- {col}: {dtype}\n')

Data types:

- credit-risk: String

- checking_status: String

- duration: Int64

- credit_history: String

- purpose: String

- credit_amount: Int64

- savings_status: String

- employment: String

- installment_commitment: Int64

- other_parties: String

- residence_since: Int64

- property_magnitude: String

- age: Int64

- other_payment_plans: String

- housing: String

- existing_credits: Int64

- job: String

- num_dependents: Int64

- own_telephone: String

- foreign_worker: String

- sex: String

- marital_status: String



## Unique values

In [56]:
print('Unique values:\n')

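for col in df.columns:
    try:
        # NumPy arrays print more compactly than Polars Series
        unique_values = df[col].unique().to_numpy()
    except Exception:
        # Fall back to the Polars Series if the conversion fails
        unique_values = df[col].unique()

    print(f'- {col}: {unique_values}\n')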

Unique values:

- credit-risk: ['good' 'bad']

- checking_status: ['0<=X<200' 'no checking' '>=200' '<0']

- duration: [ 4  5  6  7  8  9 10 11 12 13 14 15 16 18 20 21 22 24 26 27 28 30 33 36
 39 40 42 45 47 48 54 60 72]

- credit_history: ['all paid' 'delayed previously' 'no credits/all paid' 'existing paid'
 'critical/other existing credit']

- purpose: ['retraining' 'domestic appliance' 'other' 'repairs' 'business'
 'furniture/equipment' 'education' 'used car' 'new car' 'radio/tv']

- credit_amount: [  250   276   338   339   343   362   368   385   392   409   426   428
   433   448   454   458   484   518   522   571   585   590   601   609
   618   625   626   629   639   640   652   654   660   662   666   672
   674   682   683   684   685   691   697   700   701   707   708   709
   717   719   727   730   731   741   745   750   753   754   759   760
   763   766   776   781   783   790   795   797   802   804   806   836
   841   846   860   866   874   882   884   886   888

## Processing

### Encode categorical variables

In [64]:
# Ordinal columns are encoded separately so their category order is preserved
exceptions = ['checking_status', 'savings_status', 'employment']
categorical_cols = [col for col in df.columns if df[col].dtype == pl.String and col not in exceptions]
encoding = {}

for col in categorical_cols:
    print(col)

    # Map each category to an integer code, following alphabetical order
    unique_values_sorted = sorted(df[col].unique().to_list())
    encoding[col] = {value: code for code, value in enumerate(unique_values_sorted)}
    df = df.with_columns(pl.col(col).replace_strict(encoding[col]))
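
To make the effect of the encoding loop concrete, the following sketch reproduces the mapping step in plain Python for the `housing` values listed in the conceptual description, and builds a reverse lookup for decoding. The names `housing_values`, `housing_encoding`, and `housing_decoding` are illustrative only.

```python
# Distinct values of `housing`, as listed in the conceptual description.
housing_values = ['own', 'rent', 'for free']

# Same rule as the encoding loop: sort alphabetically, then number from 0.
housing_encoding = {value: code for code, value in enumerate(sorted(housing_values))}
print(housing_encoding)  # {'for free': 0, 'own': 1, 'rent': 2}

# Reverse lookup to turn integer codes back into labels.
housing_decoding = {code: value for value, code in housing_encoding.items()}
print(housing_decoding[1])  # own
```

The codes follow alphabetical order and carry no meaning of their own. That is usually acceptable for nominal variables, but it may mislead models that treat the codes as magnitudes.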

In [None]:
# TODO: define ordinal mappings for the columns listed in `exceptions`
encoding['checking_status'] = ...  # TODO
encoding['savings_status'] = ...  # TODO
encoding['employment'] = ...  # TODO

In [None]:
encoding

In [None]:
df