# Credit Score Analysis

## Contents <a id='back'></a>


* [Introduction](#intro)
* [Step 1. Description of the data](#data_review)
* [Step 2. Data preprocessing](#data_preprocessing)
* [Step 3. Data Classification](#data_classification)
* [Step 4. Hypotheses testing](#hypotheses)
* [Conclusions](#end)


# Introduction <a id='intro'></a>

The project consists of preparing a report for the lending division of a bank. You will need to find out if a client's marital status and number of children have an impact on loan default. The bank already has some data on the creditworthiness of customers.

The report will be taken into account when building a **credit score** for a potential customer. A credit score is used to assess a potential borrower's ability to repay a loan.


### Objective

1. Preprocess the data from the bank's customer base to ensure its quality before analysis.

2. Analyze the bank's customer data to answer the research questions that will inform a credit score for potential customers.

[Back to Contents](#back)

## Step 1. Description of the data <a id='data_review'></a>

Open the data and browse it.

In [268]:
# import pandas
import pandas as pd

# the file is read and stored in the variable df

try:
    df=pd.read_csv('/datasets/credit_scoring_eng.csv')
except FileNotFoundError:
    df=pd.read_csv('credit_scoring_eng.csv')

df

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-8437.673028,42,bachelor's degree,0,married,0,F,employee,0,40620.102,purchase of the house
1,1,-4024.803754,36,secondary education,1,married,0,F,employee,0,17932.802,car purchase
2,0,-5623.422610,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house
3,3,-4124.747207,32,secondary education,1,married,0,M,employee,0,42820.568,supplementary education
4,0,340266.072047,53,secondary education,1,civil partnership,1,F,retiree,0,25378.572,to have a wedding
...,...,...,...,...,...,...,...,...,...,...,...,...
21520,1,-4529.316663,43,secondary education,1,civil partnership,1,F,business,0,35966.698,housing transactions
21521,0,343937.404131,67,secondary education,1,married,0,F,retiree,0,24959.969,purchase of a car
21522,1,-2113.346888,38,secondary education,1,civil partnership,1,M,employee,1,14347.610,property
21523,3,-3112.481705,38,secondary education,1,married,0,M,employee,1,39054.888,buying my own car


## Data Exploration

**Description of the data**
- `children` - the number of children in the family
- `days_employed` - work experience in days
- `dob_years` - the age of the client in years
- `education` - the client's education
- `education_id` - education identifier
- `family_status` - marital status
- `family_status_id` - marital status identifier
- `gender` - customer's gender
- `income_type` - type of employment
- `debt` - was there any debt on a loan payment?
- `total_income` - monthly income
- `purpose` - the purpose of obtaining a loan


In [269]:
# we look at how many rows and columns our dataset has

df.shape

(21525, 12)

In [270]:
df.head()

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-8437.673028,42,bachelor's degree,0,married,0,F,employee,0,40620.102,purchase of the house
1,1,-4024.803754,36,secondary education,1,married,0,F,employee,0,17932.802,car purchase
2,0,-5623.42261,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house
3,3,-4124.747207,32,secondary education,1,married,0,M,employee,0,42820.568,supplementary education
4,0,340266.072047,53,secondary education,1,civil partnership,1,F,retiree,0,25378.572,to have a wedding


The `days_employed` column contains negative numbers. In the `education` column, the value 'secondary education' appears both with and without capitalized initials, so these implicit duplicates need to be unified. Duplicate rows and missing values also need to be checked.

In [271]:
# Obtain information about the data

df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21525 entries, 0 to 21524
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   children          21525 non-null  int64  
 1   days_employed     19351 non-null  float64
 2   dob_years         21525 non-null  int64  
 3   education         21525 non-null  object 
 4   education_id      21525 non-null  int64  
 5   family_status     21525 non-null  object 
 6   family_status_id  21525 non-null  int64  
 7   gender            21525 non-null  object 
 8   income_type       21525 non-null  object 
 9   debt              21525 non-null  int64  
 10  total_income      19351 non-null  float64
 11  purpose           21525 non-null  object 
dtypes: float64(2), int64(5), object(5)
memory usage: 2.0+ MB


Some columns contain missing values: `days_employed` and `total_income` each show 19351 non-null entries instead of the expected 21525.

In [272]:

df[df['days_employed'].isna()]

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
12,0,,65,secondary education,1,civil partnership,1,M,retiree,0,,to have a wedding
26,0,,41,secondary education,1,married,0,M,civil servant,0,,education
29,0,,63,secondary education,1,unmarried,4,F,retiree,0,,building a real estate
41,0,,50,secondary education,1,married,0,F,civil servant,0,,second-hand car purchase
55,0,,54,secondary education,1,civil partnership,1,F,retiree,1,,to have a wedding
...,...,...,...,...,...,...,...,...,...,...,...,...
21489,2,,47,Secondary Education,1,married,0,M,business,0,,purchase of a car
21495,1,,50,secondary education,1,civil partnership,1,F,employee,0,,wedding ceremony
21497,0,,48,BACHELOR'S DEGREE,0,married,0,F,business,0,,building a property
21502,1,,42,secondary education,1,married,0,F,employee,0,,building a real estate


The `days_employed` and `total_income` columns have the same number of missing values, and the rows displayed above are missing both at once. This symmetry suggests that the missing values in the two columns are related.

In [273]:
# Combine both conditions and count the rows where `days_employed` and `total_income` are both missing.

datos_faltantes=df['days_employed'].isna()
len(df[(datos_faltantes)&(df['total_income'].isna())])

2174
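Another way to confirm that the two columns are missing in exactly the same rows is to compare their missing-value masks directly. The sketch below uses a small made-up frame (`toy` and its values are assumptions for illustration, not the bank data).

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) mimicking the paired gaps seen in the bank data
toy = pd.DataFrame({
    "days_employed": [-8437.7, np.nan, -4024.8, np.nan],
    "total_income": [40620.1, np.nan, 17932.8, np.nan],
})

# Boolean masks marking the missing entries in each column
mask_days = toy["days_employed"].isna()
mask_income = toy["total_income"].isna()

# True only if every row is missing in both columns or in neither
same_rows = bool((mask_days == mask_income).all())

# Number of rows where both columns are missing
both_missing = int((mask_days & mask_income).sum())

print(same_rows, both_missing)
```

If `same_rows` is `True`, the count of rows missing both values equals the count of missing values in either column, which is the situation observed above.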

**Intermediate conclusion**

Both `days_employed` and `total_income` contain missing values, and all 2174 rows that lack `days_employed` also lack `total_income`, so the missing data in the two columns appears to be related. Whether to drop or fill these rows depends on how large a share of the sample they represent. Because both columns hold floating-point values, filling them with the mean or the median is an option that needs to be checked.

Next, we will inspect the first 50 rows with missing values in both columns, taken from the overall table `df`, to look for a relationship.

We will also calculate the percentage of missing data in these columns to judge whether it is large enough to justify filling the values.

[Back to Contents](#back)

In [274]:
# The percentage of missing values compared to the complete data set is calculated.

datos_completos=len(df['education'])
porcentaje_ausentes= datos_faltantes.sum()*100 /datos_completos
porcentaje_ausentes

10.099883855981417
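The same percentage can be obtained for every column at once, since the mean of a boolean mask is the share of `True` values. This is a minimal sketch on made-up data (`toy` and its values are assumptions).

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): one of four rows lacks both numeric fields
toy = pd.DataFrame({
    "children": [1, 0, 2, 0],
    "days_employed": [-100.0, np.nan, -250.0, 300.0],
    "total_income": [20000.0, np.nan, 31000.0, 18000.0],
})

# isna() gives a True/False table; the column means are the missing shares
missing_pct = toy.isna().mean() * 100

print(missing_pct)
```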

In [275]:
# We will investigate customers who have no data on the identified characteristic and the column with missing values.

df_nulos=df[(df['days_employed'].isna())&(df['total_income'].isna())]
df_nulos.head(50)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
12,0,,65,secondary education,1,civil partnership,1,M,retiree,0,,to have a wedding
26,0,,41,secondary education,1,married,0,M,civil servant,0,,education
29,0,,63,secondary education,1,unmarried,4,F,retiree,0,,building a real estate
41,0,,50,secondary education,1,married,0,F,civil servant,0,,second-hand car purchase
55,0,,54,secondary education,1,civil partnership,1,F,retiree,1,,to have a wedding
65,0,,21,secondary education,1,unmarried,4,M,business,0,,transactions with commercial real estate
67,0,,52,bachelor's degree,0,married,0,F,retiree,0,,purchase of the house for my family
72,1,,32,bachelor's degree,0,married,0,M,civil servant,0,,transactions with commercial real estate
82,2,,50,bachelor's degree,0,married,0,F,employee,0,,housing
83,0,,52,secondary education,1,married,0,M,employee,0,,housing


In [276]:

df_nulos.describe()


Unnamed: 0,children,days_employed,dob_years,education_id,family_status_id,debt,total_income
count,2174.0,0.0,2174.0,2174.0,2174.0,2174.0,0.0
mean,0.552438,,43.632015,0.800828,0.975161,0.078197,
std,1.469356,,12.531481,0.530157,1.41822,0.268543,
min,-1.0,,0.0,0.0,0.0,0.0,
25%,0.0,,34.0,0.25,0.0,0.0,
50%,0.0,,43.0,1.0,0.0,0.0,
75%,1.0,,54.0,1.0,1.0,0.0,
max,20.0,,73.0,3.0,4.0,1.0,


The missing values account for slightly more than 10% of the column, which is a considerable share, so it is worth trying to fill them rather than drop the rows.

As suspected, the rows with missing `days_employed` are the same rows with missing `total_income`. No other pattern is apparent in these rows.

In [277]:
# Checking the distribution over the entire data set
df_income_type=df['income_type'].value_counts()
df_education=df['education'].value_counts()

df_income_type, df_education

(employee                       11119
 business                        5085
 retiree                         3856
 civil servant                   1459
 entrepreneur                       2
 unemployed                         2
 student                            1
 paternity / maternity leave        1
 Name: income_type, dtype: int64,
 secondary education    13750
 bachelor's degree       4718
 SECONDARY EDUCATION      772
 Secondary Education      711
 some college             668
 BACHELOR'S DEGREE        274
 Bachelor's Degree        268
 primary education        250
 Some College              47
 SOME COLLEGE              29
 PRIMARY EDUCATION         17
 Primary Education         15
 graduate degree            4
 Graduate Degree            1
 GRADUATE DEGREE            1
 Name: education, dtype: int64)

**Intermediate conclusion**



We compared the distributions of some columns, in this case `income_type` and `education`, and found no pattern in where the missing data appears. At this point the missing values are therefore considered random, although further checks can be run to test this.
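One way to make this comparison explicit is to put the normalized distribution of a column in the full table next to its distribution in the rows with missing income. The sketch below does this on made-up data (`toy` and its values are assumptions); differences close to zero would support the idea that the gaps are random with respect to that column.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "income_type": ["employee", "employee", "business", "retiree",
                    "employee", "business", "retiree", "employee"],
    "total_income": [20000.0, np.nan, 31000.0, np.nan,
                     25000.0, np.nan, 18000.0, np.nan],
})

# Share of each category in the whole table
share_all = toy["income_type"].value_counts(normalize=True)

# Share of each category among the rows with missing income
missing_rows = toy.loc[toy["total_income"].isna(), "income_type"]
share_missing = missing_rows.value_counts(normalize=True)

# Align both distributions by category and compute the difference
comparison = pd.DataFrame({"all_rows": share_all, "missing_rows": share_missing}).fillna(0)
comparison["difference"] = comparison["missing_rows"] - comparison["all_rows"]

print(comparison)
```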

[Back to Contents](#back)

In [278]:

df_family_status=df['family_status'].value_counts()
df_gender=df['gender'].value_counts()
df_children=df.value_counts('children',False)
df_dob_years=df.value_counts('dob_years',False)

df_family_status, df_gender, df_children , df_dob_years


(married              12380
 civil partnership     4177
 unmarried             2813
 divorced              1195
 widow / widower        960
 Name: family_status, dtype: int64,
 F      14236
 M       7288
 XNA        1
 Name: gender, dtype: int64,
 children
  0     14149
  1      4818
  2      2055
  3       330
  20       76
 -1        47
  4        41
  5         9
 dtype: int64,
 dob_years
 35    617
 40    609
 41    607
 34    603
 38    598
 42    597
 33    581
 39    573
 31    560
 36    555
 44    547
 29    545
 30    540
 48    538
 37    537
 50    514
 43    513
 32    510
 49    508
 28    503
 45    497
 27    493
 56    487
 52    484
 47    480
 54    479
 46    475
 58    461
 57    460
 53    459
 51    448
 59    444
 55    443
 26    408
 60    377
 25    357
 61    355
 62    352
 63    269
 64    265
 24    264
 23    254
 65    194
 22    183
 66    183
 67    167
 21    111
 0     101
 68     99
 69     85
 70     65
 71     58
 20     51
 72     33
 19     14


**Conclusions** 

After analyzing the data, we did not find any pattern between the missing values and the other columns, so the missing data appears to be random. The analysis did reveal a separate issue: the `education` column contains implicit duplicates. The same value, such as 'secondary education', is written in lowercase, in uppercase, and with capitalized initials, and pandas treats each spelling as a different category.

We therefore need to unify the values in the `education` column first and then handle the missing data.

[Back to Contents](#back)

# Step 2. Data preprocessing <a id='data_preprocessing'></a>


In [280]:
# Values in the education column to check if spelling correction will be necessary and what exactly needs to be corrected
df['education'].value_counts()

secondary education    13750
bachelor's degree       4718
SECONDARY EDUCATION      772
Secondary Education      711
some college             668
BACHELOR'S DEGREE        274
Bachelor's Degree        268
primary education        250
Some College              47
SOME COLLEGE              29
PRIMARY EDUCATION         17
Primary Education         15
graduate degree            4
Graduate Degree            1
GRADUATE DEGREE            1
Name: education, dtype: int64

In [281]:

df['education']=df['education'].replace('primary education','Primary Education')
df['education']=df['education'].replace('PRIMARY EDUCATION','Primary Education')
df['education']=df['education'].replace('secondary education','Secondary Education')
df['education']=df['education'].replace('SECONDARY EDUCATION','Secondary Education')
df['education']=df['education'].replace("bachelor's degree","Bachelor's Degree")
df['education']=df['education'].replace("BACHELOR'S DEGREE","Bachelor's Degree")
df['education']=df['education'].replace('graduate degree','Graduate Degree')
df['education']=df['education'].replace('GRADUATE DEGREE','Graduate Degree')
df['education']=df['education'].replace('SOME COLLEGE','Some College')
df['education']=df['education'].replace('some college','Some College')


In [282]:
# Check all the values in the column to make sure we have corrected them.
df['education'].value_counts()


Secondary Education    15233
Bachelor's Degree       5260
Some College             744
Primary Education        282
Graduate Degree            6
Name: education, dtype: int64
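As an alternative to listing every spelling by hand, the casing can be normalized in one step. The sketch below lowercases a made-up sample (the `education` Series here is an assumption for illustration); note that this yields lowercase labels rather than the capitalized ones used in this notebook, and that `str.title()` is not a safe shortcut because it capitalizes the letter after an apostrophe.

```python
import pandas as pd

# Toy sample (hypothetical) with the same kind of casing variants
education = pd.Series([
    "secondary education", "SECONDARY EDUCATION", "Secondary Education",
    "bachelor's degree", "BACHELOR'S DEGREE",
])

# Lowercasing collapses all casing variants into a single label
normalized = education.str.lower()
print(normalized.value_counts())

# str.title() would produce "Bachelor'S Degree", which is why it is avoided here
print("bachelor's degree".title())
```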

In [283]:
# Distribution of the values in the `children` column
df['children'].value_counts()

 0     14149
 1      4818
 2      2055
 3       330
 20       76
-1        47
 4        41
 5         9
Name: children, dtype: int64

The values -1 and 20 in the `children` column were probably entered by mistake and likely stand for 1 and 2, so they can be treated as implicit duplicates of those values.

In [284]:
# Percentage of problematic data in `children`
datos_problematicos_children=df['children'][df['children']==-1].count()+df['children'][df['children']==20].count()
porcentaje_datos_problematicos_children= datos_problematicos_children*100/datos_completos
porcentaje_datos_problematicos_children

0.5714285714285714

In [285]:

df['children']=df['children'].replace(-1,1)
df['children']=df['children'].replace(20,2)

In [286]:
# Check the `children` column again to make sure everything is fixed

df['children'].value_counts()

0    14149
1     4865
2     2131
3      330
4       41
5        9
Name: children, dtype: int64
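Both corrections can also be expressed as a single mapping passed to `replace()`. This is a minimal sketch on a made-up Series (an assumption for illustration), relying on the same guess that -1 means 1 and 20 means 2.

```python
import pandas as pd

# Toy sample (hypothetical) containing the two suspicious values
children = pd.Series([0, 1, -1, 20, 2, 3])

# Map each suspicious value to its assumed intended value in one call
fixed = children.replace({-1: 1, 20: 2})

print(fixed.tolist())
```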

The `value_counts()` method shows the unique values and their counts. As noted earlier, the `days_employed` column contains negative values that need to be corrected.

In [287]:
# Problematic data in `days_employed`

df['days_employed'].value_counts()


-327.685916     1
-1580.622577    1
-4122.460569    1
-2828.237691    1
-2636.090517    1
               ..
-7120.517564    1
-2146.884040    1
-881.454684     1
-794.666350     1
-3382.113891    1
Name: days_employed, Length: 19351, dtype: int64

In [288]:
# Percentage of rows that hold a (unique) non-missing value in `days_employed`
datos_problematicos_days_employed=df['days_employed'].value_counts().count()
porcentaje_datos_problematicos_days_employed= datos_problematicos_days_employed*100/datos_completos
porcentaje_datos_problematicos_days_employed

89.90011614401858

In [289]:
df[df['days_employed']<0]

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-8437.673028,42,Bachelor's Degree,0,married,0,F,employee,0,40620.102,purchase of the house
1,1,-4024.803754,36,Secondary Education,1,married,0,F,employee,0,17932.802,car purchase
2,0,-5623.422610,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house
3,3,-4124.747207,32,Secondary Education,1,married,0,M,employee,0,42820.568,supplementary education
5,0,-926.185831,27,Bachelor's Degree,0,civil partnership,1,M,business,0,40922.170,purchase of the house
...,...,...,...,...,...,...,...,...,...,...,...,...
21519,1,-2351.431934,37,Graduate Degree,4,divorced,3,M,employee,0,18551.846,buy commercial real estate
21520,1,-4529.316663,43,Secondary Education,1,civil partnership,1,F,business,0,35966.698,housing transactions
21522,1,-2113.346888,38,Secondary Education,1,civil partnership,1,M,employee,1,14347.610,property
21523,3,-3112.481705,38,Secondary Education,1,married,0,M,employee,1,39054.888,buying my own car


In [290]:
df[df['days_employed']<0].count()

children            15906
days_employed       15906
dob_years           15906
education           15906
education_id        15906
family_status       15906
family_status_id    15906
gender              15906
income_type         15906
debt                15906
total_income        15906
purpose             15906
dtype: int64

The negative values make up well over half of the column: 15906 of 21525 rows, or about 74%. That is too large a share to discard, so the values have to be corrected. They are most likely the result of data being entered with the wrong sign.
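The share of negative values can be computed directly from a boolean mask. The sketch below first reproduces the arithmetic from the counts reported above and then shows the mask approach on a made-up Series (`days` is an assumption for illustration).

```python
import numpy as np
import pandas as pd

# Arithmetic from the counts reported above: 15906 negative rows out of 21525
share_reported = round(15906 / 21525 * 100, 1)
print(share_reported)

# Toy sample (hypothetical values), including one missing entry
days = pd.Series([-8437.7, -4024.8, 340266.1, np.nan, -926.2])

# NaN compares as False, so missing entries count as "not negative" here
negative_share_all = (days < 0).mean() * 100

# Share among the known values only
negative_share_known = (days.dropna() < 0).mean() * 100

print(negative_share_all, negative_share_known)
```

Which denominator to use depends on the question: all rows, or only the rows where the value is known.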

In [291]:
# Fix the problematic values by taking the absolute value.
df['days_employed']=abs(df['days_employed'])
df['days_employed']

0          8437.673028
1          4024.803754
2          5623.422610
3          4124.747207
4        340266.072047
             ...      
21520      4529.316663
21521    343937.404131
21522      2113.346888
21523      3112.481705
21524      1984.507589
Name: days_employed, Length: 21525, dtype: float64

In [292]:
# Check the result - make sure it's fixed
df[df['days_employed']<0].count()

children            0
days_employed       0
dob_years           0
education           0
education_id        0
family_status       0
family_status_id    0
gender              0
income_type         0
debt                0
total_income        0
purpose             0
dtype: int64

In [293]:
# Check the result - make sure it's fixed
df[df['days_employed']<0].value_counts().count()

0

In [294]:
# Check `dob_years` for suspicious values and count percentage

df['dob_years'].value_counts()

35    617
40    609
41    607
34    603
38    598
42    597
33    581
39    573
31    560
36    555
44    547
29    545
30    540
48    538
37    537
50    514
43    513
32    510
49    508
28    503
45    497
27    493
56    487
52    484
47    480
54    479
46    475
58    461
57    460
53    459
51    448
59    444
55    443
26    408
60    377
25    357
61    355
62    352
63    269
64    265
24    264
23    254
65    194
66    183
22    183
67    167
21    111
0     101
68     99
69     85
70     65
71     58
20     51
72     33
19     14
73      8
74      6
75      1
Name: dob_years, dtype: int64

In [295]:
df['dob_years'].describe()

count    21525.000000
mean        43.293380
std         12.574584
min          0.000000
25%         33.000000
50%         42.000000
75%         53.000000
max         75.000000
Name: dob_years, dtype: float64

In [296]:
# Check `dob_years` for suspicious values and count percentage
df[df['dob_years']==0]

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
99,0,346541.618895,0,Secondary Education,1,married,0,F,retiree,0,11406.644,car
149,0,2664.273168,0,Secondary Education,1,divorced,3,F,employee,0,11228.230,housing transactions
270,3,1872.663186,0,Secondary Education,1,married,0,F,employee,0,16346.633,housing renovation
578,0,397856.565013,0,Secondary Education,1,married,0,F,retiree,0,15619.310,construction of own property
1040,0,1158.029561,0,Bachelor's Degree,0,divorced,3,F,business,0,48639.062,to own a car
...,...,...,...,...,...,...,...,...,...,...,...,...
19829,0,,0,Secondary Education,1,married,0,F,employee,0,,housing
20462,0,338734.868540,0,Secondary Education,1,married,0,F,retiree,0,41471.027,purchase of my own house
20577,0,331741.271455,0,Secondary Education,1,unmarried,4,F,retiree,0,20766.202,property
21179,2,108.967042,0,Bachelor's Degree,0,married,0,M,business,0,38512.321,building a real estate


In [297]:

datos_problematicos_dob_years=df[df['dob_years']==0].value_counts().count()
porcentaje_problematicos_dob_years=datos_problematicos_dob_years*100/datos_completos
porcentaje_problematicos_dob_years

0.42276422764227645
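One caveat about the count above: `DataFrame.value_counts()` leaves out rows that contain missing values by default, so counting its entries can undercount the rows of interest. That appears to be the case here, since the percentage above corresponds to 91 rows while `dob_years` holds 101 zeros. The sketch below shows the difference on a small made-up frame (`toy` and its values are assumptions) and counts the rows with a boolean mask instead.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): three rows with age 0, one of them with a missing income
toy = pd.DataFrame({
    "dob_years": [0, 0, 0, 35],
    "total_income": [11406.6, np.nan, 16346.6, 23000.0],
})
zero_age = toy[toy["dob_years"] == 0]

# Rows containing NaN are dropped by DataFrame.value_counts(), so this gives 2
count_via_value_counts = zero_age.value_counts().count()

# Summing the boolean mask counts every matching row, so this gives 3
count_via_mask = int((toy["dob_years"] == 0).sum())

print(count_via_value_counts, count_via_mask)
```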

The table shows that the mean and median are similar. Because there are no significant outliers, we will replace the zeros in the `dob_years` column with the mean, truncated to an integer with `int()`, using `replace()`.

In [298]:
dob_years_media=df['dob_years'].mean()
dob_years_mediana=df['dob_years'].median()

dob_years_media , dob_years_mediana

(43.29337979094077, 42.0)

In [299]:

df['dob_years']=df['dob_years'].replace(0,int(dob_years_media))

In [300]:

df[df['dob_years']==0].value_counts().count()

0

In [301]:

df['dob_years'].value_counts()


35    617
43    614
40    609
41    607
34    603
38    598
42    597
33    581
39    573
31    560
36    555
44    547
29    545
30    540
48    538
37    537
50    514
32    510
49    508
28    503
45    497
27    493
56    487
52    484
47    480
54    479
46    475
58    461
57    460
53    459
51    448
59    444
55    443
26    408
60    377
25    357
61    355
62    352
63    269
64    265
24    264
23    254
65    194
22    183
66    183
67    167
21    111
68     99
69     85
70     65
71     58
20     51
72     33
19     14
73      8
74      6
75      1
Name: dob_years, dtype: int64

In [302]:
df['dob_years'].describe()

count    21525.000000
mean        43.495145
std         12.218213
min         19.000000
25%         34.000000
50%         43.000000
75%         53.000000
max         75.000000
Name: dob_years, dtype: float64

The `family_status` column shows no problematic data: it has the `object` dtype, no missing values, and no implicit duplicates among its categories.

In [303]:

df['family_status'].value_counts()

married              12380
civil partnership     4177
unmarried             2813
divorced              1195
widow / widower        960
Name: family_status, dtype: int64

In [306]:

df['gender'].value_counts()

F      14236
M       7288
XNA        1
Name: gender, dtype: int64

In [307]:
df[df['gender']=='XNA']

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
10701,0,2358.600502,24,Some College,2,civil partnership,1,XNA,business,0,32624.825,buy real estate


In [308]:
datos_problematicos_gender=df[df['gender']=='XNA'].value_counts().count()
porcentaje_problematicos_gender=datos_problematicos_gender*100/datos_completos
porcentaje_problematicos_gender

0.004645760743321719

In [309]:

# Replace the single 'XNA' value with 'F' (the most frequent value in the column)
df['gender']=df['gender'].replace('XNA','F')

In [310]:

df['gender'].value_counts()

F    14237
M     7288
Name: gender, dtype: int64

In [311]:

df['income_type'].value_counts()

employee                       11119
business                        5085
retiree                         3856
civil servant                   1459
entrepreneur                       2
unemployed                         2
student                            1
paternity / maternity leave        1
Name: income_type, dtype: int64

In [314]:
# Check duplicates
df.duplicated().sum()

71

In [315]:
# Remove duplicates
df=df.drop_duplicates()
df

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,8437.673028,42,Bachelor's Degree,0,married,0,F,employee,0,40620.102,purchase of the house
1,1,4024.803754,36,Secondary Education,1,married,0,F,employee,0,17932.802,car purchase
2,0,5623.422610,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house
3,3,4124.747207,32,Secondary Education,1,married,0,M,employee,0,42820.568,supplementary education
4,0,340266.072047,53,Secondary Education,1,civil partnership,1,F,retiree,0,25378.572,to have a wedding
...,...,...,...,...,...,...,...,...,...,...,...,...
21520,1,4529.316663,43,Secondary Education,1,civil partnership,1,F,business,0,35966.698,housing transactions
21521,0,343937.404131,67,Secondary Education,1,married,0,F,retiree,0,24959.969,purchase of a car
21522,1,2113.346888,38,Secondary Education,1,civil partnership,1,M,employee,1,14347.610,property
21523,3,3112.481705,38,Secondary Education,1,married,0,M,employee,1,39054.888,buying my own car


In [316]:

df.duplicated().sum()

0

In [317]:
# Check the size of the dataset you have now, after you have executed these first manipulations

df.shape

(21454, 12)

Removing the 71 duplicate rows reduced the dataset by 0.32985%, a negligible amount: 99.67015% of the data remains.

## Working with missing values

In [318]:
# Dictionaries mapping each category to its identifier
diccionario_education={
    "Bachelor's Degree":0,
    "Secondary Education":1,
    "Some College":2,
    "Primary Education":3,
    "Graduate Degree":4
}

diccionario_family_status={
    "married":0,
    "civil partnership":1,
    "widow / widower":2,
    "divorced":3,
    "unmarried":4
}

diccionario_education , diccionario_family_status

({"Bachelor's Degree": 0,
  'Secondary Education': 1,
  'Some College': 2,
  'Primary Education': 3,
  'Graduate Degree': 4},
 {'married': 0,
  'civil partnership': 1,
  'widow / widower': 2,
  'divorced': 3,
  'unmarried': 4})
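Dictionaries like these can also be derived from the data itself, which avoids typing labels and identifiers by hand. The sketch below builds the mapping from the unique label and identifier pairs of a made-up frame (`toy` and the name `education_map` are assumptions for illustration).

```python
import pandas as pd

# Toy data (hypothetical rows) pairing each education label with its identifier
toy = pd.DataFrame({
    "education": ["Bachelor's Degree", "Secondary Education", "Secondary Education",
                  "Some College", "Primary Education", "Graduate Degree"],
    "education_id": [0, 1, 1, 2, 3, 4],
})

# Keep one row per (label, identifier) pair
pairs = toy[["education", "education_id"]].drop_duplicates()

# Build the label -> identifier dictionary from those pairs
education_map = dict(zip(pairs["education"], pairs["education_id"]))

print(education_map)
```

If a label ever mapped to more than one identifier, `pairs` would contain more rows than there are distinct labels, which makes inconsistencies easy to spot.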

### Restore missing values in `total_income`.

In [319]:
# Function that assigns the age category ('joven' = young, 'adulto' = adult, 'vejez' = senior)
dob_years=df['dob_years']
def age_group_category (dob_years):
    if dob_years <= 18:
        return 'children'
    elif dob_years <= 26:
        return 'joven'
    elif dob_years <= 59:
        return 'adulto'
    else:
        return 'vejez'

In [320]:
# Test the function
print(age_group_category(17))
print(age_group_category(19))
print(age_group_category(32))
print(age_group_category(45))
print(age_group_category(65))

children
joven
adulto
adulto
vejez
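The same binning can be sketched without a row-wise function by using `pd.cut`, which assigns each value to a labeled interval. The bin edges below mirror the thresholds of `age_group_category`; the sample ages are made up for illustration.

```python
import pandas as pd

# Sample ages (hypothetical), including the boundary values 26, 59 and 60
ages = pd.Series([17, 19, 26, 27, 45, 59, 60, 65])

# Intervals are closed on the right: (-inf, 18], (18, 26], (26, 59], (59, inf)
age_group = pd.cut(
    ages,
    bins=[-float("inf"), 18, 26, 59, float("inf")],
    labels=["children", "joven", "adulto", "vejez"],
)

print(age_group.tolist())
```

The result is a categorical column, which also keeps the groups in a meaningful order when they are sorted or grouped.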


In [321]:
# Create a new column based on the function
df.loc[:,'age_group']=df['dob_years'].apply(age_group_category)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)
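The warning above is raised because pandas cannot tell whether the frame being modified is an independent table or a slice of another one. A common way to avoid it, sketched here on made-up data (`toy`, `deduped`, and the `is_senior` column are assumptions for illustration), is to take an explicit copy after filtering or de-duplicating and then add columns to that copy.

```python
import pandas as pd

# Toy data (hypothetical values) with one duplicated row
toy = pd.DataFrame({"dob_years": [42, 42, 25, 67]})

# An explicit copy makes the de-duplicated frame independent of the original
deduped = toy.drop_duplicates().copy()

# Adding a column to the copy is now an unambiguous assignment
deduped["is_senior"] = deduped["dob_years"] > 59

print(deduped)
```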


In [322]:
# Check the values in the new column
df.head()


Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose,age_group
0,1,8437.673028,42,Bachelor's Degree,0,married,0,F,employee,0,40620.102,purchase of the house,adulto
1,1,4024.803754,36,Secondary Education,1,married,0,F,employee,0,17932.802,car purchase,adulto
2,0,5623.42261,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house,adulto
3,3,4124.747207,32,Secondary Education,1,married,0,M,employee,0,42820.568,supplementary education,adulto
4,0,340266.072047,53,Secondary Education,1,civil partnership,1,F,retiree,0,25378.572,to have a wedding,adulto


In [323]:
# Create a table with no missing values and display some of its rows to make sure it looks good.
tabla_sin_ausentes=df.dropna(subset=['total_income'])
tabla_sin_ausentes.sample(5)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose,age_group
17297,0,1584.490205,26,Some College,2,unmarried,4,M,business,0,29534.163,housing,joven
8695,0,395253.789925,54,Secondary Education,1,civil partnership,1,M,retiree,1,14375.058,building a property,adulto
4317,0,357.207058,48,Secondary Education,1,widow / widower,2,F,business,0,10195.465,purchase of the house for my family,adulto
12617,0,358166.492161,63,Secondary Education,1,widow / widower,2,F,retiree,0,39858.448,purchase of my own house,vejez
9141,0,356975.96045,66,Secondary Education,1,married,0,F,retiree,0,42790.466,to become educated,vejez


In [324]:
# Examine the average values of income in terms of the factors you identified.
df.groupby('age_group')['total_income'].mean()

age_group
adulto    27605.320834
joven     23912.098804
vejez     23021.639994
Name: total_income, dtype: float64

In [325]:
# Examine the median income values based on the factors you identified.
df.groupby('age_group')['total_income'].median()

age_group
adulto    23915.5280
joven     21864.1315
vejez     19761.4250
Name: total_income, dtype: float64

In [326]:
df.groupby('age_group')['total_income'].describe()

age_group,count,mean,std,min,25%,50%,75%,max
adulto,15610.0,27605.320834,16974.35961,3306.762,16971.72025,23915.528,33737.364,362496.645
joven,1486.0,23912.098804,11541.770501,5217.034,16042.288,21864.1315,28921.271,105400.683
vejez,2255.0,23021.639994,14930.221646,3471.216,13725.6055,19761.425,27953.5005,274402.943


In [327]:
joven_mediana=df[df['age_group']=='joven']['total_income'].median()
adulto_mediana=df[df['age_group']=='adulto']['total_income'].median()
vejez_mediana=df[df['age_group']=='vejez']['total_income'].median()

joven_mediana , adulto_mediana , vejez_mediana

(21864.1315, 23915.528, 19761.425)

In [328]:
# Write a function that we will use to fill in the missing values.
def eliminar_ausentes(row):
    if pd.isna(row['total_income']):
        if row['age_group']=='joven':
            return joven_mediana
        if row['age_group']=='adulto':
            return adulto_mediana
        if row['age_group']=='vejez':
            return vejez_mediana
        
    return row['total_income']

In [329]:
# Check if it works
eliminar_ausentes(df.iloc[0])

40620.102

In [330]:
df['total_income'].isna().sum()

2103

In [331]:
# Apply to each row
df.loc[:,'total_income']=df.apply(eliminar_ausentes, axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)


In [332]:
# Check if we have any errors
df['total_income'].isna().sum()

0
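The row-by-row fill above can also be written without `apply`. The following is a minimal sketch on made-up toy data (the values are hypothetical and not taken from the bank dataset), assuming the same idea of replacing each missing income with the median of its age group. It relies on `groupby(...).transform('median')`, which returns one median per row in the original row order.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the bank dataset)
toy = pd.DataFrame({
    'age_group': ['joven', 'joven', 'joven', 'adulto', 'adulto', 'adulto'],
    'total_income': [10.0, 30.0, np.nan, 100.0, np.nan, 300.0],
})

# Median of each age group, broadcast back to the original row order
group_median = toy.groupby('age_group')['total_income'].transform('median')

# Replace only the missing values; existing incomes are left untouched
toy['total_income'] = toy['total_income'].fillna(group_median)

print(toy)
```

Because the medians are computed once per group rather than looked up row by row, this form tends to be shorter and faster on large tables.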

In [334]:
# Check the number of entries in the columns
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 21454 entries, 0 to 21524
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   children          21454 non-null  int64  
 1   days_employed     19351 non-null  float64
 2   dob_years         21454 non-null  int64  
 3   education         21454 non-null  object 
 4   education_id      21454 non-null  int64  
 5   family_status     21454 non-null  object 
 6   family_status_id  21454 non-null  int64  
 7   gender            21454 non-null  object 
 8   income_type       21454 non-null  object 
 9   debt              21454 non-null  int64  
 10  total_income      21454 non-null  float64
 11  purpose           21454 non-null  object 
 12  age_group         21454 non-null  object 
dtypes: float64(2), int64(5), object(6)
memory usage: 2.3+ MB


### Restore values in `days_employed`.

In [335]:
df.groupby('age_group')['days_employed'].describe()

age_group,count,mean,std,min,25%,50%,75%,max
adulto,15610.0,41393.21659,112467.65014,24.141633,919.097008,2067.789981,4308.135603,401755.400475
joven,1486.0,1723.933153,16293.833068,51.496885,410.165239,859.95969,1374.244885,389397.167577
vejez,2255.0,286544.143436,150329.438811,100.309421,332434.038898,355229.618218,377820.900541,401715.811749


In [336]:
df.groupby('income_type')['days_employed'].describe()

income_type,count,mean,std,min,25%,50%,75%,max
business,4577.0,2111.524398,2048.448594,30.195337,685.687432,1547.382223,2876.64852,17615.563266
civil servant,1312.0,3399.896902,2788.371363,39.95417,1257.171811,2689.368353,4759.39926,15193.032201
employee,10014.0,2326.499216,2307.924129,24.141633,746.027361,1574.202821,3108.123025,18388.949901
entrepreneur,1.0,520.848083,,520.848083,520.848083,520.848083,520.848083,520.848083
paternity / maternity leave,1.0,3296.759962,,3296.759962,3296.759962,3296.759962,3296.759962,3296.759962
retiree,3443.0,365003.491245,21069.606065,328728.720605,346649.346146,365213.306266,383231.396871,401755.400475
student,1.0,578.751554,,578.751554,578.751554,578.751554,578.751554,578.751554
unemployed,2.0,366413.652744,40855.478519,337524.466835,351969.05979,366413.652744,380858.245699,395302.838654


In [337]:
# Distribution of the means of `days_employed` by income type
df.groupby('income_type')['days_employed'].mean()

income_type
business                         2111.524398
civil servant                    3399.896902
employee                         2326.499216
entrepreneur                      520.848083
paternity / maternity leave      3296.759962
retiree                        365003.491245
student                           578.751554
unemployed                     366413.652744
Name: days_employed, dtype: float64

In [338]:
# Distribution of the medians of `days_employed` by income type
df.groupby('income_type')['days_employed'].median()

income_type
business                         1547.382223
civil servant                    2689.368353
employee                         1574.202821
entrepreneur                      520.848083
paternity / maternity leave      3296.759962
retiree                        365213.306266
student                           578.751554
unemployed                     366413.652744
Name: days_employed, dtype: float64

In [339]:
# Function that calculates means or medians (depending on your decision) according to the identified parameter.

def calculo_med(row):
    if pd.isna(row['days_employed']):
        # Median of `days_employed` among the rows with the same income type
        mismo_tipo = df['income_type'] == row['income_type']
        mediana_days_employed = df.loc[mismo_tipo, 'days_employed'].median()
        return mediana_days_employed

    return row['days_employed']
        

In [340]:
# Check that the function works
calculo_med(df.iloc[12])

365213.3062657312

In [341]:

df.loc[:,'days_employed']=df.apply(calculo_med, axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)


In [342]:
# Check if the function worked

df['days_employed'].isna().sum()

0
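As a possible alternative to recomputing a median for every row, the group medians can be computed once and then looked up. This is only a sketch on hypothetical toy data; the names `median_by_type` and `fallback` are illustrative and do not appear in the notebook.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the bank dataset)
toy = pd.DataFrame({
    'income_type': ['employee', 'employee', 'employee', 'retiree', 'retiree'],
    'days_employed': [1000.0, 3000.0, np.nan, 365000.0, np.nan],
})

# Compute each income type's median once instead of once per row
median_by_type = toy.groupby('income_type')['days_employed'].median()

# Look up the median for every row, then use it only where the value is missing
fallback = toy['income_type'].map(median_by_type)
toy['days_employed'] = toy['days_employed'].fillna(fallback)

print(toy)
```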

In [344]:
# Check entries in all columns: make sure we have corrected all missing values.
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21454 entries, 0 to 21524
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   children          21454 non-null  int64  
 1   days_employed     21454 non-null  float64
 2   dob_years         21454 non-null  int64  
 3   education         21454 non-null  object 
 4   education_id      21454 non-null  int64  
 5   family_status     21454 non-null  object 
 6   family_status_id  21454 non-null  int64  
 7   gender            21454 non-null  object 
 8   income_type       21454 non-null  object 
 9   debt              21454 non-null  int64  
 10  total_income      21454 non-null  float64
 11  purpose           21454 non-null  object 
 12  age_group         21454 non-null  object 
dtypes: float64(2), int64(5), object(6)
memory usage: 2.3+ MB


[Back to Contents](#back)

## Step 3. Data Classification <a id='data_classification'></a>




In [345]:
# Data values selected for classification
df['purpose'].value_counts() , df['education'].value_counts() , df['family_status'].value_counts() , df['gender'].value_counts() , df['income_type'].value_counts(), df['age_group'].value_counts()


(wedding ceremony                            791
 having a wedding                            768
 to have a wedding                           765
 real estate transactions                    675
 buy commercial real estate                  661
 housing transactions                        652
 buying property for renting out             651
 transactions with commercial real estate    650
 housing                                     646
 purchase of the house                       646
 purchase of the house for my family         638
 construction of own property                635
 property                                    633
 transactions with my real estate            627
 building a real estate                      624
 buy real estate                             621
 purchase of my own house                    620
 building a property                         619
 housing renovation                          607
 buy residential real estate                 606
 buying my own car  

In [346]:
df.groupby('purpose')['days_employed'].mean() , df.groupby('income_type')['days_employed'].mean() , df.groupby('income_type')['days_employed'].median()

(purpose
 building a property                         63903.290328
 building a real estate                      70306.489088
 buy commercial real estate                  64960.376898
 buy real estate                             65521.989990
 buy residential real estate                 58271.463603
 buying a second-hand car                    70567.993128
 buying my own car                           60291.977021
 buying property for renting out             66956.584898
 car                                         75905.470667
 car purchase                                72366.177023
 cars                                        73977.346652
 construction of own property                68832.040237
 education                                   69543.819254
 getting an education                        64834.492622
 getting higher education                    63803.305039
 going to university                         64663.670502
 having a wedding                            68840.652850
 hous

[Let's check the unique values]

In [347]:
# Unique values
len(df['purpose'].unique()) , len(df['education'].unique()) , len(df['family_status'].unique()) , len(df['gender'].unique()) , len(df['income_type'].unique()) , len(df['age_group'].unique()) 

(38, 5, 5, 2, 8, 3)

In [348]:
# Function to classify data according to common themes
def clasificacion_purpose(row):
    if 'wedding' in row['purpose']:
        return 'wedding'
    
    if 'real estate' in row['purpose']:
        return 'buy real estate'
    if 'buying property' in row['purpose']:
        return 'buy real estate'
    if 'housing' in row['purpose']:
        return 'buy real estate'
    if 'purchase of the house' in row['purpose']:
        return 'buy real estate'
    if 'purchase of my own house' in row['purpose']:
        return 'buy real estate'
    if 'property' in row['purpose']:
        return 'buy real estate'
    
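    # Note: in this dataset these two branches never fire, because the purposes
    # that contain 'building' or 'housing renovation' already match
    # 'real estate', 'property' or 'housing' above.
    if 'building' in row['purpose']:
        return 'construction real estate'
    if 'housing renovation' in row['purpose']:
        return 'construction real estate'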
    
    if 'university' in row['purpose']:
        return 'education'
    if 'education' in row['purpose']:
        return 'education'
    if 'educated' in row['purpose']:
        return 'education'
    
    if 'car' in row['purpose']:
        return 'car'
    
    return row['purpose']

clasificacion_purpose(df.iloc[4138])

'education'

In [349]:
# Create a column with the categories and count the values in them.
df.loc[:,'clasificacion_for_purpose']=df.apply(clasificacion_purpose, axis=1)
df['clasificacion_for_purpose'].value_counts()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value


buy real estate    10811
car                 4306
education           4013
wedding             2324
Name: clasificacion_for_purpose, dtype: int64
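With keyword matching of this kind, the order of the checks decides which category wins. The sketch below illustrates that with an ordered rule list in which the more specific keywords come first. `RULES` and `classify_purpose` are hypothetical names, the keyword list is an assumption, and applying it to the real data would give different counts from the ones shown above (for example, a separate construction category).

```python
# Ordered (keyword, category) rules: the first matching keyword wins,
# so more specific keywords are listed before more general ones.
RULES = [
    ('wedding', 'wedding'),
    ('building', 'construction real estate'),
    ('construction', 'construction real estate'),
    ('renovation', 'construction real estate'),
    ('real estate', 'buy real estate'),
    ('property', 'buy real estate'),
    ('hous', 'buy real estate'),   # matches both "house" and "housing"
    ('educat', 'education'),       # matches "education" and "educated"
    ('university', 'education'),
    ('car', 'car'),
]


def classify_purpose(purpose):
    """Return the category of the first rule whose keyword occurs in `purpose`."""
    for keyword, category in RULES:
        if keyword in purpose:
            return category
    # No rule matched: keep the original text so it can be inspected later
    return purpose


for text in ['building a real estate', 'housing renovation',
             'purchase of the house', 'going to university']:
    print(text, '->', classify_purpose(text))
```

Keeping the rules in a list makes the priority explicit and lets new keywords be added without touching the function body.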

In [350]:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21454 entries, 0 to 21524
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   children                   21454 non-null  int64  
 1   days_employed              21454 non-null  float64
 2   dob_years                  21454 non-null  int64  
 3   education                  21454 non-null  object 
 4   education_id               21454 non-null  int64  
 5   family_status              21454 non-null  object 
 6   family_status_id           21454 non-null  int64  
 7   gender                     21454 non-null  object 
 8   income_type                21454 non-null  object 
 9   debt                       21454 non-null  int64  
 10  total_income               21454 non-null  float64
 11  purpose                    21454 non-null  object 
 12  age_group                  21454 non-null  object 
 13  clasificacion_for_purpose  21454 non-null  obj

[If you decide to classify the numerical data, you will also need to create categories for it.]

In [351]:

df['total_income'].value_counts()

23915.5280    1704
19761.4250     246
21864.1315     154
42413.0960       2
17312.7170       2
              ... 
6264.5320        1
27097.0850       1
45484.1090       1
27715.4580       1
41428.9160       1
Name: total_income, Length: 19350, dtype: int64

In [352]:

df['total_income'].describe()

count     21454.000000
mean      26443.876215
std       15687.781359
min        3306.762000
25%       17219.817250
50%       23915.528000
75%       31330.237250
max      362496.645000
Name: total_income, dtype: float64

In [409]:
# Function for sorting into different numerical groups based on ranges
def nivel_de_ingresos(row):
    if row['total_income']<17219.817250:
        return 'bajo nivel de ingresos'
    if row['total_income']<31330.237250:
        return 'mediano nivel de ingreso'
    else:
        return 'alto nivel de ingreso'

nivel_de_ingresos(df.iloc[26])

'mediano nivel de ingreso'

In [410]:

df.loc[:,'nivel_ingreso']=df.apply(nivel_de_ingresos, axis=1)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)


In [411]:
# Count the values of each category to see the distribution.
df['nivel_ingreso'].value_counts()

mediano nivel de ingreso    10726
alto nivel de ingreso        5364
bajo nivel de ingresos       5364
Name: nivel_ingreso, dtype: int64
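The same quartile-based split can be sketched with `pd.cut`, which avoids hard-coding the thresholds. The example uses made-up incomes rather than the bank data, and it assumes the same rule as above: below the 25th percentile is low, from the 25th up to (but not including) the 75th is medium, and the rest is high.

```python
import numpy as np
import pandas as pd

# Toy incomes (hypothetical values, not taken from the bank dataset)
income = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
                   name='total_income')

# Thresholds taken from the data instead of being typed in by hand
q25, q75 = income.quantile([0.25, 0.75])

labels = ['bajo nivel de ingresos', 'mediano nivel de ingreso', 'alto nivel de ingreso']

# right=False makes each bin include its lower edge and exclude its upper edge,
# which mirrors the strict "<" comparisons in the row-wise function
level = pd.cut(income, bins=[-np.inf, q25, q75, np.inf], labels=labels, right=False)

print(level.value_counts())
```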

In [412]:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21454 entries, 0 to 21524
Data columns (total 16 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   children                   21454 non-null  int64  
 1   days_employed              21454 non-null  float64
 2   dob_years                  21454 non-null  int64  
 3   education                  21454 non-null  object 
 4   education_id               21454 non-null  int64  
 5   family_status              21454 non-null  object 
 6   family_status_id           21454 non-null  int64  
 7   gender                     21454 non-null  object 
 8   income_type                21454 non-null  object 
 9   debt                       21454 non-null  int64  
 10  total_income               21454 non-null  float64
 11  purpose                    21454 non-null  object 
 12  age_group                  21454 non-null  object 
 13  clasificacion_for_purpose  21454 non-null  obj

[Back to Contents](#back)

## Step 4. Hypotheses testing <a id='hypotheses'></a>


**Is there a correlation between having children and paying on time?**

In [402]:
# Checks data on children and on-time payments
df.groupby('children')['debt'].value_counts()

# Calculate default rate based on number of children
hijo_incumplimiento=df.groupby('children')['debt'].value_counts()
hijos_dist=df.groupby('children')['debt'].count()

conversion_hijos=hijo_incumplimiento*100/hijos_dist

conversion_hijos

children  debt
0         0        92.456178
          1         7.543822
1         0        90.834192
          1         9.165808
2         0        90.507519
          1         9.492481
3         0        91.818182
          1         8.181818
4         0        90.243902
          1         9.756098
5         0       100.000000
Name: debt, dtype: float64

**Intermediate conclusion**

The default rate by number of children stays below 10% in every group, so the number of children does not appear to have a strong influence on timely repayment. Clients with 5 children show 100% compliance, followed by clients without children at 92.5%; the lowest compliance, 90.2%, is among clients with 4 children.
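Because `debt` is coded as 0 or 1, its mean within a group is the default rate of that group. The sketch below, on hypothetical toy data, reports the rate together with the group size, which helps when judging extreme values such as a 0% or 100% rate. The column names `clients`, `defaults` and `default_rate` are illustrative.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the bank dataset)
toy = pd.DataFrame({
    'children': [0, 0, 0, 0, 1, 1, 5],
    'debt':     [0, 0, 0, 1, 0, 1, 0],
})

# Group size, number of defaults, and share of defaults per number of children
summary = toy.groupby('children')['debt'].agg(
    clients='count', defaults='sum', default_rate='mean'
)
summary['default_rate'] = summary['default_rate'] * 100  # as a percentage

print(summary)
```

In this toy table the group with 5 children has a 0% default rate, but it contains a single client, so the rate says little on its own.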

[Back to Contents](#back)


**Is there a correlation between family situation and on-time payment?**

In [404]:
# Checks family status data and payments on time

df.groupby('family_status_id')['debt'].value_counts()

# Calculate the default rate based on family status

familiar_incumplimiento=df.groupby('family_status_id')['debt'].value_counts()
familiar_dist=df.groupby('family_status_id')['debt'].count()

conversion_familiar=familiar_incumplimiento*100/familiar_dist

conversion_familiar


family_status_id  debt
0                 0       92.454818
                  1        7.545182
1                 0       90.652855
                  1        9.347145
2                 0       93.430657
                  1        6.569343
3                 0       92.887029
                  1        7.112971
4                 0       90.249110
                  1        9.750890
Name: debt, dtype: float64

**Intermediate conclusion**

The default rate by family status also stays below 10% in every group: on-time payment ranges from 90.2% (`family_status_id` 4) to 93.4% (`family_status_id` 2). With differences this small, family status cannot be identified as a variable that influences loan default.
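A table of row percentages like the one above can also be produced in one step with `pd.crosstab`. This is a minimal sketch on hypothetical toy data; using the text column `family_status` instead of `family_status_id` is an assumption made here for readability.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the bank dataset)
toy = pd.DataFrame({
    'family_status': ['married', 'married', 'married', 'married',
                      'unmarried', 'unmarried'],
    'debt':          [0, 0, 0, 1, 0, 1],
})

# normalize='index' turns each row into shares that sum to 1; scale to percent
shares = pd.crosstab(toy['family_status'], toy['debt'], normalize='index') * 100

print(shares)
```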

[Back to Contents](#back)




**Is there a correlation between the level of income and on-time payment?**

In [413]:
# Checks income level data and payments in time
df.groupby('nivel_ingreso')['debt'].value_counts()

# Calculate the default rate based on income level
nivel_ingreso_incumplimiento=df.groupby('nivel_ingreso')['debt'].value_counts()
nivel_ingreso_dist=df.groupby('nivel_ingreso')['debt'].count()

conversion_nivel_ingreso=nivel_ingreso_incumplimiento*100/nivel_ingreso_dist

conversion_nivel_ingreso


nivel_ingreso             debt
alto nivel de ingreso     0       92.859806
                          1        7.140194
bajo nivel de ingresos    0       92.039523
                          1        7.960477
mediano nivel de ingreso  0       91.320157
                          1        8.679843
Name: debt, dtype: float64

**Intermediate conclusion**

The default rate by income level shows the same pattern: on-time payment is above 90% in all three groups, from 91.3% for the medium income level to 92.9% for the high income level. Income level therefore cannot be identified as a variable that influences loan default.

[Back to Contents](#back)



**How does the purpose of the credit affect the default rate?**

In [414]:
# Review the default rate percentages computed above (children, family status, income level).
conversion_hijos , conversion_familiar , conversion_nivel_ingreso


(children  debt
 0         0        92.456178
           1         7.543822
 1         0        90.834192
           1         9.165808
 2         0        90.507519
           1         9.492481
 3         0        91.818182
           1         8.181818
 4         0        90.243902
           1         9.756098
 5         0       100.000000
 Name: debt, dtype: float64,
 family_status_id  debt
 0                 0       92.454818
                   1        7.545182
 1                 0       90.652855
                   1        9.347145
 2                 0       93.430657
                   1        6.569343
 3                 0       92.887029
                   1        7.112971
 4                 0       90.249110
                   1        9.750890
 Name: debt, dtype: float64,
 nivel_ingreso             debt
 alto nivel de ingreso     0       92.859806
                           1        7.140194
 bajo nivel de ingresos    0       92.039523
                           1       

**Intermediate conclusion**

Neither the number of children, family status, nor income level emerges as a risk factor for on-time repayment: in each of these three classifications, at least 90% of clients paid on time. The cell above restates these three results and does not break the default rate down by loan purpose, so it remains necessary to review whether any other factor affects payment compliance.
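To look at loan purpose directly, the default rate could be grouped by the purpose category created in Step 3. The following is only a sketch: the helper `default_rate_by` is hypothetical, the toy values are made up, and no result for the real data is implied.

```python
import pandas as pd


def default_rate_by(frame, column):
    """Percentage of rows with debt == 1 for each value of `column`, highest first."""
    return frame.groupby(column)['debt'].mean().mul(100).sort_values(ascending=False)


# Toy data (hypothetical values, not taken from the bank dataset)
toy = pd.DataFrame({
    'clasificacion_for_purpose': ['car', 'car', 'wedding', 'wedding',
                                  'wedding', 'wedding', 'education', 'education'],
    'debt': [1, 0, 0, 0, 0, 1, 0, 0],
})

rates = default_rate_by(toy, 'clasificacion_for_purpose')
print(rates)
```

The same helper works for any categorical column, so one call per column is enough to compare children, family status, income level and purpose.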

[Back to Contents](#back)

# Conclusions <a id='end'></a>

**Is there a connection between having children and paying a loan on time?**

No connection was found between having children and repaying a loan on time: in every group, from clients with no children to those with 5, the on-time repayment rate is above 90%.

**Is there a connection between marital status and timely loan repayment?**

No connection was found between marital status and timely loan repayment: every group has an on-time repayment rate above 90%, and the rates differ by about 3 percentage points at most.

**Is there a connection between income level and timely repayment of a loan?**

No connection was found between income level and timely loan repayment: every group has an on-time repayment rate above 90%, and the rates differ by about 1.5 percentage points.

**How do different loan purposes affect timely loan repayment?**

The default rate was not computed separately for each loan purpose. Across the 3 classifications that were examined (number of children, family status, and income level), on-time repayment exceeds 90%, so this analysis does not show an effect of loan purpose on repayment.

[Back to Contents](#back)