# Loss Given Default Analysis [TPS August]
![](images/unsplash.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://unsplash.com/@constantinevdokimov?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Konstantin Evdokimov</a>
        on 
        <a href='https://unsplash.com/s/photos/loan?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText'>Unsplash</a>. All images are by the author unless specified otherwise.
    </strong>
</figcaption>

# 1. Problem definition

In this month's TPS competition, we are tasked with predicting how much money a bank or other financial institution might lose if a loan goes into default.

Before we start the EDA, let's make sure we agree on a few key terms from the problem definition:
1. What is loan default?
   - Default is the failure to repay a debt or loan on time. It occurs when a borrower fails to make timely payments on loans such as mortgages, bank loans, or car leases.
2. What is loss given default (LGD)?
   - LGD is the amount of money a bank or financial institution might lose if a loan goes into default. Calculating and predicting LGD can be complex because many factors are involved.
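To make the definition concrete, here is a simplified worked example. It assumes the loss is the outstanding balance at default minus whatever the lender later recovers, and it ignores recovery costs and discounting. The `loss_given_default` helper and the loan figures are hypothetical and are not how the competition's `loss` column was produced.

```python
from typing import Tuple


def loss_given_default(exposure_at_default: float, amount_recovered: float) -> Tuple[float, float]:
    """Hypothetical helper: return (loss amount, LGD as a fraction of the exposure).

    Simplification (assumption): recovery costs and the time value of money are ignored.
    """
    loss = exposure_at_default - amount_recovered
    return loss, loss / exposure_at_default


# Hypothetical loan: 10,000 outstanding at default, 6,000 later recovered
loss, lgd = loss_given_default(10_000, 6_000)
print(loss, lgd)  # 4000 0.4
```

The same loss can be reported as an amount (4,000) or as a share of the exposure (40%). Which form the competition target uses is not stated here.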

As you will see in just a bit, the dataset for the competition has 100 features and the target `loss` is (I think) LGD. For more information on these terms, check out [this](https://www.kaggle.com/c/tabular-playground-series-aug-2021/discussion/256337) discussion thread.

The metric used in this competition is Root Mean Squared Error, a regression metric:
![](images/metric.png)
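For reference, RMSE over $n$ predictions is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the true `loss` and $\hat{y}_i$ is the prediction. Below is a minimal NumPy sketch of that formula. The `rmse` function name is hypothetical and is not part of the competition code.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root Mean Squared Error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy example: the errors are 0, 2 and -2, so the mean squared error is 8/3
print(rmse([15, 3, 6], [15, 1, 8]))  # ≈ 1.633
```

Because the errors are squared before averaging, a few large misses weigh more heavily in the score than many small ones.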

# 2. Setup

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from matplotlib import rcParams

# Global plot configs
rcParams["figure.dpi"] = 200
rcParams["axes.spines.top"] = False
rcParams["axes.spines.right"] = False

# Pandas global settings
pd.set_option("display.max_columns", None)
# Use the full option name: the shorthand "precision" matches more than one option in newer pandas versions
pd.set_option("display.precision", 4)

# Import data, using the id column as the index
train_df = pd.read_csv("data/train.csv", index_col="id")
test_df = pd.read_csv("data/test.csv", index_col="id")
sub = pd.read_csv("data/sample_submission.csv")
```

# 3. Overview of the datasets

Both the training and test sets have 100 features, excluding the ID column. The target, `loss`, is only present in the training set and has a discrete distribution.
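One quick way to check that the target is discrete is to count its distinct values and confirm they are whole numbers. The sketch below uses a hypothetical `target_summary` helper and, so that it runs without the competition files, a toy Series built from the `loss` values of the first five training rows. On the real data you would pass `train_df["loss"]` instead.

```python
import pandas as pd


def target_summary(loss: pd.Series) -> pd.Series:
    """Hypothetical helper: print a short summary and return the count of each target value."""
    # A target made of a small set of whole numbers behaves like a discrete variable
    is_whole = bool((loss == loss.round()).all())
    print(f"distinct values: {loss.nunique()}, all whole numbers: {is_whole}")
    return loss.value_counts().sort_index()


# Toy stand-in: `loss` values of the first five training rows
toy_loss = pd.Series([15, 3, 6, 2, 1], name="loss")
print(target_summary(toy_loss))
```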

Some other observations:
- Training and test data contain **250k and 150k** observations, respectively
- There are **no missing values** in either set
- All features are of either `float64` or `int64` type
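The observations above can be checked with a few pandas calls. Here is a minimal sketch using a hypothetical `overview` helper, demonstrated on a small toy frame (two features instead of 100) so it runs without the data files.

```python
import numpy as np
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: row count, total number of missing cells, and distinct column dtypes."""
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "dtypes": sorted({str(t) for t in df.dtypes}),
    }


# Toy frame shaped like the competition data, with values taken from the first rows of train
toy = pd.DataFrame(
    {
        "f0": np.array([-0.0023, 0.7845, 0.3178]),
        "f1": np.array([59, 145, 19], dtype="int64"),
    },
    index=pd.Index([0, 1, 2], name="id"),
)
print(overview(toy))  # {'n_rows': 3, 'n_missing': 0, 'dtypes': ['float64', 'int64']}
```

Calling `overview(train_df)` and `overview(test_df)` on the loaded files applies the same checks to the full datasets.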

Here are the first few rows of the training and test sets:

```python
train_df.head()
```

```text
id,f0,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72,f73,f74,f75,f76,f77,f78,f79,f80,f81,f82,f83,f84,f85,f86,f87,f88,f89,f90,f91,f92,f93,f94,f95,f96,f97,f98,f99,loss
0,-0.0023,59,0.7667,-1.3505,42.2727,16.6857,30.3599,1.2673,0.392,1.091,1.9687,1.8746,117.286,6.7162,0.9857,0.9734,4004232,0.3779,1.0338,0.5745,0.1224,8.1862,1517.83,3.1322,0.1128,5.0187,116.766,10891,8.1937,5.7972,1.1,14.8684,-0.2754,0.9157,167.8,-7.5341,4.2363,1.6296,1.1444,-0.3031,4.0921,3.2262,0.0748,0.2595,289.492,327.465,5.3893,7.3948,-0.4899,20.2923,2.4566,1.4477,-10639.0,85.6005,-0.1785,2815,-234.772,1.8332,88.5605,0.3679,8575300000.0,70.9733,0.3801,0.0318,1.0953,0.5635,0.1227,1.1607,1.6939,1.0722,65.1543,0.0225,-5.6068,1.7987,0.5281,6696.3,-0.5621,1.301,6.7162,1.1435,2.2998,0.0105,-0.1272,0.2311,4.5161,0.5945,397,0.264,8.6879,15.0701,0.3766,-42.4399,26.854,1.4575,0.6962,0.9418,1.8285,0.9241,2.2966,10.4898,15
1,0.7845,145,-0.4638,-0.5304,27324.9,3.4754,160.498,0.828,3.7359,1.2814,-2.7395,-0.5295,157.67,0.6964,1.4412,0.1591,23567462,-0.0896,-0.7116,-1.0459,0.1399,3.6929,-123.354,7.741,-0.8523,8.5102,161.175,87801,12.0202,1.7839,1.231,10.1497,-0.0187,1.0113,127.401,11.8214,5.9968,-0.9538,1.3794,1.0795,0.7119,2.906,0.0512,0.7551,140.893,29.5252,14.2296,3.5321,-0.4057,42.5357,1.4353,0.9398,138312.0,59.881,-0.071,1435,1046.88,1.5677,29.4306,2.4552,4518200000.0,75.5602,1.988,0.3182,1.149,0.7236,0.1848,-0.3483,-7.1763,1.4626,43.1121,-0.0608,64.0455,2.3584,5.7597,3958.14,1.5766,-1.2418,5.9141,0.9598,2.5663,0.0007,-0.3356,-0.2717,5.1032,8.7062,98,0.2105,7.8642,3.3719,0.148,-184.132,7.9014,1.7064,-0.4947,-2.0583,0.8192,0.4392,2.3647,1.1438,3
2,0.3178,19,-0.4326,-0.3826,1383.26,19.7129,31.1026,-0.5154,34.4308,1.2421,2.9018,-0.9603,118.59,7.6964,1.4888,0.3873,235760,0.0556,0.2686,0.7181,0.0271,11.6734,270.247,3.4405,-0.6791,13.3781,150.362,14173,1.6995,7.0473,1.1051,7.6287,0.6857,0.8095,120.064,194.427,6.7787,0.6341,0.9344,0.927,0.7411,2.4221,0.2604,0.6266,369.579,370.024,4.4889,7.8429,1.196,43.5343,2.5918,1.2567,168881.0,83.8419,2.7132,2911,23256.9,3.919,97.5578,4.3854,844500000.0,99.4933,5.098,0.2589,1.1688,0.0491,0.1735,0.937,2.0595,1.2225,50.2267,-0.0262,71.6849,2.4342,1.9046,27165.8,-0.7732,-1.8334,4.9855,1.1709,1.172,0.0168,-0.2356,-0.7249,3.2256,4.171,105,-0.1555,8.9183,0.1863,0.336,7.4372,37.2181,3.2534,0.3379,0.615,2.2168,0.7453,1.6968,12.3055,6
3,0.2108,17,-0.6165,0.9464,-119.253,4.0823,185.257,1.3833,-47.5214,1.0913,-1.512,-1.2923,125.461,7.3432,-3.0924,0.7138,1146032,0.3265,0.4548,0.22,0.038,5.152,4893.86,6.8975,-0.8306,4.4318,132.855,77147,32.809,4.0639,1.1191,3.9178,0.5004,0.2064,120.411,233.537,7.7513,0.6258,-1.9408,1.3332,-5.7932,1.5651,0.3095,0.1346,84.8637,24.3353,4.8712,2.551,-0.3723,14.896,1.1001,0.8922,17006.6,78.4078,0.4257,1723,375.24,1.9454,98.15,-0.1092,3544000000.0,60.8082,2.3576,0.2391,1.1612,0.5358,0.2222,1.1631,2.3634,1.4153,116.182,0.0187,55.4428,2.2282,4.3036,2643.76,-1.6663,0.7924,6.4516,1.0773,2.9068,0.0237,-0.092,-0.0987,4.2782,5.3475,512,0.856,8.2766,4.0667,0.3365,9.6678,0.6269,1.4943,0.5175,-10.2221,2.6273,0.6173,1.4565,10.0288,2
4,0.4397,20,0.9681,-0.0925,74.302,12.3065,72.186,-0.234,24.3991,1.1015,1.7735,-0.5468,147.186,17.3943,0.9647,0.9649,19272478,0.1211,0.4225,-0.1031,-0.0009,8.1419,162.713,1.5656,-0.3007,7.5646,160.995,5780,-1.5425,8.0908,1.6058,7.0463,-1.0569,1.7274,126.848,0.9108,6.0887,0.1506,1.1346,1.5535,16.4364,2.4887,0.2292,0.3748,465.293,76.8593,0.7631,8.2066,-0.6461,72.7674,1.4448,0.8585,-40791.9,70.7998,0.1692,1199,-120.388,1.6879,84.0649,-0.0812,6379400000.0,103.99,4.3963,0.2485,1.1808,0.5465,0.1407,0.8044,7.5519,-2.5099,68.2459,0.0012,47.2885,0.4619,1.0724,703.401,0.6911,5.0141,6.0639,1.1202,1.7335,-0.0013,-0.3339,-0.0638,1.1142,5.234,109,-0.1583,5.4306,0.9916,0.5285,290.657,15.6043,1.7356,-0.4767,1.3902,2.1957,0.827,1.7849,7.072,1
```


```python
test_df.head()
```

```text
id,f0,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62,f63,f64,f65,f66,f67,f68,f69,f70,f71,f72,f73,f74,f75,f76,f77,f78,f79,f80,f81,f82,f83,f84,f85,f86,f87,f88,f89,f90,f91,f92,f93,f94,f95,f96,f97,f98,f99
250000,0.8127,15,-1.2391,-0.8933,295.577,15.8712,23.0436,0.9423,29.898,1.1139,-1.3122,-1.11,128.357,7.3445,-1.0306,2.304,13018510,0.0196,0.7577,-0.7161,0.0223,0.142,8938.11,6.4437,1.1337,6.5271,148.03,80178,3.4344,5.5352,1.2795,8.2354,-1.3065,1.1107,153.408,50.8583,8.3492,-0.7176,1.9347,0.9745,-1.2989,2.3089,0.0753,0.4911,45.3193,21.0993,11.9258,4.0118,-1.0147,93.4686,2.4723,1.6232,28220.2,79.6277,-0.4354,1651,30.455,1.6514,90.1818,0.5058,2726400000.0,100.62,5.1824,0.2645,1.1903,0.566,0.3546,1.3124,14.3741,0.8395,119.689,-0.0501,62.5088,1.5597,2.7763,-50.0871,0.2473,0.0353,6.3437,1.1766,2.1044,0.0105,-0.394,-0.2433,3.7702,4.6354,52,-0.1446,3.8188,-0.3476,0.4464,-422.332,-1.4463,1.6907,1.0593,-3.0106,1.9466,0.5295,1.3869,8.7877
250001,0.1903,131,-0.5014,0.8019,64.8866,3.097,344.805,0.8072,38.4219,1.097,-5.6365,1.2871,128.25,6.6909,1.8363,0.6161,1618530,0.0673,-0.8082,-0.3201,0.9987,15.1139,1432.86,1.5751,-1.0789,2.8047,158.629,78013,0.8695,15.6028,1.2423,5.1889,-1.3847,0.8939,168.0,58.9003,6.4483,1.1952,1.9432,0.9666,-8.5975,2.6715,0.0047,0.4883,151.61,560.819,6.2055,5.8434,-0.8574,52.0208,1.635,0.618,65940.8,82.5252,0.8063,-17,3257.99,3.5431,64.7339,0.3876,6191200000.0,112.323,1.6689,0.2929,1.2335,0.5586,0.2684,0.9802,9.9798,1.8214,-1.3255,0.0475,20.8764,1.8762,3.7224,268.494,-0.4776,1.6939,6.8224,1.2194,1.202,-0.002,-0.383,0.1241,2.1182,5.184,70,0.2562,18.0312,23.5723,0.3772,10352.2,21.0627,1.8435,0.2519,4.4406,1.9031,0.2485,0.8639,11.7939
250002,0.9197,19,-0.0574,0.9014,11961.2,16.3965,273.24,-0.0033,37.94,1.1522,-3.4033,0.1426,152.386,15.0852,1.5549,1.6547,17563785,-0.1669,-1.2627,1.4112,0.004,8.7802,4043.78,7.0511,-1.0129,10.0445,121.983,88912,-0.487,8.9797,1.1984,8.5821,-0.2936,1.8836,120.411,110.36,5.1818,1.3658,-1.2579,0.9289,5.2845,2.1028,0.0918,0.5205,82.0779,74.4712,6.0981,2.6587,-0.3393,110.073,2.4382,0.0134,34353.4,93.1586,3.0899,825,464.775,1.6293,88.2927,0.0995,3349100000.0,34.8437,5.3338,0.3518,1.3872,0.3924,0.1486,1.0765,5.3435,-2.0625,144.71,-0.17,39.3509,2.3231,3.0126,23.2484,0.5114,1.3254,7.3319,1.1737,1.3936,0.0011,-0.1999,0.0274,4.3142,3.2428,501,-0.1414,4.7836,0.0684,0.9901,3224.02,-2.2529,1.551,-0.5592,17.8386,1.8338,0.9318,2.3369,9.054
250003,0.861,19,-0.5495,0.4718,7501.6,2.807,71.0817,0.7921,0.3952,1.2016,0.8709,1.2713,157.147,5.8488,1.3856,0.6173,252991,0.1085,0.4793,-0.7198,0.3198,25.2003,-170.328,7.5264,0.7399,0.7369,129.194,15850,11.3217,4.5921,1.3878,4.9169,-0.0106,5.1756,118.858,13.9613,2.3807,0.8835,1.2839,1.4273,3.1218,1.7441,0.2698,0.6378,133.78,57.7544,3.8178,9.2302,-1.2981,165.785,1.3772,1.2959,-10886.4,94.3216,-0.2072,160,447.421,1.4996,89.3567,0.8972,2364100000.0,50.3343,2.2809,0.1202,1.1997,0.6663,0.3741,1.2954,6.9092,-0.8518,0.4072,-0.0574,41.2673,3.0599,2.8497,292.196,-1.121,-0.0702,12.1671,1.167,1.7251,0.0006,-0.6167,-0.3758,6.054,4.4342,43,-0.2216,8.3863,0.6028,1.3969,9689.76,14.7715,1.4139,0.3293,0.8024,2.2325,0.8933,1.3595,4.8483
250004,0.3132,89,0.5885,0.1677,2931.26,4.3499,1.5719,1.1183,7.7546,1.1681,0.6714,1.6398,128.023,17.3336,1.9716,0.2521,481327,0.1021,-0.4426,-0.6361,0.0231,13.507,10276.2,3.3603,-1.4257,5.3158,139.739,10964,3.0741,5.4445,1.1415,7.2739,0.2621,1.2417,118.918,778.04,7.3407,0.6541,1.2779,0.9838,-4.4679,-0.7915,0.1249,0.1113,174.786,50.4495,18.3246,7.8915,0.9964,164.209,1.363,1.3113,-6608.04,97.9588,-0.1819,709,28748.4,1.6271,102.277,-0.2482,7542700000.0,121.86,4.1176,0.2034,1.2334,0.4639,0.2539,0.6034,-2.9365,1.2016,77.5471,0.042,58.9586,3.6113,1.8594,22610.8,1.1638,-1.9697,6.8726,1.1514,1.6628,0.0128,-0.2015,-0.0871,3.4478,5.182,94,1.4893,7.4989,1.2121,0.8625,2693.35,44.1805,1.5802,-0.191,26.253,2.6824,0.3619,1.5328,3.7066
```
