<img alt="Colaboratory logo" width="15%" src="https://raw.githubusercontent.com/carlosfab/escola-data-science/master/img/novo_logo_bg_claro.png">

#### **Data Science na Prática 3.0**
*by [sigmoidal.ai](https://sigmoidal.ai)*

---

# Credit Risk Assessment

### Credit Risk

**Credit Risk** can be defined as the probability that a borrower or counterparty will fail to honor a previously agreed obligation, resulting in financial loss to the lending institution when the client *defaults* on that agreement<sup><a href="https://www.risk-officer.com/Credit_Risk.htm">1</a>, </sup><sup><a href="https://www.investopedia.com/terms/c/creditrisk.asp">2</a></sup>. This usually happens because clients are unable to repay their loans.
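To connect the probability of default to the size of the resulting loss, credit risk practice commonly uses the expected loss decomposition below. It is included here only as background, not as something this project computes: PD is the probability of default, LGD the loss given default (the share of the exposure that is not recovered), and EAD the exposure at default.

$$
\text{EL} = \text{PD} \times \text{LGD} \times \text{EAD}
$$

For example, with made-up values PD = 0.05, LGD = 0.6 and EAD = 10,000, the expected loss is $0.05 \times 0.6 \times 10{,}000 = 300$.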

<p align=center>
<img src="img/credit_risk.jpg" width="40%"><br>
<i><sup>Image credits: storyset @ <a href="https://www.freepik.com/author/stories">freepik</a>.</sup></i>
</p>

Although it may be impossible to predict exactly which clients will cause losses for the company, ***Credit Risk Management*** is built around estimating this probability, that is, trying to identify which clients are likely to default on their agreements. These estimates are what allow companies to mitigate losses, for example, by charging higher interest rates to higher-risk clients or even denying them loans<sup><a href="https://www.risk-officer.com/Credit_Risk.htm">1</a>, </sup><sup><a href="https://www.investopedia.com/terms/c/creditrisk.asp">2</a></sup>.

One of the strategies lenders use to evaluate risk is the *5 Cs of Credit*. Although companies measure them in different ways, they offer some insight into the risk of financial loss. The 5 Cs are: **Character**, the client's credit history; **Capital**, the amount of money they have; **Capacity**, their debt-to-income ratio; **Collateral**, assets that can back or secure the loan; and **Conditions**, the purpose, amount, and interest rate of the loan<sup><a href="https://www.investopedia.com/terms/f/five-c-credit.asp">3</a></sup>.
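The Capacity criterion can be made concrete with a small calculation. The sketch below assumes the usual definition of the debt-to-income ratio (total monthly debt payments divided by gross monthly income); the function name `debt_to_income` and the applicant's figures are hypothetical.

```python
def debt_to_income(monthly_debt_payments: float, gross_monthly_income: float) -> float:
    """Return the debt-to-income ratio (DTI) as a fraction (hypothetical helper)."""
    if gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    return monthly_debt_payments / gross_monthly_income


# Hypothetical applicant: 1,500 in monthly debt payments, 5,000 in gross monthly income
print(f"{debt_to_income(1500, 5000):.0%}")  # → 30%
```

A higher ratio means a larger share of income is already committed to debt, which lenders usually read as lower capacity to take on a new loan.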

However, these are only a few of the characteristics that can be observed, and companies usually hold much more information about their clients. Machine Learning methods let us leverage this information to try to predict whether a client will default.

## Goal

In this problem, the goal is to predict the probability that a client of the startup Nubank will fail to meet their financial obligations and stop paying their credit card bill.

<p align=center>
<img src="http://sigmoidal.ai/wp-content/uploads/2019/10/Nubank_logo.png" width="90px"></p>
  
Note that this assessment must be carried out at the moment the client applies for the card (usually their first contact with the institution).
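The goal stated above, estimating a probability of default rather than only a yes/no label, maps naturally onto classifiers that expose class probabilities. The sketch below assumes scikit-learn and uses logistic regression purely as an illustrative baseline, not necessarily the model this project ends up using. Because the Nubank data has not been loaded at this point, `make_classification` generates a synthetic, imbalanced stand-in in which class 1 plays the role of default. One reasonable consequence of the timing constraint above is that a real model should only use features available when the client applies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: roughly 15% of samples in class 1 ("default")
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85], random_state=42)

# Stratify so train and test keep the same class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Column 1 of predict_proba is the estimated probability of class 1 (default)
default_proba = model.predict_proba(X_test)[:, 1]
print(default_proba[:5].round(3))
```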

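Taking default as the positive class, the model should minimize false positives (good clients wrongly flagged as likely defaulters) while still preventing financial losses.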
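Because the model outputs probabilities, the trade-off between false positives and missed defaults is controlled by the decision threshold. A minimal sketch, assuming scikit-learn; the labels and probabilities below are made up purely to illustrate the mechanics, with 1 standing for default.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true labels (1 = default) and predicted default probabilities
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0])
y_proba = np.array([0.10, 0.40, 0.55, 0.20, 0.60, 0.65, 0.45, 0.05, 0.90, 0.35])

results = {}
for threshold in (0.3, 0.5, 0.7):
    # Flag a client as a likely defaulter when the probability reaches the threshold
    y_pred = (y_proba >= threshold).astype(int)
    # With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    results[threshold] = (int(fp), int(fn))
    print(f"threshold={threshold}: false positives={fp}, missed defaults={fn}")
```

On these made-up numbers, raising the threshold from 0.3 to 0.7 drops false positives from 4 to 0 while missed defaults rise from 0 to 2, which is the tension between rejecting good clients and preventing losses.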

## Initial hypotheses 

*

*

*

## About the dataset

The dataset used in this *Data Science* project comes from a competition held by the startup [Nubank](https://nubank.com.br/sobre-nos) to discover talent and potential hires for the fintech.


* ids
* target_default
* score_1
* score_2
* score_3
* score_4
* score_5
* score_6
* risk_rate
* last_amount_borrowed
* last_borrowed_in_months
* credit_limit
* reason
* income
* facebook_profile
* state
* zip
* channel
* job_name
* real_state
* ok_since
* n_bankruptcies
* n_defaulted_loans
* n_accounts
* n_issues
* application_time_applied
* application_time_in_funnel
* email
* external_data_provider_credit_checks_last_2_year
* external_data_provider_credit_checks_last_month
* external_data_provider_credit_checks_last_year
* external_data_provider_email_seen_before
* external_data_provider_first_name
* external_data_provider_fraud_score
* lat_lon
* marketing_channel
* profile_phone_number
* reported_income
* shipping_state
* shipping_zip_code
* profile_tags
* user_agent
* target_fraud

## Importing data

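We start by importing the dependencies and reading the training data, `data/acquisition_train.csv`, into a pandas DataFrame.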

```python
# Dependencies
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the training data into a DataFrame
df = pd.read_csv("data/acquisition_train.csv")
```

```python
# Checking size and first entries
print(df.shape)
df.head(6)
```

```text
(45000, 43)
```

| | ids | target_default | score_1 | score_2 | score_3 | score_4 | score_5 | score_6 | risk_rate | last_amount_borrowed | ... | external_data_provider_fraud_score | lat_lon | marketing_channel | profile_phone_number | reported_income | shipping_state | shipping_zip_code | profile_tags | user_agent | target_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 343b7e7b-2cf8-e508-b8fd-0a0285af30aa | False | 1Rk8w4Ucd5yR3KcqZzLdow== | IOVu8au3ISbo6+zmfnYwMg== | 350.0 | 101.800832 | 0.259555 | 108.427273 | 0.4 | 25033.92 | ... | 645 | (-29.151545708122246, -51.1386461804385) | Invite-email | 514-9840782 | 57849.0 | BR-MT | 17528 | {'tags': ['n19', 'n8']} | Mozilla/5.0 (Linux; Android 6.0.1; SGP771 Buil... | |
| 1 | bc2c7502-bbad-0f8c-39c3-94e881967124 | False | DGCQep2AE5QRkNCshIAlFQ== | SaamrHMo23l/3TwXOWgVzw== | 370.0 | 97.062615 | 0.942655 | 92.002546 | 0.24 | | ... | 243 | (-19.687710705798963, -47.94151536525154) | Radio-commercial | 251-3659293 | 4902.0 | BR-RS | 40933 | {'tags': ['n6', 'n7', 'nim']} | Mozilla/5.0 (Linux; Android 5.0.2; SAMSUNG SM-... | |
| 2 | 669630dd-2e6a-0396-84bf-455e5009c922 | True | DGCQep2AE5QRkNCshIAlFQ== | Fv28Bz0YRTVAT5kl1bAV6g== | 360.0 | 100.027073 | 0.351918 | 112.892453 | 0.29 | 7207.92 | ... | 65 | (-28.748023890412284, -51.867279334353995) | Waiting-list | 230-6097993 | 163679.0 | BR-RR | 50985 | {'tags': ['n0', 'n17', 'nim', 'da']} | Mozilla/5.0 (Linux; Android 6.0.1; SGP771 Buil... | |
| 3 | d235609e-b6cb-0ccc-a329-d4f12e7ebdc1 | False | 1Rk8w4Ucd5yR3KcqZzLdow== | dCm9hFKfdRm7ej3jW+gyxw== | 510.0 | 101.599485 | 0.987673 | 94.902491 | 0.32 | | ... | 815 | (-17.520650158450454, -39.75801139933186) | Waiting-list | 261-3543751 | 1086.0 | BR-RN | 37825 | {'tags': ['n4']} | Mozilla/5.0 (Linux; Android 6.0; HTC One X10 B... | |
| 4 | 9e0eb880-e8f4-3faa-67d8-f5cdd2b3932b | False | 8k8UDR4Yx0qasAjkGrUZLw== | +CxEO4w7jv3QPI/BQbyqAA== | 500.0 | 98.474289 | 0.532539 | 118.126207 | 0.18 | | ... | 320 | (-16.574259446978008, -39.90990074785962) | Invite-email | 102-3660162 | 198618.0 | BR-MT | 52827 | {'tags': ['pro+aty', 'n19', 'da', 'b19']} | Mozilla/5.0 (Linux; Android 7.0; Pixel C Build... | |
| 5 | 538c1908-bd80-b834-c3f0-238b4f536d3f | False | 8k8UDR4Yx0qasAjkGrUZLw== | +CxEO4w7jv3QPI/BQbyqAA== | 300.0 | 101.83704 | 0.915389 | 90.711273 | 0.44 | | ... | 811 | (-6.762413011455668, -35.13224579733013) | Website | 787-1678197 | 160198.0 | BR-SP | 55266 | {'tags': ['c1', 'n3', 'n9']} | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6P Bu... | |


```python
# Listing all column names
for column in df.columns:
    print(column)
```

```text
ids
target_default
score_1
score_2
score_3
score_4
score_5
score_6
risk_rate
last_amount_borrowed
last_borrowed_in_months
credit_limit
reason
income
facebook_profile
state
zip
channel
job_name
real_state
ok_since
n_bankruptcies
n_defaulted_loans
n_accounts
n_issues
application_time_applied
application_time_in_funnel
email
external_data_provider_credit_checks_last_2_year
external_data_provider_credit_checks_last_month
external_data_provider_credit_checks_last_year
external_data_provider_email_seen_before
external_data_provider_first_name
external_data_provider_fraud_score
lat_lon
marketing_channel
profile_phone_number
reported_income
shipping_state
shipping_zip_code
profile_tags
user_agent
target_fraud
```
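A natural first check after listing the columns is how much data is missing and how balanced `target_default` is. The sketch below is an assumption about next steps rather than part of the original notebook: so that it runs on its own, it rebuilds three columns from the six rows shown in the `df.head(6)` output (reading the empty cells as missing values). On the full dataset, the same two calls would apply directly to `df`.

```python
import numpy as np
import pandas as pd

# Three columns copied from the six rows displayed by df.head(6); NaN marks missing values
sample = pd.DataFrame({
    "target_default": [False, False, True, False, False, False],
    "risk_rate": [0.4, 0.24, 0.29, 0.32, 0.18, 0.44],
    "last_amount_borrowed": [25033.92, np.nan, 7207.92, np.nan, np.nan, np.nan],
})

# Share of missing values per column, highest first
missing_share = sample.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Proportion of each class in the target
class_balance = sample["target_default"].value_counts(normalize=True)
print(class_balance)
```

Six rows say nothing reliable about the full dataset; the point is only the mechanics. On all 45,000 rows, a strongly imbalanced target would make plain accuracy a misleading metric for the false-positive goal stated earlier.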


## References
1. https://www.risk-officer.com/Credit_Risk.htm
2. https://www.investopedia.com/terms/c/creditrisk.asp
3. https://www.investopedia.com/terms/f/five-c-credit.asp