# Project 01 - Granting credit cards

## Stage 1 CRISP-DM: Understanding the business

As the first stage of CRISP-DM, let's understand what the business is about and what the objectives are. 

This is a credit card approval problem published on [Kaggle](https://www.kaggle.com/), a platform that hosts data science challenges and offers cash prizes to the best finishers. The original link is [here](https://www.kaggle.com/rikdifos/credit-card-approval-prediction).  
  
Our goal is to build a predictive model that identifies the risk of default (typically defined as arrears of 90 days or more within a 12-month horizon) using variables that can be observed on the date of the credit assessment (typically when the customer applies for the card).
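To make the default definition concrete, here is a minimal sketch of how such a flag could be derived from a monthly payment history. The table layout and the names `customer_id`, `month_on_book` and `days_past_due` are assumptions made for illustration; the project data already comes with the `mau` flag computed.

```python
import pandas as pd

# Hypothetical payment history: one row per customer per month on book.
# These column names are assumed for illustration, not taken from the project data.
history = pd.DataFrame({
    "customer_id":   [1, 1, 1, 2, 2, 2, 3, 3],
    "month_on_book": [1, 2, 3, 1, 2, 14, 1, 2],
    "days_past_due": [0, 30, 95, 0, 0, 120, 0, 60],
})

HORIZON_MONTHS = 12  # performance window
ARREARS_DAYS = 90    # arrears threshold that counts as default

# Keep only the months inside the performance window.
in_window = history[history["month_on_book"] <= HORIZON_MONTHS]

# A customer is flagged if any month in the window reaches the threshold.
mau = (
    in_window.groupby("customer_id")["days_past_due"].max() >= ARREARS_DAYS
).rename("mau")

print(mau)
```

In this made-up history, customer 2 only reaches 120 days in month 14, which falls outside the 12-month window, so that customer is not flagged.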

CRISP-DM's activities:

- Business objectives: Note that the model is meant to serve the borrower (the client), who can use it to evaluate their own decisions. It is not meant to serve the credit institution.
- Modeling objectives: The objective is well defined: to develop the best predictive model to help borrowers make their own credit decisions.
  
At this stage, the situation of the company, segment or subject is also assessed. The aim is to understand the size of the audience, the relevance of the problem, the issues already present, and the details of the process that generates the phenomenon in question, and therefore the data.

It is also at this stage that a project plan is drawn up.

## Stage 2 CRISP-DM: Understanding the data
The second stage is to understand the data. We were given 15 variables plus the response variable (in bold in the table). The meaning of each variable is given in the table.

#### Data dictionary

The data is arranged in a table with a row for each customer, and a column for each variable storing the characteristics of these customers. We have placed a copy of the data dictionary (explanation of these variables) below in this notebook:

| Variable Name            | Description                                         | Type  |
| ------------------------ |:---------------------------------------------------:| -----:|
| sexo | M = male; F = female | M/F |
| posse_de_veiculo | Vehicle ownership: Y = owns; N = does not own | Y/N |
| posse_de_imovel | Property ownership: Y = owns; N = does not own | Y/N |
| qtd_filhos | Number of children | integer |
| tipo_renda | Type of income (e.g. salaried employee, self-employed) | text |
| educacao | Education level (e.g. secondary, higher) | text |
| estado_civil | Marital status (e.g. single, married) | text |
| tipo_residencia | Type of residence (e.g. house/apartment, with parents) | text |
| idade | Age in years | float |
| tempo_emprego | Length of employment in years | float |
| possui_celular | Whether the customer has a mobile phone (1 = yes, 0 = no) | binary |
| possui_fone_comercial | Whether the customer has a work phone (1 = yes, 0 = no) | binary |
| possui_fone | Whether the customer has a phone (1 = yes, 0 = no) | binary |
| possui_email | Whether the customer has an e-mail address (1 = yes, 0 = no) | binary |
| qt_pessoas_residencia | Number of people in the household | integer |
| **mau** | Bad-payer indicator (True = bad, False = good) | binary |
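The data dictionary can also be used as a quick sanity check on the coded columns. The sketch below is one possible way to do that; the helper `check_coded_columns` and the small `sample` frame are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Allowed values for the coded columns, taken from the data dictionary.
ALLOWED = {
    "sexo": {"M", "F"},
    "posse_de_veiculo": {"Y", "N"},
    "posse_de_imovel": {"Y", "N"},
    "possui_celular": {0, 1},
    "possui_fone_comercial": {0, 1},
    "possui_fone": {0, 1},
    "possui_email": {0, 1},
    "mau": {True, False},
}

def check_coded_columns(df, allowed=ALLOWED):
    """Return {column: set of unexpected values} for the columns present in df."""
    problems = {}
    for col, valid in allowed.items():
        if col not in df.columns:
            continue  # skip dictionary entries missing from this frame
        unexpected = set(df[col].dropna().unique()) - valid
        if unexpected:
            problems[col] = unexpected
    return problems

# Tiny made-up sample (not the project data) with one invalid code in `sexo`.
sample = pd.DataFrame({
    "sexo": ["M", "F", "X"],
    "posse_de_veiculo": ["Y", "N", "Y"],
    "possui_email": [0, 1, 1],
    "mau": [False, False, True],
})

print(check_coded_columns(sample))
```

An empty dictionary means every coded column only contains the values the data dictionary allows.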





#### Loading the packages
It is considered good practice to load the packages that will be used as the first thing in the program.

```python
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
```

#### Loading data
`pd.read_csv` is a function from the pandas library (`pd` is the alias pandas was imported under). It loads the data from the given CSV file into a pandas *DataFrame* object.

```python
df = pd.read_csv('demo01.csv')
print("Number of rows and columns in the table: {}".format(df.shape))

df.head()
```

Number of rows and columns in the table: (16650, 16)


| Unnamed: 0 | sexo | posse_de_veiculo | posse_de_imovel | qtd_filhos | tipo_renda | educacao | estado_civil | tipo_residencia | idade | tempo_emprego | possui_celular | possui_fone_comercial | possui_fone | possui_email | qt_pessoas_residencia | mau |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M | Y | Y | 0 | Working | Secondary / secondary special | Married | House / apartment | 58.832877 | 3.106849 | 1 | 0 | 0 | 0 | 2.0 | False |
| 1 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | 52.356164 | 8.358904 | 1 | 0 | 1 | 1 | 1.0 | False |
| 2 | F | N | Y | 0 | Commercial associate | Secondary / secondary special | Single / not married | House / apartment | 52.356164 | 8.358904 | 1 | 0 | 1 | 1 | 1.0 | False |
| 3 | M | Y | Y | 0 | Working | Higher education | Married | House / apartment | 46.224658 | 2.106849 | 1 | 1 | 1 | 1 | 2.0 | False |
| 4 | F | Y | N | 0 | Working | Incomplete higher | Married | House / apartment | 29.230137 | 3.021918 | 1 | 0 | 0 | 0 | 2.0 | False |
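A natural next step in understanding the data is a first-pass overview of column types, missing values and the share of bad payers. The sketch below is one way to do it, assuming a helper named `overview` (hypothetical, not from the original notebook). So that it runs without `demo01.csv`, it uses a few columns from the five rows displayed above; on the real data you would call `overview(df)` instead.

```python
import io
import pandas as pd

# A few columns from the five rows displayed above, inlined for a runnable sketch.
csv_text = """sexo,posse_de_veiculo,idade,tempo_emprego,mau
M,Y,58.832877,3.106849,False
F,N,52.356164,8.358904,False
F,N,52.356164,8.358904,False
M,Y,46.224658,2.106849,False
F,Y,29.230137,3.021918,False
"""
sample = pd.read_csv(io.StringIO(csv_text))

def overview(df, target="mau"):
    """Per-column dtype, missing count and distinct count, plus the target rate."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(),
    })
    target_rate = df[target].mean()  # share of rows where the target is True
    return summary, target_rate

summary, target_rate = overview(sample)
print(summary)
print(f"Share of bad payers in the sample: {target_rate:.2%}")
```

Five rows say nothing about the real default rate; the point is only the shape of the check.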
