# Analysis of Company Bankruptcy Risk

**Objective**: Analyze the risk of a company going bankrupt.


* `IR`: Industrial Risk
    - `0`: Positive
    - `1`: Average
    - `-1`: Negative
* `MR`: Management Risk
    - `0`: Positive
    - `1`: Average
    - `-1`: Negative
* `Class`: Bankruptcy / Non-Bankruptcy
    - `0`: Bankruptcy
    - `1`: Non-Bankruptcy
    

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
dados = pd.read_csv('data/Qualitative_Bankruptcy.csv')  # load the CSV file

In [3]:
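# Encode the categorical values as integers
# NOTE: 'Class' is the target, so listing it here means X (built from
# feature_names below) also contains the label the model has to predict
feature_names = ['IR','MR','Class']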
dados['IR'] = dados['IR'].map({'P': 0, 'A': 1, 'N':-1})
dados['MR'] = dados['MR'].map({'P': 0, 'A': 1, 'N':-1})
dados['Class'] = dados['Class'].map({'B': 0, 'NB': 1})

In [4]:
dados.dropna(subset=feature_names, inplace=True)

X = dados[feature_names].to_numpy()
y = dados['Class'].to_numpy()
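
In the cell above, `X` is built from `feature_names`, which includes `Class`, so the target is also one of the inputs. A minimal sketch of separating predictors from the target is shown here. It uses a hypothetical toy frame and the hypothetical names `predictor_names`, `X_toy` and `y_toy`, and it is not the data or the code that produced the outputs in this notebook.

```python
import pandas as pd

# Hypothetical toy rows that reuse the notebook's integer encoding
toy = pd.DataFrame({
    'IR':    [0, 1, -1, 0],
    'MR':    [1, -1, -1, 0],
    'Class': [1, 0, 0, 1],
})

predictor_names = ['IR', 'MR']  # predictors only, target left out
target_name = 'Class'

X_toy = toy[predictor_names].to_numpy()
y_toy = toy[target_name].to_numpy()

print(X_toy.shape)  # (4, 2)
print(y_toy.shape)  # (4,)
```

With this layout the model only sees `IR` and `MR`, so any accuracy it reaches reflects what those two columns can explain.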

In [5]:
print('Feature names: ', feature_names, '\n')
print('Shape of X: ', X.shape, '\n')
print('Shape of y: ', y.shape, '\n')

Feature names:  ['IR', 'MR', 'Class'] 

Shape of X:  (250, 3) 

Shape of y:  (250,) 



### Splitting the dataset into `train` and `test`


In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=123)
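
The split above is purely random. For a classification target it is often useful to keep the class proportions equal in both parts, which `train_test_split` supports through its `stratify` argument. The sketch below uses hypothetical synthetic arrays (`X_demo`, `y_demo`), not the bankruptcy data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 30 samples of class 0, 70 of class 1
y_demo = np.array([0] * 30 + [1] * 70)
X_demo = np.arange(100).reshape(-1, 1)

# stratify=y_demo keeps the share of each class close to 30% / 70%
# in both the training and the test part
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=123, stratify=y_demo
)

print('Share of class 1 in train:', y_tr.mean())
print('Share of class 1 in test: ', y_te.mean())
```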

In [7]:
print('Shape of X_train: ', X_train.shape, '\n')
print('Shape of X_test: ', X_test.shape, '\n')
print('Shape of y_train: ', y_train.shape, '\n')
print('Shape of y_test: ', y_test.shape, '\n')

Shape of X_train:  (187, 3) 

Shape of X_test:  (63, 3) 

Shape of y_train:  (187,) 

Shape of y_test:  (63,) 



# Model and Library Used

In this project we use logistic regression from the scikit-learn library, trained by stochastic gradient descent through `SGDClassifier` with `loss='log'`, because this model achieved the best accuracy in our tests.
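
For reference, the textbook formulation of logistic regression trained by stochastic gradient descent can be written as follows. The symbols $w$, $b$, $p_i$, $\eta$ and $\alpha$ are notation introduced here (with $\eta$ playing the role of `eta0` and $\alpha$ the role of `alpha`), labels are taken as $y_i \in \{0, 1\}$, and scikit-learn's implementation may differ in details.

$$
p_i = \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
L_i = -\left[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\right]
$$

$$
w \leftarrow w - \eta \left[\, (p_i - y_i)\, x_i + \alpha\, w \,\right], \qquad b \leftarrow b - \eta\, (p_i - y_i)
$$

With a constant and very small $\eta$, each update moves $w$ and $b$ only slightly, so the average loss decreases slowly from its starting value of $\log 2 \approx 0.693$ at $w = 0$, $b = 0$.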

In [8]:
from sklearn.linear_model import SGDClassifier

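# loss='log' selects logistic regression; scikit-learn >= 1.1 names this
# option 'log_loss', and 'log' was removed in version 1.3
clf = SGDClassifier(loss='log', learning_rate='constant', max_iter=10,
                   eta0=0.0001, verbose=1, tol=None, random_state=123)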

In [9]:
print(clf)

SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0001,
              fit_intercept=True, l1_ratio=0.15, learning_rate='constant',
              loss='log', max_iter=10, n_iter_no_change=5, n_jobs=None,
              penalty='l2', power_t=0.5, random_state=123, shuffle=True,
              tol=None, validation_fraction=0.1, verbose=1, warm_start=False)


In [10]:
clf.fit(X_train, y_train)

-- Epoch 1
Norm: 0.01, NNZs: 3, Bias: 0.001739, T: 187, Avg. loss: 0.692172
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 0.01, NNZs: 3, Bias: 0.003456, T: 374, Avg. loss: 0.690160
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 0.02, NNZs: 3, Bias: 0.005154, T: 561, Avg. loss: 0.688164
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 0.02, NNZs: 3, Bias: 0.006829, T: 748, Avg. loss: 0.686186
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 0.03, NNZs: 3, Bias: 0.008484, T: 935, Avg. loss: 0.684225
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 0.04, NNZs: 3, Bias: 0.010118, T: 1122, Avg. loss: 0.682281
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 0.04, NNZs: 3, Bias: 0.011729, T: 1309, Avg. loss: 0.680353
Total training time: 0.01 seconds.
-- Epoch 8
Norm: 0.05, NNZs: 3, Bias: 0.013320, T: 1496, Avg. loss: 0.678442
Total training time: 0.01 seconds.
-- Epoch 9
Norm: 0.05, NNZs: 3, Bias: 0.014891, T: 1683, Avg. loss: 0.676548
Total training time: 0.01 second

SGDClassifier(alpha=0.0001, average=False, class_weight=None,
              early_stopping=False, epsilon=0.1, eta0=0.0001,
              fit_intercept=True, l1_ratio=0.15, learning_rate='constant',
              loss='log', max_iter=10, n_iter_no_change=5, n_jobs=None,
              penalty='l2', power_t=0.5, random_state=123, shuffle=True,
              tol=None, validation_fraction=0.1, verbose=1, warm_start=False)

### Model Accuracy
Accuracy is measured with the scikit-learn function [`sklearn.metrics.accuracy_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html).

With its default settings it returns a `float` accuracy score between $0$ and $1$.

#### Arguments
* `y_true`: True classes
    * Binary or multiclass: vector (1-D)
    * Multilabel: label indicator matrix (2-D)
* `y_pred`: Classes predicted by the model
    * Binary or multiclass: vector (1-D)
    * Multilabel: label indicator matrix (2-D)
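
As a small illustration of what `accuracy_score` computes, the sketch below compares it with the fraction of matching labels calculated by hand. The arrays `y_true_demo` and `y_pred_demo` are hypothetical and unrelated to the model in this notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 4 of the 5 predictions match the true class
y_true_demo = np.array([0, 1, 1, 0, 1])
y_pred_demo = np.array([0, 1, 0, 0, 1])

acc = accuracy_score(y_true_demo, y_pred_demo)
manual = np.mean(y_true_demo == y_pred_demo)  # fraction of correct predictions

print(acc, manual)  # 0.8 0.8
```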

In [11]:
# Model coefficients
for feature, coef in zip(feature_names, clf.coef_[0].tolist()):
    print(f"{feature}: {round(coef,3)}")

# Model intercept
print(f"Intercept: {clf.intercept_}")

IR: 0.011
MR: 0.017
Class: 0.055
Intercept: [0.01644234]
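
To show how these numbers turn into a prediction, here is a worked sketch using the rounded coefficients printed above and a hypothetical input row. Because the coefficients are rounded, the result is only approximate, and the names `coef`, `row`, `z` and `p` are introduced here for illustration.

```python
import math

# Rounded values copied from the printed output of the cell above
coef = {'IR': 0.011, 'MR': 0.017, 'Class': 0.055}
intercept = 0.01644234

# Hypothetical input row: IR = Positive (0), MR = Average (1), Class = 1
row = {'IR': 0, 'MR': 1, 'Class': 1}

# Linear score, then the logistic function to get P(Class = 1)
z = intercept + sum(coef[name] * row[name] for name in coef)
p = 1.0 / (1.0 + math.exp(-z))

# The predicted class is 1 when the score is positive (p > 0.5)
predicted = int(z > 0)

print(round(z, 4), round(p, 3), predicted)  # 0.0884 0.522 1
```

The probability stays close to $0.5$ because all coefficients are still near zero after training.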


In [12]:
from sklearn.metrics import accuracy_score

y_train_true = y_train
y_train_pred = clf.predict(X_train)
y_test_true = y_test
y_test_pred = clf.predict(X_test)


print(f"Training accuracy: {round(accuracy_score(y_train_true, y_train_pred), 2)}")
print('\n ---------------------------\n')
print(f"Test accuracy: {round(accuracy_score(y_test_true, y_test_pred), 2)}")

Training accuracy: 0.82

 ---------------------------

Test accuracy: 0.78
