# Python AI: Artificial Intelligence and Predictions

### Case: Customer Credit Score

You have been hired by a bank to determine the credit scores of its customers. Your task is to analyze the bank's customer data and, based on that analysis, build a model that reads a customer's information and automatically assigns one of three credit scores: Poor, Standard, or Good.

### Step-by-Step
- Step 0: Understand the company and its challenge
- Step 1: Import the database
- Step 2: Prepare the database for AI
- Step 3: Train the AI and create the model to define credit scores
- Step 4: Choose the best AI model
- Step 5: Use the best model to predict credit scores

In [None]:
# Installing AI library
# !pip install scikit-learn

In [1]:
# Step 1: Import the database
import pandas as pd

url='https://drive.google.com/uc?id=1m_cqkKLUURaF65yswspFZMMN7gNKxbh-'
df_customers = pd.read_csv(url)
# display(df_customers.info())

In [2]:
# Step 2: Prepare the database for AI

# LabelEncoder converts each text column into integer codes. Columns encoded here:
# profissao (profession)
# mix_credito (credit mix)
# comportamento_pagamento (payment behavior)

# Importing only LabelEncoder
from sklearn.preprocessing import LabelEncoder

# One encoder per column, so the same text-to-number mapping can be reused on new data later
encoder_profession = LabelEncoder()
df_customers["profissao"] = encoder_profession.fit_transform(df_customers["profissao"])

encoder_credit = LabelEncoder()
df_customers["mix_credito"] = encoder_credit.fit_transform(df_customers["mix_credito"])

encoder_payment = LabelEncoder()
df_customers["comportamento_pagamento"] = encoder_payment.fit_transform(df_customers["comportamento_pagamento"])
# display(df_customers)
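
To make the encoding step concrete, here is a minimal sketch of what `LabelEncoder` does to a single text column. The profession values below are illustrative stand-ins, not rows from the bank's dataset, and the names `professions`, `label_to_code`, and `unseen_ok` are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for one text column (illustrative values, not the bank data)
professions = ["engenheiro", "advogado", "empresario", "advogado"]

encoder = LabelEncoder()
codes = encoder.fit_transform(professions)

# classes_ holds the sorted unique labels; a label's position is its integer code
label_to_code = {str(label): code for code, label in enumerate(encoder.classes_)}
print(label_to_code)

# inverse_transform maps the integer codes back to the original text
decoded = [str(label) for label in encoder.inverse_transform(codes)]

# transform() raises ValueError for a label that was not seen during fit
try:
    encoder.transform(["medico"])
    unseen_ok = True
except ValueError:
    unseen_ok = False
print(unseen_ok)
```

Because the codes follow alphabetical order, they carry no real ranking. Tree-based models usually tolerate this, but a distance-based model such as KNN treats the codes as ordinary numbers. Scikit-learn's documentation describes `LabelEncoder` as intended for target labels and points to `OrdinalEncoder` or `OneHotEncoder` for input features, so those may be worth trying as alternatives.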

In [None]:
# Step 3: Train the AI and create the model to define credit scores
# Column Y: the one we want to predict
y = df_customers["score_credito"]
# Columns X: the ones AI will use to make the prediction
x = df_customers.drop(columns=["score_credito","id_cliente"])

# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)

# Steps to work with AI
# Two learning models are compared: "Random Forest" (an ensemble of decision trees) and "Nearest Neighbors"
# Importing the AI
from sklearn.ensemble import RandomForestClassifier # Random Forest model, built from many decision trees
from sklearn.neighbors import KNeighborsClassifier # Nearest Neighbors model, "KNN"

# Creating the AI
# Note: the variable is named "decisiontree", but the model it holds is a Random Forest
model_decisiontree = RandomForestClassifier()
model_knn = KNeighborsClassifier()

# Training the AI
model_decisiontree.fit(x_train, y_train)
model_knn.fit(x_train, y_train)
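
The split above uses the defaults of `train_test_split`, which hold out 25% of the rows and shuffle differently on every run, so the accuracy figures can shift slightly between runs. The sketch below shows one way to make a split reproducible and to keep class proportions similar in both subsets. It runs on synthetic stand-in data; the names `x_demo` and `y_demo`, the class counts, and the seed `42` are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 numeric features, imbalanced 3-class labels
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 3))
y_demo = np.array(["Poor"] * 30 + ["Standard"] * 50 + ["Good"] * 20)

# random_state fixes the shuffle; stratify keeps class proportions close in both subsets
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print(x_tr.shape, x_te.shape)

# Repeating the call with the same random_state yields the same split
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)
print((x_te == x_te2).all())
```

`RandomForestClassifier` also accepts a `random_state` argument, so fixing both seeds is one way to get the same scores on every run.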

In [5]:
# Step 4: Choose the best AI model
# Comparing the prediction of the two models, based on the testing data
prediction_decisiontree = model_decisiontree.predict(x_test)
prediction_knn = model_knn.predict(x_test)

# Calculating the prediction accuracy
from sklearn.metrics import accuracy_score

display("Random Forest Accuracy: " + format(accuracy_score(y_test, prediction_decisiontree), ".0%"))
display("Nearest Neighbors Accuracy: " + format(accuracy_score(y_test, prediction_knn), ".0%"))

'Random Forest Accuracy: 83%'

'Nearest Neighbors Accuracy: 74%'

**Based on the accuracy figures above, I'll use the Random Forest model (`model_decisiontree`), which reached 83% accuracy on the test data.**
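
The score computed above comes from `accuracy_score`, so it measures accuracy: the share of all test customers whose score was predicted correctly. Precision is a different, per-class metric. The sketch below contrasts the two on a small hand-made example; the `y_true` and `y_pred` lists are invented for illustration and are not results from the bank data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

# Hand-made labels for illustration only (not model output)
labels = ["Good", "Poor", "Standard"]
y_true = ["Poor", "Poor", "Standard", "Standard", "Standard", "Good", "Good", "Standard"]
y_pred = ["Poor", "Standard", "Standard", "Standard", "Poor", "Good", "Standard", "Standard"]

# Accuracy: correct predictions divided by all predictions
accuracy = accuracy_score(y_true, y_pred)

# Precision per class: of the rows predicted as a class, how many truly belong to it
precision = precision_score(y_true, y_pred, labels=labels, average=None)

# Confusion matrix: rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_true, y_pred, labels=labels)

print(f"Accuracy: {accuracy:.1%}")
for name, value in zip(labels, precision):
    print(f"Precision for {name}: {value:.1%}")
print(matrix)
```

A single accuracy number can hide weak performance on a less common class, so per-class figures like these are a useful cross-check before settling on a model.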

In [6]:
# Step 5: Use the best model to predict credit scores
# Importing new customers to run through the AI 
url='https://drive.google.com/uc?id=10Q0eut7cku_5s06uMJJNmgKefdR_DX1i'
df_new_customers = pd.read_csv(url)
display(df_new_customers)

# Again converting the text columns into numbers, using the same LabelEncoder mappings the model was trained on
# The encoders are already fitted, so we call "transform" (not "fit_transform")
df_new_customers["profissao"] = encoder_profession.transform(df_new_customers["profissao"])
df_new_customers["mix_credito"] = encoder_credit.transform(df_new_customers["mix_credito"])
df_new_customers["comportamento_pagamento"] = encoder_payment.transform(df_new_customers["comportamento_pagamento"])

new_prediction = model_decisiontree.predict(df_new_customers)
display(new_prediction)

| | mes | idade | profissao | salario_anual | num_contas | num_cartoes | juros_emprestimo | num_emprestimos | dias_atraso | num_pagamentos_atrasados | ... | taxa_uso_credito | idade_historico_credito | investimento_mensal | comportamento_pagamento | saldo_final_mes | emprestimo_carro | emprestimo_casa | emprestimo_pessoal | emprestimo_credito | emprestimo_estudantil |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 31.0 | empresario | 19300.34 | 6.0 | 7.0 | 17.0 | 5.0 | 52.0 | 19.0 | ... | 29.934186 | 218.0 | 44.50951 | baixo_gasto_pagamento_baixo | 312.487689 | 1 | 1 | 0 | 0 | 0 |
| 1 | 4 | 32.0 | advogado | 12600.445 | 5.0 | 5.0 | 10.0 | 3.0 | 25.0 | 18.0 | ... | 28.819407 | 12.0 | 0.0 | baixo_gasto_pagamento_medio | 300.994163 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2 | 48.0 | empresario | 20787.69 | 8.0 | 6.0 | 14.0 | 7.0 | 24.0 | 14.0 | ... | 34.235853 | 215.0 | 0.0 | baixo_gasto_pagamento_alto | 345.081577 | 0 | 1 | 0 | 1 | 0 |


array(['Poor', 'Poor', 'Standard'], dtype=object)

In [18]:
# Which characteristics were most important in predicting the credit score?
columns = list(x_test.columns)
importance = pd.DataFrame(index=columns, data=model_decisiontree.feature_importances_)
importance.columns = ['Importance']
importance = importance.nlargest(5, 'Importance')
importance = importance * 100
print(importance)

                         Importance
divida_total              11.569916
juros_emprestimo           8.195898
mix_credito                8.029487
idade_historico_credito    7.478740
dias_atraso                6.789873
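
The importance values above are shares of a total: a fitted random forest's `feature_importances_` sum to 1, which is why multiplying by 100 gives percentages. The sketch below checks that on synthetic data and shows a slightly shorter way to rank features with a `pandas.Series`. The dataset comes from `make_classification`, and the `feature_0` to `feature_5` names are placeholders, not columns from the bank data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 200 rows, 6 features, 3 of them informative
features, labels = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
columns = [f"feature_{i}" for i in range(6)]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(features, labels)

# A Series indexed by feature name makes sorting and slicing straightforward
importance = pd.Series(model.feature_importances_, index=columns)
total = importance.sum()

# Top 3 features, expressed as percentages of total importance
top3 = importance.sort_values(ascending=False).head(3) * 100
print(top3.round(2))
print(f"Sum of all importances: {total:.4f}")
```

These impurity-based importances describe how much the forest relied on each feature during training, not whether the feature causes a better or worse credit score.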
