---
# **Exercises - Example-Based Classification**
---

**Author**
> Vitor Eduardo de Souza Costa

**References**
> - Solange Rezende. [Paradigma de aprendizado baseado em instâncias](https://edisciplinas.usp.br/pluginfile.php/8366136/mod_resource/content/1/Aula_15_IA_MedidasDistancia_KNN.pdf) (Instance-based learning paradigm). May 2024.

## Importing the necessary libraries

In [96]:
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold

---
## Exercise 1
---

### Creating a dataset of patient records to predict their diagnosis

> In the proposed exercise, the available data describe six patients, listing their symptoms and their respective diagnoses. The goal is to use these data to label new cases and to compare the results both with the theoretically expected results and with the results obtained by Decision Trees on this same dataset.

#### Writing the training dataset file

In [97]:
%%writefile patients_register_train.tsv
Nome;Febre;Enjôo;Manchas;Dores;Diagnóstico
João;sim;sim;pequenas;sim;doente
Pedro;não;não;grandes;não;saudável
Maria;sim;sim;pequenas;não;saudável
José;sim;não;grandes;sim;doente
Ana;sim;não;pequenas;sim;saudável
Leila;não;não;grandes;sim;doente

Overwriting patients_register_train.tsv


#### Writing the dataset file to classify

In [98]:
%%writefile patients_register_classify.tsv
Nome;Febre;Enjôo;Manchas;Dores
Luis;não;não;pequenas;sim
Laura;sim;sim;grandes;sim

Overwriting patients_register_classify.tsv


#### Reading the training dataset file

In [99]:
patients_dataset_train = pd.read_csv('patients_register_train.tsv', index_col='Nome', sep=';')

print(patients_dataset_train)

      Febre Enjôo   Manchas Dores Diagnóstico
Nome                                         
João    sim   sim  pequenas   sim      doente
Pedro   não   não   grandes   não    saudável
Maria   sim   sim  pequenas   não    saudável
José    sim   não   grandes   sim      doente
Ana     sim   não  pequenas   sim    saudável
Leila   não   não   grandes   sim      doente


#### Reading the dataset file to classify

In [100]:
patients_x_predict = pd.read_csv('patients_register_classify.tsv', index_col='Nome', sep=';')

print(patients_x_predict)

      Febre Enjôo   Manchas Dores
Nome                             
Luis    não   não  pequenas   sim
Laura   sim   sim   grandes   sim


### Cleaning and preprocessing the dataset

> To make the data easier to process computationally, its symbolic values must be converted into numerical ones, preserving the order of ordinal values and keeping a unit distance between values that have no order. In our case, every attribute is binary, so all we need to do is map each categorical value to 0 or 1.
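With a 0/1 encoding, the squared Euclidean distance between two patients is simply the number of attributes in which they differ (the Hamming distance), so ranking neighbors by Euclidean distance, the default of `KNeighborsClassifier` (Minkowski with p = 2), is the same as ranking them by mismatches. A minimal NumPy sketch, assuming the encoding used in this notebook (`grandes` = 1, `sim` = 1); the array names are illustrative:

```python
import numpy as np

# Training patients encoded as 0/1: Febre, Enjôo, Manchas (grandes = 1), Dores
train = np.array([
    [1, 1, 0, 1],  # João
    [0, 0, 1, 0],  # Pedro
    [1, 1, 0, 0],  # Maria
    [1, 0, 1, 1],  # José
    [1, 0, 0, 1],  # Ana
    [0, 0, 1, 1],  # Leila
])
luis = np.array([0, 0, 0, 1])

# Squared Euclidean distance from Luis to every training patient
squared_euclidean = ((train - luis) ** 2).sum(axis=1)
# Hamming distance: number of attributes that do not match
mismatches = (train != luis).sum(axis=1)

print(squared_euclidean)  # [2 2 3 2 1 1]
print((squared_euclidean == mismatches).all())  # True
```

Note that Luis is at distance 1 from both Ana and Leila, so his nearest neighbor is a tie.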

#### Converting symbolic values into numerical ones (training set)

In [101]:
# Mapping each symbolic value to 0/1 and assigning the column back
# (chained in-place replace on a column stops working in pandas 3.0)
yes_no = {'sim': 1, 'não': 0}
for column in ['Febre', 'Enjôo', 'Dores']:
    patients_dataset_train[column] = patients_dataset_train[column].map(yes_no)
patients_dataset_train['Manchas'] = patients_dataset_train['Manchas'].map({'grandes': 1, 'pequenas': 0})
patients_dataset_train['Diagnóstico'] = patients_dataset_train['Diagnóstico'].map({'saudável': 1, 'doente': 0})

# Separating the label from the features
patients_x = patients_dataset_train.drop(['Diagnóstico'], axis=1)
patients_y = patients_dataset_train.Diagnóstico

print(patients_x)

       Febre  Enjôo  Manchas  Dores
Nome                               
João       1      1        0      1
Pedro      0      0        1      0
Maria      1      1        0      0
José       1      0        1      1
Ana        1      0        0      1
Leila      0      0        1      1



#### Converting symbolic values into numerical ones (examples to classify)

In [102]:
# Same 0/1 mapping as the training set, assigning each column back
yes_no = {'sim': 1, 'não': 0}
for column in ['Febre', 'Enjôo', 'Dores']:
    patients_x_predict[column] = patients_x_predict[column].map(yes_no)
patients_x_predict['Manchas'] = patients_x_predict['Manchas'].map({'grandes': 1, 'pequenas': 0})

print(patients_x_predict)

       Febre  Enjôo  Manchas  Dores
Nome                               
Luis       0      0        0      1
Laura      1      1        1      1



### Solution for K = 1

> Creating a K-NN model with K = 1, to compare its results with those obtained for other values of K.
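The K-NN prediction rule is short enough to write by hand. A minimal pure-Python sketch, assuming the 0/1 encoding above, the Hamming distance (which ranks neighbors exactly as the Euclidean distance does for 0/1 attributes) and ties broken by table order; `TRAIN`, `hamming` and `knn_predict` are illustrative names, not scikit-learn API. Luis is equally close to Ana (saudável) and Leila (doente), so his K = 1 answer depends entirely on tie-breaking; scikit-learn's documentation warns that in such cases the result depends on the ordering of the training data, although for this dataset its outputs in this notebook agree with the sketch.

```python
from collections import Counter

# 0/1 encoding: (Febre, Enjôo, Manchas, Dores) -> Diagnóstico
TRAIN = [
    ((1, 1, 0, 1), 'doente'),    # João
    ((0, 0, 1, 0), 'saudável'),  # Pedro
    ((1, 1, 0, 0), 'saudável'),  # Maria
    ((1, 0, 1, 1), 'doente'),    # José
    ((1, 0, 0, 1), 'saudável'),  # Ana
    ((0, 0, 1, 1), 'doente'),    # Leila
]


def hamming(a, b):
    """Number of attributes in which two patients differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, k):
    """Majority vote among the k training patients closest to `query`."""
    # sorted() is stable, so equally distant patients keep their table order
    neighbors = sorted(TRAIN, key=lambda row: hamming(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


for name, query in [('Luis', (0, 0, 0, 1)), ('Laura', (1, 1, 1, 1))]:
    print(name, [knn_predict(query, k) for k in (1, 3, 5)])
```

With an odd K and two classes, the vote itself can never tie; only the choice of neighbors at equal distance can.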

#### Construction of the model

In [103]:
patients_k1 = KNeighborsClassifier(n_neighbors=1)

#### Validation of the model

##### Instantiating a Stratified KFold cross-validation for K = 1

In [104]:
# Defining the Stratified KFold and the number of splits (folds) the dataset is divided into
patients_skf_k1 = StratifiedKFold(n_splits=3)
patients_k1_index = KNeighborsClassifier(n_neighbors=1)
patients_accuracies_k1 = []
patients_f1_scores_k1 = []

# For each split prepared by SKF: build the train and test sets, fit the model,
# then record its accuracy and weighted F1 score on the test set
for patients_train_index_k1, patients_test_index_k1 in patients_skf_k1.split(patients_x, patients_y):
    x_train_k1 = patients_x.iloc[patients_train_index_k1]
    y_train_k1 = patients_y.iloc[patients_train_index_k1]
    x_test_k1 = patients_x.iloc[patients_test_index_k1]
    y_test_k1 = patients_y.iloc[patients_test_index_k1]
    patients_k1_index.fit(x_train_k1, y_train_k1)
    patients_accuracies_k1.append(patients_k1_index.score(x_test_k1, y_test_k1))
    patients_f1_scores_k1.append(f1_score(y_test_k1, patients_k1_index.predict(x_test_k1), average='weighted'))
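The manual fold loop can also be expressed with scikit-learn's `cross_validate`, which fits a fresh clone of the estimator on every split and returns the per-fold scores for each requested metric. A sketch, assuming the encoded patient table rebuilt inline (`X`, `y` and `scores` are illustrative names); since `StratifiedKFold` without shuffling is deterministic, it should evaluate the same folds as the loop above.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Encoded patients (Febre, Enjôo, Manchas, Dores) and Diagnóstico (saudável = 1, doente = 0)
X = pd.DataFrame(
    [[1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]],
    columns=['Febre', 'Enjôo', 'Manchas', 'Dores'],
    index=['João', 'Pedro', 'Maria', 'José', 'Ana', 'Leila'],
)
y = pd.Series([0, 1, 1, 0, 1, 0], index=X.index)

scores = cross_validate(
    KNeighborsClassifier(n_neighbors=1), X, y,
    cv=StratifiedKFold(n_splits=3),
    scoring=['accuracy', 'f1_weighted'],  # 'f1_weighted' = f1_score(average='weighted')
)

for metric in ('test_accuracy', 'test_f1_weighted'):
    # ddof=1 gives the sample standard deviation, as pandas' .std() does
    print(metric, scores[metric].mean(), scores[metric].std(ddof=1))
```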

##### Showing the Stratified KFold cross-validation results for K = 1

In [105]:
# Converting the per-fold lists of accuracies and F1 scores into dataframes
patients_accuracies_k1_df = pd.DataFrame(data=patients_accuracies_k1, columns=[''])
patients_f1_scores_k1_df = pd.DataFrame(data=patients_f1_scores_k1, columns=[''])

# The model's accuracy and F1 score are the means over the SKF splits
patients_accuracy_k1 = patients_accuracies_k1_df.mean()
patients_f1_score_k1 = patients_f1_scores_k1_df.mean()

# Standard deviations of the accuracies and F1 scores across the splits
patients_accuracy_std_k1 = patients_accuracies_k1_df.std()
patients_f1_score_std_k1 = patients_f1_scores_k1_df.std()

print(f"Accuracy of the model is: {patients_accuracy_k1}\nIts standard deviation is: {patients_accuracy_std_k1}\n\n")
print(f"F1 score of the model is: {patients_f1_score_k1}\nIts standard deviation is: {patients_f1_score_std_k1}")

Accuracy of the model is:     0.0
dtype: float64
Its standard deviation is:     0.0
dtype: float64


F1 score of the model is:     0.0
dtype: float64
Its standard deviation is:     0.0
dtype: float64


#### Obtaining the labels of the examples to classify

In [106]:
patients_classifier_k1 = patients_k1.fit(patients_x,patients_y)

patients_classified_k1 = patients_classifier_k1.predict(patients_x_predict)
patients_results_k1 = patients_x_predict.copy()
patients_results_k1['Diagnóstico'] = patients_classified_k1

print(patients_results_k1)

       Febre  Enjôo  Manchas  Dores  Diagnóstico
Nome                                            
Luis       0      0        0      1            1
Laura      1      1        1      1            0


#### Converting numerical values back into categorical ones for easier interpretation

In [107]:
# Mapping the codes back to labels and assigning each column back
# (chained in-place replace on a column stops working in pandas 3.0)
yes_no = {0: 'Não', 1: 'Sim'}
for column in ['Febre', 'Enjôo', 'Dores']:
    patients_results_k1[column] = patients_results_k1[column].map(yes_no)
patients_results_k1['Manchas'] = patients_results_k1['Manchas'].map({0: 'Pequenas', 1: 'Grandes'})
patients_results_k1['Diagnóstico'] = patients_results_k1['Diagnóstico'].map({0: 'Doente', 1: 'Saudável'})

print(patients_results_k1)

      Febre Enjôo   Manchas Dores Diagnóstico
Nome                                         
Luis    Não   Não  Pequenas   Sim    Saudável
Laura   Sim   Sim   Grandes   Sim      Doente



### Solution for K = 3

> Creating a K-NN model with K = 3, to compare its results with those obtained for other values of K.

#### Construction of the model

In [108]:
patients_k3 = KNeighborsClassifier(n_neighbors=3)

#### Validation of the model

##### Instantiating a Stratified KFold cross-validation for K = 3

In [109]:
# Defining the Stratified KFold and the number of splits (folds) the dataset is divided into
patients_skf_k3 = StratifiedKFold(n_splits=3)
patients_k3_index = KNeighborsClassifier(n_neighbors=3)
patients_accuracies_k3 = []
patients_f1_scores_k3 = []

# For each split prepared by SKF: build the train and test sets, fit the model,
# then record its accuracy and weighted F1 score on the test set
for patients_train_index_k3, patients_test_index_k3 in patients_skf_k3.split(patients_x, patients_y):
    x_train_k3 = patients_x.iloc[patients_train_index_k3]
    y_train_k3 = patients_y.iloc[patients_train_index_k3]
    x_test_k3 = patients_x.iloc[patients_test_index_k3]
    y_test_k3 = patients_y.iloc[patients_test_index_k3]
    patients_k3_index.fit(x_train_k3, y_train_k3)
    patients_accuracies_k3.append(patients_k3_index.score(x_test_k3, y_test_k3))
    patients_f1_scores_k3.append(f1_score(y_test_k3, patients_k3_index.predict(x_test_k3), average='weighted'))

##### Showing the Stratified KFold cross-validation results for K = 3

In [110]:
# Converting the per-fold lists of accuracies and F1 scores into dataframes
patients_accuracies_k3_df = pd.DataFrame(data=patients_accuracies_k3, columns=[''])
patients_f1_scores_k3_df = pd.DataFrame(data=patients_f1_scores_k3, columns=[''])

# The model's accuracy and F1 score are the means over the SKF splits
patients_accuracy_k3 = patients_accuracies_k3_df.mean()
patients_f1_score_k3 = patients_f1_scores_k3_df.mean()

# Standard deviations of the accuracies and F1 scores across the splits
patients_accuracy_std_k3 = patients_accuracies_k3_df.std()
patients_f1_score_std_k3 = patients_f1_scores_k3_df.std()

print(f"Accuracy of the model is: {patients_accuracy_k3}\nIts standard deviation is: {patients_accuracy_std_k3}\n\n")
print(f"F1 score of the model is: {patients_f1_score_k3}\nIts standard deviation is: {patients_f1_score_std_k3}")

Accuracy of the model is:     0.5
dtype: float64
Its standard deviation is:     0.5
dtype: float64


F1 score of the model is:     0.444444
dtype: float64
Its standard deviation is:     0.509175
dtype: float64


#### Obtaining the labels of the examples to classify

In [111]:
patients_classifier_k3 = patients_k3.fit(patients_x,patients_y)

patients_classified_k3 = patients_classifier_k3.predict(patients_x_predict)
patients_results_k3 = patients_x_predict.copy()
patients_results_k3['Diagnóstico'] = patients_classified_k3

print(patients_results_k3)

       Febre  Enjôo  Manchas  Dores  Diagnóstico
Nome                                            
Luis       0      0        0      1            0
Laura      1      1        1      1            0


#### Converting numerical values back into categorical ones for easier interpretation

In [112]:
# Mapping the codes back to labels and assigning each column back
# (chained in-place replace on a column stops working in pandas 3.0)
yes_no = {0: 'Não', 1: 'Sim'}
for column in ['Febre', 'Enjôo', 'Dores']:
    patients_results_k3[column] = patients_results_k3[column].map(yes_no)
patients_results_k3['Manchas'] = patients_results_k3['Manchas'].map({0: 'Pequenas', 1: 'Grandes'})
patients_results_k3['Diagnóstico'] = patients_results_k3['Diagnóstico'].map({0: 'Doente', 1: 'Saudável'})

print(patients_results_k3)

      Febre Enjôo   Manchas Dores Diagnóstico
Nome                                         
Luis    Não   Não  Pequenas   Sim      Doente
Laura   Sim   Sim   Grandes   Sim      Doente



### Solution for K = 5

> Creating a K-NN model with K = 5, to compare its results with those obtained for other values of K.

#### Construction of the model

In [113]:
patients_k5 = KNeighborsClassifier(n_neighbors=5)

#### Validation of the model

> Because the dataset is so small, we cannot cross-validate the model for K = 5. Finding the 5 nearest neighbors requires at least 5 training examples, but with the 3-fold stratified split used for K = 1 and K = 3, each training set holds only 4 of the 6 examples, and scikit-learn raises an error when asked for more neighbors than there are training samples.
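The failure can be reproduced directly. A minimal sketch, assuming one training fold of the unshuffled 3-fold split (Maria, José, Ana and Leila, encoded as above) with João held out; `X_fold`, `y_fold` and `raised` are illustrative names. Fitting only stores the examples; the error appears when the model is asked for its 5 nearest neighbors, the same search `predict` relies on.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# One training fold: Maria, José, Ana and Leila (4 of the 6 patients)
X_fold = np.array([[1, 1, 0, 0], [1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]])
y_fold = np.array([1, 0, 1, 0])

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_fold, y_fold)  # succeeds: fitting just stores the 4 examples

raised = False
try:
    # Ask for the 5 nearest neighbors of João, the held-out patient
    model.kneighbors(np.array([[1, 1, 0, 1]]))
except ValueError as error:
    raised = True
    print('Neighbor search failed:', error)
```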

#### Obtaining the labels of the examples to classify

In [114]:
patients_classifier_k5 = patients_k5.fit(patients_x,patients_y)

patients_classified_k5 = patients_classifier_k5.predict(patients_x_predict)
patients_results_k5 = patients_x_predict.copy()
patients_results_k5['Diagnóstico'] = patients_classified_k5

print(patients_results_k5)

       Febre  Enjôo  Manchas  Dores  Diagnóstico
Nome                                            
Luis       0      0        0      1            0
Laura      1      1        1      1            0


#### Converting numerical values back into categorical ones for easier interpretation

In [115]:
# Mapping the codes back to labels and assigning each column back
# (chained in-place replace on a column stops working in pandas 3.0)
yes_no = {0: 'Não', 1: 'Sim'}
for column in ['Febre', 'Enjôo', 'Dores']:
    patients_results_k5[column] = patients_results_k5[column].map(yes_no)
patients_results_k5['Manchas'] = patients_results_k5['Manchas'].map({0: 'Pequenas', 1: 'Grandes'})
patients_results_k5['Diagnóstico'] = patients_results_k5['Diagnóstico'].map({0: 'Doente', 1: 'Saudável'})

print(patients_results_k5)

      Febre Enjôo   Manchas Dores Diagnóstico
Nome                                         
Luis    Não   Não  Pequenas   Sim      Doente
Laura   Sim   Sim   Grandes   Sim      Doente



---
## Exercise 2
---

### Creating the class and education-level dataset

> The proposed exercise provides data on the education level, state and income of different people. The goal is to record these data in order to implement a classification solution.

#### Creating the dataset

In [116]:
%%writefile classe_escolaridade.tsv
Estado;Escolaridade;Altura;Salário;Classe
SP;Médio;180;3000;A
RJ;Superior;174;7000;B
RS;Médio;180;600;B
RJ;Superior;100;2000;A
SP;Fundamental;178;5000;A
RJ;Fundamental;188;1800;A


#### Creating a dataset with new data to be classified

In [None]:
%%writefile classificar_escolaridade.tsv
Estado;Escolaridade;Altura;Salário
RJ;Médio;178;2000
SP;Superior;200;800

Overwriting classificar_escolaridade.tsv


In [None]:
# @title Reading the dataset
# There is no name column here, so pandas uses its default integer index; the target is the "Classe" column
dataset = pd.read_csv('classe_escolaridade.tsv', sep=';')

dataset

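| | Estado | Escolaridade | Altura | Salário | Classe |
|---|---|---|---|---|---|
| 0 | SP | Médio | 180 | 3000 | A |
| 1 | RJ | Superior | 174 | 7000 | B |
| 2 | RS | Médio | 180 | 600 | B |
| 3 | RJ | Superior | 100 | 2000 | A |
| 4 | SP | Fundamental | 178 | 5000 | A |
| 5 | RJ | Fundamental | 188 | 1800 | A |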


In [None]:
# @title Reading the dataset with the values to classify

rank = pd.read_csv('classificar_escolaridade.tsv', sep=';')

rank

| | Estado | Escolaridade | Altura | Salário |
|---|---|---|---|---|
| 0 | RJ | Médio | 178 | 2000 |
| 1 | SP | Superior | 200 | 800 |


### Cleaning and preprocessing the dataset

> To make the data easier to process computationally, symbolic values must be converted into numerical ones, preserving the order of ordinal values and keeping a unit distance between values that have no order. In our case, since the dataset is tiny and has no quality problems, this step comes down to converting the categorical values.
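Both rules can be applied by one function that also guarantees the data to classify gets the same columns as the training data, even when a state such as RS does not appear in it. A minimal sketch, assuming the states and education levels seen in the training data; `encode`, `STATES` and `EDUCATION_ORDER` are hypothetical names, and declaring a categorical dtype before `pd.get_dummies` is an alternative to restoring a missing dummy column after the fact.

```python
import pandas as pd

# Hypothetical constants mirroring the categories seen in the training data
STATES = pd.CategoricalDtype(['RJ', 'RS', 'SP'])
EDUCATION_ORDER = {'Fundamental': 0, 'Médio': 1, 'Superior': 2}


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Ordinal-encode Escolaridade and one-hot encode Estado with a fixed set of columns."""
    out = df.copy()
    # Ordinal attribute: keeps the order Fundamental < Médio < Superior
    out['Escolaridade'] = out['Escolaridade'].map(EDUCATION_ORDER)
    # Nominal attribute: with every category declared, get_dummies emits one
    # column per state, even for states absent from df
    onehot = pd.get_dummies(out.pop('Estado').astype(STATES), dtype=int)
    return pd.concat([onehot, out], axis=1)


to_classify = pd.DataFrame({
    'Estado': ['RJ', 'SP'],
    'Escolaridade': ['Médio', 'Superior'],
    'Altura': [178, 200],
    'Salário': [2000, 800],
})
encoded = encode(to_classify)
print(encoded)
```

The resulting columns (RJ, RS, SP, Escolaridade, Altura, Salário) follow the same layout as the encoded training data.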

In [None]:
# @title Converting symbolic values into numerical ones

# Symbolic values with unit distance, or ordinal values with a defined order
dataset['Classe'] = dataset['Classe'].map({'A': 1, 'B': 0})
dataset['Escolaridade'] = dataset['Escolaridade'].map({'Superior': 2, 'Médio': 1, 'Fundamental': 0})

# One-hot encoding to keep a unit distance between non-ordinal values
onehot = pd.get_dummies(dataset.Estado, dtype=int)
dataset = pd.concat([onehot, dataset.drop('Estado', axis=1)], axis=1)

dataframe = dataset.drop('Classe', axis=1)
dataframe


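| | RJ | RS | SP | Escolaridade | Altura | Salário |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 180 | 3000 |
| 1 | 1 | 0 | 0 | 2 | 174 | 7000 |
| 2 | 0 | 1 | 0 | 1 | 180 | 600 |
| 3 | 1 | 0 | 0 | 2 | 100 | 2000 |
| 4 | 0 | 0 | 1 | 0 | 178 | 5000 |
| 5 | 1 | 0 | 0 | 0 | 188 | 1800 |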


In [None]:
# @title Converting symbolic values into numerical ones for the values to classify

# Ordinal values with a defined order
rank['Escolaridade'] = rank['Escolaridade'].map({'Superior': 2, 'Médio': 1, 'Fundamental': 0})

# One-hot encoding to keep a unit distance between non-ordinal values
onehot_rank = pd.get_dummies(rank.Estado, dtype=int)
rank = pd.concat([onehot_rank, rank.drop('Estado', axis=1)], axis=1)

# Restoring the missing RS column (filled with 0) so the data has the same columns,
# in the same order, as the training data
rank = rank.reindex(columns=dataframe.columns, fill_value=0)
rank

| | RJ | RS | SP | Escolaridade | Altura | Salário |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 178 | 2000 |
| 1 | 0 | 0 | 1 | 2 | 200 | 800 |


### Solution for K = 1

> Creating a K-NN model with K = 1, to compare the results obtained for different values of K and to expose possible biases of these models.

In [None]:
# @title Building the model

k1 = KNeighborsClassifier(n_neighbors=1)

In [None]:
# @title Obtaining the classes of the examples

classifier_k1 = k1.fit(dataframe,dataset.Classe)

ranked_k1 = classifier_k1.predict(rank)
ranked_k1 = pd.DataFrame(ranked_k1,columns=['Classe'])
results_k1 = pd.concat([rank,ranked_k1],axis=1)

In [None]:
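# @title Converting numerical values back into categorical ones

# Assigning mapped columns back instead of chained in-place replace (unsupported in pandas 3.0)
results_k1['Escolaridade'] = results_k1['Escolaridade'].map({0: 'Fundamental', 1: 'Médio', 2: 'Superior'})
results_k1['Classe'] = results_k1['Classe'].map({1: 'A', 0: 'B'})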
dummies_k1 = pd.concat([results_k1.RJ, results_k1.RS, results_k1.SP], axis=1)
onecold_k1 = pd.from_dummies(dummies_k1)
onecold_k1.rename(columns={'':'Estado'}, inplace=True)
results_k1 = pd.concat([onecold_k1, results_k1.drop(['RJ','RS','SP'], axis=1)], axis=1)

results_k1

| | Estado | Escolaridade | Altura | Salário | Classe |
|---|---|---|---|---|---|
| 0 | RJ | Médio | 178 | 2000 | A |
| 1 | SP | Superior | 200 | 800 | B |


> So that these results can be checked, a spreadsheet named "*exercício_2.xlsx*" was produced and is available in the folder of files attached to this *notebook*. The results shown agree with those predicted in the spreadsheet for K = 1.
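The neighbors behind these predictions can also be recomputed directly. A minimal NumPy sketch, assuming the encoded rows shown in the tables above and the Euclidean distance (scikit-learn's default for `KNeighborsClassifier`, Minkowski with p = 2); `train`, `classes`, `queries` and `nearest` are illustrative names. Because the attributes are not scaled, Salário, measured in thousands, dominates the distance, while the 0/1 state columns barely count.

```python
import numpy as np

# Encoded training rows: RJ, RS, SP, Escolaridade, Altura, Salário
train = np.array([
    [0, 0, 1, 1, 180, 3000],
    [1, 0, 0, 2, 174, 7000],
    [0, 1, 0, 1, 180, 600],
    [1, 0, 0, 2, 100, 2000],
    [0, 0, 1, 0, 178, 5000],
    [1, 0, 0, 0, 188, 1800],
])
classes = np.array(['A', 'B', 'B', 'A', 'A', 'A'])

# The two examples to classify, encoded the same way
queries = np.array([
    [1, 0, 0, 1, 178, 2000],
    [0, 0, 1, 2, 200, 800],
])

nearest = []
for query in queries:
    distances = np.sqrt(((train - query) ** 2).sum(axis=1))
    order = np.argsort(distances, kind='stable')  # training rows, closest first
    nearest.append(order[:3].tolist())
    print('3 nearest rows:', order[:3], 'classes:', classes[order[:3]])
```

For the first example the three nearest rows (3, 5, 0) are all class A; for the second, the nearest row (2) is class B while rows 5 and 3 are class A, which is why its prediction changes from B with K = 1 to A with K = 3.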

### Solution for K = 3

> Creating a K-NN model with K = 3, to compare the results obtained for different values of K and to expose possible biases of these models.

In [None]:
# @title Building the model

k3 = KNeighborsClassifier(n_neighbors=3)

In [None]:
# @title Obtaining the classes of the examples

classifier_k3 = k3.fit(dataframe,dataset.Classe)

ranked_k3 = classifier_k3.predict(rank)
ranked_k3 = pd.DataFrame(ranked_k3,columns=['Classe'])
results_k3 = pd.concat([rank,ranked_k3],axis=1)

In [None]:
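# @title Converting numerical values back into categorical ones

# Assigning mapped columns back instead of chained in-place replace (unsupported in pandas 3.0)
results_k3['Escolaridade'] = results_k3['Escolaridade'].map({0: 'Fundamental', 1: 'Médio', 2: 'Superior'})
results_k3['Classe'] = results_k3['Classe'].map({1: 'A', 0: 'B'})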
dummies_k3 = pd.concat([results_k3.RJ, results_k3.RS, results_k3.SP], axis=1)
onecold_k3 = pd.from_dummies(dummies_k3)
onecold_k3.rename(columns={'':'Estado'}, inplace=True)
results_k3 = pd.concat([onecold_k3, results_k3.drop(['RJ','RS','SP'], axis=1)], axis=1)

results_k3

| | Estado | Escolaridade | Altura | Salário | Classe |
|---|---|---|---|---|---|
| 0 | RJ | Médio | 178 | 2000 | A |
| 1 | SP | Superior | 200 | 800 | A |


> With the results for K = 3, we can also confirm in the spreadsheet exercício_2.xlsx that the classes predicted for both test examples match the expected ones.

### Calculating the algorithm's precision