# Predicting Student Retention in Higher Education

## Objective

This project analyzes and predicts student retention in higher education, identifying factors associated with dropping out (Dropout), remaining enrolled (Enrolled), or completing the program (Graduate). Using **Exploratory Data Analysis (EDA)** and **Machine Learning** techniques, we build models that predict a student's outcome from their socioeconomic, academic, and institutional characteristics.

## Methodology

1. **Data Loading and Analysis**: we used `pandas` to load the dataset and `plotly` to explore and visualize the variables.
2. **Preprocessing**: we one-hot encoded the categorical variables (`OneHotEncoder`), encoded the target labels as integers (`LabelEncoder`), and applied min-max scaling (`MinMaxScaler`) to the encoded features for the KNN model.
3. **Modeling**: we trained and evaluated several classifiers: `DummyClassifier` (baseline), `DecisionTreeClassifier`, `RandomForestClassifier`, and `KNeighborsClassifier`.
4. **Evaluation**: we compared the models by their accuracy on a held-out test set.
5. **Export**: the fitted encoding transformer and the best-performing model (`RandomForestClassifier`) were serialized with `pickle` for later use.

## Data Source

The data used in this project was obtained from Kaggle and is available at:  
[Higher Education - Predictors of Student Retention](https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention/data)



# **Data Loading and Analysis**

In [1]:
import pandas as pd

In [2]:
dados = pd.read_csv('/content/dataset.csv')

In [3]:
dados

,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nacionality,Mother's qualification,Father's qualification,Mother's occupation,...,Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP,Target
0,1,8,5,2,1,1,1,13,10,6,...,0,0,0,0,0.000000,0,10.8,1.4,1.74,Dropout
1,1,6,1,11,1,1,1,1,3,4,...,0,6,6,6,13.666667,0,13.9,-0.3,0.79,Graduate
2,1,1,5,5,1,1,1,22,27,10,...,0,6,0,0,0.000000,0,10.8,1.4,1.74,Dropout
3,1,8,2,15,1,1,1,23,27,6,...,0,6,10,5,12.400000,0,9.4,-0.8,-3.12,Graduate
4,2,12,1,3,0,1,1,22,28,10,...,0,6,6,6,13.000000,0,13.9,-0.3,0.79,Graduate
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4419,1,1,6,15,1,1,1,1,1,6,...,0,6,8,5,12.666667,0,15.5,2.8,-4.06,Graduate
4420,1,1,2,15,1,1,19,1,1,10,...,0,6,6,2,11.000000,0,11.1,0.6,2.02,Dropout
4421,1,1,1,12,1,1,1,22,27,10,...,0,8,9,1,13.500000,0,13.9,-0.3,0.79,Dropout
4422,1,1,1,9,1,1,1,22,27,8,...,0,5,6,5,12.000000,0,9.4,-0.8,-3.12,Graduate


In [4]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4424 entries, 0 to 4423
Data columns (total 35 columns):
 #   Column                                          Non-Null Count  Dtype  
---  ------                                          --------------  -----  
 0   Marital status                                  4424 non-null   int64  
 1   Application mode                                4424 non-null   int64  
 2   Application order                               4424 non-null   int64  
 3   Course                                          4424 non-null   int64  
 4   Daytime/evening attendance                      4424 non-null   int64  
 5   Previous qualification                          4424 non-null   int64  
 6   Nacionality                                     4424 non-null   int64  
 7   Mother's qualification                          4424 non-null   int64  
 8   Father's qualification                          4424 non-null   int64  
 9   Mother's occupation                      

In [5]:
import plotly.express as px

In [81]:
features = [
    'Target', 'Marital status', 'Application mode', 'Application order', 'Course',
    'Daytime/evening attendance', 'Previous qualification', 'Nacionality',
    "Mother's qualification", "Father's qualification", "Mother's occupation",
    "Father's occupation", 'Displaced', 'Educational special needs', 'Debtor',
    'Tuition fees up to date', 'Gender', 'Scholarship holder', 'Age at enrollment',
    'International', 'Curricular units 1st sem (credited)',
    'Curricular units 1st sem (enrolled)',
    'Curricular units 1st sem (evaluations)',
    'Curricular units 1st sem (approved)', 'Unemployment rate',
    'Inflation rate', 'GDP'
]

# One histogram per feature, with bars grouped by outcome class (Target)
for feature in features:
    fig = px.histogram(dados, x=feature, text_auto=True, color='Target', barmode='group')
    fig.show()
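The histograms show absolute counts per class. A numeric summary of the class distribution complements them: the share of the most frequent class is roughly the accuracy to beat, since the `DummyClassifier` baseline used later (default `strategy='prior'`) always predicts that class. A minimal sketch, with a small hypothetical Series standing in for `dados['Target']`:

```python
import pandas as pd

# Hypothetical stand-in for dados['Target']; on the real data use that column instead
target = pd.Series(['Dropout', 'Graduate', 'Dropout', 'Graduate',
                    'Graduate', 'Enrolled', 'Graduate', 'Dropout'], name='Target')

counts = target.value_counts()                # number of students per outcome
shares = target.value_counts(normalize=True)  # fraction of students per outcome

print(pd.DataFrame({'count': counts, 'share': shares}))
```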


In [33]:
# Separate the features (x) from the target variable (y)
x = dados.drop('Target', axis = 1)
y = dados['Target']

In [34]:
x

,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nacionality,Mother's qualification,Father's qualification,Mother's occupation,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
0,1,8,5,2,1,1,1,13,10,6,...,0,0,0,0,0,0.000000,0,10.8,1.4,1.74
1,1,6,1,11,1,1,1,1,3,4,...,0,0,6,6,6,13.666667,0,13.9,-0.3,0.79
2,1,1,5,5,1,1,1,22,27,10,...,0,0,6,0,0,0.000000,0,10.8,1.4,1.74
3,1,8,2,15,1,1,1,23,27,6,...,0,0,6,10,5,12.400000,0,9.4,-0.8,-3.12
4,2,12,1,3,0,1,1,22,28,10,...,0,0,6,6,6,13.000000,0,13.9,-0.3,0.79
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4419,1,1,6,15,1,1,1,1,1,6,...,0,0,6,8,5,12.666667,0,15.5,2.8,-4.06
4420,1,1,2,15,1,1,19,1,1,10,...,0,0,6,6,2,11.000000,0,11.1,0.6,2.02
4421,1,1,1,12,1,1,1,22,27,10,...,0,0,8,9,1,13.500000,0,13.9,-0.3,0.79
4422,1,1,1,9,1,1,1,22,27,8,...,0,0,5,6,5,12.000000,0,9.4,-0.8,-3.12


In [35]:
y

,Target
0,Dropout
1,Graduate
2,Dropout
3,Graduate
4,Graduate
...,...
4419,Graduate
4420,Dropout
4421,Dropout
4422,Graduate


# **Preprocessing**

In [36]:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

In [37]:
colunas = x.columns
colunas

Index(['Marital status', 'Application mode', 'Application order', 'Course',
       'Daytime/evening attendance', 'Previous qualification', 'Nacionality',
       'Mother's qualification', 'Father's qualification',
       'Mother's occupation', 'Father's occupation', 'Displaced',
       'Educational special needs', 'Debtor', 'Tuition fees up to date',
       'Gender', 'Scholarship holder', 'Age at enrollment', 'International',
       'Curricular units 1st sem (credited)',
       'Curricular units 1st sem (enrolled)',
       'Curricular units 1st sem (evaluations)',
       'Curricular units 1st sem (approved)',
       'Curricular units 1st sem (grade)',
       'Curricular units 1st sem (without evaluations)',
       'Curricular units 2nd sem (credited)',
       'Curricular units 2nd sem (enrolled)',
       'Curricular units 2nd sem (evaluations)',
       'Curricular units 2nd sem (approved)',
       'Curricular units 2nd sem (grade)',
       'Curricular units 2nd sem (without evaluations)

In [38]:
cat_cols = [
    'Marital status', 'Application mode', 'Application order', 'Course',
    'Daytime/evening attendance', 'Previous qualification', 'Nacionality',
    "Mother's qualification", "Father's qualification", "Mother's occupation",
    "Father's occupation", 'Displaced', 'Educational special needs', 'Debtor',
    'Tuition fees up to date', 'Gender', 'Scholarship holder', 'International'
]

num_cols = [
    'Age at enrollment', 'Curricular units 1st sem (credited)',
    'Curricular units 1st sem (enrolled)', 'Curricular units 1st sem (evaluations)',
    'Curricular units 1st sem (approved)', 'Curricular units 1st sem (grade)',
    'Curricular units 1st sem (without evaluations)', 'Curricular units 2nd sem (credited)',
    'Curricular units 2nd sem (enrolled)', 'Curricular units 2nd sem (evaluations)',
    'Curricular units 2nd sem (approved)', 'Curricular units 2nd sem (grade)',
    'Curricular units 2nd sem (without evaluations)', 'Unemployment rate',
    'Inflation rate', 'GDP'
]

In [39]:
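one_hot = make_column_transformer(
    # One-hot encode the categorical columns; binary columns keep a single 0/1 column
    (OneHotEncoder(drop = 'if_binary'), cat_cols),
    remainder='passthrough',  # numeric columns pass through unchanged
    sparse_threshold=0        # always return a dense array
)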

In [40]:
x = one_hot.fit_transform(x)

In [41]:
pd.DataFrame(x, columns = one_hot.get_feature_names_out(colunas))

,onehotencoder__Marital status_1,onehotencoder__Marital status_2,onehotencoder__Marital status_3,onehotencoder__Marital status_4,onehotencoder__Marital status_5,onehotencoder__Marital status_6,onehotencoder__Application mode_1,onehotencoder__Application mode_2,onehotencoder__Application mode_3,onehotencoder__Application mode_4,...,remainder__Curricular units 1st sem (without evaluations),remainder__Curricular units 2nd sem (credited),remainder__Curricular units 2nd sem (enrolled),remainder__Curricular units 2nd sem (evaluations),remainder__Curricular units 2nd sem (approved),remainder__Curricular units 2nd sem (grade),remainder__Curricular units 2nd sem (without evaluations),remainder__Unemployment rate,remainder__Inflation rate,remainder__GDP
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,10.8,1.4,1.74
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,6.0,6.0,6.0,13.666667,0.0,13.9,-0.3,0.79
2,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,6.0,0.0,0.0,0.000000,0.0,10.8,1.4,1.74
3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,6.0,10.0,5.0,12.400000,0.0,9.4,-0.8,-3.12
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,6.0,6.0,6.0,13.000000,0.0,13.9,-0.3,0.79
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4419,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,6.0,8.0,5.0,12.666667,0.0,15.5,2.8,-4.06
4420,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,6.0,6.0,2.0,11.000000,0.0,11.1,0.6,2.02
4421,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,8.0,9.0,1.0,13.500000,0.0,13.9,-0.3,0.79
4422,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,5.0,6.0,5.0,12.000000,0.0,9.4,-0.8,-3.12


In [42]:
from sklearn.preprocessing import LabelEncoder

In [43]:
label_encoder = LabelEncoder()

In [44]:
y = label_encoder.fit_transform(y)

In [45]:
y

array([0, 2, 0, ..., 0, 2, 2])

In [46]:
from sklearn.model_selection import train_test_split

In [47]:
# Stratified split (default test_size=0.25) so both sets keep the class proportions of y
x_treino, x_teste, y_treino, y_teste = train_test_split(x, y, stratify=y, random_state=10)
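With `stratify=y`, `train_test_split` keeps the class proportions of `y` in both subsets, and without `test_size` it holds out 25% of the rows; for the 4,424 students this gives 3,318 training and 1,106 test rows. A quick check on hypothetical imbalanced labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 50% class 2, 30% class 0, 20% class 1
y_demo = np.array([2] * 50 + [0] * 30 + [1] * 20)
x_demo = np.arange(len(y_demo)).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, stratify=y_demo, random_state=10
)

def proportions(labels):
    """Fraction of samples in each class (index = class label)."""
    return np.bincount(labels) / len(labels)

print(len(y_tr), len(y_te))  # default test_size=0.25 gives a 75/25 split
print(proportions(y_tr), proportions(y_te))
```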

# **Modeling**

In [48]:
from sklearn.dummy import DummyClassifier

In [49]:
dummy = DummyClassifier()
dummy.fit(x_treino, y_treino)

dummy.score(x_teste, y_teste)

0.49909584086799275

In [50]:
from sklearn.tree import DecisionTreeClassifier

In [51]:
# No random_state here, so the scores below may vary slightly between runs
tree = DecisionTreeClassifier()
tree.fit(x_treino, y_treino)

In [52]:
tree.predict(x_teste)

array([0, 1, 0, ..., 0, 0, 1])

In [53]:
tree.score(x_teste, y_teste)

0.701627486437613

In [54]:
tree.score(x_treino, y_treino)

1.0
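The gap between training accuracy (1.0) and test accuracy (about 0.70) indicates that the unconstrained tree memorizes the training data. The next cell limits `max_depth` to 10; one way to pick such a value without consulting the test set is to compare candidate depths by cross-validation on the training data. A minimal sketch, with synthetic data from `make_classification` standing in for `x_treino`/`y_treino` and arbitrarily chosen candidate depths:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for x_treino / y_treino
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=10)

results = {}
for depth in [3, 5, 10, None]:  # None = grow until every leaf is pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=10)
    train_acc = model.fit(X, y).score(X, y)
    # 5-fold cross-validated accuracy, computed on the training data only
    cv_acc = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    results[depth] = (train_acc, cv_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}  cv={cv_acc:.3f}")

best_depth = max(results, key=lambda d: results[d][1])
print("max_depth with the best CV accuracy:", best_depth)
```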

In [55]:
# Limiting the depth reduces the overfitting seen above (training accuracy of 1.0)
tree = DecisionTreeClassifier(random_state = 10, max_depth = 10)
tree.fit(x_treino, y_treino)

In [56]:
tree.score(x_treino, y_treino)

0.8755274261603375

In [57]:
tree.score(x_teste, y_teste)

0.7296564195298373

In [58]:
from sklearn.ensemble import RandomForestClassifier

In [59]:
rf = RandomForestClassifier(random_state=10, n_estimators=100, max_depth=10)
rf.fit(x_treino, y_treino)

In [60]:
rf.score(x_treino, y_treino)

0.8484026522001206

In [61]:
rf.score(x_teste, y_teste)

0.7522603978300181
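Besides accuracy, a fitted `RandomForestClassifier` exposes impurity-based `feature_importances_`, which can help relate the model back to the retention factors the project set out to identify. A sketch on synthetic data with hypothetical feature names; on the real model, the names would come from `one_hot.get_feature_names_out()`. Impurity-based importances tend to favor features with many distinct values, so treat them as indicative rather than conclusive.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical feature names
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_classes=3, random_state=10)
names = [f"feature_{i}" for i in range(X.shape[1])]

rf_demo = RandomForestClassifier(random_state=10, n_estimators=100, max_depth=10)
rf_demo.fit(X, y)

# Mean decrease in impurity per feature, normalized to sum to 1
importances = pd.Series(rf_demo.feature_importances_, index=names)
print(importances.sort_values(ascending=False))
```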

In [62]:
from sklearn.preprocessing import MinMaxScaler

In [63]:
normalizacao = MinMaxScaler()
# Fit the scaler on the training set only; the test set reuses the same min/max later
x_treino_normalizado = normalizacao.fit_transform(x_treino)

In [64]:
pd.DataFrame(x_treino_normalizado)

,0,1,2,3,4,5,6,7,8,9,...,242,243,244,245,246,247,248,249,250,251
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.217391,0.333333,0.15,0.584615,0.0,0.372093,0.488889,0.766182
1,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.000000,0.304348,0.303030,0.25,0.662308,0.0,0.558140,0.288889,0.772787
2,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.000000,0.260870,0.242424,0.25,0.613846,0.0,0.558140,0.288889,0.772787
3,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.000000,0.260870,0.181818,0.30,0.691026,0.0,0.209302,0.000000,0.124174
4,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.000000,0.260870,0.272727,0.25,0.753846,0.0,0.151163,0.488889,1.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3313,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.000000,0.260870,0.212121,0.30,0.761538,0.0,0.732558,0.111111,0.640687
3314,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.000000,0.217391,0.212121,0.25,0.635385,0.0,0.000000,0.755556,0.578600
3315,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.217391,0.242424,0.25,0.646154,0.0,0.732558,0.111111,0.640687
3316,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.304348,0.212121,0.30,0.644359,0.0,0.000000,0.755556,0.578600


In [65]:
from sklearn.neighbors import KNeighborsClassifier

In [66]:
knn = KNeighborsClassifier()

In [67]:
knn.fit(x_treino_normalizado, y_treino)

In [68]:
x_teste_normalizado = normalizacao.transform(x_teste)

In [69]:
knn.score(x_teste_normalizado, y_teste)

0.5913200723327305
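KNN classifies by distance, so columns with large ranges (for example `Age at enrollment`) would otherwise outweigh the 0/1 one-hot columns; this is presumably why the features are min-max scaled before fitting it. `MinMaxScaler` learns each column's minimum and maximum from the data passed to `fit` and maps values with `(x - min) / (max - min)`. Because the test set is transformed with the training statistics, test values can fall outside [0, 1]. A toy illustration with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training and test values for two numeric columns
x_train = np.array([[0.0, 10.0],
                    [5.0, 20.0],
                    [10.0, 30.0]])
x_test = np.array([[20.0, 40.0]])

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learns min/max from training data only
x_test_scaled = scaler.transform(x_test)        # reuses the training min/max

print(x_train_scaled)  # each column mapped to 0, 0.5, 1
print(x_test_scaled)   # 2.0 and 1.5: above 1 because the test values exceed the training max
```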

# **Evaluation**

In [70]:
print(f"Dummy: {dummy.score(x_teste, y_teste)}")
print(f"Decision Tree: {tree.score(x_teste, y_teste)}")
print(f"Random Forest: {rf.score(x_teste, y_teste)}")
print(f"KNN: {knn.score(x_teste_normalizado, y_teste)}")

Dummy: 0.49909584086799275
Decision Tree: 0.7296564195298373
Random Forest: 0.7522603978300181
KNN: 0.5913200723327305
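Accuracy treats every student equally, so a model can score well while rarely predicting a less frequent class such as `Enrolled`. Per-class recall, the confusion matrix, and `balanced_accuracy_score` (the mean recall over classes) expose this. On the notebook's models, one would pass `y_teste` and, for example, `rf.predict(x_teste)`; the labels below are invented purely to show the effect:

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             classification_report, confusion_matrix)

# Invented labels: 0 = Dropout, 1 = Enrolled, 2 = Graduate
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]  # the only Enrolled student is misclassified

acc = accuracy_score(y_true, y_pred)               # 5 correct out of 6
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean recall = (1 + 0 + 1) / 3
cm = confusion_matrix(y_true, y_pred)              # rows = true class, columns = predicted

print(f"accuracy={acc:.3f}  balanced accuracy={bal_acc:.3f}")
print(cm)
print(classification_report(y_true, y_pred,
                            target_names=['Dropout', 'Enrolled', 'Graduate'],
                            zero_division=0))
```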


# **Export**

In [71]:
import pickle

In [72]:
with open('modelo_onehotenc.pkl', 'wb') as arquivo:
    pickle.dump(one_hot, arquivo)

In [73]:
with open('modelo_random_forest.pkl', 'wb') as arquivo:
    pickle.dump(rf, arquivo)
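Only the column transformer and the random forest are saved here; the `LabelEncoder` needed to turn predictions back into class names is not, so the last cell of the notebook reads `label_encoder.classes_` from the running session. One option is to pickle all three fitted objects together. A sketch with small stand-in objects; the file name `artifacts.pkl` and the dictionary keys are hypothetical:

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Tiny stand-in training data (hypothetical values)
x_demo = pd.DataFrame({'Course': [2, 5, 9, 2, 5, 9],
                       'Age at enrollment': [20, 19, 45, 22, 18, 30]})
y_demo = ['Dropout', 'Graduate', 'Enrolled', 'Dropout', 'Graduate', 'Graduate']

encoder = make_column_transformer((OneHotEncoder(), ['Course']),
                                  remainder='passthrough', sparse_threshold=0)
labels = LabelEncoder()
model = RandomForestClassifier(random_state=10, n_estimators=10)
model.fit(encoder.fit_transform(x_demo), labels.fit_transform(y_demo))

# Save every fitted object needed at prediction time in a single file
path = os.path.join(tempfile.mkdtemp(), 'artifacts.pkl')  # hypothetical file name
with open(path, 'wb') as f:
    pickle.dump({'encoder': encoder, 'model': model, 'label_encoder': labels}, f)

# Later, possibly in another session: load everything and predict a class name
with open(path, 'rb') as f:
    artifacts = pickle.load(f)

pred = artifacts['model'].predict(artifacts['encoder'].transform(x_demo.head(1)))
print(artifacts['label_encoder'].inverse_transform(pred))
```

Unpickling can execute arbitrary code, so only load `.pkl` files from sources you trust.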

In [74]:
dados.columns

Index(['Marital status', 'Application mode', 'Application order', 'Course',
       'Daytime/evening attendance', 'Previous qualification', 'Nacionality',
       'Mother's qualification', 'Father's qualification',
       'Mother's occupation', 'Father's occupation', 'Displaced',
       'Educational special needs', 'Debtor', 'Tuition fees up to date',
       'Gender', 'Scholarship holder', 'Age at enrollment', 'International',
       'Curricular units 1st sem (credited)',
       'Curricular units 1st sem (enrolled)',
       'Curricular units 1st sem (evaluations)',
       'Curricular units 1st sem (approved)',
       'Curricular units 1st sem (grade)',
       'Curricular units 1st sem (without evaluations)',
       'Curricular units 2nd sem (credited)',
       'Curricular units 2nd sem (enrolled)',
       'Curricular units 2nd sem (evaluations)',
       'Curricular units 2nd sem (approved)',
       'Curricular units 2nd sem (grade)',
       'Curricular units 2nd sem (without evaluations)

In [75]:
dados[dados.columns[15:]]

,Gender,Scholarship holder,Age at enrollment,International,Curricular units 1st sem (credited),Curricular units 1st sem (enrolled),Curricular units 1st sem (evaluations),Curricular units 1st sem (approved),Curricular units 1st sem (grade),Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP,Target
0,1,0,20,0,0,0,0,0,0.000000,0,0,0,0,0,0.000000,0,10.8,1.4,1.74,Dropout
1,1,0,19,0,0,6,6,6,14.000000,0,0,6,6,6,13.666667,0,13.9,-0.3,0.79,Graduate
2,1,0,19,0,0,6,0,0,0.000000,0,0,6,0,0,0.000000,0,10.8,1.4,1.74,Dropout
3,0,0,20,0,0,6,8,6,13.428571,0,0,6,10,5,12.400000,0,9.4,-0.8,-3.12,Graduate
4,0,0,45,0,0,6,9,5,12.333333,0,0,6,6,6,13.000000,0,13.9,-0.3,0.79,Graduate
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4419,1,0,19,0,0,6,7,5,13.600000,0,0,6,8,5,12.666667,0,15.5,2.8,-4.06,Graduate
4420,0,0,18,1,0,6,6,6,12.000000,0,0,6,6,2,11.000000,0,11.1,0.6,2.02,Dropout
4421,0,1,30,0,0,7,8,7,14.912500,0,0,8,9,1,13.500000,0,13.9,-0.3,0.79,Dropout
4422,0,1,20,0,0,5,5,5,13.800000,0,0,5,6,5,12.000000,0,9.4,-0.8,-3.12,Graduate



## Prediction for a New Student

After training and evaluating the models, we **tested the solution in practice** by applying the `RandomForestClassifier` model to a **new student profile**.

### Steps:

1. We created a **new record** with the characteristics of a hypothetical student.
2. We applied the previously saved **preprocessing transformer** (the column transformer wrapping the `OneHotEncoder`) so that the input has exactly the same format used during training.
3. We used the saved `RandomForestClassifier` model to make the **prediction**.

The model returns an integer code for the predicted class. Because `LabelEncoder` sorts the class names alphabetically, the codes map to the following categories:

- `0` → Dropout  
- `1` → Enrolled  
- `2` → Graduate  

In this example the prediction was `2`, meaning that, according to the model, this student **is likely to graduate (Graduate)**.
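The prediction is only an integer code, and "likely" can be quantified. A sketch of decoding the code with the fitted `LabelEncoder` and of reading `predict_proba` to see how the forest's votes split across the three outcomes. The data, the feature name `approved_2nd_sem`, and the model settings below are made up; in the notebook, `label_encoder` and `modelo_random_forest` would play these roles:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Tiny made-up training set with a single hypothetical feature
x_demo = pd.DataFrame({'approved_2nd_sem': [0, 1, 0, 3, 4, 6, 5, 6]})
y_names = ['Dropout', 'Dropout', 'Dropout', 'Enrolled',
           'Enrolled', 'Graduate', 'Graduate', 'Graduate']

label_encoder_demo = LabelEncoder()
y_demo = label_encoder_demo.fit_transform(y_names)  # Dropout=0, Enrolled=1, Graduate=2

model = RandomForestClassifier(random_state=10, n_estimators=50).fit(x_demo, y_demo)

new_student = pd.DataFrame({'approved_2nd_sem': [6]})
code = model.predict(new_student)

# Decode the integer prediction back to the class name
print(label_encoder_demo.inverse_transform(code))

# Probability of each outcome, labelled with the decoded class names
proba = pd.Series(model.predict_proba(new_student)[0], index=label_encoder_demo.classes_)
print(proba)
```

Because the model was trained on the codes 0, 1, 2, the columns of `predict_proba` follow the same order as `label_encoder_demo.classes_`, which is what makes the labelling above valid.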



In [77]:
novo_dado = {
    'Marital status': [2],
    'Application mode': [8],
    'Application order': [1],
    'Course': [9],
    'Daytime/evening attendance': [1],
    'Previous qualification': [1],
    'Nacionality': [1],
    "Mother's qualification": [3],
    "Father's qualification": [24],
    "Mother's occupation": [8],
    "Father's occupation": [4],
    'Displaced': [1],
    'Educational special needs': [0],
    'Debtor': [0],
    'Tuition fees up to date': [1],
    'Gender': [1],
    'Scholarship holder': [0],
    'Age at enrollment': [20],
    'International': [0],
    'Curricular units 1st sem (credited)': [0],
    'Curricular units 1st sem (enrolled)': [6],
    'Curricular units 1st sem (evaluations)': [6],
    'Curricular units 1st sem (approved)': [5],
    'Curricular units 1st sem (grade)': [12],
    'Curricular units 1st sem (without evaluations)': [0],
    'Curricular units 2nd sem (credited)': [0],
    'Curricular units 2nd sem (enrolled)': [5],
    'Curricular units 2nd sem (evaluations)': [9],
    'Curricular units 2nd sem (approved)': [5],
    'Curricular units 2nd sem (grade)': [11],
    'Curricular units 2nd sem (without evaluations)': [0],
    'Unemployment rate': [8.9],
    'Inflation rate': [1.4],
    'GDP': [0.29]
}

In [78]:
novo_dado = pd.DataFrame(novo_dado)
novo_dado

,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nacionality,Mother's qualification,Father's qualification,Mother's occupation,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
0,2,8,1,9,1,1,1,3,24,8,...,0,0,5,9,5,11,0,8.9,1.4,0.29


In [79]:
with open('/content/modelo_onehotenc.pkl', 'rb') as arquivo:
    modelo_onehot = pickle.load(arquivo)
with open('/content/modelo_random_forest.pkl', 'rb') as arquivo:
    modelo_random_forest = pickle.load(arquivo)

In [80]:
novo_dado = modelo_onehot.transform(novo_dado)
modelo_random_forest.predict(novo_dado)

array([2])

In [82]:
label_encoder.classes_

array(['Dropout', 'Enrolled', 'Graduate'], dtype=object)