Mentoring has been identified as an essential component of academic and professional development.

The purpose of this notebook is to better understand the mentoring experiences of graduate students enrolled in college.

In this notebook, data analysis is used to provide personalized mentoring recommendations to students based on their profiles.

The dataset used in this exercise is based on the data at https://github.com/maratonadev-br/desafio-2-2020/tree/master/Assets/Data.

In [None]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.utils import class_weight
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.metrics import classification_report

Features (one row per student):

Student enrollment number

Name

Number of times the student failed Business Law (REPROVACOES_DE)

Number of times the student failed Entrepreneurship (REPROVACOES_EM)

Number of times the student failed Financial Mathematics (REPROVACOES_MF)

Number of times the student failed Operational Management (REPROVACOES_GO)

Student's average grade in Business Law (NOTA_DE)

Student's average grade in Entrepreneurship (NOTA_EM)

Student's average grade in Financial Mathematics (NOTA_MF)

Student's average grade in Operational Management (NOTA_GO)

English: binary variable indicating whether the student knows the English language (0 = No, 1 = Yes)

Hours of in-person classes attended by the student (H_AULA_PRES)

Number of tasks the student submitted online (TAREFAS_ONLINE)

Number of absences recorded for the student (FALTAS)

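Target variable (categorical):

Profile (PERFIL), with five classes:

Excellent (EXCELENTE)

Very Good (MUITO_BOM)

Humanities (HUMANAS)

Exact Sciences (EXATAS)

Difficulty (DIFICULDADE)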
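
The target column stores Portuguese labels, so a small lookup table can help when reading the model output. The sketch below is illustrative only: the label spellings are the ones listed in the comment of the confusion-matrix plotting cell, the English glosses are an assumed reading of them, and `PROFILE_NAMES` and the toy `perfil` series are hypothetical names standing in for `df['PERFIL']`.

```python
import pandas as pd

# Assumed correspondence between the dataset labels and English glosses
PROFILE_NAMES = {
    "EXCELENTE": "Excellent",
    "MUITO_BOM": "Very Good",
    "HUMANAS": "Humanities",
    "EXATAS": "Exact Sciences",
    "DIFICULDADE": "Difficulty",
}

# Toy labels standing in for df['PERFIL'] (not real data)
perfil = pd.Series(["EXATAS", "DIFICULDADE", "MUITO_BOM"])

# Series.map looks each label up in the dictionary
perfil_en = perfil.map(PROFILE_NAMES)
print(perfil_en.tolist())
```

Applying the same `.map(PROFILE_NAMES)` call to the real column would leave `NaN` wherever a label is missing from the dictionary, which makes unexpected labels easy to spot.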

In [None]:
df = pd.read_csv('data/dataset.csv')

# Exploratory Data Analysis 

In [None]:
df.info()

In [None]:
df.loc[:,['REPROVACOES_DE','REPROVACOES_EM','REPROVACOES_MF','REPROVACOES_GO',
         'NOTA_DE','NOTA_EM','NOTA_MF','NOTA_GO','H_AULA_PRES', 'TAREFAS_ONLINE', 
         'FALTAS']].describe()

In [None]:
# Create a histogram for each grade column
inputs = ['NOTA_DE','NOTA_EM','NOTA_MF','NOTA_GO']
for col in inputs:
    df.hist(column=col, bins=16, figsize=(6,6), color='#1aa3ff')
    plt.title(col)
    plt.tight_layout()  # apply the layout to every figure, not only the last one
plt.show()

Handling missing values

In [None]:
#Checking missing values
df.isnull().sum()

In [None]:
#Replacing missing values
df.fillna(0,inplace=True)

In [None]:
df.shape

In [None]:
df.duplicated().sum()

In [None]:
sns.set_theme(style="whitegrid")
sns.barplot(x='PERFIL', y='NOTA_MF', data=df, color='#1aa3ff')
plt.xlabel('PERFIL')
plt.ylabel('NOTA_MF')
plt.xticks(rotation = 45)
plt.tight_layout()
plt.show()

In [None]:
df['PERFIL'].unique()

In [None]:
df['PERFIL'].value_counts(ascending=True)

In [None]:
#Define X and y
features  = ['REPROVACOES_DE','REPROVACOES_EM','REPROVACOES_MF','REPROVACOES_GO',
            'NOTA_DE','NOTA_EM','NOTA_MF','NOTA_GO','H_AULA_PRES', 'TAREFAS_ONLINE', 'FALTAS']
X = df[features]
y = df.PERFIL

In [None]:
X.head()

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.20, random_state=0, stratify=y)
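
Passing `stratify=y` asks `train_test_split` to keep the class proportions of the target the same in both partitions, which matters when some profiles are rare. A minimal sketch on toy data (the `X_toy` and `y_toy` names and the 80/20 class mix are assumptions made for illustration, not taken from the dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced target: 80 'A' labels and 20 'B' labels (hypothetical data)
y_toy = pd.Series(["A"] * 80 + ["B"] * 20)
X_toy = pd.DataFrame({"x": range(100)})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=0, stratify=y_toy
)

# Both partitions keep the 80/20 class ratio of the full target
print(y_tr.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```

Without `stratify`, a random split could leave a small class under-represented in the test set, making its per-class metrics unreliable.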

In [None]:
X_train.shape

In [None]:
X_test.shape

In [None]:
clf = BalancedRandomForestClassifier(class_weight='balanced', 
                                      max_depth=20, 
                                      random_state=1, 
                                      replacement=True)

In [None]:
clf.fit(X_train,y_train)

In [None]:
# Accuracy on the training data: an optimistic figure, not a measure of generalization
score = clf.score(X_train, y_train)
print(score)
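
Training accuracy tends to overstate how well a random forest generalizes. The notebook imports `cross_val_score` but does not call it; the sketch below shows one way it could be used. It runs on synthetic data from `make_classification` and uses scikit-learn's `RandomForestClassifier` as a stand-in for `BalancedRandomForestClassifier`, so `X_demo`, `y_demo`, and `model` are illustrative assumptions rather than the notebook's own objects.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X_train / y_train (not the notebook's data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=11, n_informative=6,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)

# Stand-in model; the notebook itself uses imblearn's BalancedRandomForestClassifier
model = RandomForestClassifier(
    n_estimators=50, max_depth=20, class_weight="balanced", random_state=1
)

# 5-fold cross-validation: each fold is scored on samples held out from training
cv_scores = cross_val_score(model, X_demo, y_demo, cv=5)
print(cv_scores.mean(), cv_scores.std())
```

In the notebook, the equivalent call would presumably be `cross_val_score(clf, X_train, y_train, cv=5)`, keeping the test set untouched for the final evaluation.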

In [None]:
y_pred = clf.predict(X_test)
print('Predictions : \n',y_pred[0:10])

In [None]:
print('True Labels: \n', y_test[0:10])

In [None]:
conf_matrix = confusion_matrix(y_test, y_pred)
print(conf_matrix)
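
When reading the printed matrix, rows are the true classes and columns are the predicted classes, both in sorted label order unless `labels` is passed explicitly. A small sketch with made-up labels (the toy lists below are hypothetical and only reuse three of the profile names):

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical), reusing three of the profile names
y_true_toy = ["EXATAS", "HUMANAS", "EXATAS", "DIFICULDADE"]
y_pred_toy = ["EXATAS", "EXATAS", "EXATAS", "DIFICULDADE"]

# Explicit row/column order; this matches the default sorted order
labels = ["DIFICULDADE", "EXATAS", "HUMANAS"]
cm_toy = confusion_matrix(y_true_toy, y_pred_toy, labels=labels)

# Row i, column j counts samples whose true class is labels[i]
# and whose predicted class is labels[j]
print(cm_toy)
```

Here the single `HUMANAS` sample was predicted as `EXATAS`, so it shows up in the third row, second column.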

In [25]:
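# Class names in the dataset: 'EXATAS', 'HUMANAS', 'DIFICULDADE', 'MUITO_BOM', 'EXCELENTE'
# Tick labels must be the class names, in the same sorted order confusion_matrix uses
from sklearn.utils.multiclass import unique_labels
classes = unique_labels(y_test, y_pred)
plt.imshow(conf_matrix, interpolation="nearest", cmap=plt.cm.Blues)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
plt.xlabel("Predicted Profile")
plt.ylabel("Actual Profile")
plt.show()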

In [None]:
print(classification_report(y_test, y_pred))