# Student Grades Prediction with Machine Learning

In graduate studies, many students find it difficult to achieve good grades because they do not get much support in higher education courses compared to the support that students receive in schools. We can use machine learning for the student grades prediction task so that instructors can help students prepare for topics where student grades were predicted low. In this article, I will walk you through the task of student grades prediction with Machine Learning using Python.

## Student Grades Prediction

Universities are very prestigious places to access higher education. But the amount of fees universities charge today never equals the support they give to students. Some students need a lot of attention from instructors because if special attention will not be given to those students who are not getting good grades, it could be detrimental to their emotional state and their career in the long run.

By using machine learning algorithms, we can predict how well the students are going to perform so that we can help the students whose grades are predicted low. Student Grades Prediction is based on the problem of regression in machine learning. In the section below, I will take you through the task of Student Grades prediction with machine learning using Python.

## Student Grades Prediction using Python

I hope you now have understood why we need to predict the grades of a student. Now let’s see how we can use machine learning for the task of student grades prediction using Python. I will start this task by importing the necessary Python libraries and the datase

## Import Necessary Library

In [77]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import cv2

from sklearn.utils import shuffle

%matplotlib inline

## Data Loading

In [78]:
data = pd.read_csv(r".\\data\student_mat.csv")
data.head()

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,...,4,3,4,1,1,3,6,5,6,6
1,GP,F,17,U,GT3,T,1,1,at_home,other,...,5,3,3,1,1,3,4,5,5,6
2,GP,F,15,U,LE3,T,1,1,at_home,other,...,4,3,2,2,3,3,10,7,8,10
3,GP,F,15,U,GT3,T,4,2,health,services,...,3,2,2,1,1,5,2,15,14,15
4,GP,F,16,U,GT3,T,3,3,other,other,...,4,3,2,1,2,5,4,6,10,10


In [79]:
print(data.head())

  school sex  age address famsize Pstatus  Medu  Fedu     Mjob      Fjob  ...  \
0     GP   F   18       U     GT3       A     4     4  at_home   teacher  ...   
1     GP   F   17       U     GT3       T     1     1  at_home     other  ...   
2     GP   F   15       U     LE3       T     1     1  at_home     other  ...   
3     GP   F   15       U     GT3       T     4     2   health  services  ...   
4     GP   F   16       U     GT3       T     3     3    other     other  ...   

  famrel freetime  goout  Dalc  Walc health absences  G1  G2  G3  
0      4        3      4     1     1      3        6   5   6   6  
1      5        3      3     1     1      3        4   5   5   6  
2      4        3      2     2     3      3       10   7   8  10  
3      3        2      2     1     1      5        2  15  14  15  
4      4        3      2     1     2      5        4   6  10  10  

[5 rows x 33 columns]


## Data Pre_Processing

In [80]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 395 entries, 0 to 394
Data columns (total 33 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   school      395 non-null    object
 1   sex         395 non-null    object
 2   age         395 non-null    int64 
 3   address     395 non-null    object
 4   famsize     395 non-null    object
 5   Pstatus     395 non-null    object
 6   Medu        395 non-null    int64 
 7   Fedu        395 non-null    int64 
 8   Mjob        395 non-null    object
 9   Fjob        395 non-null    object
 10  reason      395 non-null    object
 11  guardian    395 non-null    object
 12  traveltime  395 non-null    int64 
 13  studytime   395 non-null    int64 
 14  failures    395 non-null    int64 
 15  schoolsup   395 non-null    object
 16  famsup      395 non-null    object
 17  paid        395 non-null    object
 18  activities  395 non-null    object
 19  nursery     395 non-null    object
 20  higher    

In [81]:
data.describe()

Unnamed: 0,age,Medu,Fedu,traveltime,studytime,failures,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
count,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0,395.0
mean,16.696203,2.749367,2.521519,1.448101,2.035443,0.334177,3.944304,3.235443,3.108861,1.481013,2.291139,3.55443,5.708861,10.908861,10.713924,10.41519
std,1.276043,1.094735,1.088201,0.697505,0.83924,0.743651,0.896659,0.998862,1.113278,0.890741,1.287897,1.390303,8.003096,3.319195,3.761505,4.581443
min,15.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,3.0,0.0,0.0
25%,16.0,2.0,2.0,1.0,1.0,0.0,4.0,3.0,2.0,1.0,1.0,3.0,0.0,8.0,9.0,8.0
50%,17.0,3.0,2.0,1.0,2.0,0.0,4.0,3.0,3.0,1.0,2.0,4.0,4.0,11.0,11.0,11.0
75%,18.0,4.0,3.0,2.0,2.0,0.0,5.0,4.0,4.0,2.0,3.0,5.0,8.0,13.0,13.0,14.0
max,22.0,4.0,4.0,4.0,4.0,3.0,5.0,5.0,5.0,5.0,5.0,5.0,75.0,19.0,19.0,20.0


In [82]:
print(data.describe())

              age        Medu        Fedu  traveltime   studytime    failures  \
count  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000   
mean    16.696203    2.749367    2.521519    1.448101    2.035443    0.334177   
std      1.276043    1.094735    1.088201    0.697505    0.839240    0.743651   
min     15.000000    0.000000    0.000000    1.000000    1.000000    0.000000   
25%     16.000000    2.000000    2.000000    1.000000    1.000000    0.000000   
50%     17.000000    3.000000    2.000000    1.000000    2.000000    0.000000   
75%     18.000000    4.000000    3.000000    2.000000    2.000000    0.000000   
max     22.000000    4.000000    4.000000    4.000000    4.000000    3.000000   

           famrel    freetime       goout        Dalc        Walc      health  \
count  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000   
mean     3.944304    3.235443    3.108861    1.481013    2.291139    3.554430   
std      0.896659    0.9988

In [83]:
data.columns

Index(['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu', 'Fedu',
       'Mjob', 'Fjob', 'reason', 'guardian', 'traveltime', 'studytime',
       'failures', 'schoolsup', 'famsup', 'paid', 'activities', 'nursery',
       'higher', 'internet', 'romantic', 'famrel', 'freetime', 'goout', 'Dalc',
       'Walc', 'health', 'absences', 'G1', 'G2', 'G3'],
      dtype='object')

The dataset that I am using for the task of students grade prediction is based on the achievements of the students of the Portuguese schools. In this dataset the G1 represents the grades of the first period, G2 represents the grades of the second period, and G3 represents the final grades. Now let’s prepare the data and let’s see how we can predict the final grades of the students

## Feature Selection

In [84]:
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
predict = "G3"
featute = np.array(data.drop([predict], 1))
target = np.array(data[predict])


In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.



In the above code, I first selected the necessary columns that we need to train a machine learning model for the task of student grades prediction. Then I declared that the G3 column is our target label and then I split the dataset into 20% testing and 80% training. Now let’s see how to train a linear regression model for the task of student grades prediction

## Spliting Data

In [85]:
from sklearn.model_selection import train_test_split
xtest, xtrain, ytest, ytrain = train_test_split(featute, target, test_size=0.2)


## Model Selection

In [86]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(xtrain, ytrain)


## Model accuracy

In [87]:
accuracy = model.score(xtest, ytest)
print(accuracy)


0.8000650499638657


The linear regression model gave an accuracy of about 80% which is not bad in this task. Now let’s have a look at the predictions made by the students’ grade prediction model

## Model Predictions

In [89]:
predictions = model.predict(xtest)
for i in range(len(predictions)):
    print(predictions[featute], xtest[featute], [ytest[featute]])


[[ 9.93946911 18.30489836 19.26647124 13.95937577 18.30489836]
 [ 9.93946911  9.93946911 19.26647124 13.95937577 15.22365599]
 [10.22878063  7.00303904 19.26647124 15.05323604 18.12038462]
 ...
 [18.12038462  7.00303904  7.93046838 15.05323604 15.05323604]
 [ 8.81109923 19.24939947  7.93046838 13.95937577 13.95937577]
 [ 7.00303904  9.06785493  7.93046838 13.95937577  9.93946911]] [[[ 8 10  1  0  4]
  [18 18  1  0  8]
  [18 19  1  0 10]
  [15 14  3  2  4]
  [18 18  1  0  8]]

 [[ 8 10  1  0  4]
  [ 8 10  1  0  4]
  [18 19  1  0 10]
  [15 14  3  2  4]
  [16 15  2  0 10]]

 [[13 10  2  1 22]
  [ 8  7  2  0  6]
  [18 19  1  0 10]
  [14 15  2  0  4]
  [18 18  3  0  5]]

 ...

 [[18 18  3  0  5]
  [ 8  7  2  0  6]
  [ 8  8  2  0  0]
  [14 15  2  0  4]
  [14 15  2  0  4]]

 [[ 7  9  2  0  0]
  [18 19  1  0  6]
  [ 8  8  2  0  0]
  [15 14  3  2  4]
  [15 14  3  2  4]]

 [[ 8  7  2  0  6]
  [ 8  9  1  1 38]
  [ 8  8  2  0  0]
  [15 14  3  2  4]
  [ 8 10  1  0  4]]] [array([[10, 18, 19, 15, 18]

## Summary

So this is how you can train a linear regression model for the task of students grade prediction with machine learning using Python. You can do a lot more with this dataset, you can find the complete information about this dataset from here. I hope you liked this article on Students Grade prediction with machine learning using Python. Feel free to ask your valuable questions in the comments section below.

# Sheikh Rasel Ahmed

#### Data Science || Machine Learning || Deep Learning || Artificial Intelligence Enthusiast

#### LinkedIn - https://www.linkedin.com/in/shekhnirob1

#### GitHub - https://github.com/Rasel1435

#### Behance - https://www.behance.net/Shekhrasel2513