In graduate studies, many students find it difficult to achieve good grades because they receive much less support in higher education courses than students receive in school. We can use machine learning to predict student grades so that instructors can help students prepare for the topics where their predicted grades are low.

In this file, we will walk through student grades prediction with Machine Learning using Python.

### Student Grades Prediction

Universities are prestigious places to access higher education, but the fees they charge today are rarely matched by the support they give to students. Some students need a lot of attention from instructors: if students who are not getting good grades do not receive special attention, it can harm their emotional state and, in the long run, their careers.

By using machine learning algorithms, we can predict how well students are going to perform so that we can help those whose grades are predicted to be low.

Student grades prediction is a regression problem in machine learning: the target, the final grade, is a numeric value.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.utils import shuffle

In [2]:
data = pd.read_csv("student-mat.csv")
data.head()

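| Unnamed: 0 | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |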


The dataset we are using records the achievements of students at Portuguese schools. In this dataset:
* `G1` represents the grades of the first period, 
* `G2` represents the grades of the second period, and
* `G3` represents the final grades. 

Now let’s prepare the data and predict the final grades of the students:

In [3]:
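# Keep the two period grades, the target, and three other numeric columns
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
predict = "G3"  # target column: the final grade
# Use the columns= keyword; the positional axis argument no longer works in recent pandas
x = np.array(data.drop(columns=[predict]))  # features
y = np.array(data[predict])  # labels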

In [4]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2)

In the code above, we first selected the columns needed to train a machine learning model. We then declared the `G3` column as our target label and split the dataset into 80% training data and 20% testing data.
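The 80/20 split described above can be checked directly from the array shapes. The following is a minimal sketch on made-up random data rather than on `student-mat.csv`; the `_demo` names and the `random_state=42` argument are assumptions added for illustration. The original call does not fix a seed, so its split, and therefore its score, changes from run to run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the five feature columns and the G3 target (made-up values).
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 21, size=(100, 5))
y_demo = rng.integers(0, 21, size=100)

# test_size=0.2 holds out 20% of the rows for testing; random_state (an
# addition in this sketch) makes the shuffle reproducible between runs.
xtrain_demo, xtest_demo, ytrain_demo, ytest_demo = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)

print(xtrain_demo.shape, xtest_demo.shape)  # (80, 5) (20, 5)
```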

In [5]:
linear_regression = LinearRegression()
linear_regression.fit(xtrain, ytrain)
# For a regressor, score() returns R² (the coefficient of determination) on the test set
accuracy = linear_regression.score(xtest, ytest)
accuracy

0.810773184285705
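For a regressor such as `LinearRegression`, `score` returns the coefficient of determination, R², not a classification accuracy. As a minimal sketch on made-up data (the `_demo` arrays and the coefficients used to generate them are assumptions, not values from the student dataset), R² can be computed by hand and compared with what scikit-learn reports:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up data: the target is a noisy linear function of two features.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * x_demo[:, 0] - 2.0 * x_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Fit on the first 150 rows and evaluate on the remaining 50.
model = LinearRegression().fit(x_demo[:150], y_demo[:150])
y_true = y_demo[150:]
y_pred = model.predict(x_demo[150:])

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# All three values agree up to floating-point error.
print(r2_manual, model.score(x_demo[150:], y_true), r2_score(y_true, y_pred))

# Mean absolute error is expressed in the same units as the target.
print(mean_absolute_error(y_true, y_pred))
```

An R² of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean. On the grade data, a mean absolute error would read directly as an average number of grade points.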

The linear regression model gave an R² score (the value returned by `score`) of about 0.81 on the test set, which is not bad. Now let’s have a look at the predictions made by the student grades prediction model:

In [6]:
predictions = linear_regression.predict(xtest)
# Print each predicted grade next to its feature row and the actual final grade
for i in range(len(predictions)):
    print(predictions[i], xtest[i], ytest[i])

Each printed line shows the predicted final grade, the feature values (`G1`, `G2`, `studytime`, `failures`, `absences`), and the actual final grade for one student in the test set.

This is how we can train a linear regression model to predict student grades. We can do a lot more with this dataset; complete information about it is available [here](https://archive.ics.uci.edu/ml/datasets/Student+Performance).
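As one example of going further, the predictions can be collected into a table and filtered to find students whose predicted grade is low, which is the use case that motivated this walkthrough. This is only a sketch under stated assumptions: the data is randomly generated rather than read from `student-mat.csv`, only two feature columns are used, and `PASS_MARK = 10` is a hypothetical threshold chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up grades on a 0-20 scale: G2 stays close to G1, and G3 stays close to G2.
rng = np.random.default_rng(1)
g1 = rng.integers(3, 20, size=60)
g2 = np.clip(g1 + rng.integers(-2, 3, size=60), 0, 20)
g3 = np.clip(g2 + rng.integers(-1, 2, size=60), 0, 20)
features = pd.DataFrame({"G1": g1, "G2": g2})

# Train on the first 45 rows and predict the remaining 15.
model = LinearRegression().fit(features.iloc[:45], g3[:45])
report = features.iloc[45:].copy()
report["actual_G3"] = g3[45:]
report["predicted_G3"] = model.predict(features.iloc[45:]).round(1)

# Hypothetical threshold: flag students predicted to score below it.
PASS_MARK = 10
at_risk = report[report["predicted_G3"] < PASS_MARK].sort_values("predicted_G3")
print(at_risk)
```

Sorting by the predicted grade puts the students who may need the most support at the top of the table.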