# **Prediction using Supervised Machine Learning**

Problem statement: Use supervised machine learning (SML) on the `student_scores` dataset to predict a student's percentage score from the number of hours studied.



In [1]:
import pandas as pd
import numpy as np  
import matplotlib.pyplot as plt  
%matplotlib inline

In [2]:
data = pd.read_csv("../input/scores/student_scores - student_scores.csv")
print(data) 

**Plotting the data to see how the score depends on the hours studied**

In [3]:
data.plot(x='Hours', y='Scores', style='.')  
plt.title('Hours vs Percentage')  
plt.xlabel('Hours Studied')  
plt.ylabel('Percentage Score')  
plt.show()
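
The scatter plot shows the trend visually; a correlation coefficient can put a number on it. The following is a minimal sketch, assuming a small made-up frame that reuses the `Hours` and `Scores` column names (the values are illustrative and are not taken from the real dataset).

```python
import pandas as pd

# Hypothetical sample with the same column names as the notebook's dataset;
# the values are made up for illustration only.
sample = pd.DataFrame({
    "Hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Scores": [12, 22, 29, 41, 52],
})

# Pearson correlation between the two columns
# (values near 1.0 indicate a strong positive linear relationship)
corr = sample["Hours"].corr(sample["Scores"])
print(f"Correlation between Hours and Scores: {corr:.3f}")
```

On the real data, the same call would be `data["Hours"].corr(data["Scores"])`.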

In [4]:
X = data.iloc[:, :-1].values  # features: every column except the last (Hours)
y = data.iloc[:, 1].values    # target: the second column (Scores)

In [5]:
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) 
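
With `test_size=0.2`, one fifth of the rows are held out for testing, and `random_state=0` makes the shuffle reproducible. A small sketch on a hypothetical 10-row array (not the real dataset) shows the resulting shapes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, one feature column; values are made up
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = np.arange(10) * 10

# Hold out 20% of the rows for testing; a fixed seed keeps the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print("Train shape:", X_tr.shape)
print("Test shape:", X_te.shape)
```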

**Applying Logistic Regression (a classifier, used here as a baseline) and checking Mean Absolute Error**

In [6]:
from sklearn.linear_model import LogisticRegression
logisticRegr = LogisticRegression(solver='liblinear',random_state=0)

In [7]:
logisticRegr.fit(X_train, y_train) 
print("Model Fitted")
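
Note that `LogisticRegression` is a classifier: it treats every distinct score in the training set as a separate class label, so it can only ever predict scores it has already seen. The sketch below illustrates this on made-up data; it uses the default solver with a raised `max_iter` rather than the notebook's `liblinear` setting, which is an assumption made only to keep the example simple.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up hours and integer scores, for illustration only
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_demo = np.array([10, 20, 30, 40, 50, 60])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_demo, y_demo)

# classes_ holds the distinct training labels
print("Known classes:", clf.classes_)

# Predictions for unseen inputs are always drawn from classes_,
# never an in-between or extrapolated value
preds = clf.predict(np.array([[2.5], [7.0]]))
print("Predictions:", preds)
```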

In [8]:
print(X_test) # Testing data - In Hours
y_pred = logisticRegr.predict(X_test) # Predicting the scores

In [9]:
# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})  
df 

In [10]:
from sklearn import metrics  
print('Mean Absolute Error:', 
      metrics.mean_absolute_error(y_test, y_pred)) 

**Applying a Random Forest regressor and checking Mean Absolute Error**

In [11]:
#Trying Random forest
from sklearn.ensemble  import RandomForestRegressor
rf = RandomForestRegressor(random_state=0)


In [12]:
rf.fit(X_train,y_train)

In [13]:
pred=rf.predict(X_test)
# Comparing Actual vs Predicted
rf_df = pd.DataFrame({'Actual': y_test, 'Predicted': pred})  
rf_df 

In [14]:
print('Mean Absolute Error:',
      metrics.mean_absolute_error(y_test, pred))

**Applying Linear Regression and checking Mean absolute error**

In [15]:
#trying Linear regression model
from sklearn.linear_model import LinearRegression
linearRegr = LinearRegression()
linearRegr.fit(X_train,y_train)

In [16]:
lin_Pred=linearRegr.predict(X_test)
# Comparing Actual vs Predicted
ln_df = pd.DataFrame({'Actual': y_test, 'Predicted': lin_Pred})  
ln_df 

**Of the three models, the Linear Regression predictions are closest to the actual values.**

In [17]:
print('Mean Absolute Error:',
      metrics.mean_absolute_error(y_test, lin_Pred))

**The actual and predicted scores differ slightly, and the Mean Absolute Error measures that gap as the average absolute difference between them.
This error should be as small as possible; for the given dataset, this is a good result.**
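
To make the metric concrete, here is a minimal sketch that computes the Mean Absolute Error by hand and compares it with `metrics.mean_absolute_error`. The `actual` and `predicted` arrays are hypothetical values chosen for illustration, not results from the models above.

```python
import numpy as np
from sklearn import metrics

# Hypothetical actual and predicted scores (made up for illustration)
actual = np.array([20, 27, 69, 30, 62])
predicted = np.array([17.0, 34.0, 75.0, 27.0, 60.0])

# MAE = mean of the absolute differences between actual and predicted values
mae_manual = np.mean(np.abs(actual - predicted))

# The same quantity computed by scikit-learn
mae_sklearn = metrics.mean_absolute_error(actual, predicted)

print("Manual MAE:", mae_manual)
print("sklearn MAE:", mae_sklearn)
```

The absolute differences here are 3, 7, 6, 3 and 2, so both calls report their average.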

# Q) What will be the predicted score if a student studies 9.25 hours/day?

In [18]:
# Testing for custom input value
hours = 9.25
own_pred = linearRegr.predict([[hours]])  # predict expects a 2D input: one row, one feature
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))
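
A fitted linear regression is just a straight line, so the same prediction can be reproduced from the model's `coef_` and `intercept_` attributes as `score = slope * hours + intercept`. The sketch below shows this on made-up training data that lies exactly on a line; the data and the names `hours_demo`, `scores_demo` and `model` are assumptions for illustration, not the notebook's fitted model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on the line score = 10 * hours + 2
hours_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
scores_demo = 10 * hours_demo.ravel() + 2

model = LinearRegression().fit(hours_demo, scores_demo)

hours = 9.25
slope = model.coef_[0]        # change in score per extra hour
intercept = model.intercept_  # predicted score at zero hours

# Prediction computed by hand from the line equation
manual = slope * hours + intercept

# Prediction from the model; both values should agree
from_predict = model.predict([[hours]])[0]

print("Manual:", manual)
print("predict():", from_predict)
```

For the notebook's model, the equivalent check would be `linearRegr.coef_[0] * 9.25 + linearRegr.intercept_`.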

# A) If a student studies for 9.25 hours/day, the model predicts a score of about 93.6