## **Aim: Predict a student's percentage score from the number of study hours, using Linear Regression with Python Scikit-Learn**
In this section we will see how the Python Scikit-Learn library for machine learning can be used to implement regression functions. We will start with simple linear regression involving two variables.


### **Simple Linear Regression**
In this regression task we will predict the percentage of marks that a student is expected to score based upon the number of hours they studied. This is a simple linear regression task as it involves just two variables.

##  **Author: Chirag Bansal**
#### **_University of Petroleum and Energy Studies_**

In [None]:
# Import the required libraries
import pandas as pd
import numpy as np  
import matplotlib.pyplot as plt  
import seaborn as sns
%matplotlib inline

In [None]:
# Reading data from remote link
link = "http://bit.ly/w-data"
file = pd.read_csv(link)
print("Data imported successfully")
file

In [None]:
#check the shape of the data
file.shape

In [None]:
file.isnull().sum()

In [None]:
# Drop rows with missing values (dropna returns a new DataFrame, so reassign)
file = file.dropna()

In [None]:
# Drop fully duplicated rows (drop_duplicates returns a new DataFrame, so reassign)
file = file.drop_duplicates()

In [None]:
file.head()

In [None]:
file.tail()

In [None]:
file.info()

In [None]:
file.describe()

# **Let's plot the data points on a 2-D scatter plot to see whether we can spot a relationship between Hours and Scores by eye.**
## *We can create the plot with the following script:*

In [None]:
# Scatter plot of hours studied against percentage score
sns.scatterplot(x=file['Hours'], y=file['Scores'])
plt.title('Hours vs Percentage')  
plt.xlabel('Hours Studied')  
plt.ylabel('Percentage Score')  
plt.show()

In [None]:
#checking the names of the columns
file.columns

In [None]:
sns.heatmap(file.corr(),annot=True,linewidths=4,cmap='BuPu')
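
The values annotated on the heatmap are Pearson correlation coefficients, which is the default method of `DataFrame.corr()`. As a minimal sketch of what that number measures, the coefficient can be computed by hand. The `toy_hours` and `toy_scores` lists and the `pearson_r` helper below are made-up illustrations, not the dataset or any library function used in this notebook.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only
toy_hours = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_scores = [15.0, 25.0, 30.0, 45.0, 50.0]

r = pearson_r(toy_hours, toy_scores)
print("Pearson r = {:.3f}".format(r))
```

A value close to 1 means the points lie close to a rising straight line, which is the situation where simple linear regression is a reasonable model.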

# The Machine Learning Code Starts Here

In [None]:
# Separate the feature (Hours) from the label (Scores)
X = file.iloc[:, :-1].values  # All columns except the last: Hours (features)
y = file.iloc[:, 1].values    # Second column: Scores (labels)

## Importing the Libraries for Machine Learning

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
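
Before the split is applied, it may help to see the idea behind `train_test_split`: shuffle the rows reproducibly, then hold out a fraction for testing. The sketch below is a simplified pure-Python stand-in under that assumption. The helper name `simple_train_test_split` is hypothetical, and its shuffle will not reproduce the exact rows that scikit-learn selects for `random_state=0`.

```python
import math
import random

def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    n_test = math.ceil(len(rows) * test_fraction)  # round the test size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

# 25 placeholder rows, standing in for a small dataset
rows = list(range(25))
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed plays the same role as `random_state`: the same rows end up in the test set on every run, so the reported errors are repeatable.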

In [None]:
# Split the data into training (75%) and test (25%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

In [None]:
regressor = LinearRegression()
regressor.fit(X_train, y_train)

print('Training Completed')
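
`LinearRegression` fits the line by ordinary least squares. With a single feature the solution has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below shows that calculation on made-up numbers; `fit_line`, `toy_hours` and `toy_scores` are illustrative names and values, not part of scikit-learn or the dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical points lying exactly on y = 10x + 2
toy_hours = [1.0, 2.0, 3.0, 4.0]
toy_scores = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(toy_hours, toy_scores)
predicted = slope * 9.25 + intercept  # evaluate the fitted line at 9.25 hours
print(slope, intercept, predicted)
```

After fitting, `regressor.coef_` and `regressor.intercept_` hold the corresponding slope and intercept for the real training data.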

In [None]:
# Compute the fitted regression line: y = coef * x + intercept
line = regressor.coef_ * X + regressor.intercept_

# Plot all data points together with the fitted line
plt.title('Linear Regression Line')
plt.xlabel('Hours Studied')
plt.ylabel('Percentage Scored by the Students')
plt.scatter(X, y)
plt.plot(X, line)
plt.show()

# Predicting the values from the Model

In [None]:
y_pred = regressor.predict(X_test)  # Predict the scores for the test set
print(X_test)  # Hours studied in the test set
print(y_pred)  # Predicted scores for those hours

In [None]:
# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})  
df 

# Predicting the Score for 9.25 Hours of Study per Day

In [None]:
# Predict the score for a number of study hours of our choice (e.g. 9.25)
n = float(input('Enter number of hours: '))
calc = regressor.predict([[n]])
print("No of Hours = {}".format(n))
print("Predicted Score = {}".format(calc[0]))

# Calculating Errors and the R2 Score

In [None]:
print('Mean Absolute Error:{}'.format(mean_absolute_error(y_test,y_pred)))
print('Mean Squared Error:{}'.format(mean_squared_error(y_test,y_pred)))
print('Root Mean squared Error:{}'.format(np.sqrt(mean_squared_error(y_test,y_pred))))
print('R2 Score is:{}'.format(r2_score(y_test,y_pred)))
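
To make the four numbers above concrete, here is a minimal pure-Python sketch of the same metrics computed from their definitions. The `actual` and `predicted` lists are made-up values for illustration, not the notebook's test results.

```python
import math

# Hypothetical values for illustration only
actual = [20.0, 30.0, 50.0, 60.0]
predicted = [22.0, 28.0, 53.0, 57.0]

n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # root mean squared error

# R2 compares the model's squared error with that of always predicting the mean
mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```

MAE and RMSE are in the same units as the score (percentage points), while R2 is unitless: 1 is a perfect fit and 0 is no better than predicting the mean score.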

# *Thank You*