# Prediction using Supervised ML (Level - Beginner)

## Predict the percentage score of a student based on the number of study hours.

## This is a simple linear regression task, as it involves just two variables: hours studied (input) and percentage score (output).
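
As a rough illustration of what "simple linear regression" means, the sketch below fits a line y = m*x + c by hand with the closed-form least-squares formulas. The data values are hypothetical and chosen to lie exactly on a line; they are not taken from the dataset used in this notebook.

```python
# Hypothetical (hours, score) pairs, made up for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [12.0, 22.0, 32.0, 42.0, 52.0]  # exactly score = 10 * hours + 2

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Slope: m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
denominator = sum((x - mean_x) ** 2 for x in hours)
slope = numerator / denominator

# Intercept: c = mean_y - m * mean_x
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 10.0 2.0
```

With a single input variable, a library fit should arrive at the same slope and intercept as these formulas, up to floating-point error.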

### Pandas :

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

### NumPy :

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

### Lux :

Lux is a Python library that makes data science easier by automating certain aspects of the data exploration process. Lux is designed to facilitate faster experimentation with data, even when the user does not have a clear idea of what they are looking for. Lux is integrated with an interactive Jupyter widget that allows users to quickly browse through large collections of data directly within their Jupyter notebooks.

### Matplotlib: 

Matplotlib is a comprehensive plotting library for creating static, animated, and interactive visualizations in Python.

### scikit-learn :

Scikit-learn (sklearn) is a widely used, robust library for machine learning in Python. It provides efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, through a consistent interface. The library is largely written in Python and is built upon NumPy, SciPy, and Matplotlib.

#### Step:1--> Importing all the libraries required in this notebook

In [None]:
import pandas as pd

import numpy as np

import lux

import matplotlib.pyplot as plt  
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

#### Step:2--> Reading data from a CSV file and visualizing it

In [None]:
path = r"https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv"
data = pd.read_csv(path)

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data

##### From the visualization above (Lux renders charts when `data` is displayed), we can see a positive linear relationship between the number of hours studied and the percentage score.
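
One way to put a number on a "positive linear relationship" is the Pearson correlation coefficient. The sketch below uses `np.corrcoef` on hypothetical values (not the notebook's dataset); a result close to +1 indicates a strong positive linear relationship.

```python
import numpy as np

# Hypothetical values for illustration, not taken from the notebook's dataset.
hours = np.array([1.5, 3.2, 5.1, 7.4, 8.9])
scores = np.array([20, 32, 50, 72, 90])

# np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is the correlation between the two inputs.
r = np.corrcoef(hours, scores)[0, 1]
print(round(float(r), 3))
```

On the real data, the same check could be run on the two columns of `data`, assuming both are numeric.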

#### Step:3--> Preparing the data
>The next step is to divide the data into attributes (inputs) and labels (outputs), and to split it into training and test sets.

In [None]:
data.columns

In [None]:
data.shape

In [None]:
train, test = train_test_split(data,test_size=0.25,random_state=123)

In [None]:
train.shape

In [None]:
test.shape

In [None]:
train_x=train.drop("Scores",axis=1)
train_y=train["Scores"]

In [None]:
test_x=test.drop("Scores",axis=1)
test_y=test["Scores"]
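
An equivalent pattern, sketched below under stated assumptions, separates inputs and labels first and lets `train_test_split` return all four pieces in one call. The frame `df` and the feature column name `"Hours"` are hypothetical stand-ins; only the `"Scores"` column name comes from this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame; "Hours" is an assumed feature column name.
df = pd.DataFrame({
    "Hours": [1.5, 2.5, 3.2, 4.5, 5.1, 6.9, 7.4, 8.9],
    "Scores": [20, 25, 32, 45, 50, 70, 72, 90],
})

X = df[["Hours"]]  # double brackets keep X two-dimensional, as scikit-learn expects
y = df["Scores"]   # one-dimensional label vector

# Rows of X and y are shuffled and split together, so they stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123
)

print(X_train.shape, X_test.shape)  # (6, 1) (2, 1)
```

Both approaches give an aligned train and test split; this one simply avoids the separate `drop` calls.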

#### Step:4--> Training the Algorithm
>We have split our data into training and testing sets, and it is now time to train our algorithm.

In [None]:
lnr = LinearRegression()

In [None]:
lnr.fit(train_x,train_y)

In [None]:
lnr.coef_

In [None]:
lnr.intercept_
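
To make the role of `coef_` and `intercept_` concrete, the sketch below fits a model on hypothetical data that lies exactly on the line y = 10x + 2, then checks that `coef_ * x + intercept_` reproduces what `predict` returns. The variable names are illustrative and separate from the notebook's `lnr`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data on the exact line y = 10x + 2.
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, 1)
y = np.array([12.0, 22.0, 32.0, 42.0])

model = LinearRegression().fit(X, y)

# coef_ holds one slope per feature; intercept_ is a scalar for a 1-D target.
manual = model.coef_[0] * X[:, 0] + model.intercept_

print(model.coef_[0], model.intercept_)
print(np.allclose(manual, model.predict(X)))  # True
```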

In [None]:
# Plot the regression line; the formula for a line is y = m*x + c
line = lnr.coef_[0] * train_x + lnr.intercept_

# Plot the training data together with the fitted line
plt.scatter(train_x, train_y)
plt.plot(train_x, line)
plt.show()

#### Step:5--> Making Predictions
> We are done training the model; now we use it to make predictions on the test set.

In [None]:
pr = lnr.predict(test_x)

In [None]:
list(zip(test_y,pr))

#### Step:6--> Evaluating the model
> The final step is to evaluate the performance of the algorithm. This step is particularly important for comparing how well different algorithms perform on a particular dataset. For simplicity, we have chosen the root mean squared error (RMSE), which is the square root of the mean squared error. Many other metrics exist.

In [None]:
from sklearn.metrics import mean_squared_error

In [None]:
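# RMSE is the square root of the MSE. Recent scikit-learn releases deprecate
# the squared=False argument, so the square root is taken explicitly here.
np.sqrt(mean_squared_error(test_y, pr))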
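
For intuition about what this metric measures, the sketch below computes the mean squared error and its square root (RMSE) by hand. The actual and predicted values are hypothetical, picked so the arithmetic is easy to follow.

```python
import math

# Hypothetical actual and predicted scores, for illustration only.
actual = [20, 30, 60, 85]
predicted = [21, 25, 65, 78]

# Squared differences: 1, 25, 25, 49
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]

mse = sum(squared_errors) / len(squared_errors)  # 100 / 4
rmse = math.sqrt(mse)

print(mse, rmse)  # 25.0 5.0
```

RMSE is expressed in the same units as the target (percentage points here), which often makes it easier to interpret than the MSE.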

#### Step:7--> Solution

In [None]:
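hour = [9.25]
# predict expects 2-D input: one row per sample, one column per feature.
# Reusing the training column names avoids a feature-name warning.
own_pr = lnr.predict(pd.DataFrame([hour], columns=train_x.columns))
print("No of Hours = {}".format(hour[0]))
print("Predicted Score = {}".format(own_pr[0]))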

# Predicted Score = 91.407589223163

## Conclusion : Hence, the percentage score of a student was successfully predicted from the number of study hours.