# Machine Learning
### Textbook is available at: [https://www.github.com/a-mhamdi/isetbz](https://www.github.com/a-mhamdi/isetbz)

---

Linear regression is a statistical method used to model the linear relationship between a dependent variable and one or more independent variables. Simple linear regression is a type of linear regression that involves only one independent variable.

### Simple Linear Regression

The goal is to find the line of best fit that represents the relationship between the independent variable (also known as the predictor or explanatory variable) and the dependent variable (also known as the response or outcome variable). The line of best fit is the line that lies as close as possible to the data points in the scatterplot of the two variables; under the usual least-squares criterion, it minimizes the sum of squared vertical distances between the points and the line.
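
For a single predictor, the least-squares line has a closed form, and a minimal sketch of it may help make "line of best fit" concrete. The toy arrays `x` and `y` below are hypothetical and are not the salary dataset used later; the names `slope` and `intercept` are chosen here for illustration only.

```python
import numpy as np

# Hypothetical toy data (not the salary dataset): y = 2x + 1 exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Ordinary least-squares estimates for y ≈ intercept + slope * x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the toy points lie exactly on a line, the estimates recover that line; with noisy data they give the line with the smallest sum of squared residuals.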

**Importing the libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
%matplotlib notebook

`%matplotlib notebook` is a magic command in _Jupyter_ Notebook that enables interactive plot features such as panning and zooming, as well as hovering over data points to display their values.

In [3]:
np.set_printoptions(precision=2)

In [4]:
plt.style.use("ggplot")

**Importing the dataset**

In [5]:
df = pd.read_csv("../Datasets/Salary_Data.csv")

In [6]:
df.head()

|   | YearsExperience | Salary  |
|---|-----------------|---------|
| 0 | 1.1             | 39343.0 |
| 1 | 1.3             | 46205.0 |
| 2 | 1.5             | 37731.0 |
| 3 | 2.0             | 43525.0 |
| 4 | 2.2             | 39891.0 |


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   YearsExperience  30 non-null     float64
 1   Salary           30 non-null     float64
dtypes: float64(2)
memory usage: 608.0 bytes


In [8]:
df.describe()

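|       | YearsExperience | Salary        |
|-------|-----------------|---------------|
| count | 30.0            | 30.0          |
| mean  | 5.313333        | 76003.0       |
| std   | 2.837888        | 27414.429785  |
| min   | 1.1             | 37731.0       |
| 25%   | 3.2             | 56720.75      |
| 50%   | 4.7             | 65237.0       |
| 75%   | 7.7             | 100544.75     |
| max   | 10.5            | 122391.0      |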


In [9]:
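X = df.iloc[:, :-1].values  # features: every column except the last (YearsExperience), 2-D array
y = df.iloc[:, -1].values   # target: the last column (Salary), 1-D array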

**Splitting the dataset into training set and test set**

In [10]:
from sklearn.model_selection import train_test_split

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=123)

**Training the Simple Linear Regression model on the Training set**

In [12]:
from sklearn.linear_model import LinearRegression

In [13]:
lr = LinearRegression()
lr.fit(X_train, y_train)
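
Once a `LinearRegression` model is fitted, its learned slope and intercept can be read from the `coef_` and `intercept_` attributes. The sketch below is self-contained and uses hypothetical toy data (`X_toy`, `y_toy`) rather than the salary data, so the numbers it produces say nothing about the notebook's actual fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; X must be 2-D with shape (n_samples, n_features)
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X_toy, y_toy)

slope = model.coef_[0]        # one coefficient per feature
intercept = model.intercept_  # value predicted when the feature is 0

print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

Applied to the notebook's model, the same attributes would be `lr.coef_` and `lr.intercept_`.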

**Predicting the Test set results**

In [14]:
y_pred = lr.predict(X_test)
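
Before plotting, it can be useful to compare predictions with the true values numerically. The following is a minimal sketch, assuming two common summary measures (root mean squared error and the coefficient of determination R²) computed by hand with NumPy; the arrays `y_true` and `y_hat` are hypothetical stand-ins for `y_test` and `y_pred`.

```python
import numpy as np

# Hypothetical stand-ins for y_test and y_pred
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.0])

residuals = y_true - y_hat

# Root mean squared error: typical size of a prediction error, in the units of y
rmse = np.sqrt(np.mean(residuals ** 2))

# R^2: share of the variance in y_true explained by the predictions
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE={rmse:.3f}, R^2={r2:.3f}")
```

An R² close to 1 and an RMSE that is small relative to the scale of the target both suggest that the fitted line tracks the held-out data well.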

**Visualizing the Test set results**

We can use _Matplotlib_ to visualize the result: the cell below draws a scatterplot of the test data points and plots the fitted line of best fit on top of them.

In [15]:
plt.scatter(X_test, y_test, color="blue")
plt.plot(X_test, y_pred, color="red")
plt.title("Test set")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.grid()

[Figure: "Test set" scatterplot of Salary against Years of Experience (blue points) with the fitted regression line (red).]