<a href="https://www.kaggle.com/code/iamarunkumar/2-simple-linear-regression?scriptVersionId=178268506" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Regression

Regression is the branch of machine learning that predicts continuous real-valued quantities, such as a salary or a temperature.

# Simple Linear Regression

![Screenshot 2024-05-17 235648.png](attachment:93f77bbe-8695-40c4-9dc6-ae76b9d4c294.png)

Here,

**y hat --> predicted value of the dependent variable**

**b0 --> y-intercept (constant)**

**b1 --> slope coefficient**

**X1 --> independent variable**
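To make the roles of these terms concrete, here is a minimal sketch that assumes the equation has the usual form y hat = b0 + b1 * X1. The function name `predict_salary` and the coefficient values are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
def predict_salary(years, b0, b1):
    """Return b0 + b1 * years, the simple linear regression prediction."""
    return b0 + b1 * years


# Hypothetical coefficients, picked only to illustrate the formula
b0 = 25000.0  # intercept: predicted salary at 0 years of experience
b1 = 9000.0   # slope: change in predicted salary per extra year

print(predict_salary(5, b0, b1))  # 25000 + 9000 * 5 = 70000.0
```

Training a model means finding the values of b0 and b1 that fit the data best, instead of choosing them by hand as above.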

This is the simplest machine learning model we can build. It has only **one independent variable (a single feature) and one continuous real value to predict**. It gives accurate predictions when the feature and the target are linearly correlated.

**Multiple linear regression** works like simple linear regression, except that it uses several features.

**Polynomial regression**, on the other hand, models datasets in which the relationship between the features and the target is non-linear.

**Support vector regression** is another model suited to datasets with non-linear correlations.

**Decision tree and random forest regression** provide further alternatives for predicting on non-linear datasets.

We have a dataset with two columns: years of experience and salary. Years of experience is the feature (the independent variable), and salary is the dependent variable that we want to predict.

Our **goal with the simple linear regression** model is to train it on the current data (the correlation between years of experience and salary) so that, when an employee's years of experience are given in the future, it can **predict that employee's salary.**

**We follow the steps below to build the simple linear regression model.**
1. Importing the libraries
2. Importing the dataset
3. Splitting the dataset into training set and test set
4. Training the simple linear regression model on the training set
5. Predicting the test set results
6. Visualizing the training set results
7. Visualizing the test set results

# Importing the libraries

In [None]:
# Importing the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing the dataset

In [None]:
# Importing the dataset

df = pd.read_csv('/kaggle/input/simp-linear-reg-dataset/Salary_Data.csv')

# Separate the data into two parts: the matrix of features and the dependent variable vector
# Create the matrix of features (every column except the last)

X = df.iloc[:, :-1].values

# Create the dependent variable vector (the last column)
y = df.iloc[:, -1].values
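The slicing above is easier to follow with the resulting shapes in view. The sketch below uses a three-row in-memory DataFrame as a stand-in for the CSV file; the column names and values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv (names and values are made up)
df = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000.0, 50000.0, 60000.0],
})

X = df.iloc[:, :-1].values  # every column except the last -> 2D array
y = df.iloc[:, -1].values   # only the last column -> 1D array

print(X.shape)  # (3, 1)
print(y.shape)  # (3,)
```

Slicing with `:-1` keeps `X` two-dimensional even though it has a single column, which is the shape scikit-learn estimators expect for features.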

# Splitting the dataset into training set and test set

In [None]:
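# Splitting the dataset into a training set and a test set.
# test_size=0.2 holds out 20% of the rows for testing;
# random_state=0 makes the split reproducible.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)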
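As a quick check of what the split produces, the sketch below runs `train_test_split` on synthetic placeholder data. The 30-row size and the values are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 30 rows, one feature (assumed size and values)
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Same arguments again: the same random_state reproduces the same split
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape)      # (24, 1) (6, 1)
print(np.array_equal(X_test, X_test2))  # True
```

Fixing `random_state` matters mainly when results need to be reproduced between runs.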

# Training the simple linear regression model on the training set

We need to import the **LinearRegression** class from the **linear_model** module of the **sklearn** library.

In [None]:
# Training the simple linear regression model.
# Import the LinearRegression class from the linear_model module of the sklearn library

from sklearn.linear_model import LinearRegression

# Create an instance of the class; we don't need to pass any arguments here
regressor = LinearRegression()

"""The fit method connects the object to the training set and is the method that trains the regression model.
It expects X_train and y_train as arguments."""

regressor.fit(X_train, y_train)
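After fitting, the learned b0 and b1 can be read from the model's `intercept_` and `coef_` attributes. The sketch below fits on noise-free synthetic data generated from assumed coefficients (25000 and 9000, chosen only for illustration), so the model should recover them almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data built from assumed coefficients b0=25000, b1=9000
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 25000.0 + 9000.0 * X.ravel()

regressor = LinearRegression()
regressor.fit(X, y)

b0 = regressor.intercept_  # learned y-intercept
b1 = regressor.coef_[0]    # learned slope for the single feature

print(round(b0, 2), round(b1, 2))  # approximately 25000.0 9000.0
```

On real data with noise, the recovered values are the least-squares estimates rather than exact ground-truth coefficients.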

# Predicting the test set results

In [None]:
"""We predict the test set results with the predict() method. It expects X_test, which contains only the feature
(the 'Yrs of experience' column) and not the dependent variable (salary).

The predict method returns the predicted salary for each value of years of experience that we
pass as the argument (6 values).

The output is a single vector of predicted salaries, one for each experience value.
We save this vector in a variable called 'y_pred', which contains the predicted salaries, whereas 'y_test'
contains the original salaries."""
y_pred = regressor.predict(X_test)
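One simple way to compare predicted and original salaries is to place them side by side. The sketch below uses made-up numbers in place of the real `y_test` and `y_pred`; the values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical original and predicted salaries (made-up numbers)
y_test = np.array([40000.0, 50000.0, 60000.0, 70000.0])
y_pred = np.array([42000.0, 49000.0, 61000.0, 68000.0])

# Columns: original salary, predicted salary, prediction error
comparison = np.column_stack((y_test, y_pred, y_pred - y_test))
print(comparison)

# Average size of the error, ignoring its sign
print(np.abs(y_pred - y_test).mean())  # 1500.0
```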

# Visualizing the training set

In [None]:
"""We plot the real salaries against the predicted salaries using a matplotlib 2D scatter plot.
We use red points for the real salaries and a blue straight line for the predicted salaries.

scatter() needs an x-coordinate and a y-coordinate. Because we are visualizing the training set,
we pass 'X_train' and 'y_train'."""

plt.scatter(X_train, y_train, color='red')

"""Next, we plot the regression line: the line of predictions that comes as close as possible to the real salaries
(the original values).
We use the plot() method, which draws the curve of a function. Since the model is linear, the curve is a
straight line.

Be careful here: plot() needs two coordinates (x and y). Because we are visualizing the training set,
we pass X_train as the x-coordinate. The y-coordinate must be the predicted salaries of the training set,
which we have not computed yet: y_pred holds the predicted salaries of the test set only.
So we call the predict method on X_train (the years of experience of the employees in the training set),
which returns exactly the predicted salaries of the training set."""


plt.plot(X_train, regressor.predict(X_train), color='blue')

# Adding the title
plt.title('Salary vs Experience (Training set)')

# Adding labels to the x-axis and y-axis
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# To display the graph, we call the show() function
plt.show()

# Visualizing the test set result

In [None]:
# Let's visualize the test set results
plt.scatter(X_test, y_test, color='red')

"""Here we do not need to change the arguments of plot() to X_test. The regression line comes from a single
fitted equation, so the predicted salaries of the test set lie on the same line that we drew from the training set."""

plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# To display the graph, we call the show() function
plt.show()
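The claim that test set predictions fall on the same regression line can be checked numerically. The sketch below uses synthetic data in place of the salary dataset (the values and the noise level are assumptions) and compares `predict(X_test)` with points computed directly from the fitted intercept and slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the training and test sets (assumed values)
rng = np.random.default_rng(0)
X_train = np.linspace(1, 10, 24).reshape(-1, 1)
y_train = 25000 + 9000 * X_train.ravel() + rng.normal(0, 3000, size=24)
X_test = np.array([[1.5], [4.0], [8.5]])

regressor = LinearRegression().fit(X_train, y_train)

# Points on the fitted line, computed directly from the learned coefficients
line_at_test = regressor.intercept_ + regressor.coef_[0] * X_test.ravel()

# Test predictions coincide with the line drawn from the training set
print(np.allclose(regressor.predict(X_test), line_at_test))  # True
```

Because the line is fully determined by the fitted intercept and slope, plotting it from `X_train` or from `X_test` draws the same line; only the span of x-values covered differs.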