<a href="https://colab.research.google.com/github/besherh/Machine-Learning-Course/blob/master/Regression/Simple_Linear_Regression_With_scikit_learn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

Think of linear regression like drawing a "line of best fit" through a scatter plot of points. Just like how a trendline can show if ice cream sales go up with temperature, linear regression helps us understand and predict relationships between variables.

### Step 1: Importing the Required Tools





In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression



Think of this like gathering your tools before starting a home improvement project:


- `numpy` is like your toolbox - it contains all the basic tools for working with numbers and arrays

- `LinearRegression` is like your specialized tool (think of it as a smart level that not only shows if something is straight but also tells you exactly how to make it straight)


### Step 2: Preparing Your Data

This is like organizing your data into two columns in a spreadsheet:

- `x` represents your independent variable (like hours studied)

- `y` represents your dependent variable (like test scores)

The `.reshape((-1, 1))` part is crucial because scikit-learn expects data in a specific format. Think of it like this:

- A 1D array is like a single row of numbers: `[1, 2, 3]`

- A 2D array is like a spreadsheet with rows and columns: `[[1], [2], [3]]`

- The `-1` tells numpy "figure out how many rows I need automatically"

- The `1` means "I want one column"
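To make the difference between the two shapes concrete, here is a minimal sketch of what `.reshape((-1, 1))` does. The values and the names `flat` and `column` are illustrative only and are not part of the tutorial's data.

```python
import numpy as np

# Illustrative values only (not the tutorial's data).
flat = np.array([1, 2, 3])        # 1D array: a single row of numbers
column = flat.reshape((-1, 1))    # 2D array: as many rows as needed, one column

print(flat.shape)     # shape of the 1D array
print(column.shape)   # shape of the 2D array
print(column)
```

The first shape has a single dimension, while the second has three rows and one column, which is the layout scikit-learn expects for its inputs.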

In [2]:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])


Now, you have two arrays: the input x and output y. You should call .reshape() on x because this array is required to be two-dimensional, or to be more precise, to have one column and as many rows as necessary. That’s exactly what the argument (-1, 1) of .reshape() specifies.

This is how x and y look now:



In [3]:
print(x)

[[ 5]
 [15]
 [25]
 [35]
 [45]
 [55]]


In [4]:
print(y)

[ 5 20 14 32 22 38]



### Step 3: Creating and Fitting the Model


In [5]:
model = LinearRegression()
model.fit(x, y)

This is like teaching a student (our model) by showing examples:

- Creating the model (`model = LinearRegression()`) is like getting a blank notebook ready

- Fitting the model (`.fit(x, y)`) is like showing the student many examples of input-output pairs so they can learn the pattern

The `LinearRegression()` parameters have important meanings:
- `fit_intercept=True`: Lets the line cross the y-axis at any point (imagine a line that doesn't have to go through (0,0))

- `normalize=False`: In older scikit-learn releases, decided whether to rescale the inputs before fitting. This parameter was deprecated in version 1.0 and removed in 1.2, so don't pass it on current versions

- `copy_X=True`: Makes a safety copy of your data (like backing up your work)
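As a rough illustration of what `fit_intercept` changes, the sketch below fits the tutorial's data twice, once with and once without an intercept. The names `free_line` and `origin_line` are hypothetical and used only for this comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Hypothetical variable names, used only for this comparison.
free_line = LinearRegression(fit_intercept=True).fit(x, y)     # line may cross the y-axis anywhere
origin_line = LinearRegression(fit_intercept=False).fit(x, y)  # line is forced through (0, 0)

print('with intercept:    ', free_line.intercept_, free_line.coef_)
print('through the origin:', origin_line.intercept_, origin_line.coef_)
```

With `fit_intercept=False` the intercept is fixed at zero, so the slope has to absorb the whole relationship and comes out different from the default fit.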

With .fit(), you calculate the optimal values of the weights 𝑏₀ (the intercept) and 𝑏₁ (the slope), using the existing input and output (x and y) as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself. That’s why you can replace the two statements, creating the model and fitting it, with this one:



In [7]:
model = LinearRegression().fit(x, y)


This statement does the same thing as the previous two. It’s just shorter.

### Step 4: Getting the Results

This is like grading how well your model learned:
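- The R² score (coefficient of determination) is like a grade, usually between 0 and 1
    - 1 means perfect predictions
    - 0 means the model does no better than always predicting the average of y
    - It can drop below 0 when the model does worse than that, for example on data it was not fitted on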

- The intercept is where your line crosses the y-axis (like your starting point)
- The slope tells you how much y changes when x increases by 1 (like the rate of improvement)



In [8]:
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

coefficient of determination: 0.7158756137479542


When you’re applying .score(), the arguments are also the predictor x and the response y, and the return value is 𝑅².
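To see where that number comes from, here is a minimal sketch that recomputes 𝑅² by hand as one minus the ratio of unexplained variation to total variation. The names `ss_res`, `ss_tot`, and `r_sq_manual` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

y_pred = model.predict(x)
ss_res = np.sum((y - y_pred) ** 2)     # variation the line leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation of y around its mean
r_sq_manual = 1 - ss_res / ss_tot

print('manual R²:', r_sq_manual)
print('.score():  ', model.score(x, y))
```

Both values should agree up to floating-point rounding.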

The model's attributes are .intercept_, which holds the intercept 𝑏₀, and .coef_, which holds the slope 𝑏₁ (as an array with one entry per input column):



In [9]:
print('intercept:', model.intercept_)

intercept: 5.633333333333329


In [10]:
print('slope:', model.coef_)

slope: [0.54]



The value 𝑏₀ = 5.63 (approximately) illustrates that your model predicts the response 5.63 when 𝑥 is zero. The value 𝑏₁ = 0.54 means that the predicted response rises by 0.54 when 𝑥 is increased by one.
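For a single predictor, the same two values can be recovered from the standard least-squares formulas: the slope is the co-variation of x and y divided by the variation of x, and the intercept makes the line pass through the point of averages. The sketch below applies those formulas with plain NumPy. It illustrates the math and is not a description of how scikit-learn computes the fit internally.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

x_dev = x - x.mean()                                       # distance of each x from its mean
b1 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)   # slope
b0 = y.mean() - b1 * x.mean()                              # intercept

print('b0:', b0)
print('b1:', b1)
```

Up to rounding, these match the `intercept_` and `coef_` values printed above.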

### Step 5: Predicting the Response

- Think of it like a recipe: once you know the relationship (recipe), you can predict the outcome (dish) for any input (ingredients)

- The model uses the formula: y = mx + b (the same line written earlier as 𝑏₀ + 𝑏₁𝑥)
    - Where m is the slope 𝑏₁ (model.coef_)
    - And b is the intercept 𝑏₀ (model.intercept_)
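The formula can be applied by hand to check that it gives the same numbers as the model. This is a minimal sketch using the tutorial's data, where `manual` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# y = m*x + b, with m = model.coef_[0] and b = model.intercept_
manual = model.coef_[0] * x.ravel() + model.intercept_

print(manual)
print(np.allclose(manual, model.predict(x)))  # compares with the model's own predictions
```

`x.ravel()` flattens the single column back into a 1D array so the result has the same shape as y.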

#### Real-world Example
Imagine you're predicting house prices:
- x could be house size in square feet
- y could be house price in dollars
- The slope would tell you how much more you pay for each additional square foot
- The intercept would be like the base price for a theoretical 0 square foot house
- R² would tell you how much of the variation in price is explained by size

This makes linear regression a powerful tool for understanding relationships in data and making predictions based on those relationships.



In [11]:
y_pred = model.predict(x)
print('predicted response:', y_pred, sep='\n')

predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


When applying .predict(), you pass the predictor x as the argument and get the corresponding predicted response.
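The fitted model is not limited to the inputs it was trained on. As a sketch, the example below predicts the response for a few inputs the model has not seen. The array `x_new` is hypothetical, and it needs the same one-column shape as x.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# Hypothetical new inputs 0, 1, 2, 3, 4, reshaped into one column.
x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)

print(x_new.ravel())
print(y_new)
```

The first prediction corresponds to 𝑥 = 0, so it equals the intercept, and each following value grows by the slope.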



In [16]:
print(y)

[ 5 20 14 32 22 38]


In [17]:
print(y_pred)

[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]
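Putting the actual and predicted values next to each other invites a comparison. One simple way to make it, sketched below, is to compute the residuals, which are the differences between each actual response and its prediction. The name `residuals` is introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

residuals = y - model.predict(x)   # positive: the line predicted too low

print(residuals)
print('sum of residuals:', residuals.sum())
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding: the line overshoots some points and undershoots others by the same total amount.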
