# Linear Regression 




## Simple Linear Regression

<a data-flickr-embed="true" href="https://www.flickr.com/photos/200947226@N07/53819385178/in/dateposted-public/" title="Dataset illustrating height and corresponding weight of random people ">
  <img src="https://live.staticflickr.com/65535/53819385178_9db6c6fb5a_z.jpg" width="400" height="250" alt="Scatter plot of height versus weight"/>
</a>
<script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

Figure 1: Dataset of heights and corresponding weights of randomly selected people


### Problem: How can we create a machine learning model for the data in Figure 1 that predicts weight, given height?

Once we find the best-fit line, we can predict the weight for any new value of height.

Here we try to fit a straight line to this data.

The simple linear equation is:

 $$\mathbf{y} = m\mathbf{x} + c $$


where we need to estimate two parameters: the y-intercept ($c$) and the slope ($m$).

Simple linear regression is a procedure for finding the values of $c$ and $m$ that best fit the given data.
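As a minimal sketch, the model is just a function that evaluates the line at a given height. The values of `m` and `c` below are hypothetical, chosen only for illustration and not fitted to Figure 1.

```python
def predict_weight(height_cm: float, m: float, c: float) -> float:
    """Evaluate the line y = m*x + c at x = height_cm."""
    return m * height_cm + c


# Hypothetical slope and intercept, chosen only for illustration
m, c = 0.9, -90.0
print(predict_weight(175, m, c))
```

Simple linear regression is about choosing `m` and `c` from the data instead of picking them by hand.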




Let's look at a simple linear regression fitted to the scatter plot of _height_ vs. _weight_.


<a data-flickr-embed="true" href="https://www.flickr.com/photos/200947226@N07/53819384673/in/dateposted-public/" title="Best-fit line over height vs weight data"><img src="https://live.staticflickr.com/65535/53819384673_c792870188_z.jpg" width="400" height="250" alt="Best-fit line over height vs weight data"/></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

Figure 2: Best-fit line plotted over the data of Figure 1.

(We will go into the procedure for finding the best-fit line in detail later.)


The red line is a simple linear regression line with output $\mathbf{y}$ as `weight` and $\mathbf{x}$ as `height`. 

The error, $\epsilon$, is the difference between the actual value, $y_i$, and the predicted value, $\hat{y_i}$.

The actual output data points are the blue dots, and each predicted value is the vertical projection of a blue dot onto the red regression line.

The error for each data point is the vertical distance from the actual data point to the predicted point on the regression line.

The predicted output value is:

$$\hat{y_i} = mx_i+ c$$

The actual output value is:

$$y_i =  mx_i+ c  + \epsilon_i$$

where $\epsilon_i$ is the error for sample $i$.
The error $\epsilon_i = y_i - \hat{y_i}$ can be positive, negative, or occasionally zero.
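To make the sign of the error concrete, here is a small sketch on a tiny hypothetical dataset (three made-up height/weight pairs) and a made-up line; none of these numbers come from Figure 1.

```python
import numpy as np

# Tiny hypothetical dataset (heights in cm, weights in kg), for illustration only
x = np.array([160.0, 170.0, 180.0])
y = np.array([55.0, 63.0, 70.0])

m, c = 0.9, -90.0          # hypothetical line parameters
y_hat = m * x + c          # predicted values on the line
errors = y - y_hat         # epsilon_i = y_i - y_hat_i
print(y_hat)
print(errors)
```

The first point lies above the line (positive error), the second lies exactly on it (zero error) and the third lies below it (negative error).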

In the figure, the errors (the vertical lines) fall on both sides of the regression line, so positive and negative errors would cancel out if we simply added them.

Instead, we square each error and sum the squares. The result is called the _Sum of Squared Errors_.

$$\text{Sum of Squared Errors (SSE)} = \sum_{i=1}^{n}(y_{i}-\hat{y_{i}})^2$$
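A short sketch of the SSE formula in NumPy, using a tiny hypothetical dataset and two made-up candidate lines. It also shows that different values of $m$ and $c$ give different SSE values.

```python
import numpy as np


def sse(x, y, m, c):
    """Sum of squared errors of the line y_hat = m*x + c on data (x, y)."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))


# Tiny hypothetical dataset (heights in cm, weights in kg), for illustration only
x = np.array([160.0, 170.0, 180.0])
y = np.array([55.0, 63.0, 70.0])

print(sse(x, y, 0.9, -90.0))   # errors 1, 0, -2   -> 1 + 0 + 4 = 5
print(sse(x, y, 0.8, -70.0))   # errors -3, -3, -4 -> 9 + 9 + 16 = 34
```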

The summation runs from $1$ to $n$ because we have $n$ samples.
Changing $m$ or $c$ changes the SSE.

The main principle is to choose the intercept ($c$) and slope ($m$) so that this sum is as small as possible.

Sum of Squared Errors (SSE) can also be written as:

$$\text{SSE} = \sum_{i=1}^{n}(y_{i}-\hat{y_{i}})^2 = \sum_{i=1}^{n}(y_{i}-(c+m x_i))^2$$

Here, $\hat{y_i}$ is replaced with the simple linear regression model, i.e. $\hat{y_i} = m x_i + c$.


Since $\text{SSE}$ is a sum of squares, it is never negative.
If we plot $m$ on the x-axis, $c$ on the y-axis and the corresponding SSE on the z-axis, the resulting surface is convex and opens upward, like a bowl, so its lowest point gives the best values of $m$ and $c$.






### Figure: Visualization of how SSE changes for different combinations of m and c


In [None]:
import numpy as np
import plotly.graph_objects as go

# Generate synthetic data
np.random.seed(42)
heights = np.linspace(150, 210, 20)
X = heights - heights.mean()
true_slope = 0.5
true_intercept = 10
noise = np.random.uniform(-5, 5, size=X.shape)
y = true_slope * heights + true_intercept + noise

# Compute best-fit slope and intercept
m_best = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean())**2)
c_best = y.mean() - m_best * X.mean()

# Meshgrid for slope and intercept
m_vals = np.linspace(m_best - 5, m_best + 5, 100)
c_vals = np.linspace(c_best - 100, c_best + 100, 100)
M, C = np.meshgrid(m_vals, c_vals)

# Compute SSE surface
SSE = np.zeros_like(M)
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        predictions = M[i, j] * X + C[i, j]
        SSE[i, j] = np.sum((y - predictions) ** 2)

# Main surface plot
fig = go.Figure()

fig.add_trace(go.Surface(
    x=M, y=C, z=SSE,
    colorscale='Viridis', opacity=0.9,
    name='SSE Surface'
))

# Layout
fig.update_layout(
    title='SSE Surface over Slope (m) and Intercept (c)',
    scene=dict(
        xaxis_title='Slope (m)',
        yaxis_title='Intercept (c)',
        zaxis_title='SSE'
    ),
    width=900,
    height=700
)

fig.show()




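The parameters at the minimum point can be found with calculus. For simple linear regression the minimum can be computed directly with a closed-form formula (the cell above uses one to compute `m_best` and `c_best`), but a more general method is gradient descent.
Gradient descent is one of the most widely used optimization methods in machine learning; variants of it are used to train neural networks.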

Before diving into gradient descent, we need a clear idea of partial derivatives.

#### Partial Derivative 
Partial derivatives are defined for functions of two or more variables.

For the function $\text{SSE} = \sum_{i=1}^{n}(y_{i}-(c+m x_i))^2$, there are two variables, $m$ and $c$.

So we can take the derivative of $\text{SSE}$ with respect to $m$ and with respect to $c$.

The derivative of $\text{SSE}$ with respect to $m$, with $c$ held constant, is called the partial derivative of $\text{SSE}$ with respect to $m$. It is denoted by $\frac{\partial}{\partial m}\sum(y_i-(m x_i+c))^2$.

Similarly, the derivative of $\text{SSE}$ with respect to $c$, with $m$ held constant, is called the partial derivative of $\text{SSE}$ with respect to $c$. It is denoted by $\frac{\partial}{\partial c}\sum(y_i-(m x_i+c))^2$.


Partial derivative of SSE with respect to $m$:

$$\begin{aligned}
\frac{\partial}{\partial m}\sum(y_i-(m x_i+c))^2 &= \sum\frac{\partial}{\partial m}(y_i-(m x_i+c))^2 \\
&= \sum 2\,(y_i-(m x_i+c))(-x_i)
\end{aligned}$$

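Similarly, the partial derivative of SSE with respect to $c$:

$$\begin{aligned}
\frac{\partial}{\partial c}\sum(y_i-(m x_i+c))^2 &= \sum\frac{\partial}{\partial c}(y_i-(m x_i+c))^2 \\
&= \sum 2\,(y_i-(m x_i+c))(-1)
\end{aligned}$$

The factor $(-1)$ comes from the chain rule, since $\frac{\partial}{\partial c}(y_i-(m x_i+c)) = -1$.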
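One way to check the two derivatives above is to compare them with a numerical central finite-difference approximation, which nudges one variable while holding the other constant (exactly the idea of a partial derivative). A sketch with a tiny hypothetical dataset and hypothetical values of $m$ and $c$:

```python
import numpy as np


def sse(m, c, x, y):
    """Sum of squared errors of the line m*x + c."""
    return np.sum((y - (m * x + c)) ** 2)


def sse_gradient(m, c, x, y):
    """Analytic partial derivatives of SSE, as derived above."""
    residuals = y - (m * x + c)
    d_m = np.sum(2 * residuals * (-x))
    d_c = np.sum(2 * residuals * (-1))
    return d_m, d_c


# Hypothetical centered heights and weights, for illustration only
x = np.array([-10.0, 0.0, 10.0])
y = np.array([55.0, 63.0, 70.0])
m, c = 0.5, 60.0
h = 1e-6   # small nudge

d_m, d_c = sse_gradient(m, c, x, y)

# Central differences: change one variable while holding the other constant
num_d_m = (sse(m + h, c, x, y) - sse(m - h, c, x, y)) / (2 * h)
num_d_c = (sse(m, c + h, x, y) - sse(m, c - h, x, y)) / (2 * h)

print(d_m, num_d_m)
print(d_c, num_d_c)
```

Because SSE is quadratic in $m$ and $c$, the central difference matches the analytic derivative up to floating-point rounding.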




### We can visualize the partial derivatives in the interactive plot as follows:

#### Selecting the $\frac{\partial}{\partial c}SSE$ view
- A yellow vertical plane (parallel to the c-axis and perpendicular to the m-axis) intersects the SSE surface.
- Along the intersection we see a white curve: SSE as a function of $c$ alone, with $m$ fixed.
- The slope of this white curve at the red dot is the partial derivative of SSE with respect to $c$.

#### Selecting the $\frac{\partial}{\partial m}SSE$ view
- A red vertical plane (parallel to the m-axis and perpendicular to the c-axis) intersects the SSE surface.
- Along the intersection we see a white curve: SSE as a function of $m$ alone, with $c$ fixed.
- The slope of this white curve at the red dot is the partial derivative of SSE with respect to $m$.



In [None]:
import numpy as np
import plotly.graph_objects as go

# Synthetic data
np.random.seed(42)
heights = np.linspace(150, 210, 20)
X = heights - heights.mean()
true_slope = 0.5
true_intercept = 10
noise = np.random.uniform(-5, 5, size=X.shape)
y = true_slope * heights + true_intercept + noise

# Best-fit slope and intercept
m_best = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean())**2)
c_best = y.mean() - m_best * X.mean()

# Grid for main surface
m_vals = np.linspace(m_best - 5, m_best + 5, 100)
c_vals = np.linspace(c_best - 100, c_best + 100, 100)
M, C = np.meshgrid(m_vals, c_vals)

# SSE surface
SSE = np.zeros_like(M)
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        predictions = M[i, j] * X + C[i, j]
        SSE[i, j] = np.sum((y - predictions) ** 2)

# Points of interest
m_point = (m_best + np.min(m_vals)) / 2
c_point = (c_best + np.min(c_vals)) / 2
predictions_at_point = m_point * X + c_point
sse_point = np.sum((y - predictions_at_point) ** 2)

# ∂SSE/∂m curved slice
m_curve = np.linspace(m_best - 7, m_best + 7, 100)
sse_m_curve = np.array([
    np.sum((y - (m * X + c_point)) ** 2) for m in m_curve
])
z_m = np.tile(sse_m_curve, (2, 1))
x_m = np.tile(m_curve, (2, 1))
y_m = np.full_like(x_m, c_point)
z_m[0, :] = sse_m_curve
z_m[1, :] = np.max(SSE) * 1.05

# ∂SSE/∂c curved slice
c_curve = np.linspace(c_best - 120, c_best + 120, 100)
sse_c_curve = np.array([
    np.sum((y - (m_point * X + c)) ** 2) for c in c_curve
])
z_c = np.tile(sse_c_curve, (2, 1))
y_c = np.tile(c_curve, (2, 1))
x_c = np.full_like(y_c, m_point)
z_c[0, :] = sse_c_curve
z_c[1, :] = np.max(SSE) * 1.05

# Intersection lines on SSE surface
intersection_m = go.Scatter3d(
    x=m_curve, y=[c_point] * len(m_curve), z=sse_m_curve,
    mode='lines', line=dict(color='white', width=8),
    name='∂SSE/∂m Intersection', visible=False
)

intersection_c = go.Scatter3d(
    x=[m_point] * len(c_curve), y=c_curve, z=sse_c_curve,
    mode='lines', line=dict(color='white', width=8),
    name='∂SSE/∂c Intersection', visible=False
)

# Camera views
views = {
    "default": dict(eye=dict(x=-1.5, y=-1.5, z=0.8)),
    "dm_view": dict(eye=dict(x=0, y=-2, z=0.8)),
    "dc_view": dict(eye=dict(x=-2, y=0, z=0.8))
}

# Create figure
fig = go.Figure()

# SSE Surface
fig.add_trace(go.Surface(
    x=M, y=C, z=SSE,
    colorscale='Viridis',
    opacity=0.8,
    name='SSE Surface',
    showscale=False
))

# Base plane
fig.add_trace(go.Surface(
    x=M, y=C, z=np.zeros_like(M),
    colorscale=[[0, 'lightgray'], [1, 'lightgray']],
    opacity=0.3,
    showscale=False,
    name='Parameter Plane'
))

# Points on surface
fig.add_trace(go.Scatter3d(
    x=[m_point], y=[c_point], z=[0],
    mode='markers', marker=dict(size=6, color='black'),
    name='Parameter Point'
))
fig.add_trace(go.Scatter3d(
    x=[m_point], y=[c_point], z=[sse_point],
    mode='markers', marker=dict(size=6, color='red'),
    name='SSE Point'
))
fig.add_trace(go.Scatter3d(
    x=[m_point, m_point], y=[c_point, c_point], z=[0, sse_point],
    mode='lines', line=dict(color='black', width=3, dash='dot'),
    name='Connection'
))

# ∂SSE/∂m plane - vivid red
fig.add_trace(go.Surface(
    x=x_m, y=y_m, z=z_m,
    colorscale=[[0, 'rgba(255,0,0,0.9)'], [1, 'rgba(255,0,0,0.9)']],
    showscale=False,
    name='∂SSE/∂m Plane',
    visible=False,
    surfacecolor=np.ones_like(x_m),
    contours_z=dict(show=True, color='red', width=4)
))

# ∂SSE/∂c plane - vivid yellow
fig.add_trace(go.Surface(
    x=x_c, y=y_c, z=z_c,
    colorscale=[[0, 'rgba(255,255,0,0.9)'], [1, 'rgba(255,255,0,0.9)']],  # vivid yellow
    showscale=False,
    name='∂SSE/∂c Plane',
    visible=False,
    surfacecolor=np.ones_like(y_c),
    contours_z=dict(show=True, color='gold', width=4)
))

# Add intersection lines
fig.add_trace(intersection_m)
fig.add_trace(intersection_c)

# Visibility masks
default_vis = [True, True, True, True, True, False, False, False, False]
dm_vis =     [True, True, True, True, True, True, False, True, False]
dc_vis =     [True, True, True, True, True, False, True, False, True]

# Layout
fig.update_layout(
    title='SSE Surface with Derivative Planes and Intersections',
    scene=dict(
        xaxis_title='Slope (m)',
        yaxis_title='Intercept (c)',
        zaxis_title='SSE',
        camera=views["default"],
        xaxis=dict(range=[min(m_vals)-1, max(m_vals)+1]),
        yaxis=dict(range=[min(c_vals)-10, max(c_vals)+10]),
        zaxis=dict(range=[0, np.max(SSE)*1.2])
    ),
    updatemenus=[
        {
            "buttons": [
                {
                    "args": [{"visible": default_vis}, {"scene.camera": views["default"]}],
                    "label": "Default View",
                    "method": "update"
                },
                {
                    "args": [{"visible": dm_vis}, {"scene.camera": views["dm_view"]}],
                    "label": "∂SSE/∂m View",
                    "method": "update"
                },
                {
                    "args": [{"visible": dc_vis}, {"scene.camera": views["dc_view"]}],
                    "label": "∂SSE/∂c View",
                    "method": "update"
                }
            ],
            "direction": "down",
            "showactive": True,
            "x": 0.1,
            "xanchor": "left",
            "y": 1.15,
            "yanchor": "top"
        }
    ],
    annotations=[
        dict(
            text="Select view:",
            x=0.1, y=1.2,
            xref="paper", yref="paper",
            showarrow=False
        )
    ],
    width=1000,
    height=800
)

fig.show()



### Gradient Descent 

Let's look at how gradient descent works.

First, choose random initial values $m_0$ and $c_0$ for $m$ and $c$, and compute the partial derivatives at $(m_0, c_0)$:

$$\frac{\partial}{\partial m}\text{SSE}(m_0, c_0) = \sum 2\,(y_i-(m_0 x_i + c_0))(-x_i)$$

$$\frac{\partial}{\partial c}\text{SSE}(m_0, c_0) = \sum 2\,(y_i-(m_0 x_i + c_0))(-1)$$

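The gradient (the pair of partial derivatives) points in the direction in which SSE increases fastest, so moving $m$ and $c$ a small step in the opposite direction gives values with a lower SSE.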

Hence, the new values of $m$ and $c$ are computed as:

$$m_1 = m_0 - \text{learning rate} \times \frac{\partial}{\partial m}\text{SSE}(m_0, c_0)$$

$$c_1 = c_0 - \text{learning rate} \times \frac{\partial}{\partial c}\text{SSE}(m_0, c_0)$$

where the learning rate (`learning_rate` in code) is a small positive constant, the step size, that controls how far $m$ and $c$ move toward the optimum at each step. If it is too large, the updates overshoot the minimum and can diverge; if it is too small, convergence is slow.
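A sketch of a single update step from $(m_0, c_0) = (0, 0)$, assuming a tiny hypothetical dataset with centered heights and an assumed `learning_rate` of 0.002:

```python
import numpy as np

# Hypothetical centered heights (x) and weights (y), for illustration only
x = np.array([-10.0, 0.0, 10.0])
y = np.array([55.0, 63.0, 70.0])


def sse(m, c):
    """Sum of squared errors of the line m*x + c on the data above."""
    return np.sum((y - (m * x + c)) ** 2)


m0, c0 = 0.0, 0.0        # initial guess (m_0, c_0)
learning_rate = 0.002    # assumed step size

residuals = y - (m0 * x + c0)
d_m = np.sum(2 * residuals * (-x))   # dSSE/dm at (m_0, c_0)
d_c = np.sum(2 * residuals * (-1))   # dSSE/dc at (m_0, c_0)

# One gradient-descent step: move against the gradient
m1 = m0 - learning_rate * d_m
c1 = c0 - learning_rate * d_c
print(sse(m0, c0), sse(m1, c1))
```

The SSE drops after the step. With `learning_rate = 0.01` instead, the same step would move $m$ from 0 to 3.0, far past the best-fit slope of 0.75 for this data, which is the overshooting described above.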

Repeating this update gives $(m_2, c_2)$, $(m_3, c_3)$, and so on.
The iteration continues until changing $m$ and $c$ no longer produces any significant change in SSE.

The procedure described above is known as gradient descent.
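Putting the steps together, a minimal gradient-descent sketch might look like the following. The synthetic data, the starting point $(0, 0)$, the learning rate and the stopping tolerance `tol` are all assumptions for illustration. The result is compared with NumPy's `np.polyfit`, which solves the same least-squares problem directly.

```python
import numpy as np


def gradient_descent(x, y, learning_rate, tol=1e-9, max_iters=100_000):
    """Fit y ~ m*x + c by gradient descent on SSE.

    Stops when SSE changes by less than `tol` between iterations
    (the "no significant change" criterion) or after `max_iters` steps.
    """
    m, c = 0.0, 0.0                               # initial guess (m_0, c_0)
    prev_sse = np.sum((y - (m * x + c)) ** 2)
    for _ in range(max_iters):
        residuals = y - (m * x + c)
        d_m = np.sum(2 * residuals * (-x))        # dSSE/dm
        d_c = np.sum(2 * residuals * (-1))        # dSSE/dc
        m -= learning_rate * d_m
        c -= learning_rate * d_c
        sse = np.sum((y - (m * x + c)) ** 2)
        if abs(prev_sse - sse) < tol:
            break
        prev_sse = sse
    return m, c


# Synthetic data (assumed, for illustration): centered heights, noisy weights
rng = np.random.default_rng(0)
heights = np.linspace(150, 210, 30)
x = heights - heights.mean()      # centered, as in the earlier cells
y = 0.5 * x + 70 + rng.uniform(-5, 5, size=x.shape)

m_gd, c_gd = gradient_descent(x, y, learning_rate=5e-5)
m_ls, c_ls = np.polyfit(x, y, 1)  # closed-form least-squares fit for comparison
print(m_gd, c_gd)
print(m_ls, c_ls)
```

For this data the slope update is stable only when `learning_rate` is below about 1e-4 (roughly `1 / np.sum(x**2)`), while the intercept moves slowly at that step size and needs a few thousand iterations to settle, which is why `max_iters` is generous.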






In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from matplotlib.gridspec import GridSpec

# Synthetic data
np.random.seed(1)
heights = np.linspace(150, 210, 30)
X = heights - heights.mean()
true_m = 0.5
true_c = 10
noise = np.random.uniform(-5, 5, size=X.shape)
y = true_m * X + true_c + noise

# Create a grid of m and c values
m_vals = np.linspace(true_m - 3, true_m + 3, 100)
c_vals = np.linspace(true_c - 50, true_c + 50, 100)
M, C = np.meshgrid(m_vals, c_vals)
SSE = np.array([
    np.sum((y - (m * X + c))**2)
    for m, c in zip(M.flatten(), C.flatten())
]).reshape(M.shape)

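# Animation path: actual gradient-descent updates on SSE.
# The learning rate is an assumed value; it must stay below 1 / sum(X**2)
# (about 1e-4 here) or the slope updates diverge.
learning_rate = 5e-5
n_steps, record_every = 3000, 50
m, c = true_m - 2.5, true_c - 40      # starting guess (m_0, c_0)
m_path, c_path = [], []
for step in range(n_steps):
    if step % record_every == 0:      # keep every 50th step as an animation frame
        m_path.append(m)
        c_path.append(c)
    residuals = y - (m * X + c)
    m -= learning_rate * np.sum(2 * residuals * (-X))   # dSSE/dm
    c -= learning_rate * np.sum(2 * residuals * (-1))   # dSSE/dc
n_frames = len(m_path)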

# Prepare figure
fig = plt.figure(figsize=(10, 4))
gs = GridSpec(1, 2, width_ratios=[1, 1])
ax_line = fig.add_subplot(gs[0])
ax_sse = fig.add_subplot(gs[1], projection='3d')

# Line fit setup
scatter = ax_line.scatter(X, y, color='black')
fit_line, = ax_line.plot([], [], color='orange', lw=2)
ax_line.set_title("Line Fitting")
ax_line.set_xlim(X.min(), X.max())
ax_line.set_ylim(y.min() - 10, y.max() + 10)

# SSE surface, drawn semi-transparent so the moving point stays visible
ax_sse.plot_surface(M, C, SSE, cmap='viridis', alpha=0.5, edgecolor='none')
sse_dot, = ax_sse.plot([], [], [], 'ro', markersize=6)
line_to_base, = ax_sse.plot([], [], [], 'w--', lw=1)
ax_sse.set_title("SSE Surface")
ax_sse.set_xlabel("m")
ax_sse.set_ylabel("c")
ax_sse.set_zlabel("SSE")
ax_sse.view_init(elev=35, azim=-60)

def update(frame):
    m = m_path[frame]
    c = c_path[frame]
    y_pred = m * X + c
    fit_line.set_data(X, y_pred)

    # SSE value
    sse = np.sum((y - y_pred)**2)
    sse_dot.set_data([m], [c])
    sse_dot.set_3d_properties([sse])
    line_to_base.set_data([m, m], [c, c])
    line_to_base.set_3d_properties([0, sse])
    return fit_line, sse_dot, line_to_base

ani = FuncAnimation(fig, update, frames=n_frames, interval=100, blit=True)

# Display inline HTML5 animation
HTML(ani.to_jshtml())


## Implementation on a Real-World Dataset

Here is the link to the dataset: https://drive.google.com/file/d/1xoZ51eaK-NfLfH_0L7UB6AtNlwba_Xdx/view?usp=sharing  
This dataset has one input, height in cm, and one output, weight in kg.  


    

## Imports

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


### Dataset

In [None]:
data_path = "https://drive.google.com/uc?export=download&id=1xoZ51eaK-NfLfH_0L7UB6AtNlwba_Xdx"
# Alternatively, download the file and set data_path to its location on your computer

# Read the CSV data from the link
data_frame = pd.read_csv(data_path)

# Show the first 5 samples of the DataFrame
data_frame.head()

### Training Simple Linear Regression 

In [None]:

# Extract X and y

X = data_frame.iloc[:, 0].values.reshape(-1, 1)  # Select the first column (X)
y = data_frame.iloc[:, 1]  # Select the second column (y)

# Initialize Linear Regression model
model = LinearRegression()

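# Fit the model (LinearRegression solves the least-squares problem directly,
# rather than by gradient descent)
model.fit(X, y)

# Print trained parameters (coef_ holds one slope per input feature)
print("Trained slope (m):", model.coef_[0])
print("Trained intercept (c):", model.intercept_)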

# Make predictions
y_pred = model.predict(X)

# Calculate Mean Squared Error
# Mean Squared Error = SSE / n (n is the number of samples)
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)  
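To confirm the comment that Mean Squared Error is SSE divided by $n$, here is a sketch that computes both and compares them. It uses synthetic stand-in data (an assumption, since the real CSV may not be available offline), not the dataset above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the height/weight data (assumed, not the real dataset)
rng = np.random.default_rng(1)
X = rng.uniform(150, 200, size=(50, 1))
y = 0.9 * X[:, 0] - 90 + rng.normal(0, 3, size=50)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

sse = np.sum((y - y_pred) ** 2)   # Sum of Squared Errors
mse_manual = sse / len(y)         # SSE / n
mse_sklearn = mean_squared_error(y, y_pred)
print(mse_manual, mse_sklearn)
```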



### Visualization 

In [None]:

# Plot the data and the linear regression line

plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Linear Regression Fit: Height vs Weight')
plt.legend()
plt.show()

### Predict for a New Query

In [None]:
predicted_value = model.predict([[175]])  # change 175 to predict for another height (cm)
print("Predicted weight for the given height is:", predicted_value[0], "kg")
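The prediction is just the fitted line $\hat{y} = m x + c$ evaluated at the query. A sketch that checks this and predicts several heights at once, using synthetic stand-in data (an assumption, not the real dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the height/weight data (assumed, not the real dataset)
rng = np.random.default_rng(2)
X = rng.uniform(150, 200, size=(50, 1))
y = 0.9 * X[:, 0] - 90 + rng.normal(0, 3, size=50)
model = LinearRegression().fit(X, y)

heights = np.array([[160.0], [175.0], [190.0]])   # several queries, one per row
pred_sklearn = model.predict(heights)
pred_manual = model.coef_[0] * heights[:, 0] + model.intercept_   # m*x + c
print(pred_sklearn)
print(pred_manual)
```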