## Table of Contents

- [Multiple Linear Regression](#Multiple-Linear-Regression)
  - [Formula](#Formula)
  - [Assumptions](#Assumptions)
  - [Types](#Types)
- [Implementation](#Implementation)
- [🏠 Home](../../welcomePage.ipynb)

# Multiple Linear Regression

**Linear Regression** is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The goal of linear regression is to find the linear equation that best predicts the dependent variable from the independent variables.

## Formula

The multiple linear regression model is:

$$
y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon
$$

where:
- $y$ is the dependent variable,
- $\beta_0$ is the intercept and $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the independent variables,
- $x_1, x_2, \dots, x_n$ are the independent variables,
- $\epsilon$ is the error term, which accounts for the variability in $y$ not explained by the independent variables.
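
As a small illustration of the formula, the sketch below generates synthetic data from known coefficients and recovers them with ordinary least squares. The data, the values in `true_beta`, and the variable names are made up for this example and are not part of the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Two independent variables x_1 and x_2 (synthetic, for illustration only)
X = rng.normal(size=(n_samples, 2))

# Hypothetical true values of beta_0 (intercept), beta_1 and beta_2
true_beta = np.array([1.5, 2.0, -0.5])

# epsilon: small random error term
noise = rng.normal(scale=0.1, size=n_samples)

# y = beta_0 + beta_1 * x_1 + beta_2 * x_2 + epsilon
y = true_beta[0] + X @ true_beta[1:] + noise

# Prepend a column of ones so the intercept beta_0 is estimated as well
X_design = np.column_stack([np.ones(n_samples), X])

# Ordinary least squares estimate of [beta_0, beta_1, beta_2]
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("Estimated coefficients:", beta_hat)
```

With low noise and enough samples, the estimates should land close to the values in `true_beta`.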

## Assumptions

Linear regression assumes:
1. **Linearity:** The relationship between the predictors and the response is linear.
2. **No multicollinearity:** Predictors are not perfectly collinear and, ideally, not highly correlated with each other.
3. **Homoscedasticity:** The variance of the residuals is the same for any value of the predictors.
4. **Independence:** Observations are independent of each other.
5. **Normality:** For any fixed value of the predictors, the response is normally distributed (equivalently, the errors are normally distributed).
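
One common way to check the multicollinearity assumption is the variance inflation factor (VIF): each predictor is regressed on the others, and a large VIF indicates that it is nearly a linear combination of them. The sketch below is a minimal NumPy version; the helper name `variance_inflation_factors` and the synthetic predictors are assumptions for illustration, and cut-offs such as 5 or 10 are only rules of thumb.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of the 2-D array X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        column = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns (with an intercept)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, column, rcond=None)
        residuals = column - design @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((column - column.mean()) ** 2)
        # VIF = 1 / (1 - R^2), which simplifies to ss_tot / ss_res
        vifs.append(ss_tot / ss_res)
    return np.array(vifs)

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)               # unrelated to x1
x3 = x1 + 0.05 * rng.normal(size=500)   # almost a copy of x1

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print("VIF per predictor:", vifs)
```

Here `x1` and `x3` should show very large VIFs because they carry almost the same information, while `x2` stays close to 1.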

## Types

There are two main types of linear regression:
- **Simple Linear Regression:** A single independent variable is used to predict the dependent variable, so the model is a straight line.
- **Multiple Linear Regression:** More than one independent variable is used to predict the dependent variable.
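
To make the difference concrete, the sketch below fits both types to the same synthetic data, in which the response depends on two predictors. The data and variable names are hypothetical; only `LinearRegression` and `mean_squared_error` come from scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

# Synthetic data: y depends on both columns of X
X = rng.normal(size=(300, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Simple linear regression: only the first predictor is used
simple = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: both predictors are used
multiple = LinearRegression().fit(X, y)

mse_simple = mean_squared_error(y, simple.predict(X[:, [0]]))
mse_multiple = mean_squared_error(y, multiple.predict(X))
print(f"Simple MSE: {mse_simple:.4f}, multiple MSE: {mse_multiple:.4f}")
```

Because the second predictor really does influence `y` in this setup, leaving it out gives the simple model a noticeably larger error.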


# Implementation

### Import Libraries

**Press ▶ to import libraries.**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings("ignore")

print("Libraries are imported.")

### Import and show Data

**Press ▶ to load data.**

In [None]:
import os
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output

# List all .csv and Excel files in the ./Data directory
supported_extensions = ['.csv', '.xlsx', '.xls']
files = [f for f in os.listdir('./Data') if any(f.endswith(ext) for ext in supported_extensions)]

# Create a dropdown widget
dropdown = widgets.Dropdown(
    options=files,
    description='Files:',
    disabled=False,
)

# Create a button widget
button = widgets.Button(
    description='Select',
    disabled=False,
    button_style='',
    tooltip='Click to select file',
    icon='check'
)

# Output widget to display messages
output = widgets.Output()

# Function to handle button click
def on_button_click(b):
    with output:
        clear_output()
        selected_file = dropdown.value
        global data
        if selected_file.endswith('.csv'):
            data = pd.read_csv("./Data/" +selected_file)
        elif selected_file.endswith(('.xlsx', '.xls')):
            data = pd.read_excel("./Data/" +selected_file)
        print(f"File '{selected_file}' uploaded as data.")

# Attach the function to the button widget
button.on_click(on_button_click)

# Display the dropdown, button widgets, and initial message within the output widget
with output:
    print("Please select a file from the dropdown and click 'Select'.")
display(output)
display(dropdown)
display(button)



**Press ▶ to display the data.**

In [None]:
display(data.head())
print(f"The data is composed of {data.shape[0]} rows and {data.shape[1]} columns.")

### Data Preprocessing

**Press ▶ to specify the target column.**

In [None]:
import ipywidgets as widgets
import pandas as pd

# Create a Dropdown widget for column selection
dropdown = widgets.Dropdown(
    options=data.columns.tolist(),
    value=data.columns[0],
    description='Select Target Column:',
    disabled=False,
    layout=widgets.Layout(width='500px'),
    style={'description_width': '200px'}
)

# Create a Button widget
button = widgets.Button(
    description='Select',
    button_style='',  # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to select the target column as the last column',
    icon='check'  # FontAwesome icon names (without 'fa-')
)

# Create an Output widget for displaying messages
output = widgets.Output()

# Function to handle button click that rearranges the DataFrame
def on_button_clicked(b):
    with output:
        output.clear_output()
        global data
        # Get the selected column name
        selected_column = dropdown.value
        # Reorder the DataFrame columns
        new_columns = [col for col in data.columns if col != selected_column] + [selected_column]
        data = data[new_columns]
        print(f"Column '{selected_column}' has been moved to the last position.")

# Link the button click event to the function
button.on_click(on_button_clicked)

# Display the widgets and output
display(widgets.VBox([dropdown, button, output]))


**Press ▶ to create a lagged target column.**

In [None]:
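# The target is the last column (moved there in the previous cell).
# The guard keeps `target` correct if this cell is run more than once.
if 'Target_Lag1' not in data.columns:
    target = data.columns[-1]
    # Previous row's target value becomes an extra feature
    data['Target_Lag1'] = data[target].shift(1)
    # The first row has no previous value; drop it (and any other row containing NaN)
    data.dropna(inplace=True)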
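
The effect of the lag step can be seen on a tiny example. The sketch below uses a made-up four-row DataFrame named `toy`; the column names `feature` and `target` are hypothetical.

```python
import pandas as pd

# Tiny hypothetical dataset
toy = pd.DataFrame({
    "feature": [10, 20, 30, 40],
    "target": [1.0, 2.0, 3.0, 4.0],
})

# shift(1) moves every value down one row, so each row sees the previous target
toy["Target_Lag1"] = toy["target"].shift(1)

# The first row has no previous value (NaN), so it is dropped
toy = toy.dropna()
print(toy)
```

Each remaining row now pairs the current target with the target of the row before it, which is why one row is lost.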

### Predict Bead Area

**Press ▶ to choose the independent variables and the train/test split, then click "Apply Changes" to fit the model and plot the predictions.**

In [None]:
# `data` (the loaded DataFrame) and `target` (the target column name)
# are defined in the previous cells; run those cells first.

# Define widgets with adjusted layout
index_range_slider = widgets.IntRangeSlider(
    value=[0, min(7000, len(data))],
    min=0,
    max=len(data),
    step=1,
    description='Index Range:',
    layout=widgets.Layout(width='600px'),  # Increase width for better readability
    style={'description_width': '150px'},  # Increase description width
    continuous_update=False
)

feature_select = widgets.SelectMultiple(
    options=tuple(col for col in data.columns if col != target),
    value=tuple(col for col in data.columns if col != target),
    description='Features:',
    layout=widgets.Layout(width='600px', height='180px'),  # Increase width and height
    style={'description_width': '150px'},  # Increase description width
    disabled=False
)

train_size_slider = widgets.IntSlider(
    value=80,
    min=50,
    max=95,
    step=1,
    description='Train %:',
    layout=widgets.Layout(width='600px'),  # Increase width
    style={'description_width': '150px'},  # Increase description width
    continuous_update=False
)

apply_button = widgets.Button(description="Apply Changes", layout=widgets.Layout(width='800px'))

# Define the function to apply changes and update the plots
def apply_changes(b):
    with output:
        clear_output(wait=True)
        
        # Extract the parameters from widgets
        index_range = index_range_slider.value
        selected_features = list(feature_select.value)
        train_size_pct = train_size_slider.value / 100
        
        # Slice the data by row position
        df = data.iloc[index_range[0]:index_range[1]]
        
        # Separate the selected features from the target column
        X = df[selected_features]
        y = df[target]
        
        # Chronological train-test split (no shuffling: the test rows follow the training rows)
        train_size = int(len(df) * train_size_pct)
        X_train, X_test = X.iloc[:train_size], X.iloc[train_size:]
        y_train, y_test = y.iloc[:train_size], y.iloc[train_size:]
        
        # Train the model
        model = LinearRegression()
        model.fit(X_train, y_train)
        
        # Predict on test data
        y_pred = model.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        display(HTML(f'<b>Mean Squared Error: {mse:.5f}</b>'))  # Display MSE in bold
        
        # Plot predicted vs actual
        plt.figure(figsize=(10, 6))
        plt.plot(y_train.index, y_train, label='Training', color='green')
        plt.plot(y_test.index, y_test, label='Actual', color='blue')
        plt.plot(y_test.index, y_pred, label='Predicted', color='red', linestyle='--')
        plt.xlabel('Time')
        plt.ylabel(target)
        plt.title('Actual vs Predicted '+target)
        plt.legend()
        plt.show()
        
        # Calculate the squared error for each test point (the MSE is the mean of these values)
        pointwise_squared_error = (y_test - y_pred) ** 2
        
        # Plot the pointwise squared error
        plt.figure(figsize=(10, 6))
        plt.plot(y_test.index, y_test, label='Actual', color='blue')
        plt.plot(y_test.index, y_pred, label='Predicted', color='red', linestyle='--')
        plt.plot(y_test.index, pointwise_squared_error, label='Pointwise Squared Error', color='orange')
        plt.xlabel('Time')
        plt.ylabel('Squared Error')
        plt.title('Pointwise Squared Error of Predicted vs Actual '+target)
        plt.legend()
        plt.show()

# Link the apply button to the function
apply_button.on_click(apply_changes)

# Display the widgets and the output area
output = widgets.Output()

display(index_range_slider, feature_select, train_size_slider, apply_button, output)
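
For readers who want to follow the computation without the widgets, the sketch below repeats the core steps of the callback on synthetic data: a chronological split, a `LinearRegression` fit, and the mean squared error on the held-out rows. The DataFrame `demo`, its column names, and the 80% split are assumptions for illustration, not the notebook's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n_rows = 100

# Synthetic stand-in for `data` (hypothetical columns x1, x2 and target y)
demo = pd.DataFrame({
    "x1": rng.normal(size=n_rows),
    "x2": rng.normal(size=n_rows),
})
demo["y"] = 1.0 + 2.0 * demo["x1"] - 3.0 * demo["x2"] + rng.normal(scale=0.1, size=n_rows)

features = ["x1", "x2"]

# Chronological split: the first 80% of rows train the model, the rest test it
train_size = int(len(demo) * 0.8)
train, test = demo.iloc[:train_size], demo.iloc[train_size:]

# Fit on the training rows and predict the held-out rows
model = LinearRegression().fit(train[features], train["y"])
y_pred = model.predict(test[features])

# Squared error per test point; their mean is the MSE
squared_errors = (test["y"] - y_pred) ** 2
mse = mean_squared_error(test["y"], y_pred)
print(f"Mean Squared Error: {mse:.5f}")
```

Splitting by position rather than at random keeps the test rows after the training rows, which matters when the rows are ordered in time and a lagged target is used as a feature.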
