# **Fares Ahmed Moustafa**
### *F.ahmed2270@nu.edu.eg*

## **Correlation:**
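Measures the strength and direction of the linear relationship between two variables.

It’s a statistic (most commonly the Pearson coefficient) that takes values between -1 and +1: -1 is a perfect negative linear relationship, +1 is a perfect positive one, and 0 means no linear relationship.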
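As a minimal sketch of the definition above, the Pearson coefficient can be computed as the covariance divided by the product of the standard deviations. The helper name `pearson_r` and the toy arrays are illustrative assumptions, not part of the energy dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both variables
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical toy data, chosen only to show the three extreme cases
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r_pos = pearson_r(x, 2 * x + 1)        # increasing straight line
r_neg = pearson_r(x, -3 * x + 7)       # decreasing straight line
r_sq = pearson_r(x - 3, (x - 3) ** 2)  # symmetric parabola: dependent, but not linearly

print(r_pos, r_neg, r_sq)
```

The parabola case is a reminder that a coefficient near 0 rules out a linear relationship only, not dependence in general.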


## **Multicollinearity:**

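Happens when two or more independent variables in a regression model are highly correlated with each other. It inflates the variance of the coefficient estimates, so individual coefficients become unstable and hard to interpret.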
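One common way to detect this is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). The sketch below uses NumPy only; the function name `vif` and the synthetic columns are assumptions made for illustration, and the often-quoted cutoffs of 5 or 10 are rules of thumb rather than fixed thresholds.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on all the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        # Remaining columns plus an intercept term
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic predictors (hypothetical): b is almost a copy of a, c is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.05 * rng.normal(size=500)
c = rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
print(vifs)  # large values for a and b, close to 1 for c
```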

# **Types of Gradient Descent Algorithms**

## **Batch Gradient Descent**

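Uses the entire dataset to compute the gradient for every weight update.

**Advantages:** Stable convergence; on convex problems with a suitable learning rate it converges to the global minimum.

**Limitations:** Very slow on large datasets, because each update requires a full pass over the data.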
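A minimal sketch of batch gradient descent for least-squares regression, assuming a mean-squared-error loss. The function name `batch_gd`, the learning rate, and the synthetic data are illustrative choices.

```python
import numpy as np

def batch_gd(X, y, lr=0.1, epochs=200):
    """One weight update per epoch, computed from all rows at once."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)  # gradient of the mean squared error
        w -= lr * grad
    return w

# Synthetic data (hypothetical): intercept column plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

w_batch = batch_gd(X, y)
print(w_batch)  # should land close to true_w
```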



## **Stochastic Gradient Descent (SGD)**

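Updates the weights after each training sample (one at a time).

**Advantages:** Cheap, fast updates; good for very large datasets; the noise in the updates can help escape shallow local minima.

**Limitations:** Very noisy; the loss fluctuates instead of decreasing smoothly, so convergence is less stable.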
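The per-sample update can be sketched as follows, again assuming a squared-error loss. The name `sgd`, the learning rate, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=20, seed=0):
    """One weight update per training sample, visiting samples in random order."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = 2.0 * X[i] * (X[i] @ w - y[i])  # gradient from a single sample
            w -= lr * grad
    return w

# Synthetic data (hypothetical): intercept column plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

w_sgd = sgd(X, y)
print(w_sgd)  # close to true_w, with small sample-to-sample jitter
```

The learning rate here is smaller than a typical batch setting because each step is based on one noisy sample.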



## **Mini-Batch Gradient Descent**

A compromise between Batch and SGD: each update uses a small subset (mini-batch) of the data.

**Advantages:** Efficient, balances speed and stability, and benefits from vectorized computation.

**Limitations:** Adds batch size as another hyperparameter to tune.
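A minimal mini-batch sketch under the same squared-error assumption. The name `minibatch_gd` and the default `batch_size=32` are illustrative, not a recommendation.

```python
import numpy as np

def minibatch_gd(X, y, lr=0.05, epochs=50, batch_size=32, seed=0):
    """One weight update per mini-batch; data is reshuffled every epoch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # last batch may be smaller
            Xb, yb = X[idx], y[idx]
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)  # vectorized over the batch
            w -= lr * grad
    return w

# Synthetic data (hypothetical): intercept column plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

w_mb = minibatch_gd(X, y)
print(w_mb)  # should land close to true_w
```

Setting `batch_size=len(y)` recovers batch gradient descent and `batch_size=1` recovers SGD, which is why the method sits between the two.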



# **Types of Regularization**

## **L1 Regularization (Lasso)**

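Adds a penalty proportional to the sum of the absolute values of the coefficients.

Forces some coefficients to exactly 0 → performs feature selection.

Good when many irrelevant features exist.

Can be unstable if features are highly correlated: it tends to keep one feature from a correlated group and drop the others, and which one it keeps can change with small changes in the data.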
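To illustrate how the L1 penalty produces exact zeros, here is a small coordinate-descent sketch built on the soft-thresholding operator. It is a teaching sketch under assumptions (no intercept, synthetic data, fixed iteration count, hypothetical names `soft_threshold` and `lasso_cd`), not a replacement for a library solver.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward 0 by t; anything inside [-t, t] becomes exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid_j = y - X @ w + X[:, j] * w[j]  # residual with feature j left out
            rho = X[:, j] @ resid_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Synthetic data (hypothetical): only the first two of five features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

w_l1 = lasso_cd(X, y, alpha=0.1)
print(w_l1)  # the three irrelevant coefficients are zeroed out
```

The two relevant coefficients are kept but pulled slightly toward 0, which is the price of the penalty.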

## **L2 Regularization (Ridge)**

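Adds a penalty proportional to the sum of the squared coefficients.

Shrinks coefficients toward 0 but does not set them exactly to 0.

Works well when predictors are correlated, because it spreads the weight across the correlated features instead of picking one.

Doesn’t eliminate irrelevant features.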
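Ridge regression has a closed-form solution, which makes its behaviour on correlated predictors easy to inspect. The sketch below assumes no intercept and uses synthetic data; the name `ridge_closed_form` and the value `alpha=10.0` are illustrative.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xw||^2 + alpha * ||w||^2 by solving the regularized normal equations."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Synthetic data (hypothetical): columns 0 and 1 are nearly identical, column 2 is irrelevant
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z, z + 0.01 * rng.normal(size=200), rng.normal(size=200)])
y = 2.0 * z + 0.1 * rng.normal(size=200)

w_ols = ridge_closed_form(X, y, alpha=0.0)     # plain least squares for comparison
w_ridge = ridge_closed_form(X, y, alpha=10.0)
print("OLS:  ", w_ols)
print("Ridge:", w_ridge)
```

With the penalty, the two near-duplicate columns receive similar weights and the overall coefficient norm shrinks, while the irrelevant coefficient stays small but nonzero.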


In [3]:
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures


# Load the training data and separate the target from the predictors
data = pd.read_csv("train_energy_data.csv")

X = data.drop("Energy Consumption", axis=1)
y = data["Energy Consumption"]

# Columns to one-hot encode vs. columns used as numeric inputs
categorical_features = ["Building Type", "Day of Week"]
numerical_features = ["Square Footage", "Number of Occupants", "Appliances Used", "Average Temperature"]

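# Linear Regression
# drop="first" removes one dummy column per categorical feature to avoid perfectly collinear dummies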
linear_preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(drop="first"), categorical_features),
        ("num", "passthrough", numerical_features)
    ]
)

linear_model = Pipeline(steps=[
    ("preprocessor", linear_preprocessor),
    ("regressor", LinearRegression())
])

# Fit and predict on the same rows, so the metrics below are training-set scores
linear_model.fit(X, y)
y_pred_lin = linear_model.predict(X)

print("Linear Regression Results:")
print("MSE:", mean_squared_error(y, y_pred_lin))
print("R2 Score:", r2_score(y, y_pred_lin))

# Polynomial Regression (degree=2 as an example)
# Only the numeric columns are expanded into squared and pairwise interaction terms
poly_preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(drop="first"), categorical_features),
        ("num", PolynomialFeatures(degree=2, include_bias=False), numerical_features)
    ]
)

poly_model = Pipeline(steps=[
    ("preprocessor", poly_preprocessor),
    ("regressor", LinearRegression())
])

# Fit and predict on the same rows, so the metrics below are training-set scores
poly_model.fit(X, y)
y_pred_poly = poly_model.predict(X)

print("\nPolynomial Regression Results (degree=2):")
print("MSE:", mean_squared_error(y, y_pred_poly))
print("R2 Score:", r2_score(y, y_pred_poly))

Linear Regression Results:
MSE: 0.0001863116596849601
R2 Score: 0.9999999997858984

Polynomial Regression Results (degree=2):
MSE: 0.00018460644937261903
R2 Score: 0.999999999787858
