# Sales Prediction using Linear Regression

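This notebook predicts order sales with a linear regression model. The features are calendar parts derived from each order's ship date, plus one-hot encoded categorical attributes such as ship mode, customer, location, and product.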

## 1. Import Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

## 2. Load the Dataset

In [None]:
file_path = 'Source/Sales.csv'
df = pd.read_csv(file_path, encoding='ISO-8859-1')
df.head()

## 3. Preprocess the Data

In [None]:
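# Parse ship dates; values that cannot be parsed become NaT instead of raising.
df['Ship_Date'] = pd.to_datetime(df['Ship_Date'], errors='coerce')
# Drop rows whose date failed to parse: they would yield NaN features, which LinearRegression rejects.
df = df.dropna(subset=['Ship_Date'])
# Derive calendar features (weekday: Monday=0 ... Sunday=6).
df['Ship_Year'] = df['Ship_Date'].dt.year
df['Ship_Month'] = df['Ship_Date'].dt.month
df['Ship_Day'] = df['Ship_Date'].dt.day
df['Ship_Weekday'] = df['Ship_Date'].dt.weekday
df = df.drop(columns=['Ship_Date'])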
df.head()
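
The date handling in this step can be checked on a tiny example. The sketch below uses a made-up three-row frame (the actual date format in `Sales.csv` is not shown in this notebook, so the sample values are assumptions) to show what `errors='coerce'` does with an unparseable value and how such rows can be counted and dropped.

```python
import pandas as pd

# Hypothetical sample values; the real Ship_Date format in Sales.csv may differ.
sample = pd.DataFrame({"Ship_Date": ["2017-11-08", "not a date", "2017-06-12"]})

# Unparseable strings become NaT rather than raising an exception.
sample["Ship_Date"] = pd.to_datetime(sample["Ship_Date"], errors="coerce")
n_missing = int(sample["Ship_Date"].isna().sum())

# Calendar parts derived from NaT come out as NaN.
sample["Ship_Month"] = sample["Ship_Date"].dt.month
sample["Ship_Weekday"] = sample["Ship_Date"].dt.weekday  # Monday=0 ... Sunday=6

# Keep only the rows whose date parsed.
clean = sample.dropna(subset=["Ship_Date"])

print(f"Rows with unparseable dates: {n_missing}")
print(clean)
```

Counting the coerced rows before dropping them is a cheap way to notice a date-format mismatch, which would otherwise silently discard a large share of the data.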

## 4. Define Features and Target

In [None]:
features = ['Ship_Year', 'Ship_Month', 'Ship_Day', 'Ship_Weekday', 'Ship_Mode', 'Customer_ID', 'Segment', 'City', 'State', 'Region', 'Postal_Code', 'Product_ID', 'Category', 'Sub_Category']
target = 'Sales'
X = df[features]
y = df[target]

## 5. Split the Data into Training and Testing Sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
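
As a quick illustration of what `test_size=0.2` and `random_state=42` do, the sketch below splits a made-up ten-row frame twice with the same seed. The toy column values are assumptions chosen only for the demonstration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: ten rows, one feature, one target.
toy = pd.DataFrame({
    "Ship_Month": range(1, 11),
    "Sales": [float(v) for v in range(10, 110, 10)],
})
X_toy, y_toy = toy[["Ship_Month"]], toy["Sales"]

# Two splits with the same seed select the same rows.
a_train, a_test, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(len(a_train), len(a_test))            # 8 training rows, 2 test rows
print(a_test.index.equals(b_test.index))    # same seed, same test rows
```

Fixing the seed makes the reported metrics reproducible from run to run; it does not make them independent of which rows happened to land in the test set.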

## 6. Build and Train the Linear Regression Model

In [None]:
categorical_features = ['Ship_Mode', 'Customer_ID', 'Segment', 'City', 'State', 'Region', 'Postal_Code', 'Product_ID', 'Category', 'Sub_Category']
numerical_features = ['Ship_Year', 'Ship_Month', 'Ship_Day', 'Ship_Weekday']

categorical_transformer = OneHotEncoder(handle_unknown='ignore')

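# One-hot encode the categorical columns; pass the numeric date parts through unchanged.
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features),
        ('num', 'passthrough', numerical_features)
    ]
)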

model = Pipeline(steps=[('preprocessor', preprocessor),
                        ('regressor', LinearRegression())])

model.fit(X_train, y_train)
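
The pipeline sets `handle_unknown='ignore'` on the encoder because the test set can contain categories (for example a customer or product) that never appear in the training set. The sketch below shows the effect on a single column; the ship-mode labels are made-up values, not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical category labels for illustration only.
train = np.array([["First Class"], ["Standard Class"]], dtype=object)
unseen = np.array([["Same Day"]], dtype=object)

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

# Known categories get a single 1 in their own column.
encoded_seen = encoder.transform(train).toarray()
# A category not seen during fit is encoded as all zeros instead of raising an error.
encoded_unseen = encoder.transform(unseen).toarray()

print(encoded_seen)
print(encoded_unseen)
```

An all-zero row means the regression contributes no coefficient for that feature, so the prediction for an unseen customer or product relies on the intercept and the remaining features only.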

## 7. Evaluate the Model

In [None]:
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"R-squared (R²): {r2:.2f}")
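
To make the three metrics concrete, the sketch below recomputes them by hand with NumPy on four made-up values and compares the results with scikit-learn. The numbers are illustrative only and are not results from this notebook's model.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical true and predicted sales values.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

residuals = y_true - y_hat                      # [-10, 10, -20, 20]
mse_manual = np.mean(residuals ** 2)            # mean of squared errors
mae_manual = np.mean(np.abs(residuals))         # mean of absolute errors
rmse_manual = np.sqrt(mse_manual)               # same units as Sales

# R² = 1 - SS_res / SS_tot, where SS_tot is the error of always predicting the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(mae_manual, mean_absolute_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

Because MSE is in squared units of the target, its square root (RMSE) is often easier to read next to MAE. An R² of 0 corresponds to a model that always predicts the mean of the test targets, and a negative value means the model does worse than that.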

## 8. Conclusion

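In this notebook, we built a linear regression pipeline that one-hot encodes the categorical features and passes the ship-date parts through unchanged, then evaluated it on a held-out 20% test set using Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R²). Because identifiers such as Customer_ID and Product_ID are one-hot encoded, the model has many coefficients, so it should be judged by these test-set metrics rather than by its fit to the training data.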