# 📊 E-Commerce Sales Prediction (ML & Data Analytics)
## 🚀 Machine Learning Model for Sales Forecasting
This project analyzes historical e-commerce transaction data and trains regression models to predict the `Purchase` value of each transaction.

## 📌 Step 1: Importing Necessary Libraries
We use NumPy and pandas for data handling, Matplotlib and Seaborn for visualization, and scikit-learn for preprocessing, modeling, and evaluation.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

warnings.filterwarnings('ignore')

## 📌 Step 2: Load and Explore Data
We will load the dataset and perform an initial exploration to understand its structure.

In [None]:
df = pd.read_csv('../input/walmart.csv')
df.head()

## 📌 Step 3: Exploratory Data Analysis (EDA)
Inspect the column types, compute a statistical summary, and count the missing values in each column.

In [None]:
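# A notebook cell only displays its last expression, so print each result explicitly
df.info()                    # column dtypes and non-null counts
print(df.describe())         # summary statistics for numeric columns
print(df.isnull().sum())     # missing values per column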
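
The three checks above can also be gathered into a single per-column table. The following is a minimal sketch, not part of the original notebook: the helper name `summarize_columns` and the toy frame are hypothetical, and only the column names `Gender` and `Purchase` are taken from the dataset.

```python
import numpy as np
import pandas as pd

def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share, unique values."""
    return pd.DataFrame({
        'dtype': frame.dtypes.astype(str),
        'n_missing': frame.isnull().sum(),
        'pct_missing': (frame.isnull().mean() * 100).round(2),
        'n_unique': frame.nunique(),  # NaN is not counted as a value
    })

# Tiny made-up frame standing in for df (illustrative values only)
toy = pd.DataFrame({
    'Gender': ['F', 'M', 'M', None],
    'Purchase': [100.0, 250.0, np.nan, 80.0],
})

summary = summarize_columns(toy)
print(summary)
```

Calling `summarize_columns(df)` on the loaded dataset would give the same overview for every column at once.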

## 📌 Step 4: Feature Engineering
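Encode the categorical features as integers and derive per-user and per-product aggregate features. The aggregates are computed from `Purchase` (the prediction target) over the full dataset, so they carry target information into the test set; treat the resulting scores as optimistic.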

In [None]:
df_encoded = df.copy()
label_encoders = {}

# Encode each categorical column as integers and keep the fitted encoder for later decoding
for col in ['Gender', 'Age', 'City_Category', 'Stay_In_Current_City_Years']:
    le = LabelEncoder()
    df_encoded[col] = le.fit_transform(df_encoded[col])
    label_encoders[col] = le

# Aggregate features built from Purchase (the target) over all rows
df_encoded['Total_Spending'] = df_encoded.groupby('User_ID')['Purchase'].transform('sum')
df_encoded['Avg_Product_Purchase'] = df_encoded.groupby('Product_ID')['Purchase'].transform('mean')
df_encoded['Purchase_Count'] = df_encoded.groupby('User_ID')['Purchase'].transform('count')

# Drop the identifier columns once the aggregates are in place
df_final = df_encoded.drop(columns=['User_ID', 'Product_ID'])

df_final.head()
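
Because `Total_Spending`, `Avg_Product_Purchase`, and `Purchase_Count` are derived from `Purchase`, one way to keep test-set targets out of the features is to split first and compute the aggregates from the training rows only. The sketch below illustrates that idea on made-up data; it is an alternative under stated assumptions, not the notebook's method. The helper `add_aggregates` and the fallback values for unseen users and products are hypothetical choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows that reuse the notebook's column names
toy = pd.DataFrame({
    'User_ID':    [1, 1, 2, 2, 3, 3, 4, 4],
    'Product_ID': ['P1', 'P2', 'P1', 'P3', 'P2', 'P3', 'P1', 'P2'],
    'Purchase':   [100, 200, 150, 300, 250, 350, 120, 220],
})

# Split before any aggregate is computed
train, test = train_test_split(toy, test_size=0.25, random_state=42)
train, test = train.copy(), test.copy()

# Aggregates come from training rows only
user_stats = (
    train.groupby('User_ID')['Purchase']
    .agg(['sum', 'count'])
    .rename(columns={'sum': 'Total_Spending', 'count': 'Purchase_Count'})
)
product_mean = train.groupby('Product_ID')['Purchase'].mean().rename('Avg_Product_Purchase')

def add_aggregates(frame: pd.DataFrame) -> pd.DataFrame:
    """Attach training-set aggregates; unseen users/products get fallback values."""
    out = frame.join(user_stats, on='User_ID').join(product_mean, on='Product_ID')
    out['Total_Spending'] = out['Total_Spending'].fillna(0)
    out['Purchase_Count'] = out['Purchase_Count'].fillna(0)
    # Assumed fallback: the overall training mean for products never seen in training
    out['Avg_Product_Purchase'] = out['Avg_Product_Purchase'].fillna(train['Purchase'].mean())
    return out

train_fe = add_aggregates(train)
test_fe = add_aggregates(test)
print(test_fe)
```

With this ordering, no test row's `Purchase` contributes to any feature. Each training row still sees its own target inside its aggregates, so a stricter variant would compute them out-of-fold.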

## 📌 Step 5: Data Splitting
Split the dataset into training (80%) and test (20%) sets, then standardize the features using statistics fitted on the training set only.

In [None]:
X = df_final.drop(columns=['Purchase'])
y = df_final['Purchase']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
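
To see what fitting the scaler on the training set only means in practice, here is a small worked example on made-up numbers (the `*_toy` names are hypothetical). The training column is rescaled to mean 0 and standard deviation 1, and the test value is transformed with the training mean and standard deviation rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One made-up feature: three training values and one test value
X_train_toy = np.array([[1.0], [2.0], [3.0]])
X_test_toy = np.array([[4.0]])

scaler_toy = StandardScaler()
train_scaled = scaler_toy.fit_transform(X_train_toy)  # learns mean=2 and the population std
test_scaled = scaler_toy.transform(X_test_toy)        # reuses the training statistics

print(scaler_toy.mean_, scaler_toy.scale_)
print(train_scaled.ravel(), test_scaled.ravel())
```

The test value lands outside the training range after scaling, which is expected: the scaler never saw it during fitting.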

## 📌 Step 6: Train Machine Learning Models
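Train a linear regression model on the scaled features and a random forest on the unscaled features, then compare them on MAE, RMSE, and R2 over the test set.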

In [None]:
def evaluate_model(y_true, y_pred, model_name):
    """Return MAE, RMSE, and R2 for one model as a dictionary."""
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = mse ** 0.5  # RMSE is in the same units as Purchase
    r2 = r2_score(y_true, y_pred)
    return {
        'Model': model_name,
        'MAE': mae,
        'RMSE': rmse,
        'R2 Score': r2
    }

# Linear Regression
lr_model = LinearRegression()
lr_model.fit(X_train_scaled, y_train)
y_pred_lr = lr_model.predict(X_test_scaled)

# Random Forest
rf_model = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)

# Model Evaluation
lr_results = evaluate_model(y_test, y_pred_lr, 'Linear Regression')
rf_results = evaluate_model(y_test, y_pred_rf, 'Random Forest')

model_comparison = pd.DataFrame([lr_results, rf_results])
print(model_comparison)
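
To make the three metrics concrete, the following worked example computes them on four made-up predictions (the values are illustrative and unrelated to the dataset). The errors are 10, 10, 20, and 20 in absolute value, so MAE is 15, MSE is 250, and RMSE is about 15.81. The true values have a total sum of squares of 50000 around their mean of 250, giving R2 = 1 - 1000/50000 = 0.98.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true values and predictions
y_true_toy = [100, 200, 300, 400]
y_pred_toy = [110, 190, 320, 380]

mae = mean_absolute_error(y_true_toy, y_pred_toy)   # mean of |error|
mse = mean_squared_error(y_true_toy, y_pred_toy)    # mean of error squared
rmse = mse ** 0.5                                   # back in the units of the target
r2 = r2_score(y_true_toy, y_pred_toy)               # 1 - SS_res / SS_tot

print(f'MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}')
```

RMSE penalizes the two larger errors more heavily than MAE does, which is why it comes out slightly higher here.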

## 📌 Step 7: Export Processed Data
Save the processed dataset for further analysis.

In [None]:
df_final.to_csv('processed_ecommerce_data.csv', index=False)
print('Dataset successfully saved!')