# 📊 Amazon E-commerce Furniture Dataset 2024 - Analysis & Prediction

### 📝 Project Overview  
This project analyzes the **Amazon E-commerce Furniture Dataset 2024**, focusing on price trends, category distribution, and rating patterns over the years.  
We perform **data cleaning, visualization, and machine learning-based price prediction**.

---  


## 📂 1. Load & Inspect Data
We first load the dataset and inspect its structure.


In [None]:
import pandas as pd

# Load Dataset
df = pd.read_csv("amazon_furniture_dataset_2024.csv")

# Display basic info
df.info()
df.head()
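
Before cleaning, it can help to quantify how many values are missing and how many rows are duplicated. The following is a minimal sketch on a made-up toy frame; the helper name `summarize_quality` and the sample values are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count, and unique-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),
    })

# Toy frame standing in for the real dataset (values are made up)
sample = pd.DataFrame({
    "price": ["$1,299.00", "$89.99", None, "$89.99"],
    "rating": ["4.5", "N/A", "3.8", "N/A"],
    "category": ["Sofa", "Chair", "Chair", "Chair"],
})

summary = summarize_quality(sample)
duplicate_rows = int(sample.duplicated().sum())
print(summary)
print("Duplicate rows:", duplicate_rows)
```

On the real data, `summarize_quality(df)` would show which columns the cleaning step is likely to affect most.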


## 🛠 2. Data Cleaning  
We remove duplicates, handle missing values, and convert data types.


In [None]:
# Remove duplicates and handle missing values
df.drop_duplicates(inplace=True)
df.dropna(inplace=True)

# Convert price to numeric (raw string avoids an invalid-escape warning for "\$")
df['price'] = df['price'].replace(r'[\$,]', '', regex=True).astype(float)

# Convert ratings to numeric
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
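# Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in pandas 2.x
df['rating'] = df['rating'].fillna(df['rating'].mean())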
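
To illustrate what the price conversion does, here is a small sketch on made-up strings. The helper `clean_price` is hypothetical, and it uses `pd.to_numeric(..., errors="coerce")` instead of `astype(float)` so that unparseable entries become `NaN` rather than raising an error; whether the real `price` column contains such entries is an assumption.

```python
import pandas as pd

def clean_price(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' then convert to float; unparseable entries become NaN."""
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(stripped, errors="coerce")

# Made-up examples of formats a scraped price column might contain
raw = pd.Series(["$1,299.00", "89.99", "$45", "unavailable"])
cleaned = clean_price(raw)
print(cleaned.tolist())
```

Rows that end up as `NaN` can then be dropped or imputed explicitly, which keeps the decision visible instead of failing mid-conversion.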


## 📊 3. Exploratory Data Analysis  
We visualize the **price distribution** and **category-wise price variation**; yearly trends are covered in the next section.


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Price Distribution
plt.figure(figsize=(10,5))
sns.histplot(df['price'], bins=30, kde=True, color="skyblue")
plt.title("Price Distribution")
plt.show()

# Price by Category
plt.figure(figsize=(12,5))
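# seaborn 0.13+ deprecates palette without hue, so map hue to the same column and hide the legend
sns.boxplot(data=df, x='category', y='price', hue='category', palette="coolwarm", legend=False)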
plt.xticks(rotation=45)
plt.title("Price Variation by Category")
plt.show()
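
A boxplot is easier to interpret next to the numbers behind it. The sketch below computes per-category summary statistics with `groupby`; the toy values are made up, and the column names `category` and `price` are taken from the plotting code above.

```python
import pandas as pd

# Toy data standing in for the cleaned dataset (values are made up)
sample = pd.DataFrame({
    "category": ["Sofa", "Sofa", "Chair", "Chair", "Chair", "Table"],
    "price": [900.0, 1100.0, 80.0, 120.0, 100.0, 300.0],
})

# Count, median, and range of prices per category, most expensive first
category_summary = (
    sample.groupby("category")["price"]
    .agg(["count", "median", "min", "max"])
    .sort_values("median", ascending=False)
)
print(category_summary)
```

The `count` column is worth checking first: a category with only a handful of listings produces a box that says little about typical prices.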


## 📅 4. Yearly Comparison  
We analyze price trends and rating variations over different years.


In [None]:
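# Extract the year only when the column holds dates or date strings. Plain integers
# such as 2024 would be read by pd.to_datetime as nanoseconds since 1970.
if not pd.api.types.is_numeric_dtype(df['year']):
    df['year'] = pd.to_datetime(df['year'], errors='coerce').dt.year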

# Yearly Price Trend
plt.figure(figsize=(10,5))
sns.lineplot(x=df['year'], y=df['price'], marker="o", color="green")
plt.title("Yearly Price Trend")
plt.show()

# Yearly Rating Trend
plt.figure(figsize=(10,5))
sns.lineplot(x=df['year'], y=df['rating'], marker="o", color="red")
plt.title("Yearly Rating Trend")
plt.show()
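
When a year has several listings, `sns.lineplot` aggregates them to the mean by default and draws a confidence band around it. The same aggregation can be made explicit with `groupby`, which also exposes how many listings each yearly point rests on. This is a sketch on made-up values; the output column names `mean_price`, `mean_rating`, and `n_listings` are chosen here for illustration.

```python
import pandas as pd

# Toy data standing in for the cleaned dataset (values are made up)
sample = pd.DataFrame({
    "year": [2022, 2022, 2023, 2023, 2024],
    "price": [100.0, 200.0, 150.0, 250.0, 300.0],
    "rating": [4.0, 4.5, 3.5, 4.5, 5.0],
})

# One row per year: average price, average rating, and number of listings
yearly = sample.groupby("year").agg(
    mean_price=("price", "mean"),
    mean_rating=("rating", "mean"),
    n_listings=("price", "size"),
)
print(yearly)
```

A year with very few listings can make a trend line jump, so `n_listings` is a useful sanity check before reading much into the plots.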


## 🤖 5. Machine Learning - Price Prediction  
We train a **Random Forest Regressor** to predict furniture prices.


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

# One-hot encode categorical variables
df = pd.get_dummies(df, columns=['category', 'brand'], drop_first=True)

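# Define features and target; keep only numeric/boolean columns, since
# RandomForestRegressor cannot fit on any remaining free-text columns
X = df.drop(columns=['price']).select_dtypes(include=['number', 'bool'])
y = df['price']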

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions & Evaluation
y_pred = model.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R2 Score:", r2_score(y_test, y_pred))
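
MAE, RMSE, and R2 are easier to judge against a reference point. One common choice, sketched below with NumPy only, is a constant predictor that always outputs the training-set mean. The function `baseline_metrics` is a hypothetical helper, not part of scikit-learn, and the toy targets are made up.

```python
import numpy as np

def baseline_metrics(y_train, y_test):
    """Score a constant predictor that always outputs the training-set mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.full_like(y_test, y_train.mean())

    errors = y_test - y_pred
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    # R2 = 1 - residual sum of squares / total sum of squares around the test mean
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

# Toy targets (made up); on the notebook's data this would be
# baseline_metrics(y_train, y_test)
metrics = baseline_metrics([10, 20, 30], [10, 20, 30])
print(metrics)
```

If the Random Forest's MAE and RMSE are not clearly below the baseline values, or its R2 is close to 0, the features are adding little predictive signal.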
