# 💻 Laptop Price Prediction

This notebook demonstrates a complete pipeline for predicting laptop prices using **data preprocessing, exploratory data analysis (EDA), and machine learning models**.

## 1. Load Dataset

In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv("laptopData.csv")
print("Initial shape:", df.shape)
df.head()

## 2. Handle Missing Values
- Drop rows with more than 50% missing values.
- Fill numeric columns with **median** (robust to outliers).
- Fill categorical columns with **mode** (most frequent value).

In [None]:
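# Keep only rows where at least half of the columns are non-missing
# (thresh is the minimum number of non-NaN values a row needs to survive)
df = df.dropna(thresh=df.shape[1]//2)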

# Fill numeric NaNs with median
df = df.fillna(df.median(numeric_only=True))

# Fill categorical NaNs with mode
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].fillna(df[col].mode()[0])

df.info()
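
The `thresh` argument is easy to misread, so here is a small sketch of how it behaves. The toy frame and its column names are made up for illustration and are not part of `laptopData.csv`. `thresh` counts the non-missing values a row must have, so a row with exactly half of its columns missing is kept.

```python
import numpy as np
import pandas as pd

# Toy frame with 4 columns; all values are illustrative only.
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": ["x", None, None],
    "d": ["y", "z", None],
})

# Minimum number of non-missing values required to keep a row.
min_non_missing = toy.shape[1] // 2  # 4 // 2 == 2
kept = toy.dropna(thresh=min_non_missing)

# Row 0 has 4 values, row 1 has 2 values, row 2 has none.
print(kept.index.tolist())  # [0, 1]
```

Because of the floor division, a frame with an odd number of columns also keeps rows that are slightly more than 50% missing (for example, 5 values present out of 11 columns).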

## 3. Encode Categorical Variables
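Convert string columns into integer codes using **Label Encoding**. The codes follow the sorted order of the labels, so they carry no real ranking; tree-based models usually cope with this, while a linear model will treat the codes as magnitudes.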

In [None]:
from sklearn.preprocessing import LabelEncoder

label_encoders = {}
for col in df.select_dtypes(include='object').columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

df.head()
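
Keeping the fitted encoders in `label_encoders` makes it possible to map codes back to the original strings later. The sketch below shows the round trip on a made-up list of brand names (hypothetical values, not taken from the dataset).

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values, used only to illustrate the round trip.
brands = ["Dell", "Apple", "HP", "Apple"]

le = LabelEncoder()
codes = le.fit_transform(brands)

# Classes are stored in sorted order: Apple=0, Dell=1, HP=2.
print(le.classes_.tolist())                  # ['Apple', 'Dell', 'HP']
print(codes.tolist())                        # [1, 0, 2, 0]

# inverse_transform recovers the original labels from the codes.
print(le.inverse_transform(codes).tolist())  # ['Dell', 'Apple', 'HP', 'Apple']
```

In the notebook the same call would look like `label_encoders[col].inverse_transform(df[col])` for any encoded column `col`.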

## 4. Train–Test Split

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop("Price", axis=1)
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_test.shape

## 5. Exploratory Data Analysis (EDA)
- Distribution plots (Price, Weight)
- Scatterplot (Weight vs Price)
- Boxplots to identify outliers
- Summary statistics

Note: these cells run on `df` after the label encoding in Section 3, so any column that was stored as text appears here as integer codes rather than its original values.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Summary stats
df[["Price", "Weight"]].describe()

In [None]:
# Distribution plots
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.histplot(df['Price'], kde=True)
plt.title('Distribution of Price')

plt.subplot(1, 2, 2)
sns.histplot(df['Weight'], kde=True)
plt.title('Distribution of Weight')

plt.show()

In [None]:
# Scatterplot
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Weight', y='Price', data=df)
plt.title('Weight vs. Price')
plt.show()

In [None]:
# Boxplots
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.boxplot(y=df['Price'])
plt.title('Box Plot of Price')

plt.subplot(1, 2, 2)
sns.boxplot(y=df['Weight'])
plt.title('Box Plot of Weight')

plt.show()
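
The points a box plot draws beyond its whiskers follow the 1.5 × IQR rule, which is seaborn's default whisker setting. As a rough sketch, the same rule can be applied numerically to list the flagged values. The numbers below are invented for illustration; on the real data the series would be `df['Price']` or `df['Weight']`.

```python
import pandas as pd

# Made-up values with one obvious extreme point.
values = pd.Series([30, 40, 45, 50, 55, 60, 200])

q1 = values.quantile(0.25)   # 42.5
q3 = values.quantile(0.75)   # 57.5
iqr = q3 - q1                # 15.0

# Points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are flagged as outliers.
lower = q1 - 1.5 * iqr       # 20.0
upper = q3 + 1.5 * iqr       # 80.0
outliers = values[(values < lower) | (values > upper)]

print(outliers.tolist())  # [200]
```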

## 6. Model Training
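- Scale features using **StandardScaler**, fitted on the training split only so that no test-set statistics leak into training
- Train a baseline model (**Linear Regression**)
- Compare with **Random Forest Regressor** across several `n_estimators` values (50, 100, 200)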

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Linear Regression
lin_reg = LinearRegression()
lin_reg.fit(X_train_scaled, y_train)
y_pred = lin_reg.predict(X_test_scaled)

print("Linear Regression MAE:", mean_absolute_error(y_test, y_pred))
print("Linear Regression R2:", r2_score(y_test, y_pred))
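
For reference, both metrics can be computed by hand. MAE is the mean of the absolute errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below uses four invented target values, not results from this notebook.

```python
import numpy as np

# Invented targets and predictions, for illustration only.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

# Mean absolute error: average size of the prediction errors.
mae = np.mean(np.abs(y_true - y_hat))

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)          # 1000.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 50000.0
r2 = 1.0 - ss_res / ss_tot

print(f"MAE: {mae:.2f}, R2: {r2:.2f}")  # MAE: 15.00, R2: 0.98
```

MAE is in the same units as `Price`, while R² is unitless, which is why the two are reported together.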

In [None]:
# Random Forest Regressor: compare several forest sizes on the held-out test set
for n in [50, 100, 200]:
    rf = RandomForestRegressor(n_estimators=n, random_state=42)
    rf.fit(X_train_scaled, y_train)
    preds = rf.predict(X_test_scaled)
    mae = mean_absolute_error(y_test, preds)
    r2 = r2_score(y_test, preds)
    print(f"RandomForest (n_estimators={n}) -> MAE: {mae:.2f}, R2: {r2:.2f}")
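
Choosing `n_estimators` by its test-set score makes the final test metrics slightly optimistic. One common alternative, sketched here as an option rather than as what this notebook does, is to compare settings with cross-validation on the training data only. The example uses synthetic data from `make_regression` as a stand-in for `X_train` and `y_train`, and small forests to keep it quick.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training split (not the laptop data).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)

cv_mae = {}
for n in [10, 50]:
    rf = RandomForestRegressor(n_estimators=n, random_state=42)
    # scikit-learn returns negated MAE so that higher is always better.
    scores = cross_val_score(
        rf, X_demo, y_demo, cv=5, scoring="neg_mean_absolute_error"
    )
    cv_mae[n] = -scores.mean()
    print(f"n_estimators={n}: mean CV MAE = {cv_mae[n]:.2f}")
```

The setting with the lowest cross-validated MAE would then be refit on the full training split and scored once on the test set.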

## 7. Results & Insights
- **Linear Regression**:
  - MAE: ~20,287
  - R²: 0.47
  - Explains about 47% of the variance in price, with a mean absolute error roughly twice that of Random Forest, so it serves only as a weak baseline.

- **Random Forest**:
  - MAE: ~10,000
  - R²: ~0.80
  - Explains about 80% of the variance, a substantial improvement over the linear baseline.
  - Increasing trees beyond 100 gives diminishing returns.

✅ Conclusion: **Random Forest** clearly outperforms Linear Regression for predicting laptop prices on this dataset.