# Laptop Price Predictor


A short project notebook that trains a simple machine learning model to predict laptop prices.

This notebook is meant to be uploaded to your GitHub profile as a demo project. Put your real dataset in `laptop_price.csv`, or keep the synthetic dataset that the notebook generates for demonstration.

## 1. Overview

- Problem: Predict the price (in INR or USD) of a laptop from features such as brand, RAM, storage, processor, and weight.
- Approach: A simple end-to-end pipeline: load the data, run exploratory data analysis (EDA), preprocess, train a `RandomForestRegressor`, evaluate it, and save the fitted pipeline as a `.pkl` file.
- Notes: If you have a `laptop_price.csv`, put it in the same folder as the notebook. If the file is not found, the notebook generates a small synthetic dataset (saved as `synthetic_laptop_price_demo.csv`) so the pipeline still runs end-to-end.

## 2. Requirements

```bash
pip install pandas scikit-learn numpy matplotlib
```

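These are all widely used, pip-installable libraries; `os` and `pickle` ship with Python. `matplotlib` is listed for optional plotting, and the cells below do not import it. GitHub renders notebooks as a static preview and does not execute them, so only outputs already saved in the file are shown, but the notebook and its code remain visible to recruiters.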

In [None]:
# 3. Imports
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_absolute_error, r2_score
import pickle

print('Libraries imported')


In [None]:
# 4. Load dataset (or create synthetic if not present)
DATA_PATH = 'laptop_price.csv'
if os.path.exists(DATA_PATH):
    df = pd.read_csv(DATA_PATH)
    print(f'Loaded dataset from {DATA_PATH} — shape:', df.shape)
else:
    print(f'{DATA_PATH} not found. Creating a small synthetic dataset for demo.')
    rng = np.random.RandomState(42)
    brands = ['Dell', 'HP', 'Lenovo', 'Asus', 'Acer']
    processors = ['i3', 'i5', 'i7']
    rams = [4, 8, 16, 32]
    storages = [256, 512, 1024]
    weights = [1.1, 1.3, 1.5, 2.0]
    n = 200
    df = pd.DataFrame({
        'brand': rng.choice(brands, n),
        'processor': rng.choice(processors, n),
        'ram_gb': rng.choice(rams, n),
        'storage_gb': rng.choice(storages, n),
        'weight_kg': rng.choice(weights, n),  # 'price' is added below once it is computed
    })
    # Generate a synthetic price with some logic + noise
    base = df['ram_gb'] * 2 + (df['storage_gb'] / 128) * 5
    proc_map = {'i3': 20, 'i5': 40, 'i7': 60}
    brand_map = {'Dell': 10, 'HP': 5, 'Lenovo': 8, 'Asus': 6, 'Acer': 4}
    df['price'] = (base + df['processor'].map(proc_map) + df['brand'].map(brand_map) - df['weight_kg'] * 3 + rng.normal(0, 8, n)).round(2)
    df.to_csv('synthetic_laptop_price_demo.csv', index=False)
    print('Synthetic dataset created with shape:', df.shape)

df.head()


## 5. Basic EDA

Let's look at summary statistics for every column and count the missing values.

In [None]:
display(df.describe(include='all'))
print('\nMissing values per column:')
print(df.isnull().sum())
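
The cell above only prints summary statistics. To look at how the categories are distributed and how price varies across them, a minimal pandas-only sketch might look like this. It builds its own tiny stand-in frame, `demo`, so it runs on its own; the column names follow the synthetic dataset and are an assumption for any real CSV.

```python
import pandas as pd

# Tiny stand-in frame using the synthetic dataset's column names (assumption)
demo = pd.DataFrame({
    "brand": ["Dell", "HP", "Dell", "Acer", "HP", "Dell"],
    "processor": ["i5", "i3", "i7", "i3", "i5", "i5"],
    "ram_gb": [8, 4, 16, 4, 8, 16],
    "price": [90.0, 55.0, 140.0, 50.0, 85.0, 110.0],
})

# How often each brand occurs
brand_counts = demo["brand"].value_counts()
print(brand_counts)

# Mean price per processor tier, sorted from cheapest to most expensive
price_by_processor = demo.groupby("processor")["price"].mean().sort_values()
print(price_by_processor)

# Correlation of each numeric column with the target
price_corr = demo.select_dtypes("number").corr()["price"]
print(price_corr)
```

In the notebook itself, the same three calls can be applied to `df` instead of `demo`.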


## 6. Preprocessing & Pipeline

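We'll one-hot encode the categorical features and standardize the numeric ones, then feed the result to a `RandomForestRegressor`. Tree-based models do not need scaled inputs, so the scaler is harmless here and mainly keeps the pipeline ready if you swap in a scale-sensitive model later.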

In [None]:
# Define features
TARGET = 'price'
FEATURES = [c for c in df.columns if c != TARGET]
# Select column types from the feature frame only, so the target can never
# end up in either list (even if a real CSV stores price as text)
categorical_cols = df[FEATURES].select_dtypes(include=['object', 'category']).columns.tolist()
numerical_cols = df[FEATURES].select_dtypes(include=[np.number]).columns.tolist()

print('Categorical cols:', categorical_cols)
print('Numerical cols:', numerical_cols)

numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])
cat_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical_cols),
        ('cat', cat_transformer, categorical_cols)
    ]
)

model = RandomForestRegressor(n_estimators=100, random_state=42)

pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('model', model)])

print('Pipeline ready')


## 7. Train / Test split and training


In [None]:
X = df[FEATURES]
y = df[TARGET]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline.fit(X_train, y_train)
print('Model trained')


In [None]:
# Evaluation
preds = pipeline.predict(X_test)
mae = mean_absolute_error(y_test, preds)
r2 = r2_score(y_test, preds)
print(f'Mean Absolute Error: {mae:.3f}')
print(f'R^2 score: {r2:.3f}')
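
An MAE value is easier to judge next to a trivial baseline. The sketch below compares a random forest with scikit-learn's `DummyRegressor`, which always predicts the training mean. The data here is a hypothetical stand-in (price as a noisy function of RAM), not the notebook's dataset, and the variable names are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: price depends linearly on RAM plus noise
rng = np.random.RandomState(42)
ram = rng.choice([4, 8, 16, 32], 200)
X_demo = ram.reshape(-1, 1)
y_demo = ram * 2 + rng.normal(0, 3, 200)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Baseline: ignores the features and always predicts the mean training price
baseline = DummyRegressor(strategy="mean").fit(Xtr, ytr)
forest = RandomForestRegressor(n_estimators=50, random_state=42).fit(Xtr, ytr)

baseline_mae = mean_absolute_error(yte, baseline.predict(Xte))
forest_mae = mean_absolute_error(yte, forest.predict(Xte))
print(f"Baseline MAE: {baseline_mae:.3f}")
print(f"Forest MAE:   {forest_mae:.3f}")
```

In the notebook, the same idea applies by fitting `DummyRegressor` on `X_train`, `y_train` and scoring it on `X_test`; a model that barely beats the baseline has learned little from the features.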


In [None]:
# 8. Save model to a pickle file
model_filename = 'laptop_price_model.pkl'
with open(model_filename, 'wb') as f:
    pickle.dump(pipeline, f)
print(f'Model saved to {model_filename}')


In [None]:
# 9. Example: load model and predict on a single sample
with open(model_filename, 'rb') as f:
    loaded = pickle.load(f)

sample = X_test.iloc[0:1]
print('Sample features:\n', sample)
print('Predicted price:', loaded.predict(sample))
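
The cell above reuses a row from the test set. To price a laptop that you describe by hand, the input has to be a one-row DataFrame with the same column names the pipeline was trained on. The following is a self-contained sketch under that assumption: it trains a toy pipeline on two of the synthetic columns and round-trips it through `pickle` in memory in place of the `.pkl` file. `toy_pipeline`, `restored`, and `new_laptop` are hypothetical names.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data with a subset of the synthetic columns (assumption)
train = pd.DataFrame({
    "brand": ["Dell", "HP", "Acer", "Dell", "HP", "Acer"],
    "ram_gb": [8, 8, 4, 16, 16, 4],
    "price": [80.0, 75.0, 45.0, 120.0, 115.0, 48.0],
})
feature_cols = ["brand", "ram_gb"]

toy_pipeline = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(
        transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["brand"])],
        remainder="passthrough",  # pass ram_gb through unscaled
    )),
    ("model", RandomForestRegressor(n_estimators=20, random_state=0)),
])
toy_pipeline.fit(train[feature_cols], train["price"])

# Round-trip through pickle in memory, standing in for the .pkl file on disk
restored = pickle.loads(pickle.dumps(toy_pipeline))

# A hand-written laptop: one row, same column names as the training features
new_laptop = pd.DataFrame([{"brand": "Dell", "ram_gb": 16}])
predicted = restored.predict(new_laptop[feature_cols])
print("Predicted price:", round(float(predicted[0]), 2))
```

Because the preprocessing lives inside the pickled pipeline, the raw feature values can be passed directly; no manual encoding is needed. Pickled scikit-learn models are not guaranteed to load under a different scikit-learn version, so load the file in the same environment that saved it.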


## 10. How to upload this project to GitHub

1. Create a new repository on GitHub (e.g., `laptop-price-predictor`).
2. Add this notebook file `laptop_price_predictor.ipynb` to the repository root.
3. If you used a separate CSV, upload it as well (e.g., `laptop_price.csv`).
4. Add a `README.md` describing the project, dependencies, and sample results.
5. Commit and push. Recruiters will be able to preview the notebook on GitHub (static view) and download it to run locally.

Optional: Include `requirements.txt` with pinned package versions.
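
One possible way to produce those pinned versions is to read them from the current environment with the standard library's `importlib.metadata`. This is a sketch, not the only approach; the package list mirrors the `pip install` line in the Requirements section, and writing the file is left commented out so nothing is overwritten by accident.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear on PyPI
packages = ["pandas", "scikit-learn", "numpy", "matplotlib"]

lines = []
for name in packages:
    try:
        lines.append(f"{name}=={version(name)}")
    except PackageNotFoundError:
        # Skip anything that is not installed in the current environment
        print(f"{name} is not installed; skipping")

requirements_text = "\n".join(lines) + "\n"
print(requirements_text)

# Uncomment to write the file next to the notebook
# with open("requirements.txt", "w") as f:
#     f.write(requirements_text)
```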


----
You can edit this notebook (author name, dataset path, model choices) before uploading. Good luck!