## Laptop Price Prediction - Data Cleaning
This notebook covers all steps to clean and prepare the dataset for modeling.
We will:
1. Drop unnecessary columns.
2. Convert text columns to numeric.
3. Parse complex columns (Memory, ScreenResolution).
4. Encode categorical features.
5. Prepare the target variable.

### Step 0: Import, Load & Preview
- Import `pandas` and `numpy` as `pd` and `np`, respectively.
- Load the dataset.
- Display the column names for an overview of the available features.
- Display the first few rows of the dataset.

In [None]:
# 01_data_cleaning.ipynb
import pandas as pd
import numpy as np

In [None]:
# Load dataset
data = pd.read_csv("../data/raw/laptop_price.csv", encoding="ISO-8859-1")

In [None]:
# Show the column names
print("Columns in dataset:")
print(data.columns.tolist())

In [None]:
data.head()  # Display the first few rows of the dataset
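
Beyond `head()`, it can help to confirm which columns are still stored as text and whether any values are missing before converting anything. The sketch below uses a small made-up frame (`demo`) rather than the real dataset; the `Price` column name and all values are illustrative assumptions.

```python
import pandas as pd

# Toy frame standing in for the real dataset (names and values are made up).
demo = pd.DataFrame({
    "Ram": ["8GB", "16GB"],
    "Weight": ["1.37kg", None],
    "Price": [1339.69, 898.94],
})

# Non-numeric columns are the ones that still need parsing.
# Selecting by exclusion avoids depending on how pandas labels string columns.
text_cols = demo.select_dtypes(exclude="number").columns.tolist()

# Count missing values per column.
missing = demo.isna().sum()

print(text_cols)
print(missing)
```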

### Step 1: Drop Unnecessary Columns
We drop `laptop_ID` and `Product`. `laptop_ID` is only a row identifier, and `Product` is a model name that we do not use as a feature, so neither helps predict the price.

In [None]:
# Drop columns
data = data.drop(['laptop_ID', 'Product'], axis=1)
# print(data.columns.tolist())
# data.head(1)
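
Re-running the drop cell in a notebook raises a `KeyError` once the columns are already gone. A minimal sketch of an idempotent variant, shown on a made-up frame (`demo` and its values are assumptions, not the real data):

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
demo = pd.DataFrame({
    "laptop_ID": [1, 2],
    "Product": ["Model A", "Model B"],
    "Ram": ["8GB", "16GB"],
})

cols_to_drop = ["laptop_ID", "Product"]

# errors="ignore" skips labels that are already missing,
# so running the drop twice does not raise.
demo = demo.drop(columns=cols_to_drop, errors="ignore")
demo = demo.drop(columns=cols_to_drop, errors="ignore")

print(demo.columns.tolist())
```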

### Step 2: Convert RAM To Numeric
The `Ram` column contains text like "8GB". We remove the "GB" suffix and convert the values to integers for modeling.

In [None]:
# Strip the literal "GB" suffix, then cast to int
data['Ram'] = data['Ram'].str.replace('GB', '', regex=False).astype(int)
data['Ram'].head()
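
The direct `.astype(int)` cast fails on the first value that is not cleanly of the form "8GB". If that happens, one possible approach is to extract the digits and coerce anything unparseable to `NaN` so it can be inspected. This is a sketch on a made-up series (`ram`), not the real column:

```python
import pandas as pd

# Made-up values, including two that a plain cast would choke on.
ram = pd.Series(["8GB", "16GB", " 4 GB", "unknown"])

# Pull out the first run of digits; rows without one become NaN.
digits = ram.str.extract(r"(\d+)", expand=False)
ram_gb = pd.to_numeric(digits, errors="coerce")

# Inspect anything that failed to parse before casting to int.
bad_rows = ram[ram_gb.isna()]
print(bad_rows.tolist())
```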

### Step 3: Convert Weight To Numeric
The `Weight` column contains text like "1.37kg". We remove the "kg" suffix and convert the values to floats.

In [None]:
# Strip the literal "kg" suffix, then cast to float
data['Weight'] = data['Weight'].str.replace('kg', '', regex=False).astype(float)
data['Weight'].head()
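
After the conversion, a quick range check can surface implausible weights. The sketch below uses a made-up series (`weight`), and the bounds `LOW` and `HIGH` are assumed for illustration; they should be adjusted to the actual data.

```python
import pandas as pd

# Made-up values; the last two are deliberately implausible.
weight = pd.Series(["1.37kg", "2.1kg", "0.0002kg", "11.1kg"])

# regex=False treats "kg" as a literal substring.
weight_kg = weight.str.replace("kg", "", regex=False).astype(float)

# Assumed plausibility bounds in kg (hypothetical, not from the dataset).
LOW, HIGH = 0.3, 8.0
suspicious = weight_kg[(weight_kg < LOW) | (weight_kg > HIGH)]
print(suspicious)
```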

### Step 4: Parse Memory Column
The `Memory` column contains text like "256GB SSD + 1TB HDD".
We will split it into separate columns for SSD, HDD, Hybrid, and Flash Storage, and convert every size to GB.

In [None]:
# Create new columns with default 0
data['SSD'] = 0
data['HDD'] = 0
data['Hybrid'] = 0
data['Flash_Storage'] = 0

# Function to convert memory strings to numbers.