# Laptop Price Prediction - Exploratory Data Analysis (EDA)

In this notebook, we will explore the dataset to understand relationships between features and the target variable (`Price_euros`).
We will:
1. Explore numeric features (distributions, correlations)
2. Explore categorical features (company, type, etc.)
3. Visualize feature-target relationships

### Step 0: Setup
Import the required libraries and load the cleaned dataset from `../data/cleaned/laptops_clean.csv`.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Make plots look nicer
sns.set(style="whitegrid")

# Load the cleaned dataset
data = pd.read_csv("../data/cleaned/laptops_clean.csv")

# Quick check
print("Shape:", data.shape)
data.head()
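
Later steps plot specific columns against price, so it can help to confirm right after loading that cleaning did not drop or rename any of them. The sketch below assumes the column names used in this notebook. `check_required_columns` is a hypothetical helper, not a pandas function, and the tiny DataFrame only stands in for `data`.

```python
import pandas as pd

# Columns that later steps in this notebook plot against price
REQUIRED_COLUMNS = ["Company", "TypeName", "Ram", "Weight", "Price_euros"]


def check_required_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required columns missing from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Tiny stand-in for the cleaned dataset, for illustration only
sample = pd.DataFrame({
    "Company": ["Dell", "Apple"],
    "Ram": [8, 16],
    "Weight": [1.8, 1.4],
    "Price_euros": [900.0, 1800.0],
})

missing = check_required_columns(sample)
print("Missing columns:", missing)  # the sample deliberately lacks TypeName
```

In the notebook itself, `check_required_columns(data)` should return an empty list if every column used below is present.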

### Step 1: Basic Statistics
Check the shape, summary statistics, and missing values.

In [None]:
print("Shape of dataset:", data.shape)
print("\nMissing values:\n", data.isnull().sum())
data.describe()
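
Raw counts from `isnull().sum()` are easier to judge next to the share of rows they represent. This is a minimal sketch that puts both side by side and sorts the worst columns first. The values are made up and stand in for `data`.

```python
import numpy as np
import pandas as pd

# Stand-in data with a few gaps (illustrative values, not from the real dataset)
sample = pd.DataFrame({
    "Ram": [8, 16, np.nan, 8],
    "Weight": [1.8, np.nan, np.nan, 2.1],
    "Price_euros": [900.0, 1800.0, 650.0, 700.0],
})

# Count and percentage of missing values per column, most missing first
missing_report = pd.DataFrame({
    "missing": sample.isna().sum(),
    "percent": sample.isna().mean() * 100,
}).sort_values("missing", ascending=False)

print(missing_report)
```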

### Step 2: Distribution of Price
Visualize the distribution of the target variable to see its spread.

In [None]:
plt.figure(figsize=(8,5))
sns.histplot(data['Price_euros'], bins=50, kde=True)
plt.title("Distribution of Laptop Prices")
plt.xlabel("Price (Euros)")
plt.ylabel("Count")
plt.show()
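
If the histogram shows a long right tail (a few very expensive laptops), a log transform is a common way to make the target more symmetric before modelling. The sketch below compares pandas' sample skewness (`Series.skew()`) before and after `np.log1p`. The prices are made up and stand in for `data['Price_euros']`. Whether to model the log of price is a modelling choice, not something this notebook decides.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed prices standing in for data["Price_euros"]
prices = pd.Series([300, 400, 500, 600, 800, 1000, 1500, 2500, 5000], dtype=float)

raw_skew = prices.skew()             # > 0 indicates a long right tail
log_skew = np.log1p(prices).skew()   # log1p(x) = log(1 + x)

print(f"Skewness (raw):   {raw_skew:.2f}")
print(f"Skewness (log1p): {log_skew:.2f}")
```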

### Step 3: Correlation Heatmap
Check correlations among numeric features and price.

In [None]:
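import os
os.makedirs("../reports/figures", exist_ok=True)  # make sure the output folder exists

plt.figure(figsize=(22,8))
sns.heatmap(data.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Heatmap")
# Save before plt.show(): in Jupyter, show() closes the figure, so saving afterwards writes a blank image
plt.savefig("../reports/figures/correlation_heatmap.png")
plt.show()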
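
The heatmap shows every pairwise correlation. When only the target matters, you can pull out the `Price_euros` column and rank it by absolute value. This sketch uses a small made-up DataFrame in place of `data`. As in the heatmap cell, `numeric_only=True` (available from pandas 1.5) skips text columns such as `Company`.

```python
import pandas as pd

# Illustrative stand-in for the cleaned dataset
sample = pd.DataFrame({
    "Ram": [4, 8, 8, 16, 32],
    "Weight": [2.5, 2.0, 1.8, 1.4, 1.2],
    "Company": ["Acer", "Dell", "Acer", "Dell", "Apple"],
    "Price_euros": [400, 700, 800, 1500, 2600],
})

# Pearson correlation of every numeric column with the target
corr_with_price = (
    sample.corr(numeric_only=True)["Price_euros"]
    .drop("Price_euros")                                   # the target correlates 1.0 with itself
    .sort_values(key=lambda s: s.abs(), ascending=False)   # strongest first, sign ignored
)
print(corr_with_price)
```

Sorting by absolute value keeps strong negative relationships (here, weight) near the top instead of at the bottom of the list.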

### Step 4: Price vs Company
Compare the price distribution (median, spread, and outliers) across companies.

In [None]:
plt.figure(figsize=(12,6))
sns.boxplot(x="Company", y="Price_euros", data=data)
plt.xticks(rotation=45)
plt.title("Laptop Price by Company")
plt.savefig("../reports/figures/price_by_company.png")
plt.show()
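
A box plot's middle line is the median, not the mean, and boxes built from only a few laptops can mislead. This sketch tabulates the median price and sample count per company and sorts by median. The data are made up and stand in for `data`.

```python
import pandas as pd

sample = pd.DataFrame({
    "Company": ["Acer", "Acer", "Apple", "Apple", "Dell", "Dell", "Dell"],
    "Price_euros": [450.0, 600.0, 1300.0, 1900.0, 700.0, 900.0, 2500.0],
})

# Median matches the box plot's centre line; count flags thinly sampled brands
company_stats = (
    sample.groupby("Company")["Price_euros"]
    .agg(["median", "count"])
    .sort_values("median", ascending=False)
)
print(company_stats)
```

Passing `order=list(company_stats.index)` to `sns.boxplot` would draw the boxes in the same median order.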

### Step 5: Price vs Laptop Type
Compare the price distribution across laptop types (e.g., Ultrabook, Gaming).

In [None]:
plt.figure(figsize=(10,6))
sns.boxplot(x="TypeName", y="Price_euros", data=data)  # use the cleaned dataset, like the other steps
plt.xticks(rotation=45)
plt.title("Laptop Price by Type")
plt.savefig("../reports/figures/price_by_type.png")
plt.show()
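
You can also express each type's price level relative to the whole dataset, which makes premiums easy to read. This sketch divides each type's median price by the overall median. The values and type labels are illustrative only.

```python
import pandas as pd

sample = pd.DataFrame({
    "TypeName": ["Notebook", "Notebook", "Ultrabook", "Ultrabook", "Gaming", "Gaming"],
    "Price_euros": [500.0, 700.0, 1200.0, 1400.0, 1500.0, 1700.0],
})

overall_median = sample["Price_euros"].median()

# Ratio > 1: this type's median price is above the dataset-wide median
type_ratio = (
    sample.groupby("TypeName")["Price_euros"].median() / overall_median
).sort_values(ascending=False)
print(type_ratio.round(2))
```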

### Step 6: Price vs RAM
Check how price changes with RAM size.

In [None]:
plt.figure(figsize=(8,5))
sns.scatterplot(x="Ram", y="Price_euros", data=data)
plt.title("Price vs RAM")
plt.savefig("../reports/figures/price_by_ram.png")
plt.show()
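
If price rises faster than linearly with RAM, Pearson's correlation (the default in `corr`) understates a relationship that is still perfectly monotonic. This sketch compares it with Spearman's rank correlation, computed here as the Pearson correlation of the ranks. The RAM sizes and prices are made up.

```python
import pandas as pd

# Illustrative RAM sizes (GB) and prices: monotonic but not linear
ram = pd.Series([4, 8, 16, 32, 64], dtype=float)
price = pd.Series([500.0, 700.0, 1000.0, 1600.0, 4000.0])

pearson = ram.corr(price)                  # strength of the linear association
spearman = ram.rank().corr(price.rank())   # Spearman = Pearson on ranks (ties get average rank)

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

A Spearman value noticeably above Pearson suggests that a log or other monotonic transform of RAM may fit price better than the raw values.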

### Step 7: Price vs Weight
Check if lighter laptops tend to cost more.

In [None]:
plt.figure(figsize=(8,5))
sns.scatterplot(x="Weight", y="Price_euros", data=data)
plt.title("Price vs Weight")
plt.savefig("../reports/figures/price_by_weight.png")
plt.show()
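
A scatter plot of weight against price can be hard to read when points overlap. Grouping weights into bands and comparing median prices gives a coarser but clearer answer to whether lighter laptops cost more. This is a minimal sketch. The bin edges are an arbitrary illustrative choice, and the data are made up.

```python
import pandas as pd

sample = pd.DataFrame({
    "Weight": [1.1, 1.3, 1.5, 1.9, 2.2, 2.4, 2.8, 3.5],
    "Price_euros": [1800.0, 1500.0, 1200.0, 800.0, 700.0, 900.0, 1600.0, 2200.0],
})

# Illustrative bin edges in kg; pd.cut bins are right-inclusive by default
bins = [0, 1.5, 2.5, 5]
labels = ["<=1.5 kg", "1.5-2.5 kg", ">2.5 kg"]
sample["weight_band"] = pd.cut(sample["Weight"], bins=bins, labels=labels)

# observed=True keeps only bands that actually occur in the data
band_median = sample.groupby("weight_band", observed=True)["Price_euros"].median()
print(band_median)
```

In this made-up sample the heaviest band is also expensive, which shows why the weight/price relationship need not be monotonic (heavy gaming machines can cost as much as ultrabooks).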

### Step 8: Save EDA Figures (Optional)
Steps 3 to 7 save their own figures. This cell also saves the price distribution from Step 2 for use in reports or presentations.

In [None]:
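plt.figure(figsize=(8,5))
sns.histplot(data["Price_euros"], bins=50, kde=True)
plt.title("Distribution of Laptop Prices")
plt.xlabel("Price (Euros)")
plt.ylabel("Count")
plt.savefig("../reports/figures/price_distribution.png")
plt.close()  # free the figure without displaying it a second time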
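
`savefig` does not create missing folders, so the cells above fail if `../reports/figures/` does not exist yet. A small helper can create the folder and save in one call. This is a minimal sketch. `save_figure` is a hypothetical name, and the demo writes to a temporary folder instead of the project's `reports` directory.

```python
import tempfile
from pathlib import Path

from matplotlib.figure import Figure


def save_figure(fig: Figure, path: str, dpi: int = 150) -> Path:
    """Save fig to path, creating parent folders first (hypothetical helper)."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)   # e.g. reports/figures may not exist yet
    fig.savefig(out, dpi=dpi, bbox_inches="tight")  # trim surplus whitespace around the plot
    return out


# Demo with a standalone Figure (no pyplot or display backend needed)
fig = Figure(figsize=(4, 3))
ax = fig.add_subplot()
ax.plot([1, 2, 3], [3, 1, 2])
ax.set_title("Demo")
saved = save_figure(fig, str(Path(tempfile.mkdtemp()) / "figures" / "demo.png"))
print("Saved to", saved)
```

In the notebook, `save_figure(plt.gcf(), "../reports/figures/price_distribution.png")` would be called before `plt.show()`.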