## IEESP - Luxury Watch
Sean Kelly X00221555
David Burgos X00229142
Daniel Alonso X00226363

## 1. Dataset Acquisition 

The dataset we have chosen is a luxury watch pricing dataset. Its columns cover brand, model, price, case, strap, movement, water resistance, case diameter, case thickness, band width, dial color, crystal material, complications and power reserve.

The dataset has 14 columns and 508 rows. It is publicly available on Kaggle.

This dataset is useful for businesses, resellers, enthusiasts and individuals who wish to expand their knowledge of luxury watches.

## Objective

The objective of this project is to **evaluate and provide statistics on the pricing of these watches compared to the prestige of their branding and the condition they are in**. We believe these categories are important to compare because branding significantly influences the perceived value and resale potential of a luxury watch, while condition greatly affects both the collectability and the longevity of the individual watch.
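
As a rough sketch of the per-brand price statistics this objective implies, the snippet below groups a tiny made-up table by brand and summarises the price. The `Brand` column name, the helper `price_stats_by_brand` and the sample rows are assumptions for illustration only; `Price (USD)` is the column name used in this notebook and is assumed here to hold numeric values.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from the Kaggle CSV.
sample = pd.DataFrame({
    "Brand": ["Brand A", "Brand A", "Brand B", "Brand B", "Brand B"],
    "Price (USD)": [5000, 7000, 20000, 30000, 40000],
})

def price_stats_by_brand(df):
    """Return count, mean and median price per brand, highest median first."""
    return (
        df.groupby("Brand")["Price (USD)"]
          .agg(["count", "mean", "median"])
          .sort_values("median", ascending=False)
    )

brand_stats = price_stats_by_brand(sample)
print(brand_stats)
```

The median is included alongside the mean because a few very expensive models could otherwise dominate a brand's average.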

## AI System

We plan to demo the AI system by asking the user to input **Brand, User Lifestyle and Price**. Using these inputs, the system will consult the dataset and tell the user whether the watch seems like a good option for the price quoted **given the user's lifestyle, as well as how reliable the watch should be**.
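
The price part of this check could look something like the sketch below, which compares a quoted price with the median price of the same brand. It is a simplified illustration and not the final system: the function name `assess_quote`, the `Brand` column name, the 10% `tolerance` and the sample rows are all hypothetical, and the lifestyle and reliability inputs are not modelled here.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
sample = pd.DataFrame({
    "Brand": ["Brand A", "Brand A", "Brand A", "Brand B", "Brand B"],
    "Price (USD)": [9000, 10000, 11000, 5000, 6000],
})

def assess_quote(df, brand, quoted_price, tolerance=0.10):
    """Compare a quoted price with the brand's median price in df.

    tolerance is an assumed threshold: quotes within +/-10% of the
    median are treated as typical.
    """
    prices = df.loc[df["Brand"] == brand, "Price (USD)"]
    if prices.empty:
        return "unknown brand"
    median = prices.median()
    if quoted_price < median * (1 - tolerance):
        return "below typical price"
    if quoted_price > median * (1 + tolerance):
        return "above typical price"
    return "typical price"

print(assess_quote(sample, "Brand A", 8000))
print(assess_quote(sample, "Brand B", 5600))
print(assess_quote(sample, "Brand C", 1000))
```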

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc
import joblib

In [None]:
import pandas as pd
from pathlib import Path

def load_watch_data():
    url = "https://raw.githubusercontent.com/SDKELLY06/IEESP/refs/heads/main/Luxury%20watch.csv"
    csv_path = Path("Luxury_watch.csv")

    if not csv_path.exists():
        df = pd.read_csv(url)
        df.to_csv(csv_path, index=False)
    else:
        df = pd.read_csv(csv_path)

    return df

watches = load_watch_data()
print(watches.head())

print("\n")
# Reuse the DataFrame loaded above: np.genfromtxt would read the text columns
# as NaN, and the cached file is "Luxury_watch.csv", not "Luxury watch.csv".
print("Data shape:", watches.shape)

In [None]:
watches.head()

In [None]:
watches.info()

In [None]:
watches["Price (USD)"].value_counts()

In [None]:
watches.describe()

In [None]:
plt.rc('font', size=14)
plt.rc('axes', labelsize=14, titlesize=14)
plt.rc('legend', fontsize=14)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

watches.hist(bins=50, figsize=(12, 8)) 
plt.show()

## Dataset Cleaning and Wrangling


In [None]:
#Test Set Creation

def shuffle_and_split_data(data, test_ratio):
   np.random.seed(42)
   shuffled_indices = np.random.permutation(len(data))
   test_set_size = int(len(data) * test_ratio)
   test_indices = shuffled_indices[:test_set_size]
   train_indices = shuffled_indices[test_set_size:]
   return data.iloc[train_indices], data.iloc[test_indices]

train_set, test_set = shuffle_and_split_data(watches, 0.2)
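# Report the sizes produced by the manual split
print("Manual split train set size:", len(train_set))
print("Manual split test set size:", len(test_set))
print("\n")

# Repeat the split with scikit-learn's helper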
train_set, test_set = train_test_split(watches, test_size=0.2, random_state=42)

print("Test set size:", len(test_set))
print("Train set size:", len(train_set))
test_set["Complications"].isnull().sum()

In [None]:
from sklearn.model_selection import train_test_split

# Group prices into ranges so they can be displayed in a bar chart
watches["Pricing"] = pd.cut(watches["Price (USD)"],
                                bins=[0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, np.inf],
                                labels=["0-10k", "10-20k", "20-30k", "30-40k", "40-50k", "50-60k", "60-70k", "70k+"])

watches["Pricing"].value_counts().sort_index().plot.bar(rot=0, grid=True)
plt.xlabel("Prices")
plt.ylabel("Number Of Watches")
plt.show()

## Missing Values
The Power Reserve and Complications columns have missing values.
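
One way to confirm which columns are affected is to count the missing entries per column. The sketch below does this on a small made-up frame; the sample rows are hypothetical, and only the column names `Power Reserve`, `Complications` and `Price (USD)` come from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows with gaps in two columns.
sample = pd.DataFrame({
    "Power Reserve": ["48 hours", None, "70 hours"],
    "Complications": ["Date", np.nan, np.nan],
    "Price (USD)": [5000, 7000, 9000],
})

# Count missing values per column and keep only columns that have any.
missing_counts = sample.isnull().sum()
missing_only = missing_counts[missing_counts > 0]
print(missing_only)
```

Running the same two lines on `watches` would show the actual counts for the real dataset.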

In [None]:
# Boolean mask of the rows in watches that contain at least one missing value
null_rows_idx = watches.isnull().any(axis=1)
watches.loc[null_rows_idx].head()

In [None]:
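# Read the copy cached by load_watch_data(), which saves it as "Luxury_watch.csv"
filename = pd.read_csv("Luxury_watch.csv")
filename["Price (USD)"] = filename["Price (USD)"].fillna(0)
# Strip the unit suffixes so only the numeric text remains
filename["Water Resistance"] = filename["Water Resistance"].str.replace("meters", "").str.strip()
# Note: removing both "days" and "hours" leaves Power Reserve values in mixed units
filename["Power Reserve"] = filename["Power Reserve"].str.replace("days", "").str.strip()
filename["Power Reserve"] = filename["Power Reserve"].str.replace("hours", "").str.strip()
print(filename.head())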
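
Removing the words "days" and "hours" from Power Reserve leaves numbers that are no longer in a common unit. A possible alternative, sketched below, converts every value to hours before dropping the unit. The helper `power_reserve_to_hours` is hypothetical, and the input format (a number followed by "days" or "hours") is an assumption based on the strings stripped in the cell above.

```python
import pandas as pd

def power_reserve_to_hours(series):
    """Convert strings such as '48 hours' or '8 days' to a number of hours.

    Values that are missing or do not match the assumed format become NaN.
    """
    parts = series.str.extract(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>days|hours)")
    value = pd.to_numeric(parts["value"], errors="coerce")
    factor = parts["unit"].map({"hours": 1, "days": 24})
    return value * factor

# Hypothetical sample values, including a missing and an unparseable entry.
sample = pd.Series(["48 hours", "8 days", None, "N/A"])
hours = power_reserve_to_hours(sample)
print(hours.tolist())
```

Keeping unparseable entries as NaN means they can be handled together with the other missing values instead of being mistaken for real measurements.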
