## IEESP - Luxury Watch
Sean Kelly X00221555
| David Burgos X00229142
| Daniel Alonso X00226363

## 1. Dataset Acquisition 

The dataset we have chosen is a luxury watch pricing dataset. It records each watch's brand, model, price, case material, strap material, movement type, water resistance, case diameter, case thickness, band width, dial color, crystal material, complications and power reserve.

The dataset has 14 columns and 508 rows, and it is publicly available on Kaggle.

This dataset is useful for businesses, resellers, enthusiasts and individuals who wish to expand their knowledge of luxury watches.

## Objective

The objective of this project is to **evaluate and provide statistics on the pricing of these watches compared to the prestige of their branding and the condition they are in**. We believe these categories are important to compare because branding significantly influences the perceived value and resale potential of a luxury watch, while condition greatly affects both the collectability and the longevity of an individual watch.

## AI System

We plan to demo the AI system by asking the user to input **Brand, User Lifestyle and Price**. Using these inputs, the system will consult the dataset and tell the user whether the watch seems like a good option for the quoted price **given the user's lifestyle, as well as how reliable the watch should be**.
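One way the price check described above could work is to compare the quoted price with the median listed price for the same brand. The sketch below is only an illustration under stated assumptions: the function name `price_verdict`, the 25% tolerance, and the toy rows (placeholder brands and made-up prices, not taken from the dataset) are all hypothetical.

```python
import pandas as pd

def price_verdict(df, brand, quoted_price, tolerance=0.25):
    """Compare a quoted price with the median listed price for the brand.

    Hypothetical helper: the tolerance band is an assumption, not a rule
    taken from the project.
    """
    same_brand = df["Brand"].str.lower() == brand.lower()
    brand_prices = df.loc[same_brand, "Price (USD)"]
    if brand_prices.empty:
        return "unknown brand"
    median = brand_prices.median()
    if quoted_price > median * (1 + tolerance):
        return "above typical"
    if quoted_price < median * (1 - tolerance):
        return "below typical"
    return "typical"

# Made-up rows for illustration only; not taken from the dataset.
toy = pd.DataFrame({
    "Brand": ["BrandA", "BrandA", "BrandA", "BrandB", "BrandB"],
    "Price (USD)": [9000, 10000, 14000, 5000, 6000],
})
print(price_verdict(toy, "branda", 20000))
```

With the real data, the same call would take `watches` in place of `toy`, provided the price column is numeric.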

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc
import joblib

In [None]:
from pathlib import Path

def load_watch_data():
    url = "https://raw.githubusercontent.com/SDKELLY06/IEESP/refs/heads/main/Luxury%20watch.csv"
    csv_path = Path("Luxury_watch.csv")

    if not csv_path.exists():
        df = pd.read_csv(url)
        df.to_csv(csv_path, index=False)
    else:
        df = pd.read_csv(csv_path)

    return df

watches = load_watch_data()
print(watches.head())

print("\n")
# The local copy is saved as Luxury_watch.csv and is mostly text, so report the
# shape from the DataFrame instead of re-reading the file with np.genfromtxt.
print("Data shape:", watches.shape)

In [None]:
watches.head()

In [None]:
watches.info()

In [None]:
watches["Price (USD)"].value_counts()

In [None]:
watches.describe()

In [None]:
plt.rc('font', size=14)
plt.rc('axes', labelsize=14, titlesize=14)
plt.rc('legend', fontsize=14)
plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

watches.hist(bins=50, figsize=(12, 8)) 
plt.show()

## Dataset Cleaning and Wrangling


In [None]:
#Test Set Creation

def shuffle_and_split_data(data, test_ratio):
    #Manual split: shuffle the row positions with a fixed seed, then slice
    np.random.seed(42)
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

train_set, test_set = shuffle_and_split_data(watches, 0.2)
print("Manual split - train:", len(train_set), "test:", len(test_set))
print("\n")

#Same 80/20 split using scikit-learn's helper
train_set, test_set = train_test_split(watches, test_size=0.2, random_state=42)

print("Test set size:", len(test_set))
print("Train set size:", len(train_set))
print("Missing Complications in test set:", test_set["Complications"].isnull().sum())



In [None]:
#Grouping into price ranges to display in a graph
#Creating a new feature: the price band each watch falls into
watches["Pricing"] = pd.cut(watches["Price (USD)"],
                            bins=[0, 10000, 20000, 30000, 40000, 50000, 60000, 70000, np.inf],
                            labels=["0-10k", "10-20k", "20-30k", "30-40k", "40-50k", "50-60k", "60-70k", "70k+"])

watches["Pricing"].value_counts().sort_index().plot.bar(rot=0, grid=True)
plt.xlabel("Prices")
plt.ylabel("Number Of Watches")
plt.show()

## Missing Values
The Power Reserve and Complications columns have missing values.
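A quick way to confirm which columns are affected is to count the missing entries per column. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset); on the real data the same two lines would be applied to `watches`.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Power Reserve": ["48 hours", np.nan, "70 hours"],
    "Complications": ["Date", "Chronograph", np.nan],
    "Price (USD)": [9500, 5800, 4200],
})

# Count missing values per column and keep only the columns that have any
missing_per_column = toy.isnull().sum()
print(missing_per_column[missing_per_column > 0])
```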

In [None]:
#Flag every row in watches that contains at least one missing value, then display the first few
null_rows_idx = watches.isnull().any(axis=1)
watches.loc[null_rows_idx].head()

In [None]:
watches_option = watches.copy() #Creating copy

watches_option.dropna(subset=["Power Reserve"], inplace=True)

watches_option.loc[null_rows_idx].head()

In [None]:
#Dataset Cleansing

#Rows without a Power Reserve were dropped above, so only the other gaps need filling
watches_option["Complications"] = watches_option["Complications"].fillna("None")
watches_option["Price (USD)"] = watches_option["Price (USD)"].fillna(0)

#Strip the unit text so only the number remains
#Note: Power Reserve values given in days are not converted to hours here
watches_option["Water Resistance"] = watches_option["Water Resistance"].str.replace("meters", "").str.strip()
watches_option["Power Reserve"] = watches_option["Power Reserve"].str.replace("days", "").str.strip()
watches_option["Power Reserve"] = watches_option["Power Reserve"].str.replace("hours", "").str.strip()
print(watches_option.head())
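Because the Power Reserve column mixes values in days and hours, removing the unit text alone leaves numbers on two different scales. A minimal sketch of one way to normalise them to hours follows; the helper name `power_reserve_hours` is hypothetical, and it assumes entries look like "48 hours" or "8 days".

```python
import re

def power_reserve_hours(text):
    """Convert strings such as '48 hours' or '8 days' to a number of hours.

    Returns None when no number followed by a recognised unit is found.
    """
    if text is None:
        return None
    match = re.search(r"(\d+(?:\.\d+)?)\s*(hour|day)", str(text).lower())
    if match is None:
        return None
    value = float(match.group(1))
    # One day is 24 hours; hour values pass through unchanged
    return value * 24 if match.group(2) == "day" else value

print(power_reserve_hours("8 days"))
```

Applied to the raw column, this could look like `watches["Power Reserve"].map(power_reserve_hours)`.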


## Visualising Data

In [None]:
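#Water Resistance is stored as text (e.g. "... meters"), so extract the number before plotting
plot_data = watches.copy()
plot_data["Water Resistance"] = pd.to_numeric(
    plot_data["Water Resistance"].str.extract(r"(\d+)")[0], errors="coerce")
plot_data.plot(kind="scatter", x="Price (USD)", y="Water Resistance", grid=True)
plt.show()

In [None]:
#Marker size and colour both encode price; c must name a column for the colorbar to work
plot_data.plot(kind="scatter", x="Price (USD)", y="Water Resistance", grid=True,
               s=plot_data["Price (USD)"] / 100, label="Price (USD)",
               c="Price (USD)", cmap="jet", colorbar=True,
               legend=True, sharex=False, figsize=(10, 7))
plt.show()

In [None]:
#Low alpha makes dense regions of overlapping points stand out
plot_data.plot(kind="scatter", x="Price (USD)", y="Water Resistance", grid=True, alpha=0.2)
plt.show()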

## Feature Selection and Pre-Processing

The features we consider most important are Brand and Price (USD). Branding is a major factor in pricing, especially in the watch industry, so we believe this relationship is the most important one to examine. These are the `Brand` and `Price (USD)` columns of the dataset.
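The model cells later in the notebook refer to `X_train`, `X_val`, `X_test` and matching `y_*` variables. A minimal sketch of how they could be produced is shown here, assuming the brand is one-hot encoded and the target is a price band such as the `Pricing` column. The function name `make_splits`, the 60/20/20 ratio and the toy rows are assumptions for illustration, not the project's confirmed method.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def make_splits(df, feature_cols, target_col, seed=42):
    """One-hot encode the features and split 60/20/20 into train/val/test."""
    X = pd.get_dummies(df[feature_cols])
    y = df[target_col].astype(str)
    # First hold out 40% of the rows, then halve that into validation and test
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test

# Made-up rows for illustration only; placeholder brands, not dataset values.
toy = pd.DataFrame({
    "Brand": ["BrandA", "BrandB"] * 10,
    "Pricing": ["0-10k", "10-20k"] * 10,
})
X_train, X_val, X_test, y_train, y_val, y_test = make_splits(toy, ["Brand"], "Pricing")
print(len(X_train), len(X_val), len(X_test))
```

On the real data the call would be along the lines of `make_splits(watches_option, ["Brand"], "Pricing")`.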

In [None]:
#Count the watches per brand and keep only brands with more than 5 listings
brand = watches_option["Brand"].value_counts()
brand_min = brand[brand > 5]
print(brand_min)

In [None]:
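#Select the price column by name; once Pricing has been added, position 14 may no longer be Price (USD)
price = watches_option["Price (USD)"].value_counts()
print(price)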

## Selecting and training model
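The model cells in this section call `plot_confusion_matrix`, which is not defined in the notebook as shown. A minimal sketch of such a helper follows, assuming it takes the matrix returned by `confusion_matrix` and a title. It uses scikit-learn's `ConfusionMatrixDisplay`; the optional `labels` parameter is an addition for illustration, and the example counts are made up.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

def plot_confusion_matrix(cm, title, labels=None):
    """Draw a confusion matrix as a heatmap and return the Axes."""
    fig, ax = plt.subplots(figsize=(6, 5))
    display = ConfusionMatrixDisplay(confusion_matrix=np.asarray(cm),
                                     display_labels=labels)
    display.plot(ax=ax, cmap="Blues", colorbar=False)
    ax.set_title(title)
    return ax

# Made-up counts for illustration only.
ax = plot_confusion_matrix([[5, 1], [2, 4]], "Example (made-up counts)")
```

In a notebook the figure is displayed when the cell finishes; in a script, call `plt.show()` afterwards.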

In [None]:
#Logistic Regression
#Fit on the training split (X_train, y_train), as in the Naive Bayes cell;
#fit() expects feature and label arrays, not column names
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_val_lr = log_reg.predict(X_val)
y_test_lr = log_reg.predict(X_test)
val_acc_lr = accuracy_score(y_val, y_val_lr)
test_acc_lr = accuracy_score(y_test, y_test_lr)
cm_lr = confusion_matrix(y_val, y_val_lr)
print("Validation Accuracy:", val_acc_lr, "\nTest Accuracy:", test_acc_lr)
plot_confusion_matrix(cm_lr, "Logistic Regression (Validation)")

In [None]:
#Random Forest
#Fit on the training split (X_train, y_train), as in the Naive Bayes cell
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_val_rf = rf.predict(X_val)
y_test_rf = rf.predict(X_test)
val_acc_rf = accuracy_score(y_val, y_val_rf)
test_acc_rf = accuracy_score(y_test, y_test_rf)
cm_rf = confusion_matrix(y_val, y_val_rf)
print("Validation Accuracy:", val_acc_rf, "\nTest Accuracy:", test_acc_rf)
plot_confusion_matrix(cm_rf, "Random Forest (Validation)")

In [None]:
#Naive Bayes
nb = GaussianNB()
nb.fit(X_train, y_train)
y_val_nb = nb.predict(X_val)
y_test_nb = nb.predict(X_test)
val_acc_nb = accuracy_score(y_val, y_val_nb)
test_acc_nb = accuracy_score(y_test, y_test_nb)
cm_nb = confusion_matrix(y_val, y_val_nb)
print("Validation Accuracy:", val_acc_nb, "\nTest Accuracy:", test_acc_nb)
plot_confusion_matrix(cm_nb, "Naive Bayes (Validation)")

## Demo application

In [None]:
def demo_app():

    print(" Watches Demo App")
    brand_input = input("Please enter a brand: ").strip()
    try:
        price_input = int(input("Please enter a price: "))
    except ValueError:
        #The price must be a whole number
        print("Wrong input entered. Please Restart")
        return
    print("Business, Collective, Lavish")
    category_input = input("Please enter your desired style: ").strip().capitalize()

    if category_input == "Business":
        if price_input > 15000:
            print("This", brand_input, "is Expensive for a business purpose watch.")
        else:
            print("This", brand_input, "is a Reasonable price for a business purpose watch.")
    elif category_input == "Collective":
        if price_input > 35000:
            print("This", brand_input, "is Expensive for a collective purpose watch.")
        elif price_input > 15000:
            print("This", brand_input, "is a reasonable price for a collective watch.")
        else:
            print("This", brand_input, "is a Cheaper price for a collective purpose watch.")
    elif category_input == "Lavish":
        if price_input > 60000:
            print("This", brand_input, "is Expensive for a lavish purpose watch.")
        elif price_input > 35000:
            print("This", brand_input, "is a reasonable price for a lavish watch.")
        else:
            print("This", brand_input, "is a Cheaper price for a lavish purpose watch.")
    else:
        #The style did not match any of the three options
        print("Wrong input entered. Please Restart")

demo_app()
