### Support Vector Machine (SVM)

Support Vector Machine (SVM) is a widely used supervised machine learning algorithm that classifies data by finding the optimal boundary (hyperplane) between classes. Rather than weighing every data point equally, SVM relies on the points closest to the boundary, known as support vectors, to define that boundary, which makes it less sensitive to noise far away from it. By maximizing the margin between classes, SVM tends to generalize well and makes classification more reliable.
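
To make the idea of support vectors and margin concrete, here is a minimal sketch on a handful of made-up 2D points (not the Glowlytic data). It fits a linear `SVC`, lists the points that ended up as support vectors, and computes the margin width as `2 / ||w||`. The large `C` value is an assumption chosen to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (made-up points for illustration)
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily, approximating a hard margin
clf = SVC(kernel="linear", C=1e3)
clf.fit(X, y)

# Only the points closest to the boundary are kept as support vectors
print("Support vectors:")
print(clf.support_vectors_)

# For a linear kernel, the margin width is 2 / ||w||
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print(f"Margin width: {margin_width:.2f}")
```

Points that are not support vectors could be moved or removed (as long as they stay outside the margin) without changing the fitted boundary.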

#### Justification

We chose to implement SVM for the following reasons:

1.	The Glowlytic recommendation system considers multiple attributes, including skin type, product type, brand, and notable effects. SVM works well in high-dimensional spaces, so all of these factors can be taken into account when making recommendations.

2.	Since our system relies on a curated dataset rather than a massive amount of data, SVM is a suitable choice. It can achieve good accuracy even with a smaller dataset, and can outperform models that need extensive training data to be effective.

3.	SVM's margin-based decision boundaries and built-in regularization make it less prone to overfitting than models that struggle with new inputs, which helps keep recommendations consistent across different users.
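
As a rough illustration of the first point, the sketch below shows how a categorical attribute such as brand can be expanded into several binary columns, which is one common way a feature space becomes high-dimensional. The rows and brand names are made up, and one-hot encoding via `pd.get_dummies` is an assumption for illustration, not necessarily the encoding used in the notebook.

```python
import pandas as pd

# Made-up rows; column names mirror the attributes mentioned above
products = pd.DataFrame({
    "brand": ["BrandA", "BrandB", "BrandA", "BrandC"],
    "Oily": [1, 0, 1, 0],
    "Dry": [0, 1, 0, 1],
})

# One-hot encode the categorical column; the binary flags pass through unchanged
features = pd.get_dummies(products, columns=["brand"], dtype=int)

print(features.columns.tolist())
print(features.shape)
```

Each distinct brand becomes its own dimension, so the number of features grows with the number of categories.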

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, classification_report

# Load the new cleaned dataset
file_path = "Dataset/processed_file.csv"  # Update this path if needed
df = pd.read_csv(file_path)

# Fill missing values only for numeric columns
numeric_cols = df.select_dtypes(include=['number']).columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Selecting relevant features (Ensure 'brand' exists)
if 'brand' not in df.columns:
    raise ValueError("Error: 'brand' column is missing in the dataset!")

X = df[['brand']].copy()

# Encoding categorical 'brand' column
label_encoder = LabelEncoder()
X['brand'] = label_encoder.fit_transform(df['brand'])

# Identify binary encoded columns for notable effects and skin type
notable_effects_columns = [col for col in df.columns if col.startswith('notable_effects_')]
skin_type_columns = ['Sensitive', 'Combination', 'Oily', 'Dry', 'Normal']

# Ensure all required columns exist before selection
missing_columns = [col for col in notable_effects_columns + skin_type_columns if col not in df.columns]
if missing_columns:
    raise ValueError(f"Error: The following columns are missing: {missing_columns}")

X[notable_effects_columns + skin_type_columns] = df[notable_effects_columns + skin_type_columns]

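# Standardize features (SVM is sensitive to feature scale).
# Note: the scaler is fitted on all rows before the train/test split, so test-set
# statistics influence the scaling; fitting it on the training split only is stricter.
scaler = StandardScaler()
X = scaler.fit_transform(X)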

# Encoding product_type (Target Variable)
if 'product_type' not in df.columns:
    raise ValueError("Error: 'product_type' column is missing in the dataset!")

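# Use a separate encoder so the fitted brand encoder above is not overwritten
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df['product_type'].astype(str))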

# Splitting dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training Support Vector Machine (SVM) with hyperparameter tuning
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly']}
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get best parameters from GridSearchCV
best_params = grid_search.best_params_
print(f"✅ Best Parameters Found: {best_params}")

# GridSearchCV refits the best configuration on the full training set by default
# (refit=True), so the tuned model can be reused directly
svm_classifier = grid_search.best_estimator_

# Making predictions on test set
y_pred = svm_classifier.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f'🎯 SVM Model Accuracy: {accuracy:.2f}')
print(classification_report(y_test, y_pred))


✅ Best Parameters Found: {'C': 1, 'kernel': 'rbf'}
🎯 SVM Model Accuracy: 0.56
              precision    recall  f1-score   support

           0       0.68      0.42      0.52        40
           1       0.36      0.39      0.38        41
           2       0.57      0.70      0.63        79
           3       0.84      0.76      0.80        42
           4       0.38      0.38      0.38        40

    accuracy                           0.56       242
   macro avg       0.57      0.53      0.54       242
weighted avg       0.57      0.56      0.56       242
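
A possible variation on the evaluation above, sketched on synthetic data rather than the Glowlytic dataset: placing `StandardScaler` inside a scikit-learn `Pipeline` means each cross-validation fold fits the scaler on its own training portion only, and passing `target_names` makes the report show class names instead of integer codes. The class names and the `make_classification` settings here are hypothetical placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical class names standing in for product types
class_names = ["Cleanser", "Serum", "Toner"]

# Synthetic stand-in data with three classes
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# The scaler is part of the pipeline, so it is refitted inside every CV fold
pipeline = make_pipeline(StandardScaler(), SVC())

# make_pipeline names the SVC step "svc"; its parameters are addressed as svc__<name>
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
report = classification_report(y_test, search.predict(X_test),
                               target_names=class_names)

print(search.best_params_)
print(f"Test accuracy: {test_accuracy:.2f}")
print(report)
```

When the labels come from a fitted `LabelEncoder`, its `classes_` attribute holds the names in the same order as the integer codes and can be passed as `target_names`.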

