# Load Clean Dataset

In [1]:
import pandas as pd

# Load the cleaned data into a DataFrame
ProductsReviewsCC = pd.read_csv('productsreviews_cleaned_filled.csv')

# Display the first few rows to confirm the data is loaded correctly
print(ProductsReviewsCC.head())

  product_id               product_name brand_name  loves_count  rating  \
0    P473671    Fragrance Discovery Set      19-69         6320  3.6364   
1    P473668    La Habana Eau de Parfum      19-69         3827  4.1538   
2    P473662  Rainbow Bar Eau de Parfum      19-69         3253  4.2500   
3    P473660       Kasbah Eau de Parfum      19-69         3018  4.4762   
4    P473658  Purple Haze Eau de Parfum      19-69         2691  3.2308   

   reviews  price_usd primary_category secondary_category  tertiary_category  
0     11.0       35.0        Fragrance  Value & Gift Sets  Perfume Gift Sets  
1     13.0      195.0        Fragrance              Women            Perfume  
2     16.0      195.0        Fragrance              Women            Perfume  
3     21.0      195.0        Fragrance              Women            Perfume  
4     13.0      195.0        Fragrance              Women            Perfume  


## Defining Features and Target

Feature Selection: The selected_features list names the columns used as model inputs. The features ('price_usd', 'rating', and 'reviews') are chosen because each is expected to influence a product's loves_count, which serves here as a measure of popularity.

Define X and y: X is a DataFrame holding the selected features. y is a Series holding the target variable ('loves_count'), which the model aims to predict.

In [2]:
# Feature selection
selected_features = ['price_usd', 'rating', 'reviews']

# Define X (features) and y (target)
X = ProductsReviewsCC[selected_features]
y = ProductsReviewsCC['loves_count']


## Training Using Pasting

In this section, I implement the pasting technique: several decision tree regressors are each trained on a different subset of the data, drawn without replacement. Here the subsets are ten equal-sized, non-overlapping blocks of consecutive rows. Combining trees trained on different subsets reduces variance and makes the model more robust.

To predict the target variable for the test set, I average the predictions of the individual trees. A single tree's errors tend to be partly cancelled by the others, so the averaged prediction is more reliable.
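As a rough illustration of the general idea, the sketch below draws random subsets without replacement (the property that distinguishes pasting from bagging), fits one tree per subset, and averages the predictions. It runs on synthetic data, and the names `fit_pasting_trees` and `predict_average` are hypothetical helpers introduced only for this example; the notebook's own cell uses consecutive blocks of rows instead of random draws.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_pasting_trees(X, y, n_trees=10, subset_fraction=0.5, seed=0):
    """Fit one tree per random subset drawn WITHOUT replacement (pasting)."""
    rng = np.random.default_rng(seed)
    n_rows = int(len(X) * subset_fraction)
    trees = []
    for _ in range(n_trees):
        # replace=False: a row appears at most once within a given subset
        idx = rng.choice(len(X), size=n_rows, replace=False)
        trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))
    return trees

def predict_average(trees, X):
    """Average the per-tree predictions row by row."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

# Synthetic stand-in data (assumption: three numeric features, one target)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 5 * X_demo[:, 0] + rng.normal(0, 1, size=200)

demo_trees = fit_pasting_trees(X_demo, y_demo)
demo_pred = predict_average(demo_trees, X_demo[:5])
print(demo_pred.shape)
```

Random draws are usually preferable when the rows are ordered (for example, grouped by brand), because consecutive blocks would then give each tree a narrow slice of the data.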

In [3]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Number of subsets (one model is trained per subset)
n_subsets = 10
subset_size = len(X) // n_subsets
subsets_X = []
subsets_y = []

# Partition the rows into consecutive, non-overlapping blocks.
# Any remainder rows (len(X) % n_subsets) are left out.
for i in range(n_subsets):
    X_subset = X.iloc[i * subset_size:(i + 1) * subset_size]
    y_subset = y.iloc[i * subset_size:(i + 1) * subset_size]
    subsets_X.append(X_subset)
    subsets_y.append(y_subset)

# Training the model: one decision tree per subset
trees = []
for X_subset, y_subset in zip(subsets_X, subsets_y):
    tree = DecisionTreeRegressor(random_state=42)
    tree.fit(X_subset, y_subset)
    trees.append(tree)

# Average the per-tree predictions; averaging reduces variance.
def average_predictions(trees, X):
    predictions = np.array([tree.predict(X) for tree in trees])
    return np.mean(predictions, axis=0)

# Data splitting
# Caveat: the trees above were fit on all rows, including those that end up
# in X_test, so the metrics below likely overstate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_pred_pasting = average_predictions(trees, X_test)

# Evaluating
mse_pasting = mean_squared_error(y_test, y_pred_pasting)
r2_pasting = r2_score(y_test, y_pred_pasting)
print(f"Pasting - Mean Squared Error: {mse_pasting}")
print(f"Pasting - R-squared: {r2_pasting}")

Pasting - Mean Squared Error: 1826009526.5866172
Pasting - R-squared: 0.5823196751407739
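Because the trees in the cell above are fit before the train/test split is made, the test rows have already been seen during training. One way to get a cleaner estimate is to split first and fit only on the training portion. The sketch below does this with scikit-learn's `BaggingRegressor`, which performs pasting when `bootstrap=False`. The data is synthetic and the variable names are placeholders, so the printed scores are not comparable to the figures reported above.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the three features and the target (assumption)
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 100, size=(500, 3))
y_syn = 3 * X_syn[:, 0] + 2 * X_syn[:, 2] + rng.normal(0, 5, size=500)

# Split FIRST, so the test rows never reach the trees during fitting
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

pasting_model = BaggingRegressor(
    DecisionTreeRegressor(random_state=42),
    n_estimators=10,    # ten trees, as in the notebook
    max_samples=0.1,    # each tree sees 10% of the training rows
    bootstrap=False,    # sample without replacement: pasting
    random_state=42,
)
pasting_model.fit(X_tr, y_tr)

y_hat = pasting_model.predict(X_te)
mse_syn = mean_squared_error(y_te, y_hat)
r2_syn = r2_score(y_te, y_hat)
print(f"MSE: {mse_syn:.2f}, R-squared: {r2_syn:.3f}")
```

Applied to the real data, the same pattern would mean passing `X_train` and `y_train` to `fit` and scoring on `X_test` only.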


## Recommendation Function

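This function keeps the products whose tertiary_category contains the user's input (a case-insensitive substring match, so 'Sunscreen' also matches 'Body Sunscreen' and 'Face Sunscreen'), predicts loves_count for them with the pasting ensemble, sorts them by the predicted value in descending order, and returns the top recommendations.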

In [4]:
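def recommend_products_with_pasting(category, num_recommendations=5):
    """Return the top products in a category, ranked by predicted loves_count."""
    # Case-insensitive substring match; regex=False treats the input literally,
    # so characters such as '&' or '(' cannot be misread as a pattern.
    mask = ProductsReviewsCC['tertiary_category'].str.contains(
        category, case=False, na=False, regex=False
    )
    # .copy() gives an independent DataFrame, so adding a column below
    # does not trigger pandas' SettingWithCopyWarning.
    filtered_products = ProductsReviewsCC[mask].copy()

    if filtered_products.empty:
        print(f"No products found for category '{category}'.")
        return pd.DataFrame()

    X_filtered = filtered_products[selected_features]
    filtered_products['predicted_loves_count'] = average_predictions(trees, X_filtered)

    top_products = filtered_products.sort_values(
        by='predicted_loves_count', ascending=False
    ).head(num_recommendations)

    return top_products[['product_name', 'brand_name', 'price_usd', 'rating']]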
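To make the filter, predict, and rank steps easier to follow in isolation, here is a small self-contained sketch on a toy table. The DataFrame `toy`, the function `recommend_toy`, and the stand-in scorer `stub_predict` (which simply uses the price as the score) are all hypothetical and exist only for this example; in the notebook the score comes from the trained trees.

```python
import pandas as pd

# Toy catalogue (assumption: same column names as the real dataset)
toy = pd.DataFrame({
    "product_name": ["A", "B", "C", "D"],
    "tertiary_category": ["Face Sunscreen", "Body Sunscreen", "Lipstick", None],
    "price_usd": [30.0, 20.0, 25.0, 10.0],
})

def stub_predict(frame):
    # Hypothetical stand-in for the ensemble: score each product by its price
    return frame["price_usd"].to_numpy()

def recommend_toy(category, n=5):
    # Literal, case-insensitive substring match; missing categories never match
    mask = toy["tertiary_category"].str.contains(
        category, case=False, na=False, regex=False
    )
    matches = toy[mask].copy()  # copy before adding a column
    matches["predicted_loves_count"] = stub_predict(matches)
    return matches.sort_values("predicted_loves_count", ascending=False).head(n)

result = recommend_toy("sunscreen")
print(result["product_name"].tolist())  # ['A', 'B']
```

Both sunscreen rows match the lowercase query, the row with a missing category is skipped, and the higher-scored product comes first.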


## User Interaction

The user interacts with the system by entering the product category they are interested in. The system then provides the top product recommendations based on the trained model.

You can request recommendations for any of the following product types:

Accessories, Aftershave, Anti-Aging, BB & CC Cream, BB & CC Creams,
Bath Soaks & Bubble Bath, Beauty Supplements, Blemish & Acne Treatments, Blotting Papers, Blush, Body Lotions & Body Oils, Body Mist & Hair Mist, Body Products, Body Sunscreen, Body Wash & Shower Gel, Bronzer, Brush Cleaners, Brush Sets, Brushes & Combs, Candles, Cellulite & Stretch Marks, Cheek Palettes, Cologne, Cologne Gift Sets, Color Care, Color Correct, Concealer, Conditioner, Contour, Curling Irons, Damaged Hair, Decollete & Neck Creams, Deodorant & Antiperspirant, Diffusers, Dry Shampoo, Exfoliators, Eye Brushes, Eye Cream, Eye Creams & Treatments, Eye Masks, Eye Palettes, Eye Primer, Eye Sets, Eyebrow, Eyelash Curlers, Eyeliner, Eyeshadow, Face Brushes, Face Masks, Face Oils, Face Primer, Face Serums, Face Sets, Face Sunscreen, Face Wash, Face Wash & Cleansers, Face Wipes, Facial Cleansing Brushes, Facial Peels, Facial Rollers, False Eyelashes, For Body, For Face, Foundation, Hair Dryers, Hair Dye & Root Touch-Ups, Hair Masks, Hair Oil, Hair Primers, Hair Removal, Hair Spray, Hair Straighteners & Flat Irons, Hair Styling Products, Hair Supplements, Hair Thinning & Hair Loss, Hand Cream & Foot Cream, Hand Sanitizer & Hand Soap, Highlighter, Holistic Wellness, Intimate Care, Leave-In Conditioner, Lip Balm & Treatment, Lip Brushes, Lip Gloss, Lip Liner, Lip Plumper, Lip Sets, Lip Stain, Lipstick, Liquid Lipstick, Makeup & Travel Cases, Makeup Bags & Travel Cases, Makeup Removers, Manicure & Pedicure Tools, Mascara, Mists & Essences, Moisturizer & Treatments, Moisturizers, Night Creams, Perfume, Perfume Gift Sets, Rollerballs & Travel Size, Scalp Treatments, Scrub & Exfoliants, Setting Spray & Powder, Shampoo, Shampoo & Conditioner, Sharpeners, Shaving, Sheet Masks, Skincare Sets, Sponges & Applicators, Sunscreen, Teeth Whitening, Tinted Moisturizer, Toners, Tweezers & Eyebrow Tools, Under-Eye Concealer
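A list like this does not have to be maintained by hand. Assuming the options correspond to the distinct values of the tertiary_category column (the notebook does not say how the list was produced), it could be generated along these lines; `toy_products` is a made-up stand-in for the real DataFrame.

```python
import pandas as pd

# Made-up stand-in for ProductsReviewsCC
toy_products = pd.DataFrame({
    "tertiary_category": ["Lipstick", "Mascara", None, "Lipstick", "Face Serums"],
})

# Drop missing values, de-duplicate, and sort alphabetically
categories = sorted(toy_products["tertiary_category"].dropna().unique())
print(", ".join(categories))  # Face Serums, Lipstick, Mascara
```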

In [5]:
def get_user_input_with_pasting():
    category = None  # remembered between rounds so that "same" can reuse it
    while True:
        if category is None:
            category = input("Enter the specific product type you're interested in (e.g., lipstick, serum): ").strip()

        # Re-prompt instead of crashing when the count is not a positive integer
        try:
            num_recommendations = int(input("How many recommendations would you like? "))
            if num_recommendations < 1:
                raise ValueError
        except ValueError:
            print("Please enter a positive whole number.")
            continue

        recommendations = recommend_products_with_pasting(category, num_recommendations)
        if not recommendations.empty:
            print(f"\nTop {num_recommendations} products in the specific category '{category}':")
            print(recommendations)

        more_recs = input("Would you like more recommendations? (yes/no): ").strip().lower()
        if more_recs == 'yes':
            same_or_new = input("Would you like recommendations for the same category or a new one? (same/new): ").strip().lower()
            if same_or_new == 'new':
                category = None  # ask for a new category on the next pass
            elif same_or_new != 'same':
                print("Invalid input. Exiting.")
                break
        elif more_recs == 'no':
            print("Thank you for using the recommender system!")
            break
        else:
            print("Invalid input. Exiting.")
            break

# Main process
get_user_input_with_pasting()


Enter the specific product type you're interested in (e.g., lipstick, serum):  Sunscreen
How many recommendations would you like?  5



Top 5 products in the specific category 'Sunscreen':
                                           product_name  brand_name  \
6804    Ultimate Sun Protector Lotion SPF 50+ Sunscreen    Shiseido   
7255                      Unseen Sunscreen SPF 40 PA+++  Supergoop!   
7257  Glowscreen Sunscreen SPF 40 PA+++ with Hyaluro...  Supergoop!   
3542  Daily UV Defense Invisible Broad Spectrum SPF ...   innisfree   
6807                      Clear Sunscreen Stick SPF 50+    Shiseido   

      price_usd  rating  
6804       50.0  4.4864  
7255       48.0  4.2600  
7257       38.0  4.1513  
3542       16.0  4.5393  
6807       30.0  4.6566  


Would you like more recommendations? (yes/no):  no


Thank you for using the recommender system!


## DONE