# Simple Keyword Search Recommender from Flipkart Customer Reviews  
## Introduction  
A simple keyword search recommender module is built to filter products by keywords that reflect a user's interest.  In this case, the module matches keywords against the product category and the product title.  The user may enter a word or series of words as the product title and refine the search by entering a category, or vice versa.  The module filters by the keywords entered, ranks the results by the rating method the user chooses, and returns the requested number of results.  
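
The filter, rank, and return flow described above can be sketched in a few lines of pandas. This is a minimal illustration on made-up data, not the module built later in this notebook: the `search` function, the `toy_products` frame, and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical toy catalogue; the column names mirror the ones used in this notebook
toy_products = pd.DataFrame({
    "product_title": ["Seiko Diver Watch", "Seiko Solar Watch", "Sony Headphones", "Casio Watch"],
    "product_category": ["Watches", "Watches", "Electronics", "Watches"],
    "adjusted_rating": [4.35, 4.32, 4.05, 4.10],
})

def search(df, product_title=None, product_category=None, n=5):
    """Filter by keywords, rank by adjusted rating, return the top n rows."""
    mask = pd.Series(True, index=df.index)
    if product_title is not None:
        # Case-insensitive plain substring match; missing titles never match
        mask &= df["product_title"].str.contains(product_title, case=False, na=False, regex=False)
    if product_category is not None:
        mask &= df["product_category"].str.contains(product_category, case=False, na=False, regex=False)
    return df[mask].sort_values("adjusted_rating", ascending=False).head(n)

top = search(toy_products, product_title="seiko", product_category="watches", n=2)
print(top)
```

Either keyword can be omitted, in which case only the other filter (or no filter at all) is applied before ranking.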
## Summary
#### Adjusted Rating Score
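An adjusted rating system was introduced to account for extreme ratings and to balance products that had several thousand reviews against products that only had a hundred or so.  Products with fewer reviews are more strongly affected by a handful of ratings than products with more reviews, both positively and negatively.  We want a balance, so we used the formula: $$score_i = \frac{\sum_u r_{ui} + k\mu}{n_i+k}$$ where $r_{ui}$ is the rating user $u$ gave product $i$, $n_i$ is the number of ratings for product $i$, $\mu$ is the mean rating across all products, and $k$ is a constant that controls how strongly scores are pulled toward $\mu$ (the median of `purchased_counts` in the code that follows).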
The adjusted rating score improves on a simple average rating because it combines two things:
- the average rating of the product (perceived quality over popularity)
- the number of ratings by customers (popularity over perceived quality)
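
As a worked example of the formula, the sketch below plugs in the two constants printed further down in this notebook (a universal mean of about 4.2747 and a median `purchased_counts` of 560). The helper name `adjusted_score` is hypothetical, and the sum of ratings is written as the product's mean rating times its number of ratings.

```python
def adjusted_score(mean_rating, n_ratings, global_mean, k):
    """Blend a product's own mean rating with the global mean, weighted by n_ratings and k."""
    return (n_ratings * mean_rating + k * global_mean) / (n_ratings + k)

GLOBAL_MEAN = 4.2747  # universal mean rating printed by the notebook (rounded)
K = 560               # median of purchased_counts printed by the notebook

few = adjusted_score(5.0, 211, GLOBAL_MEAN, K)    # 5.0 stars backed by 211 ratings
many = adjusted_score(4.0, 3970, GLOBAL_MEAN, K)  # 4.0 stars backed by 3970 ratings
print(round(few, 2), round(many, 2))  # 4.47 4.03
```

The score backed by few ratings moves about half a star toward the global mean, while the score backed by many ratings barely changes.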

The test cell at the end of the notebook is timed with `%%time` to check that our queries return fast results.
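
Outside a notebook, query speed can be checked with a small timing helper. The sketch below is only an illustration on synthetic data: the `timed` helper and the `toy` frame are hypothetical, and the numbers it prints say nothing about the real dataset.

```python
import time
import pandas as pd

# Synthetic titles, used only to have something to search through
toy = pd.DataFrame({"product_title": [f"Product {i}" for i in range(10_000)]})

def timed(fn, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

elapsed = timed(lambda: toy[toy["product_title"].str.contains("42", case=False, regex=False)])
print(f"best of 5 runs: {elapsed * 1000:.2f} ms")
```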

In [1]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('new_df.csv')
df.drop('Unnamed: 0', axis=1, inplace=True)

In [3]:
# Calculate universal mean rating
universal_mean = ((df.star_rating * df.purchased_counts).sum()) / (df.purchased_counts.sum())

# Use the median (50% quantile) of purchased_counts as k
k = df.purchased_counts.quantile(0.5)

# Assign adjusted ratings to each review
df['adjusted_rating'] = (df.purchased_counts * df.star_rating + k * universal_mean) / (df.purchased_counts + k)
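
The adjusted rating is assigned per review here and averaged per product later, so it may help to confirm that the per-product average agrees with the formula. The check below uses made-up data and assumes that `purchased_counts` equals the number of review rows for each product; that is an assumption for illustration and may not hold for the real dataset.

```python
import pandas as pd

# Made-up reviews for two hypothetical products
reviews = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B"],
    "star_rating": [5, 4, 3, 2, 5],
})
# Assumption: purchased_counts is the number of review rows per product
reviews["purchased_counts"] = reviews.groupby("product_id")["star_rating"].transform("size")

mu = reviews["star_rating"].mean()  # global mean rating of the toy data
k = 2                               # arbitrary constant for the toy data

# Per-review adjusted rating, as computed in the cell above
reviews["adjusted_rating"] = (
    reviews.purchased_counts * reviews.star_rating + k * mu
) / (reviews.purchased_counts + k)
per_review_mean = reviews.groupby("product_id")["adjusted_rating"].mean()

# The formula applied directly to each product's rating sum and rating count
agg = reviews.groupby("product_id")["star_rating"].agg(["sum", "size"])
direct = (agg["sum"] + k * mu) / (agg["size"] + k)

print(pd.DataFrame({"per_review_mean": per_review_mean, "direct": direct}))
```

Under that assumption the two columns agree, because averaging `n * r` over a product's `n` reviews gives back the sum of its ratings.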

In [4]:
print('universal mean rating: {}\n'.format(universal_mean))

pd.options.display.float_format = "{:.2f}".format
print('product ratings distribution: \n{}'.format(df.purchased_counts.describe()))

universal mean rating: 4.274737187862538

product ratings distribution: 
count   39273.00
mean     1060.70
std      1131.02
min       102.00
25%       321.00
50%       560.00
75%      1234.00
max      3970.00
Name: purchased_counts, dtype: float64


In [5]:
# Show how purchased counts affect adjusted ratings of various random products
np.random.seed(13)

df[
    ['customer_id', 
     'product_id', 
     'product_title', 
     'purchased_counts', 
     'star_rating', 
     'adjusted_rating']
].iloc[np.random.choice(df.index, size=1000, replace=False)].set_index('customer_id').head()

customer_id,product_id,product_title,purchased_counts,star_rating,adjusted_rating
37257664,B003XCVQ3C,moshi MiniDisplay Port to HDMI Adapter with Au...,211,5.0,4.47
42860198,B004DSX5B6,Gerber Bear Grylls Ultimate Multi-Tool [31-000...,377,5.0,4.57
47850835,B000O8OTNC,Smith's PP1 Pocket Pal Multifunction Sharpener...,3970,4.0,4.03
19631463,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,558,3.0,3.64
45591762,B0070UFMOW,FiiO E17 Alpen Portable Headphone Amplifier US...,280,5.0,4.52


In [6]:
# Show how purchased counts affect adjusted ratings of one product
np.random.seed(15)

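# Draw the random sample first, then filter it with a mask built from the sample itself
# (a mask built from the full df does not share the sample's index)
sample = df[
    ['customer_id', 
     'product_id', 
     'product_title', 
     'purchased_counts', 
     'star_rating', 
     'adjusted_rating']
].iloc[np.random.choice(df.index, size=1000, replace=False)]

sample[
    sample.product_title == "Seiko Men's SNK803 Seiko 5 Automatic Watch with Beige Canvas Strap"
].set_index('customer_id').head()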

customer_id,product_id,product_title,purchased_counts,star_rating,adjusted_rating
8093599,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,558,5.0,4.64
33416300,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,558,2.0,3.14
17180774,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,558,4.0,4.14
43575325,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,558,4.0,4.14
36655701,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,558,4.0,4.14


In [7]:
# Condense dataset to one row per unique product
# Group by 'product_id' and calculate the average rating, average adjusted rating, and purchased counts for each product
df_mean = df.groupby('product_id').agg(
    product_title=('product_title', 'first'),
    product_category=('product_category', 'first'),
    star_rating=('star_rating', 'mean'),
    adjusted_rating=('adjusted_rating', 'mean'),
    purchased_counts=('purchased_counts', 'first')
).reset_index()

# Save the condensed DataFrame
df_mean.to_csv('df_mean.csv')



In [8]:
df_mean.head()

index,product_id,product_title,product_category,star_rating,adjusted_rating,purchased_counts
0,140053271X,Barnes & Noble Nook Simple Touch eBook Reader ...,Electronics,3.92,4.1,560
1,B00002N9ER,Maglite Black Universal Mounting Brackets for ...,Tools,4.57,4.4,410
2,B00005N6KG,Sony MDR-W08L Vertical In-The-Ear Headphones,Electronics,3.96,4.05,1367
3,B000065BPB,Sennheiser HD 280 Pro Headphones,Musical Instruments,4.44,4.31,129
4,B0000C9ZBW,Skagen Men's 233LTMB Black Titanium Mesh Brace...,Watches,4.25,4.27,119


In [9]:
class Recommender:
    
    def __init__(self, n=5, adjusted_rating=True):
        
        """Initialize a recommender object by passing the number of recommendations, default is 5.  
        The adjusted rating is the default rating score.  The original star rating can be used by 
        passing adjusted_rating=False"""
        
        self.n = n # Number of recommendations to return, default is 5
        self.adjusted_rating = adjusted_rating # Boolean: sort by adjusted rating (True) or original star rating (False)
        # Product variables to display in recommendation results
        self.product_variables = ['product_id', 'product_title', 
                                  'product_category', 'star_rating', 'adjusted_rating', 'purchased_counts']
        
        # Initialize list of recommendations, sorted by the chosen rating score
        if self.adjusted_rating: # Default sorting criterion is the adjusted rating
            rating = 'adjusted_rating'
        else: # Sort by the original star rating instead
            rating = 'star_rating'
        self.recommend = df_mean.sort_values(rating, ascending=False)
        
        
    def _filter_by_product_category(self):
        """Filter recommendations by the product category
        Note: should only be called in 'keyword' method"""
        
        # Case-insensitive substring match; rows with a missing category never match (na=False)
        mask = self.recommend['product_category'].str.contains(
            self.product_category, case=False, na=False, regex=False)
        self.recommend = self.recommend.loc[mask]
        
    def _filter_by_product_title(self):
        """Filter recommendations by the product title
        Note: should only be called in 'keyword' method"""
        
        # Case-insensitive substring match; rows with a missing title never match (na=False)
        mask = self.recommend['product_title'].str.contains(
            self.product_title, case=False, na=False, regex=False)
        self.recommend = self.recommend.loc[mask]
        
    def return_recommendations(self):
        """Prints the top n recommended products"""
        
        if len(self.recommend) == 0:
            print('No products recommended.')
        elif self.n < len(self.recommend): # Returns top n products from list of recommendations
            print('Top {} recommended products for you:'.format(self.n))
            print(self.recommend.iloc[:self.n][self.product_variables])
        else: # Returns all products if amount found is less than n
            print('Top {} recommended products for you:'.format(len(self.recommend)))
            print(self.recommend[self.product_variables])
            
    # Keyword search filtering recommender module
    def keyword(self, df=df_mean, product_category=None, product_title=None):
        """Keyword search filtering recommendation system.  
        Filters by product title, product category, or a combination of both."""
        
        self.recommend = df # Assign dataframe
        self.product_variables = ['product_id', 'product_title', 
                                  'product_category', 'star_rating', 'adjusted_rating', 'purchased_counts']
        
        # Assign variables based on user's keyword search
        self.product_title = product_title
        self.product_category = product_category
            
        # Filter by product title
        if self.product_title is not None:
            self._filter_by_product_title()
            if len(self.recommend) == 0:
                print('No matching products found for {}'.format(self.product_title))
                return None
                
        # Filter by product category
        if self.product_category is not None:
            self._filter_by_product_category()
            if len(self.recommend) == 0:
                print('No matching products found for {}'.format(self.product_category))
                return None
            
        # Sort by rating of interest
        if self.adjusted_rating:
            rating = 'adjusted_rating'
        else:
            rating = 'star_rating'
            
        self.recommend = self.recommend.sort_values(rating, ascending=False)
            
        # Return top n recommendations    
        self.return_recommendations()
        
        return self.recommend

In [10]:
%%time


# Original ratings
kw = Recommender(n=4, adjusted_rating=False)

# Test 1
print('\n-------------------\nTest 1: top products only, no adjusted rating system')
kw.return_recommendations()

# Adjusted rating system
kw = Recommender(n=4)

# Test 2
print('\n-------------------\nTest 2: top products only, adjusted rating system')
kw.return_recommendations()

# Test 3
print('\n-------------------\nTest 3: No keywords, top products only')
kw.keyword()

# Test 4
print('\n-------------------\nTest 4: product title only')
kw.keyword(product_title='Seiko')

# Test 5
print('\n-------------------\nTest 5: product title and category')
kw.keyword(product_category='Watches', product_title='Seiko')



-------------------
Test 1: top products only, no adjusted rating system
Top 4 recommended products for you:
    product_id                                      product_title  \
44  B001TH7GT6  Portronics RCA Component Video Cable -  6 Feet...   
15  B000AJIF4E  Sony MDR7506 Professional Large Diaphragm Head...   
40  B001N1DPDE      Ka-Bar Becker BK2 Campanion Fixed Blade Knife   
13  B0007LGCB8                          KEEN Men's Newport Sandal   

       product_category  star_rating  adjusted_rating  purchased_counts  
44          Electronics         4.70             4.35               128  
15  Musical Instruments         4.64             4.57              2209  
40               Sports         4.63             4.52              1164  
13                Shoes         4.63             4.33               110  

-------------------
Test 2: top products only, adjusted rating system
Top 4 recommended products for you:
    product_id                                      product_title  

index,product_id,product_title,product_category,star_rating,adjusted_rating,purchased_counts
81,B00756GRUE,Seiko Men's SSC017 Prospex Analog Japanese Qua...,Watches,4.6,4.38,250
17,B000B5OD4I,Seiko Men's SKX007K2 Diver's Automatic Watch,Watches,4.52,4.35,228
16,B000B5MI3Q,Seiko Men's SKX007K Diver's Automatic Watch,Watches,4.43,4.33,277
78,B006Y9BVRM,Seiko Men's SSC021 Solar Diver Chronograph Watch,Watches,4.54,4.32,129
10,B00068TJM6,Seiko Men's SNA411 Flight Alarm Chronograph Watch,Watches,4.42,4.32,246
29,B000OP1M6M,Seiko Men's SKX009K2 Diver's Analog Automatic ...,Watches,4.4,4.3,143
22,B000G6R7B8,Seiko Men's SNK803 Seiko 5 Automatic Watch wit...,Watches,4.27,4.27,558
27,B000LTAY1U,Seiko Men's SNK805 Seiko 5 Automatic Stainless...,Watches,4.16,4.19,1234
76,B006CHML4I,Seiko Men's SNK807 Seiko 5 Automatic Stainless...,Watches,4.05,4.16,637
50,B002SSUQFG,Seiko Men's SNK809 Seiko 5 Automatic Stainless...,Watches,4.09,4.13,1936
