# Recommendations Using Hybrid Solutions <sup>1</sup>

Many recommender systems try to combine the strengths of the two main approaches, content-based and collaborative filtering.
This can be done in several ways:

- Combine the predictions of a content-based system and a collaborative system.
- Incorporate content-based techniques into a collaborative approach.
- Incorporate collaborative techniques into a content-based approach.
- Build a unifying model that incorporates both content-based and collaborative characteristics.
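
As an illustration of the first option in the list above, here is a minimal sketch of a weighted blend of two rating predictions. The function name `weighted_hybrid`, the weight `alpha`, and the sample numbers are assumptions made for this example and are not part of the notebook.

```python
import numpy as np

def weighted_hybrid(content_pred, collab_pred, alpha=0.5):
    """Blend two sets of rating predictions.

    alpha is the weight given to the content-based predictions;
    the collaborative predictions get the remaining 1 - alpha.
    """
    content_pred = np.asarray(content_pred, dtype=float)
    collab_pred = np.asarray(collab_pred, dtype=float)
    return alpha * content_pred + (1.0 - alpha) * collab_pred

# Hypothetical predictions for two (user, item) pairs.
blended = weighted_hybrid([4.0, 3.0], [5.0, 2.0], alpha=0.25)
print(blended)
```

In practice `alpha` would probably be tuned on held-out data rather than fixed by hand.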

Possible

## Imports

In [1]:
import numpy as np
import pandas as pd

## Load Data

Only a subset of the original dataset is loaded here, as a proof of concept.

In [2]:
# The train/test files come from an 80/20 split done earlier.
df_train = pd.read_csv('../Data/training_data_subset.csv')
df_test = pd.read_csv('../Data/testing_data_subset.csv')
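
The split itself is not shown in this notebook. The sketch below is one way an 80/20 split could be produced with pandas, assuming a random row-level split. The helper `split_80_20`, the seed, and the toy frame are hypothetical, and the original split may well have been done differently.

```python
import pandas as pd

def split_80_20(df, seed=42):
    """Randomly assign 80% of the rows to train and the rest to test."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)  # the rows that were not sampled
    return train, test

# Hypothetical toy ratings, only to exercise the helper.
toy = pd.DataFrame({
    'reviewerID': list('abcdefghij'),
    'overall': [5.0, 4.0, 3.0, 5.0, 1.0, 2.0, 4.0, 5.0, 3.0, 4.0],
})
toy_train, toy_test = split_80_20(toy)
print(len(toy_train), len(toy_test))
```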

In [3]:
# Also load the entire original dataset to build a user index.
# Note: with the full train/test dataset, several of the methods below can take 2-6 hours to run.
df_original = pd.read_csv('../Data/eda_data.csv')

In [4]:
df_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1083170 entries, 0 to 1083169
Data columns (total 14 columns):
 #   Column      Non-Null Count    Dtype  
---  ------      --------------    -----  
 0   category    1083170 non-null  object 
 1   title       1083170 non-null  object 
 2   also_buy    926546 non-null   object 
 3   brand       1075197 non-null  object 
 4   rank        1039163 non-null  object 
 5   also_view   577060 non-null   object 
 6   main_cat    1081896 non-null  object 
 7   price       750231 non-null   float64
 8   asin        1083170 non-null  object 
 9   overall     1083170 non-null  float64
 10  verified    1083170 non-null  bool   
 11  reviewerID  1083170 non-null  object 
 12  vote        149247 non-null   float64
 13  style       559212 non-null   object 
dtypes: bool(1), float64(3), object(10)
memory usage: 108.5+ MB


In [5]:
import logging
logging.basicConfig(filename='../Logging/troubleshoot_index.log', filemode='w+', level=logging.DEBUG)

### Create a user index
The index makes it more convenient to retrieve information for a given user (`reviewerID`).

In [6]:
# The key features and ids from earlier analysis.
# Additional features could be included if desired, except the target feature 'overall'.
user_info = df_original[['title', 'also_buy', 'also_view', 'price', 'rank', 'asin', 'reviewerID', 'vote']]
# Reassign rather than using inplace=True on a column selection of df_original.
user_info = user_info.set_index('reviewerID')
user_info.head(2)

| reviewerID | title | also_buy | also_view | price | rank | asin | vote |
|---|---|---|---|---|---|---|---|
| A1J205ZK25TZ6W | Lipton Yellow Label Tea (loose tea) - 450g | ['B00886E4K0', 'B00CREXSHY', 'B001QTRGAQ', 'B0... | ['B00CREXSHY', 'B001QTRGAQ', 'B000JSQK70', 'B0... | 12.46 | 30,937 in Grocery & Gourmet Food ( | 4639725043 | 8.0 |
| ACOICLIJQYECU | Lipton Yellow Label Tea (loose tea) - 450g | ['B00886E4K0', 'B00CREXSHY', 'B001QTRGAQ', 'B0... | ['B00CREXSHY', 'B001QTRGAQ', 'B000JSQK70', 'B0... | 12.46 | 30,937 in Grocery & Gourmet Food ( | 4639725043 | 9.0 |
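
Because a reviewer can appear in many rows, `reviewerID` is not a unique index. The sketch below uses a hypothetical toy frame (`toy_info`, with made-up ids and prices) to show what `.loc` returns in that situation: a scalar when the label occurs once, and a Series when it is repeated.

```python
import pandas as pd

# Hypothetical toy data: reviewer 'u1' wrote two reviews, 'u2' wrote one.
toy_info = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2'],
    'asin': ['p1', 'p2', 'p1'],
    'price': [12.46, 3.99, 12.46],
}).set_index('reviewerID')

one_review = toy_info.loc['u2', 'price']   # label occurs once: a scalar
two_reviews = toy_info.loc['u1', 'price']  # repeated label: a Series

print(type(two_reviews).__name__, len(two_reviews))
```

Any code that expects a single value per reviewer needs to account for the second case.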


In [7]:
def compute_rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))
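
A small worked example may help when checking `compute_rmse`. The function is repeated so the snippet runs on its own, and the three ratings are made up for illustration. Note that the inputs should be NumPy arrays (or pandas Series), since plain Python lists do not support element-wise subtraction.

```python
import numpy as np

def compute_rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))

# Hypothetical predictions and true ratings.
y_pred = np.array([4.0, 4.0, 4.0])
y_true = np.array([5.0, 3.0, 4.0])

# Errors are -1, 1 and 0, so the mean squared error is 2/3.
rmse = compute_rmse(y_pred, y_true)
print(rmse)
```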

In [8]:
def evaluate(estimate_f):
    """ RMSE-based predictive performance evaluation with pandas. """
    
    ids_to_estimate = zip(df_test.reviewerID, df_test.asin)
    estimated = np.array([estimate_f(u,i) for (u,i) in ids_to_estimate])
    real = df_test.overall.values
    return compute_rmse(estimated, real)
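
To get a reference point for any estimator, the same evaluation can be run with a constant prediction. The sketch below is self-contained and therefore uses a variant, `evaluate_on`, that takes the test frame as an argument. That name, the toy test frame, and `constant_estimate` are assumptions for this example.

```python
import numpy as np
import pandas as pd

def compute_rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))

def evaluate_on(test_df, estimate_f):
    """Same logic as evaluate(), with the test frame passed explicitly."""
    pairs = zip(test_df.reviewerID, test_df.asin)
    estimated = np.array([estimate_f(u, i) for u, i in pairs])
    return compute_rmse(estimated, test_df.overall.values)

# Hypothetical test ratings.
toy_test = pd.DataFrame({
    'reviewerID': ['u1', 'u2', 'u3', 'u4'],
    'asin': ['p1', 'p1', 'p2', 'p2'],
    'overall': [5.0, 3.0, 4.0, 4.0],
})

def constant_estimate(reviewer_id, product_id):
    """Baseline: ignore both ids and always predict 4.0."""
    return 4.0

baseline_rmse = evaluate_on(toy_test, constant_estimate)
print(baseline_rmse)
```

An estimator that does not beat this kind of baseline is probably not using its inputs effectively.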

In [9]:
class CollaborativeRecommendation:
    """ Collaborative filtering using an implicit sim(u,u'). """

    def __init__(self, feature):
        """ Prepare data structures for estimation. """
        self._feature = feature
        # Mean rating per product (rows) and per feature value (columns).
        self.means_by_feature = df_train.pivot_table('overall', index='asin', columns=self.feature)

    @property
    def feature(self):
        return self._feature

    def estimate(self, reviewer_id, product_id):
        """ Mean ratings by other users of the same feature. """

        # Products unseen in training get a fixed default rating.
        if product_id not in self.means_by_feature.index:
            return 4.0

        # Note: this is a Series, not a scalar, when the reviewer has several rows in user_info.
        user_feature = user_info.loc[reviewer_id, self.feature]
        feature_mean = self.means_by_feature.loc[product_id, user_feature]
        if not np.isnan(feature_mean):
            return feature_mean
        # No rating for this feature value: fall back to the product's mean over all values.
        return self.means_by_feature.loc[product_id].mean()

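
To make the table built in `__init__` concrete, the sketch below runs the same `pivot_table` call on a hypothetical toy training frame. The column `segment` stands in for whatever feature is passed to the class, and all values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical training ratings with a stand-in feature column.
toy_train = pd.DataFrame({
    'asin': ['p1', 'p1', 'p1', 'p2'],
    'segment': ['a', 'a', 'b', 'a'],
    'overall': [5.0, 3.0, 2.0, 4.0],
})

# Rows are products, columns are feature values, cells are mean ratings.
means = toy_train.pivot_table('overall', index='asin', columns='segment')

seen = means.loc['p1', 'a']        # mean of 5.0 and 3.0
unseen = means.loc['p2', 'b']      # NaN: nobody in segment 'b' rated p2
fallback = means.loc['p2'].mean()  # row mean, NaN cells are skipped
print(means)
```

The NaN cell is the case handled by the fallback branch of `estimate`.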

In [10]:
also_view_rec = CollaborativeRecommendation('also_view')
print('RMSE for also_view: %s' % evaluate(also_view_rec.estimate))

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index([nan, nan, '['B015GBCSWK', 'B01M4P5L9A', 'B01KXRZPPE', 'B07BL69CD2', 'B079YZGVQ5', 'B06W2K24CZ', 'B07BB1VV9K', '1508421064', 'B071SH212S', 'B06ZYSLT5D', 'B01M2A5BL2', 'B01LVZ7N8G', 'B00LB6L87G', 'B074BDFXLS', 'B00K7A2VRS', 'B06Y5K54F9', 'B079NPB2X8', 'B075LFVQGR', 'B0176Q6RE8', 'B000MGR302', 'B00XB2YFBE', 'B00XQ2XGAA', 'B00K6JUG4K', 'B005KG7EDU', 'B0774P2MWS', 'B01LXADO9Z', 'B01A1G47L0', 'B07FMH447D', 'B01N4I84PY', 'B071VGJTBH', 'B071S8D69C', 'B07FKRMT81', 'B074VH9C6J', 'B07C7S9Q4F', 'B0127MNEX8', 'B07CKB9291', 'B0761S4B22', 'B01M3Q4086', 'B077SWQFXR', 'B07CGWQ5YG', 'B0778JQSJ5', 'B00NLQXXDQ', 'B00KSMVZJ0', 'B01I2HUUS4', 'B077LHTLZD', 'B0176QKIJI', 'B0759X2RLL', 'B0019LRY8A', 'B00NLR1PX0', 'B01LDNBAC4', 'B00J074W7Q']'], dtype='object', name='also_view'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
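
The missing labels in the KeyError above form a three-element index, which suggests that the lookup in `user_info` returned several rows for one reviewer (two with NaN in `also_view`), and that this list was then passed to `.loc` as a column key. One possible way around it, sketched below on hypothetical toy data, is to look up the feature for the exact (reviewer, product) pair and to fall back to the product's mean when the value is missing or was never seen in training. The names `pair_feature`, `estimate_pair`, `segment`, and the toy frames are assumptions, and this is not necessarily the fix the notebook went on to use.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df_train and df_original.
toy_train = pd.DataFrame({
    'reviewerID': ['u1', 'u2', 'u3'],
    'asin': ['p1', 'p1', 'p2'],
    'segment': ['a', 'b', 'a'],
    'overall': [5.0, 2.0, 4.0],
})
toy_original = pd.DataFrame({
    'reviewerID': ['u1', 'u1', 'u2', 'u3', 'u4'],
    'asin': ['p1', 'p2', 'p1', 'p2', 'p1'],
    'segment': ['a', 'a', 'b', 'a', np.nan],
})

means = toy_train.pivot_table('overall', index='asin', columns='segment')

# One feature value per (reviewer, product) pair, so a lookup is always a scalar.
pair_feature = dict(zip(zip(toy_original.reviewerID, toy_original.asin),
                        toy_original.segment))

def estimate_pair(reviewer_id, product_id, default=4.0):
    """Mean rating for the pair's feature value, with fallbacks."""
    if product_id not in means.index:
        return default
    value = pair_feature.get((reviewer_id, product_id))
    # Unknown pair, NaN feature, or a value never seen in training.
    if value is None or pd.isna(value) or value not in means.columns:
        return means.loc[product_id].mean()
    cell = means.loc[product_id, value]
    return means.loc[product_id].mean() if np.isnan(cell) else cell

print(estimate_pair('u1', 'p1'), estimate_pair('u4', 'p1'))
```

Keying on the pair sidesteps the non-unique `reviewerID` index. Whether a per-pair feature is the right notion of similarity for this dataset is a separate modelling question.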