In [129]:
import pandas as pd

In [161]:
import graphlab as gl

In [165]:
from graphlab.toolkits.recommender import factorization_recommender

In [133]:
df = pd.read_pickle('reviews.pkl')

The first model is a baseline trained on an SFrame that contains only the user, the item generalized to its brand, and the rating the user gave. Its root mean squared error (RMSE) is 0.95111.
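For reference, RMSE is the square root of the mean squared difference between observed and predicted ratings. Below is a minimal stdlib sketch of the metric. The ratings and predictions in it are made up for illustration and do not come from reviews.pkl.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# hypothetical observed ratings vs. model predictions
print(round(rmse([10, 8, 3], [9, 8, 5]), 3))  # -> 1.291
```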

In [114]:
sf = gl.SFrame(df[['userId', 'brandId', 'rating']])
model = gl.toolkits.recommender.create(sf, user_id = 'userId', item_id='brandId', target='rating')

The next model sets item_id to the exact product ID instead of the brand ID, which lowers the RMSE to 0.509202.
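A brand-level item lumps together every product of that brand, even when those products are rated very differently. The sketch below illustrates that effect with a deliberately simple predictor: each rating is predicted by the mean rating of its group. This is not GraphLab's factorization model, and the rows are hypothetical. Note that this is in-sample error. A finer grouping can never fit the training data worse, so a held-out split is the fairer way to compare the two item definitions.

```python
import math
from collections import defaultdict


def group_mean_rmse(rows, key):
    """In-sample RMSE when each rating is predicted by the mean rating of its group.

    rows: list of dicts with a 'rating' field; key: the field to group by.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row['rating'])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    errors = [(row['rating'] - means[row[key]]) ** 2 for row in rows]
    return math.sqrt(sum(errors) / len(errors))


# hypothetical data: one brand whose two products are rated very differently
rows = [
    {'brandId': 'B1', 'productId': 'p1', 'rating': 9},
    {'brandId': 'B1', 'productId': 'p1', 'rating': 10},
    {'brandId': 'B1', 'productId': 'p2', 'rating': 2},
    {'brandId': 'B1', 'productId': 'p2', 'rating': 3},
]
print(round(group_mean_rmse(rows, 'brandId'), 3))  # -> 3.536
print(group_mean_rmse(rows, 'productId'))          # -> 0.5
```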

In [175]:
sf = gl.SFrame(df[['userId', 'productId', 'rating']])
m = gl.toolkits.recommender.create(sf, user_id = 'userId', item_id='productId', target='rating')

The next model adds user-specific side data (such as height, weight, and body fat) and item-specific side data (such as brand, product name, and total items), which lowers the RMSE to 0.261932.
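Because df has one row per review, user_info and item_info repeat a user's (or product's) attributes once for every review. The sketch below shows the shape that side data conceptually has: one record per key, attached to each (user, item, rating) observation. It uses plain dicts in place of SFrames. All names and values are hypothetical, and this is not a description of how GraphLab joins side data internally. In pandas, `drop_duplicates('userId')` on the user columns would produce the one-row-per-user table.

```python
def side_table(rows, key, fields):
    """Collapse review-level rows into one side-data record per key (first occurrence wins)."""
    table = {}
    for row in rows:
        if row[key] not in table:
            table[row[key]] = {f: row[f] for f in fields}
    return table


def attach_side_data(observations, user_table, item_table):
    """Pair each (user, item, rating) observation with its user and item features.

    Users or items without side data get an empty feature dict.
    """
    return [(user, item, rating, user_table.get(user, {}), item_table.get(item, {}))
            for user, item, rating in observations]


# hypothetical review-level rows: user 'u1' reviewed two products, so the
# user attributes repeat, just as they do in the per-review df
reviews = [
    {'userId': 'u1', 'productId': 'p1', 'rating': 9, 'height': 70.0, 'weight': 180.0, 'totalItems': 5},
    {'userId': 'u1', 'productId': 'p2', 'rating': 4, 'height': 70.0, 'weight': 180.0, 'totalItems': 12},
    {'userId': 'u2', 'productId': 'p1', 'rating': 10, 'height': 62.0, 'weight': 120.0, 'totalItems': 5},
]
users = side_table(reviews, 'userId', ['height', 'weight'])
items = side_table(reviews, 'productId', ['totalItems'])
observations = [(r['userId'], r['productId'], r['rating']) for r in reviews]
for row in attach_side_data(observations, users, items):
    print(row)
```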

In [176]:
sf = gl.SFrame(df[['userId', 'productId', 'rating']])
user_info = gl.SFrame(df[['username', 'userId', 'height', 'weight', 'bodyfat', 'slug']])
item_info = gl.SFrame(df[['brandName', 'name', 'brandId', 'productId', 'totalItems']])

In [171]:
m = factorization_recommender.create(sf, user_id='userId', item_id='productId', target='rating',\
                                     user_data=user_info, item_data=item_info)

In [177]:
m.show()


In [178]:
m.coefficients

{'intercept': 8.747326110140092, 'productId': Columns:
 	productId	str
 	linear_terms	float
 	factors	array
 
 Rows: 2496
 
 Data:
 +-------------+----------------+-------------------------------+
 |  productId  |  linear_terms  |            factors            |
 +-------------+----------------+-------------------------------+
 | prod1620022 | -3.16549658775 | [0.0383839830756, 0.841843... |
 | prod1620026 | -3.32721018791 | [0.818790435791, -0.085964... |
 | prod4080003 | -3.32938838005 | [-0.0225170850754, -0.0425... |
 | prod2990043 | -3.2190527916  | [0.0838851481676, 0.326937... |
 | prod3100003 | -3.40285897255 | [0.0376260131598, -0.00815... |
 | prod2170077 | -3.66573357582 | [-0.12679669261, 0.9692271... |
 | prod1010004 | -3.33515977859 | [-0.562129795551, 1.012127... |
 | prod1910038 | -3.59832024574 | [0.454341650009, -0.243273... |
 | prod3710112 | -3.3422369957  | [0.0173684805632, -0.05488... |
 | prod1090006 | -3.62442660332 | [0.278952807188, 0.0376635... |
 +---------
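m.coefficients exposes a global intercept and, for each productId, a linear term and a latent-factor vector. The userId entry is cut off in the output above and presumably has the same structure. The sketch below shows how these pieces can combine into a predicted score. It assumes the factorization model form described in GraphLab's documentation: intercept, plus user and item linear terms, plus side features entering as extra linear terms, plus the dot product of the user and item factor vectors. Only the intercept and the prod1620022 linear term are taken from the output. The user term and both factor vectors are hypothetical placeholders.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def factorization_score(intercept, user_term, item_term, user_factors, item_factors,
                        side_values=(), side_weights=()):
    """Assumed model form: intercept + user and item linear terms
    + linear side-feature terms + user/item latent-factor interaction."""
    return (intercept + user_term + item_term
            + dot(side_values, side_weights)
            + dot(user_factors, item_factors))


# intercept and the prod1620022 linear term come from m.coefficients above;
# the user linear term and both factor vectors are hypothetical
score = factorization_score(8.747326110140092, 0.0, -3.16549658775,
                            [0.5, -0.2], [1.0, 0.5])
print(round(score, 4))  # -> 5.9818
```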

This model is trained only on nutrition/food products and lowers the RMSE to 0.12927.

In [222]:
df = pd.read_pickle('nutrition.pkl')
sf = gl.SFrame(df[['userId', 'productId', 'rating']])
user_info = gl.SFrame(df[['username', 'userId', 'height', 'weight', 'bodyfat']])
item_info = gl.SFrame(df[['brandName', 'name', 'brandId', 'productId', 'totalItems']])
m = factorization_recommender.create(sf, user_id='userId', item_id='productId', target='rating',\
                                     user_data=user_info, item_data=item_info)

In [188]:
import graphlab as gl
from sklearn.decomposition import NMF
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from string import punctuation

In [195]:
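def topics(n_topics):
    # Fit an NMF topic model on TF-IDF vectors of the review text,
    # treating all reviews of one product as a single document.
    corpus = []
    stemmer = SnowballStemmer('english')
    for prodId in df.productId.unique():
        docs = []
        for doc in df[df.productId == prodId].text.dropna():
            if doc:  # skip empty reviews
                doc = ' '.join(stemmer.stem(word.strip(punctuation).lower())
                               for word in doc.split())
                docs.append(doc)
        corpus.append(' '.join(docs))
    # Stop words are removed after stemming, so stemmed forms such as
    # 'ani' or 'veri' are not caught by the English stop-word list.
    tfidf_vectorizer = TfidfVectorizer(stop_words='english')
    tfidf = tfidf_vectorizer.fit_transform(corpus)
    nmf = NMF(n_components=n_topics).fit(tfidf)
    feature_names = tfidf_vectorizer.get_feature_names()
    n_topic_words = n_topics  # number of top words printed per topic
    for topic_idx, topic in enumerate(nmf.components_):
        print 'Topic #{}:'.format(topic_idx)
        # indices of the n_topic_words largest weights, largest first
        print ' '.join([feature_names[i] for i in topic.argsort()[:-n_topic_words - 1:-1]])
        print  # blank line between topics
    return str(n_topics) + 'topics'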
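The slice `topic.argsort()[:-n_topic_words - 1:-1]` walks the ascending argsort backwards from its end, so it yields the indices of the n largest weights, largest first. The same selection in plain Python is shown below. The vocabulary and weights are hypothetical stand-ins for a row of `nmf.components_`.

```python
def top_words(weights, vocabulary, n):
    """Return the n vocabulary terms with the largest weights, largest first.

    Mirrors topic.argsort()[:-n - 1:-1]: sort indices ascending by weight,
    then step backwards from the end to take the n largest.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i])  # ascending, like argsort
    return [vocabulary[i] for i in order[:-n - 1:-1]]


vocab = ['sleep', 'protein', 'tast', 'bar', 'pump']  # hypothetical stemmed terms
weights = [0.1, 0.9, 0.4, 0.7, 0.0]                  # hypothetical topic row
print(top_words(weights, vocab, 3))  # -> ['protein', 'bar', 'tast']
```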

In [196]:
topics(13)

Topic #0:
product notic use day week work differ test feel ani strength booster gain

Topic #1:
protein chocol mix milk whey powder vanilla tast shake flavor great best use

Topic #2:
workout pre pump energi focus scoop feel preworkout tri caffein great use c4

Topic #3:
bar protein tast chocol quest eat like snack good textur cooki flavor chewi

Topic #4:
vitamin multi multivitamin pill day great price need swallow easi energi feel pak

Topic #5:
sleep night wake zma asleep help dream feel fall rest bed hour melatonin

Topic #6:
bcaa workout recoveri amino glutamin product great post use help sore dure mix

Topic #7:
tast flavor like mix good water tri drink bad just scoop realli pretti

Topic #8:
fish oil burp omega fishi dha epa pill cla great softgel high supplement

Topic #9:
creatin mix unflavor strength monohydr gain water shake price use notic muscl mono

Topic #10:
product good great veri price work excel qualiti recommend use cheap best high

Topic #11:


'13topics'

In [197]:
topics(14)

Topic #0:
product notic week day use differ test work feel booster strength ani gain increas

Topic #1:
protein chocol mix milk whey powder tast vanilla shake flavor great best tri use

Topic #2:
workout pre pump energi focus scoop feel preworkout tri caffein use c4 like gym

Topic #3:
bar protein tast chocol quest eat like snack textur cooki flavor chewi great tri

Topic #4:
vitamin multi multivitamin pill day need swallow great easi feel price pak energi best

Topic #5:
sleep night wake zma asleep help dream feel fall rest bed hour melatonin deep

Topic #6:
bcaa workout recoveri amino glutamin post sore help dure mix use muscl day drink

Topic #7:
tast flavor like mix water drink tri just bad scoop sweet don realli green

Topic #8:
fish oil burp omega fishi dha epa pill cla softgel high tast supplement qualiti

Topic #9:
creatin mix unflavor strength monohydr gain water shake price notic muscl mono use creapur

Topic #10:
good veri price product pretti qualiti ene

'14topics'

In [198]:
topics(15)

Topic #0:
product notic day week use test differ work feel booster ani gain strength increas bottl

Topic #1:
protein chocol mix milk whey powder vanilla tast shake flavor great best use tri love

Topic #2:
pump workout pre scoop focus tri preworkout c4 best like beta use feel alanin gym

Topic #3:
bar protein tast chocol quest eat like snack textur cooki flavor chewi great tri peanut

Topic #4:
vitamin multi multivitamin pill day swallow need easi price great pak best feel onli miner

Topic #5:
sleep night wake zma asleep help dream feel fall rest bed melatonin deep hour befor

Topic #6:
bcaa workout recoveri amino glutamin post sore help dure mix use muscl supplement drink day

Topic #7:
tast flavor like mix tri water just bad drink scoop don sweet realli green horribl

Topic #8:
fish oil burp omega fishi dha epa pill softgel cla high tast supplement qualiti swallow

Topic #9:
creatin mix unflavor strength monohydr gain water shake price notic muscl mono use creapur

'15topics'

In [209]:
# toy example: a factorization recommender fit on eight hand-made ratings
sf100 = gl.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
                   'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"],
                   'rating': [1, 3, 2, 5, 4, 1, 4, 3]})
m100 = gl.factorization_recommender.create(sf100, user_id='user_id', item_id='item_id', target='rating')
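The toy cell above fits GraphLab's factorization recommender on eight ratings. To show roughly what a model of this kind learns, the sketch below fits the same model form on the same toy ratings using plain stochastic gradient descent on L2-regularized squared error. The model form is a global intercept, per-user and per-item linear terms, and k latent factors each. The hyperparameters (k, epochs, lr, reg, seed) are arbitrary choices for illustration. GraphLab's own solver and defaults may differ.

```python
import math
import random


def train_mf(ratings, k=2, epochs=2000, lr=0.02, reg=0.01, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on L2-regularized squared error.

    ratings: list of (user, item, rating) triples. Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean acts as the intercept
    bu = {u: 0.0 for u in users}                       # per-user linear terms
    bi = {i: 0.0 for i in items}                       # per-item linear terms
    P = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u in users}  # user factors
    Q = {i: [rng.gauss(0, 0.1) for _ in range(k)] for i in items}  # item factors

    def predict(u, i):
        # unknown users or items contribute zero linear terms and zero factors
        pu = P.get(u, [0.0] * k)
        qi = Q.get(i, [0.0] * k)
        return mu + bu.get(u, 0.0) + bi.get(i, 0.0) + sum(a * b for a, b in zip(pu, qi))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qf - reg * pf)
                Q[i][f] += lr * (err * pf - reg * qf)
    return predict


# same toy ratings as sf100
toy = list(zip(["0", "0", "0", "1", "1", "2", "2", "2"],
               ["a", "b", "c", "a", "b", "b", "c", "d"],
               [1, 3, 2, 5, 4, 1, 4, 3]))
predict = train_mf(toy)
train_rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
print(round(train_rmse, 3))
print(round(predict("1", "c"), 2))  # a user/item pair absent from the toy data
```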

In [264]:
recent_data = gl.SFrame({'userId':[122],'username':['test'], 'height': ['60'], 'weight': ['110'], 'bodyfat': ['32.0']})

m.recommend(users=[122], new_user_data=recent_data)

userId,productId,score,rank
122,28322,13.1978851373,1
122,27350,13.1895774426,2
122,prod1560084,13.0548508891,3
122,prod2140022,13.0026751979,4
122,prod2030074,12.9391159817,5
122,28321,12.9166250296,6
122,prod2690002,12.7456378103,7
122,prod1390054,12.7440938766,8
122,prod1560080,12.6116011596,9
122,25351,12.5038680675,10
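The output ranks, for user 122, the ten products with the highest predicted score. The sketch below shows that ranking step in isolation. It assumes that items the user has already rated are skipped and that the remaining candidates are sorted by score. The score function, item names, and values are hypothetical.

```python
import heapq


def recommend(score, user, candidate_items, rated_items, k=10):
    """Rank items the user has not rated by predicted score, highest first.

    score: callable (user, item) -> float. Returns a list of (item, score, rank).
    """
    unseen = [i for i in candidate_items if i not in rated_items]
    best = heapq.nlargest(k, unseen, key=lambda i: score(user, i))
    return [(item, score(user, item), rank) for rank, item in enumerate(best, start=1)]


toy_scores = {'x': 1.0, 'y': 3.0, 'z': 2.0, 'w': 5.0}  # hypothetical predicted scores
print(recommend(lambda u, i: toy_scores[i], 122, toy_scores, rated_items={'w'}, k=2))
# -> [('y', 3.0, 1), ('z', 2.0, 2)]
```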


In [269]:
df[df.productId == '28322']


Unnamed: 0,brandName,brandId,name,productId,username,userId,_id,height,weight,bodyfat,totalItems,text,date,modDate,slug,verifiedBuyerRating,description,updateStatusReason,rating
48418,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,nicoleliving,92641122,5828d4b10cf2a231f23cb158,67.0,125.0,12.0,7,not the same as sugary bad for you bbq but it ...,{u'$numberLong': u'1479070897000'},{u'$numberLong': u'1479070897000'},nicoleliving,8.6,"No Carbs, Sugar Free And Calorie Free!",,10.0
48419,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,alexsilverfagan,59295882,5551fa2c0cf2be3a93cc2fff,64.0,115.0,13.4,7,I have been using this product for a year now ...,{u'$numberLong': u'1431435820000'},{u'$numberLong': u'1431435820000'},alexsilverfagan,8.6,"No Carbs, Sugar Free And Calorie Free!",,10.0
48420,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,lilly2026,93787712,543eb66a0cf25548b189a2af,,,,7,,{u'$numberLong': u'1413396074000'},{u'$numberLong': u'1413396074000'},lilly2026,8.6,"No Carbs, Sugar Free And Calorie Free!",,1.0
48421,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,mrq1324,26033741,53a4bfb50cf216b9260a9b8b,68.0,188.2,17.0,7,This stuff is amazing!!! So impressed. Delicio...,{u'$numberLong': u'1403305909000'},{u'$numberLong': u'1403305909000'},mrq1324,8.6,"No Carbs, Sugar Free And Calorie Free!",,10.0
48422,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,Kiraavw,64988412,51f9a0b10cf25611125e688b,66.0,176.0,24.0,7,This is very tasty but loaded with sodium so u...,{u'$numberLong': u'1375314097000'},{u'$numberLong': u'1375314097000'},Kiraavw,8.6,"No Carbs, Sugar Free And Calorie Free!",,9.0
48423,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,quiqueaguilarm1,58871442,51a7c7740cf23b2043039b47,67.72,155.65,16.0,7,"The taste of chicken,meat,and fish with this i...",{u'$numberLong': u'1367075442000'},{u'$numberLong': u'1376087847000'},quiqueaguilarm1,8.6,"No Carbs, Sugar Free And Calorie Free!",,10.0
48424,Walden Farms,BRAND_WALDEN_FARMS,Barbeque Sauce,28322,peteni5588,31571262,51a7c76d0cf23b2043020066,67.0,180.0,,7,I thought that this tasted really good. Burned...,{u'$numberLong': u'1301968427000'},{u'$numberLong': u'1376081480000'},peteni5588,8.6,"No Carbs, Sugar Free And Calorie Free!",,10.0


In [238]:
df[df.productId == '25351']

Unnamed: 0,brandName,brandId,name,productId,username,userId,_id,height,weight,bodyfat,totalItems,text,date,modDate,slug,verifiedBuyerRating,description,updateStatusReason,rating
25963,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,dvalego,139575281,581a52600cf2729002346a26,,,,195,Highly recommend. Other than all the good stuf...,{u'$numberLong': u'1478120031000'},{u'$numberLong': u'1478120031000'},dvalego,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,10.000000
25964,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,melc79,46866871,5818e5ce0cf2b35b2ed9b971,68.0,163.0,32.0,195,This protein is very light and mixes really we...,{u'$numberLong': u'1478026702000'},{u'$numberLong': u'1478026702000'},melc79,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,8.000000
25965,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,carcar5,74662731,581013b50cf23884c0c97a90,,,,195,,{u'$numberLong': u'1477448629000'},{u'$numberLong': u'1477448629000'},carcar5,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,8.000000
25966,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,frankjfjr,100015082,580fa0ed0cf2d69400e6d530,,,,195,I can't find another protein with as many vita...,{u'$numberLong': u'1477419245000'},{u'$numberLong': u'1477419245000'},frankjfjr,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,7.000000
25967,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,prossford,142132452,580084270cf2578e700fbec8,,,,195,Creamy vanilla taste more like bitter cookies ...,{u'$numberLong': u'1476428839000'},{u'$numberLong': u'1476428839000'},prossford,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,3.000000
25968,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,OmarXV,115578291,57f34ca60cf298d361bcad70,,165.0,,195,,{u'$numberLong': u'1475562662000'},{u'$numberLong': u'1475562662000'},OmarXV,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,10.000000
25969,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,jamesez2007,81692162,57e848490cf2556e85fcf07a,,,,195,Great protein overall! Zero/low carb. The co...,{u'$numberLong': u'1474840649000'},{u'$numberLong': u'1474840649000'},jamesez2007,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,9.000000
25970,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,micahthompson35,136531551,57d960960cf2bff74ebdf4a5,,189.0,,195,I love this protein powder! It has a great tas...,{u'$numberLong': u'1473863830000'},{u'$numberLong': u'1473863830000'},micahthompson35,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,10.000000
25971,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,FrowningAngel,110928141,57cee2ba0cf2590ba94c2188,,219.0,,195,Ordering my second 7.5 lb. tub. The fact that ...,{u'$numberLong': u'1473176250000'},{u'$numberLong': u'1473176250000'},FrowningAngel,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,10.000000
25972,Nature's Best,BRAND_NATURE_S_BEST,Zero Carb Isopure,25351,kgurkovskiy,120196661,57bc57ef0cf2dff32f9e9a44,77.0,280.0,25.0,195,I have tried many other isolate protein but IS...,{u'$numberLong': u'1471961071000'},{u'$numberLong': u'1471961071000'},kgurkovskiy,8.8,With 50 Grams Of Protein From 100% Whey Protei...,,10.000000
