# A Recommendation System from Reddit

In this report, I aim to build a recommendation system for Reddit users (referred to as Redditors) using only their Reddit username.

# Motivation

In the age of the internet, consumers have an abundance of choice. From movies to songs to every conceivable product, it seems like we have access to more choice than we can ever hope to exercise. To make this information more manageable to navigate, many companies have built recommendation systems that help their users find more meaningful content or products, with great success. Platforms like Netflix and YouTube would simply not exist if they could not match users with personalized content from their vast content libraries.

However, the value we gain from these recommendation engines comes at a cost. These websites collect a tremendous amount of data from us. Google and Facebook in particular track our every movement, not just on the internet but, since the proliferation of smartphones, in the physical world as well. In addition, this data is a huge barrier to entry for startups, since the information that sites like Amazon and Facebook hold about their customers allows them to provide better suggestions than new platforms can.

This is why I set out to see if I could build a recommendation system for users from whom I have not collected any prior information. I plan to explore how we can leverage existing, public data about a user to suggest items they may find interesting. I wish to deploy this as a site that encourages a community to share interests and ideas with each other rather than with a corporation that is trying to sell them something.

# The Products

First of all, I had to select which products I wanted to suggest. Netflix suggests movies, YouTube suggests videos and Spotify suggests songs, but Amazon suggests all of these things and a whole lot more. However, Amazon's product discovery algorithm is mainly built on items that someone has already purchased. As a result, a user only gets recommendations based on products they are familiar with. This is to Amazon's advantage, since these products likely have a very high conversion rate. However, it does not lend itself well to finding novel products that are based on our interests rather than our purchase history. 

# Reddit as a window to the soul

Reddit is one of the most visited sites on the internet. It is a meta forum where users can post links to interesting items for every topic imaginable and discuss them with millions of other people. One of the strongest reasons Reddit appeals to so many is its anonymity. It is one of the few forums where you can sign up without an email address, and it is a far cry from something like Facebook, where you have profile pictures and family information. Therefore, redditors can be open and expressive about their interests. Indeed, their personalities are associated with their Reddit usernames rather than their real names. This may lead to trolling, but that is outside the scope of this exercise.

The true value of Reddit for this tool is that redditors comment on different topics, and we can access these topics. This means that by looking at the topics where they comment (known as subreddits), we can see which topics they are interested in. A clear limitation is that certain redditors only browse subreddits without commenting in them (colloquially known as 'lurking'). There is no way to access this information. Indeed, I do not think that Reddit even makes the subreddits that a user is subscribed to public. Therefore, we have to use the comments and get the subreddits from them.

# Collaborative Filtering

There are a variety of ways that recommender systems can work. In order to harness the power of the social fabric of Reddit, I will use 'Collaborative Filtering'. Collaborative Filtering uses the preferences of many users to suggest products to similar users. Usually it works like this: if User 1 likes X, Y and Z and User 2 likes X and Y, there is a high probability that User 2 likes Z as well. This is the way that Amazon does it. However, we do not have access to the user's purchase history and, as mentioned above, we want to suggest novel products that are independent of our target user's purchase history. Therefore, we extend the same concept to subreddits rather than products. In our case, we postulate that if User X and User Y comment on similar topics, they have similar tastes, and therefore we can recommend the products that User X likes to User Y.
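
To make the idea concrete, here is a minimal sketch of the User 1 / User 2 example above. The `likes` dictionary and the `recommend_from_neighbor` helper are hypothetical names made up for illustration; they are not part of the tool built below.

```python
# Toy data: which items each (hypothetical) user likes.
likes = {
    "user1": {"X", "Y", "Z"},
    "user2": {"X", "Y"},
}

def recommend_from_neighbor(target, neighbor, likes):
    """Suggest items the neighbor likes that the target does not,
    but only if the two users share at least one liked item."""
    shared = likes[target] & likes[neighbor]
    if not shared:
        return set()
    return likes[neighbor] - likes[target]

print(recommend_from_neighbor("user2", "user1", likes))  # {'Z'}
```

User 2 shares X and Y with User 1, so the one item User 1 likes that User 2 has not picked up, Z, becomes the suggestion.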

# Getting Data

For Collaborative Filtering to work, however, we need a seed dataset of products to suggest to our users. This is where most platforms fail in their recommender systems. Without this critical mass of data, there would be nothing to suggest. However, upon research, I found a particular subreddit that provided me with a workaround: r/randomactsofamazon.

r/RandomActsOfAmazon is an offshoot of an idea that the Reddit community has had for a long time: the act of randomly gifting something to another redditor. It originated with r/RandomActsOfPizza where, as the title suggests, redditors would gift each other pizza delivered to their doorstep. r/RandomActsOfAmazon takes this concept further and allows redditors to post wishlists of things they want from Amazon. These redditors then hope that some stranger might like them and buy them a gift from their wishlist. This data is perfect for me, as it contains the products that a particular redditor wants or thinks are cool.

After contacting the moderators of r/randomactsofamazon, I was referred to http://n8fq.org/temp/links.sql. This is an SQL dump of the entire database of active users on r/randomactsofamazon. It is a public site that is used to power functionality for r/randomactsofamazon, and I was given permission to use it. I was able to parse this SQL dump into a database and convert it into a CSV file in another script (see github repo). I will use this CSV file for my recommendation engine. I will also encourage users to gift their 'Reddit Twin' (i.e. the redditor in my database who is their closest match) something from their wishlist. However, since I do not have informed consent from every Reddit user in my database, I will not disclose the Reddit id of their twin. Amazon keeps the address of the wishlist owner secret to allow for safe, anonymous gifting.

Now, let's get to the good stuff.

Let's import our dependencies. In this case, we need pandas to manipulate our data and PRAW, the Python Reddit API Wrapper.
Find out more about PRAW at https://praw.readthedocs.io/en/latest/

In [203]:
import pandas as pd
import praw

We instantiate our Reddit instance. We have registered this script with Reddit and obtained a key (client secret). The account details are mine.

In [204]:
reddit = praw.Reddit(client_id = 'tOwGdbmWXETjEA',
                    client_secret = 'FPfgf52wvSG0TN-y_wlTRavGlpU',
                    username = 'amazonrecommender',
                    password = 'amazon',
                    user_agent='test')

In [207]:
getsubs('abhi91')

{'4chan',
 'Barca',
 'Cricket',
 'CryptoCurrency',
 'Entrepreneur',
 'FIFA',
 'FantasyPL',
 'Games',
 'Gunners',
 'Jokes',
 'LiverpoolFC',
 'MCFC',
 'Overwatch',
 'PS4Deals',
 'ProgrammerHumor',
 'The_Donald',
 'TrueReddit',
 'UNBGBBIIVCHIDCTIICBG',
 'asoiaf',
 'churning',
 'consulting',
 'cringe',
 'datascience',
 'financialindependence',
 'gameofthrones',
 'hearthstone',
 'iamverysmart',
 'investing',
 'learnmachinelearning',
 'me_irl',
 'movies',
 'pic',
 'poker',
 'politics',
 'reactiongifs',
 'rickandmorty',
 'soccer',
 'videos',
 'wallstreetbets',
 'whowouldwin'}

Let's import the data. We will have 3 columns: the username, a raw link, and a processed link that I have cleaned up and added tracking information to. This will allow me to monitor the traffic being sent to Amazon when the site is published. We will use 'New Url' in this program.

In [199]:
df = pd.read_csv('newlist.csv', index_col=0)
df.head()


| Unnamed: 0 | User | Link | New Url |
|---|---|---|---|
| 0 | Ali-Sama | http://amzn.com/w/2ZCUBM2WM9JZQ | https://www.amazon.com/gp/registry/wishlist/2Z... |
| 1 | Rysona | http://amzn.com/w/2AOCVWCB8FPZF | https://www.amazon.com/gp/registry/wishlist/2A... |
| 2 | G0ATLY | http://smile.amazon.com/registry/wishlist/3RB6... | http://smile.amazon.com/registry/wishlist/3RB6... |
| 3 | dancemasterv | http://a.co/5xAfzvI | https://www.amazon.com/registry/wishlist/2JKVV... |
| 4 | chunkopunk | http://amzn.com/w/1Z4QEA0MGEXBU | https://www.amazon.com/gp/registry/wishlist/1Z... |


We now write a function that uses PRAW to get the subreddits that a redditor has commented in. To stay in touch with the redditor's current interests, we only look at their 500 most recent comments. Note that we use a set so that each subreddit appears only once; the subreddits are not in any order. We also need to normalize the result by removing the most popular subreddits as well as randomactsofamazon. This ensures that only the subreddits that reflect a user's distinctive tastes contribute to the match.

In [200]:
# Most popular subreddits (plus randomactsofamazon), which say little about individual taste
defaultsubs = {'announcements', 'funny', 'AskReddit', 'todayilearned', 'science', 'pics', 'IAmA', 'randomactsofamazon'}

def normalizesubs(usersubs):
    # Remove the popular subreddits from the user's set
    return usersubs - defaultsubs

def getsubs(username):
    subs = set()
    redditor = reddit.redditor(str(username))
    # Only look at the 500 most recent comments
    for comment in redditor.comments.new(limit=500):
        subs.add(str(comment.subreddit))
    # Normalize subs by removing the most popular subs
    return normalizesubs(subs)
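
The normalization step can be checked without calling the Reddit API. The sketch below repeats the set difference on a hand-written set of subreddit names. `example_subs` is made-up input, `drop_popular` is a hypothetical stand-in for the function above, and the `popular` set copies the list used above.

```python
# Same idea as the normalization above, on hand-written data (no API call).
popular = {'announcements', 'funny', 'AskReddit', 'todayilearned',
           'science', 'pics', 'IAmA', 'randomactsofamazon'}

def drop_popular(subs, popular=popular):
    """Return only the subreddits that are not in the popular set."""
    return set(subs) - popular

example_subs = {'AskReddit', 'funny', 'datascience', 'soccer'}
print(sorted(drop_popular(example_subs)))  # ['datascience', 'soccer']
```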

In order to give our user a good experience, we need to minimize loading times for our recommendations. The most time consuming part of this whole script is extracting all the comments from a user. We can minimize this time by pulling a list of subreddits that our database of redditors have commented in as of today (12/10/2017). While this will mean that our recommendation engine is not dynamic with the changing interests of our database redditors, it significantly cuts down on loading time. We can update this list of subreddits at regular intervals in the future. We will append this list of subreddits to our dataframe. 

In [165]:
# Takes a long time. Just import redditproducts.csv and the code in that cell
subslist = []
for index, rows in df.iterrows():
    user = rows['User']
    print('Trying for user number: ', index, ' username: ', user)
    try:
        subslist.append(getsubs(user))
    except Exception:
        # If there is an error, assume the user has deleted their reddit account.
        # Store 'Null' and continue with the loop
        subslist.append('Null')
        print('user not there')
df['Subs'] = subslist

Trying for user number:  0  username:  Ali-Sama
Trying for user number:  1  username:  Rysona
Trying for user number:  2  username:  G0ATLY
Trying for user number:  3  username:  dancemasterv
Trying for user number:  4  username:  chunkopunk
Trying for user number:  5  username:  Browntizzle
Trying for user number:  6  username:  rockinDS24
Trying for user number:  7  username:  rsgamg
Trying for user number:  8  username:  82364
Trying for user number:  9  username:  RyanOver9000
Trying for user number:  10  username:  L_Cranston_Shadow
Trying for user number:  11  username:  neuromorph
Trying for user number:  12  username:  havechanged
Trying for user number:  13  username:  TAPorter
Trying for user number:  14  username:  cj151695
Trying for user number:  15  username:  NibbleFish
Trying for user number:  16  username:  ninja_nicci
Trying for user number:  17  username:  ShiroiMana
Trying for user number:  18  username:  mitsimac
Trying for user number:  19  username:  FirstLadyOfB

Trying for user number:  159  username:  dnd1980
Trying for user number:  160  username:  TheRubyRedPirate
Trying for user number:  161  username:  SardonicKiller
Trying for user number:  162  username:  kramdiw
Trying for user number:  163  username:  wirette
Trying for user number:  164  username:  ThatGilbertKid
Trying for user number:  165  username:  zzddjj
Trying for user number:  166  username:  cinnabubbles
Trying for user number:  167  username:  digitalyss
user not there
Trying for user number:  168  username:  charlimonster
Trying for user number:  169  username:  RumpleAndBelle
Trying for user number:  170  username:  Aerys1
Trying for user number:  171  username:  SinnerOfAttention
Trying for user number:  172  username:  doublestop23
Trying for user number:  173  username:  LizziPizzo
Trying for user number:  174  username:  Thesmy
Trying for user number:  175  username:  Gedrean
Trying for user number:  176  username:  Fr_Time
Trying for user number:  177  username:  kle

Trying for user number:  314  username:  jenni5
Trying for user number:  315  username:  SatansUterus
Trying for user number:  316  username:  MrZissman
Trying for user number:  317  username:  swallowtails
Trying for user number:  318  username:  GreatCatch
Trying for user number:  319  username:  sarskatt
Trying for user number:  320  username:  FaeryLynne
Trying for user number:  321  username:  Kneecoleelaine
Trying for user number:  322  username:  whynot1991
Trying for user number:  323  username:  pinkfluffs
Trying for user number:  324  username:  Fnerb
Trying for user number:  325  username:  runslow
Trying for user number:  326  username:  LadyDarkKitten
Trying for user number:  327  username:  rubydoobiedooooo
Trying for user number:  328  username:  LadyOops
Trying for user number:  329  username:  codismycopilot
Trying for user number:  330  username:  EskimoPrincess
Trying for user number:  331  username:  crinnie
Trying for user number:  332  username:  Catainia
...

Trying for user number:  470  username:  AnguisetteAntha
Trying for user number:  471  username:  mskalak
Trying for user number:  472  username:  annaleiia
Trying for user number:  473  username:  Legion991
Trying for user number:  474  username:  lollialice
Trying for user number:  475  username:  Beelazyy
Trying for user number:  476  username:  duckingcluttered
Trying for user number:  477  username:  katier127
Trying for user number:  478  username:  Jonesxj
Trying for user number:  479  username:  bettyellen
Trying for user number:  480  username:  The_Ky_Guy
Trying for user number:  481  username:  slaytebastardes
user not there
Trying for user number:  482  username:  UnicornBomber
Trying for user number:  483  username:  maybeawolf
Trying for user number:  484  username:  DrayKitty1331
Trying for user number:  485  username:  acwellen
Trying for user number:  486  username:  Iappreciatecats
Trying for user number:  487  username:  clorissareed
...

Trying for user number:  624  username:  teenaamariee
Trying for user number:  625  username:  LoverOLife
Trying for user number:  626  username:  Ryanestrasz
Trying for user number:  627  username:  OisinS
Trying for user number:  628  username:  theonewiththetits
Trying for user number:  629  username:  Resurrected123
Trying for user number:  630  username:  cimbrag90
Trying for user number:  631  username:  Makovu
Trying for user number:  632  username:  Nex_renegade
Trying for user number:  633  username:  LauraMiggs
Trying for user number:  634  username:  happywalks
Trying for user number:  635  username:  LorraineRenee
Trying for user number:  636  username:  Sconniebuffalo
Trying for user number:  637  username:  Chunksmommy
Trying for user number:  638  username:  cxbu
Trying for user number:  639  username:  Porcelain_princess89
user not there
Trying for user number:  640  username:  MichiganCubbie
Trying for user number:  641  username:  maikeloco
...

Trying for user number:  777  username:  JacksBleedingColon
Trying for user number:  778  username:  AntisocialOatmeal
Trying for user number:  779  username:  termiteaward
user not there
Trying for user number:  780  username:  pumpkin-cat
Trying for user number:  781  username:  narcicide
Trying for user number:  782  username:  kukolsghost
Trying for user number:  783  username:  Elijahs-Wood
Trying for user number:  784  username:  lazzysunflowerr
Trying for user number:  785  username:  babyraspberry
Trying for user number:  786  username:  ElephantRLife
Trying for user number:  787  username:  Stuffferz
Trying for user number:  788  username:  jh6196
Trying for user number:  789  username:  dinosaurweasel
Trying for user number:  790  username:  girlwhow8d
Trying for user number:  791  username:  Sonylicious
Trying for user number:  792  username:  Harrywheeler_
Trying for user number:  793  username:  inputmethod
Trying for user number:  794  username:  chartreuselion
...

Trying for user number:  930  username:  autumnfalln
Trying for user number:  931  username:  cstuekey87
Trying for user number:  932  username:  Jenwith1N
Trying for user number:  933  username:  elpese
Trying for user number:  934  username:  Spiritimvu
Trying for user number:  935  username:  rosecrayons
Trying for user number:  936  username:  Blackmaille
Trying for user number:  937  username:  relish1922
Trying for user number:  938  username:  UMKcentersnare
Trying for user number:  939  username:  BQJJ
Trying for user number:  940  username:  pummelo4l
Trying for user number:  941  username:  author124
Trying for user number:  942  username:  TacticalJok3r
Trying for user number:  943  username:  killajay41889
Trying for user number:  944  username:  konamiko
Trying for user number:  945  username:  twiztdfred
Trying for user number:  946  username:  VeganMinecraft
Trying for user number:  947  username:  Dracoprimus
Trying for user number:  948  username:  ChiefMcClane
...

Trying for user number:  1083  username:  Kite_Moonwall
Trying for user number:  1084  username:  Savascha
Trying for user number:  1085  username:  verissey
Trying for user number:  1086  username:  The_Quantum_Moose
Trying for user number:  1087  username:  stefani187
Trying for user number:  1088  username:  abby080798
Trying for user number:  1089  username:  247naptime
Trying for user number:  1090  username:  transemacabre
Trying for user number:  1091  username:  Candicehxo
Trying for user number:  1092  username:  MissPerry
Trying for user number:  1093  username:  ninehundredways
Trying for user number:  1094  username:  anaesthetic
Trying for user number:  1095  username:  Kandydish
Trying for user number:  1096  username:  Chain-smokers
user not there
Trying for user number:  1097  username:  DaRoyalJester
Trying for user number:  1098  username:  ReapeRx124
Trying for user number:  1099  username:  Mrs_partyrocq
Trying for user number:  1100  username:  HuggingTheJellyfish


Trying for user number:  1234  username:  Psychovore
Trying for user number:  1235  username:  Squishee-Face
Trying for user number:  1236  username:  LucyGoosey5
Trying for user number:  1237  username:  Clquigs
Trying for user number:  1238  username:  Lord0fTheHarem
user not there
Trying for user number:  1239  username:  skekVex
Trying for user number:  1240  username:  MadgePadge
Trying for user number:  1241  username:  Travis100
Trying for user number:  1242  username:  ABNDT
Trying for user number:  1243  username:  hyrulerofcanada
Trying for user number:  1244  username:  iloverickmoranis
Trying for user number:  1245  username:  daijoubudayo
Trying for user number:  1246  username:  epion-viragos
Trying for user number:  1247  username:  melvinismad
Trying for user number:  1248  username:  minisixx
Trying for user number:  1249  username:  Tramiiepoo
Trying for user number:  1250  username:  idene
Trying for user number:  1251  username:  linzal87
...

If you are running the script, run the cell below instead to skip the long wait for the code above.

In [214]:
# If you are running this script, please just use this as the DataFrame
df = pd.read_csv('redditproducts.csv', index_col=0)
# The Subs column is read back from the CSV as text, so parse each entry into a set
import ast
newlist = []
for i in range(len(df)):
    try:
        print(i)
        newlist.append(ast.literal_eval(df['Subs'][i]))
        print(type(newlist[i]))
    except Exception:
        # 'Null' entries cannot be parsed; keep the placeholder
        print(i, 'Null Subs, user does not exist')
        newlist.append('Null')
df['Subs'] = newlist

0
<class 'set'>
1
<class 'set'>
2
<class 'set'>
3
<class 'set'>
4
<class 'set'>
5
<class 'set'>
6
<class 'set'>
7
<class 'set'>
8
<class 'set'>
9
<class 'set'>
10
<class 'set'>
11
<class 'set'>
12
<class 'set'>
13
<class 'set'>
14
<class 'set'>
15
<class 'set'>
16
<class 'set'>
17
<class 'set'>
18
<class 'set'>
19
<class 'set'>
20
<class 'set'>
21
<class 'set'>
22
<class 'set'>
23
<class 'set'>
24
<class 'set'>
25
<class 'set'>
26
<class 'set'>
27
<class 'set'>
28
<class 'set'>
29
<class 'set'>
30
<class 'set'>
31
<class 'set'>
32
<class 'set'>
33
<class 'set'>
34
34 Null Subs, user does not exist
35
<class 'set'>
36
<class 'set'>
37
<class 'set'>
38
<class 'set'>
39
<class 'set'>
40
<class 'set'>
41
<class 'set'>
42
<class 'set'>
43
<class 'set'>
44
<class 'set'>
45
<class 'set'>
46
<class 'set'>
47
<class 'set'>
48
<class 'set'>
49
<class 'set'>
50
<class 'set'>
51
<class 'set'>
52
<class 'set'>
53
<class 'set'>
54
54 Null Subs, user does not exist
55
55 Null Subs, user does not exist
...

652
<class 'set'>
653
<class 'set'>
654
<class 'set'>
655
<class 'set'>
656
<class 'set'>
657
<class 'set'>
658
<class 'set'>
659
<class 'set'>
660
<class 'set'>
661
<class 'set'>
662
<class 'set'>
663
<class 'set'>
664
<class 'set'>
665
<class 'set'>
666
<class 'set'>
667
<class 'set'>
668
<class 'set'>
669
<class 'set'>
670
<class 'set'>
671
<class 'set'>
672
672 Null Subs, user does not exist
673
<class 'set'>
674
<class 'set'>
675
<class 'set'>
676
<class 'set'>
677
<class 'set'>
678
<class 'set'>
679
<class 'set'>
680
<class 'set'>
681
<class 'set'>
682
<class 'set'>
683
<class 'set'>
684
<class 'set'>
685
<class 'set'>
686
<class 'set'>
687
<class 'set'>
688
<class 'set'>
689
<class 'set'>
690
<class 'set'>
691
<class 'set'>
692
<class 'set'>
693
<class 'set'>
694
<class 'set'>
695
<class 'set'>
696
696 Null Subs, user does not exist
697
<class 'set'>
698
<class 'set'>
699
<class 'set'>
700
<class 'set'>
701
<class 'set'>
702
<class 'set'>
703
<class 'set'>
704
<class 'set'>
705


1225
<class 'set'>
1226
<class 'set'>
1227
<class 'set'>
1228
<class 'set'>
1229
<class 'set'>
1230
<class 'set'>
1231
<class 'set'>
1232
<class 'set'>
1233
<class 'set'>
1234
<class 'set'>
1235
<class 'set'>
1236
<class 'set'>
1237
<class 'set'>
1238
1238 Null Subs, user does not exist
1239
<class 'set'>
1240
<class 'set'>
1241
<class 'set'>
1242
<class 'set'>
1243
<class 'set'>
1244
<class 'set'>
1245
<class 'set'>
1246
<class 'set'>
1247
<class 'set'>
1248
<class 'set'>
1249
<class 'set'>
1250
<class 'set'>
1251
<class 'set'>
1252
<class 'set'>
1253
<class 'set'>
1254
<class 'set'>
1255
<class 'set'>
1256
<class 'set'>
1257
<class 'set'>
1258
<class 'set'>
1259
<class 'set'>
1260
1260 Null Subs, user does not exist
1261
<class 'set'>
1262
<class 'set'>
1263
<class 'set'>
1264
<class 'set'>
1265
<class 'set'>
1266
<class 'set'>
1267
<class 'set'>
1268
<class 'set'>
1269
<class 'set'>
1270
<class 'set'>
1271
<class 'set'>
1272
<class 'set'>
1273
<class 'set'>
1274
<class 'set'>
1275
<class 'set'>
...
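
The parsing step above is needed because a CSV file stores every cell as text, so a Python set written to disk comes back as a string. Here is a small standard-library sketch of that round trip, using made-up rows rather than the real data. The `parse_subs` helper is a hypothetical name introduced for this example.

```python
import ast
import csv
import io

# Write one row whose second cell is a Python set (made-up example data).
buffer = io.StringIO()
csv.writer(buffer).writerow(['some_user', {'soccer', 'movies'}])

# Read it back: the set is now just its string representation.
buffer.seek(0)
user, subs_text = next(csv.reader(buffer))
print(type(subs_text))  # <class 'str'>

# ast.literal_eval safely turns the text back into a real set.
subs = ast.literal_eval(subs_text)
print(subs == {'soccer', 'movies'})  # True

def parse_subs(text):
    """Return the parsed set, or None for placeholders such as 'Null'."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, set) else None

print(parse_subs('Null'))  # None
```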

We will use Jaccard similarity to find the redditors in our database whose subreddits are most similar to the user's. The Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The function takes 4 arguments: the username, dbsubs, usersubs and the url. It returns a list containing the username of the potential match, the subreddits shared, the similarity score and the recommendation link. We also have to check that dbsubs is a set, since we stored a 'Null' string for users who did not exist when populating the list of subs.


In [201]:
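def jaccardsimilarity(username, dbsubs, usersubs, link):
    # Returns [username, shared subreddits, similarity score, link]
    temp = [username]
    if isinstance(dbsubs, set):
        intersection = dbsubs.intersection(usersubs)
        union = dbsubs.union(usersubs)
        # Guard against dividing by zero when both sets are empty
        score = len(intersection) / len(union) if union else 0
        temp.append(intersection)
        temp.append(score)
    else:
        # 'Null' placeholder: the account no longer exists
        temp.append('Not Active User')
        temp.append(0)
    # Always include the link so every result has the same 4 fields
    temp.append(link)
    return temp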
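
As a quick sanity check of the metric itself, the sketch below computes the Jaccard similarity for two small made-up sets. The helper `jaccard` is a hypothetical standalone function, not the one defined above, and it returns 0 when both sets are empty (an assumption, since the ratio is undefined in that case).

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

mine = {'soccer', 'movies', 'datascience', 'poker'}
theirs = {'soccer', 'movies', 'cooking'}

# Shared: soccer, movies (2). Union: 5 distinct subreddits. Score: 2 / 5.
print(jaccard(mine, theirs))  # 0.4
print(jaccard(set(), set()))  # 0.0
```

Because the union grows with every subreddit either user comments in, two very active redditors can share several subreddits and still score low, which is consistent with the small scores in the results further down.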

Now we put it all together. We check that it is a valid username. Then we iterate through our dataframe, running each row through our Jaccard similarity function and collecting the results in a temporary list. We then sort the list by Jaccard similarity score (in descending order) and keep only the top 5 results. Finally, we loop through the list and print our results.

In [218]:
from operator import itemgetter

def main(username):
    cesc = []
    try:
        usersubs = getsubs(username)
    except Exception:
        print("Not a valid username.")
        return
    # Score every redditor in the database against the target user
    for i in range(len(df)):
        try:
            cesc.append(jaccardsimilarity(df['User'][i], df['Subs'][i], usersubs, df['New Url'][i]))
        except Exception:
            # Skip rows that cannot be scored
            continue
    # Sort the list by the 3rd item, the Jaccard similarity score
    sortedlist = sorted(cesc, key=itemgetter(2), reverse=True)
    # Keep the top 5 results
    sortedlist = sortedlist[:5]
    print("Your reddit twins are:")

    for i in sortedlist:
        print('http://www.reddit.com/u/' + str(i[0]))
        print('Similarity Score:', i[2])
        print('The Subreddits you share are: ', i[1])
        print('And here is some stuff they like. Check it out you may like it as well! ', i[3])


In [219]:
# Just call main with your username as a string to run the recommender
username = 'abhi91'
main(username)

Your reddit twins are:
http://www.reddit.com/u/CelticMara
Similarity Score: 0.0759493670886076
The Subreddits you share are:  {'videos', 'The_Donald', 'politics', 'asoiaf', 'movies', 'gameofthrones'}
And here is some stuff they like. Check it out you may like it as well!  https://www.amazon.com/gp/registry/wishlist/3F7V5UDKFLX6G/ref=top&tag=reddittwin-20
http://www.reddit.com/u/hugehair
Similarity Score: 0.07317073170731707
The Subreddits you share are:  {'Overwatch', 'rickandmorty', 'videos', 'The_Donald', 'politics', 'Jokes'}
And here is some stuff they like. Check it out you may like it as well!  https://www.amazon.com/gp/registry/wishlist/2FFJNK4E8C1GX/ref=top&tag=reddittwin-20
http://www.reddit.com/u/skekVex
Similarity Score: 0.07142857142857142
The Subreddits you share are:  {'me_irl', 'Overwatch', 'hearthstone', 'videos', 'reactiongifs', 'movies'}
And here is some stuff they like. Check it out you may like it as well!  http://a.co/9ksbaaR
http://www.reddit.com/u/EmergencyPizza
...
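
The ranking step can also be tried offline. The sketch below scores a made-up three-user database against a made-up target and keeps the best two matches. It uses `heapq.nlargest` from the standard library as an alternative to sorting the whole list and slicing it. All usernames, sets and links here are invented for illustration.

```python
import heapq

# Invented stand-in for the dataframe: (user, subreddits, wishlist link).
database = [
    ('alice', {'soccer', 'movies', 'poker'}, 'https://example.com/a'),
    ('bob', {'cooking', 'gardening'}, 'https://example.com/b'),
    ('carol', {'soccer', 'datascience'}, 'https://example.com/c'),
]
target_subs = {'soccer', 'movies', 'datascience'}

def score(row):
    """Jaccard similarity between the target and one database row."""
    _, subs, _ = row
    union = subs | target_subs
    return len(subs & target_subs) / len(union) if union else 0.0

# Keep only the two highest-scoring rows, best first.
top = heapq.nlargest(2, database, key=score)
for user, subs, link in top:
    print(user, round(score((user, subs, link)), 2), link)
# carol 0.67 https://example.com/c
# alice 0.5 https://example.com/a
```

Carol ranks above Alice even though both share two subreddits with the target, because Carol's union with the target is smaller (3 subreddits versus 4).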

In [167]:
df.to_csv('redditproducts.csv')

# Limitations and Next Steps

I tested my tool with 10 of my friends and got some feedback. It was clear that the results were not in the best possible format: I am returning wishlists rather than individual products. Some of these wishlists contain mundane items like snacks and gift cards, which are poor recommendations for users. However, there are some interesting products, like board games and books, that my friends found intriguing. Of course, I want to test the tool on more than 10 people. I plan on publishing this tool online and using Reddit to generate traffic to it. I want to encourage people to tip their Reddit match or buy them something from their wishlist, as this would give them an incentive to submit their own Amazon wishlists and make my seed dataset richer. I hope that as more and more people use the tool and sign up, the product starts recommending cooler and cooler things for people to find. This cycle of generosity and altruism, married with finding novel products, would in my opinion be a great tool for redditors and would showcase that an efficient peer-to-peer recommendation engine can be built by sharing interests with each other rather than with a company.