# Amazon Recommendation System - Lab

## Introduction

Now that you've been introduced to collaborative filtering and recommendation systems, it's time to put your skills to the test and build a recommendation system for a real-world dataset! For this lab, you'll use a dataset of book reviews from the Amazon marketplace. While the previous lesson focused on user-based recommendation systems, here you'll apply a parallel process to build an item-based recommendation system that recommends similar books at the bottom of a product page.

## Objectives

In this lab you will: 

- Use graph-based similarity metrics to create a collaborative filtering recommender system

## Load the Dataset

In [1]:
import pandas as pd
import networkx as nx
G = nx.Graph()

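# Read both ASIN columns as strings so leading zeros (and a trailing 'X') are preserved
df = pd.read_csv('books_data.edgelist', names=['source', 'target', 'weight'],
                 delimiter=' ', dtype={'source': str, 'target': str})
df.head()

| | source | target | weight |
|---|---|---|---|
| 0 | 0827229534 | 0804215715 | 0.7 |
| 1 | 0827229534 | 156101074X | 0.5 |
| 2 | 0827229534 | 0687023955 | 0.8 |
| 3 | 0827229534 | 0687074231 | 0.8 |
| 4 | 0827229534 | 082721619X | 0.7 |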
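Because ASINs can start with a zero or end in a letter, it can help to treat them as strings from the start. The following is a minimal sketch, not the lab's reference solution: it reads three sample rows from an in-memory buffer (the real file is `'books_data.edgelist'`), with the source ASIN written with its leading zero as `0827229534`, and then loads them into a weighted `networkx` graph. The names `sample`, `edges`, and `ranked` are introduced here for illustration only.

```python
from io import StringIO

import networkx as nx
import pandas as pd

# Three sample rows in the same space-separated layout as the edge list.
sample = StringIO(
    "0827229534 0804215715 0.7\n"
    "0827229534 156101074X 0.5\n"
    "0827229534 0687023955 0.8\n"
)

# Reading the ID columns as strings keeps leading zeros and the trailing 'X'.
edges = pd.read_csv(
    sample,
    names=['source', 'target', 'weight'],
    delimiter=' ',
    dtype={'source': str, 'target': str},
)

# Build an undirected, weighted graph from the edge list.
G = nx.from_pandas_edgelist(edges, 'source', 'target', edge_attr='weight')

# Neighbors of one book, highest edge weight first.
book = '0827229534'
neighbors = sorted(G[book].items(), key=lambda kv: kv[1]['weight'], reverse=True)
ranked = [asin for asin, attrs in neighbors]
print(ranked)
```

Whether a higher weight means "more similar" depends on how the weights were computed; this sketch assumes it does, which matches the descending sort used later in the lab.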


## Load the Metadata 

Import the metadata from the file `'books_meta.txt'` (note that it is tab-separated, `'\t'`).

In [3]:
# Your code here
# Read ASIN as a string so leading zeros are preserved
meta_data = pd.read_csv('books_meta.txt', delimiter='\t', dtype={'ASIN': str})
meta_data

| | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0827229534 | Patterns of Preaching: A Sermon Sampler | clergi sermon subject religion preach spiritu ... | Book | 396585 | 2 | 5.0 | 8 | 0.80 |
| 1 | 2 | 0738700797 | Candlemas: Feast of Flames | subject witchcraft earth religion spiritu base... | Book | 168596 | 12 | 4.5 | 9 | 0.85 |
| 2 | 3 | 0486287785 | World War II Allied Fighter Planes Trading Cards | general hobbi subject craft home garden book | Book | 1270652 | 1 | 5.0 | 0 | 0.00 |
| 3 | 4 | 0842328327 | Life Application Bible Commentary: 1 and 2 Tim... | spiritu translat commentari christian book gui... | Book | 631289 | 1 | 4.0 | 6 | 0.79 |
| 4 | 5 | 1577943082 | Prayers That Avail Much for Business: Executive | subject religion spiritu busi christian live w... | Book | 455160 | 0 | 0.0 | 4 | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393556 | 548541 | 9700507734 | Para alcanzar el orgasmo | mind general subject health bodi book | Book | 0 | 1 | 4.0 | 0 | 0.00 |
| 393557 | 548542 | 9627762644 | Starting a Hedge Fund : A US Perspective | general subject busi book invest | Book | 0 | 3 | 2.5 | 0 | 0.00 |
| 393558 | 548543 | 0970020503 | Facts Every Injured Worker Should Know | general subject busi book law practic guid lab... | Book | 0 | 5 | 4.5 | 0 | 0.00 |
| 393559 | 548546 | 1930519206 | Adobe Photoshop 6 VTC Training CD | book com offic graphic subject photoshop inter... | Book | 0 | 2 | 5.0 | 0 | 0.00 |
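Since the edge list refers to books by ASIN while readers want titles, an ASIN-to-title lookup is handy later on. Here is a small sketch of one way to build it, using a toy frame with three rows from the metadata preview. The names `meta_sample`, `asin_to_title`, and `title_for` are hypothetical helpers, not part of the lab.

```python
import pandas as pd

# Toy stand-in for meta_data, limited to the two columns the lookup needs.
meta_sample = pd.DataFrame({
    'ASIN': ['0827229534', '0738700797', '0486287785'],
    'Title': [
        'Patterns of Preaching: A Sermon Sampler',
        'Candlemas: Feast of Flames',
        'World War II Allied Fighter Planes Trading Cards',
    ],
})

# One title per ASIN; drop_duplicates guards against repeated ASIN rows.
asin_to_title = meta_sample.drop_duplicates('ASIN').set_index('ASIN')['Title']


def title_for(asin):
    """Return the title for an ASIN, or None if it is not in the metadata."""
    return asin_to_title.get(asin)


print(title_for('0738700797'))
print(title_for('0000000000'))  # an ASIN that is not in the toy frame
```

Indexing by ASIN once avoids filtering the full metadata frame for every recommended book.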


## Select Books to Test Your Recommender On

Select a small subset of books that you are interested in generating recommendations for. 

In [17]:
# Your code here
# Keep rows whose title mentions 'Thrones'; na=False skips missing titles
df_select = meta_data[meta_data['Title'].str.contains('Thrones', na=False)]
for title in df_select['Title'].unique():
    temp = df_select[df_select['Title'] == title]
    print(temp[['ASIN', 'Title']])

              ASIN                                              Title
59750   0553103547  A Game of Thrones (A Song of Ice and Fire, Boo...
183820  0553573403  A Game of Thrones (A Song of Ice and Fire, Boo...
261763  0553381687  A Game of Thrones (A Song of Ice and Fire, Boo...
              ASIN                 Title
130560  1572701293  Thrones, Dominations
              ASIN                                         Title
331188  0312968302  Thrones, Dominations (A Lord Wimsey Mystery)


In [18]:
# Filter the DataFrame for rows containing 'Thrones' in the Title
df_select = meta_data[meta_data['Title'].str.contains('Thrones', na=False)]

# Group by Title and print ASIN and Title for each unique Title
for title, group in df_select.groupby('Title'):
    print(f"Title: {title}")
    print(group[['ASIN', 'Title']])
    print()  # Add a blank line for better readability

Title: A Game of Thrones (A Song of Ice and Fire, Book 1)
              ASIN                                              Title
59750   0553103547  A Game of Thrones (A Song of Ice and Fire, Boo...
183820  0553573403  A Game of Thrones (A Song of Ice and Fire, Boo...
261763  0553381687  A Game of Thrones (A Song of Ice and Fire, Boo...

Title: Thrones, Dominations
              ASIN                 Title
130560  1572701293  Thrones, Dominations

Title: Thrones, Dominations (A Lord Wimsey Mystery)
              ASIN                                         Title
331188  0312968302  Thrones, Dominations (A Lord Wimsey Mystery)
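The output shows that one title can map to several ASINs (for example, different editions). A possible way to capture that, sketched here on a toy frame built from the ASINs and titles printed above, is to collect the ASINs per title into a dictionary. The helper name `find_editions` and the case-insensitive, non-regex matching are choices made for this illustration.

```python
import pandas as pd

# Toy stand-in for meta_data, using the ASINs and titles shown in the output.
meta_sample = pd.DataFrame({
    'ASIN': ['0553103547', '0553573403', '0553381687', '1572701293', '0312968302'],
    'Title': [
        'A Game of Thrones (A Song of Ice and Fire, Book 1)',
        'A Game of Thrones (A Song of Ice and Fire, Book 1)',
        'A Game of Thrones (A Song of Ice and Fire, Book 1)',
        'Thrones, Dominations',
        'Thrones, Dominations (A Lord Wimsey Mystery)',
    ],
})


def find_editions(meta, keyword):
    """Map each title containing `keyword` to the list of ASINs that carry it."""
    # regex=False treats the keyword literally; na=False skips missing titles.
    mask = meta['Title'].str.contains(keyword, case=False, na=False, regex=False)
    return meta[mask].groupby('Title')['ASIN'].apply(list).to_dict()


editions = find_editions(meta_sample, 'thrones')
for title, asins in editions.items():
    print(title, asins)
```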



## Generate Recommendations for a Few Books of Choice

The `'books_data.edgelist'` file conveniently contains a precomputed weight for each pair of linked items, so you do not need to calculate item-to-item distances yourself. Given this preprocessed data, it's time to employ collaborative filtering to generate recommendations! Generate the top 10 recommendations for each book in the subset you chose. Be sure to print the title of the book you are generating recommendations for, as well as the titles of the books being recommended.

In [25]:
def recommend(book, n=10):
    """Print the titles of the n highest-weight neighbors of `book`."""
    # Look up the ASIN of the selected book (the first match if there are several)
    matches = df_select.loc[df_select['Title'] == book, 'ASIN']
    if matches.empty:
        print(f"The book '{book}' was not found in the dataset.")
        return
    asin = matches.iloc[0]

    # Keep only the edges whose source is this ASIN
    filtered_df = df[df['source'] == asin]
    if filtered_df.empty:
        print(f"No recommendations found for the book: '{book}'")
        return

    print(f"Recommendations for source '{book}':")

    # Sort by 'weight' (highest first) and take the top n 'target' ASINs
    top_targets = filtered_df.sort_values(by='weight', ascending=False).head(n)['target']

    # Fetch and print the corresponding titles from meta_data
    for target_asin in top_targets:
        title = meta_data.loc[meta_data['ASIN'] == target_asin, 'Title']
        if not title.empty:
            print(f" - {title.iloc[0]}")
        else:
            print(f" - Title not found for ASIN: {target_asin}")
    print()  # Add a blank line for better readability

In [27]:
# Example usage
recommend('Thrones, Dominations (A Lord Wimsey Mystery)')

Recommendations for source 'Thrones, Dominations (A Lord Wimsey Mystery)':
 - Busman's Honeymoon
 - A Presumption of Death (Mystery Masters Series)
 - A Presumption of Death (Mystery Masters Series)
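The output above lists the same title twice, which can happen when two target ASINs share a title. One possible refinement, sketched below under stated assumptions, pools the edges of every edition of the chosen title, joins the titles in with a merge, and keeps the best weight per recommended title. All IDs, titles, and weights in this example are invented for illustration, and `top_recommendations` is a hypothetical helper rather than part of the lab.

```python
import pandas as pd

# Hypothetical toy data: every ID, title, and weight here is made up.
edges = pd.DataFrame({
    'source': ['S1', 'S1', 'S1', 'S2', 'S2'],
    'target': ['T1', 'T2', 'T3', 'T1', 'T4'],
    'weight': [0.9, 0.8, 0.6, 0.7, 0.5],
})
meta = pd.DataFrame({
    'ASIN': ['S1', 'S2', 'T1', 'T2', 'T3', 'T4'],
    'Title': ['Book S', 'Book S', 'Book A', 'Book B', 'Book B', 'Book C'],
})


def top_recommendations(title, edges, meta, n=10):
    """Return up to n distinct recommended titles across all editions of `title`."""
    asins = meta.loc[meta['Title'] == title, 'ASIN']
    # Pool the outgoing edges of all editions that share the title.
    pooled = edges[edges['source'].isin(asins)]
    # Attach titles to target ASINs; targets missing from meta are dropped.
    named = pooled.merge(meta, left_on='target', right_on='ASIN', how='inner')
    # Do not recommend another edition of the same book.
    named = named[named['Title'] != title]
    # Keep the highest weight seen for each recommended title.
    best = named.groupby('Title', as_index=False)['weight'].max()
    return best.sort_values('weight', ascending=False).head(n).reset_index(drop=True)


recs = top_recommendations('Book S', edges, meta)
print(recs)
```

Returning a DataFrame instead of printing makes the result easier to inspect or reuse, and the descending sort assumes, as the lab's own solution does, that a higher weight indicates a closer match.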



## Summary

Well done! In this lab, you built an item-based recommendation system for a real-world dataset!