# 02807 Final project: Recommendation system
Recommendation system for products in the __Digital Music__ category on __Amazon__. Products are suggested based on a short description entered by the user.
[**Data source**](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/)

In [32]:
# Imports
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"  # workaround for duplicate OpenMP runtime errors
import json
import gzip
import spacy
import warnings
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import DBSCAN, KMeans
from scipy import sparse
from hdbscan import HDBSCAN
from collections import Counter, defaultdict
from lxml import html, etree
from nrclex import NRCLex
from transformers import AutoTokenizer, AutoModelWithLMHead
from preprocess_data import *

# Load the data 

In [33]:
# Download dataset if it is not downloaded yet
if not os.path.exists('Dataset/meta_Digital_Music.json.gz'):
    !wget https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/metaFiles2/meta_Digital_Music.json.gz -P ./Dataset
else:
    print('Dataset already downloaded.')

Dataset already downloaded.


__Data format__
   * `asin`: ID of the product, e.g. 0000031852
   * `title`: name of the product
   * `feature`: bullet-point format features of the product
   * `description`: description of the product
   * `price`: price in US dollars (at time of crawl)
   * `imageURL`: url of the product image
   * `imageURLHighRes`: url of the high resolution product image
   * `related`: related products (also bought, also viewed, bought together, buy after viewing)
   * `salesRank`: sales rank information
   * `brand`: brand name
   * `categories`: list of categories the product belongs to
   * `tech1`: the first technical detail table of the product
   * `tech2`: the second technical detail table of the product
   * `similar`: similar product table

_Note that several attributes are usually left blank for each product (which attributes are missing differs from product to product)._


In [34]:
### Load the meta data
data = []
with gzip.open('Dataset/meta_Digital_Music.json.gz') as f:
    for l in f:
        data.append(json.loads(l.strip()))
    
# Total length of list, this number equals total number of products
print("Total number of items in the dataset: ", len(data))

Total number of items in the dataset:  74347


In [35]:
# convert list into pandas dataframe
df = pd.DataFrame.from_dict(data)

# set size of display in pandas
pd.set_option('display.max_colwidth', 300)
pd.set_option('display.max_rows', 20 )

# column names of the dataframe
print("Columns of the dataset: ", df.columns)

# show dataframe with columns and rows
# df.head()
# df2.info()

Columns of the dataset:  Index(['category', 'tech1', 'description', 'fit', 'title', 'also_buy', 'tech2',
       'brand', 'feature', 'rank', 'also_view', 'main_cat', 'similar_item',
       'date', 'price', 'asin', 'imageURL', 'imageURLHighRes', 'details'],
      dtype='object')


# Data processing

- Remove empty descriptions
- Remove HTML tags
- Remove URLs
- Remove HTML hidden characters
- Remove punctuation
- Remove numbers
- Transform every word into lowercase
- Remove stop words
- Perform stemming 
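
The notebook imports `preprocess_data` from a separate module that is not shown here. As a rough illustration of the steps listed above, the following is a minimal sketch using only the standard library. The function name `preprocess_sketch`, the tiny `STOP_WORDS` set and the `naive_stem` helper are assumptions made for this example; the real module presumably relies on NLTK's stop-word list and `PorterStemmer`, which produce different stems.

```python
import html
import re
import string

# Tiny illustrative stop-word set (assumption); a full list such as
# NLTK's English stop words would be used in practice.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "on"}


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_sketch(text):
    """Hypothetical cleaning pipeline mirroring the steps listed above."""
    text = re.sub(r"<[^>]+>", " ", text)                # remove HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # remove URLs
    text = html.unescape(text)                          # decode entities such as &amp;
    # replace every punctuation character with a space
    text = text.translate(
        str.maketrans(string.punctuation, " " * len(string.punctuation))
    )
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    tokens = text.lower().split()                       # lowercase and tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS] # remove stop words
    return " ".join(naive_stem(t) for t in tokens)      # stem and re-join


print(preprocess_sketch("<p>Visit http://example.com for 10 Great Songs!</p>"))
# → visit great song
```

Empty descriptions are not handled inside this function; the notebook filters them out on the dataframe before cleaning.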

In [36]:
# Drop rows with no description (description is an empty list)
# .copy() avoids SettingWithCopyWarning when the column is reassigned below
df = df[df['description'].map(len) > 0].copy()
df.description
# df2.head()

4        [1. Losing Game 2. I Can't Wait 3. Didn't He Shine 4. Never Seen...Righteous... 5. A Broken Heart 6. Looking Back 7. Here We Are 8. I Saw The Lord 9. Jesus Is A River Of Love 10. Hittin' The Road 11. I've Never Been Out Of... 12. Jesus Gotta Hold Of My Life 13. Saved- Saved- Saved 14. What Will ...
9                                                                                                                                                                                                                                                                                                                [.]
10       [The Music Connection by Silver Burdett Ginn is a teaching aid for  \nan elementary music or a homeroom teacher. Created by authorities  \nin Music, The Music Connection: by Silver Burdett provides an  \nexcellent foundation for Music studies. Silver Burdetts style is  \nsuited towards Music stu...
12                                                                       

In [37]:
# Each description is a list of strings: drop the empty strings and join the rest into a single string
df.description = df.description.apply(lambda x: [string for string in x if string != ""])
df.description = df.description.apply(lambda x: " ".join(x))
df.iloc[0].description


"1. Losing Game 2. I Can't Wait 3. Didn't He Shine 4. Never Seen...Righteous... 5. A Broken Heart 6. Looking Back 7. Here We Are 8. I Saw The Lord 9. Jesus Is A River Of Love 10. Hittin' The Road 11. I've Never Been Out Of... 12. Jesus Gotta Hold Of My Life 13. Saved- Saved- Saved 14. What Will You Do? 15. Rise Again"

In [38]:
# f = open("descriptionHTMLbefore.txt", "w")
# for i in range(5000):
#     f.write(df2.iloc[i].description)
# f.close()
df_similarity_scores = df.copy()

print("Example of description before preprocessing: ")
print(df.description.iloc[0:2])
df.description = df.description.apply(lambda x: preprocess_data(x))
print()
print("Example of description after preprocessing: ")
print(df.description.iloc[0:2])

# f = open("descriptionHTMLafter.txt", "w")
# for i in range(5000):
#     f.write(df2.iloc[i].description)
# f.close()



Example of description before preprocessing: 
4    1. Losing Game 2. I Can't Wait 3. Didn't He Shine 4. Never Seen...Righteous... 5. A Broken Heart 6. Looking Back 7. Here We Are 8. I Saw The Lord 9. Jesus Is A River Of Love 10. Hittin' The Road 11. I've Never Been Out Of... 12. Jesus Gotta Hold Of My Life 13. Saved- Saved- Saved 14. What Will Y...
9                                                                                                                                                                                                                                                                                                              .
Name: description, dtype: object

Example of description after preprocessing: 
4    lose game wait shine never seen righteou broken heart look back saw lord jesu river love hittin road never jesu got ta hold life save save save rise
9                                                                                                              

## Does any product contain different descriptions?  
Some products appear more than once in the dataset, with both the `asin` and the description duplicated.
We drop rows with duplicate descriptions so that each remaining product is unique.

In [39]:
df_asin_description = df[["asin","description"]].copy()
df_asin_description.drop_duplicates(subset = "description", inplace=True)
# print(len(df_asin_description))
df_asin_description

Unnamed: 0,asin,description
4,0001526146,lose game wait shine never seen righteou broken heart look back saw lord jesu river love hittin road never jesu got ta hold life save save save rise
9,0159024684,
10,0382262921,music connect silver burdett ginn teach aid elementari music homeroom teacher creat author music music connect silver burdett provid excel foundat music studi silver burdett style suit toward music studi teach student materi clearli without overcompl subject contain varieti record vocal track pe...
12,0545069882,spanish know gold edit learn spanish flash
13,0545109620,cd book long sinc vanish great condit classic
...,...,...
74336,B01HG2DW1I,track list butter ball zaq attack zona walk like guv sentiment pacif daylight trombon institut technolog san jose fog citi show crb trombon giant
74338,B01HH5R7LK,coldplay head full dream tour live etihad stadium manchest england june th cd intro head full dream yellow everi teardrop waterfal scientist bird paradis everglow lover japan magic clock midnight charli brown hymn weekend fix hero viva la vida cd adventur lifetim kaleidoscop troubl see soon amaz...
74339,B01HH68B96,known live version that way life goe steam blacktop witha demo version superfici love sang hughi instead chri hick
74342,B01HH7D5KU,free last southsid never gon lose purpl come southsid diamond africa southsid southsid compadr southsid march mad tarentino trap nigga southsid da fam da gram skit southsid night southsid total length


# Sentiment Analysis

In [40]:
df_process = df_asin_description
df_process.head()

Unnamed: 0,asin,description
4,1526146,lose game wait shine never seen righteou broken heart look back saw lord jesu river love hittin road never jesu got ta hold life save save save rise
9,159024684,
10,382262921,music connect silver burdett ginn teach aid elementari music homeroom teacher creat author music music connect silver burdett provid excel foundat music studi silver burdett style suit toward music studi teach student materi clearli without overcompl subject contain varieti record vocal track pe...
12,545069882,spanish know gold edit learn spanish flash
13,545109620,cd book long sinc vanish great condit classic


In [41]:
# Suppressing warning about old version of spacy
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Applying Spacy affect model emotions
    nlp_affect = spacy.load('Spacy-Affect-Model/affect_ner')

    
df_process['emotion_spacy'] = df_process.description.apply(lambda x: Counter([item.label_.lower() for item in nlp_affect(x).ents]))

In [42]:
# Applying NRCLex emotions
#df['emotion_nrc'] = df.description.apply(lambda x: NRCLex(x).raw_emotion_scores) 

In [43]:

# Extracting most significant emotion of a particular description
def get_most_significant_emotion(emotions):
    try:
        sign_emotion = max(emotions, key=emotions.get)
    except ValueError:
        sign_emotion = None
    return sign_emotion

#df['most_significant_emotion_nrc'] = df.emotion_nrc.apply(lambda x: get_most_significant_emotion(x))
df_process['most_significant_emotion_spacy'] = df_process.emotion_spacy.apply(lambda x: get_most_significant_emotion(x))

df_process.head(100)
save_csv = True
if save_csv:
    df_process.to_csv('digital_music.csv')



In [44]:

# Determine the user's dominant emotion from the text in input.txt
with open("input.txt", "r") as file_input:
    text = file_input.read()
nlp_affect = spacy.load('Spacy-Affect-Model/affect_ner')

def measure_affect_score(sentence : str, nlp_affect):
    affect_percent = {'fear': 0.0, 'anger': 0.0, 'anticipation': 0.0, 'trust': 0.0, 'surprise': 0.0, 'positive': 0.0,
                      'negative': 0.0, 'sadness': 0.0, 'disgust': 0.0, 'joy': 0.0}
    emotions = []
    doc = nlp_affect(sentence)
    if len(doc.ents) != 0:
        for ent in doc.ents:
            emotions.append(ent.label_.lower())
        affect_counts = Counter()
        for emotion in emotions:
            affect_counts[emotion] += 1
        sum_values = sum(affect_counts.values())
        for key in affect_counts.keys():
            affect_percent.update({key: float(affect_counts[key]) / float(sum_values)})
    return affect_percent

user_emotion_scores = measure_affect_score(text,nlp_affect)
max_emotion = max(user_emotion_scores, key=user_emotion_scores.get)
user_emotion = max_emotion

print(user_emotion)



positive
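
The normalization step in the cell above can be illustrated without the spaCy model. The sketch below takes a plain list of emotion labels (a hypothetical stand-in for the entity labels returned by the affect model) and computes each emotion's share. One difference is intentional: when no emotion is detected, the cell above leaves every score at 0.0, so `max` returns the first key (`'fear'`), whereas this sketch returns `None`. The function names are assumptions for illustration.

```python
from collections import Counter


def emotion_shares(labels):
    """Turn a list of emotion labels into relative frequencies that sum to 1."""
    counts = Counter(label.lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}


def dominant_emotion(labels):
    """Return the most frequent label, or None when no emotion was detected."""
    shares = emotion_shares(labels)
    return max(shares, key=shares.get) if shares else None


labels = ["joy", "positive", "positive", "trust"]  # hypothetical entity labels
print(emotion_shares(labels))    # → {'joy': 0.25, 'positive': 0.5, 'trust': 0.25}
print(dominant_emotion(labels))  # → positive
print(dominant_emotion([]))      # → None
```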


In [45]:
# Find all items whose dominant emotion matches the user's emotion
import pandas as pd

# read the file with the emotions of all items
df_emotion = pd.read_csv('digital_music.csv')
# keep only the rows where most_significant_emotion_spacy == user_emotion
filtered_df = df_emotion[df_emotion['most_significant_emotion_spacy'] == user_emotion]

# save the filtered rows to a new file
filtered_df.to_csv('grouped_emotion.csv', index=False)



In [56]:
df_emotion.columns

Index(['Unnamed: 0', 'asin', 'description', 'emotion_spacy',
       'most_significant_emotion_spacy'],
      dtype='object')

# Similar Items System
Program that reads the dataset, preprocesses the data and outputs the items most similar to a product description entered by the user.

In [46]:
import json
from collections import defaultdict
import gzip
import pandas as pd
from lxml import html,etree
import numpy as np
import ipywidgets as widgets
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import nltk
from nltk.stem import PorterStemmer
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
import os
from preprocess_data import preprocess_data


# set stopwords vocabulary
nltk.download('stopwords')

# set tokenizer
nltk.download('punkt')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/ariannabianchi/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/ariannabianchi/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [58]:
df_asin_description = df_emotion[["asin","description"]].copy()
df_asin_description.drop_duplicates(subset = "description", inplace=True)
df_asin_description.dropna(subset=['description'], inplace=True)
# print(len(df_asin_description))
df_asin_description

Unnamed: 0,asin,description
0,0001526146,lose game wait shine never seen righteou broken heart look back saw lord jesu river love hittin road never jesu got ta hold life save save save rise
2,0382262921,music connect silver burdett ginn teach aid elementari music homeroom teacher creat author music music connect silver burdett provid excel foundat music studi silver burdett style suit toward music studi teach student materi clearli without overcompl subject contain varieti record vocal track pe...
3,0545069882,spanish know gold edit learn spanish flash
4,0545109620,cd book long sinc vanish great condit classic
5,0545352886,entertain bright joyou romp fill play laughter learn set audio cd wonder way read along belov stori
...,...,...
26851,B01HG2DW1I,track list butter ball zaq attack zona walk like guv sentiment pacif daylight trombon institut technolog san jose fog citi show crb trombon giant
26852,B01HH5R7LK,coldplay head full dream tour live etihad stadium manchest england june th cd intro head full dream yellow everi teardrop waterfal scientist bird paradis everglow lover japan magic clock midnight charli brown hymn weekend fix hero viva la vida cd adventur lifetim kaleidoscop troubl see soon amaz...
26853,B01HH68B96,known live version that way life goe steam blacktop witha demo version superfici love sang hughi instead chri hick
26854,B01HH7D5KU,free last southsid never gon lose purpl come southsid diamond africa southsid southsid compadr southsid march mad tarentino trap nigga southsid da fam da gram skit southsid night southsid total length


In [59]:
df_asin_description.head()

Unnamed: 0,asin,description
0,1526146,lose game wait shine never seen righteou broken heart look back saw lord jesu river love hittin road never jesu got ta hold life save save save rise
2,382262921,music connect silver burdett ginn teach aid elementari music homeroom teacher creat author music music connect silver burdett provid excel foundat music studi silver burdett style suit toward music studi teach student materi clearli without overcompl subject contain varieti record vocal track pe...
3,545069882,spanish know gold edit learn spanish flash
4,545109620,cd book long sinc vanish great condit classic
5,545352886,entertain bright joyou romp fill play laughter learn set audio cd wonder way read along belov stori


## Creating shingles

In [60]:
# Given a string input, return the list of unique q-shingles
def shingle(s, q, delimiter=' '):
    # Non-string input (e.g. NaN from a missing description) has no shingles
    if not isinstance(s, str):
        return []
    # Split into words, or keep the raw string for character shingles
    words_list = s.split(delimiter) if delimiter != '' else s
    all_shingles = set()
    for i in range(len(words_list) - q + 1):
        all_shingles.add(delimiter.join(words_list[i:i+q]))
    return list(all_shingles)
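
As a small worked example of word shingling, the sketch below builds the 3-word shingles of one of the preprocessed descriptions shown earlier. The helper `word_shingles` is an illustrative variant, not the notebook's function: it splits on any whitespace and returns a set.

```python
def word_shingles(text, q):
    """Return the set of q-word shingles of a whitespace-separated string."""
    words = text.split()
    # A text with fewer than q words yields an empty range, hence no shingles
    return {" ".join(words[i:i + q]) for i in range(len(words) - q + 1)}


print(sorted(word_shingles("cd book long sinc vanish", 3)))
# → ['book long sinc', 'cd book long', 'long sinc vanish']
print(word_shingles("great classic", 3))
# → set()
```

Descriptions shorter than `q` words produce no shingles at all, so they can never match a query.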

In [62]:
# Apply shingles to the df_asin_description
df_asin_description["shingles"] = df_asin_description["description"].apply(lambda x: shingle(x, 3))
# aaa = df_asin_description["description"].apply(lambda x: shingle(x, 3))
# df_asin_description

### Similarity of sets
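Computing the Jaccard similarity between the shingle set of the user description and the shingle set of each product: the size of the intersection divided by the size of the union.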

In [63]:
# Takes an intersection set and a union set and returns the Jaccard similarity
def similarity(intersection_set, union_set):
    # An empty union means both shingle sets are empty; define the similarity as 0
    if not union_set:
        return 0.0
    return len(intersection_set) / len(union_set)
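
To make the measure concrete, here is a minimal sketch that computes the Jaccard similarity directly from two shingle sets. The function name `jaccard` and the two example sets are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


user = {"cd book long", "book long sinc", "long sinc vanish"}
item = {"book long sinc", "long sinc vanish", "sinc vanish great", "vanish great condit"}

# 2 shared shingles out of 5 distinct shingles in total
print(jaccard(user, item))  # → 0.4
```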

In [64]:
# input = "In the dynamic landscape of higher education, universities are continually redefining the traditional boundaries of learning. The integration of arts, music, and literature has become a cornerstone in fostering a holistic educational experience. At the heart of this transformation is the commitment to connect students with a diverse range of disciplines, preparing them not only for academic success but also for a life enriched by creativity and cultural understanding. In this context, universities such as New School are pioneering integrated learning models that transcend conventional subject silos. Their innovative approach, backed by cutting-edge teaching methodologies, empowers students to explore the intersections of arts, music, and literature. The vision goes beyond a mere confluence of disciplines; it seeks to create an immersive educational environment where students can seamlessly weave their academic pursuits into the fabric of their daily lives. One key player in this educational evolution is McGraw, a renowned arts author whose work has become a guiding light for both educators and students alike. McGraw's contributions extend beyond the conventional boundaries of a university classroom, resonating with a global audience. His writings not only inspire a love for the arts but also emphasize the transformative power of integrated learning in shaping well-rounded individuals. The concept of an integrated learning environment transcends the boundaries of time and space. It is not confined to the four walls of a classroom; rather, it permeates every facet of a student's journey. In this dynamic world, students are no longer passive recipients of knowledge but active participants in a vibrant community of learners. The university becomes a nexus where diverse ideas converge, fostering a collaborative spirit that extends far beyond graduation. In this interconnected world, the New School's commitment to integrated learning is a beacon of innovation. 
# Students are not just acquiring knowledge; they are forging connections between seemingly disparate fields, discovering the harmonies between arts and sciences, and navigating the rhythms of a multicultural world. This transformative journey prepares them to navigate the complexities of the modern world with a deep appreciation for diversity and a keen sense of intellectual curiosity. As we stand at the intersection of arts, music, and literature, the integrated learning paradigm championed by universities like New School, guided by visionary authors such as McGraw, is shaping the future of education. It is a testament to the idea that learning is not a compartmentalized experience but a symphony of knowledge, where every note, every discipline, plays a crucial role in the harmonious melody of life."

with open("input.txt", "r") as file_input:
    user_text = file_input.read()  # avoid shadowing the built-in input()
# print(user_text)
user_description = preprocess_data(user_text)
user_description = shingle(user_description, 3)
# intersection_set = set(user_description).intersection(set(df_asin_description.shingles.iloc[0]))
# union_set = set(user_description).union(set(df_asin_description.shingles.iloc[0]))
# # perform similarity
# sim = similarity(intersection_set, union_set)
# print(sim)


In [65]:
# df_asin_description
# Build the user's shingle set once instead of once per row
user_shingles = set(user_description)
df_asin_description["similarity"] = df_asin_description["shingles"].apply(
    lambda x: similarity(user_shingles & set(x), user_shingles | set(x))
)
df_asin_description


Unnamed: 0,asin,description,shingles,similarity
0,0001526146,lose game wait shine never seen righteou broken heart look back saw lord jesu river love hittin road never jesu got ta hold life save save save rise,"[look back saw, never jesu got, river love hittin, never seen righteou, righteou broken heart, jesu river love, back saw lord, love hittin road, heart look back, lord jesu river, broken heart look, seen righteou broken, life save save, game wait shine, save save rise, ta hold life, road never je...",0.0
2,0382262921,music connect silver burdett ginn teach aid elementari music homeroom teacher creat author music music connect silver burdett provid excel foundat music studi silver burdett style suit toward music studi teach student materi clearli without overcompl subject contain varieti record vocal track pe...,"[burdett style suit, record vocal track, pick track danc, suit toward music, creat author music, studi teach student, ginn teach aid, homeroom teacher creat, teach aid elementari, excel foundat music, foundat music studi, burdett ginn teach, contain varieti record, perform track pick, author mus...",0.0
3,0545069882,spanish know gold edit learn spanish flash,"[know gold edit, edit learn spanish, spanish know gold, learn spanish flash, gold edit learn]",0.0
4,0545109620,cd book long sinc vanish great condit classic,"[sinc vanish great, long sinc vanish, cd book long, great condit classic, book long sinc, vanish great condit]",0.0
5,0545352886,entertain bright joyou romp fill play laughter learn set audio cd wonder way read along belov stori,"[entertain bright joyou, way read along, play laughter learn, joyou romp fill, audio cd wonder, fill play laughter, romp fill play, learn set audio, set audio cd, cd wonder way, laughter learn set, along belov stori, read along belov, bright joyou romp, wonder way read]",0.0
...,...,...,...,...
26851,B01HG2DW1I,track list butter ball zaq attack zona walk like guv sentiment pacif daylight trombon institut technolog san jose fog citi show crb trombon giant,"[guv sentiment pacif, san jose fog, show crb trombon, walk like guv, track list butter, fog citi show, technolog san jose, jose fog citi, citi show crb, like guv sentiment, crb trombon giant, zaq attack zona, pacif daylight trombon, attack zona walk, institut technolog san, trombon institut tech...",0.0
26852,B01HH5R7LK,coldplay head full dream tour live etihad stadium manchest england june th cd intro head full dream yellow everi teardrop waterfal scientist bird paradis everglow lover japan magic clock midnight charli brown hymn weekend fix hero viva la vida cd adventur lifetim kaleidoscop troubl see soon amaz...,"[manchest england june, lover japan magic, day sky full, nme award viva, cd intro head, weekend fix hero, teardrop waterfal scientist, viva la vida, midnight charli brown, paradis everglow lover, head full dream, live etihad stadium, troubl see soon, vida charli brown, magic clock midnight, etih...",0.0
26853,B01HH68B96,known live version that way life goe steam blacktop witha demo version superfici love sang hughi instead chri hick,"[that way life, love sang hughi, demo version superfici, steam blacktop witha, superfici love sang, sang hughi instead, version superfici love, way life goe, instead chri hick, goe steam blacktop, blacktop witha demo, known live version, witha demo version, life goe steam, hughi instead chri, ve...",0.0
26854,B01HH7D5KU,free last southsid never gon lose purpl come southsid diamond africa southsid southsid compadr southsid march mad tarentino trap nigga southsid da fam da gram skit southsid night southsid total length,"[mad tarentino trap, gram skit southsid, march mad tarentino, come southsid diamond, gon lose purpl, night southsid total, purpl come southsid, trap nigga southsid, lose purpl come, southsid total length, diamond africa southsid, da fam da, tarentino trap nigga, free last southsid, southsid diam...",0.0


Dataframe sorted by similarity

In [66]:

df_asin_description.sort_values(by="similarity", ascending=False, inplace=True)
df_asin_description


# if os.path.exists("10RecommendedItems.csv"):
#   os.remove("10RecommendedItems.csv")
# df_asin_description.head(10).to_csv('10RecommendedItems.csv', index=False)

Unnamed: 0,asin,description,shingles,similarity
3471,B00032N1V0,grand daddi tap cd short repeat piano orchestr select design meet need teach tap techniqu build routin note contain technic exercis tap fundament terminolog grade routin doubl length cd note avail,"[teach tap techniqu, techniqu build routin, build routin note, grade routin doubl, tap cd short, terminolog grade routin, piano orchestr select, routin note contain, orchestr select design, repeat piano orchestr, select design meet, cd short repeat, contain technic exercis, exercis tap fundament...",0.010000
24162,B00M8B98SO,music companion book best sell author max lucado collect wonder backdrop prayer person devot offer song chosen perfectli complement book also includ new song darlen zschech paul rita baloch written specif project,"[offer song chosen, song chosen perfectli, also includ new, darlen zschech paul, paul rita baloch, zschech paul rita, devot offer song, book also includ, collect wonder backdrop, companion book best, sell author max, rita baloch written, complement book also, baloch written specif, includ new so...",0.009901
26,0829736522,combin style rock ska zona releas new product song fill posit messag theme hope motiv touch live young peopl life chang way vision goal group album also includ new version neblina one hit previou product,"[goal group album, hit previou product, life chang way, also includ new, combin style rock, album also includ, group album also, motiv touch live, ska zona releas, way vision goal, includ new version, style rock ska, live young peopl, touch live young, theme hope motiv, rock ska zona, messag the...",0.009615
21741,B00B18ULGS,livetun song collect year featur hatsun miku come also includ new song subject chang come bonu dvd unreleas music clip previou song four repres clip subject chang edit avail may dvd disc encod region japan europ middl east subtitl includ,"[europ middl east, music clip previou, also includ new, may dvd disc, east subtitl includ, year featur hatsun, song four repres, dvd unreleas music, repres clip subject, come bonu dvd, subject chang come, japan europ middl, collect year featur, song collect year, disc encod region, miku come als...",0.009174
20104,B006BAVVFQ,meet alzheim companionship journey inform inspir holist introduct alzheim diseas uniqu audio resourc design meet need busi caregiv provid critic inform engag access format assimil distil mani book resourc meet alzheim compassion bring forth critic point care someon dementia goal reduc caregiv st...,"[qualiti life caregiv, metaphor art sensit, poetri myth metaphor, dementia goal reduc, teresa avila meet, thing think much, need busi caregiv, caregiv symptom alzheim, caregiv stress improv, someon dementia goal, resourc design meet, meet alzheim companionship, care someon dementia, alzheim comp...",0.006329
...,...,...,...,...
8951,B000RC4W32,black tulip record present royalett singl collect featur origin record track big thing yesterday lover willi wolf blue summer come goe cri gone watch happen poor boy gon na take miracl sight mind want meet never bring lone big mistak better know affair rememb want one summer gone love without en...,"[take miracl bonu, sight mind want, happen poor boy, wolf blue summer, na take miracl, lone big mistak, record track big, present royalett singl, featur origin record, boy gon na, want meet never, bring lone big, miracl sight mind, never bring lone, know affair rememb, take miracl sight, watch h...",0.000000
8950,B000RC9692,trombolin jack across way chilli wind shannon kansa citi railroad blue land lincoln waltz g bill blue come along jodi northern white cloud right right waltz c golden west frog lilypad lloyd loar front back smoki mountain schottisch old mountain rocki run,"[smoki mountain schottisch, northern white cloud, blue land lincoln, frog lilypad lloyd, right right waltz, chilli wind shannon, c golden west, waltz c golden, lincoln waltz g, mountain rocki run, cloud right right, golden west frog, back smoki mountain, citi railroad blue, old mountain rocki, f...",0.000000
8949,B000RC8FMG,koli shema mi ha ish ori veyishi betza bedami matai hayom lekha dodi mizmor ledavid al taster im zmirot amar hashem leyaakov karev yom yedid nefesh et ruhi tifdeh medley vehu yoshieni lemaan achai vereai zion zion,"[hashem leyaakov karev, tifdeh medley vehu, taster im zmirot, zmirot amar hashem, bedami matai hayom, vehu yoshieni lemaan, veyishi betza bedami, vereai zion zion, leyaakov karev yom, shema mi ha, koli shema mi, mi ha ish, et ruhi tifdeh, ruhi tifdeh medley, yom yedid nefesh, ha ish ori, mizmor ...",0.000000
8948,B000RC8J7M,deryn pur ei di r deryn du ar fore dydd nadolig mordaith america dacw nghariad donald ym mhontypridd yr eneth glaf cariad cyntaf even prayer glomen hiraeth feirion yr eneth gadd ei gwrthod llangollen market cyfri r geifr,"[even prayer glomen, glaf cariad cyntaf, r deryn du, yr eneth glaf, deryn du ar, deryn pur ei, cyfri r geifr, market cyfri r, glomen hiraeth feirion, di r deryn, nghariad donald ym, prayer glomen hiraeth, llangollen market cyfri, eneth gadd ei, gadd ei gwrthod, dacw nghariad donald, eneth glaf c...",0.000000


In [67]:
print("Similarity of items")
print(df_asin_description.similarity)

Similarity of items
3471     0.010000
24162    0.009901
26       0.009615
21741    0.009174
20104    0.006329
           ...   
8951     0.000000
8950     0.000000
8949     0.000000
8948     0.000000
26855    0.000000
Name: similarity, Length: 26855, dtype: float64
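
When only the best few matches are needed, the full sort can be replaced by a top-k selection. The sketch below is one possible way to do this with `heapq.nlargest` on plain Python data; the `catalog` dictionary, its ASIN-like keys and the helper names are hypothetical and stand in for the preprocessed dataframe.

```python
import heapq


def word_shingles(text, q=3):
    """Return the set of q-word shingles of a whitespace-separated string."""
    words = text.split()
    return {" ".join(words[i:i + q]) for i in range(len(words) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets, defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(query, catalog, k=10, q=3):
    """Return up to k (asin, score) pairs with the highest Jaccard score."""
    query_shingles = word_shingles(query, q)
    scored = (
        (asin, jaccard(query_shingles, word_shingles(desc, q)))
        for asin, desc in catalog.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


catalog = {  # hypothetical, already preprocessed descriptions
    "A1": "cd book long sinc vanish great condit classic",
    "A2": "spanish know gold edit learn spanish flash",
    "A3": "book long sinc vanish",
}
result = top_k_similar("cd book long sinc vanish", catalog, k=2)
print(result)  # A3 ranks first (score 2/3), followed by A1 (score 0.5)
```

The shorter description `A3` outranks `A1` even though `A1` shares more shingles with the query, because the Jaccard score penalizes the extra, unmatched shingles in the longer text.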
