#Inference Notebook for our Tasks in SNAP product co-purchasing networks

##Proposed Problem Statements and Solutions-

- Product Recommendation System basis of Feature Extracted from the product details and the co-purchases made.

- Similar Products Recommendation System which might make customers explore other options

- Recommendation of products based on their Consumer Confidence and category similarity using different metrics.

In [24]:
"""from google.colab import drive
drive.mount('/content/drive')"""

"from google.colab import drive\ndrive.mount('/content/drive')"

In [25]:
pip install streamlit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [26]:
import numpy as np
import pandas as pd
import networkx as nx
import streamlit as st

##Downloading Extracted Datasets for our models and for solutions of our problem statements

In [27]:
!gdown --id 1mUjr_vFJa7ny5WDDX4OmYs8nw_y3ft-6

Downloading...
From: https://drive.google.com/uc?id=1mUjr_vFJa7ny5WDDX4OmYs8nw_y3ft-6
To: /content/pdssimilar (2).csv
100% 111M/111M [00:00<00:00, 199MB/s] 


In [28]:
!gdown --id 1w2bK4j4Puff_2WhRSgjy82RaRVVKObS8

Downloading...
From: https://drive.google.com/uc?id=1w2bK4j4Puff_2WhRSgjy82RaRVVKObS8
To: /content/finalpreprocesseddata.csv
100% 37.5M/37.5M [00:00<00:00, 178MB/s]


In [29]:
!gdown --id 12mqlYM38ISyPAjRXopjGbP3PLvngqHK7

Downloading...
From: https://drive.google.com/uc?id=12mqlYM38ISyPAjRXopjGbP3PLvngqHK7
To: /content/customer_insights_recommendations2.csv
100% 111M/111M [00:00<00:00, 221MB/s]


In [30]:
!gdown --id 1flQHM7M_tNRAxYmUyZn1F3VV0qy1Xr3-

Downloading...
From: https://drive.google.com/uc?id=1flQHM7M_tNRAxYmUyZn1F3VV0qy1Xr3-
To: /content/G.gpickle
100% 87.7M/87.7M [00:00<00:00, 140MB/s]


In [31]:
!gdown --id 1s7J0B4P60IWER56J6LnJ_k4pFqW3x3qZ

Downloading...
From: https://drive.google.com/uc?id=1s7J0B4P60IWER56J6LnJ_k4pFqW3x3qZ
To: /content/G_Rec.gpickle
100% 96.3M/96.3M [00:00<00:00, 122MB/s]


### Loading Various Datasets for various tasks

In [32]:
# Datasets of Similarities and Recommendations
df = pd.read_csv('/content/pdssimilar (2).csv')
df_rec = pd.read_csv('/content/finalpreprocesseddata.csv')
df1 = pd.read_csv("/content/customer_insights_recommendations2.csv")

# Graphs of Similarities and Recommendations based on Above Datasets
G_Sim = nx.read_gpickle("/content/G.gpickle")
G_Rec = nx.read_gpickle("/content/G_Rec.gpickle")

# Unique Nodes of Similarity Graph
n_sim = list(G_Sim.nodes) 
n_sim =  np.array(n_sim)
n_sim = np.unique(n_sim)

# Unique Nodes of Recommendation Graph
n_rec = list(G_Rec.nodes)
n_rec = np.array(n_rec)
n_rec = np.unique(n_rec)

### Storing the common top products as a fallback for cold start problem 
Incase we encounter any new or unknown items we recommend user popular top products

In [33]:
title_cluster0 = ["Change Your Child's Behavior by Changing Yours...",
                  "Practical Aspects of Interview and Interrogati...",
                  "Practical Aspects of Interview and Interrogati...",
                  "Results-Oriented Job Descriptions: More Than 2...",
                  "Decorative Letters CD-ROM and Book (Dover Elec...",
                  "Log Cabins (Architecture and Design Library)"     ,
                  "Berlioz: Symphonie fantastique",
                  "Smith & Hawken: Hands On Gardener: Composting ...",
                  "Beauty in Exile: The Artists, Models, and Nobi...",
                  "Grammar Wars: 179 Games and Improvs for Learni..."
                ]

title_cluster1 = ["Visualizing Data	",
                  "Remembering Farley",
                  "Northrop Frye on Shakespeare",
                  "Zappa in N.Y.	",
                  "Where the Birds Are: The 100 Best Birdwatching...	",
                  "The Wasp Cookbook	",
                  "The Supreme Court's Greatest Hits",
                  "I Know Why the Caged Bird Sings (Cliffs Notes)",
                  "Color, Environment, & Human Response",
                  "Miles to Go",
                ]

title_cluster2 = ["Thief of Hearts",
                  "Harry Potter and the Goblet of Fire (Book 4 Au...",
                  "Harry Potter and the Goblet of Fire (Book 4, A...	",
                  "Harry Potter and the Goblet of Fire (Book 4)",
                  "Harry Potter and the Goblet of Fire (Book 4)",
                  "Harry Potter and the Goblet of Fire (Book 4)",
                  "Looking For-Best of David Hasselhoff",
                  "Harry Potter and the Sorcerer's Stone (Book 1 ...",
                  "Harry Potter and the Sorcerer's Stone (Book 1 ...",
]

top5_rec = ["Harry Potter and the Goblet of Fire (Book 4)",
            "Looking For-Best of David Hasselhoff	",
            "Harry Potter and the Sorcerer's Stone (Book 1 ...",
            "Remembering Farley",
            "The Supreme Court's Greatest Hits"
]

#### Helper Functions for Tasks

In [34]:
# Function to Get Similar Nodes
def getsimilar(arr):
    indarr = []
    counter = 0
    print("  ")
    print("Similar Products are:")
    for i in arr:
        indx = df.index[df['ASIN'] == i][0]
        if(indx in n_sim):
            print(G_Sim.nodes[indx]['Title'])
            indarr.append(indx)
        else:
            counter = counter+1
    print("   ")
    print("   ")
    return counter, indarr

In [35]:
# Cleaning the Product ID Output
def getclean(pro_id):    
    l = str(list(G_Sim.edges(pro_id)))
    l = l.replace('[','')
    l = l.replace(']','')
    l = l.replace(',','')
    l = l.replace('(','')
    l = l.replace(')','')
    l = l.replace(str(pro_id), '')
    b = l.split()
    b = np.array(b)
    b = b.astype(int)
    return b

In [36]:
# Searching for Index With Respect to Title
def search(s):
  s = s.lower()
  indx = df.index[(df['Title'].str.contains(s, case = False))==True]   
  indx = np.array(indx)
  indx = indx.astype(int)
  return indx

In [37]:
# Main Function for Similar Nodes
flag = 0
def check_sim(val):
    
    val = val[0]
    if val == None:
        print("Hello")
    else:
        temp = int(val)
        if(temp in n_sim):
            pro_id = temp
            pro_dict = G_Sim.nodes[pro_id]
            arr = pro_dict['Copurchased']
            arr = arr.split(' ')
            arr = np.array(arr)
            if len(arr) == 0:
                print("Product Not Found")
                print("**You Might Like These Products**")
                for t in top5_rec:
                    print(t)
                return
            else:
                html_temp = """
                <div style="background-color:#0072bc;padding:10px">
                <h2 style="color:white;text-align:center;">Similar Products</h2>
                </div>
                """
                st.markdown(html_temp,unsafe_allow_html=True)
                counter, indarr = getsimilar(arr)
                #print(counter,"nodes have been removed from the graph")
                print(pro_dict)
        else:
            flag = 1
            print("Empty node has been removed from the graph!")

In [38]:
# Function to Show Titles
def showtitles(array):
    titles = []
    for i in range(len(array)):
        titles.append(G_Rec.nodes[array[i]]['title'])
    return titles

In [39]:
# Function to Show ASIN
def showasin(array):
    asins = []
    for i in range(len(array)):
        asins.append(G_Rec.nodes[array[i]]['ASIN'])
    asins = np.array(asins)
    #asins = asins.astype(int)
    return asins

In [40]:
# Function to Find Jaccard Similarity Between 2 Lists
def jaccard_similarity(list1, list2):
    intersection = len(list(set(list1).intersection(list2)))
    union = (len(set(list1)) + len(set(list2))) - intersection
    return float(intersection) / union

In [41]:
# Getting Highest Jaccard Similarity using jaccard function
def gethighestjaccard(pro_id, finalresult):
  jaccdict = {}    
  for i in range(0,len(finalresult)):     
    if(pro_id != i):      
      tempneighbours = []
      resarray = getclean(finalresult[i])        
      jaccdict[finalresult[i]] = jaccard_similarity(finalresult, resarray)
  return jaccdict

In [42]:
# Main Function for Recommended Nodes
def check_rec(val):
    if flag == 1:
        print("Empty Node has been removed from the graph")
        return
    val = val[0]
    if val == None:
        print("Hello")
    else:
        temp = int(val)
        if(temp in n_rec):
            pro_id = temp
            finalresult = []
            resultarray = np.unique(getclean(pro_id))
            for i in range(1, len(resultarray)):
                if(resultarray[i] in n_rec):
                    finalresult.append(resultarray[i])
                else:
                    print("Empty node has been removed from the graph!")
    t = showtitles(finalresult)
    finaldictjaccard = gethighestjaccard(pro_id, finalresult)
    finaldictjaccard = dict(sorted(finaldictjaccard.items(), key=lambda item: item[1], reverse = True))
    top5jac = np.array(list(finaldictjaccard.keys()))[:5]
    print("   ")
    print("Recommended Products Are:")
    values = showtitles(top5jac)
    for top5 in values:
        print(top5)

In [43]:
# Jaccard Function
def jaccard(a, b):
    a = set((str(a)).split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

In [44]:
def prod_label_recomm(x_counter):
    if flag == 1:
        st.write("Empty Node has been removed from the graph")
        return
    else:
        if x_counter == None:
             print("Hello")
        else:
            # get label of asin
            df_counter = df1.loc[df1['ASIN'] == x_counter]
            x = int(df_counter['label_code'])
            y = set((str(df_counter['Categories'])).split())
            df_counter = df1.loc[df1['label_code'] == x]
            df_counter = df_counter.loc[df_counter['AvgRating']>=4.5]
            df_counter['score_cat_inter']= df_counter['Categories'].apply(lambda x: jaccard(x,y))
            sorted_df = df_counter.sort_values(["score_cat_inter"], ascending=False)
            return sorted_df[1:6]['ASIN'].tolist()


In [45]:
def show(values):
    t = []
    for i in range(len(values)):
        t.append(df1['Title'][i])
    return t

###Main Function to show solutions to tasks

In [46]:
# Main Function to Wrap all Things
def main():
    
    # Similar Products
    val = str(input("Enter Product Name"))
    iarray = search(val)
    iarray_names = []
    if len(iarray) == 0:
        print("Product Does Not Exist")
        print("Product Not Found")
        print("You Might Like These Products")
        for t in top5_rec:
            print(t)
        return
    else:
        for i in range(0, len(iarray)):
            if(iarray[i] in n_sim):
                pro_id = int(iarray[i])
                iarray_names.append(G_Sim.nodes[iarray[i]]['Title'])
        name = st.selectbox("Select Product", iarray_names)
        print("    ")
        prod_id = search(name)
        if len(prod_id) == 0:
            print("Product Not Found")
            print("You Might Like These Products")
            for t in top5_rec:
                print(t)
            return
        check_sim(prod_id)

    # Recommended Products
    if len(iarray) == 0:
        print("Product Does Not Exist")
        print("Product Not Found")
        print("You Might Like These Products")
        for t in top5_rec:
            print(t)
        return
    else:
        for i in range(0, len(iarray)):
            if(iarray[i] in n_sim):
                pro_id = int(iarray[i])
                iarray_names.append(G_Rec.nodes[iarray[i]]['title'])
        prod_id = search(name)
        if len(prod_id) == 0:
            print("Product Not Found")
            print("You Might Like These Products")
            for t in top5_rec:
                print(t)
            return
        check_rec(prod_id)

    # Clustering Products
    asin_val = (showasin(prod_id))
    print("  ")
    print("Customer Might Also Like")
    for val in asin_val:
        values = (prod_label_recomm(val))
    val = show(values)
    if len(val) == 0:
        print("Product Not Found")
        print("You Might Like These Products")
        for t in top5_rec:
            print(t)
        return
    for v in val:
        print(v)

    print(" ")
    print(" ")
    print("Top Products of Cluster 1")
    for title in title_cluster0:
        print(title)
    
    print(" ")
    print(" ")
    print("Top Products of Cluster 2")
    for title in title_cluster1:
        print(title)

    print(" ")
    print(" ")
    print("Top Products of Cluster 3")
    for title in title_cluster2:
        print(title)

if __name__=='__main__':
    main()

Enter Product Namepython cookbook
    
  
Similar Products are:
Python in a Nutshell
Programming Python, Second Edition with CD
Python Essential Reference (2nd Edition)
   
   
{'Title': 'Python Cookbook', 'ASIN': '0596001673', 'Categories': 'python script tool comput internet book store publish develop reilli subject design orient object general softwar languag program specialti', 'Group': 'Book', 'Copurchased': '0596001886 0596000855 0735710910', 'SalesRank': 173364, 'TotalReviews': 13, 'AvgRating': 5.0, 'NoSim': 5}
   
Recommended Products Are:
I Got No Kick Against Modern Jazz
How to Beat Your Dad at Chess
The Berry Big Storm (All Aboard Reading. Station Stop 1)
Classical Zoo
Violin Concerto Op.14
  
Customer Might Also Like
Patterns of Preaching: A Sermon Sampler
Candlemas: Feast of Flames
World War II Allied Fighter Planes Trading Cards
Life Application Bible Commentary: 1 and 2 Timothy and Titus
Prayers That Avail Much for Business: Executive
 
 
Top Products of Cluster 1
Change