# Making The Recommendation System

Made by: Alexander Beaucage

Date: June 23 2023

Contact Info: Beaucagealex202@gmail.com

The goal of this notebook is to get a recommender running. To do this I will use an association rules table, which finds songs that frequently co-occur in the same playlists.
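To make "co-occur frequently" concrete, here is a minimal sketch of the three metrics an association rules table usually reports (support, confidence, and lift), worked out by hand. The playlists and song names below are made up for illustration and are not from the data set, and the helper functions `support` and `rule_metrics` are hypothetical names of my own.

```python
# Hypothetical toy playlists (not from the real data), each a set of songs
playlists = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]


def support(items):
    """Fraction of playlists that contain every song in items."""
    items = set(items)
    return sum(items <= playlist for playlist in playlists) / len(playlists)


def rule_metrics(antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support({antecedent, consequent})
    # Confidence: of the playlists with the antecedent, how many also have the consequent
    confidence = both / support({antecedent})
    # Lift: confidence compared with how common the consequent is overall
    lift = confidence / support({consequent})
    return both, confidence, lift


sup, conf, lift = rule_metrics("A", "B")
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the two songs show up together more often than chance alone would suggest, while a lift below 1 means they show up together less often.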

In [None]:
# Import the libraries I need
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [None]:
# Path to the cleaned data file
datadir = r"../csv_files/cleandata.csv"
# Load in the data
data = pd.read_csv(datadir, index_col = 0)  # Use the first column of the file as the index

In [None]:
# Take a look at the data
data.head()

I'm going to convert the 4 columns down to 2. I'll do this by combining `playlistname` with `user_id`, and `trackname` with `artistname`. This gives a unique key for each playlist and each song, because many playlists and songs share the exact same name.
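Here is a small sketch of why the combination matters, using made-up rows rather than the real data. Two different users each have a playlist called "Chill", and two different artists each have a track called "Home". The column names match the notebook, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: same playlist name for two users, same track name for two artists
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "playlistname": ["Chill", "Chill", "Chill"],
    "artistname": ["Artist X", "Artist Y", "Artist X"],
    "trackname": ["Home", "Home", "Home"],
})

# Build the combined keys with the "<name> by <owner>" pattern
toy["playlist_user"] = toy["playlistname"].astype(str) + " by " + toy["user_id"].astype(str)
toy["song_artist"] = toy["trackname"].astype(str) + " by " + toy["artistname"].astype(str)

# Compare how many distinct values each version of the key has
n_playlist_names = toy["playlistname"].nunique()
n_playlists = toy["playlist_user"].nunique()
n_track_names = toy["trackname"].nunique()
n_songs = toy["song_artist"].nunique()
print(n_playlist_names, n_playlists, n_track_names, n_songs)
```

Going by name alone would merge the two playlists and the two songs into one each, while the combined keys keep them apart.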

In [None]:
# Making a new column for the song artist combination
new_col = []
# Loop through each item in data frame only getting the trackname and artistname
for item in data[["trackname","artistname"]].values:
    #print(item[0], item[1])
    new_col.append(str(item[0]) + " by " + str(item[1]))

In [None]:
# Adding the new column to the data set
data["song_artist"] = new_col

In [None]:
# Drop the old columns
data.drop(columns = ["artistname","trackname"], inplace = True)

In [None]:
# See if it looks right
data.head()

In [None]:
# Making a new column for the playlist user combinations
new_col = []
# Loop through each item in the data frame
for item in data[["playlistname","user_id"]].values:
    #print(item[0], item[1])
    new_col.append(str(item[0]) + " by " + str(item[1]))

In [None]:
# Add the new column to the data frame
data["playlist_user"] = new_col

In [None]:
# Drop the old columns
data.drop(columns = ["playlistname","user_id"], inplace = True)

In [None]:
# Take a look at the new data frame
data.head()

Now that I've got the columns sorted out I will be selecting songs that appear 10 or more times in the data.

In [None]:
# Make a set of the songs that appear 10 or more times
song_counts = data["song_artist"].value_counts()
songs10 = set(song_counts[song_counts >= 10].index)

In [None]:
# Create a selector that is True where the song appears 10 or more times
selector = data["song_artist"].isin(songs10)

In [None]:
# Does the selector look right?
selector

In [None]:
# Filter down to the songs that appear 10 or more times
data = data[selector]

In [None]:
# How many unique playlists?
data["playlist_user"].unique().shape

In [None]:
# How many unique songs?
data["song_artist"].unique().shape

After reducing the size of the data by only using songs that appear 10 or more times, it's time to make an association rules table.
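Before the real thing, here is a rough sketch of what the next few cells do: each playlist becomes a row of True/False values (one column per song), and then pairs of songs are counted to see how often they share a playlist. This is written in plain Python on made-up playlists to show the idea only. It is not how mlxtend implements it, and the names below are hypothetical.

```python
from itertools import combinations

# Hypothetical playlists mapped to their songs (not from the real data)
playlists = {
    "Chill by u1": ["s1", "s2"],
    "Gym by u2": ["s2", "s3"],
    "Road by u3": ["s1", "s2", "s3"],
}

# One column per unique song, in sorted order
columns = sorted({song for songs in playlists.values() for song in songs})

# One row per playlist: True where the playlist contains that column's song
onehot_rows = [
    [song in set(songs) for song in columns]
    for songs in playlists.values()
]

# Support of every pair of songs: the share of playlists containing both
n_playlists = len(onehot_rows)
pair_support = {}
for a, b in combinations(range(len(columns)), 2):
    together = sum(row[a] and row[b] for row in onehot_rows)
    pair_support[(columns[a], columns[b])] = together / n_playlists

print(columns)
print(onehot_rows)
print(pair_support)
```

Stopping at pairs matches the `max_len = 2` setting used below, where only the pairs that reach `min_support` are kept.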

In [None]:
# Get the playlist_users and get a list of the songs in the playlist
plgrouped = data.groupby("playlist_user")["song_artist"].apply(list)

In [None]:
# Take a look at the data to see if it makes sense
plgrouped.head()

In [None]:
# Instantiate the transaction encoder
te = TransactionEncoder()

# Fit the encoder to the grouped data and one-hot encode it
onehot = te.fit_transform(plgrouped)

# Turn the encoded array into a data frame with one column per song
onehot_df = pd.DataFrame(onehot, columns=te.columns_)

# Take a look at the encoded data
onehot_df.head()

In [None]:
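# Find the frequent itemsets: single songs and pairs of songs (max_len = 2)
# that appear in at least 0.01% of the playlists
x = apriori(onehot_df, 
            min_support=0.0001, 
            use_colnames=True, 
            max_len = 2, 
            low_memory = True)

# Create the association rules table, keeping rules with a lift of 1 or more
assorules = association_rules(x, metric="lift", min_threshold=1.0)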

In [None]:
# Take a look at the association rules table I just made
assorules.head()

In [None]:
# How many rows of data?
assorules.shape

In [None]:
# Create a string of the lowercase alphabet
alpha = "abcdefghijklmnopqrstuvwxyz"

# Add the alphabet in uppercase, a space, and the digits
alpha += alpha.upper() + " " + "1234567890"

# Seeing if the output looks right
alpha

In [None]:
# This function takes in a frozenset holding one song and returns the song
# as a string with only letters, digits, and spaces
def remove_punctuation(itemset):
    # Each antecedent holds exactly one song because max_len = 2, so take it directly.
    # (Splitting the text of the frozenset on ' breaks for songs with an apostrophe.)
    string = str(next(iter(itemset)))
    
    # Creating an empty string to return
    new = ""
    
    # Loop through each character in string
    for item in string:
        
        # If the item is a letter, digit, or space
        if item in alpha:
            
            # Append it to the string 
            new += item
    
    # Return the string with only the allowed characters of string
    return new

In [None]:
# Make the antecedents column strings instead of frozensets
assorules["antecedents"] = assorules["antecedents"].apply(remove_punctuation)

In [None]:
# Alternative: use a regular expression to select the song for a recommendation
#selector = assorules["antecedents"].str.match("^One More Time.*$")

In [None]:
# Create a selector for this song (get recommendations for this song)
selector = (assorules["antecedents"] == "Magic by Coldplay")

In [None]:
# See what the top 10 recommendations are
assorules[selector].sort_values(by = "support", ascending = False)[0:10]

In [None]:
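# Make a recommendation by picking one of the top 10 at random
# (if there are fewer than 10 rules, pick from however many there are;
# this still raises an error if the song has no rules at all)
recs = assorules[selector].sort_values(by = "support", ascending = False)["consequents"].values[0:10]
recs[np.random.randint(0, len(recs))]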