# LightFM model for influence marketing
A LightFM model can recommend influencers to companies and brands, identifying those most likely to resonate with the target audience and so improving the effectiveness of an influence marketing campaign.


In this notebook, we will conduct the following steps:

1. Collect data about the influencers and their social media activity, and data about companies and brands.

2. Preprocess the data.

3. Define the problem: assign a score to the relationship between each company and each influencer.

4. Build the model. The model can be trained using the collected data and can then be used to make recommendations.

5. Evaluate the model.

6. Use the model for influence marketing. Once the model is trained and evaluated, use it to make personalized recommendations to companies and brands, suggesting a list of influencers who are a good fit for the brand and campaign.

In further steps, we will monitor and adjust the model (updating the data, retraining the model, or tweaking the parameters).

### Model theoretical explanation
In general, recommendation models fall into two categories: content-based and collaborative filtering. Content-based models recommend based on the similarity of items or users, computed from their descriptions or metadata. Collaborative filtering models learn latent factors for users and items, on the assumption that people who express similar opinions on one item will have similar opinions on other items. The choice between the two depends on data availability: collaborative filtering is effective when sufficient ratings or feedback are available, while content-based models are useful when ratings are scarce but metadata is available. Content-based models are also useful for addressing cold-start issues.

To address cold-start issues, hybrid approaches have been proposed that combine content-based and collaborative filtering. One such approach is the hybrid matrix factorization model.

**LightFM** is a Python implementation of a hybrid recommendation algorithm for both implicit and explicit feedback. It is a content-collaborative model that represents users and items as linear combinations of the latent factors of their content features. An embedding is estimated for every user feature and every item feature, and the embeddings of a user's (or item's) features are added together to form its final representation. Concretely, LightFM retrieves the corresponding row of the feature matrix and adds together the embeddings of the features with non-zero weights. The dot product of a user representation and an item representation, plus bias terms, gives a score for every item for a given user, with highly scored items being more likely to interest the user.
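The representation described above can be illustrated with a minimal NumPy sketch. It is not LightFM's implementation: the embeddings are random, and the feature weights for the brand and the influencer are made-up values chosen only to show the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_components = 5, 3
# One latent vector and one bias per *feature* (not per user or item).
user_feature_emb = rng.normal(size=(n_features, n_components))
item_feature_emb = rng.normal(size=(n_features, n_components))
user_feature_bias = rng.normal(size=n_features)
item_feature_bias = rng.normal(size=n_features)

# Hypothetical feature weights for one brand (user) and one influencer (item);
# a zero means the feature is absent.
brand_features = np.array([1.0, 0.0, 0.5, 0.0, 0.0])
influencer_features = np.array([0.0, 1.0, 0.0, 0.0, 1.0])

# Representation = weighted sum of the embeddings of the active features.
q_u = brand_features @ user_feature_emb
p_i = influencer_features @ item_feature_emb
b_u = brand_features @ user_feature_bias
b_i = influencer_features @ item_feature_bias

# Score = dot product of the two representations plus both biases.
score = q_u @ p_i + b_u + b_i
print(score)
```

Because embeddings belong to features rather than to individual users or items, a new brand or influencer with known hashtags can be scored without any interaction history, which is how the hybrid model addresses the cold-start problem.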

#### Modelling approach
For further mathematical explanation, please go to https://github.com/microsoft/recommenders/blob/main/examples/02_model_hybrid/lightfm_deep_dive.ipynb.

### 1. Import Libraries

In [11]:
# Install all the libraries in requirements.txt
import sys
import os

import itertools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scrapbook as sb
import requests
import io
import ast

import lightfm
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm import cross_validation

# Import LightFM's evaluation metrics
from lightfm.evaluation import precision_at_k as lightfm_prec_at_k
from lightfm.evaluation import recall_at_k as lightfm_recall_at_k

# Import repo's evaluation metrics
from recommenders.evaluation.python_evaluation import precision_at_k, recall_at_k

from recommenders.utils.timer import Timer
from recommenders.datasets import movielens
from recommenders.models.lightfm.lightfm_utils import (
    track_model_metrics, prepare_test_df, prepare_all_predictions,
    compare_metric, similar_users, similar_items)

print("System version: {}".format(sys.version))
print("LightFM version: {}".format(lightfm.__version__))

System version: 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:59:51) [MSC v.1928 64 bit (AMD64)]
LightFM version: 1.16


### 2. Defining Variables

In [12]:
# default number of recommendations
K = 10
# fraction of data used for testing
TEST_PERCENTAGE = 0.25
# model learning rate
LEARNING_RATE = 0.25
# number of latent factors
NO_COMPONENTS = 20
# number of epochs to fit the model
NO_EPOCHS = 20
# number of threads to fit the model
NO_THREADS = 32
# regularisation for item and user features
ITEM_ALPHA = 1e-6
USER_ALPHA = 1e-6

# seed for pseudo-random number generation
SEED = 42

### 3. Retrieve Data

#### Load Dataset

In [52]:
# Load the dataset with influencers
influ_df = pd.read_csv('C:/Users/manue/OneDrive - IE Students/Escritorio/BCSAI 3º/Chatbots & Recomendation Engines/influence_marketing/influence_marketing_reco/datasets/influencers_df.csv')
influ_df = influ_df.rename(columns={"ID": "itemID"})
influ_df

Unnamed: 0,itemID,ID.1,user_Name,followers,user_Description,likes,post_Description,location,locations2,hashtags,post_Url,engagement_rate,hashtags_translated,all_descriptions,traducido,max_category,location3
0,1,1,0nlyfitgirls_,824122,⭕️nlyFitGirls\nBlogger\n12 years sharing the w...,26022.720,"@anabra7\n#OnlyFitGirls,@daniellebrandon7\n#On...",[],[],"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...","https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...",⭕️nlyfitgirl blogger hare athlet photograph...,⭕️nlyfitgirl blogger 12 year share work athlet...,finance_investing,[]
1,2,2,100de100marifet,898694,Öznur Uslu\n🇹🇷Türkiye-🇫🇷 Fransa\nYeni ve deta...,6380.100,Video’mu görenlerden ricam yoruma bir ❤️ koyab...,[],[],"['kiş', 'quiche']","https://www.instagram.com/p/ClWkdFID3TI/,https...",0.007099,"['kiş', 'quiche']",öznr l 🇹🇷türkiye-🇫🇷 frana yeni detaylı l tari...,öznur uslu 🇹🇷türkiye-🇫🇷 fransa yeni detaylı v...,fitness_sport,[]
2,3,3,100montaditos,214440,𝟭𝟬𝟬 𝗠𝗢𝗡𝗧𝗔𝗗𝗜𝗧𝗢𝗦\nDepende de ti cómo te lo monte...,319.900,"¿Cuál es el tuyo?\n*Todos llevan kiko crushed,...",[],[],[],"https://www.instagram.com/p/Cqi3186NCvJ/,https...",0.001492,[],𝟭𝟬𝟬 𝗠𝗢𝗡𝗧𝗔𝗗𝗜𝗧𝗢𝗦 depend cómo mont 🔥 tiktok : 00m...,𝟭𝟬𝟬 𝗠𝗢𝗡𝗧𝗔𝗗𝗜𝗧𝗢𝗦 Depend how Mont 🔥 Tiktok: 100mo...,social,[]
3,4,4,12storeez,893972,12 STOREEZ\nTienda de ropa\nКоллекция для мужч...,7082.600,"Мы уверены: гардероб — это больше, чем просто ...","['Россия', 'Россия', 'Italia', 'France']",[],[],"https://www.instagram.com/p/CqxI42KMV1Y/,https...",0.007923,[],toreez tienda ropa коллекция мужчин : @ toree...,12 Storeez Tienda Ropa Collection of men: @ 12...,parenting_family,"['Россия', 'Россия', 'Italia', 'France']"
4,5,5,1gpmuthu,663592,GPMuthu 24 🔘\nArtist\nBiggboss S6\nCookuwithco...,127999.976,"@kuraishi_the_entertainer,சமயபுரம் மாரியம்மன் ...",['India'],[],[],"https://www.instagram.com/p/CqVM8-APGCs/,https...",0.192890,[],gpmth 🔘 artit biggbo 6 cookwithali bi enqiri...,gpmuthu 24 🔘 artist biggboss s6 cookuwithcomal...,fashion,['India']
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2221,2222,2222,zubairatukhugov,51954,Zubaira Tukhugov\nUFC Fighter 19-4 Record\n•\n...,35503.232,Нереальный вкус @gorillaenergy @gorillafightin...,[],[],"['дубай', 'gorillaenergy', 'gorillafighting']","https://www.instagram.com/p/CqazvpjMM5l/,https...",0.683359,"['дубай', 'gorillaenergy', 'gorillafighting']",zbaira tkhv fc fighter 9- record • вопросам со...,Zubaira Tukhugov UFC Fighter 19-4 Record • Wri...,parenting_family,[]
2222,2223,2223,zubbymichael,5749589,zubby michael\nActor\nhttps://youtu.be/drTnOhf...,251222.500,Midnight cruise 🔥GOD is real #ZM #A1 #doings #...,[],[],"['zm', 'a1', 'doings', 'nawedeyhere', 'blessup...","https://www.instagram.com/p/CqMJCMXjlCF/,https...",0.043694,"['zm', 'a1', 'doings', 'nawedeyhere', 'blessup...",zbbi michael actor http : //yot.be/drtnohfzafa...,zubbi michael actor http : //youtu.be/drtnohfz...,travel,[]
2223,2224,2224,zvezdegranda,408734,Zvezde Granda\nTV Programme\ngrand.nova.rs,28208.856,Sa kim to @voja.nedeljkovic obara ruke u Zvezd...,[],[],"['zvezdegranda', 'zvezdegranda']","https://www.instagram.com/p/CqV3wt4KAq-/,https...",0.069015,"['zvezdegranda', 'zvezdegranda']",zvezd granda tv programm grand.nova.r kim @ vo...,Star Granda TV Programm Grand.nova.r Kim @ voj...,social,[]
2224,2225,2225,zyzzmad,894950,Aziz Zyzz shavershian\nComunidad\n| motivation...,16446.000,"nan,Zyzz last vlog\n.\n.\n👉 @zyzzmad Follow : ...",[],[],"['zyzzpose', 'zyzz', 'zyzzmotivation', 'zyzzle...","https://www.instagram.com/p/CoppL0cIqJm/,https...",0.018376,"['zyzzpose', 'zyzz', 'zyzzmotivation', 'zyzzle...",aziz zyzz haverhian nidad | motiv pot everi |...,aziz zyzz shavershian comunidad | motiv post e...,music,[]


In [56]:
# Load the dataset with brands
brand_df = pd.read_csv('C:/Users/manue/OneDrive - IE Students/Escritorio/BCSAI 3º/Chatbots & Recomendation Engines/influence_marketing/influence_marketing_reco/datasets/brands_df.csv')
brand_df['ID_brand'] = range(1, len(brand_df) + 1)
brand_df = brand_df.rename(columns={"ID_brand": "userID"})
brand_df

Unnamed: 0,brand_name,brand_name.1,brand_description,brand_likes,brand_post_description,all_descriptions,descriptions_translated,hashtags,category,hashtags_translated,followers,location,userID
0,abcnetwork,abcnetwork,ABC\nAmerica’s Network.\nabc.com,1871.40,Who will make Top 24!? Tune in to #AmericanIdo...,ABC\nAmerica’s Network.\nabc.com Who will make...,ABC\nAmerica’s Network.\nabc.com Who will make...,"['americanidol', 'americanidol', 'americanidol...",social,"['americanidol', 'americanidol', 'americanidol...",4172512,Pakistan,1
1,acerojoaquintorres,acerojoaquintorres,Joaquin Torres\nArchitectural designer\nA-cero...,1504.80,"Esta es, sin duda, una de las viviendas más gr...",Joaquin Torres\nArchitectural designer\nA-cero...,Joaquin Torres\nArchitectural Designer\nSteel\...,"['sitevisit', 'luxuryhomes', 'acerojoaquintorr...",social,"['Sitevisit', 'Luxuryhomes', 'Steeljoaquintorr...",4228417,Turkey,2
2,adidas,adidas,adidas\n#ImpossibleIsNothing\nlinktr.ee/adidas,155061.90,HIM 📸 @jharden13\n\n#HardenVol7 #adidasBasketb...,adidas\n#ImpossibleIsNothing\nlinktr.ee/adidas...,Adidas\n#Impossibleisnoting\nlinktr.ee/adidas ...,"['hardenvol7', 'adidasbasketball', 'impossible...",social,"['hardenvol7', 'adidasbasketball', 'impossible...",2041166,United Kingdom,3
3,airfrance,airfrance,Air France\n✈️ Welcome on board!\nPost your be...,12396.00,Getting ready for departure 😎\nWhich destinati...,Air France\n✈️ Welcome on board!\nPost your be...,Air France\n✈️ Welcome on board!\nPost your be...,"['airfrance', 'airplane', 'avgeek', 'aviation'...",social,"['airfrance', 'airplane', 'avgeek', 'aviation'...",11110475,Tanzania,4
4,amazon,amazon,Amazon\nRetail company\nCurrently manifesting ...,1851.95,"📍Okinawa, Japan 📍Murano, Italy 📍Mackinac Islan...",Amazon\nRetail company\nCurrently manifesting ...,Amazon\nRetail company\nCurrently manifesting ...,"['blackisremarkable', 'itsonprime', 'blackisre...",social,"['blackisremarkable', 'itsonprime', 'blackisre...",1932665,DR Congo,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,virginatlantic,virginatlantic,Virgin Atlantic\nWe’ve always championed indiv...,7758.00,"Our iconic Vivienne Westwood uniforms, designe...",Virgin Atlantic\nWe’ve always championed indiv...,Virgin Atlantic\nWe’ve always championed indiv...,"['seetheworlddifferently', 'seetheworlddiffere...",social,"['seetheworlddifferently', 'seetheworlddiffere...",6308799,South Africa,89
89,volkswagen,volkswagen,Volkswagen\nCars\nWelcome to our official chan...,9510.10,World premiere of our new ID. 2all.\n\n#concep...,Volkswagen\nCars\nWelcome to our official chan...,Volkswagen\nCars\nWelcome to our official chan...,"['conceptcar', 'worldpremiere', 'vwid2all', 'v...",parenting_family,"['conceptcar', 'worldpremiere', 'vwid2all', 'v...",9416439,South Korea,90
90,wholesome_for.you,wholesome_for.you,wholesome memes\n💝 I try to make your relation...,744.25,"Tag someone ❤️\n-\n\n.\nDM for Credit\n.\n.,Ta...",wholesome memes\n💝 I try to make your relation...,wholesome memes\n💝 I try to make your relation...,[],social,[],1786934,Bangladesh,91
91,williamssonoma,williamssonoma,Williams Sonoma\nShopping & retail\n📷 Share yo...,4101.10,"Make these Chicken, Spinach and Gruyère Turnov...",Williams Sonoma\nShopping & retail\n📷 Share yo...,Williams Sonoma\nShopping & retail\n📷 Share yo...,"['weeknightdinner', 'makeahead', 'kidslunch', ...",social,"['weeknightdinner', 'makeahead', 'kidslunch', ...",7418292,United States,92


In [46]:
# Load the dataset with interactions
df = pd.read_csv('C:/Users/manue/OneDrive - IE Students/Escritorio/BCSAI 3º/Chatbots & Recomendation Engines/influence_marketing/influence_marketing_reco/datasets/influencer_brand_df.csv')
df.head()

Unnamed: 0,itemID,user_name,user_followers,user_likes_mean,user_links,user_eng_rate,user_hashtags_en,user_category,user_country,brand_name,brand_likes_mean,brand_category,brand_hashtags_en,brand_followers,brand_country,userID,rating
0,1,0nlyfitgirls_,824122,26022.72,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...",finance_investing,[],abcnetwork,1871.4,social,"['americanidol', 'americanidol', 'americanidol...",4172512,Pakistan,1,3.831507e-08
1,1,0nlyfitgirls_,824122,26022.72,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...",finance_investing,[],acerojoaquintorres,1504.8,social,"['Sitevisit', 'Luxuryhomes', 'Steeljoaquintorr...",4228417,Turkey,2,3.831507e-08
2,1,0nlyfitgirls_,824122,26022.72,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...",finance_investing,[],adidas,155061.9,social,"['hardenvol7', 'adidasbasketball', 'impossible...",2041166,United Kingdom,3,3.831507e-08
3,1,0nlyfitgirls_,824122,26022.72,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...",finance_investing,[],airfrance,12396.0,social,"['airfrance', 'airplane', 'avgeek', 'aviation'...",11110475,Tanzania,4,3.831507e-08
4,1,0nlyfitgirls_,824122,26022.72,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"['onlyfitgirls', 'onlyfitgirls', 'onlyfitgirls...",finance_investing,[],amazon,1851.95,social,"['blackisremarkable', 'itsonprime', 'blackisre...",1932665,DR Congo,5,3.831507e-08


Some of the list-valued columns are read from the CSV as strings, so we parse them back into Python lists before working with them.

In [14]:
# Parse the string representation of each list into an actual Python list
df['user_hashtags_en'] = df['user_hashtags_en'].apply(ast.literal_eval)
df['user_country'] = df['user_country'].apply(ast.literal_eval)
df['brand_hashtags_en'] = df['brand_hashtags_en'].apply(ast.literal_eval)
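As a small illustration of this parsing step, the sketch below applies `ast.literal_eval` to a made-up string of the kind `pd.read_csv` produces for a list column. The example values are hypothetical.

```python
import ast

raw = "['onlyfitgirls', 'fitness']"   # how a list column looks after pd.read_csv
parsed = ast.literal_eval(raw)
print(type(raw).__name__, "->", type(parsed).__name__, parsed)

# literal_eval only accepts Python literals, so arbitrary expressions are rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
except (ValueError, SyntaxError) as exc:
    print("rejected:", type(exc).__name__)
```

Using `ast.literal_eval` rather than `eval` is the safer choice here, since the column contents come from scraped text.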

In [34]:
df

Unnamed: 0,itemID,user_name,user_followers,user_likes_mean,user_links,user_eng_rate,user_hashtags_en,user_category,user_country,brand_name,brand_likes_mean,brand_category,brand_hashtags_en,brand_followers,brand_country,userID,rating
0,1,0nlyfitgirls_,824122,26022.720,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"[onlyfitgirls, onlyfitgirls, onlyfitgirls, onl...",finance_investing,[],abcnetwork,1871.40,social,"[americanidol, americanidol, americanidol, the...",4172512,Pakistan,1,3.831507e-08
1,1,0nlyfitgirls_,824122,26022.720,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"[onlyfitgirls, onlyfitgirls, onlyfitgirls, onl...",finance_investing,[],acerojoaquintorres,1504.80,social,"[Sitevisit, Luxuryhomes, Steeljoaquintorres, A...",4228417,Turkey,2,3.831507e-08
2,1,0nlyfitgirls_,824122,26022.720,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"[onlyfitgirls, onlyfitgirls, onlyfitgirls, onl...",finance_investing,[],adidas,155061.90,social,"[hardenvol7, adidasbasketball, impossibleisnot...",2041166,United Kingdom,3,3.831507e-08
3,1,0nlyfitgirls_,824122,26022.720,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"[onlyfitgirls, onlyfitgirls, onlyfitgirls, onl...",finance_investing,[],airfrance,12396.00,social,"[airfrance, airplane, avgeek, aviation, travel...",11110475,Tanzania,4,3.831507e-08
4,1,0nlyfitgirls_,824122,26022.720,"https://www.instagram.com/p/CqbVLmrjEKV/,https...",0.031576,"[onlyfitgirls, onlyfitgirls, onlyfitgirls, onl...",finance_investing,[],amazon,1851.95,social,"[blackisremarkable, itsonprime, blackisremarka...",1932665,DR Congo,5,3.831507e-08
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207013,2226,zzdaozz,804472,63443.316,"https://www.instagram.com/p/Cdk4RPsFXpK/,https...",0.078863,"[mcdaoke, tpopstageshowxthaifestivaljp, thaife...",social,[japan],virginatlantic,7758.00,social,"[seetheworlddifferently, seetheworlddifferentl...",6308799,South Africa,89,1.000000e+01
207014,2226,zzdaozz,804472,63443.316,"https://www.instagram.com/p/Cdk4RPsFXpK/,https...",0.078863,"[mcdaoke, tpopstageshowxthaifestivaljp, thaife...",social,[japan],volkswagen,9510.10,parenting_family,"[conceptcar, worldpremiere, vwid2all, vwforthe...",9416439,South Korea,90,9.803113e-08
207015,2226,zzdaozz,804472,63443.316,"https://www.instagram.com/p/Cdk4RPsFXpK/,https...",0.078863,"[mcdaoke, tpopstageshowxthaifestivaljp, thaife...",social,[japan],wholesome_for.you,744.25,social,[],1786934,Bangladesh,91,1.000000e+01
207016,2226,zzdaozz,804472,63443.316,"https://www.instagram.com/p/Cdk4RPsFXpK/,https...",0.078863,"[mcdaoke, tpopstageshowxthaifestivaljp, thaife...",social,[japan],williamssonoma,4101.10,social,"[weeknightdinner, makeahead, kidslunch, sunday...",7418292,United States,92,1.000000e+01


### 4. Extract and prepare item features.

The influencers' hashtags will be used as the item metadata.
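The vocabulary of item features is the set of distinct hashtags across all rows. As a toy illustration of the flatten-and-deduplicate step (the rows here are invented, not taken from the dataset):

```python
import itertools

# Each row holds one influencer's hashtag list; lists may be empty or overlap.
rows = [["fitness", "gym"], [], ["gym", "travel"]]

# Flatten the lists, drop duplicates, and sort for a stable ordering.
vocabulary = sorted(set(itertools.chain.from_iterable(rows)))
print(vocabulary)
```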

In [18]:
# Flatten the per-row hashtag lists into a sorted list of unique hashtags
all_hashtags = sorted(set(itertools.chain.from_iterable(df['user_hashtags_en'])))
all_hashtags

['001',
 '0192035898',
 '02',
 '0274journey',
 '09112022',
 '1',
 '1 2k',
 '10',
 '1000_. Thank God, safety',
 '1000_damullah_Amah Al -Salama',
 '100carataura',
 '100day',
 '100daysofmakeup',
 '100percent',
 '100persenberkah',
 '100persenuntukindonesia',
 '100thieves',
 '100yearsbmwmotorrad',
 '101',
 '1016',
 '1016industries',
 '101snbthacks',
 '10days5shows',
 '10kfollowme',
 '10likes',
 '10nov',
 '10yearchallenge',
 '10yearslater',
 '11',
 '1111 Studio Jakarta',
 '11yearsofishq',
 '120mm',
 '1245',
 '12anosrbr',
 '12thgen',
 '1320video',
 '13900',
 '13thgen',
 '13thrajab',
 '14',
 '14şubat',
 '15anosmcer',
 '15napésindulok',
 '15thshaban',
 '16',
 '175years',
 '17th',
 '18martçanakkalezaferi',
 '19',
 '19 Marzo',
 '19 years old',
 '1920s',
 '1990s',
 '1993houstonastrodomeselena',
 '1b',
 '1ballingolf',
 '1daytogo',
 '1jan2021',
 '1k',
 '1minmusic',
 '1of1clothing',
 '1ring1life1love',
 '1stdibs',
 '1stringversary',
 '1xbat',
 '1xbatsportinglines',
 '2',
 '2 months after birth',
 '20

### 5. Extract and prepare user features.

The brands' hashtags will be used as the user metadata.

In [19]:
# Flatten the per-row hashtag lists into a sorted list of unique hashtags
all_hashtagsCo = sorted(set(itertools.chain.from_iterable(df['brand_hashtags_en'])))
all_hashtagsCo

['1',
 '2023oscars',
 '542',
 '5millionfollowers',
 '777F',
 '777radio',
 'AIRASIAIN Forms',
 'AIRASIAPROMO',
 'AIRASIASUPERAP',
 'ANA',
 'ANA Intercontinental Ishigaki Resort',
 'ANA Travelers',
 'AOJ',
 'Air Asia',
 'AirAsia Cabin Crew',
 'AirAsia Shop',
 'Airplane',
 'Airport',
 'Allnipponairways',
 'AnainterContinentalishigakiresort',
 'Anawings',
 'Andalusia',
 'Aomori Airport',
 'Archdesign',
 'Archdigest',
 'Archdily',
 'Archilover',
 'Archilovers',
 'Architect',
 'Architecture',
 'Architecture_hinter',
 'Architecturephotography',
 'Architektur',
 'Architeural',
 'Archoskar',
 'Australia',
 'Aviation',
 'AviationLovers',
 'Aviationdaily',
 'Babyliss',
 'Beoforeandafter',
 'Blackpink',
 'Blazerev',
 'BlueJay',
 'Boeing',
 'Boeing777F',
 'Bombardia',
 'BrandnewDay_ Challenge',
 'BrandnewDay_challenge',
 'CNP',
 'CNY2023',
 'CONSTRUCTION',
 'COPS',
 'Cabincrew',
 'Cargo -only machine',
 'Cartagena',
 'Cartierbaignoire',
 'Cartierfums',
 'Cartierwatchmaking',
 'Chalas',
 'ChaseMalib

Before fitting the LightFM model, we need to create an instance of Dataset, which is used to build the interaction matrix and the feature matrices.

Calling its fit method creates the mappings from user, item, and feature IDs to internal indices.

In [20]:
dataset2 = Dataset()
dataset2.fit(df['userID'], 
            df['itemID'], 
            item_features=all_hashtags,
            user_features=all_hashtagsCo)
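Conceptually, fitting the Dataset assigns each raw ID a contiguous 0-based index. The sketch below mimics that idea in plain Python; `build_id_map` is a hypothetical helper written for illustration and is not part of LightFM.

```python
def build_id_map(raw_ids):
    """Assign contiguous 0-based indices to raw IDs in order of first appearance."""
    mapping = {}
    for raw in raw_ids:
        mapping.setdefault(raw, len(mapping))
    return mapping

raw_user_ids = [1, 1, 2, 3, 2]      # toy userID column
raw_item_ids = [10, 25, 10, 7, 7]   # toy itemID column

uid_map = build_id_map(raw_user_ids)
iid_map = build_id_map(raw_item_ids)
print(uid_map)
print(iid_map)
```

The distinction matters later in the notebook: the model works with the internal indices, while the DataFrames hold the raw `userID` and `itemID` values.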

The influencers' hashtags are then converted into an item feature matrix using the build_item_features method as follows:

In [22]:
item_features = dataset2.build_item_features((x, y) for x,y in zip(df.itemID, df.user_hashtags_en))

The brands' hashtags are then converted into a user feature matrix using the build_user_features method as follows:

In [24]:
user_features = dataset2.build_user_features((x, y) for x,y in zip(df.userID, df.brand_hashtags_en))
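To make the shape of these feature matrices concrete, here is a dense NumPy sketch for two hypothetical influencers. It assumes the Dataset defaults, under which each item also gets its own identity feature and each row is normalised to sum to 1. The real matrices are sparse and far larger.

```python
import numpy as np

item_ids = ["inf_a", "inf_b"]                 # hypothetical influencers
hashtags = ["fitness", "gym", "travel"]       # hypothetical hashtag vocabulary
item_hashtags = {"inf_a": ["fitness", "gym"], "inf_b": ["travel"]}

# Columns: one identity feature per item, followed by one column per hashtag.
columns = item_ids + hashtags
col_index = {name: j for j, name in enumerate(columns)}

features = np.zeros((len(item_ids), len(columns)))
for row, item in enumerate(item_ids):
    features[row, col_index[item]] = 1.0            # identity feature
    for tag in item_hashtags[item]:
        features[row, col_index[tag]] += 1.0        # hashtag feature

# Row-normalise so that each item's feature weights sum to 1.
features /= features.sum(axis=1, keepdims=True)
print(features)
```

The identity columns let the model learn a per-influencer embedding on top of the shared hashtag embeddings.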

Once the item and user feature matrices have been built, we build the interaction matrix and use the cross_validation.random_train_test_split method to split the interactions into two disjoint training and test sets:

In [26]:
# Each row is a (userID, itemID, rating) triple
interactions2, weights2 = dataset2.build_interactions(df[['userID', 'itemID', 'rating']].values)

train_interactions2, test_interactions2 = cross_validation.random_train_test_split(
    interactions2, 
    test_percentage=TEST_PERCENTAGE,
    random_state=np.random.RandomState(SEED)
)

### 6. Fit the LightFM model with additional user and item features

The LightFM model uses the Weighted Approximate-Rank Pairwise (WARP) loss. WARP optimises the rank of positive examples by repeatedly sampling negative examples until one that violates the ranking is found. This approach is recommended when only positive interactions are present, as in our case.
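The sampling idea behind WARP can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions (a margin of 1 and a logarithmic rank weight), not LightFM's implementation; `warp_sample` is a hypothetical helper and the scores are toy values.

```python
import math
import random

def warp_sample(scores, positive, rng, margin=1.0):
    """Sample negatives until one violates the margin.

    Returns (negative, n_samples, weight), or None if no violating
    negative is found within n_items - 1 draws.
    """
    n_items = len(scores)
    negatives = [i for i in range(n_items) if i != positive]
    for n_samples in range(1, n_items):
        negative = rng.choice(negatives)
        if scores[negative] > scores[positive] - margin:
            # Few draws needed => many violators => the positive is ranked low,
            # so the update gets a larger weight.
            estimated_rank = (n_items - 1) // n_samples
            weight = math.log(max(1.0, estimated_rank))
            return negative, n_samples, weight
    return None

rng = random.Random(42)
scores = [0.2, 2.5, 0.1, 0.3, 0.0]   # toy model scores for five items

# Item 1 beats every other item by more than the margin: no violation, no update.
print(warp_sample(scores, positive=1, rng=rng))
# Item 2 is ranked low: every negative violates the margin, so the first draw is used.
print(warp_sample(scores, positive=2, rng=rng))
```

The design choice worth noting is that the number of draws needed acts as a cheap estimate of the positive item's rank, so poorly ranked positives receive larger updates.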

In [27]:
model2 = LightFM(loss='warp', no_components=NO_COMPONENTS, 
                 learning_rate=LEARNING_RATE, 
                 item_alpha=ITEM_ALPHA,
                 user_alpha=USER_ALPHA,
                 random_state=np.random.RandomState(SEED)
                )

In [28]:
model2.fit(interactions=train_interactions2,
           user_features=user_features,
           item_features=item_features,
           epochs=NO_EPOCHS
           )

<lightfm.lightfm.LightFM at 0x1f68106d0d0>

### 7. Prepare model evaluation data

The evaluation data needs to be converted into a format that the recommenders repo's evaluation methods can consume. First, the train/test indices and ID mappings are extracted from the interaction matrix as follows:

In [29]:
uids, iids, interaction_data = cross_validation._shuffle(
    interactions2.row, 
    interactions2.col, 
    interactions2.data, 
    random_state=np.random.RandomState(SEED)
)

uid_map, ufeature_map, iid_map, ifeature_map = dataset2.mapping()
cutoff = int((1.0 - TEST_PERCENTAGE) * len(uids))
test_idx = slice(cutoff, None)
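This step relies on the fact that shuffling with the same seed reproduces the same permutation, so the cutoff selects the same test rows as the earlier split. Note that `_shuffle` is a private helper of `lightfm.cross_validation`. The sketch below illustrates the reproducibility assumption with NumPy alone on toy data; `shuffled` is a hypothetical stand-in, not the LightFM function.

```python
import numpy as np

SEED, TEST_PERCENTAGE = 42, 0.25
rows = np.arange(8)            # toy stand-in for interactions2.row

def shuffled(values, seed):
    """Shuffle `values` with a freshly seeded RandomState."""
    order = np.arange(len(values))
    np.random.RandomState(seed).shuffle(order)
    return values[order]

first = shuffled(rows, SEED)
second = shuffled(rows, SEED)
cutoff = int((1.0 - TEST_PERCENTAGE) * len(rows))

print("identical order:", np.array_equal(first, second))
print("train:", first[:cutoff], "test:", first[cutoff:])
```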

The test dataframe is then constructed as follows:

In [30]:
with Timer() as test_time:
    test_df2 = prepare_test_df(test_idx, uids, iids, uid_map, iid_map, weights2)
print(f"Took {test_time.interval:.1f} seconds for prepare and predict test data.") 

Took 3.6 seconds for prepare and predict test data.


The predictions of all unseen user-item pairs can be prepared as follows:

In [31]:
with Timer() as test_time:
    all_predictions2 = prepare_all_predictions(df, uid_map, iid_map,
                                               interactions=train_interactions2,
                                               user_features=user_features,
                                               item_features=item_features,
                                               model=model2,
                                               num_threads=NO_THREADS)

print(f"Took {test_time.interval:.1f} seconds for prepare and predict all data.")

Took 13.4 seconds for prepare and predict all data.


In [43]:
all_predictions2

Unnamed: 0,userID,itemID,prediction
0,1,2,-62.045940
1,1,5,-61.529671
2,1,9,-61.925331
3,1,10,-64.083961
4,1,16,-60.715229
...,...,...,...
51750,93,2208,-59.319656
51751,93,2213,-60.464909
51752,93,2214,-61.932323
51753,93,2217,-61.366215


In [61]:
# Attach influencer and brand names to the predictions
predictions_final = (
    all_predictions2
    .merge(influ_df[['itemID', 'user_Name']], on='itemID', how='left')
    .merge(brand_df[['userID', 'brand_name']], on='userID', how='left')
)

In [62]:
predictions_final

Unnamed: 0,userID,itemID,prediction,user_Name,brand_name
0,1,2,-62.045940,100de100marifet,abcnetwork
1,1,5,-61.529671,1gpmuthu,abcnetwork
2,1,9,-61.925331,30deepgrimeyy,abcnetwork
3,1,10,-64.083961,42psy42,abcnetwork
4,1,16,-60.715229,_.bracefacelaii._,abcnetwork
...,...,...,...,...,...
51750,93,2208,-59.319656,zaroutayoucef,ysl
51751,93,2213,-60.464909,zico,ysl
51752,93,2214,-61.932323,zionylennox,ysl
51753,93,2217,-61.366215,zoeabbasjackson,ysl


In [63]:
predictions_final.to_csv('predictions_final.csv', index=False)

In [35]:
all_predictions2[all_predictions2['userID']==89].sort_values(by=['prediction'], ascending=False)

Unnamed: 0,userID,itemID,prediction
49321,89,1634,-59.411480
48943,89,189,-59.603539
49216,89,1215,-59.613579
49465,89,2188,-59.747307
49391,89,1884,-59.842468
...,...,...,...
49318,89,1618,-65.272636
49412,89,1939,-65.478203
49268,89,1450,-65.535286
48996,89,408,-65.698486
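The same ranking can be produced for every brand at once. The following pandas sketch uses a small made-up table with the same columns as all_predictions2. The scores are only meaningful relative to each other within a brand, so the sketch sorts within each `userID`.

```python
import pandas as pd

# Hypothetical scores in the same layout as all_predictions2.
preds = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2, 2],
    "itemID": [10, 11, 12, 10, 11, 12],
    "prediction": [-61.2, -59.8, -60.5, -58.1, -62.0, -57.4],
})

top_n = 2
top_per_brand = (
    preds.sort_values(["userID", "prediction"], ascending=[True, False])
         .groupby("userID")
         .head(top_n)              # keep the top_n best-scored rows per brand
         .reset_index(drop=True)
)
print(top_per_brand)
```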


### 8. Model evaluation and comparison

The predictive performance of the model can be computed as follows:

In [38]:
eval_precision2 = precision_at_k(rating_true=test_df2, 
                                rating_pred=all_predictions2, k=K)
eval_recall2 = recall_at_k(test_df2, all_predictions2, k=K)

print(
    "------ Using Repo's evaluation methods ------",
    f"Precision@K:\t{eval_precision2:.6f}",
    f"Recall@K:\t{eval_recall2:.6f}")

------ Using Repo's evaluation methods ------ Precision@K:	1.000000 Recall@K:	0.017995
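To help read these two numbers, here is a minimal sketch of precision@k and recall@k for a single user. The helper `precision_recall_at_k` and the toy sets are hypothetical and are not the recommenders implementation.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k, hits / len(relevant)

relevant = set(range(500))            # hypothetical: 500 held-out influencers for one brand
recommended = list(range(10))         # every recommended item is a held-out one
precision, recall = precision_recall_at_k(recommended, relevant, k=10)
print(f"precision@10={precision:.3f} recall@10={recall:.3f}")
```

Recall@K is bounded by K divided by the number of held-out items per user. The interaction table has roughly one row per brand-influencer pair (about 93 × 2,226), so with a 25% test split each brand has on the order of 550 held-out influencers, and recall@10 cannot exceed roughly 0.018. For the same reason, if nearly every candidate pair carries a rating, a precision of 1.0 says little about ranking quality and should be interpreted with care.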


### 9. Similar users and items

As the LightFM package operates based on latent embeddings, these can be retrieved once the model has been fitted to assess user-user and/or item-item affinity.

#### User affinity

The user embeddings, from which the user-user affinity is computed, can be retrieved with the get_user_representations method of the fitted model as follows:

In [39]:
_, user_embeddings = model2.get_user_representations(features=user_features)
user_embeddings

array([[-0.43510783,  0.20015045,  0.19657445, ...,  0.08449354,
        -0.083403  ,  0.7011684 ],
       [-1.2330592 ,  0.02829643,  0.0611331 , ...,  0.09005737,
        -0.12180785,  0.38892084],
       [-0.78088444,  0.12318058,  0.53962827, ...,  1.3196372 ,
        -0.07472667,  0.06078026],
       ...,
       [-0.56059635,  0.12238348,  0.8459313 , ..., -0.2973754 ,
        -0.62205946,  0.5132858 ],
       [ 0.00434159, -0.32049945,  0.38239112, ...,  0.45412457,
        -0.93531203,  0.9800799 ],
       [ 0.26084453,  0.5688839 , -0.24061933, ...,  0.75206554,
        -0.85542417,  1.9509572 ]], dtype=float32)

To retrieve the top N most similar users, we can use the similar_users function from recommenders. For example, to get the 10 users most similar to user 90:

In [41]:
similar_users(user_id=90, 
              user_features=user_features, 
              model=model2)

Unnamed: 0,userID,score
0,74,0.533359
1,86,0.442499
2,16,0.429372
3,44,0.35064
4,21,0.34493
5,11,0.333574
6,47,0.30347
7,75,0.293317
8,63,0.291378
9,31,0.256239
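Similarity scores of this kind are typically cosine similarities between embedding vectors. The sketch below shows that computation on toy embeddings; it assumes cosine similarity, and the details of the recommenders helper may differ. `most_similar` is a hypothetical function.

```python
import numpy as np

def most_similar(embeddings, index, n=3):
    """Return (row index, cosine similarity) of the n rows most similar to `index`."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit @ unit[index]
    scores[index] = -np.inf                      # exclude the query row itself
    best = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in best]

# Toy 2-dimensional embeddings for four users.
embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
print(most_similar(embeddings, index=0, n=2))
```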


#### Item affinity

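The item-item affinity can be assessed in the same way, using the item embeddings from the get_item_representations method of the fitted model. The similar_items function from recommenders returns the items most similar to a given item, here item 10: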

In [42]:
similar_items(item_id=10, 
              item_features=item_features, 
              model=model2)

Unnamed: 0,itemID,score
0,384,0.668584
1,1206,0.665505
2,859,0.642831
3,70,0.619522
4,1176,0.59932
5,1183,0.584614
6,1878,0.574631
7,892,0.564446
8,1245,0.558215
9,2040,0.543438


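### 10. Recommendations

Demo recommendations for a specific user (company), returning the top 10 recommended items (influencers).

The predict() method of the model takes a user index and an array of item indices and returns a predicted score for each user-item pair. These are LightFM's internal indices (the values of uid_map and iid_map), not the raw userID and itemID values. To score items for the specified user, we pass the user index and an array of item indices to predict(); covering every item requires the indices from 0 to num_items-1. We then apply np.argsort() to the negated scores to rank the items in descending order of predicted score, and take the top N items from the sorted list. Because the model was trained with user and item features, the same feature matrices should also be passed to predict() for the hashtag features to contribute to the scores.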

In [208]:
user_id = 1  # internal user index

# Get the predicted scores for the first 1000 item indices
scores = model2.predict(user_id, np.arange(1000))
# Rank the item indices by descending score
top_items = np.argsort(-scores)

# Print the top N items
N = 10
print("Top {} recommended items for user {}: {}".format(N, user_id, top_items[:N]))

Top 10 recommended items for user 1: [700 127 146 264 691 343 625 477 965 881]
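The values printed above are internal item indices. To report influencer IDs, the indices have to be translated back through the item mapping. The sketch below shows that translation with a made-up mapping and toy scores. In the notebook the mapping would be `iid_map`, and the raw IDs could then be joined with the influencer names.

```python
import numpy as np

# Hypothetical mapping from raw influencer IDs to internal indices (like iid_map).
iid_map = {101: 0, 205: 1, 309: 2, 412: 3}
index_to_item = {index: raw for raw, index in iid_map.items()}

scores = np.array([-61.5, -59.3, -60.8, -64.0])   # toy scores, one per internal index
top_n = 2

# Highest score first, then translate internal indices back to raw IDs.
top_indices = np.argsort(-scores)[:top_n]
top_item_ids = [index_to_item[int(i)] for i in top_indices]
print(top_item_ids)
```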
