# LightGBM model for influence marketing
A LightGBM model that recommends influencers to companies and brands can help identify the influencers most likely to have a positive impact on the target audience, and so improve the effectiveness of an influence marketing campaign.


In this notebook, we carry out the following steps:

1. Collect data about the influencers and their social media activity, and data about companies and brands.

2. Preprocess the data.

3. Define the problem: assign a score to the relationship between each company and each influencer.

4. Build the model. The model can be trained using the collected data and can then be used to make recommendations.

5. Evaluate the model.

6. Use the model for influence marketing. Once the model is trained and evaluated, use it to make personalized recommendations to companies and brands, suggesting a list of influencers who are a good fit for the brand and campaign.

As follow-up work, we will monitor and adjust the model by updating the data, retraining it, or tuning its parameters.

### Model theoretical explanation
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:

Faster training speed and higher efficiency.

Lower memory usage.

Better accuracy.

Support of parallel, distributed, and GPU learning.

Capable of handling large-scale data.

#### Modelling approach
For a more detailed mathematical explanation, see https://github.com/microsoft/LightGBM/blob/master/docs/Features.rst
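
As a brief orientation, the generic gradient boosting update can be written as below. This is the textbook formulation only, not a description of LightGBM's specific optimizations (such as histogram-based splits and leaf-wise tree growth, which the linked page covers). The notation is introduced here for illustration: $L$ is the loss, $F_m$ the ensemble after $m$ trees, $h_m$ the tree fitted at step $m$, $r_i^{(m)}$ the negative gradient for sample $i$, and $\eta$ the learning rate.

$$
\begin{aligned}
r_i^{(m)} &= -\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}}, \\
h_m &= \arg\min_{h} \sum_{i=1}^{n} \bigl(r_i^{(m)} - h(x_i)\bigr)^2, \\
F_m(x) &= F_{m-1}(x) + \eta\, h_m(x).
\end{aligned}
$$

Each new tree is fitted to the negative gradients of the loss under the current ensemble, and its predictions are added with a step size $\eta$.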

### 1. Import Libraries

In [1]:
import numpy as np
import pandas as pd
import random
import os
import ast

import lightgbm as lgbm # Mac users require cmake & libomp to import lightgbm
from collections import Counter
from dotenv import load_dotenv

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    f1_score,
    ndcg_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

### 2. Retrieve Data

#### Load Dataset

In [2]:
# Load the dataset with interactions
df = pd.read_csv('C:/Users/manue/OneDrive - IE Students/Escritorio/BCSAI 3º/Chatbots & Recomendation Engines/influence_marketing/influence_marketing_reco/datasets/data1145rows.csv')
df

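| Unnamed: 0 | itemID | user_followers | user_likes_mean | user_eng_rate | brand_likes_mean | brand_followers | userID | rating | user_category_art_design | user_category_belleza | ... | 飛 | 鯉 | 鹿 | 꽃 | 랙 | 블 | 수 | 지.1 | 크.1 | 핑 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 824122 | 26022.720 | 0.031576 | 1871.40 | 4172512 | 1 | 3.831507e-08 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 824122 | 26022.720 | 0.031576 | 1504.80 | 4228417 | 2 | 3.831507e-08 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 824122 | 26022.720 | 0.031576 | 155061.90 | 2041166 | 3 | 3.831507e-08 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 824122 | 26022.720 | 0.031576 | 12396.00 | 11110475 | 4 | 3.831507e-08 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 824122 | 26022.720 | 0.031576 | 1851.95 | 1932665 | 5 | 3.831507e-08 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 207013 | 2226 | 804472 | 63443.316 | 0.078863 | 7758.00 | 6308799 | 89 | 1.000000e+01 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 207014 | 2226 | 804472 | 63443.316 | 0.078863 | 9510.10 | 9416439 | 90 | 9.803113e-08 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 207015 | 2226 | 804472 | 63443.316 | 0.078863 | 744.25 | 1786934 | 91 | 1.000000e+01 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 207016 | 2226 | 804472 | 63443.316 | 0.078863 | 4101.10 | 7418292 | 92 | 1.000000e+01 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |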


Some of the variables are not in the format we need, so we will transform them before working with them.

### 3. Feature Engineering

In [8]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [10]:
# set_option is a pandas-level function, not a Series method:
# the original call df.dtypes.set_option(...) raised AttributeError.
pd.set_option('display.max_rows', None)
df.dtypes

### 4. Fit the Model

In [3]:
# split train and test
X = df.drop("rating", axis=1)
y = df["rating"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=43
)

In [4]:
model = lgbm.LGBMRegressor()
model.fit(X_train, y_train)
print(model)

LightGBMError: Do not support special JSON characters in feature name.
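
The error above is most likely caused by column names that contain characters LightGBM treats as special JSON characters. One visible candidate is the colon in `Unnamed: 0`; other columns hidden behind the `...` may be affected too. The sketch below shows one possible way to clean the names before fitting. It rests on two assumptions: that the rejected characters are `"`, `,`, `:`, `[`, `]`, `{` and `}`, and that replacing them with an underscore is acceptable for this dataset. `sanitize_feature_names` is a hypothetical helper introduced here, not part of LightGBM or pandas.

```python
import re

# Assumption: these are the characters LightGBM rejects in feature names.
_SPECIAL_JSON_CHARS = re.compile(r'[",:\[\]{}]')


def sanitize_feature_names(names):
    """Replace special JSON characters with '_' and keep the names unique."""
    used = set()
    cleaned = []
    for name in names:
        base = _SPECIAL_JSON_CHARS.sub("_", str(name))
        candidate, suffix = base, 1
        # Two different names can collapse to the same cleaned name,
        # so append a numeric suffix until the candidate is unused.
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


# Small illustration with names taken from the dataset preview above.
example = sanitize_feature_names(["Unnamed: 0", "itemID", "user_eng_rate", "鹿"])
print(example)
```

In this notebook the helper could be applied with `X.columns = sanitize_feature_names(X.columns)` before calling `train_test_split`. If `Unnamed: 0` is only a saved row index, dropping that column instead may be the simpler option.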