# MyAnimeList User Recommendations
* This signal is based on MyAnimeList user-submitted
  recommendations (see https://myanimelist.net/recommendations.php?s=recentrecs&t=anime).
* The predicted score for a series is a weighted average of the user's scores over all
  series they have seen that are linked to it by a recommendation.
* To get the weight between two series, we first construct the undirected adjacency graph of recommendations.
* Then we normalize each edge (i->j) by dividing by the degree of i and the degree of j.
* We raise the adjacency matrix to a given power, normalizing at each step, to reduce sparsity.
* Finally, we apply the transformation weight -> weight^alpha for some fixed alpha.
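
The steps above can be sketched on a toy graph. This is a minimal illustration, not the pipeline that produced `mal_user_recs_pow_{power}.pkl`: the function names `normalize` and `recommendation_weights` are hypothetical, and it assumes that "dividing by the degree of i and of j" means dividing by the product of the two (weighted) degrees.

```python
import numpy as np


def normalize(adj):
    """Divide each entry (i, j) by degree(i) * degree(j).

    Assumption: the degree is the row sum of the current matrix.
    """
    degree = adj.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated nodes: avoid division by zero
    return adj / np.outer(degree, degree)


def recommendation_weights(edges, n_items, power=2, alpha=1.0):
    """Build weights from undirected recommendation edges (hypothetical helper)."""
    # 1. Undirected adjacency matrix of recommendations
    adj = np.zeros((n_items, n_items))
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0

    # 2. Normalize each edge by the degrees of its endpoints
    base = normalize(adj)

    # 3. Raise to the given power, normalizing after every multiplication
    weights = base.copy()
    for _ in range(power - 1):
        weights = normalize(weights @ base)

    # 4. Apply the transformation weight -> weight^alpha
    return weights ** alpha
```

On a three-node path graph with edges `[(0, 1), (1, 2)]` and `power=2`, nodes 0 and 2 receive a nonzero weight even though no direct recommendation links them, which is the sparsity reduction the power step is meant to provide.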

In [1]:
# CHANGE THIS PARAMETER
recommendee = "taapaye"

In [2]:
import os
import pickle

import numpy as np
import pandas as pd
from tqdm import tqdm

In [3]:
outdir = f"../../data/recommendations/{recommendee}"
os.chdir(outdir)

In [4]:
# parameters chosen by cross-validation
with open('parameters/maluserrec.best.pkl', 'rb') as f:
    parameters = pickle.load(f)
power = parameters['power'].squeeze()
α = 1
symmetric_recommendations = True
signal_name = "maluserrec"

In [5]:
anime = pd.read_csv("../../cleaned_data/anime.csv")

In [6]:
with open("user_anime_list.pkl", "rb") as f:
    user_df = pickle.load(f)

In [7]:
with open(f"../../processed_data/mal_user_recs_pow_{power}.pkl", "rb") as f:
    rec_df = pickle.load(f)

In [8]:
rec_df["weight"] = rec_df["weight"] ** α

In [9]:
user_recs = (
    user_df.set_index("anime_id")
    .merge(rec_df, left_on=["anime_id"], right_on=["source"])
    .drop("source", axis=1)
    .rename({"target": "anime_id"}, axis=1)
)

In [10]:
pred_scores = user_recs.groupby("anime_id").apply(
    lambda x: np.dot(x["score"], x["weight"]) / x["weight"].sum()
)
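
As a small check of what the cell above computes, here is the same weighted average on made-up data. The frame `toy_recs`, its values, and the helper `weighted_mean` are hypothetical; only the column names `anime_id`, `score`, and `weight` come from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two seen series point at anime 10, one points at anime 20
toy_recs = pd.DataFrame(
    {
        "anime_id": [10, 10, 20],
        "score": [1.0, -1.0, 2.0],
        "weight": [3.0, 1.0, 0.5],
    }
)


def weighted_mean(group):
    # Weighted average of the user's scores, using the recommendation weights
    return np.dot(group["score"], group["weight"]) / group["weight"].sum()


toy_pred = toy_recs.groupby("anime_id")[["score", "weight"]].apply(weighted_mean)
```

For anime 10 this gives (3.0 * 1.0 + 1.0 * -1.0) / (3.0 + 1.0) = 0.5, and for anime 20 the single recommender's score, 2.0, is returned unchanged.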

In [11]:
pred_vars = user_recs.groupby("anime_id").apply(
    lambda x: np.dot(x["score_var"], x["weight"] ** 2) / (x["weight"]).sum() ** 2
)
# TODO: apply a Bessel correction
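
The variance expression in the cell above appears to be the standard variance of a weighted mean. A sketch of the reasoning, assuming the per-series score estimates are independent: the symbols are notation introduced here, with s_i standing for `score`, sigma_i^2 for `score_var`, and w_ij for the `weight` from seen series i to target series j.

```latex
\hat{s}_j = \frac{\sum_i w_{ij}\, s_i}{\sum_i w_{ij}},
\qquad
\operatorname{Var}(\hat{s}_j)
  = \frac{\sum_i w_{ij}^2\, \sigma_i^2}{\left(\sum_i w_{ij}\right)^2}
```

This follows from Var(aX) = a^2 Var(X) together with the fact that variances of independent terms add. If the score estimates are correlated, covariance terms would also appear and this expression would no longer be exact.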

In [12]:
signal = pd.DataFrame()
signal["delta"] = pred_scores
signal["delta_var"] = pred_vars

In [13]:
# This signal does not use the rating of an item
# in its prediction for the score of that item
# so there are no overfitting concerns
signal.to_pickle(f"{signal_name}_loocv.pkl")
signal.to_pickle(f"{signal_name}.pkl")

In [14]:
import functools
import scipy.stats as st
import statsmodels.formula.api as smf


# Convenience wrapper: build an OLS model from a formula and fit it in one call
@functools.wraps(smf.ols)
def lm(*args, **kwargs):
    return smf.ols(*args, **kwargs).fit()

In [15]:
pred_df = user_df.merge(signal, on="anime_id", how="left").fillna(0)

In [16]:
print(lm('score ~ delta', pred_df).summary())

                            OLS Regression Results                            
Dep. Variable:                  score   R-squared:                       0.049
Model:                            OLS   Adj. R-squared:                  0.046
Method:                 Least Squares   F-statistic:                     17.78
Date:                Sat, 29 May 2021   Prob (F-statistic):           3.17e-05
Time:                        08:17:58   Log-Likelihood:                -582.19
No. Observations:                 349   AIC:                             1168.
Df Residuals:                     347   BIC:                             1176.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.1149      0.088     -1.307      0.1

### 