# 220309 ~ 220310

## I did mask the loss, but the loss calculation still included every entry. -> Exclude the masked entries from the computation.

# Purpose
The purpose of this notebook is to review [AutoRec: Autoencoders Meet Collaborative Filtering](https://users.cecs.anu.edu.au/~akmenon/papers/autorec/autorec-paper.pdf).

# Introduction

- Collaborative filtering models make personalised recommendations by using information about users' preferences for items.

- The paper proposes AutoRec, a novel autoencoder-based CF model.

- The authors argue that the proposed model has representational and computational advantages over other approaches.

# The AutoRec Model

- The basic setup of rating-based collaborative filtering
  - $m$ : The number of users.
  - $n$ : The number of items.
  - $R \in \mathbb{R}^{m \times n}$ : A partially observed user-item rating matrix with many missing values.
  - $u \in U = \{1,\dots,m\}$ : Each user
    - is represented by a partially observed vector 
    $r^{(u)} = (R_{u1},\dots,R_{un}) \in \mathbb{R}^n$
  - $i \in I = \{1,\dots,n\}$ : Each item
    - is represented by a partially observed vector 
    $r^{(i)}=(R_{1i},\dots,R_{mi}) \in \mathbb{R}^m$
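
To make the notation concrete, here is a minimal NumPy sketch of how $r^{(u)}$ and $r^{(i)}$ are read off $R$. The toy matrix and the helper names `user_vector` and `item_vector` are assumptions for illustration, not part of the paper or of the notebook code.

```python
import numpy as np

# Toy rating matrix with m = 3 users and n = 4 items (hypothetical values).
# np.nan marks a rating that was not observed.
R = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, 4.0, np.nan, 1.0],
    [2.0, np.nan, np.nan, 5.0],
])
m, n = R.shape

def user_vector(R, u):
    """Return r^(u): the ratings of user u over all n items (u is 1-indexed)."""
    return R[u - 1, :]

def item_vector(R, i):
    """Return r^(i): the ratings of item i from all m users (i is 1-indexed)."""
    return R[:, i - 1]

r_u = user_vector(R, 1)            # shape (n,)
r_i = item_vector(R, 4)            # shape (m,)
observed_in_r_i = ~np.isnan(r_i)   # which users rated item 4
```

Item-based AutoRec feeds the columns $r^{(i)}$ to the autoencoder, which is why the notebook later transposes the pivoted rating matrix.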

## Purpose
The aim is to design an autoencoder that takes each partially observed $r^{(i)}$ (or $r^{(u)}$) as input, projects it into a low-dimensional latent space, and then reconstructs it in the output space to predict the missing ratings.




Given a set $S$ of vectors in $\mathbb{R}^d$, the autoencoder solves

$$
\min_{\theta} \sum_{r \in S}||r-h(r;\theta)||_2^2
$$

where

$$
h(r;\theta) = f(W \cdot g(Vr+\mu)+b)
$$

with activation functions $f(\cdot)$ and $g(\cdot)$ and parameters $\theta = \{W,V,\mu,b\}$:
- $V \in \mathbb{R}^{k\times d}$ : Encoder weights
- $W \in \mathbb{R}^{d\times k}$ : Decoder weights
- $\mu \in \mathbb{R}^{k}$ : Encoder bias
- $b \in \mathbb{R}^{d}$ : Decoder bias

The autoencoder has a single $k$-dimensional hidden layer.
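
The shapes above can be checked with a small NumPy forward pass. This is a sketch under assumptions: the function name `autorec_forward`, the sigmoid choice for $g$, the identity choice for $f$, and the random toy parameters are all picked here for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def autorec_forward(r, V, mu, W, b, g=sigmoid, f=lambda x: x):
    """Compute h(r; theta) = f(W g(V r + mu) + b) for a single input vector r."""
    hidden = g(V @ r + mu)     # encoder: R^d -> R^k
    return f(W @ hidden + b)   # decoder: R^k -> R^d

d, k = 6, 3                               # input and hidden dimensions (toy values)
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(k, d))    # encoder weights
mu = np.zeros(k)                          # encoder bias
W = rng.normal(scale=0.1, size=(d, k))    # decoder weights
b = np.zeros(d)                           # decoder bias

r = rng.uniform(1.0, 5.0, size=d)         # a fully observed toy input
r_hat = autorec_forward(r, V, mu, W, b)   # reconstruction, shape (d,)
```

For item-based AutoRec, $d = m$ (one input per user), so $V$ has one column per user and $W$ has one row per user.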

# Two important details in the model

1. When updating the model, only observed ratings are considered.

2. To prevent overfitting on the observed ratings, an $\ell_2$ regularization term is applied to $W$ and $V$.

$$
\min_{\theta} \sum_{i=1}^{n}||r^{(i)} - h(r^{(i)};\theta)||_O^2 + \frac{\lambda}{2}\cdot(||W||_F^2+||V||_F^2)
$$

where $||\cdot||_O^2$ means that only the observed ratings contribute to the loss. With the learned parameters $\hat\theta$, the predicted rating of user $u$ for item $i$ is

$$
\hat R_{ui} = h(r^{(i)};\hat\theta)_u
$$
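
The regularized objective and the prediction rule can be written out in NumPy as follows. This is a minimal sketch, assuming a sigmoid $g$, an identity $f$, and a 0/1 mask that marks observed entries; `autorec_objective` and the toy numbers are hypothetical names and values, not the notebook's training code.

```python
import numpy as np

def autorec_objective(R_items, mask, V, mu, W, b, lam):
    """Masked squared error over item vectors plus (lam / 2) * (||W||_F^2 + ||V||_F^2).

    R_items: (n, m) array, one row per item vector r^(i); unobserved entries hold a filler.
    mask:    (n, m) array, 1.0 where the rating was observed, 0.0 elsewhere.
    """
    hidden = 1.0 / (1.0 + np.exp(-(R_items @ V.T + mu)))   # (n, k), sigmoid g
    recon = hidden @ W.T + b                                # (n, m), identity f
    err = (R_items - recon) * mask                          # drop unobserved entries
    data_term = np.sum(err ** 2)
    reg_term = 0.5 * lam * (np.sum(W ** 2) + np.sum(V ** 2))
    return data_term + reg_term, recon

# Toy problem: n = 2 items, m = 3 users, k = 2 hidden units; 0 is the filler value.
R_items = np.array([[5.0, 0.0, 3.0],
                    [0.0, 4.0, 0.0]])
mask = (R_items != 0).astype(float)
V = np.zeros((2, 3))
W = np.ones((3, 2))
objective, recon = autorec_objective(R_items, mask, V, np.zeros(2), W, np.zeros(3), lam=2.0)

# Predicted rating of user u for item i is entry u of the reconstruction of r^(i).
u, i = 2, 1
R_hat_ui = recon[i - 1, u - 1]
```

With these toy parameters every reconstructed entry is 1.0, so the data term is $16 + 4 + 9 = 29$ and the penalty is $\frac{2}{2}\cdot 6 = 6$.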

In [3]:
import pandas as pd
import numpy as np
from zipfile import ZipFile
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from pathlib import Path
import matplotlib.pyplot as plt

In [5]:
movielens_data_file_url = (
    "http://files.grouplens.org/datasets/movielens/ml-latest-small.zip"
)
movielens_zipped_file = keras.utils.get_file(
    "ml-latest-small.zip", movielens_data_file_url, extract=False
)
keras_datasets_path = Path(movielens_zipped_file).parents[0]
movielens_dir = keras_datasets_path / "ml-latest-small"

# Only extract the data the first time the script is run.
if not movielens_dir.exists():
    with ZipFile(movielens_zipped_file, "r") as Zip:
        # Extract files
        print("Extracting all the files now...")
        Zip.extractall(path=keras_datasets_path)
        print("Done!")

ratings_file = movielens_dir / "ratings.csv"
df = pd.read_csv(ratings_file)

Extracting all the files now...
Done!


In [7]:
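# Keyword arguments keep this working on pandas versions where pivot no longer accepts positional arguments.
rating_matrix=df.pivot(index="userId",columns="movieId",values="rating")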

In [8]:
rating_matrix=rating_matrix.fillna(0)

In [9]:
rating_matrix=rating_matrix.T

TODO: implement masking.
To be strictly correct, the regularization loss computed along the way also needs to be redone.
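
One way to obtain the mask is to record which entries are observed before filling the gaps with 0. The sketch below uses a small hypothetical ratings table (`toy`) rather than the MovieLens file. Because MovieLens ratings start at 0.5, a mask built from `!= 0` after `fillna(0)` should give the same result on this dataset, but that equivalence is an assumption that would not hold for data containing genuine 0 ratings.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of ratings.csv.
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating":  [4.0, 0.5, 3.0, 5.0],
})

pivoted = toy.pivot(index="userId", columns="movieId", values="rating")

# Rows are movies and columns are users after the transpose (item-based AutoRec).
observed = pivoted.notna().T.to_numpy().astype("float32")    # 1.0 where a rating exists
ratings = pivoted.fillna(0).T.to_numpy().astype("float32")   # 0.0 used as filler

# On data without true zeros, the two ways of building the mask agree.
same_mask = np.array_equal(observed, (ratings != 0).astype("float32"))
```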

https://keras.io/guides/training_with_built_in_methods/



https://keras.io/guides/customizing_what_happens_in_fit/




[This author computes the loss with a custom function.](https://github.com/supkoon/AutoRec-tf/blob/master/AutoRec.py)

# Below is my first draft. I found the mistakes and fixed them.

In [12]:
class AutoRec(keras.layers.Layer):
  def __init__(self, num_col=32, hidden_node=32,rate=10):
    super(AutoRec, self).__init__()
    self.num_col = num_col
    self.hidden_node = hidden_node
    #self.encoder_weight = tf.Variable((num_col,hidden_node),trainable = True)
    #self.decoder_weight = tf.Variable((hidden_node,num_col),trainable = True)
    #self.encoder_bias = tf.Variable((hidden_node,),trainable = True)
    #self.decoder_bias =tf.Variable((num_col,),trainable = True)
    self.encoder_weight = self.add_weight(
            shape=(num_col, hidden_node), initializer="random_normal", trainable=True
        )
    self.decoder_weight = self.add_weight(
            shape=(hidden_node, num_col), initializer="random_normal", trainable=True
        )
    self.encoder_bias = self.add_weight(
            shape=(hidden_node,), initializer="zeros", trainable=True
        )
    self.decoder_bias =self.add_weight(
            shape=(num_col,), initializer="zeros", trainable=True
        )
    self.rate = rate
    self.encoder_regularization = keras.regularizers.l2(rate*0.5)
    self.decoder_regularization = keras.regularizers.l2(rate*0.5)
    
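  def call(self,inputs):
    # Linear encoder: project the input into the hidden space.
    x=tf.matmul(inputs,self.encoder_weight) + self.encoder_bias
    # Linear decoder: reconstruct the ratings.
    out=tf.matmul(x,self.decoder_weight) + self.decoder_bias

    # Register the L2 penalties, (rate / 2) * sum(w ** 2), for both weight matrices.
    self.add_loss(self.encoder_regularization(self.encoder_weight))
    self.add_loss(self.decoder_regularization(self.decoder_weight))

    #out=keras.activations.sigmoid(x)
    return out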



### 1. I kept getting "Missing required positional argument" and could not instantiate the layer; it turned out I was using tf.Variable incorrectly.

### 2. Call super when overriding the constructor.
```
super(AutoRec,self).__init__()
```


### [I got the initial_value argument wrong here. The guide says rewriting it with self.add_weight is a quicker shortcut.](https://keras.io/guides/making_new_layers_and_models_via_subclassing/#the-layer-class-the-combination-of-state-weights-and-some-computation)
#### Probably because the initializer does not have to be created separately outside the call.
```
    self.encoder_weight = tf.Variable((num_col,hidden_node),trainable = True)
    self.decoder_weight = tf.Variable((hidden_node,num_col),trainable = True)
# ->
# After declaring w_init (the initializer to use) beforehand, the variable can be created as follows.

    self.encoder_weight = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype="float32"),
            trainable=True,
        )

```

```
self.encoder_weight = self.add_weight(
            shape=(num_col, hidden_node), initializer="random_normal", trainable=True
        )
self.decoder_weight = self.add_weight(
            shape=(hidden_node, num_col), initializer="random_normal", trainable=True
        )

```


### 3. I had also used tf.Variable for the activation function part; replaced it with add_weight.


### 4. The regularization term needs to be added, and its scale adjusted as well.



## Subclassing keras.Model, including the layer built above.

### I started coding freely and stopped at the part that felt wrong.

#### I do not clearly remember how the tape computes gradients from the loss.
#### I do not remember what train_step returns.

So I pulled in the example below.

```
loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
```

So this is how the loss was computed.

In train_step, what gets recorded inside the tape is the call to self (other approaches are probably possible too).
```
y_pred=self(x,training=True)
```
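
To see what the tape would differentiate once the loss is masked, here is a NumPy-only sketch (no TensorFlow) of a masked mean squared error, its analytic gradient with respect to the predictions, and a finite-difference check. The function names and toy values are assumptions for illustration. The point is that unobserved entries receive exactly zero gradient.

```python
import numpy as np

def masked_mse(y_pred, y, mask):
    """Mean squared error over the observed entries only."""
    return np.sum(mask * (y_pred - y) ** 2) / np.sum(mask)

def masked_mse_grad(y_pred, y, mask):
    """Analytic gradient of masked_mse with respect to y_pred."""
    return 2.0 * mask * (y_pred - y) / np.sum(mask)

y = np.array([[4.0, 0.0, 3.0],
              [0.0, 5.0, 0.0]])          # 0 marks an unobserved rating
mask = (y != 0).astype(float)
y_pred = np.array([[3.5, 2.0, 3.0],
                   [1.0, 4.0, 0.5]])

grad = masked_mse_grad(y_pred, y, mask)

# Finite-difference approximation of the same gradient, entry by entry.
eps = 1e-6
numeric = np.zeros_like(y_pred)
for idx in np.ndindex(y_pred.shape):
    bumped = y_pred.copy()
    bumped[idx] += eps
    numeric[idx] = (masked_mse(bumped, y, mask) - masked_mse(y_pred, y, mask)) / eps
```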

As with layer subclassing, set up the details of the computation through `__init__` and `call`, then define `train_step`.
```
class model(keras.Model):
  def __init__(self,num_col,hidden,rate):
    super(model,self).__init__()  # must run before sub-layers are assigned
    self.autoencoder = AutoRec(num_col,hidden,rate)

  def call(self,input):
    return self.autoencoder(input)

  def train_step(self,data):
    x,y = data
    y = tf.cast(y, tf.float32)
    mask = tf.cast(y != 0, tf.float32)  # 1 where a rating was observed
    with tf.GradientTape() as tape:
      y_pred=self(x,training=True)
      y_pred=tf.multiply(y_pred,mask)  # zero out predictions for unobserved entries
      loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

    trainable_vars = self.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    self.optimizer.apply_gradients(zip(gradients, trainable_vars))
    
    self.compiled_metrics.update_state(y, y_pred)
    
    return {m.name: m.result() for m in self.metrics}
```
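
The note at the top of this notebook (masking was applied, but the loss still averaged over every entry) can be reproduced with a few numbers. The NumPy sketch below uses made-up values and compares three ways of averaging the same masked squared errors: over all entries, as a plain `"mse"` loss would; over observed entries only; and per row over observed entries, then averaged across rows.

```python
import numpy as np

y = np.array([[4.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 4.0]])      # 0 marks an unobserved rating
mask = (y != 0).astype(float)
y_pred = np.array([[3.0, 1.7, 2.2, 0.9],
                   [2.5, 3.0, 1.1, 2.0]])

masked_pred = y_pred * mask               # zero out unobserved predictions
sq_err = (masked_pred - y) ** 2           # nonzero only at observed entries

mse_all = sq_err.mean()                               # divides by all 8 entries -> 0.75
mse_observed = sq_err.sum() / mask.sum()              # divides by the 3 observed -> 2.0
per_row = (sq_err.sum(axis=1) / mask.sum(axis=1)).mean()   # row means, then mean -> 1.75
```

Averaging over all entries shrinks the loss by the sparsity of the matrix, which is why the masked entries have to be excluded from the denominator as well as the numerator.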

In [104]:
class model(keras.Model):
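  '''Got an error, so I decided not to use this approach.
  Even in the official docs I could not find an example that combines this with the train_step below..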
  def __init__(self,num_col,hidden,rate):
    super(model,self).__init__()
    self.autoencoder = AutoRec(num_col,hidden,rate)

  def call(self,input):
    return self.autoencoder(input)
'''
  def train_step(self,data):
    x,y = data
    
    mask = y !=0
    
    mask=tf.cast(mask, tf.float32)
    
    with tf.GradientTape() as tape:
      y_pred=self(x,training=True)
     # print("###########y_pred############333",y_pred)
      #print(mask)
      
      y_pred=tf.multiply(y_pred,mask)
      y=tf.cast(y, tf.float32)
      #print(y_pred-y)
      #print(tf.reduce_sum(tf.math.square(y_pred-y),axis=1))
      #print(tf.reduce_sum(mask,axis =1))
      #print(tf.reduce_sum(tf.math.square(y_pred-y),axis=1)/tf.reduce_sum(mask,axis =1))
      
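      # Shouldn't entries that are exactly 0 be left out of the computation entirely? Yes.
      #print(tf.reduce_sum(mask,axis =1))
      loss=tf.reduce_mean(tf.reduce_sum(tf.math.square(y_pred-y),axis=1)/tf.reduce_sum(mask,axis =1)) # tried a custom loss: per-row mean over observed entries only
      # Note: self.losses (the L2 terms registered via add_loss) is not added to this loss.
      # Open question: I am not sure whether it is OK to compute the loss like this inside tf.GradientTape instead of using compiled_loss.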

      #print(tf.reduce_mean(tmp))
      #loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
    
    #print(loss)
    trainable_vars = self.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    #print("###########gradients############",zip(gradients, trainable_vars))
    self.optimizer.apply_gradients(zip(gradients, trainable_vars))
    
    self.compiled_metrics.update_state(y, y_pred)
    print("\n========================================================")
    print(loss)
    print("========================================================\n")
    return {m.name: m.result() for m in self.metrics}

In [102]:
inputs=layers.Input(shape=(rating_matrix.shape[1],))  # one feature per user
outputs=AutoRec(rating_matrix.shape[1],500,10)(inputs)
m = model(inputs,outputs)
m.compile(optimizer="adam", loss="mse", metrics=["mae"],run_eagerly=True)

In [None]:
m.fit(rating_matrix, rating_matrix, epochs=50)

In [107]:
rating_matrix

movieId \ userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
1,4.0,0.0,0.0,0.0,4.0,0.0,4.5,0.0,0.0,0.0,...,4.0,0.0,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,0.0,0.0,0.0,0.0,0.0,4.0,0.0,4.0,0.0,0.0,...,0.0,4.0,0.0,5.0,3.5,0.0,0.0,2.0,0.0,0.0
3,4.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193583,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193585,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
193587,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [108]:
pd.DataFrame(m.predict(rating_matrix))

,0,1,2,3,4,5,6,7,8,9,...,600,601,602,603,604,605,606,607,608,609
0,5.844972,4.594985,1.525867,4.549614,3.731117,2.047221,4.924703,3.741935,3.002219,2.667315,...,3.210745,4.961127,3.029745,3.997644,3.240583,1.419571,3.897580,2.589986,2.760815,3.974326
1,2.810694,3.138782,1.064634,3.561512,2.369931,4.023575,2.720996,2.523205,3.181528,0.749501,...,3.326495,3.978842,1.504849,4.444207,3.637628,0.207120,3.223299,2.144930,1.701288,1.247058
2,4.213321,2.215713,0.342085,-0.013551,1.875339,5.563408,1.906689,2.049202,1.330980,0.191644,...,3.185682,2.378269,-0.348938,3.034824,1.488632,0.675076,3.161333,1.930621,0.959571,1.528508
3,1.248401,0.823507,0.512388,0.456561,0.989806,2.818503,0.802297,0.875989,0.315764,0.100902,...,1.190710,1.024129,-0.337405,0.945827,-0.090366,-0.309352,0.791049,0.137053,0.877094,-0.430943
4,1.773895,2.392127,1.185333,1.647378,2.181593,5.011390,1.129585,2.101210,2.645389,0.136154,...,2.968727,2.129581,-0.645748,3.138445,0.135776,1.570883,3.757101,0.588425,1.877959,2.476816
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9719,0.963879,1.190617,0.690791,0.624387,0.865361,0.721183,0.464213,0.681566,0.328523,0.159483,...,1.178959,0.750882,0.419192,0.759898,0.146504,0.337749,0.734244,0.100395,0.647216,0.044765
9720,0.961995,1.150426,0.658937,0.592730,0.846558,0.698676,0.463599,0.677393,0.325936,0.165043,...,1.153136,0.770852,0.371332,0.752576,0.150548,0.313324,0.758904,0.106619,0.671012,0.045473
9721,0.961995,1.150426,0.658937,0.592730,0.846558,0.698676,0.463599,0.677393,0.325936,0.165043,...,1.153136,0.770852,0.371332,0.752576,0.150548,0.313324,0.758904,0.106619,0.671012,0.045473
9722,0.961995,1.150426,0.658937,0.592730,0.846558,0.698676,0.463599,0.677393,0.325936,0.165043,...,1.153136,0.770852,0.371332,0.752576,0.150548,0.313324,0.758904,0.106619,0.671012,0.045473


In [None]:
m.predict(rating_matrix.iloc[0:1,:])