### DDTCDR vs. SVD
This notebook validates the generality of the DDTCDR model.
- We use the original code and the partial dataset published on [github.com](https://github.com/lpworld/DDTCDR).
- The train/test split is identical to the one in the original code.
- The evaluation metric is the same as in the original paper: MAE (mean absolute error).
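
As a reminder of what the metric measures, MAE is the average absolute difference between the true and the predicted ratings. The snippet below is a minimal pure-Python sketch; `mean_absolute_error` and the toy ratings are hypothetical names and values chosen for illustration, while the notebook itself relies on `surprise.accuracy.mae`.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ratings."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one rating is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy ratings on the normalized [0, 1] scale (illustrative values only)
true_ratings = [1.0, 0.5, 0.0, 0.75]
predicted = [0.75, 0.5, 0.25, 0.5]
print(mean_absolute_error(true_ratings, predicted))
```

Lower is better, and the value is expressed in the same units as the ratings passed in.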

### 1. Import libraries and datasets

In [1]:
import pandas as pd
from copy import deepcopy

In [2]:
book = pd.read_csv('../DDTCDR/book.csv')
movie = pd.read_csv('../DDTCDR/movie.csv')

### 2. Rating normalization

In [3]:
def normalize(ratings):
    """Scale ratings from [0, max_rating] into [0, 1] without modifying the input."""
    ratings = deepcopy(ratings)
    max_rating = ratings.rating.max()
    ratings['rating'] = ratings.rating * 1.0 / max_rating
    return ratings

In [4]:
book = normalize(book)
movie = normalize(movie)
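
Because `normalize` divides each dataset by its own maximum rating, every error reported later is on the [0, 1] scale. MAE scales linearly with the ratings, so multiplying a normalized MAE by that maximum gives the error on the original scale. The sketch below checks this on toy numbers; the ratings and `max_rating = 5` are assumptions for illustration, not values taken from the datasets.

```python
def mae(y_true, y_pred):
    """Mean absolute error of two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed example values; the real maximum comes from each dataset
max_rating = 5
raw_true = [5, 3, 1]
raw_pred = [4, 3, 3]

# Same ratings after dividing by the maximum, as normalize() does
norm_true = [r / max_rating for r in raw_true]
norm_pred = [r / max_rating for r in raw_pred]

raw_mae = mae(raw_true, raw_pred)
norm_mae = mae(norm_true, norm_pred)

# Rescaling the normalized error recovers the raw-scale error
print(raw_mae, norm_mae * max_rating)
```

Since the book and movie data are normalized independently, their MAE values are directly comparable only if both datasets share the same original rating scale.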

### 3. Train/test split

In [5]:
cut = 4 * len(book) // 5
train_book = book[:cut]
test_book = book[cut:]

cut_movie = 4 * len(movie) // 5
train_movie = movie[:cut_movie]
test_movie = movie[cut_movie:]
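
The split above is positional: the first 80% of the rows become the training set and the remaining 20% the test set, with no shuffling. One consequence worth checking is how many test rows refer to users or items that never occur in training, since a matrix-factorization model has no learned factors for them. The helper below is a hypothetical sketch for that check and is not part of the original code.

```python
def cold_start_fraction(train_ids, test_ids):
    """Fraction of test entries whose ID never appears in the training IDs."""
    seen = set(train_ids)
    test_ids = list(test_ids)
    if not test_ids:
        return 0.0
    unseen = sum(1 for i in test_ids if i not in seen)
    return unseen / len(test_ids)


# Toy user IDs (illustrative only): users 4 and 5 appear only in the test part
train_users = [1, 1, 2, 3]
test_users = [2, 4, 4, 5]
print(cold_start_fraction(train_users, test_users))  # 0.75
```

With the notebook's data this could be called as `cold_start_fraction(train_movie.userId, test_movie.userId)`, and likewise for `itemId` and for the book data.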

### 4.1 Train SVD on the movie training data and evaluate on the test data
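
For context, the Surprise documentation describes its `SVD` algorithm as a biased matrix factorization. Under that description, and with `n_factors = 10` as used in this notebook, the predicted rating and the reported metric can be written as follows, where $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, and $T$ is the set of evaluated ratings. This is a summary of the library's documented model, not a derivation from the DDTCDR paper.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u,
\qquad p_u,\, q_i \in \mathbb{R}^{10}

\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| r_{ui} - \hat{r}_{ui} \right|
```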

In [6]:
from surprise import SVD, Dataset, Reader
from surprise import accuracy
import random
import numpy as np

In [7]:
# Seed the Python and NumPy RNGs for reproducibility
my_seed = 13
random.seed(my_seed)
np.random.seed(my_seed)

# Ratings were normalized into [0, 1] in section 2
reader = Reader(rating_scale=(0, 1))
training_data = Dataset.load_from_df(train_movie[['userId', 'itemId', 'rating']], reader)
testing_data = Dataset.load_from_df(test_movie[['userId', 'itemId', 'rating']], reader)
algo = SVD(verbose=True, n_factors=10)

# Surprise fits on a Trainset and evaluates on a list of (user, item, rating) tuples
training_data = training_data.build_full_trainset()
testing_data = testing_data.build_full_trainset().build_testset()

algo.fit(training_data)

# MAE on the training ratings
training_eval = training_data.build_testset()
train_pre = algo.test(training_eval)
train_mae = accuracy.mae(train_pre, verbose=False)

# MAE on the held-out test ratings
test_pre = algo.test(testing_data)
test_mae = accuracy.mae(test_pre, verbose=False)

Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
Processing epoch 5
Processing epoch 6
Processing epoch 7
Processing epoch 8
Processing epoch 9
Processing epoch 10
Processing epoch 11
Processing epoch 12
Processing epoch 13
Processing epoch 14
Processing epoch 15
Processing epoch 16
Processing epoch 17
Processing epoch 18
Processing epoch 19


In [8]:
print(f"Test MAE of SVD using only in-domain movie data: {round(test_mae, 3)}")

Test MAE of SVD using only in-domain movie data: 0.147


### 4.2 Train SVD on the book training data and evaluate on the test data

In [9]:
# Seed the Python and NumPy RNGs for reproducibility
my_seed = 13
random.seed(my_seed)
np.random.seed(my_seed)

# Ratings were normalized into [0, 1] in section 2
reader = Reader(rating_scale=(0, 1))
training_data = Dataset.load_from_df(train_book[['userId', 'itemId', 'rating']], reader)
testing_data = Dataset.load_from_df(test_book[['userId', 'itemId', 'rating']], reader)
algo = SVD(verbose=True, n_factors=10)

# Surprise fits on a Trainset and evaluates on a list of (user, item, rating) tuples
training_data = training_data.build_full_trainset()
testing_data = testing_data.build_full_trainset().build_testset()

algo.fit(training_data)

# MAE on the training ratings
training_eval = training_data.build_testset()
train_pre = algo.test(training_eval)
train_mae = accuracy.mae(train_pre, verbose=False)

# MAE on the held-out test ratings
test_pre = algo.test(testing_data)
test_mae = accuracy.mae(test_pre, verbose=False)

Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
Processing epoch 5
Processing epoch 6
Processing epoch 7
Processing epoch 8
Processing epoch 9
Processing epoch 10
Processing epoch 11
Processing epoch 12
Processing epoch 13
Processing epoch 14
Processing epoch 15
Processing epoch 16
Processing epoch 17
Processing epoch 18
Processing epoch 19


In [10]:
print(f"Test MAE of SVD using only in-domain book data: {round(test_mae, 3)}")

Test MAE of SVD using only in-domain book data: 0.13
