# Extract User and Item Latent Factors Using Neural Collaborative Filtering

Here we train a neural collaborative filtering (NCF) model to predict ratings from user_ids and item_ids. The model consists of two streams. The Generalized Matrix Factorization (GMF) stream captures the matrix-factorization interaction of the user and item embeddings through their element-wise product. The Multi-Layer Perceptron (MLP) stream learns a non-linear interaction from the concatenated user and item embeddings. The model fuses the outputs of the two streams to predict the rating for a given user and item. After training, the user and item embeddings from the GMF and MLP streams are concatenated and used as the user and item latent factors. These are stored in "user_latent.csv" and "item_latent.csv" in the GCS bucket.
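Written out, the fused prediction below follows the NeuMF formulation of the paper as implemented in model.py. The notation is ours. $\mathbf{p}_u^G, \mathbf{q}_i^G$ are the GMF embeddings (gmf_u_embed, gmf_i_embed) and $\mathbf{p}_u^M, \mathbf{q}_i^M$ are the MLP embeddings (mlp_u_embed, mlp_i_embed), each of size $k$ = latent_num. $\phi^{MLP}$ is the stack of four ReLU dense layers (64, 32, 16, 8 units), and $\mathbf{h}, b$ are the weights and bias of output_layer. The exported user latent factor $\mathbf{x}_u$ is the concatenation of the two user embeddings; the item factor is built the same way from $\mathbf{q}_i^G$ and $\mathbf{q}_i^M$.

$$
\hat{y}_{ui} = \sigma\left(\mathbf{h}^\top \begin{bmatrix} \mathbf{p}_u^G \odot \mathbf{q}_i^G \\ \phi^{MLP}\left(\left[\mathbf{p}_u^M ; \mathbf{q}_i^M\right]\right) \end{bmatrix} + b\right),
\qquad
\mathbf{x}_u = \begin{bmatrix} \mathbf{p}_u^G \\ \mathbf{p}_u^M \end{bmatrix} \in \mathbb{R}^{2k}
$$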

---
The implementation is related to the following paper:
- [Neural Collaborative Filtering](https://arxiv.org/abs/1708.05031)

## 1. import libraries

In [1]:
# import libraries
import os
import pandas as pd
import tensorflow as tf
from google.cloud import bigquery

In [2]:
# set constants
PROJECT = "hybrid-recsys-gcp"
BUCKET = "hybrid-recsys-gcp-bucket"
REGION = 'us-central1'
DATASET = 'news_recommend_dataset'
MODEL = "neural_collaborate_filter_trained_model"

os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION
os.environ["DATASET"] = DATASET
os.environ["MODEL"] = MODEL

## 2. create neural_collaborate_filter package

This is the package for the neural collaborative filtering model. trainer/task.py parses the command-line arguments for training. In trainer/model.py, the "create_dataset" function creates a tf.data.Dataset of user ids, item ids and ratings from a csv file. The "NeuMF_Model" class defines the tf.keras.Model that takes user and item ids and predicts the rating. The "train_model_and_save_latent_factors" function runs a custom training loop, exports the model, and optionally saves the embeddings as latent factors.

The NeuMF_Model uses the following architecture:

<img src="./img/neural_CF.png" width="65%" height="65%" />
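The NumPy sketch below mirrors the forward pass of NeuMF_Model.call on toy sizes so that the tensor shapes in the diagram are visible. It is not the trained model. The table sizes (5 users, 4 items), k = 8 and the random weights are assumptions for illustration. The MLP widths 64-32-16-8, the element-wise GMF product, the two concatenations and the sigmoid output follow model.py.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 8  # toy sizes (assumed); k plays the role of latent_num

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Separate GMF and MLP embedding tables for users and items (random, untrained).
gmf_u = rng.normal(scale=0.1, size=(n_users, k))
gmf_i = rng.normal(scale=0.1, size=(n_items, k))
mlp_u = rng.normal(scale=0.1, size=(n_users, k))
mlp_i = rng.normal(scale=0.1, size=(n_items, k))

# MLP tower 2k -> 64 -> 32 -> 16 -> 8, like mlp_dense_1 .. mlp_dense_4.
widths = [2 * k, 64, 32, 16, 8]
mlp_layers = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(widths[:-1], widths[1:])
]
# Output layer on the fused (k + 8)-dimensional vector.
w_out = rng.normal(scale=0.1, size=(k + widths[-1], 1))
b_out = np.zeros(1)

def neumf_forward(user_idx, item_idx):
    gmf_out = gmf_u[user_idx] * gmf_i[item_idx]                     # (batch, k)
    h = np.concatenate([mlp_u[user_idx], mlp_i[item_idx]], axis=1)  # (batch, 2k)
    for w, b in mlp_layers:
        h = relu(h @ w + b)                                         # ends at (batch, 8)
    fused = np.concatenate([gmf_out, h], axis=1)                    # (batch, k + 8)
    return sigmoid(fused @ w_out + b_out)                           # (batch, 1)

preds = neumf_forward(np.array([0, 1, 4]), np.array([2, 3, 0]))
print(preds.shape)  # (3, 1)
```

The GMF stream has no trainable layer of its own. Its weighting happens entirely in the shared output layer, which is why the embeddings themselves are what get exported as latent factors.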

In [3]:
%%bash
mkdir -p neural_collaborate_filter/trainer
touch neural_collaborate_filter/trainer/__init__.py

In [4]:
%%writefile neural_collaborate_filter/trainer/task.py

import argparse
import tensorflow as tf
from trainer import model

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-dir",
        help="job dir to store training outputs and other data",
        required=True
    )
    
    parser.add_argument(
        "--train_data_path",
        help="path to import train data",
        required=True
    )
    
    parser.add_argument(
        "--test_data_path",
        help="path to import test data",
        required=True
    )
    
    parser.add_argument(
        "--output_dir",
        help="output dir to export checkpoints or trained model",
        required=True
    )
    
    parser.add_argument(
        "--batch_size",
        help="batch size for training",
        type=int,
        default=2048
    )
    
    parser.add_argument(
        "--epochs",
        help="number of epochs for training",
        type=int,
        default=1
    )
    
    parser.add_argument(
        "--latent_num",
        help="number of latent factors for gmf and mlp each",
        type=int,
        default=8
    )
    
    parser.add_argument(
        "--user_id_path",
        help="path to import user_id_list.txt",
        required=True
    )
    
    parser.add_argument(
        "--item_id_path",
        help="path to import item_id_list.txt",
        required=True
    )
    
    parser.add_argument(
        "--user_latent_path",
        help="output path to save user latent factors",
        default="./"
    )
    
    parser.add_argument(
        "--item_latent_path",
        help="output path to save item latent factors",
        default="./"
    )
    
    parser.add_argument(
        "--save_latent_factors",
        help="set to save latent factors",
        default=False,
        action="store_true"
    )
    
    args = parser.parse_args()
    args = args.__dict__
    
    model.train_model_and_save_latent_factors(args)

Writing neural_collaborate_filter/trainer/task.py
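task.py hands the parsed arguments to the trainer as a plain dict (`args.__dict__`, equivalent to `vars(args)`). The standalone sketch below shows what the trainer then receives, using a subset of the same flags. Note that argparse turns the dash in `--job-dir` into an underscore, so the key is `job_dir`. Flags that are not given keep their defaults, and `--save_latent_factors` is `False` unless it is passed.

```python
import argparse

# A subset of the flags defined in trainer/task.py.
parser = argparse.ArgumentParser()
parser.add_argument("--job-dir", required=True)
parser.add_argument("--batch_size", type=int, default=2048)
parser.add_argument("--latent_num", type=int, default=8)
parser.add_argument("--save_latent_factors", default=False, action="store_true")

# Parse an explicit argv list instead of sys.argv for illustration.
args = vars(parser.parse_args(["--job-dir", "./out", "--latent_num", "10"]))
print(args)
```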


In [5]:
%%writefile neural_collaborate_filter/trainer/model.py

import os
import pandas as pd
import tensorflow as tf
import shutil
import datetime

# create dataset function
def create_dataset(path, column_names, label_name, defaults, batch_size, shuffle):
    """ Create tf.dataset from csv file.
    
    Args:
        path (str): Path to the csv file.
        column_names (list:str): List of string to specify which columns to use in dataset (including label).
        label_name (str): Column name for the label.
        defaults (list:str): List of string to set default values for columns.
        batch_size (int): Batch size of the dataset.
        shuffle (bool): True for shuffling dataset and False otherwise.

    Returns:
        (tf.data.Dataset): Dataset of (features, label) batches used for training or testing.
    """
    dataset = tf.data.experimental.make_csv_dataset(
        file_pattern=path,
        select_columns=column_names,
        label_name=label_name,
        column_defaults = defaults,
        batch_size=batch_size,
        num_epochs=1,
        shuffle=shuffle
    )
    return dataset

# model class
class NeuMF_Model(tf.keras.Model):
    """ The NeuMF_Model class. Takes user ids and item ids to predict ratings through matrix factorization and 
    multilayer perceptron. The user and item embeddings are user and item latent factors.

    Attributes:
        user_index_table (tf.lookup.StaticVocabularyTable): Table to transform user_id to user indices.
        item_index_table (tf.lookup.StaticVocabularyTable): Table to transform item_id to item indices.
        
        gmf_u_embed (tf.keras.layers.Embedding): User embedding layer for Generalized Matrix Factorization.
        gmf_i_embed (tf.keras.layers.Embedding): Item embedding layer for Generalized Matrix Factorization.
        mlp_u_embed (tf.keras.layers.Embedding): User embedding layer for Multilayer Perceptron.
        mlp_i_embed (tf.keras.layers.Embedding): Item embedding layer for Multilayer Perceptron.
        
        dense_1 (tf.keras.layers.Dense): First dense layer of multilayer perceptron stream.
        dense_2 (tf.keras.layers.Dense): Second dense layer of multilayer perceptron stream.
        dense_3 (tf.keras.layers.Dense): Third dense layer of multilayer perceptron stream.
        dense_4 (tf.keras.layers.Dense): Fourth dense layer of multilayer perceptron stream.
        output_layer (tf.keras.layers.Dense): Output dense layer for concat of mlp and gmf stream.
        
        mlp_concat (tf.keras.layers.Concatenate): Concatenate layer to combine user and item embedding in mlp stream.
        stream_concat (tf.keras.layers.Concatenate): Concatenate layer to combine mlp and gmf stream. 
    """
    def __init__(self, user_file_path, item_file_path, latent_num):
        """ init method for NeuMF_Model class
        
        Args:
            user_file_path (str): Path to txt file containing user ids.
            item_file_path (str): Path to txt file containing item ids.
            latent_num (int): Number of latent factors for each of the GMF and MLP embeddings.
            
        Returns:
            None
        """
        super(NeuMF_Model, self).__init__()
        self.user_index_table = self.create_lookup_table(user_file_path)
        self.item_index_table = self.create_lookup_table(item_file_path)
        
        user_id_size = self.get_size(user_file_path)
        item_id_size = self.get_size(item_file_path)
        self.gmf_u_embed = tf.keras.layers.Embedding(user_id_size, latent_num, name='gmf_u_embed')
        self.gmf_i_embed = tf.keras.layers.Embedding(item_id_size, latent_num, name='gmf_i_embed')
        self.mlp_u_embed = tf.keras.layers.Embedding(user_id_size, latent_num, name='mlp_u_embed')
        self.mlp_i_embed = tf.keras.layers.Embedding(item_id_size, latent_num, name='mlp_i_embed')

        self.dense_1 = tf.keras.layers.Dense(64, activation='relu', name='mlp_dense_1')
        self.dense_2 = tf.keras.layers.Dense(32, activation='relu', name='mlp_dense_2')
        self.dense_3 = tf.keras.layers.Dense(16, activation='relu', name='mlp_dense_3')
        self.dense_4 = tf.keras.layers.Dense(8, activation='relu', name='mlp_dense_4')
        self.output_layer = tf.keras.layers.Dense(1, activation='sigmoid', name='output_layer')
        
        self.mlp_concat = tf.keras.layers.Concatenate(axis=1, name='mlp_concat')
        self.stream_concat = tf.keras.layers.Concatenate(axis=1, name='stream_concat')

    def create_lookup_table(self, file_path):
        """ create lookup table to translate ids to indices
        
        Args:
            file_path (str): Path to txt file containing ids.
            
        Returns:
            (tf.lookup.StaticVocabularyTable): The lookup table.
        """
        file_initializer = tf.lookup.TextFileInitializer(
            file_path,
            key_dtype=tf.string, key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
            value_dtype=tf.int64, value_index=tf.lookup.TextFileIndex.LINE_NUMBER,
            delimiter="\n")
        lookup_table = tf.lookup.StaticVocabularyTable(file_initializer, num_oov_buckets=1)
        return lookup_table
    
    def get_size(self, file_path):
        """ Get the total number of lines for a txt file, indicating the size of the column.
        
        Args:
            file_path (str): Path to txt file.
            
        Returns:
            (int): The total number of lines for the txt file.
        """
        id_text = tf.io.read_file(file_path)
        id_tensor = tf.strings.split(id_text, '\n')
        return id_tensor.shape[0]

    @tf.function
    def call(self, inputs, training):
        """The call method for NeuMF_Model class.

        Args:
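            inputs (OrderedDict:tf.Tensor): OrderedDict of tensors containing user_id and item_id.
            training (bool): Whether the model is called in training mode (no layer currently depends on it).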
        
        Returns:
            output (tf.Tensor): The predicted rating for the user and item combination.
        """
        user_id = inputs['user_id']
        item_id = inputs['item_id']

        # convert id to index
        user_index = self.user_index_table.lookup(user_id)
        item_index = self.item_index_table.lookup(item_id)

        # GMF stream
        gmf_u_latent = self.gmf_u_embed(user_index)
        gmf_i_latent = self.gmf_i_embed(item_index)

        # multiply latent factors
        gmf_out = gmf_u_latent * gmf_i_latent

        # MLP stream
        mlp_u_latent = self.mlp_u_embed(user_index)
        mlp_i_latent = self.mlp_i_embed(item_index)

        # concat latent factors and pass to dense layers
        mlp_concat_out = self.mlp_concat([mlp_u_latent, mlp_i_latent])
        dense_1_out = self.dense_1(mlp_concat_out)
        dense_2_out = self.dense_2(dense_1_out)
        dense_3_out = self.dense_3(dense_2_out)
        mlp_out = self.dense_4(dense_3_out)

        # concat GMF and MLP stream
        stream_concat_out = self.stream_concat([gmf_out, mlp_out])

        output = self.output_layer(stream_concat_out)
        return output
    
    
def save_latent_factors_to_bucket(col_name, col_path, tensor_weight, output_path, latent_num):
    """Store tensor weights as user or item latent factors to bucket in csv file.

    Args:
        col_name (str): Column name for the latent factors (user or item).
        col_path (str): Path to user or item ids.
        tensor_weight (tf.Tensor): Tensors of the embedding layer used as latent factors.
        output_path (str): Path to output file in bucket.
        latent_num (int): Number of latent factors per stream (the csv holds 2 * latent_num factor columns).

    Returns:
        None
    """
    id_tensors = tf.strings.split(tf.io.read_file(col_path), "\n")
    id_list = [tf.compat.as_str_any(x) for x in id_tensors.numpy()]

    latent_df = pd.DataFrame(tensor_weight)
    latent_df[col_name] = id_list

    key = range(latent_num * 2)
    value = ['{}_latent_'.format(col_name[0]) + str(x) for x in key]
    column_dict = dict(zip(key, value))
    
    latent_df = latent_df.rename(columns=column_dict)
    latent_df = latent_df[[col_name] + value]
    latent_df.to_csv("./latent.csv", index=False)
    
    script = "gsutil mv ./latent.csv {}".format(output_path)
    os.system(script)
    
    
def train_model_and_save_latent_factors(args):
    """ Train the NeuMF_Model and save embeddings as latent_factors to bucket in csv files.

    Args:
        args (dict): dict of arguments from task.py

    Returns:
        None
    """
    # create dataset
    column_name = ['user_id', 'item_id', 'rating']
    label_name = 'rating'
    defaults = ['unknown', 'unknown', 0.0]
    batch_size = args["batch_size"]
    train_path = args["train_data_path"]
    test_path = args["test_data_path"]
    
    train_dataset = create_dataset(train_path, column_name, label_name, defaults, batch_size, True)
    test_dataset = create_dataset(test_path, column_name, label_name, defaults, batch_size, False)
    
    # create model
    model = NeuMF_Model(args["user_id_path"], args["item_id_path"], args["latent_num"])
    
    # loss function and optimizers
    bc_loss_object = tf.keras.losses.BinaryCrossentropy()
    optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001)
    
    # loss metrics
    train_bc_loss = tf.keras.metrics.Mean(name='train_bc_loss')
    train_mae_loss = tf.keras.metrics.MeanAbsoluteError(name='train_mae_loss')
    train_rmse_loss = tf.keras.metrics.RootMeanSquaredError(name='train_rmse_loss')

    test_bc_loss = tf.keras.metrics.Mean(name='test_bc_loss')
    test_mae_loss = tf.keras.metrics.MeanAbsoluteError(name='test_mae_loss')
    test_rmse_loss = tf.keras.metrics.RootMeanSquaredError(name='test_rmse_loss')
    
    
    @tf.function
    def train_step(features, labels):
        """ Run one training step and update the training metrics.

        Args:
            features (OrderedDict:tf.Tensor): OrderedDict of tensor containing user_id and item_id as features.
            labels (tf.Tensor): labels (rating) of the training examples
            
        Returns:
            None
        """
        with tf.GradientTape() as tape:
            preds = model(features, training=True)
            bc_loss = bc_loss_object(labels, preds)
        gradients = tape.gradient(bc_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        train_bc_loss(bc_loss)
        train_mae_loss(labels, preds)
        train_rmse_loss(labels, preds)
    
    @tf.function
    def test_step(features, labels):
        """ Run one evaluation step and update the test metrics.

        Args:
            features (OrderedDict:tf.Tensor): OrderedDict of tensor containing user_id and item_id as features.
            labels (tf.Tensor): labels (rating) of the test examples
            
        Returns:
            None
        """
        preds = model(features, training=False)
        bc_loss = bc_loss_object(labels, preds)
        test_bc_loss(bc_loss)
        test_mae_loss(labels, preds)
        test_rmse_loss(labels, preds)
    
    # custom train loop
    EPOCHS = args["epochs"]
    for epoch in range(EPOCHS):
        train_bc_loss.reset_states()
        train_mae_loss.reset_states()
        train_rmse_loss.reset_states()

        test_bc_loss.reset_states()
        test_mae_loss.reset_states()
        test_rmse_loss.reset_states()

        for features, labels in train_dataset:
            train_step(features, labels)

        for features, labels in test_dataset:
            test_step(features, labels)

        template = "Epoch {:d}, train [bc_loss: {:.5f}, mae_loss: {:.5f}, rmse_loss: {:.5f}], test [bc_loss: {:.5f}, mae_loss: {:.5f}, rmse_loss: {:.5f}]"
        print(template.format(epoch + 1, train_bc_loss.result(), train_mae_loss.result(), train_rmse_loss.result(), \
                                test_bc_loss.result(), test_mae_loss.result(), test_rmse_loss.result()))
        
    # export model
    EXPORT_PATH = os.path.join(args["output_dir"], datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
    tf.saved_model.save(obj=model, export_dir=EXPORT_PATH)
    
    if args["save_latent_factors"]:
        # get embedding weights
        user_weight = tf.concat([model.gmf_u_embed.get_weights()[0], model.mlp_u_embed.get_weights()[0]], axis = 1).numpy()
        item_weight = tf.concat([model.gmf_i_embed.get_weights()[0], model.mlp_i_embed.get_weights()[0]], axis = 1).numpy()

        # store embedding weights to csv
        save_latent_factors_to_bucket('user_id', args["user_id_path"], user_weight, args["user_latent_path"], args["latent_num"])
        save_latent_factors_to_bucket('item_id', args["item_id_path"], item_weight, args["item_latent_path"], args["latent_num"])

Writing neural_collaborate_filter/trainer/model.py
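NeuMF_Model maps raw ids to embedding rows with a StaticVocabularyTable built from the id files. The key is the whole line, the value is its 0-based line number, and there is one OOV bucket, so every unseen id gets the index right after the vocabulary. The pure-Python sketch below imitates that mapping with a dict. build_index and lookup are hypothetical helpers, and the toy ids are made up. It also shows why get_size lines up with the table, assuming the id files end with a newline. In that case splitting on "\n" yields one extra empty entry, so the embedding tables get exactly one spare row for the OOV index. Without a trailing newline, an unseen id would point one row past the end of the embedding table.

```python
def build_index(id_file_text):
    """Hypothetical stand-in for create_lookup_table: id -> 0-based line number."""
    ids = [line for line in id_file_text.split("\n") if line]  # assumes no blank lines inside
    table = {id_: idx for idx, id_ in enumerate(ids)}
    oov_index = len(ids)  # num_oov_buckets=1: every unknown id shares index len(ids)
    return table, oov_index

def lookup(table, oov_index, key):
    return table.get(key, oov_index)

text = "u42\nu7\nu13\n"  # made-up id file ending with a newline
table, oov = build_index(text)
indices = [lookup(table, oov, key) for key in ["u42", "u13", "never_seen"]]
print(indices)  # [0, 2, 3]

# get_size() counts the pieces of tf.strings.split(text, "\n"); the trailing
# newline adds an empty piece, so the embedding table gets 4 rows (indices 0..3).
embedding_rows = len(text.split("\n"))
print(embedding_rows)  # 4
```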


## 3. train model locally

Run the package as a Python module in the local environment. This run omits `--save_latent_factors`, so it only trains and exports the model.

In [6]:
%%bash

JOBDIR=./${MODEL}
OUTDIR=./${MODEL}

rm -rf ${JOBDIR}
export PYTHONPATH=${PYTHONPATH}:${PWD}/neural_collaborate_filter

python -m trainer.task \
    --job-dir=${JOBDIR} \
    --train_data_path=gs://${BUCKET}/${DATASET}/preprocess_train.csv \
    --test_data_path=gs://${BUCKET}/${DATASET}/preprocess_test.csv \
    --output_dir=${OUTDIR} \
    --batch_size=2048 \
    --epochs=8 \
    --latent_num=10 \
    --user_id_path=gs://${BUCKET}/${DATASET}/user_id_list.txt \
    --item_id_path=gs://${BUCKET}/${DATASET}/item_id_list.txt

Epoch 1, train [bc_loss: 0.68107, mae_loss: 0.32247, rmse_loss: 0.35924], test [bc_loss: 0.65923, mae_loss: 0.30864, rmse_loss: 0.34384]
Epoch 2, train [bc_loss: 0.63461, mae_loss: 0.28303, rmse_loss: 0.32761], test [bc_loss: 0.63296, mae_loss: 0.26911, rmse_loss: 0.32561]
Epoch 3, train [bc_loss: 0.59234, mae_loss: 0.24501, rmse_loss: 0.29762], test [bc_loss: 0.62794, mae_loss: 0.25887, rmse_loss: 0.32119]
Epoch 4, train [bc_loss: 0.57712, mae_loss: 0.23196, rmse_loss: 0.28702], test [bc_loss: 0.62337, mae_loss: 0.25424, rmse_loss: 0.31821]
Epoch 5, train [bc_loss: 0.56766, mae_loss: 0.22499, rmse_loss: 0.28077], test [bc_loss: 0.61896, mae_loss: 0.25087, rmse_loss: 0.31504]
Epoch 6, train [bc_loss: 0.55217, mae_loss: 0.21330, rmse_loss: 0.26984], test [bc_loss: 0.61631, mae_loss: 0.24596, rmse_loss: 0.31236]
Epoch 7, train [bc_loss: 0.53069, mae_loss: 0.19702, rmse_loss: 0.25319], test [bc_loss: 0.61959, mae_loss: 0.24323, rmse_loss: 0.31293]
Epoch 8, train [bc_loss: 0.50990, mae_los



## 4. train model on gcloud

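Submit a training job to AI Platform Training with `gcloud ai-platform jobs submit training`. The arguments after the bare `--` are passed to trainer/task.py. Unlike the local run, this job also sets `--save_latent_factors`, so the embeddings are written to "user_latent.csv" and "item_latent.csv" in the bucket.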

In [7]:
%%bash

JOBDIR=gs://${BUCKET}/${MODEL}
OUTDIR=gs://${BUCKET}/${MODEL}
JOBID=neural_collaborate_filter_train_job_$(date -u +%y%m%d_%H%M%S)

gcloud ai-platform jobs submit training ${JOBID} \
    --region=${REGION} \
    --module-name=trainer.task \
    --package-path=$(pwd)/neural_collaborate_filter/trainer \
    --staging-bucket=gs://${BUCKET} \
    --scale-tier=CUSTOM \
    --master-machine-type=n1-highcpu-16 \
    --runtime-version=2.1 \
    --python-version=3.7 \
    -- \
    --job-dir=${JOBDIR} \
    --train_data_path=gs://${BUCKET}/${DATASET}/preprocess_train.csv \
    --test_data_path=gs://${BUCKET}/${DATASET}/preprocess_test.csv \
    --output_dir=${OUTDIR} \
    --batch_size=2048 \
    --epochs=8 \
    --latent_num=10 \
    --user_id_path=gs://${BUCKET}/${DATASET}/user_id_list.txt \
    --item_id_path=gs://${BUCKET}/${DATASET}/item_id_list.txt \
    --user_latent_path=gs://${BUCKET}/${DATASET}/user_latent.csv \
    --item_latent_path=gs://${BUCKET}/${DATASET}/item_latent.csv \
    --save_latent_factors

jobId: neural_collaborate_filter_train_job_200816_172050
state: QUEUED


Job [neural_collaborate_filter_train_job_200816_172050] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe neural_collaborate_filter_train_job_200816_172050

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs neural_collaborate_filter_train_job_200816_172050


The AI Platform training log should look like the following. The final test metrics are bc_loss: 0.60900, mae_loss: 0.24119, rmse_loss: 0.30784.

<img src="./img/ncf_train_log.png" width="80%" height="80%" />
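The three numbers in the log are the binary cross-entropy (the training loss) and the mean absolute and root mean squared errors between predicted and true ratings. The NumPy sketch below computes them for one set of labels and predictions. It assumes, as the binary cross-entropy loss requires, that the ratings in the csv files are scaled to [0, 1], and it clips predictions away from 0 and 1 to keep the logarithms finite. It is a re-implementation for illustration, not the Keras code. One caveat: in the training loop, bc_loss is a `Mean` over per-batch losses while MAE and RMSE accumulate per example, so the logged bc_loss can differ slightly from a whole-set computation when the last batch is smaller.

```python
import numpy as np

def ncf_metrics(labels, preds, eps=1e-7):
    """Binary cross-entropy, MAE and RMSE for ratings in [0, 1] (illustrative)."""
    labels = np.asarray(labels, dtype=float)
    preds = np.clip(np.asarray(preds, dtype=float), eps, 1 - eps)
    bc = -np.mean(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
    mae = np.mean(np.abs(labels - preds))
    rmse = np.sqrt(np.mean((labels - preds) ** 2))
    return bc, mae, rmse

bc, mae, rmse = ncf_metrics([1.0, 0.0], [0.8, 0.2])
print(round(bc, 5), round(mae, 5), round(rmse, 5))  # 0.22314 0.2 0.2
```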

## 5. save latent factors in bigquery dataset

Load the user and item latent factors from "user_latent.csv" and "item_latent.csv" in the GCS bucket into the BigQuery tables "user_latent" and "item_latent".

In [8]:
def load_csv_to_bigquery_table(project_id, dataset_id, table_id, schema, source_uri):
    """ Load content from a csv file in GCS into a bigquery table.

    Args:
        project_id (str): ID of the project.
        dataset_id (str): ID of the dataset.
        table_id (str): ID of the table.
        schema (list:bigquery.SchemaField): Schema of the destination table.
        source_uri (str): GCS path to the csv file.

    Returns:
        None
    """
    client = bigquery.Client(project=project_id)
    table_ref = bigquery.DatasetReference(project_id, dataset_id).table(table_id)
    
    job_config = bigquery.LoadJobConfig()
    job_config.schema = schema
    # WRITE_EMPTY makes the load fail if the table already contains data
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_EMPTY
    job_config.skip_leading_rows = 1  # skip the csv header row
    job_config.source_format = bigquery.SourceFormat.CSV
    
    load_job = client.load_table_from_uri(source_uri, table_ref, job_config=job_config)
    print("Starting job {}".format(load_job.job_id))

    load_job.result()  # Waits for table load to complete.
    print("Job finished.")

    destination_table = client.get_table(table_ref)
    print("Loaded {} rows.".format(destination_table.num_rows))

In [9]:
latent_num = 10

user_schema = [bigquery.SchemaField("user_id", "STRING", mode="REQUIRED")] + \
            [bigquery.SchemaField("user_latent_{}".format(i), "FLOAT", mode="REQUIRED") for i in range(2 * latent_num)]

item_schema = [bigquery.SchemaField("item_id", "STRING", mode="REQUIRED")] + \
            [bigquery.SchemaField("item_latent_{}".format(i), "FLOAT", mode="REQUIRED") for i in range(2 * latent_num)]

load_csv_to_bigquery_table(PROJECT, DATASET, "user_latent", user_schema, "gs://{}/{}/user_latent.csv".format(BUCKET, DATASET))
load_csv_to_bigquery_table(PROJECT, DATASET, "item_latent", item_schema, "gs://{}/{}/item_latent.csv".format(BUCKET, DATASET))

Starting job 93be1920-f263-4a6c-9b3e-6fc7a750e100
Job finished.
Loaded 16313 rows.
Starting job ef578c38-4c96-4f9c-a70f-985cce1c30fe
Job finished.
Loaded 2421 rows.
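The csv header written by save_latent_factors_to_bucket uses u_latent_* / i_latent_* names, while the table schema uses user_latent_* / item_latent_*. As we understand BigQuery's csv loader, columns are matched to an explicit schema by position and the header row is skipped here (skip_leading_rows=1), so only the column count and order matter. The hypothetical helper below checks the count before loading. The sample text is made up.

```python
import csv
import io

def check_latent_csv(csv_text, latent_num):
    """Hypothetical helper: return the header row after checking the column count."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = 1 + 2 * latent_num  # id column + GMF half + MLP half
    if len(header) != expected:
        raise ValueError(f"expected {expected} columns, found {len(header)}")
    return header

# Made-up two-line sample in the layout written by save_latent_factors_to_bucket.
sample = (
    "user_id," + ",".join(f"u_latent_{i}" for i in range(20)) + "\n"
    + "123," + ",".join("0.1" for _ in range(20)) + "\n"
)
print(len(check_latent_csv(sample, latent_num=10)))  # 21
```

On the real files, something like `check_latent_csv(open("./{}/user_latent.csv".format(DATASET)).read(), latent_num)` would apply the same check after downloading them.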


## 6. view user and item latent factors

In [10]:
!gsutil cp gs://{BUCKET}/{DATASET}/user_latent.csv ./{DATASET}/user_latent.csv
!gsutil cp gs://{BUCKET}/{DATASET}/item_latent.csv ./{DATASET}/item_latent.csv

Copying gs://hybrid-recsys-gcp-bucket/news_recommend_dataset/user_latent.csv...
/ [1 files][  4.1 MiB/  4.1 MiB]                                                
Operation completed over 1 objects/4.1 MiB.                                      
Copying gs://hybrid-recsys-gcp-bucket/news_recommend_dataset/item_latent.csv...
/ [1 files][594.3 KiB/594.3 KiB]                                                
Operation completed over 1 objects/594.3 KiB.                                    


In [11]:
user_df = pd.read_csv("./{}/user_latent.csv".format(DATASET))
item_df = pd.read_csv("./{}/item_latent.csv".format(DATASET))

In [12]:
user_df.head()

| | user_id | u_latent_0 | u_latent_1 | u_latent_2 | u_latent_3 | u_latent_4 | u_latent_5 | u_latent_6 | u_latent_7 | u_latent_8 | ... | u_latent_10 | u_latent_11 | u_latent_12 | u_latent_13 | u_latent_14 | u_latent_15 | u_latent_16 | u_latent_17 | u_latent_18 | u_latent_19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000163602560555666 | -0.22389 | 0.013522 | -0.197976 | 0.217497 | -0.053681 | -0.00672 | -0.117004 | 0.113265 | 0.187496 | ... | -0.043796 | -0.084263 | -0.035453 | 0.039801 | -0.007067 | 0.01638 | -0.042641 | -0.033181 | 0.028918 | 0.054818 |
| 1 | 1000196974485173657 | -0.033306 | 0.020547 | 0.104502 | -0.003414 | 0.063732 | 0.086023 | -0.06237 | 0.030699 | -0.115149 | ... | -0.034464 | -0.007665 | -0.09231 | -0.005157 | 0.033543 | -0.067541 | -0.027155 | 0.054411 | 0.026142 | 0.006107 |
| 2 | 1002090131595000997 | -0.179897 | -0.139295 | 0.073862 | -0.047588 | 0.047952 | -0.000489 | 0.117391 | 0.058213 | -0.077938 | ... | -0.012818 | -0.001946 | 0.033016 | 0.063178 | 0.066878 | -0.069696 | -0.025472 | -0.085891 | 0.034603 | 0.034044 |
| 3 | 1002109532017576768 | -0.079408 | -0.174885 | 0.014121 | -0.081578 | 0.140167 | -0.137453 | 0.088288 | 0.162533 | -0.106551 | ... | 0.016511 | 0.027259 | -0.065256 | -0.02304 | -0.099605 | 0.058657 | 0.018901 | 0.037335 | -0.014985 | -0.017714 |
| 4 | 1004209053768679755 | -0.000192 | -0.134218 | 0.076557 | -0.169822 | -0.072396 | 0.000815 | -0.026878 | -0.070867 | 0.092746 | ... | 0.002242 | 0.044778 | 0.013842 | 0.02391 | 0.012564 | 0.0512 | 0.04679 | 0.003449 | -0.023228 | 0.038247 |


In [13]:
item_df.head()

| | item_id | i_latent_0 | i_latent_1 | i_latent_2 | i_latent_3 | i_latent_4 | i_latent_5 | i_latent_6 | i_latent_7 | i_latent_8 | ... | i_latent_10 | i_latent_11 | i_latent_12 | i_latent_13 | i_latent_14 | i_latent_15 | i_latent_16 | i_latent_17 | i_latent_18 | i_latent_19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100170790 | -0.044401 | -0.054478 | -0.024215 | -0.095297 | 0.030977 | -0.051534 | -0.087727 | 0.066595 | -0.116718 | ... | 0.027045 | -0.022293 | -0.038569 | 0.042152 | -0.046344 | 0.025076 | -0.079884 | -0.023431 | 0.031445 | -0.052044 |
| 1 | 100292889 | 0.044174 | 0.018957 | -0.020329 | 0.005043 | -0.066686 | -0.046977 | -0.011907 | 0.023122 | -0.024344 | ... | 0.048059 | -0.057492 | -0.052943 | 0.007809 | -0.061653 | 0.076567 | 0.009918 | -0.028304 | 0.027838 | -0.07518 |
| 2 | 100735153 | 0.004435 | -0.092585 | -0.101787 | -0.067878 | 0.077632 | 0.000198 | -0.068222 | 0.012467 | -0.053971 | ... | -0.001989 | 0.011519 | 0.056933 | -0.082168 | 0.075539 | -0.025724 | 0.038086 | 0.067758 | -0.04377 | -0.009335 |
| 3 | 100915139 | 0.009406 | 0.015461 | -0.031027 | -0.006515 | 0.015776 | -0.004458 | 0.006125 | -0.020394 | -0.046054 | ... | 0.044172 | 0.006846 | -0.025532 | -0.00883 | -0.017709 | 0.037827 | 0.035382 | 0.00554 | -0.034189 | -0.043504 |
| 4 | 101092112 | -0.063698 | 0.059048 | 0.049322 | -0.023419 | 0.039215 | 0.03699 | 0.013302 | -0.031852 | -0.001982 | ... | 0.043176 | -0.017732 | -0.049217 | 0.024714 | 0.000673 | 0.061918 | 0.006136 | -0.008226 | 0.010462 | 0.009586 |
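Because train_model_and_save_latent_factors concatenates the GMF embedding before the MLP embedding (`tf.concat([gmf, mlp], axis=1)`), the first latent_num factor columns of each row come from the GMF stream and the remaining latent_num come from the MLP stream. The pandas sketch below splits them apart. split_latent_factors is a hypothetical helper, and the toy frame only imitates the column layout of the files above.

```python
import numpy as np
import pandas as pd

def split_latent_factors(df, id_col, prefix, latent_num):
    """Hypothetical helper: split [GMF | MLP] factor columns into two arrays."""
    cols = [f"{prefix}_latent_{i}" for i in range(2 * latent_num)]
    gmf = df[cols[:latent_num]].to_numpy()  # columns 0 .. latent_num - 1
    mlp = df[cols[latent_num:]].to_numpy()  # columns latent_num .. 2 * latent_num - 1
    return df[id_col].to_numpy(), gmf, mlp

# Toy frame with the same column layout as user_latent.csv (values are made up).
latent_num = 10
toy = pd.DataFrame(
    np.arange(2 * 2 * latent_num, dtype=float).reshape(2, 2 * latent_num),
    columns=[f"u_latent_{i}" for i in range(2 * latent_num)],
)
toy.insert(0, "user_id", ["a", "b"])
ids, gmf, mlp = split_latent_factors(toy, "user_id", "u", latent_num)
print(gmf.shape, mlp.shape)  # (2, 10) (2, 10)
```

With the notebook's data, `split_latent_factors(user_df, "user_id", "u", 10)` and `split_latent_factors(item_df, "item_id", "i", 10)` would apply the same split.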
