## Content-Based Filtering Using Neural Networks

This notebook relies on files created in the [content_based_preproc.ipynb](./content_based_preproc.ipynb) notebook. Be sure to run the code in there before completing this notebook.  
Also, we'll be using the **python3** kernel from here on out, so don't forget to change the kernel if it's still set to Python 2.

This lab illustrates:
1. how to build feature columns for a model using `tf.feature_column`
2. how to create custom evaluation metrics and add them to TensorBoard
3. how to train a model and make predictions with the saved model

TensorFlow Hub should already be installed. You can check with `pip freeze`:

In [None]:
%%bash
pip freeze | grep tensor

tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorboardcolab==0.0.22
tensorflow==2.3.0
tensorflow-addons==0.8.3
tensorflow-datasets==4.0.1
tensorflow-estimator==2.3.0
tensorflow-gcs-config==2.3.0
tensorflow-hub==0.10.0
tensorflow-metadata==0.25.0
tensorflow-privacy==0.2.2
tensorflow-probability==0.11.0


Let's make sure we install the versions this notebook needs: tensorflow-hub 0.7.0 and TensorFlow 1.15.3 (the code below uses TensorFlow 1.x APIs such as `tf.layers` and `tf.contrib`). After running the pip install below, click **"Restart the kernel"** so that the Python environment picks up the new packages.

In [None]:
!pip3 install tensorflow-hub==0.7.0
!pip3 install --upgrade tensorflow==1.15.3
#!pip3 install google-cloud-bigquery==1.10

In [None]:
import os
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
import shutil



### Build the feature columns for the model.

To start, we'll load the list of categories, authors and article ids we created in the previous **Create Datasets** notebook.

In [None]:
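# Vocabularies written by the preprocessing notebook
with open("categories.txt") as f:
    categories_list = f.read().splitlines()
print(categories_list)
with open("authors.txt") as f:
    authors_list = f.read().splitlines()
print(authors_list[:4])
with open("content_ids.txt") as f:
    content_ids_list = f.read().splitlines()
print(content_ids_list[:4])
# Used as the default for missing months_since_epoch values in the input function
mean_months_since_epoch = 523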

['News', 'Stars & Kultur', 'Lifestyle']
['Marlene Patsalidis', 'Yvonne Widler', 'Thomas  Trescher', 'Johanna Hager']
['299965853', '299972248', '299410466', '299937546']


In [None]:
cols = ['visitor_id', 'content_id', 'category', 'title', 'author', 'months_since_epoch', 'next_content_id']

In [None]:
import pandas as pd
training_set = pd.read_csv('training_set.csv',names=cols)
training_set.head(2)

| | visitor_id | content_id | category | title | author | months_since_epoch | next_content_id |
|---|---|---|---|---|---|---|---|
| 0 | 1042795765758282508 | 299964154 | News | Neue Seidenstraße: Ein chinesischer Keil in Eu... | Hermann Sileitsch-Parzer | 574.0 | 299848776.0 |
| 1 | 1056627016396469139 | 299821418 | Lifestyle | Missbrauchsvorwürfe gegen US-Wellnesskette | Elisabeth Mittendorfer | 574.0 | 299852437.0 |


In [None]:
training_set.isnull().sum()

visitor_id                0
content_id                0
category               1441
title                     1
author                38378
months_since_epoch        1
next_content_id           1
dtype: int64

In [None]:
test_set = pd.read_csv('test_set.csv', names=cols)
test_set.head(2)

| | visitor_id | content_id | category | title | author | months_since_epoch | next_content_id |
|---|---|---|---|---|---|---|---|
| 0 | 1041177675528771456 | 60544607 | Stars & Kultur | Wien Museum: Hier parkt Bruno Kreiskys Rover | Barbara Mader | 531 | 60546015 |
| 1 | 1389130989249956043 | 299830996 | News | Wie die Schule in der Neuzeit ankommen könnte | Martina Salomon | 574 | 299933565 |


In the cell below we'll define the feature columns to use in our model. If necessary, remind yourself of the [various feature columns](https://www.tensorflow.org/api_docs/python/tf/feature_column) available.  
For the `embedded_title_column` feature column, use a TensorFlow Hub module to create an embedding of the article title. Since the articles and titles are in German, you'll want a German-language embedding module.  
Explore the text embedding TensorFlow Hub modules [available here](https://alpha.tfhub.dev/). Filter by setting the language to 'German'. The 50-dimensional embedding should be sufficient for our purposes.
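For intuition about what `hub.text_embedding_column` produces for a title, here is a toy sketch in plain NumPy. The TF Hub documentation for the NNLM modules describes splitting the input on spaces and combining the word vectors with a "sqrtn" combiner (sum divided by the square root of the number of words). The function `sqrtn_sentence_embedding`, the two-word vocabulary and the single shared out-of-vocabulary vector are hypothetical simplifications; they are not the module's real weights or its real OOV handling.

```python
import numpy as np

def sqrtn_sentence_embedding(sentence, word_vectors, oov_vector):
    """Toy sentence embedding: split on whitespace, look up each token's vector
    (falling back to one shared OOV vector), then sum and divide by sqrt(n)."""
    tokens = sentence.split()
    if not tokens:
        # No tokens: return a zero vector of the right size
        return np.zeros_like(oov_vector)
    vectors = [word_vectors.get(token, oov_vector) for token in tokens]
    return np.sum(vectors, axis=0) / np.sqrt(len(tokens))

# Hypothetical 2-dimensional vocabulary (the German NNLM module uses 50 dimensions)
word_vectors = {"Wien": np.array([1.0, 0.0]), "Museum": np.array([0.0, 1.0])}
oov_vector = np.zeros(2)

embedding = sqrtn_sentence_embedding("Wien Museum", word_vectors, oov_vector)
print(embedding)
```

Because of the square-root normalization, long and short titles end up on a comparable scale, while the title's words still determine the direction of the vector.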

In [None]:
# Title: frozen, pre-trained 50-dimensional German NNLM text embedding from TF Hub
embedded_title_column = hub.text_embedding_column(
    key="title",
    module_spec="https://tfhub.dev/google/nnlm-de-dim50/1",
    trainable=False)

# Content id: hashed into len(content_ids_list) + 1 buckets, then embedded in 20 dimensions
content_id_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="content_id",
    hash_bucket_size=len(content_ids_list) + 1)

embedded_content_column = tf.feature_column.embedding_column(
    categorical_column=content_id_column,
    dimension=20)

# Author: hashed into len(authors_list) + 1 buckets, then embedded in 10 dimensions
author_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="author",
    hash_bucket_size=len(authors_list) + 1)

embedded_author_column = tf.feature_column.embedding_column(
    categorical_column=author_column,
    dimension=10)

# Category: fixed vocabulary plus one out-of-vocabulary bucket, one-hot encoded
category_column_categorical = tf.feature_column.categorical_column_with_vocabulary_list(
    key="category",
    vocabulary_list=categories_list,
    num_oov_buckets=1)

category_column = tf.feature_column.indicator_column(category_column_categorical)

# months_since_epoch: bucketized into 20-month ranges with boundaries from 400 to 680
months_since_epoch_boundaries = list(range(400, 700, 20))
print("buckets : {}".format(months_since_epoch_boundaries))

months_since_epoch_column = tf.feature_column.numeric_column(
    key="months_since_epoch")

months_since_epoch_bucketized = tf.feature_column.bucketized_column(
    source_column=months_since_epoch_column,
    boundaries=months_since_epoch_boundaries)

# Cross category with the month bucket so the model can learn category/time-period combinations
crossed_months_since_category_column = tf.feature_column.indicator_column(
    tf.feature_column.crossed_column(
        keys=[category_column_categorical, months_since_epoch_bucketized],
        hash_bucket_size=len(months_since_epoch_boundaries) * (len(categories_list) + 1)))

feature_columns = [embedded_content_column,
                   embedded_author_column,
                   category_column,
                   embedded_title_column,
                   crossed_months_since_category_column]

buckets : [400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600, 620, 640, 660, 680]
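To see how `bucketized_column` assigns buckets, and how large the space behind the crossed column is, here is a small NumPy sketch (not TensorFlow code). TensorFlow documents bucket boundaries as left-inclusive and right-exclusive, which is the same rule `np.digitize(..., right=False)` applies. The `bucketize` helper and the sample month values are illustrative; the boundaries and categories are copied from the cells above.

```python
import numpy as np

# Copied from the feature-column cell and the printed categories above
boundaries = list(range(400, 700, 20))
categories = ['News', 'Stars & Kultur', 'Lifestyle']

def bucketize(values, boundaries):
    """Bucket ids with left-inclusive, right-exclusive intervals:
    0 for v < boundaries[0], i for boundaries[i-1] <= v < boundaries[i],
    and len(boundaries) for v >= boundaries[-1]."""
    return np.digitize(values, boundaries, right=False)

month_buckets = bucketize(np.array([350, 400, 523, 574, 700]), boundaries)
print(month_buckets)

# Size of the space the crossed column hashes from
n_category_ids = len(categories) + 1    # +1 for num_oov_buckets=1
n_month_buckets = len(boundaries) + 1   # one more bucket than there are boundaries
n_pairs = n_category_ids * n_month_buckets
hash_bucket_size = len(boundaries) * (len(categories) + 1)
print(n_pairs, hash_bucket_size)
```

With 4 category ids and 16 month buckets there are 64 possible (category, month bucket) pairs, but the crossed column hashes them into 60 buckets, so some pairs must share a bucket. A larger `hash_bucket_size` would make such collisions less likely.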


### Create the input function.

Next we'll create the input function for our model. This input function reads the data from the csv files we created in the previous labs. 

In [None]:
# Column defaults for tf.io.decode_csv: empty fields are filled with these values,
# and each default's type (string, or int for months_since_epoch) sets the column's dtype.
record_defaults = [["Unknown"], ["Unknown"], ["Unknown"], ["Unknown"], ["Unknown"],
                   [mean_months_since_epoch], ["Unknown"]]
column_keys = ["visitor_id", "content_id", "category", "title", "author",
               "months_since_epoch", "next_content_id"]
label_key = "next_content_id"

def read_dataset(filename, mode, batch_size=1024):
    def _input_fn():
        def decode_csv(value_column):
            columns = tf.io.decode_csv(value_column, record_defaults=record_defaults)
            features = dict(zip(column_keys, columns))
            label = features.pop(label_key)
            return features, label

        # Create list of files that match pattern
        file_list = tf.io.gfile.glob(filename)

        # Create dataset from file list
        dataset = tf.data.TextLineDataset(file_list).map(decode_csv)

        if mode == tf.estimator.ModeKeys.TRAIN:
            num_epochs = None  # repeat indefinitely
            dataset = dataset.shuffle(buffer_size=10 * batch_size)
        else:
            num_epochs = 1  # end-of-input after one pass

        dataset = dataset.repeat(num_epochs).batch(batch_size)
        # Estimators accept a tf.data.Dataset of (features, label) directly,
        # so no explicit one-shot iterator is needed
        return dataset
    return _input_fn
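The `decode_csv` step can be hard to picture because it runs inside the TensorFlow graph. Below is a plain-Python sketch of what it does to a single line: empty fields take the matching default, the default's type decides how a non-empty field is parsed, and the label is popped out of the feature dict. `decode_line` and the constants are hypothetical names for illustration; like `tf.io.decode_csv`, the sketch rejects lines with the wrong number of fields. The sample line comes from `training_set.csv` and has an empty author field.

```python
import csv
import io

COLUMN_KEYS = ["visitor_id", "content_id", "category", "title", "author",
               "months_since_epoch", "next_content_id"]
LABEL_KEY = "next_content_id"
# Mirrors record_defaults: every column is a string except months_since_epoch,
# whose default is mean_months_since_epoch (523)
DEFAULTS = ["Unknown"] * 5 + [523, "Unknown"]

def decode_line(line):
    """Parse one CSV line into (features, label), filling empty fields with defaults."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) != len(DEFAULTS):
        raise ValueError("expected %d fields, got %d" % (len(DEFAULTS), len(fields)))
    values = []
    for raw, default in zip(fields, DEFAULTS):
        # The default's type decides how a non-empty field is parsed
        values.append(default if raw == "" else type(default)(raw))
    features = dict(zip(COLUMN_KEYS, values))
    label = features.pop(LABEL_KEY)
    return features, label

line = "106377193142039719,299913879,News,Python erwürgte thailändischen Besitzer,,574,299907204"
features, label = decode_line(line)
print(features["author"], features["months_since_epoch"], label)
```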

### Create the model and train/evaluate


Next, we'll build our model, which recommends an article for a visitor to the Kurier.at website. Look through the code below. We use `tf.feature_column.input_layer` to turn the feature columns into the dense input layer of our network, followed by a stack of fully connected ReLU layers whose sizes are set by the `hidden_units` parameter (here `[200, 100, 80, 30]`).

Besides accuracy (how often the predicted next article matches the article the visitor actually read next), we'll also use top-10 accuracy to assess the model: the fraction of examples where the actual next article is among the model's 10 highest-scoring articles. To do this, we compute the metric with `tf.nn.in_top_k`, add it to the metrics dictionary below, and log it with `tf.summary.scalar` so that it is reported to TensorBoard as well.

In [None]:
from tensorflow.python.lib.io import file_io

def model_fn(features, labels, mode, params):
    print("No of classes : ", params['n_classes'])
    # Dense input layer built from the feature columns
    net = tf.feature_column.input_layer(features, params['feature_columns'])
    # Stack of fully connected ReLU layers, one per entry in hidden_units
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
        print("layer : ", net)

    # Compute logits (1 per class).
    logits = tf.layers.dense(net, params['n_classes'], activation=None)

    # Map each predicted class index back to its content id
    predicted_classes = tf.argmax(logits, 1)
    with file_io.FileIO('content_ids.txt', mode='r') as ifp:
        content = tf.constant([x.rstrip() for x in ifp])
    predicted_class_names = tf.gather(content, predicted_classes)

    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class_ids': predicted_classes[:, tf.newaxis],
            'class_names': predicted_class_names[:, tf.newaxis],
            'probabilities': tf.nn.softmax(logits),
            'logits': logits,
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Convert the string labels (content ids) to class indices
    table = tf.contrib.lookup.index_table_from_file(vocabulary_file="content_ids.txt")
    labels = table.lookup(labels)

    # Compute loss.
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Compute evaluation metrics.
    accuracy = tf.metrics.accuracy(labels=labels,
                                   predictions=predicted_classes,
                                   name='acc_op')
    # Fraction of examples whose true next article is among the 10 highest logits
    top_10_accuracy = tf.metrics.mean(tf.nn.in_top_k(predictions=logits,
                                                     targets=labels,
                                                     k=10))

    metrics = {
        'accuracy': accuracy,
        'top_10_accuracy': top_10_accuracy}

    # Report the metrics to TensorBoard as well
    tf.summary.scalar('accuracy', accuracy[1])
    tf.summary.scalar('top_10_accuracy', top_10_accuracy[1])

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(
            mode, loss=loss, eval_metric_ops=metrics)

    # Create training op.
    assert mode == tf.estimator.ModeKeys.TRAIN

    optimizer = tf.train.AdagradOptimizer(learning_rate=0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
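The `top_10_accuracy` metric relies on `tf.nn.in_top_k`, which marks a row as a hit when fewer than k classes score strictly higher than the target class, so ties at the boundary count as hits. Here is a NumPy sketch of that rule for a single batch. `top_k_accuracy` is a hypothetical helper; it ignores `in_top_k`'s special handling of non-finite scores and, unlike `tf.metrics.mean`, it does not accumulate a running mean across batches.

```python
import numpy as np

def top_k_accuracy(logits, targets, k=10):
    """Fraction of rows whose target class is in the top k, using in_top_k's tie rule:
    the target is a hit if fewer than k classes score strictly higher."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets)
    target_scores = logits[np.arange(len(targets)), targets]
    n_higher = (logits > target_scores[:, None]).sum(axis=1)
    return float(np.mean(n_higher < k))

logits = np.array([[0.1, 0.9, 0.0],   # target 1 is ranked first
                   [0.8, 0.1, 0.5],   # target 1 is ranked third
                   [0.3, 0.3, 0.3]])  # all tied, so target 2 counts as top-1
targets = [1, 1, 2]
print(top_k_accuracy(logits, targets, k=1), top_k_accuracy(logits, targets, k=3))
```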

### Train and Evaluate

In [None]:
outdir = 'content_based_model_trained'
shutil.rmtree(outdir, ignore_errors = True) # start fresh each time
#tf.summary.FileWriterCache.clear() # ensure filewriter cache is clear for TensorBoard events file
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir = outdir,
    params={
        'feature_columns': feature_columns,
        'hidden_units': [200, 100, 80, 30],
        'n_classes': len(content_ids_list)
    })

train_spec = tf.estimator.TrainSpec(
    input_fn = read_dataset("training_set.csv", tf.estimator.ModeKeys.TRAIN),
    max_steps = 2000)

eval_spec = tf.estimator.EvalSpec(
    input_fn = read_dataset("test_set.csv", tf.estimator.ModeKeys.EVAL),
    steps = None,
    start_delay_secs = 30,
    throttle_secs = 60)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

INFO:tensorflow:Using default config.


INFO:tensorflow:Using config: {'_model_dir': 'content_based_model_trained', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f22c5daabe0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


INFO:tensorflow:Not using Distribute Coordinator.


INFO:tensorflow:Running training and evaluation locally (non-distributed).


INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.


INFO:tensorflow:Calling model_fn.


No of classes :  15634
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


layer :  Tensor("dense/Relu:0", shape=(?, 200), dtype=float32)
layer :  Tensor("dense_1/Relu:0", shape=(?, 100), dtype=float32)
layer :  Tensor("dense_2/Relu:0", shape=(?, 80), dtype=float32)
layer :  Tensor("dense_3/Relu:0", shape=(?, 30), dtype=float32)
INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Saving checkpoints for 0 into content_based_model_trained/model.ckpt.


INFO:tensorflow:loss = 9.657534, step = 1


INFO:tensorflow:global_step/sec: 3.79563


INFO:tensorflow:loss = 9.627903, step = 101 (26.352 sec)


INFO:tensorflow:global_step/sec: 3.7774


INFO:tensorflow:loss = 9.5912075, step = 201 (26.476 sec)


INFO:tensorflow:global_step/sec: 3.75839


INFO:tensorflow:loss = 9.572741, step = 301 (26.603 sec)


INFO:tensorflow:global_step/sec: 3.75014


INFO:tensorflow:loss = 9.4767475, step = 401 (26.663 sec)


INFO:tensorflow:global_step/sec: 3.8072


INFO:tensorflow:loss = 9.119762, step = 501 (26.267 sec)


INFO:tensorflow:global_step/sec: 3.79103


INFO:tensorflow:loss = 5.284913, step = 601 (26.381 sec)


INFO:tensorflow:global_step/sec: 3.78294


INFO:tensorflow:loss = 6.030287, step = 701 (26.439 sec)


INFO:tensorflow:global_step/sec: 3.81217


INFO:tensorflow:loss = 4.8275557, step = 801 (26.224 sec)


INFO:tensorflow:global_step/sec: 3.79006


INFO:tensorflow:loss = 4.4827695, step = 901 (26.388 sec)


INFO:tensorflow:global_step/sec: 3.76141


INFO:tensorflow:loss = 5.041579, step = 1001 (26.586 sec)


INFO:tensorflow:global_step/sec: 3.76801


INFO:tensorflow:loss = 4.816439, step = 1101 (26.539 sec)


INFO:tensorflow:global_step/sec: 3.77174


INFO:tensorflow:loss = 5.854385, step = 1201 (26.513 sec)


INFO:tensorflow:global_step/sec: 3.82562


INFO:tensorflow:loss = 4.939856, step = 1301 (26.140 sec)


INFO:tensorflow:global_step/sec: 3.82434


INFO:tensorflow:loss = 5.720622, step = 1401 (26.145 sec)


INFO:tensorflow:global_step/sec: 3.82544


INFO:tensorflow:loss = 4.9620523, step = 1501 (26.143 sec)


INFO:tensorflow:global_step/sec: 3.80688


INFO:tensorflow:loss = 5.863186, step = 1601 (26.269 sec)


INFO:tensorflow:global_step/sec: 3.83157


INFO:tensorflow:loss = 4.774455, step = 1701 (26.096 sec)


INFO:tensorflow:global_step/sec: 3.83889


INFO:tensorflow:loss = 4.6015186, step = 1801 (26.053 sec)


INFO:tensorflow:global_step/sec: 3.78314


INFO:tensorflow:loss = 4.974956, step = 1901 (26.434 sec)


INFO:tensorflow:Saving checkpoints for 2000 into content_based_model_trained/model.ckpt.


INFO:tensorflow:Calling model_fn.


No of classes :  15634
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


layer :  Tensor("dense/Relu:0", shape=(?, 200), dtype=float32)
layer :  Tensor("dense_1/Relu:0", shape=(?, 100), dtype=float32)
layer :  Tensor("dense_2/Relu:0", shape=(?, 80), dtype=float32)
layer :  Tensor("dense_3/Relu:0", shape=(?, 30), dtype=float32)
INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2020-12-04T22:50:28Z


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-2000


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Finished evaluation at 2020-12-04-22:50:34


INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.02785265, global_step = 2000, loss = 5.3432593, top_10_accuracy = 0.21770382


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 2000: content_based_model_trained/model.ckpt-2000


INFO:tensorflow:Loss for final step: 4.78855.


({'accuracy': 0.02785265,
  'global_step': 2000,
  'loss': 5.3432593,
  'top_10_accuracy': 0.21770382},
 [])

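This takes a while to complete. In the run shown above, evaluation after 2,000 steps gives about **22% top-10 accuracy** (and about 2.8% accuracy); I have seen about **30% top-10 accuracy** in other runs.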
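As a sanity check on these numbers, we can compare them with what a model that knows nothing would score. A uniform prediction over K classes has cross-entropy ln K, and random guessing has an expected accuracy of 1/K (10/K for top-10). The values below are copied from the log above; the variable names are just for illustration.

```python
import math

n_classes = 15634              # "No of classes" printed by model_fn
step_1_loss = 9.657534         # loss at step 1 in the training log
top_10_accuracy = 0.21770382   # top_10_accuracy at global step 2000

# Cross-entropy of a uniform prediction over n_classes is ln(n_classes)
uniform_loss = math.log(n_classes)

# Expected accuracy of a uniformly random guess
chance_accuracy = 1 / n_classes
chance_top_10_accuracy = 10 / n_classes

print("uniform loss: {:.4f}".format(uniform_loss))
print("chance top-10 accuracy: {:.5f}".format(chance_top_10_accuracy))
print("top-10 accuracy / chance: {:.0f}x".format(top_10_accuracy / chance_top_10_accuracy))
```

ln(15634) is about 9.657, which matches the step-1 loss, so the untrained network starts out close to uniform. The final top-10 accuracy is roughly 340 times the chance level.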

### Make predictions with the trained model. 

With the model now trained, we can make predictions by calling the `predict` method on the estimator. Let's look at how our model predicts on the first 30 examples of the training set.  
To start, we'll create a new file `first_30.csv` which contains the first 30 rows of our training set. We'll also save the current article ids (the `content_id` column) of those rows to a file `first_30_content_ids` so we can compare them with the model's recommendations.

In [None]:
%%bash
head -30 training_set.csv > first_30.csv
head first_30.csv
awk -F "\"*,\"*" '{print $2}' first_30.csv > first_30_content_ids

1042795765758282508,299964154,News,Neue Seidenstraße: Ein chinesischer Keil in Europa,Hermann Sileitsch-Parzer,574,299848776
1056627016396469139,299821418,Lifestyle,Missbrauchsvorwürfe gegen US-Wellnesskette,Elisabeth Mittendorfer,574,299852437
106377193142039719,299912085,News,Erster ÖBB-Containerzug nach China unterwegs,Stefan Hofer,574,299910994
106377193142039719,299813480,Lifestyle,Alice Schwarzer: Periode des Rückschlags für Frauen,Elisabeth Mittendorfer,574,299939900
106377193142039719,299918278,News,Skipässe in Wintersport-Hochburgen massiv teurer,Stefan Hofer,574,299853016
106377193142039719,299918253,News,Ringen: Iraner musste verlieren um Duell mit Israeli zu vermeiden,Mirad Odobasic,574,299913879
106377193142039719,299913879,News,Python erwürgte thailändischen Besitzer,,574,299907204
106377193142039719,299902870,News,RAF-Terroristin bittet Schleyer-Familie um Verzeihung,Stefan Hofer,574,299935287
106377193142039719,299444828,Lifestyle,RunNa: Läufst du noch oder kotzt du sch
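If the `awk` field-separator regex is hard to read, the same two files can be produced in plain Python. The helper `write_head_and_ids` below is a hypothetical sketch: it copies the first n lines (like `head -30`) and writes the second CSV column, the `content_id`, one per line (like the `awk` command), using the `csv` module so quoted fields are parsed properly.

```python
import csv
from itertools import islice

def write_head_and_ids(src_path, n, head_path, ids_path):
    """Copy the first n lines of src_path to head_path (like `head -n`) and write
    the second CSV column of each copied line to ids_path (like the awk command)."""
    with open(src_path, encoding="utf-8", newline="") as src:
        head = list(islice(src, n))
    with open(head_path, "w", encoding="utf-8", newline="") as out:
        out.writelines(head)
    with open(ids_path, "w", encoding="utf-8") as out:
        # csv.reader also copes with quoted fields that contain commas
        for row in csv.reader(head):
            out.write(row[1] + "\n")
```

Usage: `write_head_and_ids("training_set.csv", 30, "first_30.csv", "first_30_content_ids")`.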

Recall that to make predictions with the trained model, we pass the examples through the input function. Complete the code below to make predictions on the examples contained in the `first_30.csv` file we created above.

In [None]:
output = list(estimator.predict(input_fn=read_dataset("first_30.csv", tf.estimator.ModeKeys.PREDICT)))

INFO:tensorflow:Calling model_fn.


No of classes :  15634
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


layer :  Tensor("dense/Relu:0", shape=(?, 200), dtype=float32)
layer :  Tensor("dense_1/Relu:0", shape=(?, 100), dtype=float32)
layer :  Tensor("dense_2/Relu:0", shape=(?, 80), dtype=float32)
layer :  Tensor("dense_3/Relu:0", shape=(?, 30), dtype=float32)
INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from content_based_model_trained/model.ckpt-2000


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


In [None]:
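# Each prediction's class_names has shape (1,); .item() extracts the bytes value
# (np.asscalar is deprecated and was removed in NumPy 1.23)
recommended_content_ids = [d["class_names"].item().decode('UTF-8') for d in output]
content_ids = open("first_30_content_ids").read().splitlines()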


In [None]:
print(recommended_content_ids)

['299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775', '299826775']
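All 30 recommendations above are the same article: `class_names` holds only the single highest-scoring class, and here it is identical for every example. Each prediction also carries the full `probabilities` vector, so we can list a ranked set of candidates instead. The sketch below assumes, as in `model_fn`, that class index i corresponds to line i of `content_ids.txt`, i.e. `content_ids_list[i]`; `top_k_content_ids` is a hypothetical helper.

```python
import numpy as np

def top_k_content_ids(probabilities, content_ids, k=10):
    """Return the k content ids with the highest probability, best first.
    Assumes probabilities[i] is the score for content_ids[i]."""
    probabilities = np.asarray(probabilities)
    top = np.argsort(-probabilities, kind="stable")[:k]
    return [content_ids[i] for i in top]
```

Usage: `top_k_content_ids(output[0]["probabilities"], content_ids_list)` returns the 10 most likely next articles for the first example.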


Finally, we map content ids back to article titles so we can compare the first example's current article with the model's recommendation. This can be done in BigQuery. Look through the query below and make sure it is clear what is being returned.

In [None]:
from google.cloud import bigquery

# Look up an article title by content id. In this dataset, custom dimension 10
# holds the content id and custom dimension 6 holds the title.
title_sql = """
#standardSQL
SELECT
  (SELECT MAX(IF(index=6, value, NULL)) FROM UNNEST(hits.customDimensions)) AS title
FROM `cloud-training-demos.GA360_test.ga_sessions_sample`,
  UNNEST(hits) AS hits
WHERE
  # only include hits on pages
  hits.type = "PAGE"
  AND (SELECT MAX(IF(index=10, value, NULL)) FROM UNNEST(hits.customDimensions)) = @content_id
LIMIT 1"""

client = bigquery.Client()

def get_title(content_id):
    # Pass the id as a query parameter instead of formatting it into the SQL string
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("content_id", "STRING", content_id)])
    df = client.query(title_sql, job_config=job_config).to_dataframe()
    # Keep the title as str (not bytes) so it prints correctly in Python 3
    return df['title'].tolist()[0].strip()

current_title = get_title(content_ids[0])
recommended_title = get_title(recommended_content_ids[0])
print("Current title: {}".format(current_title))
print("Recommended title: {}".format(recommended_title))
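The `MAX(IF(index=..., value, NULL))` subqueries pick one custom dimension out of each hit's repeated `customDimensions` field: dimension 10 for the content id and dimension 6 for the title in this dataset. Here is a plain-Python sketch of the same logic over hypothetical hit records; the helper names and sample data are illustrative only. Note that `LIMIT 1` without `ORDER BY` lets BigQuery return any matching row, whereas this sketch returns the first match.

```python
def custom_dimension(custom_dimensions, index):
    """Mimic (SELECT MAX(IF(index=i, value, NULL)) FROM UNNEST(customDimensions)):
    the largest non-NULL value stored under index, or None if there is none."""
    values = [d["value"] for d in custom_dimensions
              if d["index"] == index and d["value"] is not None]
    return max(values) if values else None

def find_title(hits, content_id):
    """Title (dimension 6) of the first PAGE hit whose dimension 10 equals content_id."""
    for hit in hits:
        dims = hit["customDimensions"]
        if hit["type"] == "PAGE" and custom_dimension(dims, 10) == content_id:
            return custom_dimension(dims, 6)
    return None

# Hypothetical hits, shaped like rows of UNNEST(hits)
hits = [
    {"type": "EVENT", "customDimensions": [{"index": 10, "value": "299826775"}]},
    {"type": "PAGE", "customDimensions": [{"index": 6, "value": "Example title"},
                                          {"index": 10, "value": "299826775"}]},
]
print(find_title(hits, "299826775"))
```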