# Tutorial: Loganary-Ranking for sparse features


## ANTIQUE: A Question Answering Dataset

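This tutorial uses the ANTIQUE question-answering dataset, stored in the ELWC (`ExampleListWithContext`) format. For a description of the dataset and the format, see the TensorFlow Ranking notebook [handling_sparse_features.ipynb](https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/handling_sparse_features.ipynb).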


Download the training data, the test data, and the vocabulary file.

In [None]:
!wget -O "/tmp/vocab.txt" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/vocab.txt"
!wget -O "/tmp/train.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/ELWC/train.tfrecords"
!wget -O "/tmp/test.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/ELWC/test.tfrecords"
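
`query_tokens` and `document_tokens` are sparse features: variable-length bags of tokens. The vocabulary file maps each token to an integer id, and an embedding field turns those ids into one dense vector of size `dimension`. The pure-Python sketch below illustrates that idea. It assumes (this tutorial does not confirm it) that `RankingModelEmbeddingField` behaves like a vocabulary-file categorical column wrapped in an embedding column with a mean combiner; the five-word vocabulary and the random vectors are hypothetical stand-ins for `/tmp/vocab.txt` and for trained weights.

```python
import random

# Hypothetical stand-in for /tmp/vocab.txt: one token per line, line index = id.
VOCAB_LINES = ["what", "is", "a", "question", "answer"]
DIMENSION = 4  # the tutorial's embedding fields use dimension=20


def load_vocab(lines):
    """Map each token to its line index, as a vocabulary file does."""
    return {token: idx for idx, token in enumerate(lines)}


def make_embedding_table(vocab_size, dimension, seed=0):
    """Random vectors standing in for embedding weights learned during training."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dimension)]
            for _ in range(vocab_size)]


def embed_tokens(tokens, vocab, table):
    """Look up known tokens and average their vectors (mean combiner).

    Out-of-vocabulary tokens are simply skipped in this sketch; how a real
    feature column treats them depends on its configuration.
    """
    ids = [vocab[t] for t in tokens if t in vocab]
    dimension = len(table[0])
    if not ids:
        return [0.0] * dimension
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dimension)]


vocab = load_vocab(VOCAB_LINES)
table = make_embedding_table(len(vocab), DIMENSION)
# "antique" is not in the toy vocabulary, so only two tokens contribute.
query_vector = embed_tokens(["what", "is", "antique"], vocab, table)
print(len(query_vector))  # 4: one dense vector regardless of query length
```

Averaging is what lets queries and answers of any length feed a fixed-size network input; in a real model the table is learned rather than sampled at random.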

Install the TensorFlow Ranking, TensorFlow Serving API, and Loganary Ranking packages.

In [None]:
!pip install -q tensorflow_ranking tensorflow-serving-api loganary-ranking

Import the libraries used throughout this notebook.

In [None]:
import tensorflow as tf
import tensorflow_ranking as tfr
from loganary.ranking.model import (
    RankingModel,
    RankingModelConfig,
    RankingModelEmbeddingField,
    RankingModelField,
    get_ndcg_metric,
)

## Configuration for Training

Here we define the paths to the training and evaluation data, the sparse query and document fields, the label field, and the model hyperparameters.

In [None]:
config = RankingModelConfig(
    model_path="/tmp/ranking_model_dir",
    train_path="/tmp/train.tfrecords",
    eval_path="/tmp/test.tfrecords",
    context_fields=[
        RankingModelEmbeddingField(
            name="query_tokens",
            vocabulary_file="/tmp/vocab.txt",
            dimension=20,
        ),
    ],
    example_fields=[
        RankingModelEmbeddingField(
            name="document_tokens",
            vocabulary_file="/tmp/vocab.txt",
            dimension=20,
        ),
    ],
    label_field=RankingModelField(
        name="relevance",
        column_type="numeric",
        default_value=-1,
    ),
    num_train_steps=15 * 1000,
    hidden_layer_dims=["64", "32", "16"],
    batch_size=32,
    list_size=50,
    learning_rate=0.05,
    group_size=1,
    dropout_rate=0.8,
    eval_metric=get_ndcg_metric([1, 3, 5, 10]),
    loss_keys=[tfr.losses.RankingLossKey.APPROX_NDCG_LOSS],
)
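
`get_ndcg_metric([1, 3, 5, 10])` requests NDCG at cutoffs 1, 3, 5 and 10. The pure-Python sketch below shows what that number means for a single query. It assumes the gain `2**label - 1` and discount `1 / log2(rank + 1)` that TF-Ranking's NDCG uses by default, and treats the label `-1` (the `default_value` of the `relevance` field) as padding to ignore. It is an illustration, not the loganary implementation.

```python
import math


def dcg_at_k(labels_in_rank_order, k):
    """DCG of the top-k items: gain 2^label - 1, discount 1 / log2(rank + 1)."""
    return sum(
        (2.0 ** label - 1.0) / math.log2(rank + 1)
        for rank, label in enumerate(labels_in_rank_order[:k], start=1)
    )


def ndcg_at_k(scores, labels, k):
    """NDCG@k for one list; entries with label < 0 are padding and are dropped."""
    pairs = [(s, l) for s, l in zip(scores, labels) if l >= 0]
    # Labels ordered by model score (best first) vs. the ideal label ordering.
    ranked = [l for _, l in sorted(pairs, key=lambda p: p[0], reverse=True)]
    ideal = sorted((l for _, l in pairs), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical scores and labels for one query; the last entry is padding.
scores = [0.2, 0.9, 0.4, 0.0]
labels = [3, 0, 1, -1]
for k in [1, 3, 5, 10]:
    print(k, round(ndcg_at_k(scores, labels, k), 4))
```

Here the model puts the irrelevant answer first, so NDCG@1 is 0 while deeper cutoffs give partial credit for the relevant answers further down.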

## Train and evaluate the ranker


In [None]:
# Remove checkpoints from any previous run so that training starts from scratch.
!rm -rf /tmp/ranking_model_dir
model = RankingModel(config)
result = model.train()
result
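
Training minimizes the loss named in `loss_keys`, here `APPROX_NDCG_LOSS`. NDCG depends on sorting, so it has no useful gradient; ApproxNDCG replaces each document's hard rank with a smooth estimate, `1 + sum over j != i of sigmoid((s_j - s_i) / T)`, and plugs that into the DCG formula. The sketch below is a plain-Python illustration of that idea with a hypothetical temperature `T`; TF-Ranking's tensorized version may differ in details such as the default temperature and how losses are averaged over a batch.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def approx_ranks(scores, temperature=1.0):
    """Smooth rank of each item: 1 + sum_{j != i} sigmoid((s_j - s_i) / T)."""
    return [
        1.0 + sum(sigmoid((s_j - s_i) / temperature)
                  for j, s_j in enumerate(scores) if j != i)
        for i, s_i in enumerate(scores)
    ]


def approx_ndcg_loss(scores, labels, temperature=1.0):
    """Negative ApproxNDCG for one list; labels < 0 mark padding and are dropped."""
    valid = [(s, l) for s, l in zip(scores, labels) if l >= 0]
    if not valid:
        return 0.0
    ranks = approx_ranks([s for s, _ in valid], temperature)
    gains = [2.0 ** l - 1.0 for _, l in valid]
    approx_dcg = sum(g / math.log2(1.0 + r) for g, r in zip(gains, ranks))
    ideal_dcg = sum(g / math.log2(1.0 + rank)
                    for rank, g in enumerate(sorted(gains, reverse=True), start=1))
    return -approx_dcg / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical list: a good ordering should get a lower (more negative) loss.
labels = [2, 1, 0, -1]  # last entry is padding
print(approx_ndcg_loss([3.0, 1.0, -1.0, 0.0], labels))
print(approx_ndcg_loss([-1.0, 1.0, 3.0, 0.0], labels))
```

As `T` shrinks, the smooth ranks approach the true ranks and the loss approaches `-NDCG`; larger `T` gives smoother but less faithful gradients.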

In [None]:
# Export the trained model and display the export path.
export_model_path = model.save_model()
export_model_path

## Launch TensorBoard

In [None]:
%load_ext tensorboard
%tensorboard --logdir="/tmp/ranking_model_dir" --port 12345

## Generating Predictions

In [None]:
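def predict_input_fn(path):
    """Builds batches of test features (without labels) for `Estimator.predict`."""
    # Parsing specs for the per-query (context) and per-document (example) features.
    context_feature_spec = tf.feature_column.make_parse_example_spec(
        [f.get_column() for f in config.context_fields])
    example_feature_spec = tf.feature_column.make_parse_example_spec(
        [f.get_column() for f in config.example_fields])
    # Read ELWC records in file order for a single pass; every document list
    # is padded or truncated to config.list_size entries.
    dataset = tfr.data.build_ranking_dataset(
        file_pattern=path,
        data_format=tfr.data.ELWC,
        batch_size=config.batch_size,
        list_size=config.list_size,
        context_feature_spec=context_feature_spec,
        example_feature_spec=example_feature_spec,
        reader=tf.data.TFRecordDataset,
        shuffle=False,
        num_epochs=1)
    # Return the feature dict of the next batch.
    features = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()
    return features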
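
`build_ranking_dataset` reshapes every query's variable-length document list to exactly `list_size` entries, so the example features have a fixed list dimension (50 in this tutorial). A pure-Python sketch of the idea, assuming short lists are padded at the end with the label default `-1` and long lists are cut down to their first `list_size` documents; TF-Ranking's parser may differ in details such as which documents it keeps.

```python
PAD_LABEL = -1  # mirrors default_value=-1 on the "relevance" label field


def pad_or_truncate(items, list_size, pad_value):
    """Return exactly list_size entries: keep the first list_size items, then pad at the end."""
    kept = list(items)[:list_size]
    return kept + [pad_value] * (list_size - len(kept))


def make_padded_batch(label_lists, list_size):
    """Dense [batch, list_size] labels plus a mask marking positions that hold real documents."""
    labels = [pad_or_truncate(row, list_size, PAD_LABEL) for row in label_lists]
    mask = [[pos < min(len(row), list_size) for pos in range(list_size)]
            for row in label_lists]
    return labels, mask


# Two hypothetical queries: one with 3 judged answers, one with 60 (more than list_size).
batch_labels, batch_mask = make_padded_batch([[2, 0, 1], [1] * 60], list_size=5)
print(batch_labels)  # [[2, 0, 1, -1, -1], [1, 1, 1, 1, 1]]
```

The mask is what lets metrics and losses skip padded positions instead of treating them as real, irrelevant documents.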

In [None]:
# `predict` returns a generator that yields one score vector per query.
predictions = model.get_ranker().predict(input_fn=lambda: predict_input_fn("/tmp/test.tfrecords"))

In [None]:
x = next(predictions)  # Scores for the first query in the test set.
assert len(x) == config.list_size  # The vector includes padded positions.
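
Each score vector such as `x` covers `list_size` positions, some of which may be padding. To turn it into a ranking, drop the padded positions and sort the remaining documents by score. The sketch below assumes padding sits at the end of the list and that you know how many real documents the query had (`num_docs`, a hypothetical value you would read from the ELWC record itself); it uses made-up scores rather than real model output.

```python
def rank_documents(scores, num_docs):
    """Return indices of the real documents, best score first; the padded tail is ignored."""
    real = list(scores[:num_docs])
    return sorted(range(len(real)), key=lambda i: real[i], reverse=True)


# Made-up scores for a list_size of 6 where only 4 documents are real.
scores = [0.1, 2.3, -0.7, 1.5, 0.0, 0.0]
print(rank_documents(scores, num_docs=4))  # [1, 3, 0, 2]
```

With the notebook's output, the same call would be `rank_documents(x, num_docs)`.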