# Hands-on Learning to Rank: Deep Neural Ranking Tutorial
- In this tutorial we test the performance of deep neural ranking on ANTIQUE, a question-answering dataset built from Yahoo! Answers.
- Recall that a ranking model needs a scoring function, a loss function, queries, and candidate documents. We therefore try several scoring functions and loss functions and compare their results.
- The goal is for you to be able to replicate this methodology in other domains.

**Teaching assistants**: Andrés Carvallo, Manuel Cartagena, and Patricio Cerda.


In [0]:
import six
import os
import numpy as np
from google.protobuf import text_format

import tensorflow as tf

try:
  import tensorflow_ranking as tfr
except ImportError:
  !pip install -q tensorflow_ranking
  import tensorflow_ranking as tfr

tf.enable_eager_execution()
tf.executing_eagerly()
tf.set_random_seed(1234)
tf.logging.set_verbosity(tf.logging.INFO)

In [139]:
tf.__version__

'1.14.0'

# Downloading the ANTIQUE dataset (Yahoo! Answers)

In [140]:
!wget -O "/tmp/vocab.txt" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/vocab.txt"
!wget -O "/tmp/train.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/train.tfrecords"
!wget -O "/tmp/test.tfrecords" "http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/test.tfrecords"

--2019-09-16 16:14:19--  http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/vocab.txt
Resolving ciir.cs.umass.edu (ciir.cs.umass.edu)... 128.119.246.154
Connecting to ciir.cs.umass.edu (ciir.cs.umass.edu)|128.119.246.154|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 231508 (226K) [text/plain]
Saving to: ‘/tmp/vocab.txt’


2019-09-16 16:14:20 (331 KB/s) - ‘/tmp/vocab.txt’ saved [231508/231508]

--2019-09-16 16:14:21--  http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/train.tfrecords
Resolving ciir.cs.umass.edu (ciir.cs.umass.edu)... 128.119.246.154
Connecting to ciir.cs.umass.edu (ciir.cs.umass.edu)|128.119.246.154|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 156570056 (149M)
Saving to: ‘/tmp/train.tfrecords’


2019-09-16 16:14:35 (10.8 MB/s) - ‘/tmp/train.tfrecords’ saved [156570056/156570056]

--2019-09-16 16:14:37--  http://ciir.cs.umass.edu/downloads/Antique/tf-ranking/test.tfrecords
Resolving ciir.cs.umass.edu (ciir.cs.

# Data format expected by the model


## Query format (example)

In [141]:
QUERY = text_format.Parse(
    """
    features {
      feature {
        key: "query_tokens"
        value { bytes_list { value: ["what", "are", "the", "benefits" , "of", "drinking", "coffee", "?"] } }
      }
    }""", tf.train.Example())

print(QUERY)


features {
  feature {
    key: "query_tokens"
    value {
      bytes_list {
        value: "what"
        value: "are"
        value: "the"
        value: "benefits"
        value: "of"
        value: "drinking"
        value: "coffee"
        value: "?"
      }
    }
  }
}



## Document format (examples)

In [142]:
DOCUMENTS = [
             
    # example of a RELEVANT document (1)
    text_format.Parse(
    """
    features {
      feature {
        key: "document_tokens"
        value { bytes_list { value: ["wake", "up", "the", "mind", "give", "energy", "contain", "antioxidants", "in", "fact", "recommend", "two", "cups", "a", "day"] } }
      }
      feature {
        key: "relevance"
        value { int64_list { value: 1 } }
      }
    }""", tf.train.Example()),

    # example of a NON-RELEVANT document (0)
    text_format.Parse(
        """
    features {
      feature {
        key: "document_tokens"
        value { bytes_list { value: ["clouds", "are", "white"] } }
      }
      feature {
        key: "relevance"
        value { int64_list { value: 0 } }
      }
    }""", tf.train.Example()),
]

print(DOCUMENTS)

[features {
  feature {
    key: "document_tokens"
    value {
      bytes_list {
        value: "wake"
        value: "up"
        value: "the"
        value: "mind"
        value: "give"
        value: "energy"
        value: "contain"
        value: "antioxidants"
        value: "in"
        value: "fact"
        value: "recommend"
        value: "two"
        value: "cups"
        value: "a"
        value: "day"
      }
    }
  }
  feature {
    key: "relevance"
    value {
      int64_list {
        value: 1
      }
    }
  }
}
, features {
  feature {
    key: "document_tokens"
    value {
      bytes_list {
        value: "clouds"
        value: "are"
        value: "white"
      }
    }
  }
  feature {
    key: "relevance"
    value {
      int64_list {
        value: 0
      }
    }
  }
}
]


# Hyperparameter definitions and the train/test data split used to evaluate the model

In [0]:
_TRAIN_DATA_PATH = "/tmp/train.tfrecords"
_TEST_DATA_PATH = "/tmp/test.tfrecords"
_VOCAB_PATH = "/tmp/vocab.txt"

# Maximum number of documents per query (queries with fewer documents are padded)
_LIST_SIZE = 50

# Name of the label feature, in this case the relevance judgment.
_LABEL_FEATURE = "relevance"

# Padded documents are ignored by the loss function, which is why they get the label -1
_PADDING_LABEL = -1

_LEARNING_RATE = 0.05

# Parameters of the scoring function
_BATCH_SIZE = 32
_HIDDEN_LAYER_DIMS = ["64", "32", "16"] # 3 layers: 64 units in the first, 32 in the second, 16 in the third
_DROPOUT_RATE = 0.8
_GROUP_SIZE = 5  # 1 for pointwise, 2 for pairwise, more than 2 for listwise.

# Path where the model is saved and number of training steps
_MODEL_DIR = "/tmp/ranking_model_dir"
_NUM_TRAIN_STEPS = 15000 # 15,000 training steps

# Architecture of the ranking model
- **Data loader:** takes the raw data and converts it into the format that the feature extractor expects.
- **Feature transformation:** turns the data into features for the ranker.
- **Scoring function:** pointwise, pairwise, or listwise.
- **Loss function:** sigmoid cross entropy (pointwise), logistic loss (pairwise), softmax cross entropy (listwise).
- **Ranking evaluation metrics:** nDCG, MRR, etc.
- **Ranking head:** combines the scoring function, the loss function, and the evaluation metrics in a single module.
- **Model builder:** builds the final ranking model.

![tf_ranking_arch](https://user-images.githubusercontent.com/3262617/60061785-5f107980-96ab-11e9-9849-ace2d117220f.png)



## Specifying features via Feature Columns
- First, each word is mapped to its index in the vocabulary.
- Then the list of tokens is converted into a list of embeddings, which are initialized randomly and learned by the model during training.

In [0]:
_EMBEDDING_DIMENSION = 20

# context == query 
def context_feature_columns():
  sparse_column = tf.feature_column.categorical_column_with_vocabulary_file(key="query_tokens",vocabulary_file=_VOCAB_PATH)
  query_embedding_column = tf.feature_column.embedding_column(sparse_column, _EMBEDDING_DIMENSION)
  return {"query_tokens": query_embedding_column}

# example == document
def example_feature_columns():
  sparse_column = tf.feature_column.categorical_column_with_vocabulary_file(key="document_tokens",vocabulary_file=_VOCAB_PATH)
  document_embedding_column = tf.feature_column.embedding_column(sparse_column, _EMBEDDING_DIMENSION)
  
  return {"document_tokens": document_embedding_column}
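The two steps described above (token to vocabulary index, then index to embedding) can be illustrated without TensorFlow. The sketch below is only an approximation of what the feature columns do: the toy vocabulary, the 4-dimensional embeddings, and the helper `embed` are hypothetical, and unknown tokens are simply skipped here. It averages the token vectors, which mirrors the `"mean"` combiner that `embedding_column` uses by default.

```python
import random

# Hypothetical toy vocabulary; the tutorial reads the real one from /tmp/vocab.txt.
vocab = ["what", "are", "the", "benefits", "of", "coffee"]
token_to_index = {token: i for i, token in enumerate(vocab)}

EMBEDDING_DIMENSION = 4  # the tutorial uses 20
rng = random.Random(1234)

# Randomly initialised table with one row per vocabulary entry.
# In the real model these rows are updated during training.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIMENSION)] for _ in vocab
]

def embed(tokens):
    """Look up every known token and average the vectors (mean combiner)."""
    rows = [embedding_table[token_to_index[t]] for t in tokens if t in token_to_index]
    if not rows:
        return [0.0] * EMBEDDING_DIMENSION
    return [sum(column) / len(rows) for column in zip(*rows)]

query_vector = embed(["what", "are", "the", "benefits", "of", "drinking", "coffee", "?"])
print(query_vector)
```

Whatever the number of tokens, the result is a single fixed-length vector per query or document, which is what the feed-forward scoring network needs as input.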

## Reading input data with input_fn (data loader: groups data into batches to feed the network)
- Converts the input data into tensors of the appropriate type (e.g. float, int).
- Document features are represented as 3-D tensors (queries, documents, feature values).
- Query features are 2-D tensors (queries, feature values).

In [0]:
def input_fn(path, num_epochs=None):
  context_feature_spec = tf.feature_column.make_parse_example_spec(context_feature_columns().values())
  label_column = tf.feature_column.numeric_column(_LABEL_FEATURE, dtype=tf.int64, default_value=_PADDING_LABEL)
  example_feature_spec = tf.feature_column.make_parse_example_spec(list(example_feature_columns().values()) + [label_column])
  
  # LOAD DATASET IN BATCH, SPECIFY WHICH COLUMNS ARE QUERY FEATURES AND WHICH ONES ARE DOCUMENT FEATURES 
  dataset = tfr.data.build_ranking_dataset(file_pattern=path,
        data_format=tfr.data.EIE,
        batch_size=_BATCH_SIZE,
        list_size=_LIST_SIZE,
        context_feature_spec=context_feature_spec,
        example_feature_spec=example_feature_spec,
        reader=tf.data.TFRecordDataset,
        shuffle=False,
        num_epochs=num_epochs)
  
  # RESHAPE FEATURES AND LABELS FOR RANKING MODEL TRAINING 
  features = tf.data.make_one_shot_iterator(dataset).get_next()
  label = tf.squeeze(features.pop(_LABEL_FEATURE), axis=2)
  label = tf.cast(label, tf.float32)
  return features, label
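To make the role of `_LIST_SIZE` and `_PADDING_LABEL` concrete, here is a minimal pure-Python sketch of how per-query label lists end up as a rectangular `[batch, list_size]` matrix. The helper `pad_labels` and the toy values are hypothetical; in the tutorial the padding is handled by `tfr.data.build_ranking_dataset`, and this sketch assumes that longer lists are simply cut at `list_size`.

```python
LIST_SIZE = 5        # the tutorial uses 50
PADDING_LABEL = -1   # same convention as _PADDING_LABEL

def pad_labels(labels, list_size=LIST_SIZE, padding_label=PADDING_LABEL):
    """Truncate or pad one query's labels to exactly list_size entries."""
    clipped = labels[:list_size]
    return clipped + [padding_label] * (list_size - len(clipped))

# Two queries with 3 and 7 judged documents respectively.
batch = [[4, 3, 2], [1, 0, 0, 2, 4, 3, 1]]
label_matrix = [pad_labels(query_labels) for query_labels in batch]
print(label_matrix)  # [[4, 3, 2, -1, -1], [1, 0, 0, 2, 4]]
```

Positions holding `-1` correspond to padded documents, which the loss function ignores.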

## Feature transformation: transform_fn
- Transforms sparse features into dense features so that the network can consume them.
- Reshapes the features.

In [0]:
def make_transform_fn():
  def _transform_fn(features, mode):
    example_name = next(six.iterkeys(example_feature_columns()))
    input_size = tf.shape(input=features[example_name])[1]

    # GROUP QUERY FEATURES AND DOCUMENT FEATURES IN POINTWISE, PAIRWISE AND/OR LISTWISE FORMAT DEPENDING ON WHAT WE CHOSE EARLIER 
    context_features, example_features = tfr.feature.encode_listwise_features(features=features,input_size=input_size,
                                                                              context_feature_columns=context_feature_columns(), 
                                                                              example_feature_columns=example_feature_columns(), 
                                                                              mode=mode,
                                                                              scope="transform_layer")

    return context_features, example_features 
  return _transform_fn

## Scoring Function
- The idea is to compute a relevance score for one (or a set of) query-document pair(s).
- The TF-Ranking model uses the training data to learn this function.
- Here we define the scoring function as a feed-forward neural network.
- The function takes the features of a single example (i.e., a query-document pair) and returns a relevance score.

In [0]:
def make_score_fn():
  def _score_fn(context_features, group_features, mode, params, config):
    with tf.compat.v1.name_scope("input_layer"):
      context_input = [tf.compat.v1.layers.flatten(context_features[name]) for name in sorted(context_feature_columns())]
      group_input = [tf.compat.v1.layers.flatten(group_features[name]) for name in sorted(example_feature_columns())]
      input_layer = tf.concat(context_input + group_input, 1)

    is_training = (mode == tf.estimator.ModeKeys.TRAIN)
    cur_layer = input_layer
    cur_layer = tf.compat.v1.layers.batch_normalization(cur_layer,training=is_training,momentum=0.99)

    for i, layer_width in enumerate(int(d) for d in _HIDDEN_LAYER_DIMS):
      cur_layer = tf.compat.v1.layers.dense(cur_layer, units=layer_width)
      cur_layer = tf.compat.v1.layers.batch_normalization(cur_layer,training=is_training,momentum=0.99)
      cur_layer = tf.nn.relu(cur_layer)
      cur_layer = tf.compat.v1.layers.dropout(inputs=cur_layer, rate=_DROPOUT_RATE, training=is_training)
    logits = tf.compat.v1.layers.dense(cur_layer, units=_GROUP_SIZE)
    return logits

  return _score_fn
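As a rough illustration of what a feed-forward scoring function computes, the sketch below scores one query-document pair with a tiny network written in plain Python. The weights, the layer sizes, and the helper names are hypothetical, and batch normalization and dropout are left out for brevity, so this is not the network trained in this tutorial.

```python
def dense(inputs, weights, biases):
    """Fully connected layer; weights holds one weight vector per output unit."""
    return [sum(x * w for x, w in zip(inputs, unit)) + b
            for unit, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def score(query_vector, document_vector, layers):
    """Concatenate query and document features and run them through the network."""
    activations = query_vector + document_vector
    *hidden_layers, (out_weights, out_biases) = layers
    for weights, biases in hidden_layers:
        activations = relu(dense(activations, weights, biases))
    # A single output unit produces the relevance score (a logit).
    return dense(activations, out_weights, out_biases)[0]

# Hypothetical weights: 4 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.05]),
]

relevance = score([1.0, 0.0], [0.5, 2.0], layers)
print(relevance)
```

The same function is applied to every candidate document of a query, and the candidates are then sorted by the resulting scores.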

## Evaluation metrics
- nDCG@1, nDCG@3, nDCG@5, and nDCG@10

In [0]:
def eval_metric_fns():
  metric_fns = {}
  metric_fns.update({
      "metric/ndcg@%d" % topn: tfr.metrics.make_ranking_metric_fn(
          tfr.metrics.RankingMetricKey.NDCG, topn=topn)
      for topn in [1, 3, 5, 10]
  })

  return metric_fns
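For reference, nDCG@k can be computed by hand in a few lines. The sketch below assumes the common formulation with gain `2^rel - 1` and discount `1 / log2(rank + 1)`; the function names and the example labels are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k labels, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances_in_ranked_order, k):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances_in_ranked_order, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances_in_ranked_order, k) / ideal

# Labels of the documents in the order the ranker returned them (hypothetical).
predicted_order = [2, 4, 0, 3]
for k in (1, 3):
    print("ndcg@%d = %.4f" % (k, ndcg_at_k(predicted_order, k)))
```

A value of 1.0 means that the top k positions are ordered exactly as the relevance labels would order them.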

## Loss function

In [0]:
# Any of the following loss functions can be used
'''
PAIRWISE_HINGE_LOSS
PAIRWISE_LOGISTIC_LOSS (PAIRWISE) # remember to set _GROUP_SIZE to 2 above
PAIRWISE_SOFT_ZERO_ONE_LOSS
SOFTMAX_LOSS (LISTWISE)
SIGMOID_CROSS_ENTROPY_LOSS (POINTWISE)
MEAN_SQUARED_LOSS
LIST_MLE_LOSS
APPROX_NDCG_LOSS
'''

_LOSS = tfr.losses.RankingLossKey.SOFTMAX_LOSS

loss_fn = tfr.losses.make_loss_fn(_LOSS)
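To see how the pointwise, pairwise, and listwise losses differ, the following sketch computes simplified versions of three of them for a single query. These are textbook formulations written for illustration; the reductions and weighting used inside `tfr.losses` may differ, and the labels and scores are hypothetical.

```python
import math

def sigmoid_cross_entropy(labels, scores):
    """Pointwise: every document is treated as an independent binary target."""
    total = 0.0
    for y, s in zip(labels, scores):
        # Numerically stable form of -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))].
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(labels)

def pairwise_logistic(labels, scores):
    """Pairwise: penalise every pair where the more relevant document scores lower."""
    losses = [math.log1p(math.exp(-(si - sj)))
              for yi, si in zip(labels, scores)
              for yj, sj in zip(labels, scores) if yi > yj]
    return sum(losses) / len(losses) if losses else 0.0

def softmax_cross_entropy(labels, scores):
    """Listwise: cross entropy between the labels and a softmax over the whole list."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return -sum(y * (s - log_z) for y, s in zip(labels, scores))

labels = [1.0, 0.0, 0.0]
scores = [2.0, 0.5, -1.0]
print(sigmoid_cross_entropy(labels, scores))
print(pairwise_logistic(labels, scores))
print(softmax_cross_entropy(labels, scores))
```

With graded labels above 1 (the qrel file used later contains values up to 4), the sigmoid cross entropy expression is no longer bounded below, which may be related to the negative loss reported for the pointwise run.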

## Ranking Head
- Here the loss function, the optimizer, and the evaluation metrics of the model are combined in a single module.

In [0]:
optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate=_LEARNING_RATE)

def _train_op_fn(loss):
  update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
  minimize_op = optimizer.minimize(loss=loss, global_step=tf.compat.v1.train.get_global_step())
  train_op = tf.group([update_ops, minimize_op])
  return train_op

ranking_head = tfr.head.create_ranking_head(loss_fn=loss_fn, eval_metric_fns=eval_metric_fns(),train_op_fn=_train_op_fn)

## Putting everything together in a Model Builder

In [151]:
model_fn = tfr.model.make_groupwise_ranking_fn(
          group_score_fn=make_score_fn(),
          transform_fn=make_transform_fn(),
          group_size=_GROUP_SIZE,
          ranking_head=ranking_head)

INFO:tensorflow:Building groupwise ranking model.


## Training and evaluation function for the ranker

In [0]:
def train_and_eval_fn():
  run_config = tf.estimator.RunConfig(save_checkpoints_steps=1000) # SAVE MODEL EACH 1,000 STEPS 
  ranker = tf.estimator.Estimator(
      model_fn=model_fn,
      model_dir=_MODEL_DIR,
      config=run_config)

  train_input_fn = lambda: input_fn(_TRAIN_DATA_PATH)
  eval_input_fn = lambda: input_fn(_TEST_DATA_PATH, num_epochs=1)

  train_spec = tf.estimator.TrainSpec(
      input_fn=train_input_fn, max_steps=_NUM_TRAIN_STEPS)
  
  eval_spec =  tf.estimator.EvalSpec(
          name="eval",
          input_fn=eval_input_fn,
          throttle_secs=15)
  
  return (ranker, train_spec, eval_spec)

In [153]:
! rm -rf "/tmp/ranking_model_dir"  # remove the previous model directory so that the results come from a freshly trained ranker.
ranker, train_spec, eval_spec = train_and_eval_fn()
tf.estimator.train_and_evaluate(ranker, train_spec, eval_spec)

INFO:tensorflow:Using config: {'_model_dir': '/tmp/ranking_model_dir', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f834197c320>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
IN

({'global_step': 15000,
  'labels_mean': 1.9630322,
  'logits_mean': -0.064794704,
  'loss': 50.056644,
  'metric/ndcg@1': 0.685,
  'metric/ndcg@10': 0.8365746,
  'metric/ndcg@3': 0.7347153,
  'metric/ndcg@5': 0.77785563},
 [])

## POINTWISE results with SIGMOID_CROSS_ENTROPY_LOSS

- 'global_step': 15000
- 'labels_mean': 1.9630322
- 'logits_mean': 6237.247
- 'loss': -6358.942
- 'metric/ndcg@1': 0.5928572
- 'metric/ndcg@10': 0.7942663
- 'metric/ndcg@3': 0.67040014
- 'metric/ndcg@5': 0.72173136

## PAIRWISE results with PAIRWISE_LOGISTIC_LOSS

- 'global_step': 15000
- 'labels_mean': 1.9630322
- 'logits_mean': -0.6500347
- 'loss': 0.91366893
- 'metric/ndcg@1': 0.6535715
- 'metric/ndcg@10': 0.8362243
- 'metric/ndcg@3': 0.73208094
- 'metric/ndcg@5': 0.7784201

## LISTWISE results with SOFTMAX_LOSS (lists of size n=5)

- 'global_step': 15000
- 'labels_mean': 1.9630322
- 'logits_mean': -0.08550994
- 'loss': 50.07819
- 'metric/ndcg@1': 0.68071425
- 'metric/ndcg@10': 0.83762825
- 'metric/ndcg@3': 0.7373218
- 'metric/ndcg@5': 0.7784797
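The three runs can be compared side by side with a few lines of Python. The nDCG values below are copied from the results reported above; the dictionary layout and the variable names are only an illustration.

```python
# nDCG values copied from the three result lists above.
results = {
    "pointwise (sigmoid cross entropy)": {
        "ndcg@1": 0.5928572, "ndcg@3": 0.67040014, "ndcg@5": 0.72173136, "ndcg@10": 0.7942663},
    "pairwise (logistic loss)": {
        "ndcg@1": 0.6535715, "ndcg@3": 0.73208094, "ndcg@5": 0.7784201, "ndcg@10": 0.8362243},
    "listwise (softmax loss)": {
        "ndcg@1": 0.68071425, "ndcg@3": 0.7373218, "ndcg@5": 0.7784797, "ndcg@10": 0.83762825},
}

metrics = ["ndcg@1", "ndcg@3", "ndcg@5", "ndcg@10"]

# For each metric, find the approach with the highest value.
best = {m: max(results, key=lambda name: results[name][m]) for m in metrics}

for m in metrics:
    print("%-8s best: %s (%.4f)" % (m, best[m], results[best[m]][m]))
```

In these runs the listwise model obtains the highest value on every cutoff, although its margin over the pairwise model is small from nDCG@3 onwards.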

# Trying the trained model on example queries

In [1]:
from google.colab import drive
import pandas as pd
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
df_documents = pd.read_csv('drive/My Drive/tf-ranking/antique-collection.txt', sep='\t', names=["doc_id", 'text'])
df_test_queries = pd.read_csv('drive/My Drive/tf-ranking/antique-test-queries.txt',names = ['query_id', 'text'],  sep='\t') 

df_test_results = pd.read_csv('drive/My Drive/tf-ranking/antique-test.qrel',  sep=' ', names = ['query_id', 'query_name', 'doc_id', 'relevance']) 



In [0]:
# doc_id -> document text
dict_document_names = dict(zip(df_documents.doc_id, df_documents.text))

# query_id -> query text
dict_queries_names = dict(zip(df_test_queries.query_id, df_test_queries.text))

# query_id -> list of [doc_id, relevance] pairs taken from the qrel file
dict_query_results = {}
for query_id, doc_id, relevance in zip(df_test_results.query_id, df_test_results.doc_id, df_test_results.relevance):
  dict_query_results.setdefault(query_id, []).append([doc_id, relevance])

  


In [156]:
set(df_test_results.query_id)

{8293,
 23464,
 34041,
 78762,
 100653,
 103830,
 143833,
 159716,
 172731,
 204633,
 204963,
 224109,
 225575,
 229303,
 312215,
 354733,
 387874,
 402514,
 421753,
 443848,
 456214,
 474417,
 481173,
 484496,
 551239,
 558570,
 654124,
 667488,
 676028,
 707303,
 714612,
 746920,
 761742,
 765138,
 785823,
 788976,
 821387,
 823384,
 849221,
 851124,
 896725,
 922849,
 949154,
 953489,
 1015624,
 1017690,
 1035857,
 1063812,
 1077370,
 1082595,
 1119420,
 1152934,
 1167882,
 1199639,
 1254390,
 1262692,
 1282199,
 1287437,
 1290612,
 1292734,
 1340574,
 1351675,
 1364894,
 1373069,
 1459749,
 1477322,
 1502604,
 1509982,
 1582877,
 1607728,
 1623623,
 1663853,
 1702151,
 1783010,
 1794677,
 1821193,
 1844896,
 1850323,
 1862795,
 1866981,
 1880028,
 1937374,
 1944018,
 1957887,
 1964316,
 1968489,
 1971899,
 1977054,
 2008017,
 2018562,
 2142044,
 2180086,
 2182052,
 2192891,
 2290758,
 2291272,
 2307305,
 2309774,
 2380990,
 2382487,
 2418598,
 2443586,
 2446614,
 2452795,
 2479423,

In [169]:
# EXAMPLE QUERY
QID = 143833

print('QUERY TITLE: ')
print(dict_queries_names[QID])


QUERY TITLE: 
What is the difference between coolant and anti-freeze?


In [170]:
total_docs_query = len(df_test_results[df_test_results['query_id'] == QID])

print('total documents for query: {}'.format(len(df_test_results[df_test_results['query_id'] == QID])))

df_test_results[df_test_results['query_id'] == QID ].sort_values('relevance', ascending=False)


total documents for query: 26


|  | query_id | query_name | doc_id | relevance |
|---|---|---|---|---|
| 5611 | 143833 | U0 | 143833_0 | 4 |
| 5625 | 143833 | Q0 | 143833_6 | 4 |
| 5614 | 143833 | Q0 | 143833_7 | 4 |
| 5621 | 143833 | Q0 | 143833_2 | 4 |
| 5622 | 143833 | Q0 | 143833_5 | 3 |
| 5623 | 143833 | Q0 | 143833_4 | 3 |
| 5635 | 143833 | Q0 | 2404029_2 | 2 |
| 5633 | 143833 | Q0 | 1453706_9 | 2 |
| 5632 | 143833 | Q0 | 3977645_5 | 2 |
| 5630 | 143833 | Q0 | 158155_14 | 2 |


### Predicting relevant documents for a query from the test set

In [172]:
# test set 
import warnings
from operator import itemgetter
warnings.filterwarnings('ignore')


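ranked_results = []
real_results = []

# Only the first prediction is used. The pairing below relies on that prediction
# belonging to QID and on its scores following the document order of the qrel file.
for scores in ranker.predict(input_fn=lambda: input_fn(_TEST_DATA_PATH)):

    # Pair each document id with its predicted score, keeping only the first
    # total_docs_query scores (the remaining positions are padding).
    ranking_list = [[doc[0], score] for score, doc in zip(scores[0:total_docs_query], dict_query_results[QID])]

    # Predicted ranking: highest score first.
    ranked_results = sorted(ranking_list, key=lambda pair: pair[1], reverse=True)

    # Ground-truth ranking: highest relevance label first.
    real_results = sorted(dict_query_results[QID], key=lambda pair: pair[1], reverse=True)

    break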

INFO:tensorflow:vocabulary_size = 30522 in query_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tensorflow:vocabulary_size = 30522 in document_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:vocabulary_size = 30522 in document_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tensorflow:vocabulary_size = 30522 in query_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tensorflow:vocabulary_size = 30522 in document_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tensorflow:vocabulary_size = 30522 in query_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tensorflow:vocabulary_size = 30522 in document_tokens is inferred from the number of elements in the vocabulary_file /tmp/vocab.txt.
INFO:tens

## RESULTS PREDICTED BY THE RANKER:

In [173]:
print('QUERY: {}'.format(dict_queries_names[QID]))
df_results = pd.DataFrame(ranked_results, columns = ['doc_id', 'score'])
df_results['text'] = [dict_document_names[x] for x in df_results.doc_id]
df_results


QUERY: What is the difference between coolant and anti-freeze?


|  | doc_id | score | text |
|---|---|---|---|
| 0 | 1130467_0 | 0.0039 | I would check the "Coolant Level Sensor" first... |
| 1 | 143833_0 | 0.002804 | The difference is, coolant is the liquid that ... |
| 2 | 3769909_5 | 0.002679 | i think your dog may have eaten chocolate or a... |
| 3 | 1458356_0 | 0.001905 | drain the coolant out, drill a hole in the cen... |
| 4 | 1453706_9 | 0.001683 | Water was commonly used in automobile radiator... |
| 5 | 143833_3 | 0.000987 | While your at the store getting your coolant a... |
| 6 | 143833_5 | 0.000832 | There is no difference. It is the same substan... |
| 7 | 2404029_2 | 0.000826 | Actually 2 ways. "Old man" way and the Right w... |
| 8 | 143833_1 | 0.000657 | It's the same thing dude! |
| 9 | 3977645_5 | 0.000462 | different oils freeze at different points |


## GROUND-TRUTH RESULTS

In [174]:
print('QUERY: {}'.format(dict_queries_names[QID]))
df_ground_truth = pd.DataFrame(real_results, columns = ['doc_id', 'relevance'])
df_ground_truth['text'] = [dict_document_names[x] for x in df_ground_truth.doc_id]
df_ground_truth

QUERY: What is the difference between coolant and anti-freeze?


|  | doc_id | relevance | text |
|---|---|---|---|
| 0 | 143833_0 | 4 | The difference is, coolant is the liquid that ... |
| 1 | 143833_7 | 4 | Coolant is the liquid in the radiator. It may... |
| 2 | 143833_2 | 4 | They are the same thing. It just does two jobs... |
| 3 | 143833_6 | 4 | Coolant IS the more proper term. Ethylene gly... |
| 4 | 143833_5 | 3 | There is no difference. It is the same substan... |
| 5 | 143833_4 | 3 | just make sure whatever you're using says "for... |
| 6 | 1130467_0 | 2 | I would check the "Coolant Level Sensor" first... |
| 7 | 2539741_0 | 2 | i hate to tell you this,but you blown a head g... |
| 8 | 2580192_2 | 2 | It may be your fluids especially antifreeze co... |
| 9 | 143833_1 | 2 | It's the same thing dude! |
