A Tensorflow implementation of the Deep Listwise Context Model (DLCM) for ranking refinement.

Overview

This is an implementation of the Deep Listwise Context Model (DLCM) for ranking refinement <1>. Please cite the following paper if you plan to use it for your project:

  • Qingyao Ai, Keping Bi, Jiafeng Guo, W. Bruce Croft. 2018. Learning a Deep Listwise Context Model for Ranking Refinement. In Proceedings of SIGIR ’18

The DLCM is a deep model that uses a recurrent neural network to encode the feature vectors of top retrieved documents in order to capture the local search context of each query. It can be deployed on any learning-to-rank system for query-specific ranking refinements. Please refer to the paper for more details.

Requirements:

1. To run DLCM in ./DLCM/ and the Python scripts in ./scripts/, Python 2.7+ and TensorFlow v1.0+ are required.

Data Preparation

For simplicity, we describe the data preparation for SVMrank on Yahoo letor set 1 and provide the corresponding scripts in ./scripts/. You can extend these scripts or write your own code to prepare the data for other letor datasets and learning algorithms.
1. Download Yahoo Letor dataset 1 from (http://webscope.sandbox.yahoo.co).
2. Decompress the files and put the data into a single directory. The directory should look like this:
	<letor_data_path>: # the directory of letor data
		/set1.train.txt # the data used for training
		/set1.valid.txt # the data used for validation
		/set1.test.txt # the data used for testing
3. Train SVMrank with the data and output the model. For detailed training instructions, please refer to https://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html.
4. Run the SVMrank model on the train/valid/test data and output the corresponding scores. Then you should have a directory with the output scores like:
	<inital_rank_path>: # the directory for SVMrank outputs
		/train.predict # the SVMrank output for documents in the training data
		/valid.predict # the SVMrank output for documents in the validation data
		/test.predict # the SVMrank output for documents in the test data
5. Generate the rank lists in the initial retrieval process using the SVMrank outputs and prepare the data for DLCM:
	python ./scripts/Prepare_yahoo_letor_data_set1.py <letor_data_path> <inital_rank_path> <DLCM_data_path> <rank_cut>
		<letor_data_path>: the directory of letor data. 
		<inital_rank_path>: the directory for SVMrank outputs. 
		<DLCM_data_path>: the directory for the inputs of DLCM. 
		<rank_cut>: the number of top documents we keep for each query.
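
The ranking step in item 5 can be pictured with the minimal sketch below. It is not the code of Prepare_yahoo_letor_data_set1.py. It assumes that each letor line has the form `<relevance> qid:<query_id> <feature_id>:<feature_val> ...` and that the SVMrank prediction file holds one score per line in the same order as the letor file. The function name `build_initial_lists` and the toy data are hypothetical.

```python
from collections import OrderedDict


def build_initial_lists(letor_lines, score_lines, rank_cut):
    """Group documents by query and rank them by descending SVMrank score.

    Returns an OrderedDict mapping query_id to a list of
    (line_idx, relevance, score) tuples, cut to the top `rank_cut` documents.
    """
    per_query = OrderedDict()
    for line_idx, (letor_line, score_line) in enumerate(zip(letor_lines, score_lines)):
        fields = letor_line.split()
        relevance = int(fields[0])
        query_id = fields[1].split(":", 1)[1]  # "qid:12" -> "12"
        per_query.setdefault(query_id, []).append(
            (line_idx, relevance, float(score_line)))

    ranked = OrderedDict()
    for query_id, docs in per_query.items():
        # sorted() is stable, so tied scores keep their original file order.
        docs = sorted(docs, key=lambda d: d[2], reverse=True)
        ranked[query_id] = docs[:rank_cut]
    return ranked


# Toy input: three documents for query 1, one document for query 2.
letor = [
    "2 qid:1 1:0.5 3:0.1",
    "0 qid:1 2:0.7",
    "1 qid:1 1:0.2 2:0.9",
    "4 qid:2 5:1.0",
]
scores = ["0.3", "1.2", "-0.4", "0.8"]

lists = build_initial_lists(letor, scores, rank_cut=2)
for qid, docs in lists.items():
    print(qid, [idx for idx, _, _ in docs])
```

With `rank_cut=2`, query 1 keeps only its two highest-scored documents (file lines 1 and 0), which mirrors the role of `<rank_cut>` described above.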

After the data preparation, we will have the following files in <DLCM_data_path>:
	<DLCM_data_path>/settings.json:
		The settings we used to prepare the data

	<DLCM_data_path>/train/:
		1. <DLCM_data_path>/train/train.feature:
			The feature data.

			<doc_id> <feature_id>:<feature_val> <feature_id>:<feature_val> ... <feature_id>:<feature_val>

				<doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
				<feature_id> = an integer identifier for each feature from 0 to 699
				<feature_val> = the real feature value

			Each line represents a different document. 

		2. <DLCM_data_path>/train/train.init_list:
			The initial rank lists for each query:
			
			<query_id> <feature_line_number_for_the_1st_doc> <feature_line_number_for_the_2nd_doc> ...  <feature_line_number_for_the_Nth_doc>
			
				<query_id> = the integer identifier for each query.
				<feature_line_number_for_the_Nth_doc> = the line number (starting from 0) in the feature file (train.feature) where the features of the Nth document for this query are stored.
			
			Each line represents a rank list generated by SVMrank for the query. Documents are represented by their line numbers in the feature file and are sorted in descending order of the ranking scores produced by SVMrank.
			
		3. <DLCM_data_path>/train/train.gold_list:
            The golden rank lists for each query:
            
            <query_id> <doc_idx_in_initial_list> <doc_idx_in_initial_list> ...  <doc_idx_in_initial_list>
            
                <query_id> = the integer identifier for each query.
                <doc_idx_in_initial_list> = the index (starting from 0) of the document in the initial rank list (stored in train.init_list) for the query. For example, <doc_idx_in_initial_list> = 1 means the 2nd document in the initial list of the query.
            
            Each line represents a golden rank list generated by reranking the initial rank list according to the document annotations for the query. Documents are represented by their index in the initial list of the corresponding query in train.init_list and are sorted in descending order of human relevance annotations.

		4. <DLCM_data_path>/train/train.weights:
            The annotated relevance values for documents in the initial list of each query.

            <query_id> <relevance_value_for_the_1st_doc> <relevance_value_for_the_2nd_doc> ...  <relevance_value_for_the_Nth_doc>
            
                <query_id> = the integer identifier for each query.
                <relevance_value_for_the_Nth_doc> = the human-annotated relevance value of the Nth document in the initial list of the corresponding query. For 5-level relevance judgments, it should be one of the values in {0,1,2,3,4}.
            

		5. <DLCM_data_path>/train/train.initial_scores:
            The ranking scores produced by SVMrank for documents in the initial list of each query.

            <query_id> <ranking_scores_for_the_1st_doc> <ranking_scores_for_the_2nd_doc> ...  <ranking_scores_for_the_Nth_doc>
            
                <query_id> = the integer identifier for each query.
                <ranking_scores_for_the_Nth_doc> = the ranking score produced by SVMrank for the Nth document in the initial list of the corresponding query.

		6. <DLCM_data_path>/train/train.qrels:
            The relevance judgement file used for evaluation.

            <query_id> 0 <doc_id> <relevance_value>

                <query_id> = the integer identifier for each query.
                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <relevance_value> = the human-annotated relevance value for the corresponding query-document pair. For 5-level relevance judgments, it should be one of the values in {0,1,2,3,4}.

		7. <DLCM_data_path>/train/train.trec.gold_list:
            The golden rank lists in TREC format.

            <query_id> Q0 <doc_id> <rank> <relevance_value> Gold

                <query_id> = the integer identifier for each query.
                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <rank> = the rank (starting from 1) of the document in the ranked list of the query.
                <relevance_value> = the human-annotated relevance value for the corresponding query-document pair. For 5-level relevance judgments, it should be one of the values in {0,1,2,3,4}.

		8. <DLCM_data_path>/train/train.trec.init_list:
            The initial rank lists in TREC format.

            <query_id> Q0 <doc_id> <rank> <ranking_scores> RankSVM

                <query_id> = the integer identifier for each query.
                <doc_id> = the identifier of the document. For example, "test_2_5" means the 5th document for the query with identifier "2" in the original test set of the Yahoo letor data.
                <rank> = the rank (starting from 1) of the document in the ranked list of the query.
                <ranking_scores> = the ranking scores produced by SVMrank for the corresponding query-document pair. 

        * Please note that the order of queries in train.init_list, train.gold_list, train.weights and train.initial_scores must be the same.
		
	<DLCM_data_path>/valid/:
        Similar to <DLCM_data_path>/train/ except that this directory is built for the validation data.

	<DLCM_data_path>/test/:
        Similar to <DLCM_data_path>/train/ except that this directory is built for the test data.
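
To make the file layouts above concrete, here is a minimal sketch that parses a few lines in the train.feature, train.init_list and train.weights formats, checks that the queries appear in the same order, and derives a golden order from the relevance values. The helper names (`parse_feature_line`, `parse_list_line`, `gold_order`) and the toy lines are hypothetical. The way ties are broken here (by position in the initial list) is an assumption and may differ from what the preparation script does.

```python
def parse_feature_line(line, feature_size=700):
    """Turn '<doc_id> <feature_id>:<feature_val> ...' into (doc_id, dense list).

    feature_size=700 follows the stated feature id range of 0 to 699.
    """
    fields = line.split()
    dense = [0.0] * feature_size
    for pair in fields[1:]:
        fid, val = pair.split(":")
        dense[int(fid)] = float(val)
    return fields[0], dense


def parse_list_line(line, cast):
    """Turn '<query_id> v1 v2 ...' into (query_id, [cast(v1), cast(v2), ...])."""
    fields = line.split()
    return fields[0], [cast(v) for v in fields[1:]]


def gold_order(relevance):
    """Indices into the initial list, sorted by descending relevance.

    sorted() is stable, so equally relevant documents keep their initial order.
    """
    return sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)


# Toy data in the formats described above (one query with three documents).
feature_lines = ["train_7_0 0:0.5 3:1.0", "train_7_1 1:0.25", "train_7_2 699:2.0"]
init_list_lines = ["7 2 0 1"]   # feature line numbers, best SVMrank score first
weights_lines = ["7 1 0 3"]     # relevance of each document in the initial list

features = [parse_feature_line(l) for l in feature_lines]
init_lists = [parse_list_line(l, int) for l in init_list_lines]
weights = [parse_list_line(l, int) for l in weights_lines]

# The files must list queries in the same order, so check it explicitly.
assert [q for q, _ in init_lists] == [q for q, _ in weights]

for (qid, doc_rows), (_, rel) in zip(init_lists, weights):
    doc_ids = [features[row][0] for row in doc_rows]
    print(qid, doc_ids, gold_order(rel))
```

In this toy query the third document of the initial list has the highest relevance (3), so the derived golden order starts with index 2, matching the index-based convention of train.gold_list.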

Model Training/Testing

1. python ./DLCM/main.py --<parameter_name> <parameter_value> --<parameter_name> <parameter_value> … 

    1. batch_size: Batch size used in training. Default 256.
    2. embed_size: Size of each embedding. Default 1024.
    3. max_train_iteration: Limit on the number of training iterations (0: no limit).
    4. steps_per_checkpoint: How many training steps to run per checkpoint. Default 200.
    5. boost_max_num: The max number of new data for boosting one training instance. Default 50. (Not used in the final program, to be deleted)
    6. boost_swap_num: How many times to swap when boosting one training instance. Default 10. (Not used in the final program, to be deleted)
    7. decode: Set to "False" for training on the training data and "True" for testing on the test data. Default "False".
    8. decode_train: Set to "True" for testing on the training data. Default "False".
    9. feed_previous: Set to True to feed the previous internal output during training. (Not used in the final program, to be deleted)
    10. boost_training_data: Boost training data through swapping docs with the same relevance scores. (Not used in the final program, to be deleted)
    11. data_dir: The data directory, which should be the <DLCM_data_path>.
    12. train_dir: Model directory & output directory.
    13. test_dir: The directory for output test results.
    14. hparams: Hyper-parameters for models (a string written in the Tensorflow required format), which include:

        1. learning_rate:  The learning rate in training. Default 0.5.
        2. learning_rate_decay_factor: The learning rate decays by this factor whenever the loss is higher than the three previous losses. Default 0.90.
        3. max_gradient_norm: Clip gradients to this norm. Default 5.0.
        4. reverse_input: Set to True to reverse the input sequences. Default True.
        5. num_layers: Number of layers in the model. Default 1.
        6. num_heads: Number of heads in the attention strategy. Default 3.
        7. l2_loss: The lambda for L2 regularization. Default 0.0.
        8. att_strategy: A string that specifies the function used for attention score computation. It could be "add", "multi", "multi_add", "NTN" and "elu". Please refer to the function rnn_decoder in the source code for detailed information.
        9. use_residua: Set to True for using the initial scores to compute residua. (Not used in the final program, to be deleted)
        10. use_lstm: Set to True for using LSTM cells instead of GRU cells. Default False.
        11. softRank_theta: Set Gaussian distribution theta for softRank. Default 0.1.
    
    
2. Evaluation

    1. After training with "--decode False", generate test rank lists with "--decode True".
    2. TREC format rank lists for the test data will be stored in <train_dir> with the name "test.ranklist".
    3. Evaluate test rank lists with ground truth <DLCM_data_path>/test/test.qrels using trec_eval or galago eval tool.
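
For a quick sanity check before running trec_eval or galago, the sketch below computes NDCG@k directly from qrels lines and a TREC format rank list. It is not a replacement for those tools: it assumes that test.ranklist uses the same six-column layout as train.trec.init_list, it uses the gain 2^rel - 1 with a log2 discount, and its numbers may therefore differ from other NDCG implementations. The function names, the run tag "run" and the toy lines are hypothetical.

```python
import math
from collections import defaultdict


def read_qrels(lines):
    """Parse '<query_id> 0 <doc_id> <relevance_value>' lines."""
    qrels = defaultdict(dict)
    for line in lines:
        qid, _, doc_id, rel = line.split()
        qrels[qid][doc_id] = int(rel)
    return qrels


def read_ranklist(lines):
    """Parse '<query_id> Q0 <doc_id> <rank> <score> <run_tag>' lines."""
    ranked = defaultdict(list)
    for line in lines:
        qid, _, doc_id, rank, _, _ = line.split()
        ranked[qid].append((int(rank), doc_id))
    # Order the documents of each query by their rank column.
    return {qid: [d for _, d in sorted(docs)] for qid, docs in ranked.items()}


def dcg(rels, k):
    """Discounted cumulative gain over the first k relevance values."""
    return sum((2 ** r - 1) / math.log(i + 2, 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(ranked_docs, judged, k=10):
    """NDCG@k for one query; unjudged documents count as relevance 0."""
    rels = [judged.get(d, 0) for d in ranked_docs]
    ideal = dcg(sorted(judged.values(), reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0


qrels_lines = ["2 0 test_2_1 0", "2 0 test_2_2 3", "2 0 test_2_3 1"]
rank_lines = [
    "2 Q0 test_2_2 1 1.7 run",
    "2 Q0 test_2_3 2 0.9 run",
    "2 Q0 test_2_1 3 0.1 run",
]

qrels = read_qrels(qrels_lines)
ranklist = read_ranklist(rank_lines)
for qid, docs in ranklist.items():
    print(qid, ndcg_at_k(docs, qrels[qid], k=10))
```

The toy rank list orders the documents exactly by relevance, so its NDCG is 1.0. Any other ordering of the same three documents scores lower.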

Example Parameter Settings

num_layers --> 1
num_heads --> 3
learning_rate --> 0.5
steps_per_checkpoint --> 200
max_train_iteration --> 10000
l2_loss --> 0.0
loss_func --> 'softmax'
att_strategy --> 'multi'
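
The settings above could be turned into a command line as in the following sketch. Several points are assumptions: the hparams string is written as comma-separated `name=value` pairs, `loss_func` is passed as a hyper-parameter even though it is not in the hparams list above, and the directories `./data` and `./model` are placeholders. The helper `build_command` is hypothetical and only assembles the argument list without running anything.

```python
def build_command(data_dir, train_dir, flags, hparams, decode=False):
    """Assemble an argument list for ./DLCM/main.py.

    `flags` holds top-level parameters, `hparams` holds the entries that go
    into the single --hparams string (assumed form: name=value,name=value).
    """
    hparams_str = ",".join(
        "%s=%s" % (name, value) for name, value in sorted(hparams.items()))
    cmd = ["python", "./DLCM/main.py",
           "--data_dir", data_dir,
           "--train_dir", train_dir]
    for name, value in sorted(flags.items()):
        cmd += ["--" + name, str(value)]
    cmd += ["--hparams", hparams_str, "--decode", str(decode)]
    return cmd


flags = {"steps_per_checkpoint": 200, "max_train_iteration": 10000}
hparams = {
    "num_layers": 1,
    "num_heads": 3,
    "learning_rate": 0.5,
    "l2_loss": 0.0,
    "loss_func": "softmax",   # assumption: accepted as a hyper-parameter
    "att_strategy": "multi",
}

train_cmd = build_command("./data", "./model", flags, hparams, decode=False)
test_cmd = build_command("./data", "./model", flags, hparams, decode=True)
print(" ".join(train_cmd))
print(" ".join(test_cmd))
```

Either list can then be passed to `subprocess.call` from the repository root: first the training command, then the one with `--decode True` to write the test rank lists.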

Reference:

<1> Qingyao Ai, Keping Bi, Jiafeng Guo, W. Bruce Croft. 2018. Learning a Deep Listwise Context Model for Ranking Refinement. In Proceedings of SIGIR ’18