Reproduction of Benchmark Results #99

Closed

ghost opened this issue May 11, 2018 · 21 comments

@ghost

ghost commented May 11, 2018

When running through the procedure described in the readme for the benchmark results of WikiQA, the reproduced values for NDCG@3, NDCG@5, and MAP are roughly half of the values shown in the table. Could you provide insight as to why this may be occurring?

@faneshion
Member

Could you give more details? Which model are you running?

@ghost
Author

ghost commented May 12, 2018

I am following the readme. I cloned the repo, ran run_data.sh under data/WikiQA, then entered the readme command for training, i.e. "python matchzoo/main.py --phase train --model_file examples/wikiqa/config/drmm_wikiqa.config", and then the testing command "python matchzoo/main.py --phase predict --model_file examples/wikiqa/config/drmm_wikiqa.config". My results for drmm NDCG@3, NDCG@5, and MAP are roughly half the values shown in the table, around (0.29, 0.34, 0.357) respectively.

@bwanglzu
Member

bwanglzu commented May 18, 2018

I have a similar question here. This is something I expected, but clearly the benchmark in the Readme is way higher than typical IR benchmarks; I guess there might be some error in the implementation of the evaluation functions?

I tried to read the source code of MatchZoo and figured out that there are some differences between the implementation and the papers. Take the CDSSM model, for example:

1. In this line:

seq.add(Dense(self.config['hidden_sizes'][0], activation='relu', input_shape=(input_dim,)))

the original paper was using tanh instead of relu (although relu could be better).

2. The original paper didn't use any dropout (see here), but in the CDSSM implementation we clearly have a dropout layer. I wonder how it'll impact the result:

seq.add(Dropout(self.config['dropout_rate']))

3. Clearly, something is wrong in this line:

wordhashing = Embedding(self.config['vocab_size'], self.config['embed_size'], weights=[self.config['embed']], trainable=self.embed_trainable)

In the CDSSM implementation, the word hashing layer is represented as an Embedding layer (in Keras), but in the original paper it should be some sort of sparse letter-trigram count representation like [1, 0, 2, 0, 0, 0, ..., 0] (see the sketch below).

4. The original paper (the shared model) has 1 CONV layer and 1 max-pooling layer without dropout, but in the implementation we can clearly see that there are 2 CONV layers and 2 max-pooling layers.

5. The original CDSSM paper uses a soft-softmax function as output; the implementation uses a plain softmax.

6. If you're using the default configuration for CDSSM, you can see that the default value is:

 "optimizer": "adam",

while the original paper uses stochastic gradient descent. But that's fine, since it's configurable.

7. Some other small differences, such as batch_size, learning rate (for sgd), and num_epochs.

I guess these changes could produce unexpected results; I would say let's be careful when using it for scientific purposes.
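
To make the word-hashing point (item 3) concrete, here is a small standalone sketch of the letter-trigram count vectors the DSSM/CDSSM papers describe, as opposed to a dense Embedding lookup. This is not MatchZoo code, and `trigram_index` is a toy vocabulary invented for the example:

```python
import numpy as np

def letter_trigrams(word):
    w = '#' + word + '#'  # boundary markers, as in the DSSM paper
    return [w[i:i + 3] for i in range(len(w) - 2)]

def word_hash(word, trigram_index):
    """Sparse count vector over the letter-trigram vocabulary."""
    vec = np.zeros(len(trigram_index), dtype=np.float32)
    for tg in letter_trigrams(word):
        if tg in trigram_index:
            vec[trigram_index[tg]] += 1.0  # counts, e.g. [1, 0, 2, 0, ..., 0]
    return vec

# Toy vocabulary covering the trigrams of "good".
trigram_index = {'#go': 0, 'goo': 1, 'ood': 2, 'od#': 3}
print(word_hash('good', trigram_index))  # -> [1. 1. 1. 1.]
```

A trainable Embedding lookup can in principle learn something similar, but it is a different representation from the fixed sparse hashing the paper evaluates.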

@yangliuy
Member

yangliuy commented May 18, 2018

Thanks @millanbatra for the questions and feedback!

The choice of hyper-parameters has a great impact on the results on WikiQA. But it is still very strange that "for the benchmark results of WikiQA, the reproduced values for NDCG@3, NDCG@5, and MAP are roughly half of the values shown in the table". The gap is too large; with the correct settings it shouldn't be anywhere near that big. I pasted the raw output log from my side when I ran aNMM of MatchZoo on WikiQA. You can see that the results I got are very close to the results in our Readme file. I suggest you compare your model's configuration (hyper-parameters) with mine to debug it.

Thanks @bwanglzu for pointing out the differences between the implementations in MatchZoo and the descriptions in some papers. Yes, this is possible. We tried to implement the most important components/novel parts of these neural models, but it is still possible that some of our current implementation details differ from details described in the papers. We will fix the critical differences in the next version of MatchZoo. For differences like dropout, I think it is fine to keep them: you can adjust the dropout rate to control the network as you want. What do you think? @faneshion @pl8787 @bwanglzu

Stay tuned.

My raw output logs:

nohup: ignoring input
Using TensorFlow backend.
{
  "inputs": {
    "test": {
      "phase": "EVAL", 
      "input_type": "DRMM_ListGenerator", 
      "hist_feats_file": "../data/WikiQA/relation_test.binsum-20.txt", 
      "relation_file": "../data/WikiQA/relation_test.txt", 
      "batch_list": 10
    }, 
    "predict": {
      "phase": "PREDICT", 
      "input_type": "DRMM_ListGenerator", 
      "hist_feats_file": "../data/WikiQA/relation_test.binsum-20.txt", 
      "relation_file": "../data/WikiQA/relation_test.txt", 
      "batch_list": 10
    }, 
    "train": {
      "relation_file": "../data/WikiQA/relation_train.txt", 
      "input_type": "DRMM_PairGenerator", 
      "batch_size": 100, 
      "batch_per_iter": 5, 
      "hist_feats_file": "../data/WikiQA/relation_train.binsum-20.txt", 
      "phase": "TRAIN", 
      "query_per_iter": 50, 
      "use_iter": false
    }, 
    "share": {
      "vocab_size": 18678, 
      "use_dpool": false, 
      "embed_size": 300, 
      "target_mode": "ranking", 
      "text1_corpus": "../data/WikiQA/corpus_preprocessed.txt", 
      "text2_corpus": "../data/WikiQA/corpus_preprocessed.txt", 
      "embed_path": "../data/WikiQA/embed.idf", 
      "text1_maxlen": 10, 
      "bin_num": 20, 
      "train_embed": false, 
      "text2_maxlen": 40
    }, 
    "valid": {
      "phase": "EVAL", 
      "input_type": "DRMM_ListGenerator", 
      "hist_feats_file": "../data/WikiQA/relation_valid.binsum-20.txt", 
      "relation_file": "../data/WikiQA/relation_valid.txt", 
      "batch_list": 10
    }
  }, 
  "global": {
    "optimizer": "adadelta", 
    "num_iters": 400, 
    "save_weights_iters": 10, 
    "learning_rate": 0.0001, 
    "weights_file": "./models/weights/anmm.wikiqa.weights", 
    "model_type": "PY", 
    "display_interval": 10
  }, 
  "outputs": {
    "predict": {
      "save_format": "TREC", 
      "save_path": "predict.test.wikiqa.txt"
    }
  }, 
  "losses": [
    {
      "object_name": "rank_hinge_loss", 
      "object_params": {
        "margin": 1.0
      }
    }
  ], 
  "metrics": [
    "ndcg@3", 
    "ndcg@5", 
    "map"
  ], 
  "net_name": "ANMM", 
  "model": {
    "model_py": "anmm.ANMM", 
    "setting": {
      "dropout_rate": 0.0, 
      "num_layers": 2, 
      "hidden_sizes": [
        20, 
        1
      ]
    }, 
    "model_path": "./models/"
  }
}
[../data/WikiQA/embed.idf]
	Embedding size: 18677
Generate numpy embed: (18678, 300)
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'train'] in TRAIN, [u'test', u'valid'] in EVAL.
[../data/WikiQA/corpus_preprocessed.txt]
	Data size: 24106
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'../data/WikiQA/relation_train.txt', u'vocab_size': 18678, u'query_per_iter': 50, u'use_dpool': False, u'embed_size': 300, u'target_mode': u'ranking', u'input_type': u'DRMM_PairGenerator', u'text1_corpus': u'../data/WikiQA/corpus_preprocessed.txt', u'batch_size': 100, u'batch_per_iter': 5, u'text2_corpus': u'../data/WikiQA/corpus_preprocessed.txt', u'hist_feats_file': u'../data/WikiQA/relation_train.binsum-20.txt', u'embed_path': u'../data/WikiQA/embed.idf', u'text1_maxlen': 10, u'phase': u'TRAIN', u'bin_num': 20, 'embed': array([[ 7.78767204,  7.78767204,  7.78767204, ...,  7.78767204,
         7.78767204,  7.78767204],
       [ 4.46623993,  4.46623993,  4.46623993, ...,  4.46623993,
         4.46623993,  4.46623993],
       [ 8.4808197 ,  8.4808197 ,  8.4808197 , ...,  8.4808197 ,
         8.4808197 ,  8.4808197 ],
       ..., 
       [ 8.99164486,  8.99164486,  8.99164486, ...,  8.99164486,
         8.99164486,  8.99164486],
       [ 8.99164486,  8.99164486,  8.99164486, ...,  8.99164486,
         8.99164486,  8.99164486],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32), u'train_embed': False, u'text2_maxlen': 40, u'use_iter': False}
[../data/WikiQA/relation_train.txt]
	Instance size: 20360
Pair Instance Count: 8995
[../data/WikiQA/relation_train.binsum-20.txt]
	Feature size: 20360
[DRMM_PairGenerator] init done
{u'relation_file': u'../data/WikiQA/relation_test.txt', u'vocab_size': 18678, u'use_dpool': False, u'embed_size': 300, u'target_mode': u'ranking', u'input_type': u'DRMM_ListGenerator', u'batch_list': 10, u'text1_corpus': u'../data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'../data/WikiQA/corpus_preprocessed.txt', u'hist_feats_file': u'../data/WikiQA/relation_test.binsum-20.txt', u'embed_path': u'../data/WikiQA/embed.idf', u'text1_maxlen': 10, u'phase': u'EVAL', u'bin_num': 20, 'embed': array([[ 7.78767204,  7.78767204,  7.78767204, ...,  7.78767204,
         7.78767204,  7.78767204],
       [ 4.46623993,  4.46623993,  4.46623993, ...,  4.46623993,
         4.46623993,  4.46623993],
       [ 8.4808197 ,  8.4808197 ,  8.4808197 , ...,  8.4808197 ,
         8.4808197 ,  8.4808197 ],
       ..., 
       [ 8.99164486,  8.99164486,  8.99164486, ...,  8.99164486,
         8.99164486,  8.99164486],
       [ 8.99164486,  8.99164486,  8.99164486, ...,  8.99164486,
         8.99164486,  8.99164486],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32), u'train_embed': False, u'text2_maxlen': 40}
[../data/WikiQA/relation_test.txt]
	Instance size: 2341
List Instance Count: 237
[../data/WikiQA/relation_test.binsum-20.txt]
	Feature size: 2341
[DRMM_ListGenerator] init done, list number: 237. 
{u'relation_file': u'../data/WikiQA/relation_valid.txt', u'vocab_size': 18678, u'use_dpool': False, u'embed_size': 300, u'target_mode': u'ranking', u'input_type': u'DRMM_ListGenerator', u'batch_list': 10, u'text1_corpus': u'../data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'../data/WikiQA/corpus_preprocessed.txt', u'hist_feats_file': u'../data/WikiQA/relation_valid.binsum-20.txt', u'embed_path': u'../data/WikiQA/embed.idf', u'text1_maxlen': 10, u'phase': u'EVAL', u'bin_num': 20, 'embed': array([[ 7.78767204,  7.78767204,  7.78767204, ...,  7.78767204,
         7.78767204,  7.78767204],
       [ 4.46623993,  4.46623993,  4.46623993, ...,  4.46623993,
         4.46623993,  4.46623993],
       [ 8.4808197 ,  8.4808197 ,  8.4808197 , ...,  8.4808197 ,
         8.4808197 ,  8.4808197 ],
       ..., 
       [ 8.99164486,  8.99164486,  8.99164486, ...,  8.99164486,
         8.99164486,  8.99164486],
       [ 8.99164486,  8.99164486,  8.99164486, ...,  8.99164486,
         8.99164486,  8.99164486],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32), u'train_embed': False, u'text2_maxlen': 40}
[../data/WikiQA/relation_valid.txt]
	Instance size: 1126
List Instance Count: 122
[../data/WikiQA/relation_valid.binsum-20.txt]
	Feature size: 1126
[DRMM_ListGenerator] init done, list number: 122. 
[ANMM] init done
[layer]: Input	[shape]: [None, 10] 
 [Memory] Total Memory Use: 673.0781 MB 	 Resident: 689232 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input	[shape]: [None, 10, 20] 
 [Memory] Total Memory Use: 673.0781 MB 	 Resident: 689232 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Embedding	[shape]: [None, 10, 300] 
 [Memory] Total Memory Use: 982.8633 MB 	 Resident: 1006452 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense	[shape]: [None, 10, 1] 
 [Memory] Total Memory Use: 982.8633 MB 	 Resident: 1006452 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Lambda-softmax	[shape]: [None, 10, 1] 
 [Memory] Total Memory Use: 983.0234 MB 	 Resident: 1006616 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dropout	[shape]: [None, 10, 20] 
 [Memory] Total Memory Use: 983.0234 MB 	 Resident: 1006616 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense	[shape]: [None, 10, 20] 
 [Memory] Total Memory Use: 983.2500 MB 	 Resident: 1006848 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense	[shape]: [None, 10, 1] 
 [Memory] Total Memory Use: 983.5000 MB 	 Resident: 1007104 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Permute	[shape]: [None, 1, 10] 
 [Memory] Total Memory Use: 983.7578 MB 	 Resident: 1007368 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape	[shape]: [None, 10] 
 [Memory] Total Memory Use: 983.7578 MB 	 Resident: 1007368 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape	[shape]: [None, 10] 
 [Memory] Total Memory Use: 983.7578 MB 	 Resident: 1007368 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense	[shape]: [None, 1] 
 [Memory] Total Memory Use: 984.0000 MB 	 Resident: 1007616 Shared: 0 UnshareData: 0 UnshareStack: 0
[Model] Model Compile Done.
[12-02-2017 18:58:15]	[Train:train] Iter:0	loss=0.998629
[12-02-2017 18:58:15]	[Eval:test] Iter:0	ndcg@3=0.437080	map=0.462454	ndcg@5=0.501976
[12-02-2017 18:58:16]	[Eval:valid] Iter:0	ndcg@3=0.524195	map=0.546850	ndcg@5=0.588296
[12-02-2017 18:58:16]	[Train:train] Iter:1	loss=0.983900
[12-02-2017 18:58:16]	[Eval:test] Iter:1	ndcg@3=0.508106	map=0.538522	ndcg@5=0.575976
[12-02-2017 18:58:16]	[Eval:valid] Iter:1	ndcg@3=0.580473	map=0.574481	ndcg@5=0.635195
[12-02-2017 18:58:16]	[Train:train] Iter:2	loss=0.967656
[12-02-2017 18:58:16]	[Eval:test] Iter:2	ndcg@3=0.522492	map=0.552471	ndcg@5=0.592427
......  (output skipped to save space)
[12-02-2017 19:00:22]	[Eval:valid] Iter:397	ndcg@3=0.653439	map=0.666694	ndcg@5=0.704019
[12-02-2017 19:00:22]	[Train:train] Iter:398	loss=0.572831
[12-02-2017 19:00:22]	[Eval:test] Iter:398	ndcg@3=0.621962	map=0.622157	ndcg@5=0.666083
[12-02-2017 19:00:22]	[Eval:valid] Iter:398	ndcg@3=0.656465	map=0.667423	ndcg@5=0.706831
[12-02-2017 19:00:22]	[Train:train] Iter:399	loss=0.608465
[12-02-2017 19:00:22]	[Eval:test] Iter:399	ndcg@3=0.619300	map=0.624858	ndcg@5=0.667405
[12-02-2017 19:00:22]	[Eval:valid] Iter:399	ndcg@3=0.647195	map=0.661275	ndcg@5=0.710784

@aneesh-joshi
Contributor

aneesh-joshi commented May 19, 2018

@yangliuy @faneshion

Below is the output on my system after running:

python matchzoo/main.py --phase train --model_file examples/wikiqa/config/anmm_wikiqa.config

python matchzoo/main.py --phase predict --model_file examples/wikiqa/config/anmm_wikiqa.config

Output of Train:

Using TensorFlow backend.
2018-05-19 09:30:25.938230: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
{
  "net_name": "ANMM",
  "global": {
    "model_type": "PY",
    "weights_file": "examples/wikiqa/weights/anmm.wikiqa.weights",
    "save_weights_iters": 10,
    "num_iters": 400,
    "display_interval": 10,
    "test_weights_iters": 400,
    "optimizer": "adadelta",
    "learning_rate": 0.0001
  },
  "inputs": {
    "share": {
      "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt",
      "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt",
      "use_dpool": false,
      "embed_size": 300,
      "embed_path": "./data/WikiQA/embed.idf",
      "vocab_size": 18670,
      "train_embed": false,
      "target_mode": "ranking",
      "bin_num": 20,
      "text1_maxlen": 10,
      "text2_maxlen": 40
    },
    "train": {
      "input_type": "DRMM_PairGenerator",
      "phase": "TRAIN",
      "use_iter": false,
      "query_per_iter": 50,
      "batch_per_iter": 5,
      "batch_size": 100,
      "relation_file": "./data/WikiQA/relation_train.txt",
      "hist_feats_file": "./data/WikiQA/relation_train.binsum-20.txt"
    },
    "valid": {
      "input_type": "DRMM_ListGenerator",
      "phase": "EVAL",
      "batch_list": 10,
      "relation_file": "./data/WikiQA/relation_valid.txt",
      "hist_feats_file": "./data/WikiQA/relation_valid.binsum-20.txt"
    },
    "test": {
      "input_type": "DRMM_ListGenerator",
      "phase": "EVAL",
      "batch_list": 10,
      "relation_file": "./data/WikiQA/relation_test.txt",
      "hist_feats_file": "./data/WikiQA/relation_test.binsum-20.txt"
    },
    "predict": {
      "input_type": "DRMM_ListGenerator",
      "phase": "PREDICT",
      "batch_list": 10,
      "relation_file": "./data/WikiQA/relation_test.txt",
      "hist_feats_file": "./data/WikiQA/relation_test.binsum-20.txt"
    }
  },
  "outputs": {
    "predict": {
      "save_format": "TREC",
      "save_path": "predict.test.wikiqa.txt"
    }
  },
  "model": {
    "model_path": "./matchzoo/models/",
    "model_py": "anmm.ANMM",
    "setting": {
      "num_layers": 2,
      "hidden_sizes": [
        20,
        1
      ],
      "dropout_rate": 0.0
    }
  },
  "losses": [
    {
      "object_name": "rank_hinge_loss",
      "object_params": {
        "margin": 1.0
      }
    }
  ],
  "metrics": [
    "ndcg@3",
    "ndcg@5",
    "map"
  ]
}
[./data/WikiQA/embed.idf]
        Embedding size: 18670
Generate numpy embed: %s (18670, 300)
[Embedding] Embedding Load Done.
[Input] Process Input Tags. odict_keys(['train']) in TRAIN, odict_keys(['valid', 'test']) in EVAL.
[./data/WikiQA/corpus_preprocessed.txt]
        Data size: 24106
[Dataset] 1 Dataset Load Done.
{'text1_corpus': './data/WikiQA/corpus_preprocessed.txt', 'text2_corpus': './data/WikiQA/corpus_preprocessed.txt', 'use_dpool': False, 'embed_size': 300, 'embed_path': './data/WikiQA/embed.idf', 'vocab_size': 18670, 'train_embed': False, 'target_mode': 'ranking', 'bin_num': 20, 'text1_maxlen': 10, 'text2_maxlen': 40, 'embed': array([[8.144347, 8.144347, 8.144347, ..., 8.144347, 8.144347, 8.144347],
       [8.48082 , 8.48082 , 8.48082 , ..., 8.48082 , 8.48082 , 8.48082 ],
       [5.426818, 5.426818, 5.426818, ..., 5.426818, 5.426818, 5.426818],
       ...,
       [8.703963, 8.703963, 8.703963, ..., 8.703963, 8.703963, 8.703963],
       [8.991645, 8.991645, 8.991645, ..., 8.991645, 8.991645, 8.991645],
       [0.      , 0.      , 0.      , ..., 0.      , 0.      , 0.      ]],
      dtype=float32), 'input_type': 'DRMM_PairGenerator', 'phase': 'TRAIN', 'use_iter': False, 'query_per_iter': 50, 'batch_per_iter': 5, 'batch_size': 100, 'relation_file': './data/WikiQA/relation_train.txt', 'hist_feats_file': './data/WikiQA/relation_train.binsum-20.txt'}
[./data/WikiQA/relation_train.txt]
        Instance size: 20360
Pair Instance Count: 8995
[./data/WikiQA/relation_train.binsum-20.txt]
        Feature size: 20360
[DRMM_PairGenerator] init done
{'text1_corpus': './data/WikiQA/corpus_preprocessed.txt', 'text2_corpus': './data/WikiQA/corpus_preprocessed.txt', 'use_dpool': False, 'embed_size': 300, 'embed_path': './data/WikiQA/embed.idf', 'vocab_size': 18670, 'train_embed': False, 'target_mode': 'ranking', 'bin_num': 20, 'text1_maxlen': 10, 'text2_maxlen': 40, 'embed': array([[8.144347, 8.144347, 8.144347, ..., 8.144347, 8.144347, 8.144347],
       [8.48082 , 8.48082 , 8.48082 , ..., 8.48082 , 8.48082 , 8.48082 ],
       [5.426818, 5.426818, 5.426818, ..., 5.426818, 5.426818, 5.426818],
       ...,
       [8.703963, 8.703963, 8.703963, ..., 8.703963, 8.703963, 8.703963],
       [8.991645, 8.991645, 8.991645, ..., 8.991645, 8.991645, 8.991645],
       [0.      , 0.      , 0.      , ..., 0.      , 0.      , 0.      ]],
      dtype=float32), 'input_type': 'DRMM_ListGenerator', 'phase': 'EVAL', 'batch_list': 10, 'relation_file': './data/WikiQA/relation_valid.txt', 'hist_feats_file': './data/WikiQA/relation_valid.binsum-20.txt'}
[./data/WikiQA/relation_valid.txt]
        Instance size: 1126
List Instance Count: 122
[./data/WikiQA/relation_valid.binsum-20.txt]
        Feature size: 1126
[DRMM_ListGenerator] init done, list number: 122.
{'text1_corpus': './data/WikiQA/corpus_preprocessed.txt', 'text2_corpus': './data/WikiQA/corpus_preprocessed.txt', 'use_dpool': False, 'embed_size': 300, 'embed_path': './data/WikiQA/embed.idf', 'vocab_size': 18670, 'train_embed': False, 'target_mode': 'ranking', 'bin_num': 20, 'text1_maxlen': 10, 'text2_maxlen': 40, 'embed': array([[8.144347, 8.144347, 8.144347, ..., 8.144347, 8.144347, 8.144347],
       [8.48082 , 8.48082 , 8.48082 , ..., 8.48082 , 8.48082 , 8.48082 ],
       [5.426818, 5.426818, 5.426818, ..., 5.426818, 5.426818, 5.426818],
       ...,
       [8.703963, 8.703963, 8.703963, ..., 8.703963, 8.703963, 8.703963],
       [8.991645, 8.991645, 8.991645, ..., 8.991645, 8.991645, 8.991645],
       [0.      , 0.      , 0.      , ..., 0.      , 0.      , 0.      ]],
      dtype=float32), 'input_type': 'DRMM_ListGenerator', 'phase': 'EVAL', 'batch_list': 10, 'relation_file': './data/WikiQA/relation_test.txt', 'hist_feats_file': './data/WikiQA/relation_test.binsum-20.txt'}
[./data/WikiQA/relation_test.txt]
        Instance size: 2341
List Instance Count: 237
[./data/WikiQA/relation_test.binsum-20.txt]
        Feature size: 2341
[DRMM_ListGenerator] init done, list number: 237.
[ANMM] init done
[layer]: Input  [shape]: [None, 10]
 [Memory] Total Memory Use: 249.3359 MB          Resident: 255320 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input  [shape]: [None, 10, 20]
 [Memory] Total Memory Use: 249.3438 MB          Resident: 255328 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Embedding      [shape]: [None, 10, 300]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 10, 1]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Lambda-softmax [shape]: [None, 10, 1]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dropout        [shape]: [None, 10, 20]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 10, 20]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 10, 1]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Permute        [shape]: [None, 1, 10]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape        [shape]: [None, 10]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape        [shape]: [None, 10]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 1]
 [Memory] Total Memory Use: 314.0000 MB          Resident: 321536 Shared: 0 UnshareData: 0 UnshareStack: 0
[Model] Model Compile Done.
[05-19-2018 09:30:28]   [Train:train] Iter:0    loss=0.999652
[05-19-2018 09:30:29]   [Eval:valid] Iter:0     ndcg@3=0.434129 ndcg@5=0.529443 map=0.478060
[05-19-2018 09:30:29]   [Eval:test] Iter:0      ndcg@3=0.355642 ndcg@5=0.436278 map=0.404488
[05-19-2018 09:30:29]   [Train:train] Iter:1    loss=1.000090
[05-19-2018 09:30:29]   [Eval:valid] Iter:1     ndcg@3=0.434129 ndcg@5=0.531589 map=0.478060
[05-19-2018 09:30:29]   [Eval:test] Iter:1      ndcg@3=0.352980 ndcg@5=0.433691 map=0.404136
[05-19-2018 09:30:29]   [Train:train] Iter:2    loss=0.999537
[05-19-2018 09:30:29]   [Eval:valid] Iter:2     ndcg@3=0.437154 ndcg@5=0.530516 map=0.484207
[05-19-2018 09:30:29]   [Eval:test] Iter:2      ndcg@3=0.352980 ndcg@5=0.434536 map=0.405191
[05-19-2018 09:30:29]   [Train:train] Iter:3    loss=0.999827
[05-19-2018 09:30:29]   [Eval:valid] Iter:3     ndcg@3=0.439300 ndcg@5=0.526418 map=0.480109
[05-19-2018 09:30:30]   [Eval:test] Iter:3      ndcg@3=0.355090 ndcg@5=0.433691 map=0.403081
[05-19-2018 09:30:30]   [Train:train] Iter:4    loss=0.999971
[05-19-2018 09:30:30]   [Eval:valid] Iter:4     ndcg@3=0.434129 ndcg@5=0.529443 map=0.484207
[05-19-2018 09:30:30]   [Eval:test] Iter:4      ndcg@3=0.359309 ndcg@5=0.434461 map=0.401675
[05-19-2018 09:30:30]   [Train:train] Iter:5    loss=0.999701
[05-19-2018 09:30:30]   [Eval:valid] Iter:5     ndcg@3=0.434129 ndcg@5=0.534614 map=0.478060
[05-19-2018 09:30:30]   [Eval:test] Iter:5      ndcg@3=0.355090 ndcg@5=0.438680 map=0.403925
[05-19-2018 09:30:30]   [Train:train] Iter:6    loss=0.999789
[05-19-2018 09:30:30]   [Eval:valid] Iter:6     ndcg@3=0.434129 ndcg@5=0.529589 map=0.481475
[05-19-2018 09:30:30]   [Eval:test] Iter:6      ndcg@3=0.355090 ndcg@5=0.435013 map=0.401323
[05-19-2018 09:30:30]   [Train:train] Iter:7    loss=0.999478
[05-19-2018 09:30:30]   [Eval:valid] Iter:7     ndcg@3=0.434129 ndcg@5=0.529443 map=0.476921
[05-19-2018 09:30:30]   [Eval:test] Iter:7      ndcg@3=0.355090 ndcg@5=0.433691 map=0.401815
[05-19-2018 09:30:31]   [Train:train] Iter:8    loss=0.999616
[05-19-2018 09:30:31]   [Eval:valid] Iter:8     ndcg@3=0.437154 ndcg@5=0.534614 map=0.484207
[05-19-2018 09:30:31]   [Eval:test] Iter:8      ndcg@3=0.359862 ndcg@5=0.433984 map=0.402026
[05-19-2018 09:30:31]   [Train:train] Iter:9    loss=0.999312
[05-19-2018 09:30:31]   [Eval:valid] Iter:9     ndcg@3=0.437154 ndcg@5=0.529948 map=0.478743
[05-19-2018 09:30:31]   [Eval:test] Iter:9      ndcg@3=0.352980 ndcg@5=0.432904 map=0.400409
[05-19-2018 09:30:31]   [Train:train] Iter:10   loss=0.999152
[05-19-2018 09:30:31]   [Eval:valid] Iter:10    ndcg@3=0.437154 ndcg@5=0.526418 map=0.476921
[05-19-2018 09:30:31]   [Eval:test] Iter:10     ndcg@3=0.355642 ndcg@5=0.433691 map=0.400971
[05-19-2018 09:30:32]   [Train:train] Iter:11   loss=0.999868
[05-19-2018 09:30:32]   [Eval:valid] Iter:11    ndcg@3=0.434129 ndcg@5=0.529443 map=0.478743
[05-19-2018 09:30:32]   [Eval:test] Iter:11     ndcg@3=0.357200 ndcg@5=0.434536 map=0.404136
[05-19-2018 09:30:32]   [Train:train] Iter:12   loss=0.999864
[05-19-2018 09:30:32]   [Eval:valid] Iter:12    ndcg@3=0.437154 ndcg@5=0.534614 map=0.484207
[05-19-2018 09:30:32]   [Eval:test] Iter:12     ndcg@3=0.359309 ndcg@5=0.436278 map=0.400650
[05-19-2018 09:30:32]   [Train:train] Iter:13   loss=0.999480
[05-19-2018 09:30:32]   [Eval:valid] Iter:13    ndcg@3=0.437154 ndcg@5=0.526418 map=0.477035
[05-19-2018 09:30:32]   [Eval:test] Iter:13     ndcg@3=0.352980 ndcg@5=0.434461 map=0.400861
[05-19-2018 09:30:32]   [Train:train] Iter:14   loss=0.999624
[05-19-2018 09:30:32]   [Eval:valid] Iter:14    ndcg@3=0.445350 ndcg@5=0.534469 map=0.488032
[05-19-2018 09:30:32]   [Eval:test] Iter:14     ndcg@3=0.357752 ndcg@5=0.434536 map=0.400509
[05-19-2018 09:30:32]   [Train:train] Iter:15   loss=0.998972
[05-19-2018 09:30:33]   [Eval:valid] Iter:15    ndcg@3=0.434129 ndcg@5=0.526272 map=0.483934
[05-19-2018 09:30:33]   [Eval:test] Iter:15     ndcg@3=0.361419 ndcg@5=0.434461 map=0.401072
[05-19-2018 09:30:33]   [Train:train] Iter:16   loss=0.999660
.
.
.
[05-19-2018 09:32:19]   [Train:train] Iter:395  loss=0.998305
[05-19-2018 09:32:19]   [Eval:valid] Iter:395   ndcg@3=0.446936 ndcg@5=0.538421 map=0.499850
[05-19-2018 09:32:19]   [Eval:test] Iter:395    ndcg@3=0.370460 ndcg@5=0.445458 map=0.407764
[05-19-2018 09:32:19]   [Train:train] Iter:396  loss=0.999088
[05-19-2018 09:32:19]   [Eval:valid] Iter:396   ndcg@3=0.449961 ndcg@5=0.531297 map=0.488580
[05-19-2018 09:32:19]   [Eval:test] Iter:396    ndcg@3=0.372570 ndcg@5=0.442871 map=0.409382
[05-19-2018 09:32:19]   [Train:train] Iter:397  loss=0.998287
[05-19-2018 09:32:19]   [Eval:valid] Iter:397   ndcg@3=0.451034 ndcg@5=0.534322 map=0.489195
[05-19-2018 09:32:19]   [Eval:test] Iter:397    ndcg@3=0.368351 ndcg@5=0.442578 map=0.408116
[05-19-2018 09:32:19]   [Train:train] Iter:398  loss=0.998974
[05-19-2018 09:32:19]   [Eval:valid] Iter:398   ndcg@3=0.449961 ndcg@5=0.539494 map=0.488466
[05-19-2018 09:32:19]   [Eval:test] Iter:398    ndcg@3=0.366793 ndcg@5=0.443348 map=0.407624
[05-19-2018 09:32:19]   [Train:train] Iter:399  loss=0.999517
[05-19-2018 09:32:20]   [Eval:valid] Iter:399   ndcg@3=0.455133 ndcg@5=0.537493 map=0.488580
[05-19-2018 09:32:20]   [Eval:test] Iter:399    ndcg@3=0.364131 ndcg@5=0.443608 map=0.409030

Output of Predict:

Using TensorFlow backend.
2018-05-19 09:32:50.711351: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
{
  "net_name": "ANMM",
  "global": {
    "model_type": "PY",
    "weights_file": "examples/wikiqa/weights/anmm.wikiqa.weights",
    "save_weights_iters": 10,
    "num_iters": 400,
    "display_interval": 10,
    "test_weights_iters": 400,
    "optimizer": "adadelta",
    "learning_rate": 0.0001
  },
  "inputs": {
    "share": {
      "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt",
      "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt",
      "use_dpool": false,
      "embed_size": 300,
      "embed_path": "./data/WikiQA/embed.idf",
      "vocab_size": 18670,
      "train_embed": false,
      "target_mode": "ranking",
      "bin_num": 20,
      "text1_maxlen": 10,
      "text2_maxlen": 40
    },
    "train": {
      "input_type": "DRMM_PairGenerator",
      "phase": "TRAIN",
      "use_iter": false,
      "query_per_iter": 50,
      "batch_per_iter": 5,
      "batch_size": 100,
      "relation_file": "./data/WikiQA/relation_train.txt",
      "hist_feats_file": "./data/WikiQA/relation_train.binsum-20.txt"
    },
    "valid": {
      "input_type": "DRMM_ListGenerator",
      "phase": "EVAL",
      "batch_list": 10,
      "relation_file": "./data/WikiQA/relation_valid.txt",
      "hist_feats_file": "./data/WikiQA/relation_valid.binsum-20.txt"
    },
    "test": {
      "input_type": "DRMM_ListGenerator",
      "phase": "EVAL",
      "batch_list": 10,
      "relation_file": "./data/WikiQA/relation_test.txt",
      "hist_feats_file": "./data/WikiQA/relation_test.binsum-20.txt"
    },
    "predict": {
      "input_type": "DRMM_ListGenerator",
      "phase": "PREDICT",
      "batch_list": 10,
      "relation_file": "./data/WikiQA/relation_test.txt",
      "hist_feats_file": "./data/WikiQA/relation_test.binsum-20.txt"
    }
  },
  "outputs": {
    "predict": {
      "save_format": "TREC",
      "save_path": "predict.test.wikiqa.txt"
    }
  },
  "model": {
    "model_path": "./matchzoo/models/",
    "model_py": "anmm.ANMM",
    "setting": {
      "num_layers": 2,
      "hidden_sizes": [
        20,
        1
      ],
      "dropout_rate": 0.0
    }
  },
  "losses": [
    {
      "object_name": "rank_hinge_loss",
      "object_params": {
        "margin": 1.0
      }
    }
  ],
  "metrics": [
    "ndcg@3",
    "ndcg@5",
    "map"
  ]
}
[./data/WikiQA/embed.idf]
        Embedding size: 18670
Generate numpy embed: %s (18670, 300)
[Embedding] Embedding Load Done.
[Input] Process Input Tags. odict_keys(['predict']) in PREDICT.
[./data/WikiQA/corpus_preprocessed.txt]
        Data size: 24106
[Dataset] 1 Dataset Load Done.
{'text1_corpus': './data/WikiQA/corpus_preprocessed.txt', 'text2_corpus': './data/WikiQA/corpus_preprocessed.txt', 'use_dpool': False, 'embed_size': 300, 'embed_path': './data/WikiQA/embed.idf', 'vocab_size': 18670, 'train_embed': False, 'target_mode': 'ranking', 'bin_num': 20, 'text1_maxlen': 10, 'text2_maxlen': 40, 'embed': array([[8.144347, 8.144347, 8.144347, ..., 8.144347, 8.144347, 8.144347],
       [8.48082 , 8.48082 , 8.48082 , ..., 8.48082 , 8.48082 , 8.48082 ],
       [5.426818, 5.426818, 5.426818, ..., 5.426818, 5.426818, 5.426818],
       ...,
       [8.703963, 8.703963, 8.703963, ..., 8.703963, 8.703963, 8.703963],
       [8.991645, 8.991645, 8.991645, ..., 8.991645, 8.991645, 8.991645],
       [0.      , 0.      , 0.      , ..., 0.      , 0.      , 0.      ]],
      dtype=float32), 'input_type': 'DRMM_ListGenerator', 'phase': 'PREDICT', 'batch_list': 10, 'relation_file': './data/WikiQA/relation_test.txt', 'hist_feats_file': './data/WikiQA/relation_test.binsum-20.txt'}
[./data/WikiQA/relation_test.txt]
        Instance size: 2341
List Instance Count: 237
[./data/WikiQA/relation_test.binsum-20.txt]
        Feature size: 2341
[DRMM_ListGenerator] init done, list number: 237.
[ANMM] init done
[layer]: Input  [shape]: [None, 10]
 [Memory] Total Memory Use: 206.9180 MB          Resident: 211884 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input  [shape]: [None, 10, 20]
 [Memory] Total Memory Use: 206.9180 MB          Resident: 211884 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Embedding      [shape]: [None, 10, 300]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 10, 1]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Lambda-softmax [shape]: [None, 10, 1]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dropout        [shape]: [None, 10, 20]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 10, 20]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 10, 1]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Permute        [shape]: [None, 1, 10]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape        [shape]: [None, 10]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape        [shape]: [None, 10]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense  [shape]: [None, 1]
 [Memory] Total Memory Use: 253.6914 MB          Resident: 259780 Shared: 0 UnshareData: 0 UnshareStack: 0
[05-19-2018 09:32:51]   [Predict] @ predict [Predict] results:  ndcg@3=0.366241 ndcg@5=0.443056 map=0.411140

@bwanglzu
Member

@yangliuy Agreed, just leave things such as dropout and activation configurable. But the best option could be to keep the default parameters as similar as possible to the papers, and give users the opportunity to fine-tune the rest (because in my mind MatchZoo is developed for academic purposes).

Any plans for the next version?

@ghost
Author

ghost commented May 19, 2018

My values match @aneesh-joshi's almost exactly. They are ndcg@3=0.364684, map=0.404962, and ndcg@5=0.440946.

from ._conv import register_converters as _register_converters
Using TensorFlow backend.
2018-05-19 14:53:02.162052: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
{
  "inputs": {
    "test": {
      "phase": "EVAL",
      "input_type": "DRMM_ListGenerator",
      "hist_feats_file": "./data/WikiQA/relation_test.binsum-20.txt",
      "relation_file": "./data/WikiQA/relation_test.txt",
      "batch_list": 10
    },
    "predict": {
      "phase": "PREDICT",
      "input_type": "DRMM_ListGenerator",
      "hist_feats_file": "./data/WikiQA/relation_test.binsum-20.txt",
      "relation_file": "./data/WikiQA/relation_test.txt",
      "batch_list": 10
    },
    "train": {
      "relation_file": "./data/WikiQA/relation_train.txt",
      "input_type": "DRMM_PairGenerator",
      "batch_size": 100,
      "batch_per_iter": 5,
      "hist_feats_file": "./data/WikiQA/relation_train.binsum-20.txt",
      "phase": "TRAIN",
      "query_per_iter": 50,
      "use_iter": false
    },
    "share": {
      "vocab_size": 18670,
      "use_dpool": false,
      "embed_size": 300,
      "target_mode": "ranking",
      "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt",
      "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt",
      "embed_path": "./data/WikiQA/embed.idf",
      "text1_maxlen": 10,
      "bin_num": 20,
      "train_embed": false,
      "text2_maxlen": 40
    },
    "valid": {
      "phase": "EVAL",
      "input_type": "DRMM_ListGenerator",
      "hist_feats_file": "./data/WikiQA/relation_valid.binsum-20.txt",
      "relation_file": "./data/WikiQA/relation_valid.txt",
      "batch_list": 10
    }
  },
  "global": {
    "optimizer": "adadelta",
    "num_iters": 400,
    "save_weights_iters": 10,
    "learning_rate": 0.0001,
    "test_weights_iters": 400,
    "weights_file": "examples/wikiqa/weights/anmm.wikiqa.weights",
    "model_type": "PY",
    "display_interval": 10
  },
  "outputs": {
    "predict": {
      "save_format": "TREC",
      "save_path": "predict.test.wikiqa.txt"
    }
  },
  "losses": [
    {
      "object_name": "rank_hinge_loss",
      "object_params": {
        "margin": 1.0
      }
    }
  ],
  "metrics": [
    "ndcg@3",
    "ndcg@5",
    "map"
  ],
  "net_name": "ANMM",
  "model": {
    "model_py": "anmm.ANMM",
    "setting": {
      "dropout_rate": 0.0,
      "hidden_sizes": [
        20,
        1
      ],
      "num_layers": 2
    },
    "model_path": "./matchzoo/models/"
  }
}
[./data/WikiQA/embed.idf]
Embedding size: 18670
Generate numpy embed: %s (18670, 300)
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'predict'] in PREDICT.
[./data/WikiQA/corpus_preprocessed.txt]
Data size: 24106
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 18670, u'use_dpool': False, u'embed_size': 300, u'target_mode': u'ranking', u'input_type': u'DRMM_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'hist_feats_file': u'./data/WikiQA/relation_test.binsum-20.txt', u'embed_path': u'./data/WikiQA/embed.idf', u'text1_maxlen': 10, u'phase': u'PREDICT', u'bin_num': 20, 'embed': array([[7.787672, 7.787672, 7.787672, ..., 7.787672, 7.787672, 7.787672],
[4.46624 , 4.46624 , 4.46624 , ..., 4.46624 , 4.46624 , 4.46624 ],
[8.48082 , 8.48082 , 8.48082 , ..., 8.48082 , 8.48082 , 8.48082 ],
...,
[8.991645, 8.991645, 8.991645, ..., 8.991645, 8.991645, 8.991645],
[8.991645, 8.991645, 8.991645, ..., 8.991645, 8.991645, 8.991645],
[0. , 0. , 0. , ..., 0. , 0. , 0. ]],
dtype=float32), u'train_embed': False, u'text2_maxlen': 40}
[./data/WikiQA/relation_test.txt]
Instance size: 2341
List Instance Count: 237
[./data/WikiQA/relation_test.binsum-20.txt]
Feature size: 2341
[DRMM_ListGenerator] init done, list number: 237.
[ANMM] init done
[layer]: Input [shape]: [None, 10]
[Memory] Total Memory Use: 233.9219 MB Resident: 245284864 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Input [shape]: [None, 10, 20]
[Memory] Total Memory Use: 233.9297 MB Resident: 245293056 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Embedding [shape]: [None, 10, 300]
[Memory] Total Memory Use: 300.3477 MB Resident: 314937344 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense [shape]: [None, 10, 1]
[Memory] Total Memory Use: 300.5156 MB Resident: 315113472 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Lambda-softmax [shape]: [None, 10, 1]
[Memory] Total Memory Use: 300.5781 MB Resident: 315179008 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dropout [shape]: [None, 10, 20]
[Memory] Total Memory Use: 300.5898 MB Resident: 315191296 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense [shape]: [None, 10, 20]
[Memory] Total Memory Use: 300.8398 MB Resident: 315453440 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense [shape]: [None, 10, 1]
[Memory] Total Memory Use: 301.1211 MB Resident: 315748352 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Permute [shape]: [None, 1, 10]
[Memory] Total Memory Use: 301.1367 MB Resident: 315764736 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape [shape]: [None, 10]
[Memory] Total Memory Use: 301.2148 MB Resident: 315846656 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Reshape [shape]: [None, 10]
[Memory] Total Memory Use: 301.3008 MB Resident: 315936768 Shared: 0 UnshareData: 0 UnshareStack: 0
[layer]: Dense [shape]: [None, 1]
[Memory] Total Memory Use: 301.3281 MB Resident: 315965440 Shared: 0 UnshareData: 0 UnshareStack: 0
[05-19-2018 14:53:02] [Predict] @ predict [Predict] results: ndcg@3=0.364684 map=0.404962 ndcg@5=0.440946

@yangliuy
Member

@millanbatra @aneesh-joshi I noticed that your training loss didn't decrease as it does in my output; that's why your results are bad. The vocab_size is also different. You can compare your settings with my settings. It is also possible that some code changes between "12-02-2017" and "05-19-2018" introduced bugs into MatchZoo (my run was performed on "12-02-2017"). It is very strange that your training loss stayed almost constant until the last iteration. We'll double-check this part later.
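
For context on why a loss pinned near 1.0 is a red flag: rank_hinge_loss with margin 1.0 (the losses setting in the configs above) is a pairwise hinge over (positive, negative) score pairs, so a loss of roughly the margin means the model scores positives and negatives almost identically. A minimal sketch of such a loss in Keras (not MatchZoo's exact code), assuming the generator interleaves positive and negative documents in the batch:

```python
from keras import backend as K

def rank_hinge_loss(y_true, y_pred, margin=1.0):
    # Assumes even rows hold positive-document scores, odd rows negatives.
    y_pos = y_pred[0::2]
    y_neg = y_pred[1::2]
    # Loss is ~margin when y_pos is close to y_neg,
    # and 0 once y_pos >= y_neg + margin.
    return K.mean(K.maximum(0.0, margin + y_neg - y_pos))
```

Under this formulation, the aNMM logs above (loss around 0.999 for all 400 iterations) mean the positive and negative scores never separate, which is consistent with the flat evaluation metrics.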

@mandroid6

mandroid6 commented May 20, 2018

@yangliuy For DSSM, the training loss is decreasing as expected. Maybe this is a DRMM-specific issue; we will need to verify with other models.

Ran the following -
Train -

python matchzoo/main.py --phase train --model_file examples/wikiqa/config/dssm_wikiqa.config 

Predict -

python matchzoo/main.py --phase predict --model_file examples/wikiqa/config/dssm_wikiqa.config 

Output for train -

Using TensorFlow backend.
2018-05-20 11:55:29.012697: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
{
  "inputs": {
    "test": {
      "phase": "EVAL", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/WikiQA/relation_test.txt", 
      "dtype": "dssm"
    }, 
    "predict": {
      "phase": "PREDICT", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/WikiQA/relation_test.txt", 
      "dtype": "dssm"
    }, 
    "train": {
      "relation_file": "./data/WikiQA/relation_train.txt", 
      "input_type": "Triletter_PairGenerator", 
      "batch_size": 100, 
      "batch_per_iter": 5, 
      "dtype": "dssm", 
      "phase": "TRAIN", 
      "query_per_iter": 50, 
      "use_iter": false
    }, 
    "share": {
      "vocab_size": 3314, 
      "embed_size": 1, 
      "target_mode": "ranking", 
      "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt", 
      "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt", 
      "word_triletter_map_file": "./data/WikiQA/word_triletter_map.txt"
    }, 
    "valid": {
      "phase": "EVAL", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/WikiQA/relation_valid.txt", 
      "dtype": "dssm"
    }
  }, 
  "global": {
    "optimizer": "adam", 
    "num_iters": 400, 
    "save_weights_iters": 10, 
    "learning_rate": 0.0001, 
    "test_weights_iters": 400, 
    "weights_file": "examples/wikiqa/weights/dssm.wikiqa.weights", 
    "model_type": "PY", 
    "display_interval": 10
  }, 
  "outputs": {
    "predict": {
      "save_format": "TREC", 
      "save_path": "predict.test.wikiqa.txt"
    }
  }, 
  "losses": [
    {
      "object_name": "rank_hinge_loss", 
      "object_params": {
        "margin": 1.0
      }
    }
  ], 
  "metrics": [
    "ndcg@3", 
    "ndcg@5", 
    "map"
  ], 
  "net_name": "DSSM", 
  "model": {
    "model_py": "dssm.DSSM", 
    "setting": {
      "dropout_rate": 0.9, 
      "hidden_sizes": [
        300
      ]
    }, 
    "model_path": "./matchzoo/models/"
  }
}
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'train'] in TRAIN, [u'test', u'valid'] in EVAL.
[./data/WikiQA/corpus_preprocessed.txt]
	Data size: 24106
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'./data/WikiQA/relation_train.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_PairGenerator', u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'batch_size': 100, u'batch_per_iter': 5, u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'TRAIN', 'embed': array([[-0.18291523],
       [-0.00574826],
       [-0.13887608],
       ..., 
       [-0.17844775],
       [-0.1465386 ],
       [-0.13503003]], dtype=float32), u'query_per_iter': 50, u'use_iter': False}
[./data/WikiQA/relation_train.txt]
	Instance size: 20360
Pair Instance Count: 8995
[Triletter_PairGenerator] init done
{u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523],
       [-0.00574826],
       [-0.13887608],
       ..., 
       [-0.17844775],
       [-0.1465386 ],
       [-0.13503003]], dtype=float32)}
[./data/WikiQA/relation_test.txt]
	Instance size: 2341
List Instance Count: 237
[Triletter_ListGenerator] init done
{u'relation_file': u'./data/WikiQA/relation_valid.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'EVAL', 'embed': array([[-0.18291523],
       [-0.00574826],
       [-0.13887608],
       ..., 
       [-0.17844775],
       [-0.1465386 ],
       [-0.13503003]], dtype=float32)}
[./data/WikiQA/relation_valid.txt]
	Instance size: 1126
List Instance Count: 122
[Triletter_ListGenerator] init done
[DSSM] init done
[layer]: Input	[shape]: [None, 3314] 
 [Memory] Total Memory Use: 188.4258 MB 	 Resident: 192948 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: Input	[shape]: [None, 3314] 
 [Memory] Total Memory Use: 188.4258 MB 	 Resident: 192948 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: MLP	[shape]: [None, 300] 
 [Memory] Total Memory Use: 188.7344 MB 	 Resident: 193264 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: MLP	[shape]: [None, 300] 
 [Memory] Total Memory Use: 188.7344 MB 	 Resident: 193264 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: Dot	[shape]: [None, 1] 
 [Memory] Total Memory Use: 188.9414 MB 	 Resident: 193476 Shared: 0 UnshareData: 0 UnshareStack: 0 
[Model] Model Compile Done.
[05-20-2018 11:55:29]	[Train:train] Iter:0	loss=0.875981
[05-20-2018 11:55:30]	[Eval:test] Iter:0	ndcg@3=0.530968	map=0.547287	ndcg@5=0.598319
[05-20-2018 11:55:31]	[Eval:valid] Iter:0	ndcg@3=0.559904	map=0.579712	ndcg@5=0.626929
[05-20-2018 11:55:31]	[Train:train] Iter:1	loss=0.846194
[05-20-2018 11:55:32]	[Eval:test] Iter:1	ndcg@3=0.530223	map=0.549429	ndcg@5=0.604287
[05-20-2018 11:55:32]	[Eval:valid] Iter:1	ndcg@3=0.557536	map=0.581286	ndcg@5=0.628172
[05-20-2018 11:55:33]	[Train:train] Iter:2	loss=0.832642
[05-20-2018 11:55:34]	[Eval:test] Iter:2	ndcg@3=0.530487	map=0.551394	ndcg@5=0.606368
[05-20-2018 11:55:34]	[Eval:valid] Iter:2	ndcg@3=0.557536	map=0.580841	ndcg@5=0.625001
[05-20-2018 11:55:34]	[Train:train] Iter:3	loss=0.807873
[05-20-2018 11:55:35]	[Eval:test] Iter:3	ndcg@3=0.534706	map=0.552465	ndcg@5=0.606953
[05-20-2018 11:55:36]	[Eval:valid] Iter:3	ndcg@3=0.558610	map=0.577279	ndcg@5=0.622544
[05-20-2018 11:55:36]	[Train:train] Iter:4	loss=0.792922
[05-20-2018 11:55:37]	[Eval:test] Iter:4	ndcg@3=0.538373	map=0.559824	ndcg@5=0.610877
[05-20-2018 11:55:37]	[Eval:valid] Iter:4	ndcg@3=0.554511	map=0.576596	ndcg@5=0.621976
[05-20-2018 11:55:38]	[Train:train] Iter:5	loss=0.780318
[05-20-2018 11:55:39]	[Eval:test] Iter:5	ndcg@3=0.537771	map=0.559040	ndcg@5=0.612908
[05-20-2018 11:55:39]	[Eval:valid] Iter:5	ndcg@3=0.559683	map=0.579048	ndcg@5=0.630220
[05-20-2018 11:55:39]	[Train:train] Iter:6	loss=0.751999
[05-20-2018 11:55:40]	[Eval:test] Iter:6	ndcg@3=0.539328	map=0.558607	ndcg@5=0.612648
[05-20-2018 11:55:41]	[Eval:valid] Iter:6	ndcg@3=0.560562	map=0.581029	ndcg@5=0.631679
[05-20-2018 11:55:41]	[Train:train] Iter:7	loss=0.732407
.
.
.
[05-20-2018 12:07:58]	[Train:train] Iter:395	loss=0.001485
[05-20-2018 12:07:59]	[Eval:test] Iter:395	ndcg@3=0.552114	map=0.571677	ndcg@5=0.613771
[05-20-2018 12:07:59]	[Eval:valid] Iter:395	ndcg@3=0.555092	map=0.560439	ndcg@5=0.610904
[05-20-2018 12:07:59]	[Train:train] Iter:396	loss=0.000499
[05-20-2018 12:08:00]	[Eval:test] Iter:396	ndcg@3=0.553672	map=0.573834	ndcg@5=0.615328
[05-20-2018 12:08:01]	[Eval:valid] Iter:396	ndcg@3=0.555092	map=0.560405	ndcg@5=0.610904
[05-20-2018 12:08:01]	[Train:train] Iter:397	loss=0.001926
[05-20-2018 12:08:02]	[Eval:test] Iter:397	ndcg@3=0.552114	map=0.574506	ndcg@5=0.616589
[05-20-2018 12:08:03]	[Eval:valid] Iter:397	ndcg@3=0.558117	map=0.564390	ndcg@5=0.613929
[05-20-2018 12:08:03]	[Train:train] Iter:398	loss=0.000582
[05-20-2018 12:08:04]	[Eval:test] Iter:398	ndcg@3=0.552114	map=0.574556	ndcg@5=0.616589
[05-20-2018 12:08:05]	[Eval:valid] Iter:398	ndcg@3=0.554019	map=0.563980	ndcg@5=0.616532
[05-20-2018 12:08:05]	[Train:train] Iter:399	loss=0.001208
[05-20-2018 12:08:06]	[Eval:test] Iter:399	ndcg@3=0.552114	map=0.574365	ndcg@5=0.615588
[05-20-2018 12:08:07]	[Eval:valid] Iter:399	ndcg@3=0.554019	map=0.563980	ndcg@5=0.616532

Output for predict -

Using TensorFlow backend.
2018-05-20 12:08:16.542890: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
{
  "inputs": {
    "test": {
      "phase": "EVAL", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/WikiQA/relation_test.txt", 
      "dtype": "dssm"
    }, 
    "predict": {
      "phase": "PREDICT", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/WikiQA/relation_test.txt", 
      "dtype": "dssm"
    }, 
    "train": {
      "relation_file": "./data/WikiQA/relation_train.txt", 
      "input_type": "Triletter_PairGenerator", 
      "batch_size": 100, 
      "batch_per_iter": 5, 
      "dtype": "dssm", 
      "phase": "TRAIN", 
      "query_per_iter": 50, 
      "use_iter": false
    }, 
    "share": {
      "vocab_size": 3314, 
      "embed_size": 1, 
      "target_mode": "ranking", 
      "text1_corpus": "./data/WikiQA/corpus_preprocessed.txt", 
      "text2_corpus": "./data/WikiQA/corpus_preprocessed.txt", 
      "word_triletter_map_file": "./data/WikiQA/word_triletter_map.txt"
    }, 
    "valid": {
      "phase": "EVAL", 
      "input_type": "Triletter_ListGenerator", 
      "batch_list": 10, 
      "relation_file": "./data/WikiQA/relation_valid.txt", 
      "dtype": "dssm"
    }
  }, 
  "global": {
    "optimizer": "adam", 
    "num_iters": 400, 
    "save_weights_iters": 10, 
    "learning_rate": 0.0001, 
    "test_weights_iters": 400, 
    "weights_file": "examples/wikiqa/weights/dssm.wikiqa.weights", 
    "model_type": "PY", 
    "display_interval": 10
  }, 
  "outputs": {
    "predict": {
      "save_format": "TREC", 
      "save_path": "predict.test.wikiqa.txt"
    }
  }, 
  "losses": [
    {
      "object_name": "rank_hinge_loss", 
      "object_params": {
        "margin": 1.0
      }
    }
  ], 
  "metrics": [
    "ndcg@3", 
    "ndcg@5", 
    "map"
  ], 
  "net_name": "DSSM", 
  "model": {
    "model_py": "dssm.DSSM", 
    "setting": {
      "dropout_rate": 0.9, 
      "hidden_sizes": [
        300
      ]
    }, 
    "model_path": "./matchzoo/models/"
  }
}
[Embedding] Embedding Load Done.
[Input] Process Input Tags. [u'predict'] in PREDICT.
[./data/WikiQA/corpus_preprocessed.txt]
	Data size: 24106
[Dataset] 1 Dataset Load Done.
{u'relation_file': u'./data/WikiQA/relation_test.txt', u'vocab_size': 3314, u'embed_size': 1, u'target_mode': u'ranking', u'input_type': u'Triletter_ListGenerator', u'batch_list': 10, u'text1_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'text2_corpus': u'./data/WikiQA/corpus_preprocessed.txt', u'word_triletter_map_file': u'./data/WikiQA/word_triletter_map.txt', u'dtype': u'dssm', u'phase': u'PREDICT', 'embed': array([[-0.18291523],
       [-0.00574826],
       [-0.13887608],
       ..., 
       [-0.17844775],
       [-0.1465386 ],
       [-0.13503003]], dtype=float32)}
[./data/WikiQA/relation_test.txt]
	Instance size: 2341
List Instance Count: 237
[Triletter_ListGenerator] init done
[DSSM] init done
[layer]: Input	[shape]: [None, 3314] 
 [Memory] Total Memory Use: 169.1758 MB 	 Resident: 173236 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: Input	[shape]: [None, 3314] 
 [Memory] Total Memory Use: 169.1758 MB 	 Resident: 173236 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: MLP	[shape]: [None, 300] 
 [Memory] Total Memory Use: 169.5430 MB 	 Resident: 173612 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: MLP	[shape]: [None, 300] 
 [Memory] Total Memory Use: 169.5430 MB 	 Resident: 173612 Shared: 0 UnshareData: 0 UnshareStack: 0 
[layer]: Dot	[shape]: [None, 1] 
 [Memory] Total Memory Use: 169.7461 MB 	 Resident: 173820 Shared: 0 UnshareData: 0 UnshareStack: 0 
[05-20-2018 12:08:16]	[Predict] @ predict [Predict] results:  ndcg@3=0.552114	map=0.574365	ndcg@5=0.615588

Will check and report on other models

@aneesh-joshi
Contributor

@mandroid6
It works for DSSM, but the outputs I have pasted above are for aNMM, which is what @bwanglzu has pasted as well, and they don't add up.

@bwanglzu
Member

bwanglzu commented May 21, 2018

@changlinzhang
Interesting, but I don't think #58 is the problem for this specific issue. If you read the source code, the code at line 506 (in this PR) belongs to the class ListGenerator_Feats,

class ListGenerator_Feats(ListBasicGenerator):

which has nothing to do with the DRMM model:

class DRMM_ListGenerator(ListBasicGenerator):

The problem probably comes from two places:

  1. DRMM_ListGenerator.
  2. models/DRMM.py.

Previously I thought the reason was the loss function, until I realized that every model uses the same rank_hinge_loss.

Probably it's because of this line in models/DRMM.py:

out_ = Dot(axes=[1, 1])([z, q_w])

since we're expecting the cosine similarity between the two vectors, as mentioned in the paper.
Can anyone try changing this line to:

out_ = Dot(axes=[1, 1], normalize=True)([z, q_w])

and run the script again? I'm not able to use my dev computer at the moment.
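
For anyone who wants to check what `normalize=True` changes before rerunning the full pipeline, here is a tiny standalone sketch (assuming a Keras version compatible with MatchZoo; the input vectors are made up):

```python
import numpy as np
from keras.layers import Input, Dot
from keras.models import Model

a = Input(shape=(4,))
b = Input(shape=(4,))
raw = Model([a, b], Dot(axes=1)([a, b]))                  # plain inner product
cos = Model([a, b], Dot(axes=1, normalize=True)([a, b]))  # cosine similarity

x = np.array([[1., 2., 3., 4.]])
y = np.array([[2., 4., 6., 8.]])  # same direction, twice the magnitude
print(raw.predict([x, y]))  # [[60.]] -- grows with vector length
print(cos.predict([x, y]))  # [[1.]]  -- scale-invariant
```

With `normalize=True`, both inputs are L2-normalized along the dot axis before the product, which matches the cosine similarity the comment above is asking for.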

@aneesh-joshi
Contributor

I reverted to commit e564565 and did some benchmarking on my system.
Here are my results:

[screenshot: benchmark results table]

@yangliuy
Member

@aneesh-joshi Thank you for providing the results! Your results are very close to the results in the readme file, so this double-confirms my earlier guess: some code changes introduced bugs into MatchZoo during the last 5 months. I will discuss this with @faneshion; I think he will reply to you soon. He is quite busy these days :)

@aneesh-joshi
Contributor

Here is a better side-by-side comparison (the MZ columns are the values reported in the MatchZoo readme):

| Model | Time taken (s) | Size (MB) | MAP | MZ MAP | ndcg@1 | ndcg@3 | MZ NDCG@3 | ndcg@5 | MZ NDCG@5 | ndcg@10 | ndcg@20 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DSSM | 499 | 178.4 | 0.569328 | 0.5647 | 0.392405 | | 0.5439 | 0.618261 | 0.6134 | 0.666015 | 0.676272 |
| CDSSM | 1094.6 | 190 | 0.403474 | 0.5593 | 0.202532 | 0.381 | 0.5489 | 0.44646 | 0.6084 | 0.513308 | 0.540939 |
| ARCI | 369.6 | 223 | 0.586587 | 0.587 | 0.443038 | 0.559602 | 0.568 | 0.630911 | 0.6317 | 0.669301 | 0.686561 |
| ARCII | 1429 | 223 | 0.378296 | 0.5845 | 0.151899 | 0.331383 | 0.5647 | 0.4498 | 0.6176 | 0.502118 | 0.523973 |
| MVLSTM | 1550.6 | 239.6 | 0.626253 | 0.5988 | 0.49789 | 0.608242 | 0.5818 | 0.671299 | 0.6452 | 0.710842 | 0.719987 |
| DRMM | 113 | 278 | 0.605541 | 0.6195 | 0.447257 | 0.598329 | 0.6107 | 0.646328 | 0.6621 | 0.688525 | 0.703999 |
| K NRM | | | | 0.6256 | | | 0.6268 | | 0.6693 | | |
| ANMM | 104.4 | 258.4 | 0.617191 | 0.6297 | 0.468354 | 0.598831 | 0.616 | 0.647311 | 0.6696 | 0.702035 | 0.711173 |
| DUET | 435.7 | 225.3 | 0.630124 | 0.6301 | 0.472574 | 0.625243 | 0.6065 | 0.676017 | 0.6722 | 0.711811 | 0.723448 |
| Match Pyramid | 2252 | 439.8 | 0.642877 | 0.6434 | 0.476793 | 0.640863 | 0.6317 | 0.685916 | 0.6913 | 0.725052 | 0.732838 |
| DRMM_TKS | 509.2 | 439.8 | 0.648083 | 0.6586 | 0.49789 | 0.635046 | 0.6458 | 0.684256 | 0.6956 | 0.724096 | 0.734201 |
| CDSSM_WORD | 256 | 223.3 | 0.535876 | | 0.345992 | | | 0.588656 | | 0.629348 | 0.648953 |

@aneesh-joshi
Contributor

Some of the models do a lot worse than expected.
Also, there are some gaps in the table because some models are missing from that commit.

@yangliuy
Member

@aneesh-joshi Have you optimized the hyper-parameters of the models? Did you run the models using the default settings? According to your results, most of them match the reported metrics in our readme file; only CDSSM and ARC-II have gaps. You need to optimize the hyper-parameters on the validation data of WikiQA.

@aneesh-joshi
Contributor

@yangliuy
This was run with the default config provided for each model.
I assumed it was already optimized.

@bwanglzu
Member

@millanbatra @aneesh-joshi Can you try again?

@aneesh-joshi
Contributor

@bwanglzu
Will run it soon and let you know.

@aneesh-joshi
Contributor

As mentioned in #106
I used my own script to evaluate the results and got the following:

| Method | map | ndcg@1 | ndcg@3 | ndcg@5 | ndcg@10 | ndcg@20 |
|---|---|---|---|---|---|---|
| anmm | 0.60231 | 0.476793 | 0.617116 | 0.659008 | 0.710042 | 0.71779 |
| arci | 0.568631 | 0.451477 | 0.563715 | 0.637976 | 0.672469 | 0.690647 |
| cdssm | 0.523825 | 0.341772 | 0.54126 | 0.596421 | 0.638492 | 0.656304 |
| conv_knrm_ranking | 0.569223 | 0.43038 | 0.587006 | 0.636489 | 0.684447 | 0.692889 |
| drmm | 0.587125 | 0.447257 | 0.597009 | 0.646743 | 0.692769 | 0.706267 |
| drmm_tks | 0.631183 | 0.506329 | 0.641376 | 0.690293 | 0.727095 | 0.73819 |
| dssm | 0.526801 | 0.341772 | 0.537857 | 0.597434 | 0.645009 | 0.656767 |
| duet | 0.592516 | 0.451477 | 0.608629 | 0.657476 | 0.701125 | 0.711556 |
| knrm_ranking | 0.501507 | 0.35865 | 0.499058 | 0.559678 | 0.619065 | 0.636557 |
| matchpyramid | 0.605238 | 0.447257 | 0.62509 | 0.676466 | 0.714582 | 0.721466 |
| mvlstm | 0.585831 | 0.459916 | 0.597372 | 0.648699 | 0.692848 | 0.705998 |

I couldn't add arcii since its predict.test.wikiqa.txt contains nan values.

I will also paste the MZ reported values soon.
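
For readers who want to run the same kind of cross-check, here is a minimal sketch of scoring a TREC-format run file like predict.test.wikiqa.txt (the format written by save_format "TREC" above). This is not @aneesh-joshi's script; the exact gain and discount conventions, and how queries with no relevant answers are handled, differ between evaluation tools and are a common source of small discrepancies:

```python
import math
from collections import defaultdict

def load_run(path):
    """Parse TREC run lines: qid Q0 docid rank score tag."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            run[qid].append((docid, float(score)))
    # Return docids sorted by descending model score, per query.
    return {q: [d for d, _ in sorted(docs, key=lambda t: -t[1])]
            for q, docs in run.items()}

def ndcg_at_k(rels, k):
    """rels: relevance labels of one query's docs, in ranked order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    idcg = sum(r / math.log2(i + 2)
               for i, r in enumerate(sorted(rels, reverse=True)[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision(rels):
    hits, ap = 0, 0.0
    for i, r in enumerate(rels):
        if r > 0:
            hits += 1
            ap += hits / (i + 1)
    return ap / hits if hits else 0.0
```

Averaging average_precision and ndcg_at_k over all queries (after mapping each ranked docid to its relevance label) gives MAP and NDCG@k.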

@bwanglzu
Member

I just compared your results with the benchmark in the readme, and they seem reasonable. Let's focus on #147 and I'll close this one. Thanks for everyone's effort, especially @aneesh-joshi!
