Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, Yue+, ICML'09 #197

AkihikoWatanabe · 2018-01-01T08:05:31Z

https://www.cs.cornell.edu/people/tj/publications/yue_joachims_09a.pdf

AkihikoWatanabe · 2018-01-01T08:06:04Z

A paper that is frequently cited in the online learning to rank literature.

The proposed method is called Dueling Bandit Gradient Descent (DBGD).

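It is a method for doing learning to rank online: predictions are made with both the current weights w and new weights w', obtained by moving w in a random direction, and the two are put into a duel.
If the new weights w' win the duel, w is updated in that direction by the learning rate. A simple method.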
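The update loop described above can be sketched roughly as follows. This is a minimal sketch, not the paper's reference implementation: the names `dbgd`, `duel`, `make_toy_duel`, `delta` (step used to propose w') and `gamma` (step used to update w) are all hypothetical, and any constraint or projection on w is left out. The toy duel, which lets the candidate win when it is closer to a hidden target, only stands in for real user feedback.

```python
import numpy as np


def dbgd(duel, dim, n_iters=1000, delta=1.0, gamma=0.1, seed=0):
    """Sketch of a DBGD-style loop (all names here are assumptions).

    duel(w, w_candidate) must return True when the candidate wins.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(n_iters):
        # Sample a random direction on the unit sphere.
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)
        # Propose new weights by moving w along that direction.
        w_candidate = w + delta * u
        # If the candidate wins the duel, take a small step toward it.
        if duel(w, w_candidate):
            w = w + gamma * u
    return w


def make_toy_duel(target):
    """Hypothetical stand-in for user feedback: closer to target wins."""
    def duel(w, w_candidate):
        dist_current = np.linalg.norm(w - target)
        dist_candidate = np.linalg.norm(w_candidate - target)
        return bool(dist_candidate < dist_current)
    return duel


if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5])
    w = dbgd(make_toy_duel(target), dim=3)
    print(w)
```

In a real online setting `duel` would be replaced by a comparison based on user interactions, and its outcome would be noisy rather than deterministic.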

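I could not really tell how the duel itself is carried out, since it is not described in detail, but in the IR field a common approach is to build an interleaved list (a list that mixes the outputs of the two models), actually present it to users, and compute a win/loss probability from signals such as which items the users clicked.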
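As one concrete illustration of such a duel, here is a minimal sketch using team-draft interleaving, a known interleaving scheme from the IR literature. It is swapped in for illustration and is not necessarily what this paper uses. The function names are hypothetical, both rankings are assumed to be orderings of the same candidate items, and a tie in clicks is treated as a loss for the candidate, which is also just an assumption of this sketch.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng):
    """Mix two rankings of the same items; record which side placed each item."""
    rankings = {"a": ranking_a, "b": ranking_b}
    count = {"a": 0, "b": 0}
    interleaved, team = [], {}
    while len(interleaved) < len(ranking_a):
        # The side with fewer picks goes next; ties are broken by a coin flip.
        if count["a"] < count["b"]:
            side = "a"
        elif count["b"] < count["a"]:
            side = "b"
        else:
            side = rng.choice(["a", "b"])
        # That side contributes its highest-ranked item not yet placed.
        pick = next(d for d in rankings[side] if d not in team)
        interleaved.append(pick)
        team[pick] = side
        count[side] += 1
    return interleaved, team


def candidate_wins(team, clicked_items):
    """Side "b" (the candidate) wins if it collected strictly more clicks."""
    clicks_a = sum(team[d] == "a" for d in clicked_items)
    clicks_b = sum(team[d] == "b" for d in clicked_items)
    return clicks_b > clicks_a


if __name__ == "__main__":
    rng = random.Random(0)
    current = ["d1", "d2", "d3", "d4"]    # ranking from weights w
    candidate = ["d4", "d3", "d2", "d1"]  # ranking from weights w'
    shown, team = team_draft_interleave(current, candidate, rng)
    print(shown, candidate_wins(team, clicked_items=[shown[0]]))
```

A single impression gives only a noisy win/loss signal; aggregating over many impressions is one way to get something closer to a win probability.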

It is used in cases such as when you want to learn a model online, directly from user feedback.

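I had wanted to use it by computing some metric on data I hold offline and running the duel on that, but that is not how this method is meant to be used (if the goal is simply to optimize some metric, an objective function can be designed for it, so a method that optimizes that objective seems like the better choice anyway).
In the first place, the idea behind this method seems to be to find, by repeating exploration and exploitation, weights that satisfy something that cannot simply be expressed as a metric (user satisfaction, for example).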

AkihikoWatanabe added InformationRetrieval LearningToRank Online/Interactive labels Jan 1, 2018

AkihikoWatanabe mentioned this issue Jan 1, 2018

Reusing Historical Interaction Data for Faster Online Learning to Rank for IR, Hofmann+, WSDM'13 #198

Open
