TEAM WONDERS.
Youngin Kwon, Yeongho Lee, Yejin Kim, Yeongjin Ko, Gi-Soo Kim
2021 Artificial Intelligence Challengers Program (AICP)
Developed a news article recommendation bandit algorithm that adapts to users' changing preferences
Mar 2021 - Dec 2021
- Contextual multi-armed bandit (CMAB) : Unlike the traditional multi-armed bandit, each choice (arm) has a corresponding context vector $x_{t,i}$, e.g., a news article's category, title, or date. [pdf]
- LinUCB : Assumes each arm's expected reward is a linear function of its context vector and selects the arm with the highest upper confidence bound (UCB) on that expected reward. [pdf]
- Thompson Sampling (LinTS) : Assumes the same linear reward model as LinUCB, but treats the model parameters as random variables; at each time step it samples parameters from the posterior distribution estimated so far and selects the arm with the highest sampled expected reward. [pdf]
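As a concrete illustration of the selection rule, here is a minimal disjoint-LinUCB sketch in NumPy (class and method names are ours for illustration, not the project's code): one ridge model per arm, with arms scored by $x^\top\hat\theta_a + \alpha\sqrt{x^\top A_a^{-1}x}$.

```python
import numpy as np

class LinUCB:
    """Minimal disjoint LinUCB sketch: one ridge model per arm.

    For each arm a it keeps A_a = I + sum(x x^T) and b_a = sum(r x), and
    scores arms by the UCB  x^T theta_hat + alpha * sqrt(x^T A_a^{-1} x).
    """

    def __init__(self, n_arms, dim, alpha=0.5):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge-regularized Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted context sums

    def select(self, contexts):
        """contexts: one context vector x_{t,i} per arm; returns the chosen arm index."""
        scores = []
        for a, x in enumerate(contexts):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                    # ridge estimate for arm a
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

LinTS differs only in the scoring step: instead of adding a confidence bonus, it draws $\theta$ from the posterior $N(\hat\theta_a, \epsilon^2 A_a^{-1})$ and ranks arms by $x^\top\theta$.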
- R6B - Yahoo! Front Page Today Module User Click Log Dataset, version 2.0 (300 MB) [url]
- Each record logs whether a user (described by a context feature vector) clicked (reward = 1) or did not click (reward = 0) on a news article (displayed_arm) that was displayed uniformly at random from the candidate set (pool) on the Yahoo! front page at a given time.
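The uniform randomization of the displayed article is what makes unbiased offline evaluation possible: a new policy can be replayed over the logs, counting only the events where it would have shown the same article that was actually logged (the replay method of Li et al.). A minimal sketch, where the event-tuple layout and callback names are illustrative rather than the team's actual harness:

```python
def replay_evaluate(policy_select, policy_update, events):
    """Offline replay evaluation on uniformly-randomized click logs.

    events: iterable of (contexts, displayed_arm, reward) tuples, where the
    logged arm was shown uniformly at random (as in the R6B logs).  Only
    events where the policy's choice matches the logged arm are counted,
    which yields an unbiased estimate of the policy's click-through rate.
    """
    clicks, matches = 0, 0
    for contexts, displayed_arm, reward in events:
        if policy_select(contexts) == displayed_arm:
            matches += 1
            clicks += reward
            # The policy only learns from events it "would have" generated.
            policy_update(displayed_arm, contexts[displayed_arm], reward)
    return clicks / matches if matches else 0.0
```

Because the logging policy is uniform over K arms, roughly 1/K of the log survives the match filter, so cumulative-click comparisons between algorithms are made on these matched events.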
- Weight $0 \leq w_t \leq 1$ : To track changing preferences, maintain a queue of window size $w$ holding only the news articles that received a positive reward, i.e., the user's recent favorites. The weight is the average cosine distance ($1 - $ cosine similarity $S_c$) between the selected article and the queued favorite articles, so an article that differs from the user's favorites receives a higher weight.
- Weighted LinUCB and LinTS are fit by weighted least squares: $\hat{\theta}_{WLS}=\underset{\theta}{\arg\min} \sum_{i=1}^N w_i(y_i-x_i'\theta)^2 = (X^TWX)^{-1}X^TWY$.
- Performance issue with Thompson Sampling : As the dimensionality of the estimated parameters grows, posterior sampling becomes the bottleneck. Because the structure of the data makes the Gram matrix $(X^TX)$ block-diagonal, the parameter distribution is estimated only for the selected arm. Parameters are drawn from that distribution by inverse transform sampling, and the Cholesky factor of the selected arm's covariance matrix $\Sigma$ is cached so that repeated draws are fast.
- We propose a LinUCB/LinTS-based weighted ordinal bandit (WO Bandit) with weights that reflect users' changing preferences.
- The two proposed algorithms achieved up to 7.7% more cumulative clicks than the existing baseline algorithms.
- Performance was best with a small window size, which limits the queue to only the most recent positively-rewarded articles.
- uniform : selects a news article uniformly at random from the given pool
- ucb_0.5 : LinUCB with hyperparameter $\alpha=0.5$
- ts_1.5 : LinTS with hyperparameter $\epsilon=1.5$
- (proposed) wlsucb_0.5_10 : Weighted LinUCB with hyperparameters $\alpha=0.5$, $w=10$
- (proposed) wlsts_1.5_10 : Weighted LinTS with hyperparameters $\epsilon=1.5$, $w=10$
- Gained an understanding of LinUCB and LinTS, two of the most representative algorithms for CMAB problems that use user information (context).
- Solved the degradation of Thompson Sampling's sampling performance as the dimensionality of the context vector increases.
- Limitation: the proposed bandit algorithm lacks a theoretical regret analysis.