TEAM WONDERS.
Youngin Kwon, Yeongho Lee, Yejin Kim, Yeongjin Ko, Gi-Soo Kim
2021 Artificial Intelligence Challengers Program (AICP)
Developed a news article recommendation bandit algorithm that adapts to users' changing preferences
Mar 2021 - Dec 2021
- Contextual multi-armed bandit (CMAB) : Unlike the traditional multi-armed bandit, each choice (arm) has a corresponding context vector $x_{t,i}$, e.g., a news article's category, title, or date. [pdf]
- LinUCB : Assumes each arm's expected reward is a linear function of its context vector and selects the arm with the highest upper confidence bound (UCB) on that expected reward. [pdf]
- Thompson Sampling (LinTS) : Assumes the same linear reward model as LinUCB, but treats the model parameters as random variables; at each time step it samples parameters from the posterior distribution estimated so far and selects the arm with the highest sampled expected reward. [pdf]
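As a concrete illustration of the selection rule, here is a minimal disjoint-LinUCB sketch in NumPy (class and method names are ours for illustration, not the project's code): one ridge model per arm, with arms scored by $x^\top\hat\theta_a + \alpha\sqrt{x^\top A_a^{-1}x}$.

```python
import numpy as np

class LinUCB:
    """Minimal disjoint LinUCB sketch: one ridge model per arm.

    For each arm a it keeps A_a = I + sum(x x^T) and b_a = sum(r x), and
    scores arms by the UCB  x^T theta_hat + alpha * sqrt(x^T A_a^{-1} x).
    """

    def __init__(self, n_arms, dim, alpha=0.5):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge-regularized Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted context sums

    def select(self, contexts):
        """contexts: one context vector x_{t,i} per arm; returns the chosen arm index."""
        scores = []
        for a, x in enumerate(contexts):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                    # ridge estimate for arm a
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

LinTS differs only in the scoring step: instead of adding a confidence bonus, it draws $\theta$ from the posterior $N(\hat\theta_a, \epsilon^2 A_a^{-1})$ and ranks arms by $x^\top\theta$.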
- R6B - Yahoo! Front Page Today Module User Click Log Dataset, version 2.0 (300 MB) [url]
- Each record logs whether a user (described by a context feature vector) clicked (reward = 1) or did not click (reward = 0) on a news article (displayed_arm) that was displayed uniformly at random from the candidate set (pool) on the Yahoo! front page at a given time.
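The uniform randomization of the displayed article is what makes unbiased offline evaluation possible: a new policy can be replayed over the logs, counting only the events where it would have shown the same article that was actually logged (the replay method of Li et al.). A minimal sketch, where the event-tuple layout and callback names are illustrative rather than the team's actual harness:

```python
def replay_evaluate(policy_select, policy_update, events):
    """Offline replay evaluation on uniformly-randomized click logs.

    events: iterable of (contexts, displayed_arm, reward) tuples, where the
    logged arm was shown uniformly at random (as in the R6B logs).  Only
    events where the policy's choice matches the logged arm are counted,
    which yields an unbiased estimate of the policy's click-through rate.
    """
    clicks, matches = 0, 0
    for contexts, displayed_arm, reward in events:
        if policy_select(contexts) == displayed_arm:
            matches += 1
            clicks += reward
            # The policy only learns from events it "would have" generated.
            policy_update(displayed_arm, contexts[displayed_arm], reward)
    return clicks / matches if matches else 0.0
```

Because the logging policy is uniform over K arms, roughly 1/K of the log survives the match filter, so cumulative-click comparisons between algorithms are made on these matched events.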
- Weight $0 \leq w_t \leq 1$ : To track changing preferences, maintain a queue of window size $w$ holding only the news articles that received a positive reward, i.e., the user's recent favorites. The weight is the average cosine distance ($1 - $ cosine similarity $S_c$) between the selected article and the queued favorite articles, so an article that differs from the user's favorites receives a higher weight.
- Weighted LinUCB and LinTS are fit by weighted least squares: $\hat{\theta}_{WLS}=\underset{\theta}{\arg\min} \sum_{i=1}^N w_i(y_i-x_i'\theta)^2 = (X^TWX)^{-1}X^TWY$.
- Performance issue with Thompson Sampling : As the dimensionality of the estimated parameters grows, posterior sampling becomes the bottleneck. Because the structure of the data makes the Gram matrix $(X^TX)$ block-diagonal, the parameter distribution is estimated only for the selected arm. Parameters are drawn from that distribution by inverse transform sampling, and the Cholesky factor of the selected arm's covariance matrix $\Sigma$ is cached so that repeated draws are fast.
- We propose a LinUCB/LinTS-based weighted ordinal bandit (WO Bandit) with weights that reflect users' changing preferences.
- The two proposed algorithms achieved up to 7.7% more cumulative clicks than the existing baseline algorithms.
- Performance was best with a small window size, which limits the queue to only the most recent positively-rewarded articles.
- uniform : selects a news article uniformly at random from the given pool
- ucb_0.5 : LinUCB with hyperparameter $\alpha=0.5$
- ts_1.5 : LinTS with hyperparameter $\epsilon=1.5$
- (proposed) wlsucb_0.5_10 : Weighted LinUCB with hyperparameters $\alpha=0.5$, $w=10$
- (proposed) wlsts_1.5_10 : Weighted LinTS with hyperparameters $\epsilon=1.5$, $w=10$
- Gained an understanding of LinUCB and LinTS, two of the most representative algorithms for CMAB problems that use user information (context).
- Solved the degradation of Thompson Sampling's sampling performance as the dimensionality of the context vector increases.
- Limitation: the proposed bandit algorithm lacks a theoretical regret analysis.