info.json
{
"abstract": "The online multi-armed bandit problem and its generalizations are repeated decision making problems, where the goal is to select one of several possible decisions in every round, and incur a cost associated with the decision, in such a way that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference in these costs is known as the <i>regret</i> of the algorithm. The term <i>bandit</i> refers to the setting where one only obtains the cost of the decision used in a given iteration and no other information.\n<br>\nA very general form of this problem is the non-stochastic bandit linear optimization problem, where the set of decisions is a convex set in some Euclidean space, and the cost functions are linear. Only recently an efficient algorithm attaining <i>Õ(√T)</i> regret was discovered in this setting.\n<br>\nIn this paper we propose a new algorithm for the bandit linear optimization problem which obtains a tighter regret bound of <i>Õ(√Q)</i>, where <i>Q</i> is the total variation in the cost functions. This regret bound, previously conjectured to hold in the full information case, shows that it is possible to incur much less regret in a slowly changing environment even in the bandit setting. Our algorithm is efficient and applies several new ideas to bandit optimization such as reservoir sampling.",
"authors": [
"Elad Hazan",
"Satyen Kale"
],
"id": "hazan11a",
"issue": 35,
"pages": [
1287,
1311
],
"title": "Better Algorithms for Benign Bandits",
"volume": "12",
"year": "2011"
}