info.json
{
  "abstract": "We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\\sqrt{\\bigl(d+1 + \\tfrac{K}{N}\\alpha_{\\le d}\\bigr)(T\\ln K)}$, where $\\alpha_{\\le d}$ is the independence number of the $d$-th power of the communication graph $G$. We then show that for any connected graph, for $d=\\sqrt{K}$ the regret bound is $K^{1/4}\\sqrt{T}$, strictly better than the minimax regret $\\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full information minimax regret $\\sqrt{T\\ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.",
  "authors": [
    "Nicolò Cesa-Bianchi",
    "Claudio Gentile",
    "Yishay Mansour"
  ],
  "emails": [
    "nicolo.cesa-bianchi@unimi.it",
    "cla.gentile@gmail.com",
    "mansour@tau.ac.il"
  ],
  "id": "17-631",
  "issue": 17,
  "pages": [
    1,
    38
  ],
  "title": "Delay and Cooperation in Nonstochastic Bandits",
  "volume": 20,
  "year": 2019
}