Bandits #35

kasia-kobalczyk · 2022-01-16T16:53:34Z

This refers to issue #26.

I implemented two simple bandits:

Explore First N strategy
Epsilon-greedy strategy

The idea is that all bandits will inherit their basic properties from the MultiArmedBandit class. This class will never be used on its own, so it should be a sort of "abstract" class, which I don't know how to implement in R. I would appreciate your review and alternative ideas.

codecov-commenter · 2022-01-16T16:58:48Z

Codecov Report

Merging #35 (d0ee252) into master (f16c1f1) will decrease coverage by 16.13%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##           master      #35       +/-   ##
===========================================
- Coverage   96.55%   80.41%   -16.14%     
===========================================
  Files           3        3               
  Lines         319      383       +64     
===========================================
  Hits          308      308               
- Misses         11       75       +64

Impacted Files	Coverage Δ
R/statistics.R	`72.94% <0.00%> (-24.45%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

THargreaves

Overall, nice code with a few suggested changes for clarity. Can you also write some test cases please?

THargreaves · 2022-01-17T18:33:51Z

R/statistics.R

+        #' @param T number of rounds / draws
+        #'
+        #' @return The new \code{MultiArmedBandit} (invisibly)
+        initialize = function(K, T, N = NULL) {


Would it make more sense to have the initialize method raise an error saying that it's abstract and then use a private method to run the code currently in initialize that gets called by the child?

Makes sense.
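
A minimal sketch of that pattern, assuming the package keeps using R6. The private `setup` method, the `n_arms` accessor and the field names are hypothetical, not taken from the PR:

```r
library(R6)

# "Abstract" base class: constructing it directly is an error.
MultiArmedBandit <- R6Class(
    "MultiArmedBandit",
    public = list(
        initialize = function(...) {
            stop("MultiArmedBandit is abstract; instantiate a subclass instead.")
        }
    ),
    private = list(
        K = NULL,
        counts = NULL,
        means = NULL,
        # Shared set-up (hypothetical name), called from each child's initialize().
        setup = function(K) {
            private$K <- K
            private$counts <- rep(0, K)
            private$means <- rep(0, K)
            invisible(self)
        }
    )
)

# A child calls the private helper instead of super$initialize().
ExploreFirst <- R6Class(
    "ExploreFirst",
    inherit = MultiArmedBandit,
    public = list(
        initialize = function(K, N) {
            private$setup(K)
            private$N <- N
        },
        n_arms = function() private$K
    ),
    private = list(N = NULL)
)
```

With this layout `MultiArmedBandit$new()` fails with the "abstract" message, while `ExploreFirst$new(K = 3, N = 5)` works because it never calls the parent's `initialize`.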

THargreaves · 2022-01-17T18:36:03Z

R/statistics.R

+        #'
+        #'
+        #' @return The updated \code{MultiArmedBandit} (invisibly)
+        update = function(x, k) {


If k comes from the previous value, would it be cleaner to remember it internally, so that k can be NULL by default but still be specified if the user ignores the streamer's suggestion?

I like this suggestion.
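
One way this could look, as a rough sketch rather than the PR's actual code. The class name `Bandit`, the field `next_arm` and the greedy placeholder policy are assumptions made for illustration:

```r
library(R6)

Bandit <- R6Class(
    "Bandit",
    public = list(
        initialize = function(K) {
            private$counts <- rep(0, K)
            private$means <- rep(0, K)
            private$next_arm <- 1L
        },
        # k defaults to the arm suggested last time; the user may override it.
        update = function(x, k = NULL) {
            if (is.null(k)) k <- private$next_arm
            private$counts[k] <- private$counts[k] + 1
            # Incremental update of the running mean reward of arm k.
            private$means[k] <- private$means[k] +
                (x - private$means[k]) / private$counts[k]
            # Placeholder policy (an assumption): greedy on the running means.
            private$next_arm <- which.max(private$means)
            invisible(self)
        },
        # The next arm to pull, remembered for the following update().
        value = function() private$next_arm
    ),
    private = list(counts = NULL, means = NULL, next_arm = NULL)
)

b <- Bandit$new(K = 3)
b$update(1)          # reward for the suggested arm (arm 1)
b$update(5, k = 2)   # user ignored the suggestion and pulled arm 2
```

After these two calls `b$value()` suggests arm 2, since its running mean is the highest.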

THargreaves · 2022-01-17T18:37:04Z

R/statistics.R

+        #'
+        #'
+        #' @return list with summary of the state of the Bandit.
+        value = function() {


I think we said that value would just return the next arm to pull. summary can be used for the full state.

THargreaves · 2022-01-17T18:38:23Z

R/statistics.R

+        #' @return The new \code{ExploreFristNBandit} (invisibly)
+        initialize = function(K, T, N = NULL) {
+            super$initialize(K, T)
+            private$state <- "exploration"


Can this be a Boolean?

So instead of state = "exploration" or state = "exploitation", you are suggesting a variable exploration taking the values TRUE or FALSE?

Exactly. Less potential for typos.
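
A rough sketch of an explore-first bandit with a logical flag, for comparison. The class name, the field name `exploring` and the rule "pull the least-pulled arm while exploring" are assumptions, not the PR's code:

```r
library(R6)

ExploreFirstSketch <- R6Class(
    "ExploreFirstSketch",
    public = list(
        initialize = function(K, N) {
            private$N <- N
            private$counts <- rep(0, K)
            private$means <- rep(0, K)
        },
        update = function(x, k) {
            private$counts[k] <- private$counts[k] + 1
            private$means[k] <- private$means[k] +
                (x - private$means[k]) / private$counts[k]
            # Leave exploration once every arm has been pulled N times.
            if (all(private$counts >= private$N)) {
                private$exploring <- FALSE
            }
            invisible(self)
        },
        value = function() {
            if (private$exploring) {
                which.min(private$counts)  # least-pulled arm so far
            } else {
                which.max(private$means)   # best arm by running mean
            }
        },
        is_exploring = function() private$exploring
    ),
    # A logical flag cannot be misspelled the way a state string can.
    private = list(N = NULL, counts = NULL, means = NULL, exploring = TRUE)
)
```

A misspelled string such as "exploraton" would silently fall through an `if (state == "exploration")` check, whereas `exploring` can only ever be TRUE or FALSE.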

THargreaves · 2022-01-17T18:40:42Z

R/statistics.R

+            }
+            if (T < N * K) {
+                stop(sprintf(
+                    "More draws are required for the values of


I think this error could be clearer. Something like "More exploration draws are specified than the total number of rounds"
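
For instance, the check could be phrased roughly like this. The helper name `check_exploration_budget` is hypothetical; the condition is the one from the diff:

```r
# Hypothetical helper: K arms, N exploration draws per arm, T rounds in total.
# (T is only a parameter name here; it shadows R's shorthand for TRUE locally.)
check_exploration_budget <- function(K, N, T) {
    if (T < N * K) {
        stop(sprintf(
            paste0(
                "More exploration draws are specified (N * K = %d) ",
                "than the total number of rounds (T = %d)."
            ),
            as.integer(N * K), as.integer(T)
        ), call. = FALSE)
    }
    invisible(TRUE)
}
```

Including the offending values in the message lets the user see which of K, N or T to change.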

THargreaves · 2022-01-17T18:42:22Z

R/statistics.R

+            super$update(x, k)
+            if (runif(1) < private$epsilon[private$n_observed]) {
+                private$state <- "exploration"
+                k <- floor(runif(1, 1, K + 1))


Use sample instead of floor(runif(1, 1, K + 1)) to pick the random arm.
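
A small illustration of the idea. The wrapper name `random_arm` is made up for this example:

```r
# Draw one arm index uniformly at random from 1..K.
# sample.int(K, 1) is equivalent to sample(K, 1) for a single positive integer K.
random_arm <- function(K) {
    sample.int(K, 1)
}

set.seed(42)
k <- random_arm(5)
```

Besides being easier to read, this returns an integer directly and avoids the floor() and the K + 1 upper bound.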

THargreaves · 2022-01-17T18:45:54Z

R/statistics.R

+        #'
+        #' @param K the number of arms
+        #' @param T number of rounds / draws
+        #' @param epsilon vector of length T specifying the exploration


Typically, epsilon greedy has a scalar epsilon held constant over the game (Sutton, R. S. & Barto, A. G. 1998 Reinforcement learning). I think something this general doesn't really need to exist. The closest thing is epsilon decreasing where epsilons form a geometric series.

I relied on this: https://arxiv.org/pdf/1904.07272.pdf, see Algorithm 1.2, epsilon can change in time. Perhaps we can leave it as it is and just add the possibility for epsilon to be a constant value.

Fair enough about the source, but for the purpose of online algorithms, specifying the whole epsilon vector up front doesn't make much sense. I think sticking with a fixed epsilon or a geometric series is more in line with the package's vision.

Related to this, does it make sense that these games have finite time horizons? Shouldn't the point be that you can stream indefinitely?

I agree, and I was also not convinced by the finite time T appearing in those algorithms. I will adjust them for infinite sampling.
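
A possible shape for this, sketched under the assumption that epsilon is parameterised by a starting value and a decay factor. The names `epsilon_at`, `choose_arm`, `epsilon0` and `decay` are hypothetical:

```r
# Epsilon for round t (1-indexed). No horizon T is needed:
# decay = 1 gives a constant epsilon, 0 < decay < 1 gives a geometric series.
epsilon_at <- function(t, epsilon0 = 0.1, decay = 1) {
    epsilon0 * decay^(t - 1)
}

# One epsilon-greedy decision given the running mean reward of each arm.
choose_arm <- function(means, t, epsilon0 = 0.1, decay = 1) {
    if (runif(1) < epsilon_at(t, epsilon0, decay)) {
        sample.int(length(means), 1)  # explore: uniformly random arm
    } else {
        which.max(means)              # exploit: best arm so far
    }
}
```

Because epsilon is computed from the round counter, the bandit can keep streaming for as long as observations arrive.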

THargreaves · 2022-01-17T18:46:25Z

R/statistics.R

+#'
+#' @export
+#' @format An \code{\link{R6Class}} generator object
+ExploreFristNBandit <- R6::R6Class(


Typo + this is typically called epsilon-first

Again, I relied on the Introduction to Multi-Armed Bandits by Aleksandrs Slivkins, where it is called the Explore-First algorithm with a parameter N.

Okay, perhaps just ExploreFirst then. I think the N is evident from the init function.

kasia-kobalczyk · 2022-01-17T19:43:40Z

Thanks for the review. It was only a draft pull request to see if I am heading in the right direction. The tests will follow soon.

kasia-kobalczyk added 2 commits January 16, 2022 16:13

Add first two simple bandits

e62b197

Restructure multiarmed bandits.

d0ee252

kasia-kobalczyk requested a review from THargreaves January 16, 2022 16:53

THargreaves requested changes Jan 17, 2022
