In the slider images, have f(x) on the y-axis and remove the gold-content references. They make sense for the examples but not for the summary. There is then also no need to mention Ground Truth for .. Also, the x-axis label should be "x", not "X".
In the next slide, mention that this could be trivial if f(x) were cheap to evaluate, but it is often expensive to evaluate, e.g. the amount of gold at a particular location, or the accuracy for a set of hyper-parameters of a machine learning model.
In the slide you mention "f(x*) in fewest evaluations"; instead write: "Objective: find the maxima f(x*) in few evaluations, as sampling is expensive."
Remove the word "constraints" in the next slide, and just have the 2 enumerated points (the second one without the mention of ground truth).
In the next slide and other slides, remove the GT references from the legend as well.
In "Use a surrogate function ...", remove the comma after "prior".
Next slide: use "let us" instead of "let's", and write GP in full.
Next slide: write "functional observation (an observation from f(x))".
The Big Question: where to sample next to quickly find the maxima.
No comma after "One".
For the "Choose point that maxim.." text, write: "The next chosen point to observe is the one that maximises the probability of improvement over the current maximum."
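To make this probability-of-improvement criterion concrete, here is a minimal sketch of how PI could be computed under a Gaussian surrogate posterior; the xi trade-off parameter and the candidate (x, mu, sigma) values are illustrative assumptions, not taken from the article:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function (stdlib only)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    # PI(x) = P(f(x) > f_best + xi) when the surrogate posterior
    # at x is N(mu, sigma^2); xi trades off a little exploration
    if sigma == 0.0:
        return 0.0
    return norm_cdf((mu - f_best - xi) / sigma)

# Hypothetical candidates as (x, posterior mean, posterior std)
candidates = [(0.2, 1.0, 0.5), (0.7, 1.2, 0.1), (0.9, 0.8, 0.9)]
x_next = max(candidates,
             key=lambda c: probability_of_improvement(c[1], c[2], f_best=1.0))[0]
```

The chosen point is the one whose posterior places the most mass above the incumbent maximum, which is exactly the criterion the slide describes.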
In the main article, when you introduce Gaussian Processes, write "Gaussian Processes (GPs)".
In the active learning procedure, replace "automate" with "simulate".
Old text
Given the fact that we are only interested in knowing the location where the maximum occurs. It might be a good idea to evaluate at locations where our surrogate model's prediction mean is the highest, i.e. to exploit. But unfortunately, our mean is not always accurate, so we need to correct our mean which can be done by reducing variance or exploration. BO looks at both exploitation and exploration, whereas in the case of Active Learning Problem, we only cared about exploration.
New text
Given the fact that we are only interested in knowing the location where the maximum occurs, it might be a good idea to evaluate at locations where our surrogate model's prediction mean is the highest, i.e. to exploit. But unfortunately, our model mean is not always accurate (since we have limited observations), so we need to correct our model, which can be done by reducing variance or exploration. BO looks at both exploitation and exploration, whereas in the case of active learning, we only cared about exploration.
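To make the exploit/explore split in this new text concrete, a tiny sketch of one standard way to combine the two; the UCB form and the kappa weight are illustrative assumptions, not the article's chosen acquisition:

```python
def ucb(mu, sigma, kappa=2.0):
    # mu exploits the surrogate's predicted mean;
    # kappa * sigma rewards uncertain regions (exploration)
    return mu + kappa * sigma

# Same mean, higher uncertainty scores higher (explore) ...
explore_pair = (ucb(1.0, 0.5), ucb(1.0, 0.1))
# ... same uncertainty, higher mean scores higher (exploit)
exploit_pair = (ucb(1.5, 0.3), ucb(1.0, 0.3))
```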
Acquisition Functions
Text should be:
We just discussed that our original optimisation problem (equation) is hard given the expensive nature of evaluating f. The key idea of BO is to transform this original difficult optimisation into a sequence of easier, inexpensive optimisations of an acquisition function (alpha(x)). Each of these easier optimisations involves finding the next point to sample. Thus, we can interpret the acquisition function as commensurate with how desirable evaluating f at x is expected to be for the maximisation problem [CITE: https://www.cse.wustl.edu/~garnett/cse515t/spring_2015/files/lecture_notes/12.pdf]
While we have just discussed that our goal is to transform the original optimisation into a sequence of easier optimisations, where is the "Bayesian" in this optimisation, and how is the acquisition function related? Let us rewind to our surrogate model and build the link between everything we have discussed thus far by noting the steps of BO [CITE: https://www.youtube.com/watch?list=PLZ_xn3EIbxZHoq8A3-2F4_rLyy61vkEpU&v=EnXxO3BAgYk]:
Choose a surrogate model and its prior over the space of objectives f
Given the set of observations (function sampling), use Bayes' rule to obtain the posterior
Use an acquisition function (alpha(x)), which is a function of the posterior to decide where to sample next (x_t = argmax()..)
Add the newly sampled data to the set of observations and go to Step TODO Master issue #2 until convergence or the budget elapses
We now have three core ideas associated with acquisition functions: i) they are a function of the surrogate posterior; ii) they combine exploration and exploitation; and iii) they are inexpensive to evaluate. Let us now look into a few examples of commonly used acquisition functions to understand the concept better.
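The four steps above could be sketched end to end as follows; the toy objective, the RBF length-scale, and the UCB acquisition are assumptions made for illustration, not the article's exact setup:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between 1-D point sets
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Step 2: standard zero-mean GP regression posterior at Xs
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def f(x):
    # Toy "expensive" objective standing in for e.g. gold content
    return np.sin(3 * x) + 0.5 * x

X = np.array([0.1, 0.9])            # Step 1: initial observations
y = f(X)
grid = np.linspace(0.0, 1.0, 201)   # candidate locations
for _ in range(10):
    mu, sigma = gp_posterior(X, y, grid)
    alpha = mu + 2.0 * sigma        # Step 3: cheap UCB acquisition
    x_next = grid[np.argmax(alpha)]
    X = np.append(X, x_next)        # Step 4: add sample, repeat
    y = np.append(y, f(x_next))

x_best = X[np.argmax(y)]            # best location found so far
```

Each iteration optimises the cheap alpha over the grid instead of f itself, which is the transformation of one hard optimisation into a sequence of easy ones described above.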
Remove the following text
Let us understand this concept in two cases:
We have two points of similar means (of function values (gold in our case)). We now want to choose one of these to obtain the labels or values. We will choose the one with higher variance. This basically says that given the same exploitability, we choose the one with higher exploration value.
We have two points having the same variance. We would now choose the point with the higher mean. This basically says that given the same explorability, we will choose the one with higher exploitation value.
Remove the text "hero plot" and instead write "below plot".
In "Intuition behind E", change "spread out sigma" to the symbol sigma.
Everywhere except the title, change "Active Learning" to "active learning".
In the SVM example, remove the GIFs for random and GP-UCB. Also, mention the optimal <C, gamma> found via grid search and via EI and PI.
dimentions --> dimensions
"Next" and "Previous" are not very clearly visible in the hero slides; use https://www.jssor.com/demos/image-slider.slider