There are a number of different possible players that each seek to explore/exploit using different methods. Each can be configured in the bandit.yml file in the config directory under your Rails.root.
The UCB player calculates the best alternative based on the sum of its conversion rate and its calculated confidence interval. The player will automatically shift between an experimental phase and a non-experimental phase based on its confidence that it has found the optimal strategy. See the following blog post for more details: www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html
The epsilon greedy player selects the best alternative with a probability of 1 - epsilon, and selects uniformly among the other alternatives with a probability of epsilon. You can set the value of epsilon in the config file like this:
production: ... player: epsilon_greedy player_config: epsilon: 0.1 ...
The softmax player selects an an alternative with a probability based on its past success compared to the other alternatives. The temperature determines how likely the player is to explore less successful alternatives.
production: ... player: softmax player_config: temperature: 0.2 ...
This is included for testing purposes only. It will not optimize. There are no config options.
production: ... player: round_robin ...