User Guide

Charles-Philippe Clermont edited this page Feb 25, 2015 · 14 revisions

This guide explains the ins-and-outs of using bestcase to run A/B and multivariate tests.

You should also take a look at the API Docs and the Examples Page; the latter puts everything together in the context of a simple webapp with multi-page A/B tests. Yum!

Table of Contents

  • Getting Started
  • Configuration Options
  • Running An A/B Test
  • Tracking Conversions
  • Results
  • Ending Tests
  • Statistical Significance
  • Keeping Track of Identity
  • Adding Weights To Alternatives
  • Multiple Participation
  • Using The Web Dashboard
  • Bot Filtering
  • Testing Bestcase
  • Creating Your Own Store
  • Performance
  • Limitations
  • Contributions Welcome

Getting Started

Add a Leiningen dependency to your project.clj file:

[com.cpclermont/bestcase "0.2.1"]

Specifying the store to use to keep track of tests

Bestcase comes with two stores:

  • the in-memory store, which gets wiped every time you restart Clojure or your webserver; and
  • a Redis store.

In production, you probably want to use the Redis store. Because 100% accuracy of A/B testing data is not generally mission critical, you can preserve performance by using RDB with a relaxed snapshotting policy. AOF also does well. As always, if performance is a key concern for you, benchmark under expected load before making a choice.

Before running any tests, you need to tell bestcase which storage mechanism to use. You can do so globally using (set-config! ...) and passing it a map with a :store key.

(ns your-app
  (:require [bestcase.core :as bc]
            [bestcase.store.redis :as bcr]
            [bestcase.store.memory :as bcm]))

;; in-memory store
(bc/set-config! {:store (bcm/create-memory-store)})

;; redis store 
(def conn-opts {:pool {} :spec {:uri "redis://..."}}) ;; see goo.gl/nwQ0A7
(bc/set-config! {:store (bcr/create-redis-store conn-opts)}) 

Carmine is a great Clojure client for Redis.

If you don't want a global store, but would rather establish the configuration dynamically (or if you want to use a different store for a subset of operations), you can use the (with-bestcase-config config & body) macro.

(bc/with-bestcase-config {:store (bcm/create-memory-store)}
  ;; do stuff here
)

Advanced Configuration Options

The config map can take the following options:

  • :store as described above
  • :throw-exceptions true/false (defaults to false)

With :throw-exceptions set to false, most (but not all!) errors result in the called function returning nil instead of throwing an exception. That being said, if you want to be 100% sure that bestcase will not throw an exception on backend failure, you should follow the usual JVM route of wrapping invocations in (try ... (catch ...)).
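
As a minimal sketch of that advice, the helper below wraps any call in try/catch and substitutes a fallback when the call throws or returns nil. The name call-with-fallback is hypothetical and not part of bestcase.

```clojure
(defn call-with-fallback
  "Calls the zero-argument function f. Returns its result, or `fallback`
  if f throws an Exception or returns nil. Hypothetical helper, not part
  of bestcase."
  [f fallback]
  (try
    (let [result (f)]
      (if (nil? result) fallback result))
    (catch Exception _
      fallback)))

;; Intended use (assumes bestcase.core is required as bc):
;; (call-with-fallback
;;   #(bc/with-identity "alice"
;;      (bc/alt :test-name
;;              :alternative-1 "some value"
;;              :alternative-2 "another value"))
;;   "some value")
```

Falling back to the control's value means a backend hiccup degrades to "no test" rather than to an error page.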

Running An A/B Test

A/B tests are defined through the (alt test-name alternatives) macro and should be run from within the (with-identity id & body) macro.

(bc/with-identity "alice"
  (bc/alt :test-name
       :alternative-1 "some value"
       :alternative-2 "another value"
       ...))

The first time (alt ...) is called for a particular identity, (alt ...) randomly picks an alternative for that identity (in this case, :alternative-1 => "some value" or :alternative-2 => "another value") and returns the value for that alternative ("some value" or "another value"). On subsequent calls for that same identity, it will always return the same alternative's value.
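
The "pick once, then stick" behaviour can be sketched in plain Clojure as follows. This is an illustration only, not bestcase's implementation: sticky-pick! and the atom-backed assignments map are hypothetical names made up for the example.

```clojure
(defn sticky-pick!
  "Returns the alternative assigned to `id`, picking one at random on the
  first call and remembering it in the `assignments` atom afterwards.
  Hypothetical illustration; not how bestcase is implemented."
  [assignments id alternatives]
  (let [pick (rand-nth alternatives)]
    (get (swap! assignments
                ;; only record the new pick if `id` has no assignment yet
                (fn [m] (if (contains? m id) m (assoc m id pick))))
         id)))

(def assignments (atom {}))
(def alternatives [:alternative-1 :alternative-2])

(sticky-pick! assignments "alice" alternatives) ; random on the first call
(sticky-pick! assignments "alice" alternatives) ; same alternative as before
```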

You can have as many alternatives as you want (i.e., you can use bestcase to run multivariate tests). Values can be any Clojure data type: a list, map, set, string, or any other rich data type.

There are a couple of things to watch out for:

  • the alt macro evaluates the value picked for a particular alternative each and every time that alternative is chosen;
  • test, alternative, and goal names (see Tracking Conversions hereunder) should always be keywords;
  • you should never use the same name for two tests, even if one of the two tests has already been ended (see hereunder);
  • bestcase defaults to tracking a particular test participant only once, so if "alice" reloads the same page 500 times she will be counted as only 1 participant; and
  • the (with-identity ...) macro is used to set the identity of the user so that the same user is always shown the same A/B test alternative.

I will consider writing a "memoized" version of (alt ...) that would evaluate the value of a particular alternative only once, if I get enough requests. Email me if this is something you need.
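
In the meantime, one possible workaround for expensive values is to wrap them in a memoized zero-argument function and call that function from the alternative. The sketch below uses only clojure.core; expensive-banner and render-count are hypothetical names.

```clojure
;; Counts how often the expensive body actually runs.
(def render-count (atom 0))

(def expensive-banner
  "Memoized zero-argument function: the body runs once, later calls
  return the cached value. Hypothetical stand-in for a costly value."
  (memoize
    (fn []
      (swap! render-count inc)
      "rendered banner")))

;; Intended use (assumes bestcase.core is required as bc):
;; (bc/alt :test-name
;;         :alternative-1 (expensive-banner)
;;         :alternative-2 "another value")
```

Note that memoize caches for the lifetime of the process, so this only suits values that do not depend on the request or the user.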

Tracking Conversions

Here is how you track goals / conversions:

(bc/with-identity "alice"
  (bc/score :test-name :goal-name))

Like (alt ...), (score test-name goal-name) should be used in a (with-identity ...) block.

You can have multiple, statistically independent, goals for a test. For example, you can track whether a new button color increases purchases and/or signups.

Results

You can see results as a map using (results test-name control-alternative-name):

(bc/results :purchase-button-test :red-button)
;; => {:test-name :purchase-button-test
;;     :test-type :ab-test
;;     :alternatives ({:alternative-name :red-button
;;                     :count 182
;;                     :control true
;;                     :goal-results ({:goal-name :purchase
;;                                     :score 35
;;                                     :z-score 0.0})}
;;                    {:alternative-name :green-button
;;                     :count 180
;;                     :goal-results ({:goal-name :purchase
;;                                     :score 45
;;                                     :z-score 1.3252611961151077})}
;;                    {:alternative-name :blue-button
;;                     :count 188
;;                     :goal-results ({:goal-name :purchase
;;                                     :score 61
;;                                     :z-score 2.941015722492861})})}

Alternatively, you can use [bestcase.util.pretty-print]'s (result->string-seq ...) to convert this map into a sequence of strings:

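(ns your-app
  (:require [bestcase.core :as bc]
            [bestcase.util.pretty-print :as bcpp]))

;; doseq forces the printing; a lazy (map println ...) only prints when realized
(doseq [line (bcpp/result->string-seq (bc/results :purchase-button-test :red-button))]
  (println line))
;; prints:
;;    Test Name: :purchase-button-test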
;;    Test Type: AB
;;      :blue-button (188 trials)
;;        :purchase (61 or 32.446808%) [2.941015722492861 99%] *** very confident
;;      :green-button (180 trials)
;;        :purchase (45 or 25.0%) [1.3252611961151077 90%] * fairly confident
;;      :red-button (182 trials) CONTROL
;;        :purchase (35 or 19.23077%) [0.0] not yet confident

Finally, you can use the web dashboard to see results. See Using The Web Dashboard hereunder.
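
Because (results ...) returns a plain map, you can also post-process it yourself. As a small sketch, the hypothetical helper below computes the conversion rate (score divided by count) per alternative and goal; it assumes only the map shape shown in the example output above, and the sample data is a trimmed copy of it.

```clojure
(defn conversion-rates
  "Returns {alternative-name {goal-name rate}} for a results map shaped
  like the example output of (results ...), where rate = score / count.
  Alternatives without participants get a rate of 0.0."
  [results]
  (into {}
        (for [{:keys [alternative-name goal-results] n :count} (:alternatives results)]
          [alternative-name
           (into {}
                 (for [{:keys [goal-name score]} goal-results]
                   [goal-name (if (pos? n) (double (/ score n)) 0.0)]))])))

;; Trimmed copy of the sample results shown earlier.
(def sample-results
  {:test-name :purchase-button-test
   :alternatives [{:alternative-name :red-button :count 182 :control true
                   :goal-results [{:goal-name :purchase :score 35}]}
                  {:alternative-name :blue-button :count 188
                   :goal-results [{:goal-name :purchase :score 61}]}]})

(conversion-rates sample-results)
;; :red-button converts at roughly 19.2%, :blue-button at roughly 32.4%
```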

Ending Tests

There are three ways to end tests:

  1. programmatically using (end test-name winning-alternative-name);
  2. using the web dashboard by selecting an alternative as the winner; or
  3. by deleting the code and hard-coding the winner.

Statistical Significance

Today, bestcase uses only one statistical test to determine significance: the one-tailed z-score test.

To the extent you need and want more, you should be able to easily roll-your-own from the raw data of the (results ...) function. Also, don't hesitate to contact charles@cpclermont.com with requests for more features on the statistics / visualization front. I'll do my best to FIFO your request into my work-stream.
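
If you do roll your own, here is a sketch of a two-proportion z-score computed from raw counts and scores. It assumes the unpooled standard error, which appears to reproduce the z-scores in the sample output under Results; whether bestcase computes it exactly this way is an assumption, and the function name is hypothetical.

```clojure
(defn z-score
  "z-score of an alternative's conversion rate against the control's,
  using an unpooled standard error. Each argument is a map with :count
  (participants) and :score (conversions). Returns 0.0 when either group
  is empty or the standard error is zero."
  [{alt-n :count alt-score :score} {ctl-n :count ctl-score :score}]
  (if (or (zero? alt-n) (zero? ctl-n))
    0.0
    (let [p-alt   (/ (double alt-score) alt-n)
          p-ctl   (/ (double ctl-score) ctl-n)
          std-err (Math/sqrt (+ (/ (* p-alt (- 1.0 p-alt)) alt-n)
                                (/ (* p-ctl (- 1.0 p-ctl)) ctl-n)))]
      (if (zero? std-err)
        0.0
        (/ (- p-alt p-ctl) std-err)))))

;; :green-button (45 of 180) against the :red-button control (35 of 182)
(z-score {:count 180 :score 45} {:count 182 :score 35})
;; roughly 1.33, in line with the sample results
```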

Keeping Track of Identity

A/B testing requires that you assign each user a unique identifier. In bestcase, the identifier is simply a unique string.

There are many ways to pick a unique identifier:

  • create a random string and put it in a cookie (but this means the same user on different machines will be subject to different tests); or
  • use unique usernames or userids (but this won't allow you to track conversions from a random visitor to a registered user).

Bestcase comes with a ring middleware generator (identity-middleware-wrapper ...) that can cover both cases and wraps the request in the appropriate (with-identity ...) macro; see the bestcase.util.ring namespace.

Important: This middleware requires you to use Ring's session middleware for cookies to be set correctly. See the Examples Page for more details.

If you want to carry over experiments from unregistered users to registered users, you'll need to provide your own id-fn. Here's an example:

;; db/... and get-user-from-request stand in for your application's own functions.
(defn create-new-user
  [request username]
  (let [user (db/create-new-user username)
        ;; reuse the anonymous visitor's id if there is one
        bestcase-id (or (get-in request [:session :bestcase])
                        (str (java.util.UUID/randomUUID)))]
    (db/update-user-bestcase-id (:user-id user) bestcase-id)
    user))

(defn your-custom-identity-fn
  [request]
  (if-let [user (get-user-from-request request)]
    (:bestcase-id user)
    (or (get-in request [:session :bestcase])
        (str (java.util.UUID/randomUUID)))))

Adding Weights To Alternatives

By default, bestcase evenly distributes participants across all alternatives for a test. However, you may want to have only 10% of users see a particular alternative, rather than an equal share. Bestcase allows you to do this by adding weights to the (alt ...) macro:

;; without weights / evenly distributed participants
(alt :test-name
     :alternative-1 "some value"
     :alternative-2 "another value"
     :alternative-3 "a third value")

;; with weights / unevenly distributed participants
(alt :test-name
     [:alternative-1 10] "a1"  ; will get 10% of participants
     [:alternative-2 60] "a2"  ; will get 60% of participants
     [:alternative-3 30] "a3") ; will get 30% of participants 

The weights do not have to add up to 100. They can add up to anything; only their relative sizes matter. Weights must be integers.

Warning: Adding weights to alternatives is ever-so-slightly slower than the default evenly-weighted implementation.

Warning: Do not use zero (0) or negative weights.
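
To make the "weights can add up to anything" point concrete, here is a sketch of one common way to turn integer weights into a pick: lay the weights end to end and see where a roll lands. It is not bestcase's implementation, and the function names are hypothetical.

```clojure
(defn pick-weighted
  "Given a vector of [alternative-name weight] pairs and an integer `roll`
  with 0 <= roll < (sum of weights), returns the alternative whose weight
  range contains the roll. Weights must be positive integers."
  [weighted-alts roll]
  (loop [[[alt-name weight] & more] weighted-alts
         remaining roll]
    (if (< remaining weight)
      alt-name
      (recur more (- remaining weight)))))

(defn pick-weighted-at-random
  "Draws a uniform roll over the total weight and delegates to pick-weighted."
  [weighted-alts]
  (pick-weighted weighted-alts
                 (rand-int (reduce + (map second weighted-alts)))))

(def weighted-alts
  [[:alternative-1 10] [:alternative-2 60] [:alternative-3 30]])

(pick-weighted weighted-alts 9)  ; last of the 10 slots of :alternative-1
(pick-weighted weighted-alts 10) ; first of the 60 slots of :alternative-2
```

With this scheme each alternative's share is its weight divided by the total, which is also why zero or negative weights make no sense.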

Multiple Participation

As mentioned above, by default, bestcase only tracks participants in a test once. I.e., even if "alice" sees the same alternative 100 times, it will count her as one test participant. Ditto for goals: however many times "alice" purchases something, she only counts as one conversion. 99% of the time, that's the way you'll want to run your tests. For the other 1%, both (alt ...) and (score ...) can take an option map with :multiple-participation true:

(alt :test-name 
     :alternative1 "alt1"
     :alternative2 "alt2"
     {:multiple-participation true})

(score :test-name :goal1 {:multiple-participation true})

Using The Web Dashboard

Bestcase's bestcase.util.ring namespace allows you to create a ring route for a basic dashboard that allows you to see results and choose winners for your active tests. You can easily add this dashboard to your webapp as follows (this assumes you are using Compojure, but it'll work with Noir and other ring-handler compliant Clojure frameworks):

(ns your-app
  (:require [bestcase.util.ring :as bcur]
            [compojure.core :refer [defroutes GET]]))

(defroutes your-routes
   (GET "/" [] ...) ; your root route
   ...              ; your other routes go here
   (bcur/dashboard-routes "/bestcase" {}))

Just point your browser to /bestcase and voilà. The option map can be used to customize the look and feel of the dashboard using CSS and JavaScript:

   (bcur/dashboard-routes "/bestcase" {:css ["/css/custom.css"]
                                       :js  ["/js/custom.js"]}))

If you want to create your own look-and-feel, you may want to look at the bestcase.util.ring namespace to see the html that is generated for the dashboard.

Bot Filtering

There are good arguments for excluding bots from A/B tests. By default, bestcase does not exclude bots.

Warning: Even if you enable the bot-detection, bots will see A/B tests, they just won't be counted in the results.

To enable user-agent based bot detection, set :simple-no-bots true in the option map passed to (identity-middleware-wrapper ...). Bestcase then looks at the request's user-agent header and, when it identifies a bot, does not increment test counts or scores.

((bcur/identity-middleware-wrapper default-identity-fn {:simple-no-bots true}) your-webapp)
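
For illustration, user-agent based detection boils down to a predicate over the Ring request map. The sketch below is not bestcase's detection logic: the pattern and the name bot-request? are assumptions made up for the example. Ring does store header names as lower-case strings, hence the "user-agent" key.

```clojure
;; Hypothetical pattern; real bot lists are much longer.
(def bot-pattern #"(?i)bot|crawler|spider|slurp")

(defn bot-request?
  "True when the Ring request's user-agent header matches bot-pattern.
  Requests without a user-agent header are treated as non-bots here."
  [request]
  (let [user-agent (get-in request [:headers "user-agent"])]
    (boolean (and user-agent (re-find bot-pattern user-agent)))))

(bot-request? {:headers {"user-agent" "Mozilla/5.0 (compatible; Googlebot/2.1)"}})
```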

I will consider implementing A/Bingo's approach to excluding bots if there is enough interest.

Testing Bestcase

One of the problems with A/B testing is that... well... it makes it difficult for you to test your code in the browser since, by the very nature of A/B testing, it'll show you something random which you won't be able to change without resetting your cookies.

Much like A/Bingo and Vanity, two split-testing libraries for Rails, bestcase lets you cheat by using request parameters. Basically, if the request's params contain a key with the same name as one of your tests, bestcase will use the value of that key as the alternative to use for that test.

Thus, given the test:

(alt :test-name
     :alternative-1 "foo"
     :alternative-2 "bar")

appending ?test-name=alternative-1 to the URL in your browser will force (alt ...) to return "foo".

Obviously this functionality is dangerous and should only be used for testing. To enable it, add :easy-testing true to the config map of (identity-middleware-wrapper ...).
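
Conceptually, the override is a lookup of the test's name in the request params. The sketch below shows the idea in plain Clojure; forced-alternative is a hypothetical name, string param keys are an assumption, and bestcase's own handling may differ.

```clojure
(defn forced-alternative
  "Returns the alternative keyword requested via `params` for `test-name`,
  or nil when the param is absent or does not name a known alternative.
  Assumes params with string keys, e.g. {\"test-name\" \"alternative-1\"}."
  [params test-name alternatives]
  (when-let [raw (get params (name test-name))]
    (let [requested (keyword raw)]
      (when (contains? (set alternatives) requested)
        requested))))

;; ?test-name=alternative-1
(forced-alternative {"test-name" "alternative-1"}
                    :test-name
                    [:alternative-1 :alternative-2])
```

Checking the requested value against the known alternatives keeps a mistyped URL from selecting something that does not exist.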

Finally, the bestcase.for-testing namespace contains some helpful testing functions for bestcase. You hopefully won't ever need them.

Creating Your Own Store

bestcase.core contains a Clojure protocol definition of a store, of which there are currently two implementations: the in-memory store and the Redis store.

If you define a type that implements the Store protocol, you can create your own store (e.g., Cassandra-backed).
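
The mechanics are ordinary Clojure protocol work. The sketch below uses a made-up two-method protocol purely to show the defprotocol/defrecord pattern; CounterStore, increment! and read-count are hypothetical and are not the methods of bestcase's Store protocol, which you should look up in bestcase.core.

```clojure
;; Hypothetical protocol for illustration only; NOT bestcase's Store protocol.
(defprotocol CounterStore
  (increment! [store k] "Adds one to the counter at k and returns the new value.")
  (read-count [store k] "Returns the counter at k, or 0 if it was never set."))

;; An atom-backed implementation; a real store would talk to a database.
(defrecord AtomCounterStore [state]
  CounterStore
  (increment! [_ k]
    (get (swap! state update-in [k] (fnil inc 0)) k))
  (read-count [_ k]
    (get @state k 0)))

(defn create-atom-counter-store []
  (->AtomCounterStore (atom {})))
```

A store built for bestcase would follow the same shape: one record or type, with one method body per function the Store protocol declares.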

Performance

Some non-rigorous and arbitrary performance non-benchmarks follow. Again, if every ms counts to you, benchmark first.

Redis Store

  • randomly picking one of four alternatives (evenly-distributed) 10,000 times; and
  • randomly picking one of four weighted alternatives 10,000 times

both take ~3500 msecs: 0.35 msecs / pick or 2857 picks / second

In-Memory Store

  • randomly picking one of four alternatives (evenly-distributed) 10,000 times; and
  • randomly picking one of four weighted alternatives 10,000 times

both take ~170 msecs: 0.017 msecs / pick or 58,823 picks / second

Obviously, Redis is to blame?! <3 Redis

The "performance tests" above were run on a MacBook Pro with a 2.7 GHz Intel Core i7 processor and 16 GB of 1600 MHz DDR3 memory.
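
If you want to produce similarly non-rigorous numbers for your own setup, a crude timing harness like the following is enough. It is a sketch using only clojure.core and System/nanoTime; time-picks is a hypothetical name, and a proper benchmark would also account for JIT warm-up.

```clojure
(defn time-picks
  "Calls the zero-argument function f n times and returns the total time,
  the time per call and the calls per second."
  [n f]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (let [elapsed-ms (/ (- (System/nanoTime) start) 1e6)]
      {:total-ms         elapsed-ms
       :ms-per-pick      (/ elapsed-ms n)
       :picks-per-second (/ n (/ elapsed-ms 1000.0))})))

;; Stand-in workload; replace the function with a call to (alt ...)
;; inside (with-identity ...) to time a real store.
(time-picks 10000 #(rand-int 4))
```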

Limitations

  • The keywords for test, alternative, and goal names should never include the "|" (pipe) character.

Contributions Welcome

Contributions are super welcome (code, documentation, tests, personality, humour, spelling). Thanks in advance.