Install with pip:
$ pip install manhattan
Manhattan is broken up into several conceptual modules. Those modules pass data as follows:

.. graphviz::

   digraph architecture {
       rankdir=LR;
       node [shape=box];
       Middleware -> Log -> Server;
       subgraph cluster_backend {
           label = "Backend";
           Server -> "LRU Cache" -> SQL;
           SQL -> "LRU Cache" -> Server;
       }
       Server -> Client;
       Client -> Server;
   }

The Log component of a system can be interchanged, and a few different options are available for different deployment scenarios.
You can set up a typical WSGI application to use Manhattan in a few easy steps,
and be on your way to detailed analytics and easy AB testing. For the sake of
explanation we'll use a very basic 'Hello World' WSGI app. To begin, put the
following into a new file, e.g. manhattandemo.py:
from webob import Response
from webob.dec import wsgify


@wsgify
def app(req):
    # Respond to every request with a plain greeting.
    return Response('Hello')


if __name__ == '__main__':
    from wsgiref.simple_server import make_server

    # An empty host string binds to all interfaces.
    httpd = make_server('', 8000, app)
    print("Serving on 0.0.0.0:8000...")
    httpd.serve_forever()
Wrap the WSGI application in the ManhattanMiddleware, setting up a TimeRotatingLog instance for the application to log events to.
The Manhattan frontend (middleware) is completely stateless, and simply logs events to a log instance for later reconciliation. This ensures that the additional in-band request latency incurred by adding Manhattan to an application is very small: typically less than 1ms.
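To illustrate why a stateless frontend adds so little latency, here is a minimal sketch of a WSGI middleware that only appends an event to a log object and then delegates to the wrapped app. This is not Manhattan's implementation: EventLoggingMiddleware, ListLog, and the event fields are hypothetical names chosen for illustration.

```python
import time


class ListLog:
    """Hypothetical in-memory log; a real deployment would append to a file."""

    def __init__(self):
        self.events = []

    def write(self, event):
        self.events.append(event)


class EventLoggingMiddleware:
    """Hypothetical stateless middleware: record one event, then delegate."""

    def __init__(self, app, log):
        self.app = app
        self.log = log

    def __call__(self, environ, start_response):
        # No lookups or aggregation happen in-band; we only append an event.
        self.log.write(('page', time.time(), environ.get('PATH_INFO', '')))
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello']


demo_log = ListLog()
wrapped = EventLoggingMiddleware(hello_app, demo_log)
body = wrapped({'PATH_INFO': '/some-path'}, lambda status, headers: None)
```

Because the middleware keeps no per-visitor state, all of the expensive work can be deferred to whatever process later reads the log.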
The recommended log type for basic deployments is the TimeRotatingLog, which writes events as plaintext lines in a series of append-only files, with one file per hour. Other log types are available for advanced deployments; for more information, see the manhattan.log module:
from manhattan.middleware import ManhattanMiddleware
from manhattan.log.timerotating import TimeRotatingLog

# Events are appended to hourly files that share this base path.
log = TimeRotatingLog('/tmp/manhattan.log')
app = ManhattanMiddleware(app, log, secret='s3krit')
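The one-file-per-hour, append-only scheme can be sketched with the standard library. This is a simplified illustration, not the actual TimeRotatingLog code: the class name HourlyAppendLog, the filename suffix, and the tab-separated line layout are all assumptions.

```python
import os
import tempfile
import time


class HourlyAppendLog:
    """Hypothetical sketch of an append-only log split into hourly files."""

    def __init__(self, path):
        self.path = path

    def filename_for(self, timestamp):
        # Assumed naming: base path plus the hour bucket (seconds // 3600).
        return '%s.%d' % (self.path, int(timestamp) // 3600)

    def write(self, fields, timestamp=None):
        timestamp = time.time() if timestamp is None else timestamp
        line = '\t'.join(str(f) for f in fields) + '\n'
        # Mode 'a' only ever appends, so existing records are never rewritten.
        with open(self.filename_for(timestamp), 'a') as f:
            f.write(line)


base = os.path.join(tempfile.mkdtemp(), 'demo.log')
hourly_log = HourlyAppendLog(base)
hourly_log.write(['page', '/a'], timestamp=0)
hourly_log.write(['page', '/b'], timestamp=10)     # same hour as the first
hourly_log.write(['page', '/c'], timestamp=3600)   # next hour, so a new file
files = sorted(os.listdir(os.path.dirname(base)))
```

Writing each hour to its own file means that files for past hours are never modified again, which makes them straightforward for a separate process to read.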
Try opening up http://localhost:8000 in your browser and visiting a few URLs, e.g. http://localhost:8000/some-path. Then look at the generated files, e.g.:
$ cat /tmp/manhattan.log.*
You should see log entries from the requests that you just generated.
The ManhattanMiddleware places a key in the WSGI environ which acts as a handle to perform testing operations. This handle is called the visitor and is an instance of manhattan.visitor.Visitor. There are four types of operations you can perform on this handle.
* visitor.page(req) - Record a page view, passing in a req object that is an instance of webob.Request. This event is recorded implicitly on every web request which uses the middleware, and does not need to be done by the wrapped application unless additional page view records are desired.
* visitor.pixel() - Record that this visitor has requested a tracking pixel. This is used to exclude events from visitors which either don't request images or don't support cookies (both likely symptoms of a bot). This event is recorded implicitly by the middleware, and does not need to be done by the wrapped application.
* visitor.goal(name, value=None, value_type=None, value_format=None) - Record a goal conversion, where name is a string describing the goal. The value argument and its associated parameters are optional.
* visitor.split(test_name, populations=None) - Perform a split test, record the population assigned to this visitor, and return it. In the most basic form, with no populations argument specified, this just does a 50/50 AB test and returns True or False to indicate the assigned population.
For example, to record a goal conversion, we can modify our basic app like so:
@wsgify
def app(req):
    # The middleware stores the visitor handle in the WSGI environ.
    visitor = req.environ['manhattan.visitor']
    if req.path_info == '/pie':
        visitor.goal('pie accomplished')
    return Response('Hello')
After making this change, you should be able to visit http://localhost:8000/pie, and see an event recorded in the log for the corresponding goal conversion.
Recording a goal is not idempotent: if you call visitor.goal() twice, two goal conversions will be recorded for that visitor. Depending on the particular analysis being performed, this may affect results.
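If a goal should count at most once per visitor, the application has to guard the call itself. The sketch below shows one possible guard; record_goal_once and the seen set are hypothetical helpers, and StubVisitor only stands in for the real visitor so that the example runs without Manhattan installed.

```python
class StubVisitor:
    """Stand-in for the real visitor; it just remembers each goal() call."""

    def __init__(self):
        self.goals = []

    def goal(self, name, value=None, value_type=None, value_format=None):
        self.goals.append(name)


def record_goal_once(visitor, name, seen):
    """Call visitor.goal(name) unless this name is already in `seen`.

    `seen` is any mutable set scoped to one visitor, e.g. kept in a session.
    Returns True if the goal was recorded, False if it was skipped.
    """
    if name in seen:
        return False
    visitor.goal(name)
    seen.add(name)
    return True


stub_visitor = StubVisitor()
seen_goals = set()
first = record_goal_once(stub_visitor, 'pie accomplished', seen_goals)
second = record_goal_once(stub_visitor, 'pie accomplished', seen_goals)
```

Where the seen set lives (a session, a cookie, a database row) is an application decision; the sketch only shows the shape of the check.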
Performing a split test is similar:
@wsgify
def app(req):
    visitor = req.environ['manhattan.visitor']
    # With no populations argument, split() returns True or False.
    if visitor.split('superior dessert preference'):
        s = 'cake'
    else:
        s = 'pie'
    return Response(s)
Visiting http://localhost:8000 should show either 'cake' or 'pie', and record the returned population in the event log.
Recording a split test is idempotent: for the same visitor and the same test, the same population will always be returned, so you can make as many successive calls to visitor.split() as desired without affecting the results of the split test.
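One common way to get this kind of idempotence is to derive the population from a hash of the visitor and the test name rather than from a random draw. The sketch below illustrates that general technique; it is an assumption made for illustration and not necessarily how Manhattan assigns populations. The names assign_population and visitor_id are hypothetical.

```python
import hashlib


def assign_population(visitor_id, test_name):
    """Deterministically map (visitor, test) to True or False.

    The same inputs always hash to the same digest, so repeated calls
    return the same population without storing any state.
    """
    key = ('%s:%s' % (visitor_id, test_name)).encode('utf-8')
    digest = hashlib.sha1(key).digest()
    # Use the lowest bit of the first byte for a roughly even 50/50 split.
    return digest[0] % 2 == 0


# Repeated calls for one visitor and one test collapse to a single value.
results = {assign_population('visitor-123', 'superior dessert preference')
           for _ in range(5)}

# Across many visitors, both populations should be represented.
populations = [assign_population(str(i), 'superior dessert preference')
               for i in range(1000)]
```

Including the test name in the hashed key keeps assignments for different tests independent of each other for the same visitor.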
As we've seen, all the frontend does is record events to a log. Although having the log is useful, in order to do something with the data, we'll want to aggregate it somehow. This is done by the Manhattan backend, using the manhattan-server executable.
The backend reconciles events from a log and aggregates the data in memory, periodically flushing it to SQL in a denormalized format for result viewing. To launch the server, pass in a SQLAlchemy-friendly database connection URL and the log path used by the frontend:
$ manhattan-server --path=/tmp/manhattan.log --url=sqlite:///test.db -v
The server will spawn two threads. One thread will begin reconciling the existing log events and watch for new events to be recorded. The other thread will answer aggregate queries over a loopback ZeroMQ connection.
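The aggregate-in-memory, flush-to-SQL pattern used by the backend can be illustrated with the standard library alone. This sketch is not the manhattan-server implementation: the GoalAggregator class, the goal_counts table, and the event tuples are hypothetical, and sqlite3 is used directly instead of SQLAlchemy to keep the example self-contained.

```python
import sqlite3
from collections import Counter


class GoalAggregator:
    """Hypothetical sketch: count goals in memory, flush totals to SQL."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = Counter()
        conn.execute('CREATE TABLE IF NOT EXISTS goal_counts '
                     '(name TEXT PRIMARY KEY, total INTEGER NOT NULL)')

    def handle(self, event):
        kind, name = event
        if kind == 'goal':
            self.pending[name] += 1   # cheap in-memory update per event

    def flush(self):
        # Fold the pending counts into the stored per-goal totals.
        for name, n in self.pending.items():
            self.conn.execute(
                'INSERT OR IGNORE INTO goal_counts VALUES (?, 0)', (name,))
            self.conn.execute(
                'UPDATE goal_counts SET total = total + ? WHERE name = ?',
                (n, name))
        self.conn.commit()
        self.pending.clear()


conn = sqlite3.connect(':memory:')
agg = GoalAggregator(conn)
for event in [('goal', 'pie accomplished'), ('page', '/'),
              ('goal', 'pie accomplished')]:
    agg.handle(event)
agg.flush()
agg.handle(('goal', 'pie accomplished'))
agg.flush()
total = conn.execute('SELECT total FROM goal_counts WHERE name = ?',
                     ('pie accomplished',)).fetchone()[0]
```

Batching writes this way trades a little freshness in the SQL tables for far fewer database round trips than one write per event.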
To query the server, start:
$ manhattan-client
This will provide a Python shell with a client object. Try:
>>> client.count('pie accomplished')
You can also view conversion statistics for split test populations:
>>> client.count('pie accomplished', variant=('superior dessert preference', 'True'))
You'll probably want to be able to query analytics results from within another application. The same client object is also available inside other Python processes with just:
from manhattan.client import Client

client = Client()
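Inside an application, a small helper can gather the per-population counts for a goal. The sketch below relies only on the count(goal, variant=(test, population)) call form shown earlier. The helper counts_by_population is hypothetical, the 'False' population label is an assumption that mirrors the 'True' label used above for a basic 50/50 test, and StubClient stands in for a real Client so the example runs without a server.

```python
class StubClient:
    """Stand-in for manhattan.client.Client, returning canned counts."""

    def __init__(self, counts):
        self.counts = counts

    def count(self, goal, variant=None):
        return self.counts.get((goal, variant), 0)


def counts_by_population(client, goal, test_name,
                         populations=('True', 'False')):
    """Return {population: goal count} for one split test."""
    return {pop: client.count(goal, variant=(test_name, pop))
            for pop in populations}


stub_client = StubClient({
    ('pie accomplished', ('superior dessert preference', 'True')): 7,
    ('pie accomplished', ('superior dessert preference', 'False')): 4,
})
summary = counts_by_population(stub_client, 'pie accomplished',
                               'superior dessert preference')
```

Passing the client in as an argument keeps the helper easy to test and lets the application decide when and where the real Client is created.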
For more sophisticated production analytics, check out the Advanced Usage section.