Pylons Execution Analysis
By Mike Orr and Alfredo Deza
This chapter shows how Pylons calls your application, and how Pylons interacts with Paste, Routes, Mako, and its other dependencies. We'll create a simple application and then analyze the Python code executed starting from the moment we run the "paster serve" command.
Abbreviations:
- $APP is your top-level application directory.
- $SP is the site-packages directory where Pylons is installed.
- $BIN is the location of paster and other executables.

$SP paths are shown in pip style ($SP/pylons) rather than easy_install style.
The sample application
Create an application called "Analysis" with a controller called "main":
$ paster create -t pylons Analysis
$ cd Analysis
$ paster controller main
Press Enter at all question prompts.
Edit analysis/controllers/main.py to look like this:
from analysis.lib.base import BaseController

class MainController(BaseController):

    def index(self):
        return '<h1>Welcome to the Analysis Demo</h1>Here is a <a href="/page2">link</a>.'

    def page2(self):
        return 'Thank you for using the Analysis Demo. <a href="/">Home</a>'
There are two shortcuts here which you would not use in a normal application. One, we're returning incomplete HTML documents. Two, we've hardcoded the URLs rather than generating them, to make the analysis easier to follow.
Now edit analysis/config/routing.py. Add these lines after "CUSTOM ROUTES HERE" (line 21):
map.connect("home", "/", controller="main", action="index")
map.connect("page2", "/page2", controller="main", action="page2")
Delete the file analysis/public/index.html.
Now run the server. (Press ctrl-C to quit it.)
$ paster serve development.ini
Starting server in PID 7341.
serving on http://127.0.0.1:5000
Pylons 1.0 has the following direct and indirect dependencies, which will be found in your site-packages directory ($SP):
- Beaker 1.5.4
- decorator 3.2.0
- FormEncode 1.2.2
- Mako 0.3.4
- MarkupSafe 0.9.3
- Nose 0.11.4
- Paste
- PasteDeploy 1.3.3
- PasteScript 1.7.3
- Routes 1.12.3
- simplejson 2.0.9 (if Python < 2.6)
- Tempita 0.4
- WebError 0.10.2
- WebHelpers 1.2
- WebOb 0.9.8
- Webtest 1.2.1
These are the current versions as of August 29, 2010. Your installation may have slightly newer or older versions.
When you run paster serve development.ini, it runs the "$BIN/paster" program. This is a platform-specific stub script created when PasteScript was installed:

__requires__ = 'PasteScript==1.7.3'
import sys
from pkg_resources import load_entry_point

sys.exit(
    load_entry_point('PasteScript==1.7.3', 'console_scripts', 'paster')()
)
This says to load a Python object named "paster" located in the egg "PasteScript", version 1.7.3, under the entry point group "console_scripts", and then to call it.
To explain what this means we have to get into Setuptools. Setuptools is
Python's de facto package manager, and was installed as part of your virtualenv
or Pylons installation. (If you're using Distribute 0.6, an alternative
package manager, it works the same way.)
load_entry_point is a function
that looks up a Python object via entry point and returns it.
So what's an entry point? It's an alias for a Python object. Here's the entry point itself:

[console_scripts]
paster = paste.script.command:run
This is from $SP/PasteScript-VERSION.egg-info/entry_points.txt. (If you used easy_install rather than pip, the path would be slightly different: $SP/PasteScript-VERSION.egg/EGG-INFO/entry_points.txt.)
"console_scripts" is the entry point group, and "paster" is the entry point. The right side of the value tells which module to import (paste.script.command) and which object in it to return (the run function). (To create an entry point, define it in your package's setup.py. Pip or easy_install will create the egg-info metadata from that. If you modify a package's entry points, you must reinstall the package to update the egg-info.)
The most common use for entry points is plugins. Nose, for instance, defines an entry point group in which it looks for plugins, and any other package can provide Nose plugins by defining entry points in that group. Paster uses plugins extensively, as we'll soon see.
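To make the mechanism concrete, here is a minimal sketch of what resolving an entry point involves: read an entry_points.txt-style file, find the group and the name, import the module on the left of the colon, and fetch the attribute on the right. Looking up names in a plugin group works the same way. It is not pkg_resources' actual implementation; it uses only the standard library (configparser, importlib, Python 3), and the helper resolve_entry_point and the sample file contents (which point at stdlib objects so the sketch runs anywhere) are hypothetical.

```python
import configparser
import importlib

# Hypothetical entry_points.txt content. The targets are stdlib objects so
# the sketch runs anywhere; a real file names the package's own objects.
ENTRY_POINTS_TXT = """
[console_scripts]
json-dump = json:dumps

[myapp.plugins]
path_join = os.path:join
"""


def resolve_entry_point(text, group, name):
    """Return the object that entry point `name` in `group` refers to."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str           # keep entry point names case-sensitive
    parser.read_string(text)
    value = parser[group][name]        # e.g. "json:dumps"
    value = value.split("[", 1)[0]     # drop an extras marker such as "[Config]"
    module_name, _, attr_path = value.strip().partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):  # also allows "module:Class.method"
        obj = getattr(obj, attr)
    return obj


dumps = resolve_entry_point(ENTRY_POINTS_TXT, "console_scripts", "json-dump")
print(dumps({"a": 1}))
```

The stub script above does the equivalent in one call and then invokes the returned object; sys.exit() turns its return value into the process exit status.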
So, to make a long story short, "paster serve" calls this run function. I inserted print statements into paste.script.command to figure out what it does. Here's a simplified description:

1. The run() function parses the command-line options into a subcommand ("serve") and its arguments (["development.ini"]).
2. It calls get_commands(), which loads Paster commands from plugins located at various entry points. (You can add custom commands with the "--plugin" command-line argument.) Paste's standard commands are listed in the same entry_points.txt file we saw above:

   [paste.global_paster_command]
   serve=paste.script.serve:ServeCommand [Config]
   #... other commands like "make-config", "setup-app", etc ...

3. It calls invoke(), which essentially does paste.script.serve.ServeCommand(["development.ini"]).run(). This in turn calls ServeCommand.command(), which handles daemonizing and other top-level stuff. Since our command line is short, there's no top-level stuff to do. It creates 'server' and 'app' objects based on the configuration file, and calls server(app).
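The dispatch described above follows a common pattern: a table maps subcommand names to command classes, the chosen class is instantiated with the remaining arguments, and its run() method does the work. The sketch below is a hypothetical, simplified stand-in written in Python 3: the COMMANDS dict and the ServeCommand stub are illustrative, not PasteScript's code. In PasteScript the table is built from the paste.global_paster_command entry points rather than hardcoded.

```python
import sys


class ServeCommand:
    """Hypothetical stand-in for paste.script.serve.ServeCommand."""

    def __init__(self, args):
        self.args = args

    def run(self):
        # The real command would load the server and app from the config
        # file named in self.args and then call server(app).
        return "would serve %s" % self.args[0]


# In PasteScript this mapping comes from entry points; here it is a dict.
COMMANDS = {"serve": ServeCommand}


def run(argv=None):
    """Split argv into a subcommand and its arguments, then dispatch."""
    argv = sys.argv[1:] if argv is None else argv
    if not argv:
        return "usage: paster COMMAND [ARGS...]"
    name, args = argv[0], argv[1:]
    try:
        command_class = COMMANDS[name]
    except KeyError:
        return "unknown command: %s" % name
    return command_class(args).run()


print(run(["serve", "development.ini"]))  # prints: would serve development.ini
```

Because the table is just data, a plugin system only has to add entries to it, which is exactly what the "--plugin" option and entry points allow.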
Loading the server and the application (PasteDeploy)
This all happens during step 3 of the application startup. We need to find and instantiate the WSGI application and server based on the configuration file. The application is our Analysis application. The server is Paste's built-in multithreaded HTTP server. A simplified version of the code is:
# Inside paste.script.serve module, ServeCommand.command() method.
from paste.deploy.loadwsgi import loadapp, loadserver

server = self.loadserver(server_spec, name=server_name,
                         relative_to=base, global_conf=vars)
app = self.loadapp(app_spec, name=app_name,
                   relative_to=base, global_conf=vars)
loadserver() and loadapp() are defined in module paste.deploy.loadwsgi. The code here is complex, so we'll just look at its general behavior. Both functions see the "config:" URI and read our config file. Since there is no server name or app name, they both default to "main". Therefore loadserver() looks for a "[server:main]" section in the config file, and loadapp() looks for "[app:main]". Here's what they find in development.ini:
[server:main]
use = egg:Paste#http
host = 127.0.0.1
port = 5000

[app:main]
use = egg:Analysis
full_stack = true
static_files = true
...
The "use =" line in each section tells which object to load. The other lines are configuration parameters for that object, or for plugins that object is expected to load. We can also put custom parameters in [app:main] for our application to read directly.
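PasteDeploy has its own config loader, but the section lookup can be approximated with the standard library's configparser (Python 3). In this sketch the find_section helper is hypothetical, and the [DEFAULT] section with "debug = true" is added for illustration. It builds the section name from a prefix plus a name that defaults to "main", and shows that configparser makes [DEFAULT] values visible in every section.

```python
import configparser

# A trimmed copy of development.ini; [DEFAULT] is included for illustration.
INI = """
[DEFAULT]
debug = true

[server:main]
use = egg:Paste#http
host = 127.0.0.1
port = 5000

[app:main]
use = egg:Analysis
full_stack = true
static_files = true
"""


def find_section(parser, prefix, name=None):
    """Return the options of section "prefix:name" ("main" if no name)."""
    section = "%s:%s" % (prefix, name or "main")
    return dict(parser[section])  # includes keys inherited from [DEFAULT]


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(INI)

server_conf = find_section(parser, "server")
app_conf = find_section(parser, "app")
print(server_conf["use"])  # egg:Paste#http
print(app_conf["use"])     # egg:Analysis
print(app_conf["debug"])   # true
```

Note that every value comes back as a string, including "5000" and "true"; converting them is left to whoever consumes the config.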
loadserver()'s args are uri="config:development.ini", name=None, relative_to="$APP".

- A "config:" URI means to read a config file.
- A server name was not specified, so it defaults to "main". So loadserver() looks for a section "[server:main]". The "server" part comes from the loadwsgi._Server.config_prefixes class attribute (in $SP/paste/deploy/loadwsgi.py).
- "use = egg:Paste#http" says to load an egg called "Paste".
- loadwsgi._Server.egg_protocols lists two protocols it supports: "paste.server_factory" and "paste.server_runner".
- "paste.server_runner" is an entry point group in the "Paste" egg, and it has an entry point "http". The relevant lines in $SP/Paste*.egg-info/entry_points.txt are:

  [paste.server_runner]
  http = paste.httpserver:server_runner

- There's a server_runner() function in the paste.httpserver module ($SP/paste/httpserver.py).
We'll stop here for a moment and look at how the application is loaded.
- loadapp() looks for a section "[app:main]" in the config file. The "app" part comes from the loadwsgi._App.config_prefixes class attribute (in $SP/paste/deploy/loadwsgi.py).
- "use = egg:Analysis" says to find an egg called "Analysis".
- loadwsgi._App.egg_protocols lists "paste.app_factory" as one of the protocols it supports.
- "paste.app_factory" is also an entry point group in the egg, as seen in $APP/Analysis.egg-info/entry_points.txt:

  [paste.app_factory]
  main = analysis.config.middleware:make_app

- The line "main = analysis.config.middleware:make_app" means to import the analysis.config.middleware module and return its make_app() object, a function defined in $APP/analysis/config/middleware.py.
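A "use = egg:..." value packs two pieces of information: a distribution name and, after "#", an entry point name that defaults to "main" (which is why "egg:Analysis" finds the "main" entry point). The name is then looked up in the loader's protocol groups. The sketch below imitates that lookup for both the server and the app; the INSTALLED dict standing in for installed metadata and the parse_egg_spec and pick_protocol helpers are hypothetical, and PasteDeploy's real code also handles version requirements, other URI schemes, and ambiguous matches.

```python
# Hypothetical metadata: distribution -> {entry point group -> {name: target}}.
INSTALLED = {
    "Paste": {"paste.server_runner": {"http": "paste.httpserver:server_runner"}},
    "Analysis": {"paste.app_factory": {"main": "analysis.config.middleware:make_app"}},
}

SERVER_PROTOCOLS = ["paste.server_factory", "paste.server_runner"]
APP_PROTOCOLS = ["paste.app_factory"]  # the real loader accepts more than this


def parse_egg_spec(spec):
    """Split 'egg:Dist#name' into ('Dist', 'name'); name defaults to 'main'."""
    scheme, _, rest = spec.partition(":")
    if scheme != "egg":
        raise ValueError("only egg: URIs are handled in this sketch")
    dist, _, name = rest.partition("#")
    return dist, name or "main"


def pick_protocol(spec, protocols):
    """Return (group, target) for the first protocol group defining the name."""
    dist, name = parse_egg_spec(spec)
    groups = INSTALLED[dist]
    for group in protocols:
        if name in groups.get(group, {}):
            return group, groups[group][name]
    raise LookupError("no entry point %r in %s for %s" % (name, dist, protocols))


print(pick_protocol("egg:Paste#http", SERVER_PROTOCOLS))
print(pick_protocol("egg:Analysis", APP_PROTOCOLS))
```

The protocol group that matched matters: it tells the loader which calling convention the target expects (a server runner versus an app factory, for instance).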
Instantiating the application (Analysis)
Here's a closer look at our application's make_app() function (simplified):
# In $APP/analysis/config/middleware.py
def make_app(global_conf, full_stack=True, static_files=True, **app_conf):
    config = load_environment(global_conf, app_conf)
    app = PylonsApp(config=config)
    app = SomeMiddleware(app, ...)  # Repeated for several middlewares.
    app.config = config
    return app
This sets up the Pylons environment (next subsection), creates the application object (following subsection), wraps it in several layers of middleware (listed in "Anatomy of a Request" below), and returns the complete application object.
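Middleware wrapping is plain composition: each layer is itself a WSGI application that holds a reference to the next one and may modify the environ on the way in or the response on the way out. The Python 3 sketch below uses two made-up middlewares (AddHeader and UpperCase, not Pylons classes) around a trivial app to show the shape of make_app(): the layer applied last is the first one called.

```python
def inner_app(environ, start_response):
    """A trivial WSGI application standing in for PylonsApp."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from " + environ["PATH_INFO"].encode("ascii")]


class AddHeader:
    """Hypothetical middleware: appends one header to every response."""

    def __init__(self, app, name, value):
        self.app, self.name, self.value = app, name, value

    def __call__(self, environ, start_response):
        def add_header(status, headers, exc_info=None):
            # Modify the response headers on the way out.
            return start_response(status, headers + [(self.name, self.value)], exc_info)
        return self.app(environ, add_header)


class UpperCase:
    """Hypothetical middleware: upper-cases the response body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # A complete middleware would also call close() on the inner result.
        return [chunk.upper() for chunk in self.app(environ, start_response)]


# Same shape as make_app(): start with the app, then wrap it layer by layer.
app = inner_app
app = UpperCase(app)
app = AddHeader(app, "X-Layer", "outermost")  # wrapped last, so called first

captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"], captured["headers"] = status, headers


body = b"".join(app({"PATH_INFO": "/page2"}, start_response))
print(captured["status"], body)
```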
- The [DEFAULT] section of the config file is passed as the dict global_conf.
- The [app:main] section is passed as keyword arguments, which are collected into the dict app_conf (except full_stack and static_files, which match named parameters).
- full_stack defaults to True because we're running the application standalone. If we were embedding this application as a WSGI component of some larger application, we'd set full_stack to False to disable some of the middleware (ErrorHandler and StatusCodeRedirect).
- static_files=True means to serve static files from our public directory ($APP/analysis/public). Advanced users can arrange for Apache to serve the static files itself, and put "static_files = false" in their configuration file to gain a bit of efficiency.
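One consequence of this calling convention: every value arrives as a string, so "static_files = false" yields the string "false", which is truthy in Python. The generated Pylons 1.0 middleware.py therefore converts such flags with asbool from paste.deploy.converters. The sketch below imitates both steps without Paste installed; make_app_sketch and the simplified as_bool helper are hypothetical stand-ins.

```python
TRUE_STRINGS = {"true", "yes", "on", "y", "t", "1"}
FALSE_STRINGS = {"false", "no", "off", "n", "f", "0"}


def as_bool(value):
    """Simplified stand-in for paste.deploy.converters.asbool."""
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in TRUE_STRINGS:
            return True
        if lowered in FALSE_STRINGS:
            return False
        raise ValueError("String is not true/false: %r" % value)
    return bool(value)


def make_app_sketch(global_conf, full_stack=True, static_files=True, **app_conf):
    """Hypothetical factory with the same signature shape as make_app()."""
    return {
        "full_stack": as_bool(full_stack),
        "static_files": as_bool(static_files),
        "global_conf": global_conf,
        "app_conf": app_conf,  # everything not matching a named parameter
    }


# Roughly what the loader does: [DEFAULT] -> first arg, [app:main] -> kwargs.
defaults = {"debug": "true"}
local = {"full_stack": "false", "static_files": "true", "cache_dir": "data"}
result = make_app_sketch(defaults, **local)
print(result["full_stack"], result["app_conf"])
```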
load_environment & pylons.config
Before we begin, remember that objects such as pylons.request, pylons.response, and pylons.cache are special globals that change value depending on the current request. The objects are proxies which maintain a thread-local stack of real values. Pylons pushes the actual values onto them at the beginning of a request, and pops them off at the end. (Some of them it also pushes at other times so they can be used outside of requests.) The proxies delegate attribute access and key access to the topmost actual object on the stack. (You can also call myproxy._current_obj() to get the actual object itself.) The proxy code is in paste.registry.StackedObjectProxy, so these are called "StackedObjectProxies", or "SOPs" for short.
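A stripped-down proxy shows the idea: a thread-local stack of real objects, push and pop methods, and attribute and key access forwarded to whatever is on top. SimpleStackedProxy and FakeRequest are hypothetical illustrations in Python 3. The method names _push_object, _pop_object and _current_obj mirror the ones paste.registry.StackedObjectProxy exposes, but the real class does considerably more (default objects, integration with RegistryManager, and so on).

```python
import threading


class SimpleStackedProxy:
    """Hypothetical, minimal imitation of paste.registry.StackedObjectProxy."""

    def __init__(self, name):
        self._name = name
        self._local = threading.local()  # gives each thread its own stack

    def _stack(self):
        if not hasattr(self._local, "objects"):
            self._local.objects = []
        return self._local.objects

    def _push_object(self, obj):
        self._stack().append(obj)

    def _pop_object(self):
        return self._stack().pop()

    def _current_obj(self):
        stack = self._stack()
        if not stack:
            raise TypeError("no object registered for %r in this thread" % self._name)
        return stack[-1]

    def __getattr__(self, attr):
        # Called only for names not found on the proxy itself: delegate them.
        return getattr(self._current_obj(), attr)

    def __getitem__(self, key):
        return self._current_obj()[key]


class FakeRequest:
    def __init__(self, path):
        self.path_info = path


request = SimpleStackedProxy("request")
request._push_object(FakeRequest("/page2"))
print(request.path_info)  # /page2, delegated to the object on top of the stack

seen = []


def other_thread():
    try:
        request.path_info
    except TypeError:
        seen.append("empty stack in other thread")


t = threading.Thread(target=other_thread)
t.start()
t.join()
print(seen)
request._pop_object()  # what Pylons does at the end of a request
```

The second thread sees an empty stack even though the main thread has pushed a request, which is what lets many simultaneous requests share one module-level name.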
The first thing analysis.config.middleware.make_app() does is call load_environment(), which is in $APP/analysis/config/environment.py:
import os

from mako.lookup import TemplateLookup
from pylons.configuration import PylonsConfig
from pylons.error import handle_mako_error

import analysis.lib.app_globals as app_globals
import analysis.lib.helpers
from analysis.config.routing import make_map


def load_environment(global_conf, app_conf):
    config = PylonsConfig()

    root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    paths = dict(root=root,
                 controllers=os.path.join(root, 'controllers'),
                 static_files=os.path.join(root, 'public'),
                 templates=[os.path.join(root, 'templates')])

    # Initialize config with the basic options
    config.init_app(global_conf, app_conf, package='analysis', paths=paths)

    config['routes.map'] = make_map(config)
    config['pylons.app_globals'] = app_globals.Globals(config)
    config['pylons.h'] = analysis.lib.helpers

    # Setup cache object as early as possible
    import pylons
    pylons.cache._push_object(config['pylons.app_globals'].cache)

    # Create the Mako TemplateLookup, with the default auto-escaping
    config['pylons.app_globals'].mako_lookup = TemplateLookup(
        directories=paths['templates'],
        error_handler=handle_mako_error,
        module_directory=os.path.join(app_conf['cache_dir'], 'templates'),
        input_encoding='utf-8', default_filters=['escape'],
        imports=['from webhelpers.html import escape'])

    # CONFIGURATION OPTIONS HERE (note: all config options will override
    # any Pylons config options)

    return config
config is the Pylons configuration object, which will later be pushed onto pylons.config. It's an instance of PylonsConfig. config.init_app() initializes the dict's keys. It sets the keys to a merger of app_conf and global_conf (with app_conf overriding). It also adds "app_conf" and "global_conf" keys so you can access the original app_conf and global_conf if desired, plus several Pylons-specific keys.
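The merge described above amounts to two dict updates in which app_conf wins, plus two keys holding the originals. The sketch below imitates only that part with a hypothetical init_config_sketch function; the real PylonsConfig.init_app() also adds the Pylons-specific keys and does other setup not shown here.

```python
def init_config_sketch(global_conf, app_conf):
    """Hypothetical imitation of the key merging done by init_app()."""
    config = {}
    config.update(global_conf)
    config.update(app_conf)              # app_conf overrides global_conf
    config["global_conf"] = global_conf  # originals kept for reference
    config["app_conf"] = app_conf
    return config


global_conf = {"debug": "true", "cache_dir": "/tmp/global"}
app_conf = {"cache_dir": "/tmp/analysis"}
config = init_config_sketch(global_conf, app_conf)
print(config["debug"], config["cache_dir"])  # true /tmp/analysis
```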
config["routes.map"] is the Routes map defined in $APP/analysis/config/routing.py.

config["pylons.app_globals"] is the application's globals object, which will later be pushed onto pylons.app_globals. It's an instance of app_globals.Globals.

config["pylons.h"] is the helpers module, analysis.lib.helpers. Pylons will assign it to h in the templates' namespace.

The "cache" lines push app_globals.cache onto pylons.cache for backward compatibility. This gives a preview of how StackedObjectProxies work.

The Mako stanza creates a TemplateLookup, which render() will use to find templates. The object is put on app_globals as mako_lookup.
If you've used older versions of Pylons, you'll notice a couple of differences in this function. The config object is created as a local variable and returned, and it's passed explicitly to the route map factory and globals factory. Previous versions pushed it onto pylons.config immediately and used it from there. This was changed to make it easier to nest Pylons applications inside other applications.

The other difference is that Buffet is gone, and along with it the template_engine argument and template config options. Pylons 1.0 gets out of the business of initializing template engines. You use one of the standard render functions such as render_mako or write your own, and define any app_globals attributes that your render function depends on.
The second line of make_app() creates a Pylons application object based on your configuration. Again the config object is passed around explicitly, unlike older versions of Pylons. A Pylons application is a pylons.wsgiapp.PylonsApp instance. (Older versions of Pylons had a PylonsBaseWSGIApp superclass, but that has been merged into PylonsApp.)
make_app() then wraps the application (the
app variable) in several
layers of middleware. Each middleware provides an optional add-on service.
| Middleware | Service | Effect if disabled |
|---|---|---|
| RoutesMiddleware | Use Routes to manage URLs. | Routes and |
| SessionMiddleware | HTTP sessions using Beaker, with flexible persistence backends (disk, memcached, database). | |
| ErrorHandler | Display interactive traceback if an exception occurs. In production mode, email the traceback to the site admin. | Paste will catch exceptions and convert them to Internal Server Error. |
| StatusCodeRedirect | If an HTTP error occurs, make a subrequest to display a fancy styled HTML error page. | If an HTTP error occurs, display a plain white HTML page with the error message. |
| RegistryManager | Handles the special globals | The special globals won't work. There are other ways to access the objects without going through the special globals. |
| StaticURLParser | Serve the static files in the application's public directory. | The static files won't be found. Presumably you've configured Apache to serve them directly. |
| Cascade | Call several sub-middlewares in order, and use the first one that doesn't return "404 Not Found". Used in conjunction with StaticURLParser. | No cascading through alternative apps. |
At the end of the function,
app.config is set to the
config object, so
that any part of the application can access the config without going through
the special global.
Anatomy of a request
Let's say you're running the demo and click the "link" link on the home page. The browser sends a request for "http://127.0.0.1:5000/page2". In my Firefox the HTTP request headers are:
GET /page2 HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: Mozilla/5.0 ...
Accept: text/html,...
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://127.0.0.1:5000/
Cache-Control: max-age=0
The response is:
HTTP/1.x 200 OK
Server: PasteWSGIServer/0.5 Python/2.6.4
Date: Sun, 06 Dec 2009 14:06:05 GMT
Content-Type: text/html; charset=utf-8
Pragma: no-cache
Cache-Control: no-cache
Content-Length: 59

Thank you for using the Analysis Demo. <a href="/">Home</a>
Here's the processing sequence:
server(app) is still running, called by paste.httpserver.server_runner() in $SP/paste/httpserver.py. The only keyword args are 'host' and 'port', extracted from the config file.

server_runner() de-stringifies the arguments and calls serve(wsgi_app, **kwargs) (same module).
serve()'s 'use_threadpool' arg defaults to True, so it creates a WSGIThreadPoolServer instance (called server) with the following inheritance:

SocketServer.BaseServer                # In SocketServer.py in Python stdlib.
SocketServer.TCPServer                 # In SocketServer.py in Python stdlib.
BaseHTTPServer.HTTPServer              # In BaseHTTPServer.py in Python stdlib.
paste.httpserver.SecureHTTPServer      # Adds SSL (HTTPS).
paste.httpserver.WSGIServerBase        # Adds WSGI.
paste.httpserver.WSGIThreadPoolServer  # Multiple inheritance: ThreadPoolMixIn <= ThreadPool

Note that SecureHTTPServer overrides parts of the implementation of Python's SocketServer.TCPServer.
serve() then calls server.serve_forever(), implemented by the ThreadPoolMixIn superclass. This calls self.handle_request() in a loop until self.running becomes false. That initiates this call stack:

# In paste.httpserver.serve(), calling 'server.serve_forever()'
ThreadPoolMixIn.serve_forever()          # Defined in paste.httpserver.
  -> TCPServer.handle_request()          # Called for every request.
    -> WSGIServerBase.get_request()
      -> SecureHTTPServer.get_request()
        -> self.socket.accept()          # Defined in stdlib socket module.
self.socket.accept() blocks, waiting for the next request.

The request arrives and self.socket.accept() returns a new socket for the connection.

TCPServer.handle_request() continues. It calls ThreadPoolMixIn.process_request(), which puts the request in a thread queue:

self.thread_pool.add_task(
    lambda: self.process_request_in_thread(request, client_address))
# 'request' is the connection socket.
The thread pool is defined in the ThreadPool class. It spawns a number of threads, each of which waits on the queue for a callable to run. In this case the callable will be a complete Web transaction, including sending the HTML page to the client. Each thread repeatedly processes transactions from the queue until it receives a sentinel value ordering it to die.
The main thread goes back to listening for other requests, so we're no longer interested in it.
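The worker-thread pattern described here, a shared queue of callables plus a sentinel that tells each worker to exit, can be sketched with Python 3's standard queue and threading modules. This is a simplified stand-in, not paste.httpserver.ThreadPool, which adds more management features; the SimpleThreadPool class and the SHUTDOWN sentinel are hypothetical names.

```python
import queue
import threading

SHUTDOWN = object()  # sentinel: a worker that receives it exits


class SimpleThreadPool:
    """Hypothetical minimal pool: N workers pull callables from one queue."""

    def __init__(self, nworkers):
        self.queue = queue.Queue()
        self.workers = [threading.Thread(target=self.worker) for _ in range(nworkers)]
        for w in self.workers:
            w.start()

    def add_task(self, task):
        self.queue.put(task)

    def worker(self):
        while True:
            task = self.queue.get()  # blocks until a task arrives
            if task is SHUTDOWN:
                break
            try:
                task()  # in the server, one complete Web transaction
            except Exception as exc:  # keep the worker alive on errors
                print("task failed:", exc)

    def shutdown(self):
        for _ in self.workers:
            self.queue.put(SHUTDOWN)  # one sentinel per worker
        for w in self.workers:
            w.join()


results = []
lock = threading.Lock()


def make_task(n):
    def task():
        with lock:
            results.append(n * n)
    return task


pool = SimpleThreadPool(3)
for n in range(5):
    pool.add_task(make_task(n))
pool.shutdown()
print(sorted(results))  # [0, 1, 4, 9, 16]
```

Because the queue is first-in first-out and the sentinels are enqueued after the tasks, every task is taken before any worker sees its sentinel.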
Thread #2 pulls the lambda out of the queue and calls it:
lambda
  -> ThreadPoolMixIn.process_request_in_thread()
    -> BaseServer.finish_request()
      -> self.RequestHandlerClass(request, client_address, self)  # Instantiates this.

The class instantiated is paste.httpserver.WSGIHandler; i.e., the 'handler' variable in serve().
The newly-created request handler takes over:
SocketServer.BaseRequestHandler.__init__(request, client_address, server)
  -> WSGIHandler.handle()
    -> BaseHTTPRequestHandler.handle()    # In stdlib BaseHTTPServer.py.
         Handles requests in a loop until self.close_connection is true
         (this is what supports HTTP keep-alive).
      -> WSGIHandler.handle_one_request()
           Reads the command from the socket. The command is
           "GET /page2 HTTP/1.1" plus the HTTP headers above.
           BaseHTTPRequestHandler.parse_request() parses this into the
           attributes .command, .path, .request_version, and .headers.
        -> WSGIHandlerMixin.wsgi_execute()
          -> WSGIHandlerMixin.wsgi_setup()
               Creates the .wsgi_environ dict.
The WSGI environment dict is described in PEP 333, the WSGI specification. It contains various keys specifying the URL to fetch, query parameters, server info, etc. All keys required by the CGI specification are present, as are other keys specific to WSGI or to particular middleware. The application will calculate a response based on the dict. The application is wrapped in layers of middleware (nested function calls), which modify the dict on the way in and modify the response on the way out.
The request handler, still in WSGIHandlerMixin.wsgi_execute(), calls the application thus:

result = self.server.wsgi_application(self.wsgi_environ, self.wsgi_start_response)

wsgi_start_response is the handler's implementation of the start_response callable mandated by the WSGI spec. The application will call it to specify the HTTP status and headers. The return value is an iterable of strings, which when concatenated form the HTML document to send to the browser. Other MIME types are handled analogously.
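The server's side of this contract can be sketched in a few lines: build an environ dict, pass it along with a start_response callable, join the iterable the application returns, and call its close() method if it has one, as the WSGI spec requires. Here the standard library's wsgiref.util.setup_testing_defaults fills in the required CGI/WSGI keys. The page2_app application and the start_response function are illustrative, and this Python 3 version uses bytes for body chunks where the Python 2 code in this chapter uses str.

```python
from wsgiref.util import setup_testing_defaults


def page2_app(environ, start_response):
    """Illustrative application returning the demo's page2 text."""
    body = b'Thank you for using the Analysis Demo. <a href="/">Home</a>'
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/page2"}
setup_testing_defaults(environ)  # adds SERVER_NAME, wsgi.input, etc.

response = {}


def start_response(status, headers, exc_info=None):
    # A real server stores these and sends them before the first body chunk.
    response["status"], response["headers"] = status, headers
    return lambda data: None  # the legacy write() callable, unused here


result = page2_app(environ, start_response)
try:
    body = b"".join(result)
finally:
    if hasattr(result, "close"):
        result.close()

print(response["status"], dict(response["headers"])["Content-Length"])
```

The body is 59 bytes, the same Content-Length that appears in the response headers shown earlier.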
The application, as we remember, was returned by analysis.config.middleware.make_app(). It's wrapped in several layers of middleware, so calling it will execute the middleware in reverse order of how they're listed in $APP/analysis/config/middleware.py and $SP/pylons/wsgiapp.py:

- Cascade (defined in $SP/paste/cascade.py) lists a series of applications which will be tried in order. (Skipped if static_files is set to False.)
  - StaticURLParser (defined in $SP/paste/urlparser.py) looks for a file under $APP/analysis/public that matches the URL. The demo has no static files.
  - If that fails, the cascade tries your application. But first there are other middleware to go through...
- RegistryManager (defined in $SP/paste/registry.py) makes Pylons special globals both thread-local and middleware-local. This includes app_globals, cache, request, response, session, tmpl_context, url, and any other StackedObjectProxy listed in $SP/pylons/__init__.py. (h is a module so it doesn't need a proxy.)
- StatusCodeRedirect (defined in $SP/pylons/middleware.py) intercepts any HTTP error status returned by the application (e.g., "Page Not Found", "Internal Server Error") and sends another request to the application to get the appropriate error page to display instead. (Skipped if the full_stack argument was false.)
- ErrorHandler (defined in $SP/pylons/middleware.py) sends an interactive traceback to the browser if the app raises an exception and "debug" is true in the config file. Otherwise it attempts to email the traceback to the site administrator, and substitutes a generic Internal Server Error for the response. (Skipped if the full_stack argument was false.)
- User-defined middleware goes here.
- SessionMiddleware (wsgiapp.py) adds Beaker session support (the pylons.session object). (Skipped if the WSGI environment has a key 'session' -- it doesn't in this demo.)
- RoutesMiddleware (wsgiapp.py) compares the request URI against the routing rules in $APP/analysis/config/routing.py and sets 'wsgiorg.routing_args' to a tuple containing the routing match dict (useful) and 'routes.route' to the Route (probably not useful). Pylons 1.0 apps pass a singleton=False argument that suppresses initializing the deprecated url_for() function. Routes now puts a URL generator in the WSGI environment, which Pylons aliases to pylons.url.
- The innermost middleware calls the PylonsApp instance it was initialized with.
Note: CacheMiddleware is no longer used in Pylons 1.0. Instead, app_globals creates the cache as an attribute, and a line in environment.py pushes it onto pylons.cache.
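The cascading behavior can be sketched as a WSGI app that tries each sub-application in turn and answers with the first response whose status is not 404, falling through to the last application unconditionally. The cascade function and the two sample apps below are hypothetical Python 3 illustrations; paste.cascade.Cascade is the real implementation and handles details this sketch skips. For simplicity the sketch buffers each tried response in memory.

```python
def cascade(*apps, catch=("404",)):
    """Hypothetical cascade: answer with the first app whose status isn't caught."""

    def cascade_app(environ, start_response):
        for app in apps[:-1]:
            captured = {}

            def capture(status, headers, exc_info=None):
                captured["status"], captured["headers"] = status, headers
                return lambda data: None  # unused legacy write() callable

            result = app(environ, capture)
            try:
                chunks = list(result)  # buffer the body (a simplification)
            finally:
                if hasattr(result, "close"):
                    result.close()
            if captured["status"].split(" ", 1)[0] not in catch:
                start_response(captured["status"], captured["headers"])
                return chunks
        # The last application's response is used whatever its status.
        return apps[-1](environ, start_response)

    return cascade_app


def static_files(environ, start_response):
    """Stands in for StaticURLParser: knows only /robots.txt."""
    if environ["PATH_INFO"] == "/robots.txt":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"User-agent: *\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def pylons_app(environ, start_response):
    """Stands in for the rest of the middleware stack."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"dynamic page for " + environ["PATH_INFO"].encode("ascii")]


app = cascade(static_files, pylons_app)


def call(path):
    out = {}

    def start_response(status, headers, exc_info=None):
        out["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return out["status"], body


print(call("/robots.txt"))
print(call("/page2"))
```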
Surprise! PylonsApp is itself middleware. Its .__call__() method does:

self.setup_app_env(environ, start_response)
controller = self.resolve(environ, start_response)
response = self.dispatch(controller, environ, start_response)
return response

- .setup_app_env() registers all those special globals.
- .resolve() calculates the controller class based on the route chosen by the RoutesMiddleware, and returns it.
- .dispatch() instantiates the controller class and calls it in the WSGI manner. If the controller does not exist (.resolve() returned None), it raises an exception that tells you which controller could not be found.
This method also handles the special URL "/_test_vars", which is enabled if the application is running under a Nose test. This URL initializes Pylons' special globals, for tests that have to access them before making a regular request.
analysis.controllers.main.MainController does not have a .__call__() method, so control falls to its parent, analysis.lib.base.BaseController. This trivially calls the grandparent, pylons.controllers.WSGIController, which calls the action method MainController.page2(). The action method may have any number of positional arguments as long as they correspond to variables in the routing match dict. (GET/POST variables are in the request.params dict.) If the method has a **kwargs argument, all other match variables are put there. Any variables passed to the action method are also put on the tmpl_context object as attributes. If an action method name starts with "_", it's private and HTTPNotFound is raised.
If the controller has .__before__() and/or .__after__() methods, they are called before and after the action, respectively. These can perform authorization, lock OS resources, etc. These methods can have arguments in the same manner as the action method. However, if the code is used by all controllers, most Pylons programmers prefer to put it in the base controller.
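The argument matching and the __before__/__after__ calls can be sketched with Python 3's standard inspect module: look at each method's parameters, pass the routing-match values whose names appear there, hand everything to **kwargs if the method accepts it, and refuse action names starting with an underscore. The call_action and _matching_args functions, the DemoController class, and the use of LookupError for "not found" are illustrative assumptions; WSGIController has its own implementation and raises HTTPNotFound instead.

```python
import inspect
from types import SimpleNamespace


def _matching_args(func, match):
    """Pick the routing-match values that func can accept."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(match)  # **kwargs receives every match variable
    return {name: match[name] for name in params if name in match}


def call_action(controller, match, tmpl_context):
    """Hypothetical dispatcher modeled on the behavior described above."""
    action = match["action"]
    if action.startswith("_"):
        raise LookupError("private action: %s" % action)  # Pylons: HTTPNotFound
    method = getattr(controller, action, None)
    if method is None:
        raise LookupError("no such action: %s" % action)
    if hasattr(controller, "__before__"):
        controller.__before__(**_matching_args(controller.__before__, match))
    kwargs = _matching_args(method, match)
    for name, value in kwargs.items():
        setattr(tmpl_context, name, value)  # arguments also land on tmpl_context
    result = method(**kwargs)
    if hasattr(controller, "__after__"):
        controller.__after__(**_matching_args(controller.__after__, match))
    return result


class DemoController:
    def __before__(self):
        self.log = ["before"]

    def view(self, id):
        self.log.append("view %s" % id)
        return "page %s" % id

    def _secret(self):
        return "hidden"

    def __after__(self):
        self.log.append("after")


match = {"controller": "demo", "action": "view", "id": "42"}
c = SimpleNamespace()
controller = DemoController()
print(call_action(controller, match, c))  # page 42
print(controller.log, c.id)
```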
The action method returns a string, a unicode string, or a Response object, or is a generator of strings. In this trivial case it returns a string. A typical Pylons action would set some tmpl_context attributes and return render('/some/template.html'). In either case the global response object's body would be set to the string.
WSGIController.__call__() continues, converting the Response object to an appropriate WSGI return value. (First it calls the start_response callback to specify the HTTP headers, then it returns an iterable of strings. The Response object converts unicode to UTF-8 encoded strings, or whatever encoding you've specified in the config file.)
The stack of middleware calls unwinds, each modifying the return value and headers if it desires.
The server receives the final return value. (We're way back in paste.httpserver.WSGIHandlerMixin.wsgi_execute() now.) The outermost middleware has called back to the handler's .wsgi_start_response(), which has saved the status and HTTP headers. .wsgi_execute() then iterates over the application's return value, calling .wsgi_write_chunk(chunk) for each encoded string yielded. .wsgi_write_chunk() formats the status and HTTP headers and sends them on the socket if they haven't been sent yet, then sends the chunk. The convoluted header behavior here is mandated by the WSGI spec.
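The rule in question (PEP 333) is that the status and headers must not be sent until the application yields its first non-empty body chunk, or until it finishes if every chunk is empty. The hypothetical ResponseWriter below shows one way to honor that in Python 3. It writes to an in-memory buffer instead of a socket and is not Paste's code; the final write_chunk(b"") call plays the role of flushing the headers for an empty body.

```python
import io


class ResponseWriter:
    """Hypothetical server-side writer that defers headers per PEP 333."""

    def __init__(self, out):
        self.out = out
        self.status = None
        self.headers = None
        self.headers_sent = False

    def start_response(self, status, headers, exc_info=None):
        # Only remember the status and headers; nothing is written yet.
        self.status, self.headers = status, headers
        return self.write_chunk

    def write_chunk(self, chunk):
        if not self.headers_sent:
            lines = ["HTTP/1.0 %s" % self.status]
            lines += ["%s: %s" % (k, v) for k, v in self.headers]
            self.out.write(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))
            self.headers_sent = True
        self.out.write(chunk)

    def run(self, app, environ):
        result = app(environ, self.start_response)
        try:
            for chunk in result:
                if chunk:  # empty chunks do not trigger the headers
                    self.write_chunk(chunk)
            if not self.headers_sent:
                self.write_chunk(b"")  # empty body: still send the headers
        finally:
            if hasattr(result, "close"):
                result.close()


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b""  # empty chunk: headers are still held back
    yield b"Hello"


buf = io.BytesIO()
ResponseWriter(buf).run(app, {})
print(buf.getvalue())
```

Deferring the headers this way gives an application the chance to replace them (for instance with an error response) right up until the first real byte of the body goes out.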
Control returns up the call stack to the request handler's loop. .close_connection is true, so the loop ends and the handler returns. The call stack continues unwinding all the way to paste.httpserver.ThreadPoolMixIn.process_request_in_thread(). This tries to finish the request and then close it; if it encounters errors, it ends by raising an exception.
The request lambda finishes and control returns to
ThreadPool.worker_thread_callback(). It waits for another request in the thread queue. If the next item in the queue is the shutdown sentinel value, thread #2 dies.
Thus endeth our request's long journey, and this analysis is finished too.