Skip to content

Commit

Permalink
Use a doubled proxy-logging instead of each middleware handling it
Browse files Browse the repository at this point in the history
differently (if at all)

Adding a swift.source to wsgi pre_auth funcs and all middleware that makes
subrequests to proxy server.

NOTE: This change will result in a change in the number of proxy logs made for
staticweb, formpost, tempurl, and any other middleware that performs sub
requests (including swauth and SOS).

Please see docs for details.

DocImpact

Change-Id: I80cf2806add1c3d34054147e2515944be340455b
  • Loading branch information
dpgoetz committed Feb 11, 2013
1 parent 9de643e commit a622349
Show file tree
Hide file tree
Showing 12 changed files with 119 additions and 292 deletions.
16 changes: 6 additions & 10 deletions etc/proxy-server.conf-sample
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
# eventlet_debug = false

[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit tempauth container-quotas proxy-logging proxy-server
pipeline = catch_errors healthcheck proxy-logging cache ratelimit tempauth container-quotas proxy-logging proxy-server

[app:proxy-server]
use = egg:swift#proxy
Expand Down Expand Up @@ -265,15 +265,6 @@ use = egg:swift#cname_lookup
use = egg:swift#staticweb
# Seconds to cache container x-container-meta-web-* header values.
# cache_timeout = 300
# You can override the default log routing for this filter here:
# set log_name = staticweb
# set log_facility = LOG_LOCAL0
# set log_level = INFO
# set log_address = /dev/log
# set access_log_name = staticweb
# set access_log_facility = LOG_LOCAL0
# set access_log_level = INFO
# set log_headers = False

# Note: Put tempurl just before your auth filter(s) in the pipeline
[filter:tempurl]
Expand Down Expand Up @@ -331,6 +322,11 @@ use = egg:swift#proxy_logging
# What HTTP methods are allowed for StatsD logging (comma-sep); request methods
# not in this list will have "BAD_METHOD" for the <verb> portion of the metric.
# log_statsd_valid_http_methods = GET,HEAD,POST,PUT,DELETE,COPY,OPTIONS
# Note: The double proxy-logging in the pipeline is not a mistake. The
# left-most proxy-logging is there to log requests that were handled in
# middleware and never made it through to the right-most middleware (and
# proxy server). Double logging is prevented for normal requests. See
# proxy-logging docs.

# Note: Put before both ratelimit and auth in the pipeline.
[filter:bulk]
Expand Down
44 changes: 33 additions & 11 deletions swift/common/middleware/bulk.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,24 +79,43 @@ class Bulk(object):
Acceptable formats are text/plain, application/json, application/xml, and
text/xml.
There are proxy logs created for each file (which becomes a subrequest) in
the tar. The subrequest's proxy log will have a swift.source set to "EA"
the log's content length will reflect the unzipped size of the file. If
double proxy-logging is used the leftmost logger will not have a
swift.source set and the content length will reflect the size of the
payload sent to the proxy (the unexpanded size of the tar.gz).
Bulk Delete:
Will delete multiple objects from their account with a single request.
Responds to DELETE requests with query parameter ?bulk-delete set.
The Content-Type should be set to text/plain. The body of the DELETE
request will be a newline separated list of url encoded objects to delete.
You can only delete 1000 (configurable) objects per request. The objects
specified in the DELETE request body must be URL encoded and in the form:
Will delete multiple objects or containers from their account with a
single request. Responds to DELETE requests with query parameter
?bulk-delete set. The Content-Type should be set to text/plain.
The body of the DELETE request will be a newline separated list of url
encoded objects to delete. You can only delete 1000 (configurable) objects
per request. The objects specified in the DELETE request body must be URL
encoded and in the form:
/container_name/obj_name
If all objects were successfully deleted (or did not exist), will return an
HTTPOk. If any objects failed to delete, will return an HTTPBadGateway. In
both cases the response body will specify the number of objects
successfully deleted, not found, and a list of the objects that failed.
or for a container (which must be empty at time of delete)
/container_name
If all items were successfully deleted (or did not exist), will return an
HTTPOk. If any failed to delete, will return an HTTPBadGateway. In
both cases the response body will specify the number of items
successfully deleted, not found, and a list of those that failed.
The return body will be formatted in the way specified in the request's
Accept header. Acceptable formats are text/plain, application/json,
apllication/xml, and text/xml.
application/xml, and text/xml.
There are proxy logs created for each object or container (which becomes a
subrequest) that is deleted. The subrequest's proxy log will have a
swift.source set to "BD" the log's content length of 0. If double
proxy-logging is used the leftmost logger will not have a
swift.source set and the content length will reflect the size of the
payload sent to the proxy (the list of objects/containers to be deleted).
"""

def __init__(self, app, conf):
Expand All @@ -117,6 +136,7 @@ def create_container(self, req, container_path):
"""
new_env = req.environ.copy()
new_env['PATH_INFO'] = container_path
new_env['swift.source'] = 'EA'
create_cont_req = Request.blank(container_path, environ=new_env)
resp = create_cont_req.get_response(self.app)
if resp.status_int // 100 != 2:
Expand Down Expand Up @@ -231,6 +251,7 @@ def handle_delete(self, req):
new_env['CONTENT_LENGTH'] = 0
new_env['HTTP_USER_AGENT'] = \
'%s BulkDelete' % req.environ.get('HTTP_USER_AGENT')
new_env['swift.source'] = 'BD'
delete_obj_req = Request.blank(delete_path, new_env)
resp = delete_obj_req.get_response(self.app)
if resp.status_int // 100 == 2:
Expand Down Expand Up @@ -337,6 +358,7 @@ def handle_extract(self, req, compress_type):
new_env['wsgi.input'] = tar_file
new_env['PATH_INFO'] = destination
new_env['CONTENT_LENGTH'] = tar_info.size
new_env['swift.source'] = 'EA'
new_env['HTTP_USER_AGENT'] = \
'%s BulkExpand' % req.environ.get('HTTP_USER_AGENT')
create_obj_req = Request.blank(destination, new_env)
Expand Down
3 changes: 2 additions & 1 deletion swift/common/middleware/container_quotas.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,8 @@ def __call__(self, req):

# check user uploads against quotas
elif obj and req.method == 'PUT':
container_info = get_container_info(req.environ, self.app)
container_info = get_container_info(
req.environ, self.app, swift_source='CQ')
if not container_info or not is_success(container_info['status']):
# this will hopefully 404 later
return self.app
Expand Down
56 changes: 8 additions & 48 deletions swift/common/middleware/formpost.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@
from time import gmtime, strftime, time
from urllib import quote, unquote

from swift.common.utils import get_logger, streq_const_time
from swift.common.utils import streq_const_time
from swift.common.wsgi import make_pre_authed_env
from swift.common.http import HTTP_BAD_REQUEST

Expand Down Expand Up @@ -285,6 +285,9 @@ class FormPost(object):
See above for a full description.
The proxy logs created for any subrequests made will have swift.source set
to "FP".
:param app: The next WSGI filter or app in the paste.deploy
chain.
:param conf: The configuration dict for the middleware.
Expand All @@ -295,10 +298,6 @@ def __init__(self, app, conf):
self.app = app
#: The filter configuration dict.
self.conf = conf
#: The logger to use with this middleware.
self.logger = get_logger(conf, log_route='formpost')
#: The HTTP user agent to use with subrequests.
self.agent = '%(orig)s FormPost'

def __call__(self, env, start_response):
"""
Expand All @@ -314,13 +313,12 @@ def __call__(self, env, start_response):
_parse_attrs(env.get('CONTENT_TYPE') or '')
if content_type == 'multipart/form-data' and \
'boundary' in attrs:
env['HTTP_USER_AGENT'] += ' FormPost'
status, headers, body = self._translate_form(
env, attrs['boundary'])
self._log_request(env, int(status.split(' ', 1)[0]))
start_response(status, headers)
return body
except (FormInvalid, EOFError), err:
self._log_request(env, HTTP_BAD_REQUEST)
body = 'FormPost: %s' % err
start_response(
'400 Bad Request',
Expand Down Expand Up @@ -413,7 +411,8 @@ def _perform_subrequest(self, orig_env, attributes, fp, key):
max_file_size = int(attributes.get('max_file_size') or 0)
except ValueError:
raise FormInvalid('max_file_size not an integer')
subenv = make_pre_authed_env(orig_env, 'PUT', agent=self.agent)
subenv = make_pre_authed_env(orig_env, 'PUT', agent=None,
swift_source='FP')
subenv['HTTP_TRANSFER_ENCODING'] = 'chunked'
subenv['wsgi.input'] = _CappedFileLikeObject(fp, max_file_size)
if subenv['PATH_INFO'][-1] != '/' and \
Expand Down Expand Up @@ -471,7 +470,7 @@ def _get_key(self, env):
key = memcache.get('temp-url-key/%s' % account)
if not key:
newenv = make_pre_authed_env(env, 'HEAD', '/v1/' + account,
self.agent)
agent=None, swift_source='FP')
newenv['CONTENT_LENGTH'] = '0'
newenv['wsgi.input'] = StringIO('')
key = [None]
Expand All @@ -491,45 +490,6 @@ def _start_response(status, response_headers, exc_info=None):
memcache.set('temp-url-key/%s' % account, key, timeout=60)
return key

def _log_request(self, env, response_status_int):
"""
Used when a request might not be logged by the underlying
WSGI application, but we'd still like to record what
happened. An early 401 Unauthorized is a good example of
this.
:param env: The WSGI environment for the request.
:param response_status_int: The HTTP status we'll be replying
to the request with.
"""
the_request = quote(unquote(env.get('PATH_INFO') or '/'))
if env.get('QUERY_STRING'):
the_request = the_request + '?' + env['QUERY_STRING']
client = env.get('HTTP_X_CLUSTER_CLIENT_IP')
if not client and 'HTTP_X_FORWARDED_FOR' in env:
# remote host for other lbs
client = env['HTTP_X_FORWARDED_FOR'].split(',')[0].strip()
if not client:
client = env.get('REMOTE_ADDR')
self.logger.info(' '.join(quote(str(x)) for x in (
client or '-',
env.get('REMOTE_ADDR') or '-',
strftime('%d/%b/%Y/%H/%M/%S', gmtime()),
env.get('REQUEST_METHOD') or 'GET',
the_request,
env.get('SERVER_PROTOCOL') or '1.0',
response_status_int,
env.get('HTTP_REFERER') or '-',
(env.get('HTTP_USER_AGENT') or '-') + ' FormPOST',
env.get('HTTP_X_AUTH_TOKEN') or '-',
'-',
'-',
'-',
env.get('swift.trans_id') or '-',
'-',
'-',
)))


def filter_factory(global_conf, **local_conf):
""" Returns the WSGI filter for use with paste.deploy. """
Expand Down
25 changes: 24 additions & 1 deletion swift/common/middleware/proxy_logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,27 @@
* Values that are missing (e.g. due to a header not being present) or zero
are generally represented by a single hyphen ('-').
The proxy-logging can be used twice in the proxy server's pipeline when there
is middleware installed that can return custom responses that don't follow the
standard pipeline to the proxy server.
For example, with staticweb, the middleware might intercept a request to
/v1/AUTH_acc/cont/, make a subrequest to the proxy to retrieve
/v1/AUTH_acc/cont/index.html and, in effect, respond to the client's original
request using the 2nd request's body. In this instance the subrequest will be
logged by the rightmost middleware (with a swift.source set) and the outgoing
request (with body overridden) will be logged by leftmost middleware.
Requests that follow the normal pipeline (use the same wsgi environment
throughout) will not be double logged because an environment variable
(swift.proxy_access_log_made) is checked/set when a log is made.
All middleware making subrequests should take care to set swift.source when
needed. With the doubled proxy logs, any consumer/processor of swift's proxy
logs should look at the swift.source field, the rightmost log value, to decide
if this is a middleware subrequest or not. A log processor calculating
bandwidth usage will want to only sum up logs with no swift.source.
"""

import time
Expand All @@ -56,7 +77,6 @@ def __init__(self, app, conf):
self.log_hdrs = config_true_value(conf.get(
'access_log_headers',
conf.get('log_headers', 'no')))

# The leading access_* check is in case someone assumes that
# log_statsd_valid_http_methods behaves like the other log_statsd_*
# settings.
Expand Down Expand Up @@ -89,6 +109,8 @@ def log_request(self, env, status_int, bytes_received, bytes_sent,
:param bytes_sent: bytes yielded to the WSGI server
:param request_time: time taken to satisfy the request, in seconds
"""
if env.get('swift.proxy_access_log_made'):
return
req = Request(env)
if client_disconnect: # log disconnected clients as '499' status code
status_int = 499
Expand Down Expand Up @@ -122,6 +144,7 @@ def log_request(self, env, status_int, bytes_received, bytes_sent,
'%.4f' % request_time,
req.environ.get('swift.source'),
)))
env['swift.proxy_access_log_made'] = True
# Log timing and bytes-transfered data to StatsD
if req.path.startswith('/v1/'):
try:
Expand Down
Loading

0 comments on commit a622349

Please sign in to comment.