In-memory caching using only python #18

Merged (7 commits, Mar 22, 2013)
5 changes: 5 additions & 0 deletions README
@@ -118,6 +118,11 @@
this type, we do not regularly test it. Currently, the threaded_scanner.py will
for SSL services, though work to have it scan for SSH services is pretty minor.


==== MORE INFO ====

See doc/advanced_notary_configuration.txt for tips on improving notary performance.


==== CONTRIBUTING ====

Please visit the github page to submit changes and suggest improvements:
9 changes: 9 additions & 0 deletions doc/advanced_notary_configuration.txt
@@ -0,0 +1,9 @@
There are several options that can make your notary run even better.


1. Set up caching!

Data caching will significantly increase your notary's performance.

For best performance, you may want to use a dedicated caching server such as memcached, memcachier, or redis. If you do not have access to a dedicated caching server, or do not want to set one up, use the built-in Python caching with '--pycache'.
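For example (a sketch only; adjust the size to your deployment), the built-in cache can be enabled when launching the notary:

    # use the default 50MB in-process cache
    python notary_http.py --pycache

    # or give the cache 1GB of RAM
    python notary_http.py --pycache 1GB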

8 changes: 7 additions & 1 deletion notary_http.py
@@ -37,7 +37,7 @@ class NotaryHTTPServer:
    Collect and share information on website certificates from around the internet.
    """

-   VERSION = "3.1"
+   VERSION = "pre3.2a"
    DEFAULT_WEB_PORT=8080
    ENV_PORT_KEY_NAME='PORT'
    STATIC_DIR = "notary_static"

@@ -66,6 +66,10 @@ def __init__(self):
            help="Use memcachier to cache observation data. " + cache.Memcachier.get_help())
        cachegroup.add_argument('--redis', action='store_true', default=False,
            help="Use redis to cache observation data. " + cache.Redis.get_help())
        cachegroup.add_argument('--pycache', default=False, const=cache.Pycache.CACHE_SIZE,
            nargs='?', metavar=cache.Pycache.get_metavar(),
            help="Use RAM to cache observation data on the local machine only.\
                If you don't use any other type of caching, use this! " + cache.Pycache.get_help())

        args = parser.parse_args()

@@ -103,6 +107,8 @@ def __init__(self):
            self.cache = cache.Memcachier()
        elif (args.redis):
            self.cache = cache.Redis()
        elif (args.pycache):
            self.cache = cache.Pycache(args.pycache)

        self.active_threads = 0
        self.args = args
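The '--pycache' option relies on argparse's optional-value pattern (nargs='?' plus const), so it works with or without an explicit size. A standalone sketch of that behavior (an illustration only, not the notary's actual parser):

    import argparse

    parser = argparse.ArgumentParser()
    # same pattern as above: flag absent -> False, bare flag -> const, flag with value -> that value
    parser.add_argument('--pycache', default=False, const="50", nargs='?',
                        metavar="CACHE_SIZE_INTEGER[M|MB|G|GB]")

    print(parser.parse_args([]).pycache)                     # False: caching not requested
    print(parser.parse_args(['--pycache']).pycache)          # "50": default cache size
    print(parser.parse_args(['--pycache', '2GB']).pycache)   # "2GB": explicit size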
29 changes: 29 additions & 0 deletions test/Network Notary Test Cases.txt
@@ -62,3 +62,32 @@ Failing Gracefully:
- For Machines, and Event Types (on startup) does it log an error, disable database metrics, and continue?
- For Metrics does it ignore the metric, log an error, and continue?
- Are metrics throttled back if the server receives many requests in a short period of time? (e.g. 200 requests per second)


In-memory caching with pycache:
-------------------------------
- If the cache is below the memory limit, are new keys continually added upon request?
- If adding a new key would use too much memory, does the cache remove an entry and then store the key?
- Is the least recently used entry removed?
- If removing one entry doesn't clear enough RAM, does the cache remove multiple entries until it has enough space?
- Do both the hash and the heap size go down?
- If a requested object is bigger than total RAM allowed, do we log a warning and not store it?
- When an existing entry is retrieved from the cache, is its 'last viewed' time updated?

expiry:
- Are expired entries removed during get() calls and None returned instead?
- Are expired entries cleaned up as they are encountered when clearing new memory?
- Are negative expiry times rejected and cache entries not created?

pycache threads:
- Do we only create a single cache and return the proper results regardless of how many threads the server uses?
- If multiple threads attempt to set a value for the same key is only one of them allowed to set and the rest return immediately?
- If multiple threads attempt to set a value for *different* keys, are they all allowed to do so?
- Is only one thread at a time allowed to adjust the current memory usage?

pycache arguments:
- Are characters other than 0-9, M, G, and B rejected with an error?
- Does it stop you from specifying MB *and* GB?
- Is it case insensitive?
- Does the cache have to be at least 1MB?
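The cases above describe an LRU store with a byte budget and per-entry expiry. The pycache module itself is not part of this diff (the test cases suggest it tracks entries with a hash plus a heap of access times), so the following is only a rough sketch of that behavior; the OrderedDict and sys.getsizeof are simplifications, not the project's code:

    import sys
    import time
    from collections import OrderedDict

    class TinyLRU(object):
        """Rough sketch: cache bounded by an approximate byte budget, least recently used entries evicted first."""

        def __init__(self, max_bytes):
            self.max_bytes = max_bytes
            self.used = 0
            self.entries = OrderedDict()  # key -> (value, expiry timestamp, approximate size)

        def get(self, key):
            entry = self.entries.get(key)
            if entry is None:
                return None
            value, expires_at, size = entry
            if expires_at < time.time():
                # expired entries are dropped as they are encountered
                del self.entries[key]
                self.used -= size
                return None
            # move to the most-recently-used end so eviction skips it
            self.entries.pop(key)
            self.entries[key] = entry
            return value

        def set(self, key, value, expiry_seconds):
            if expiry_seconds < 0:
                raise ValueError("expiry must not be negative")
            size = sys.getsizeof(value)
            if size > self.max_bytes:
                # too big to ever fit: warn and do not store
                print("warning: object is larger than the whole cache; not stored")
                return
            if key in self.entries:
                self.used -= self.entries.pop(key)[2]
            # evict least recently used entries until the new value fits
            while self.used + size > self.max_bytes:
                _, (_, _, old_size) = self.entries.popitem(last=False)
                self.used -= old_size
            self.entries[key] = (value, time.time() + expiry_seconds, size)
            self.used += size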

86 changes: 83 additions & 3 deletions util/cache.py
@@ -102,13 +102,11 @@ def __init__(self):
            print >> sys.stderr, "ERROR: Could not connect to memcache server: '%s'. memcache is disabled." % (str(e))
            self.pool = None

    def __del__(self):
        """Clean up resources"""
        if (self.pool != None):
            self.pool.relinquish()

    def get(self, key):
        """Retrieve the value for a given key, or None if no key exists."""
        if (self.pool != None):
@@ -118,7 +116,6 @@ def get(self, key):
            print >> sys.stderr, "Cache does not exist! Create it first"
            return None

    def set(self, key, data, expiry=CacheBase.CACHE_EXPIRY):
        """Save the value to a given key name."""
        if (self.pool != None):
@@ -210,3 +207,86 @@ def set(self, key, data, expiry=CacheBase.CACHE_EXPIRY):
            self.redis.expire(key, expiry)
        else:
            print >> sys.stderr, "ERROR: Redis cache does not exist! Create it first"


class Pycache(CacheBase):
    """
    Cache data using RAM.
    """

    CACHE_SIZE = "50" # megabytes

    @classmethod
    def get_help(cls):
        """Tell the user how they can use this type of cache."""
        # TODO: quantify how many observation records this can store
        return "Size can be specified in Megabytes (M/MB) or Gigabytes (G/GB). \
            Megabytes is assumed if no unit is given. \
            Default size: " + cls.CACHE_SIZE + "MB."

    @classmethod
    def get_metavar(cls):
        """
        Return the string that should be used for argparse's metavariable
        (i.e. the string that explains how to specify a cache size on the command line)
        """
        return "CACHE_SIZE_INTEGER[M|MB|G|GB]"

    def __init__(self, cache_size=CACHE_SIZE):
        """Create a cache using RAM."""
        self.cache = None

        import re

        # let the user specify sizes with the characters 'MB' or 'GB'
        if (re.search("[^0-9MGBmgb]+", cache_size) != None):
            raise ValueError("Invalid Pycache cache size '%s': use '%s'." %
                (str(cache_size), self.get_metavar()))

        if (re.search("[Mm]", cache_size) and re.search("[Gg]", cache_size)):
            raise ValueError("Invalid Pycache cache size '%s': " % (str(cache_size)) +
                "specify only one of MB and GB.")

        multiplier = 1024 * 1024 # convert to bytes

        if (re.search("[Gg]", cache_size)):
            multiplier *= 1024

        # remove non-numeric characters
        cache_size = cache_size.translate(None, 'MGBmgb')
        cache_size = int(cache_size)

        if (cache_size < 1):
            raise ValueError("Invalid Pycache cache size '%s': " % (str(cache_size)) +
                "cache must be at least 1MB.")

        cache_size *= multiplier
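        # Worked example: Pycache("2GB") passes the character check (only digits and M/G/B are allowed),
        # sets multiplier to 1024 * 1024 * 1024 because a 'G' is present,
        # "2GB".translate(None, 'MGBmgb') leaves "2", so cache_size becomes 2,
        # and 2 * 1024**3 = 2147483648 bytes is handed to pycache.set_cache_size() below.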

        try:
            from util import pycache
            self.cache = pycache
            pycache.set_cache_size(cache_size)
        except ImportError, e:
            print >> sys.stderr, "ERROR: Could not import module 'pycache': '%s'." % (e)
            self.cache = None
        except Exception, e:
            print >> sys.stderr, "ERROR creating cache in memory: '%s'." % (e)
            self.cache = None

    def get(self, key):
        """Retrieve the value for a given key, or None if no key exists."""
        if (self.cache != None):
            return self.cache.get(key)
        else:
            print >> sys.stderr, "pycache get() error: cache does not exist! create it before retrieving values."
            return None

    def set(self, key, data, expiry=CacheBase.CACHE_EXPIRY):
        """Save the value to a given key name."""
        if (self.cache != None):
            try:
                self.cache.set(key, data, expiry)
            except Exception, e:
                print >> sys.stderr, "pycache set() error: '%s'." % (e)
        else:
            print >> sys.stderr, "pycache set() error: cache does not exist! create it before setting values."