Configuration
The following configuration files live in the root of the Sketchy repository. config-default.py is the default configuration used by Sketchy. config-test.py is the configuration file used for the nose tests (located in tests.py).
import os
_basedir = os.path.abspath(os.path.dirname(__file__))
DEBUG = True
# Database setup
SQLALCHEMY_DATABASE_URI = 'sqlite:////tmp/sketchy-db.db'
# Set scheme and hostname:port of your server.
# Alternatively, you can export the 'host' variable on your system to set the
# host and port.
# If you are using Nginx with SSL, change the scheme to https.
BASE_URL = 'http://%s' % os.getenv('host', '127.0.0.1:8000')
# Broker configuration information, currently only supporting Redis
CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
# Local Screenshot storage
LOCAL_STORAGE_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'files')
# Maximum time to wait for PhantomJS to generate a screenshot
PHANTOMJS_TIMEOUT = 35
# Maximum number of Celery Job retries on failure
MAX_RETRIES = 1
# Seconds to sleep before retrying the task
COOLDOWN = 5
# Path to the PhantomJS binary
PHANTOMJS = '/usr/local/bin/phantomjs'
# S3 Specific configurations
# This will store your sketches, scrapes, and html in an S3 bucket
USE_S3 = os.getenv('use_s3', 'False').lower() == 'true'
S3_BUCKET_PREFIX = os.getenv('bucket_prefix', '')
S3_LINK_EXPIRATION = 6000000
S3_BUCKET_REGION_NAME = os.getenv('bucket_region_name', 'us-east-1')
# Token Auth Setup
REQUIRE_AUTH = False
AUTH_TOKEN = os.getenv('auth_token', 'test')
# Log file configuration (currently only logs errors)
SKETCHY_LOG_FILE = "sketchy.log"
# Perform SSL host validation (set to False if you want to scrape/screenshot sketchy websites)
SSL_HOST_VALIDATION = False
# Enable this option to screenshot webpages that generate 4xx or 5xx HTTP error codes
CAPTURE_ERRORS = True
Sketchy uses SQLAlchemy for object-relational mapping. The database URI can be set through the sketchy_db environment variable, which is read by the following directive in config-default.py:
# Database setup
SQLALCHEMY_DATABASE_URI = os.getenv('sketchy_db', 'sqlite:////tmp/sketchy-db.db')
The config-default.py file is currently set to use SQLite as the DBMS. I suggest you use a more robust database management system, such as MySQL.
An example MySQL string for Amazon RDS may look like the following:
# Database setup
SQLALCHEMY_DATABASE_URI = 'mysql://sketchydb:super_password@sketchy.blahblah.us-west-1.rds.amazonaws.com:3306/sketchy'
If you change your database URI, you will need to recreate the DB tables by running:
python manage.py create_db
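If the database password contains characters such as @ or /, it has to be percent-encoded before it goes into the URI, otherwise the URI is parsed incorrectly. A minimal sketch of assembling the MySQL URI from its parts with the standard library; the helper name build_mysql_uri, the host db.example.com, and the sample password are hypothetical and not part of Sketchy:

```python
from urllib.parse import quote

def build_mysql_uri(user, password, host, database, port=3306):
    """Assemble a SQLAlchemy-style MySQL URI, percent-encoding the credentials."""
    # safe='' makes quote() escape every reserved character, including '/' and '@'.
    return 'mysql://%s:%s@%s:%d/%s' % (
        quote(user, safe=''), quote(password, safe=''), host, port, database)

# Hypothetical values for illustration only.
SQLALCHEMY_DATABASE_URI = build_mysql_uri(
    'sketchydb', 'p@ss/word', 'db.example.com', 'sketchy')
```

The resulting string can be assigned in config-default.py or exported as the sketchy_db environment variable.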
Celery needs a broker to be configured. You need to specify the broker URL as well as the result backend (which stores the results of tasks run by Celery). The following example uses Redis as both broker and backend:
# Broker configuration information, currently only supporting Redis
CELERY_BROKER_URL='redis://localhost:6379'
CELERY_RESULT_BACKEND='redis://localhost:6379'
Alternatively, you can use another broker such as ActiveMQ.
You can configure the maximum time to wait for PhantomJS to generate a screenshot using the PHANTOMJS_TIMEOUT setting. An example configuration for PhantomJS is below:
# Maximum time to wait for PhantomJS to generate a screenshot
PHANTOMJS_TIMEOUT = 35
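To illustrate what a timeout like this does, here is a rough sketch of running an external process with a time limit. It is not Sketchy's actual code: the helper run_with_timeout is hypothetical, the Python interpreter stands in for the PhantomJS binary, and the value is assumed to be in seconds.

```python
import subprocess
import sys

PHANTOMJS_TIMEOUT = 35  # assumed to be seconds

def run_with_timeout(command, timeout):
    """Run command; return True if it finished in time, False if it was killed."""
    try:
        subprocess.run(command, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return False

# Stand-in commands: the Python interpreter plays the role of PhantomJS here.
fast = run_with_timeout([sys.executable, '-c', 'pass'], PHANTOMJS_TIMEOUT)
slow = run_with_timeout(
    [sys.executable, '-c', 'import time; time.sleep(5)'], 0.5)
```

A capture that exceeds the limit is treated as a failure, which is where the retry settings come into play.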
You can configure the number of retries Celery will execute when a task fails using the MAX_RETRIES setting. You can also specify how long to wait between retries with the COOLDOWN setting. Some example configuration options for Celery are below:
# Maximum number of Celery task retries on failure
MAX_RETRIES = 1
# Seconds to sleep before retrying the task
COOLDOWN = 5
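The interaction of these two settings can be sketched in plain Python. This is an illustration of retry-with-cooldown in general, not Sketchy's task code; run_with_retries and its sleep parameter are hypothetical names. In Celery itself, the comparable knobs are the max_retries and countdown arguments of Task.retry.

```python
import time

MAX_RETRIES = 1
COOLDOWN = 5

def run_with_retries(task, max_retries=MAX_RETRIES, cooldown=COOLDOWN,
                     sleep=time.sleep):
    """Call task(); on failure wait `cooldown` seconds, retrying up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(cooldown)
```

With MAX_RETRIES = 1 a task is attempted at most twice: the initial run plus one retry after the cooldown.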
S3 can be configured to store captures. You need to specify a bucket, the region your S3 storage is set up in, and an expiration for the links that are generated. These can optionally be set as environment variables. The following is an example configuration for S3:
USE_S3 = os.getenv('use_s3', 'True').lower() == 'true'
S3_BUCKET_PREFIX = os.getenv('bucket_prefix', 'mytestbucket.foobar.net')
S3_LINK_EXPIRATION = 6000000
S3_BUCKET_REGION_NAME = os.getenv('bucket_region_name', 'us-east-1')
Note: Celery needs to have AWS session keys exported for this to work. This should be set up automatically if deploying on an AWS instance.
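The USE_S3 line compares the lowercased environment value against the string 'true', so only that spelling enables S3. A small sketch of the same check as a reusable helper; env_flag and its environ parameter are hypothetical names introduced for illustration:

```python
import os

def env_flag(name, default='False', environ=os.environ):
    """Return True only when the variable (or its default) spells 'true', case-insensitively."""
    return environ.get(name, default).lower() == 'true'

# Simulated environments so the example does not depend on the real one.
enabled = env_flag('use_s3', environ={'use_s3': 'TRUE'})
numeric = env_flag('use_s3', environ={'use_s3': '1'})
unset = env_flag('use_s3', environ={})
```

Values such as 1 or yes leave S3 disabled under this check, so export use_s3=true (in any letter case) to turn it on.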
Token authentication can optionally be required for all API requests. If REQUIRE_AUTH is set to True, every request to Sketchy must include the token header described below.
# Token Auth Setup
REQUIRE_AUTH = False
AUTH_TOKEN = 'your_token_here_and_what_not'
The header you need to send is Token. More information can be found in API Authentication.
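As a rough sketch of how such a check could work on the server side (this is not Sketchy's implementation; is_authorized is a hypothetical helper, and headers is assumed to be a dict-like mapping of request headers):

```python
import hmac

REQUIRE_AUTH = True
AUTH_TOKEN = 'your_token_here_and_what_not'

def is_authorized(headers, require_auth=REQUIRE_AUTH, expected=AUTH_TOKEN):
    """Return True if auth is disabled or the Token header matches the configured token."""
    if not require_auth:
        return True
    supplied = headers.get('Token', '')
    # compare_digest avoids leaking how much of the token matched via timing.
    return hmac.compare_digest(supplied.encode('utf-8'),
                               expected.encode('utf-8'))
```

A client would then send the header Token: your_token_here_and_what_not with each request.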
These settings control how HTTP requests are handled within Sketchy. SSL_HOST_VALIDATION controls whether Sketchy validates SSL certificates; set it to False to capture screenshots from websites that have invalid SSL certificates. CAPTURE_ERRORS can be used to screenshot webpages that return 4xx or 5xx HTTP status codes.
# Perform SSL host validation (set to False if you want to scrape/screenshot sketchy websites)
SSL_HOST_VALIDATION = False
# Enable this option to screenshot webpages that generate 4xx or 5xx HTTP error codes
CAPTURE_ERRORS = True
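The effect of CAPTURE_ERRORS can be summarized as a small decision function. This is a sketch of the behavior described above, not code taken from Sketchy; should_capture is a hypothetical name:

```python
def should_capture(status_code, capture_errors):
    """Decide whether a response with this HTTP status should be screenshotted."""
    is_error = 400 <= status_code <= 599
    # Error pages are captured only when the option is enabled.
    return capture_errors or not is_error
```

With CAPTURE_ERRORS = True a 404 page is captured like any other; with False it is skipped while a 200 response is still captured.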