Audit Archivematica user activities via nginx access logs.
auditmatica is intended to facilitate auditing of user activities in
Archivematica and the Archivematica Storage Service. It uses nginx access logs
as its data source, and outputs either logs in Common Event Format (CEF)
or a JSON overview of user activities.
auditmatica has two subcommands, write-cef and overview.
Usage: auditmatica [OPTIONS] COMMAND [ARGS]...
Auditmatica: Archivematica auditing package
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
overview Print overview of user activities from nginx access log.
write-cef Write Common Event Format (CEF) log from nginx access log.
To write CEF events, use the write-cef subcommand. E.g.:
auditmatica write-cef /path/to/nginx/access.log
or
cat /var/log/nginx/access.log | auditmatica write-cef
CEF is a widely used standard for network and security analysis. CEF events
can be sent to applications for review, monitoring, and visualization via a
file or over syslog. CEF events written by auditmatica include an event name,
signature (unique identifier), and severity level (0-10), which are determined
based on details from the nginx access log such as URL, HTTP method, and HTTP
return code.
A sample CEF event written by auditmatica looks like the following:
CEF:0|Artefactual Systems, Inc.|Archivematica|hosted|16|AIP downloaded from Archival Storage|3|cs1Label=requestClientApplication cs1="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" rt=Jan 13 2021 20:01:33 requestMethod=GET request=/archival-storage/download/aip/8fa54cfc-f5c5-4673-b44e-fc514496bad7/ src=172.19.0.1 suser=test msg=UUID:8fa54cfc-f5c5-4673-b44e-fc514496bad7
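As a rough illustration of how a request might be mapped to an event name, signature, and severity, here is a minimal Python sketch. It is not auditmatica's implementation: the `Event`, `RULES`, `DEFAULT_EVENT`, and `classify` names are hypothetical, only the single rule shown is taken from the sample event above, the fallback event's values are invented, and the HTTP return code is ignored for brevity.

```python
import re
from typing import NamedTuple

UUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"


class Event(NamedTuple):
    name: str
    signature: str
    severity: int


# (HTTP method, URL path pattern, event). Only this one rule comes from the
# sample event in this README; a real rule table would be much longer.
RULES = [
    (
        "GET",
        re.compile(rf"^/archival-storage/download/aip/({UUID_RE})/$"),
        Event("AIP downloaded from Archival Storage", "16", 3),
    ),
]

# Hypothetical fallback event: its name, signature, and severity are invented.
DEFAULT_EVENT = Event("Unidentified request", "0", 1)


def classify(method, path, suppress=False):
    """Return (event, uuid) for a request, or (None, None) if suppressed."""
    for rule_method, pattern, event in RULES:
        match = pattern.match(path)
        if method == rule_method and match:
            return event, match.group(1)
    if suppress:
        # Mirrors the idea behind --suppress: drop lines with no specific event.
        return None, None
    return DEFAULT_EVENT, None
```

The `suppress` flag models the behaviour described for the --suppress option: unmatched requests are dropped instead of falling back to a default event.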
In addition to the required CEF fields, each CEF event produced by auditmatica also has the following extensions:
Field | Value | Mandatory? |
---|---|---|
cs1Label | requestClientApplication | True |
cs1 | User agent string | True |
rt | Request time | True |
requestMethod | Request HTTP method | True |
request | Request URL | True |
src | Requester IP address | True |
suser | Authenticated username | False - present only if usernames are configured in nginx log and there is a username associated with the event's log line |
msg | UUID:<uuid from request URL> | False - present only if there is a UUID associated with the event |
For a comprehensive list of Archivematica CEF events, see IMPLEMENTATION.md.
For more details on CEF, see the CEF specification.
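For downstream tooling that needs to read these events back, the following Python sketch splits a CEF line into its seven header fields and its extensions. It is an illustration only: `parse_cef` and the header field names are hypothetical, and the sketch assumes that header values contain no escaped pipes and that extension values never contain whitespace followed by `key=`, which holds for the sample event above but not for CEF in general.

```python
import re

HEADER_FIELDS = (
    "cef_version",
    "device_vendor",
    "device_product",
    "device_version",
    "signature_id",
    "name",
    "severity",
)

# A key, "=", then a value that runs lazily up to the next " key=" or line end.
EXTENSION_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_cef(line):
    """Parse one CEF line into a dict with header fields and extensions."""
    parts = line.strip().split("|", 7)
    if len(parts) != 8 or not parts[0].startswith("CEF:"):
        raise ValueError("not a CEF event")
    event = dict(zip(HEADER_FIELDS, parts[:7]))
    event["cef_version"] = event["cef_version"][len("CEF:"):]
    event["severity"] = int(event["severity"])
    event["extensions"] = dict(EXTENSION_RE.findall(parts[7]))
    return event


SAMPLE = (
    'CEF:0|Artefactual Systems, Inc.|Archivematica|hosted|16|'
    'AIP downloaded from Archival Storage|3|'
    'cs1Label=requestClientApplication '
    'cs1="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" '
    'rt=Jan 13 2021 20:01:33 requestMethod=GET '
    'request=/archival-storage/download/aip/8fa54cfc-f5c5-4673-b44e-fc514496bad7/ '
    'src=172.19.0.1 suser=test msg=UUID:8fa54cfc-f5c5-4673-b44e-fc514496bad7'
)

if __name__ == "__main__":
    parsed = parse_cef(SAMPLE)
    print(parsed["name"], parsed["severity"], parsed["extensions"].get("suser"))
```

Because suser and msg are optional extensions, callers should read them with `.get()` rather than indexing.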
This command accepts several optional arguments:
Usage: auditmatica write-cef [OPTIONS] [LOG]
Write Common Event Format (CEF) log from nginx access log.
Options:
-o, --output PATH Filepath for output CEF file (default=None, print to
stdout)
-s, --syslog Write CEF events to syslog instead of file
--syslog-address TEXT Address for syslog connection (default='/dev/log')
--syslog-facility TEXT Facility for syslog messages (default='USER')
--syslog-port INTEGER Port for remote syslog connections
--ss-base-url TEXT Override the Storage Service URL to scan for
(default='http://127.0.0.1:62081')
--suppress Suppress log lines that do not map to a specific
event instead of reverting to default event
-v, --verbose Enable verbose error message reporting
--help Show this message and exit.
auditmatica looks for Storage Service events in the nginx access log by
checking each URL to determine if it begins with the expected base URL of the
Storage Service. By default, this is http://127.0.0.1:62081.
To override the Storage Service URL to scan for, use --ss-base-url. E.g.:
auditmatica write-cef --ss-base-url http://archivematica.example.com:8000 /path/to/nginx/access.log
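The prefix check described above can be pictured with a few lines of Python. This is a sketch under assumptions, not auditmatica's code: `is_storage_service_url` is a hypothetical name, the `/packages/` path in the usage is made up, and the function is slightly stricter than a bare "begins with" test so that a base URL ending in port 62081 does not also match a longer port such as 620811.

```python
DEFAULT_SS_BASE_URL = "http://127.0.0.1:62081"


def is_storage_service_url(url, ss_base_url=DEFAULT_SS_BASE_URL):
    """Return True if url is the base URL itself or a path beneath it."""
    base = ss_base_url.rstrip("/")
    return url == base or url.startswith(base + "/")


if __name__ == "__main__":
    print(is_storage_service_url("http://127.0.0.1:62081/packages/"))
    print(is_storage_service_url("http://127.0.0.1:62080/backlog/"))
```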
By default, auditmatica writes CEF events to stdout and some end-user facing
messages to stderr.
To write CEF events to a file, use the -o/--output option to specify a
filepath for the output file. E.g.:
auditmatica write-cef /path/to/nginx/access.log --output my-output-file.log
To write CEF events to syslog, use the -s/--syslog option. By
default, this will write syslog messages to /dev/log using the USER
facility. The --syslog-address, --syslog-port, and --syslog-facility
options can be used to customize the syslog connection. E.g.:
auditmatica write-cef -s \
--syslog-address localhost \
--syslog-port 514 \
--syslog-facility local0 \
/path/to/nginx/access.log
--syslog-port will only be used if an address other than /dev/log is also
passed with --syslog-address.
Valid --syslog-facility values are local0-local7, which are reserved by
syslog for local use.
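To make the interaction between the three syslog options concrete, here is a minimal sketch using Python's standard logging.handlers.SysLogHandler. It is not taken from auditmatica's source: `resolve_syslog_target` and `build_syslog_logger` are hypothetical names, and the fallback port of 514 for remote addresses is an assumption based on the conventional syslog UDP port.

```python
import logging
from logging.handlers import SysLogHandler

DEFAULT_ADDRESS = "/dev/log"
DEFAULT_PORT = 514  # assumption: conventional syslog UDP port


def resolve_syslog_target(address=DEFAULT_ADDRESS, port=None):
    """Return a socket path, or a (host, port) tuple for a remote address."""
    if address == DEFAULT_ADDRESS:
        # The port is ignored when writing to the local /dev/log socket.
        return address
    return (address, port if port is not None else DEFAULT_PORT)


def build_syslog_logger(address=DEFAULT_ADDRESS, port=None, facility="user"):
    """Create a logger that sends bare messages to the given syslog target."""
    # facility_names maps lowercase names such as "user" or "local0" to codes.
    facility_code = SysLogHandler.facility_names[facility.lower()]
    handler = SysLogHandler(
        address=resolve_syslog_target(address, port),
        facility=facility_code,
    )
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("cef-sketch")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

With this sketch, `build_syslog_logger("localhost", 514, "local0").info(cef_line)` would correspond roughly to the command-line example above, where `cef_line` stands for any CEF event string.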
To generate a high-level JSON overview of Archivematica user activities, use
the overview subcommand. E.g.:
cat access.log | auditmatica overview
Usage: auditmatica overview [OPTIONS] [LOG]
Write JSON overview of user activities from nginx access log.
Options:
--ss-base-url TEXT Override the Storage Service URL to scan for
(default='http://127.0.0.1:62081')
--help Show this message and exit.
auditmatica looks for Storage Service events in the nginx access log by
checking each URL to determine if it begins with the expected base URL of the
Storage Service. By default, this is http://127.0.0.1:62081.
To override the Storage Service URL to scan for, use --ss-base-url. E.g.:
auditmatica overview --ss-base-url http://archivematica.example.com:8000 /path/to/nginx/access.log
auditmatica requires Python 3.6+.
pip install auditmatica
Download this repo:
git clone https://github.com/artefactual-labs/auditmatica.git
Change into the cloned directory and install:
cd auditmatica/
pip install .
Including authenticated usernames in auditmatica's outputs requires some
additional setup:
- Enable auditing middleware in Archivematica and the Storage Service via environment variables
# Archivematica 1.13+
ARCHIVEMATICA_DASHBOARD_DASHBOARD_AUDIT_LOG_MIDDLEWARE: "true"
# Storage Service 0.18+
SS_AUDIT_LOG_MIDDLEWARE: "true"
- Restart Archivematica and Storage Service services
sudo service archivematica-dashboard restart
sudo service archivematica-storage-service restart
- Add the following configuration to the http block of the nginx.conf configuration file (likely /etc/nginx/nginx.conf, though this may vary)
log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" user=$upstream_http_x_username';
access_log "/var/log/nginx/access.log" main;
This format must be exact, as it maps directly to a pattern auditmatica uses
to parse nginx access log lines with usernames. If an access_log is already
specified, replace it or name it something other than main. The access_log
path (by default /var/log/nginx/access.log) can be changed as needed.
- Optionally, add proxy_hide_header x-username; to the nginx server blocks to prevent the authenticated username from being sent back with each response to the client device.
- Restart nginx service:
sudo service nginx restart
If the above is configured correctly, the resulting nginx access log lines should look as follows:
172.10.5.1 - - [13/Jan/2021:19:53:10 +0000] "GET /backlog/download/2e28b8a9-351c-4da7-92d2-837ac04cd2d9/ HTTP/1.1" 200 19436360 "http://127.0.0.1:62080/backlog/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" "-" user=test
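To show how a log line in this format breaks down into fields, here is a small Python sketch that parses the sample line above with a regular expression. The pattern is written for this README and is an assumption, not the pattern auditmatica uses internally; `LOG_PATTERN` and `parse_access_log_line` are hypothetical names, and the group names simply mirror the nginx variables in the log_format directive.

```python
import re
from datetime import datetime

# One named group per nginx variable in the log_format shown above.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" '
    r'user=(?P<username>\S+)'
)


def parse_access_log_line(line):
    """Return a dict of fields for a matching line, or None otherwise."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # nginx writes $time_local as e.g. 13/Jan/2021:19:53:10 +0000
    fields["time_local"] = datetime.strptime(
        fields["time_local"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


SAMPLE_LINE = (
    '172.10.5.1 - - [13/Jan/2021:19:53:10 +0000] '
    '"GET /backlog/download/2e28b8a9-351c-4da7-92d2-837ac04cd2d9/ HTTP/1.1" '
    '200 19436360 "http://127.0.0.1:62080/backlog/" '
    '"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" '
    '"-" user=test'
)

if __name__ == "__main__":
    fields = parse_access_log_line(SAMPLE_LINE)
    print(fields["username"], fields["status"], fields["request"])
```

A line that returns None here is a quick hint that the configured log_format differs from the one required above.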
For development, it may be useful to install auditmatica with
pip install -e ., which will apply changes made to the source code
immediately.
To run all tests with tox:
tox
Or run tests directly with pytest:
pip install -r requirements/test.txt
pytest
This repository contains a Makefile with commands to aid in building packages and publishing to PyPI.
To check that the package is valid:
make package-check
To upload the package to PyPI (this requires PyPI credentials and being
listed as a collaborator on the auditmatica project):
make package-upload
To clean up package distribution files:
make clean