Lumen Database

The Lumen Database collects and analyzes legal complaints and requests for removal of online materials, helping Internet users to know their rights and understand the law. These data enable us to study the prevalence of legal threats and let Internet users see the source of content removals.

Automated Submissions and Search Using the API

The main Lumen Database instance has an API that allows individuals and organizations that receive large numbers of notices to submit them without using the web interface. The API also provides an easy way for researchers to search the database. Members of the public can test the database, but will likely need to request an API key from the Lumen team in order to receive a token that provides full access. To learn about the capabilities of the API you can consult the API documentation.



Requirements

  • ruby 2.5.5
  • PostgreSQL 9.6
  • Elasticsearch 5.6.x
  • Java Runtime Environment (OpenJDK works fine)
  • Piwik Tracking (only used in prod)
  • Mail server (SMTP, Sendmail)
  • phantomjs (used only by test runner)


Setup

By default, the app will try to connect to Elasticsearch on http://localhost:9200. If you want to use a different host, set the ELASTICSEARCH_URL environment variable.
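As a rough sketch of that fallback, the lookup could be written as follows. The helper name `elasticsearch_url` is hypothetical and is not taken from the Lumen codebase; only the variable name and the default URL come from this README.

```ruby
# Hypothetical helper, not Lumen's actual code: resolve the Elasticsearch
# host from the environment, falling back to the documented default.
DEFAULT_ELASTICSEARCH_URL = 'http://localhost:9200'

def elasticsearch_url(env = ENV)
  # ENV.fetch and Hash#fetch both accept a default as the second argument.
  env.fetch('ELASTICSEARCH_URL', DEFAULT_ELASTICSEARCH_URL)
end

puts elasticsearch_url
```

Passing the environment in as an argument keeps the helper easy to exercise with a plain hash instead of the real ENV.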

$ bundle install
$ cp config/database.yml.example config/database.yml
  (edit database.yml as you wish)
  (ensure PostgreSQL and Elasticsearch are running)
$ rails db:setup

Running the app

$ rails s

Viewing the app

$BROWSER 'http://localhost:3000'

You can customize behavior during seeding (db:setup) with a couple of environment variables:

  • NOTICE_COUNT=10 will generate 10 (or any number you pass it) notices instead of the default 500
  • SKIP_FAKE_DATA=1 will skip generating fake seed data entirely.
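To make the interaction of the two variables concrete, here is a minimal sketch of how a seed script might interpret them. `seed_plan` is a hypothetical name, and the rule that any value of SKIP_FAKE_DATA means "skip" is an assumption; the real seeds.rb may behave differently.

```ruby
# Illustrative only: one way a seed script could read the two variables.
# `seed_plan` is a hypothetical name, not a function from Lumen's seeds.rb.
DEFAULT_NOTICE_COUNT = 500

def seed_plan(env = ENV)
  # Assumption: the mere presence of SKIP_FAKE_DATA disables fake data.
  return { fake_data: false, notice_count: 0 } if env.key?('SKIP_FAKE_DATA')

  # Integer() raises on non-numeric input instead of silently returning 0.
  { fake_data: true, notice_count: Integer(env.fetch('NOTICE_COUNT', DEFAULT_NOTICE_COUNT)) }
end

p seed_plan
```

On the command line, the equivalent would be something like NOTICE_COUNT=10 rails db:setup.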

Sample user logins

The seed data creates logins of the following form:

Username: {username}
Password: password

username is one of {user, submitter, redactor, publisher, admin, super_admin}, with corresponding privileges.

If you seeded your database with an older version of seeds.rb, your username may take a different form from the one shown above.

Running Tests

$ rspec

The integration tests are quite slow; for some development purposes you may find it more convenient to run bundle exec rspec spec/ --exclude-pattern="spec/integration/*".

If elasticsearch isn't on your $PATH, set ENV['TEST_CLUSTER_COMMAND']=/path/to/elasticsearch, and make sure permissions are set correctly for your test suite to run it.

If you're running a subset of tests that you know don't require Elasticsearch, you can run them without setting it up via TEST_WITH_ELASTICSEARCH=0 rspec path/to/tests.
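The two settings above might be consulted by a spec helper roughly like this. Both method names are hypothetical, and the rules "anything other than 0 means yes" and "fall back to a plain elasticsearch on $PATH" are assumptions rather than a description of Lumen's actual spec setup.

```ruby
# Hypothetical sketch: decide whether the suite should start Elasticsearch.
# The variable names come from the README; the helpers are illustrative.
def test_with_elasticsearch?(env = ENV)
  # Assumption: only an explicit "0" turns Elasticsearch off.
  env.fetch('TEST_WITH_ELASTICSEARCH', '1') != '0'
end

# Resolve the command used to launch the test cluster, assuming a plain
# `elasticsearch` on $PATH when TEST_CLUSTER_COMMAND is unset.
def test_cluster_command(env = ENV)
  env.fetch('TEST_CLUSTER_COMMAND', 'elasticsearch')
end

puts "start cluster with: #{test_cluster_command}" if test_with_elasticsearch?
```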

Parallelizing Tests

You can speed up tests by running them in parallel:

$ rake parallel:spec

You will need to do some setup before the first time you run this:

  • alter config/database.yml so that the test database is yourproject_test<%= ENV['TEST_ENV_NUMBER'] %>
  • run rake parallel:setup

It will default to using the number of processors parallel_tests believes to be available, but you can change this by setting ENV['PARALLEL_TEST_PROCESSORS'] to the desired number.
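The database.yml change works because the file is run through ERB, and parallel_tests gives each worker its own TEST_ENV_NUMBER (blank for the first worker, then 2, 3, and so on). The sketch below evaluates the same template outside Rails to show the resulting names; `test_database_name` is a hypothetical helper and `yourproject_test` is the placeholder name used above.

```ruby
require 'erb'

# The database name template from config/database.yml; `yourproject_test`
# is the README's placeholder, not a real database name.
TEMPLATE = "yourproject_test<%= ENV['TEST_ENV_NUMBER'] %>"

# Hypothetical helper: render the template as a given worker would see it.
def test_database_name(env_number)
  previous = ENV['TEST_ENV_NUMBER']
  ENV['TEST_ENV_NUMBER'] = env_number
  ERB.new(TEMPLATE).result
ensure
  # Restore the caller's environment (assigning nil unsets the variable).
  ENV['TEST_ENV_NUMBER'] = previous
end

['', '2', '3'].each { |n| puts test_database_name(n) }
```

Each worker therefore talks to its own database, which is why rake parallel:setup has to create them before the first run.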


Linting

Use rubocop and leave the code at least as clean as you found it. If you make linting-only changes, it's considerate to your code reviewer to keep them in their own commit.


Profiling

  • Skylight
    • track page rendering time, count allocations, find possibly dodgy SQL
    • analytics to help you find the problem areas at a high level
    • login required
    • runs in prod
  • mini-profiler
    • available in dev by default
    • in use on prod, visible only to super_admins
    • in-depth memory profiling, stacktracing, and SQL queries; good for granular analysis
  • bullet
    • find N+1 queries and unused eager loading
    • runs in dev
    • logs to log/bullet.log
  • oink
    • memory usage, allocations
    • more specific than Skylight as to which objects are being created where
    • runs in dev by default; can run anywhere by setting ENV['LUMEN_USE_OINK'] (ok to run in production)
    • logs to log/oink.log

Environment variables

Here are all the environment variables that Lumen recognizes. Search the code for each name to see how it is used.

Environment variables should be set in .env and are managed by the dotenv gem. .env is not version-controlled so you can safely write secrets to it (but will also need to set these on all servers).

Unless setting an environment variable on the command line in the context of a command-line process, environment variables should ONLY be set in .env.

Most of these are optional and have sensible defaults (which may vary by environment).

  • BATCH_SIZE - batch size of model items indexed during each run of Elasticsearch re-indexing
  • BROWSER_VALIDATIONS - enable HTML5 browser form validations
  • DEFAULT_SENDER - default mailer sender
  • ES_INDEX_SUFFIX - can be used to specify a suffix for the name of elasticsearch indexes
  • FILE_NAME - name of csv file to import as blog entries
  • from - a date formatted '%Y-%m-%d' for use in recreating elasticsearch indexes after said date
  • LOG_ELASTICSEARCH - only used in tests
  • NOTICE_COUNT - how many fake notices to create when seeding the db
  • RACK_ENV - don't use this; it's overridden by RAILS_ENV
  • RAILS_SERVE_STATIC_FILES - if present (with any value) will enable rails to serve static files
  • RETURN_PATH - default mailer return path
  • SEARCH_SLEEP - used in specs only, time out of Elasticsearch searches
  • SECRET_KEY_BASE - the Rails secret token; required in prod
  • SITE_HOST - site host, used in mailer templates
  • SKIP_FAKE_DATA - don't generate fake data when seeding the database
  • SMTP_ADDRESS - SMTP server address
  • SMTP_DOMAIN - SMTP server domain
  • SMTP_USERNAME - SMTP server username
  • SMTP_PASSWORD - SMTP server password
  • SMTP_PORT - SMTP server port
  • USER_CRON_EMAIL - for use in sending reports of court order files; can be a string or a list (in a JSON.parse-able format)
  • WEB_CONCURRENCY - number of Unicorn workers
  • WEB_TIMEOUT - Unicorn timeout
  • The following are used only for imports from oldchill:
    • RESTART_SEQUENCE_WITH - for compatibility between oldchill imports and new Lumen notices. Should not ever be needed at this point, nor have any effect in production.
    • WHERE
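Since USER_CRON_EMAIL may hold either a single address or a JSON list, code that reads it has to handle both shapes. A minimal sketch, assuming a hypothetical helper named `cron_recipients` (this is not Lumen's implementation):

```ruby
require 'json'

# Hypothetical helper: normalize USER_CRON_EMAIL into an array of addresses.
def cron_recipients(raw)
  return [] if raw.nil? || raw.strip.empty?

  # A JSON list such as '["a@example.com", "b@example.com"]' parses cleanly.
  Array(JSON.parse(raw)).map(&:to_s)
rescue JSON::ParserError
  # Not valid JSON, so treat the value as one plain address.
  [raw.strip]
end

p cron_recipients(ENV['USER_CRON_EMAIL'])
```

In .env, the list form would be written as a JSON array on a single line.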

Email setup

The application requires a mail server. In development, it's best to use a local SMTP server that will catch all outgoing emails; Mailcatcher is a good option.
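As an illustration, the SMTP_* variables listed earlier could be turned into an ActionMailer-style settings hash as sketched below. The helper name is hypothetical, and the localhost:1025 fallback is an assumption chosen to match Mailcatcher's standard SMTP listener, not necessarily Lumen's own default.

```ruby
# Hypothetical helper: build SMTP settings from the SMTP_* variables.
# The keys match those ActionMailer accepts in `smtp_settings`.
def smtp_settings(env = ENV)
  {
    address: env.fetch('SMTP_ADDRESS', 'localhost'),
    # Assumed default: Mailcatcher listens for SMTP on port 1025.
    port: Integer(env.fetch('SMTP_PORT', 1025)),
    domain: env['SMTP_DOMAIN'],
    user_name: env['SMTP_USERNAME'],
    password: env['SMTP_PASSWORD']
  }.compact # drop unset credentials so a local catch-all server needs none
end

p smtp_settings
```

With Mailcatcher running, caught messages can then be read in its web interface, which by default is served on port 1080.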


Blog custom search

The /blog_entries page can contain a Google Custom Search Engine that searches the Lumen blog. To enable it, create a custom search engine here restricted to the path the blog lives at, for instance*. Extract the "cx" id from the JavaScript embed code and put it in the GOOGLE_CUSTOM_BLOG_SEARCH_ID environment variable. The blog search will appear once this variable has been configured.

Lumen API

You can search the database and, if you have a contributor token, add to the database using our API.

The Lumen API is documented in our GitHub Wiki:


Lumen Database is licensed under GPLv2. See LICENSE.txt for more information.


Copyright (c) 2016 President and Fellows of Harvard College

Performance Monitoring

View performance data on Skylight
