A project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects. Project mailing list can be found at http://listserv.loc.gov/archives/chronam-users.html.


chronam

chronam is the Django application that the Library of Congress uses to make its Chronicling America website. The Chronicling America website makes millions of pages of historic American newspapers that have been digitized by the National Digital Newspaper Program (NDNP) browsable and searchable on the Web. A little bit of background is needed to understand why this software is being made available.

NDNP is actually a partnership between the Library of Congress, the National Endowment for the Humanities (NEH), and cultural heritage organizations (awardees) across the United States who have applied for grants to help digitize newspapers in their state. Awardees digitize newspaper microfilm according to a set of specifications and then ship the data back to the Library of Congress where it is loaded into Chronicling America.

Awardee institutions are able to use this data however they want, including creating their own websites that highlight their newspaper content in the local context of their own collections. The idea of making chronam available here on GitHub is to provide a technical option to these awardees, and to other interested parties, who want to make their own websites of NDNP newspaper content available. chronam provides a core set of functionality for loading, modeling and indexing NDNP data, while allowing you to customize the look and feel of the website to suit the needs of your organization.

The NDNP data is in the Public Domain and is itself available on the Web for anyone to use. The hope is that the chronam software can be useful for others who want to work with and/or publish the content.

Install

System-level dependencies can be installed by following the operating-system-specific instructions in this repository:

  • install_ubuntu.md
  • install_redhat.md

After you have installed the system level dependencies you will need to install some application specific dependencies, and configure the application.

First you will need to set up the local Python environment and install some Python dependencies:

cd /opt/chronam/
virtualenv -p /usr/bin/python2.7 ENV
source /opt/chronam/ENV/bin/activate
cp conf/chronam.pth ENV/lib/python2.7/site-packages/chronam.pth
pip install -r requirements.pip

Next you need to create some directories for data:

mkdir /srv/chronam/batches
mkdir /srv/chronam/cache
mkdir /srv/chronam/bib

And you will need a MySQL database. If this is a new server, you will need to start MySQL and assign it a root password:

sudo service mysqld start
/usr/bin/mysqladmin -u root password '' # pick a real password

You will probably want to change the password 'pick_one' in the example below to something else:

echo "DROP DATABASE IF EXISTS chronam; CREATE DATABASE chronam CHARACTER SET utf8mb4; CREATE USER 'chronam'@'localhost' IDENTIFIED BY 'pick_one'; GRANT ALL ON chronam.* to 'chronam'@'localhost'; GRANT ALL ON test_chronam.* TO 'chronam'@'localhost';" | mysql -u root -p

You will need to create a Django settings file which uses the default settings and sets custom values specific to your site:

  1. Create a settings.py file in the chronam directory which imports the default values from the provided template for possible customization:

     echo 'from chronam.settings_template import *' > /opt/chronam/settings.py
    
  2. Ensure that the DJANGO_SETTINGS_MODULE environment variable is set to chronam.settings before you run a Django management command. This can be set as a user-wide default in your ~/.profile, but the recommended way is simply to make it part of the virtualenv activation process:

     echo 'export DJANGO_SETTINGS_MODULE=chronam.settings' >> /opt/chronam/ENV/bin/activate
    
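  3. Add your database settings, including the password, to the settings.py file following the standard Django settings documentation. The values below are placeholders; if you created the database with the command above, NAME and USER are both chronam and HOST is localhost: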

     DATABASES = {
         'default': {
             'ENGINE': 'django.db.backends.mysql',
             'NAME': 'chronam_db',
             'USER': 'chronam_user',
             'HOST': 'mysql.example.org',
             'PASSWORD': 'NotTheRealPassword',
         }
     }
    

Never edit the settings_template.py file itself, since it may change in the next release. You may, however, wish to periodically review the changes to that file in case you need to update your local settings.
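
As one way to avoid writing the database password into settings.py, the DATABASES block from step 3 could read it from the environment. This is only a sketch: the helper name database_settings and the variable CHRONAM_DB_PASSWORD are hypothetical, and the NAME, USER and HOST values assume the database was created with the mysql command shown earlier.

```python
import os


def database_settings(environ=os.environ):
    """Build a DATABASES dict, taking the password from the environment."""
    return {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            # Assumed to match the CREATE DATABASE / CREATE USER command above.
            "NAME": "chronam",
            "USER": "chronam",
            "HOST": "localhost",
            # CHRONAM_DB_PASSWORD is a hypothetical variable name, not a chronam setting.
            "PASSWORD": environ.get("CHRONAM_DB_PASSWORD", ""),
        }
    }


# In /opt/chronam/settings.py this would come after the template import:
#   from chronam.settings_template import *
DATABASES = database_settings()
```

With this approach the variable would need to be exported in the same place as DJANGO_SETTINGS_MODULE, for example in the virtualenv activate script.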

Next you will need to initialize the database schema and load some initial data:

django-admin.py migrate
django-admin.py loaddata initial_data
django-admin.py chronam_sync --skip-essays

And finally you will need to collect static files (stylesheets, images) so that Apache can serve them in production:

django-admin.py collectstatic --noinput

Load Data

As mentioned above, the NDNP data that awardees create and ship to the Library of Congress is in the public domain and is made available on the Web as batches. Each batch contains newspaper issues for one or more newspaper titles. To use chronam you will need to have some of this batch data to load. If you are an awardee you probably have this data on hand already, but if not you can use a tool like wget to bulk download the batches. For example:

cd /srv/chronam/
wget --recursive --no-host-directories --cut-dirs 1 --reject index.html* --include-directories /data/batches/batch_uuml_thys_ver01/ http://chroniclingamerica.loc.gov/data/batches/batch_uuml_thys_ver01/

In order to load data you will need to run the load_batch management command by passing it the full path to the batch directory. So assuming you have downloaded batch_uuml_thys_ver01 you will want to:

django-admin.py load_batch /srv/chronam/batches/batch_uuml_thys_ver01

If this is a new server, you may need to start the web server:

sudo service httpd start

After this completes you should be able to view the batch in the batches report via the Web:

http://www.example.org/batches/
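
If several batches have been downloaded, the load_batch step above can be repeated once per batch directory. The following is a minimal sketch of that loop, not part of chronam: the function names are hypothetical, and it assumes every batch directory name starts with batch_, as in the example. It defaults to a dry run that only builds the commands.

```python
import os
import subprocess


def find_batches(batch_root):
    """Return sorted full paths of the batch_* directories under batch_root."""
    return sorted(
        os.path.join(batch_root, name)
        for name in os.listdir(batch_root)
        # Assumption: batch directories are named batch_<something>.
        if name.startswith("batch_")
        and os.path.isdir(os.path.join(batch_root, name))
    )


def load_all(batch_root, dry_run=True):
    """Build one load_batch command per batch; run them unless dry_run is set."""
    commands = [
        ["django-admin.py", "load_batch", path]
        for path in find_batches(batch_root)
    ]
    if not dry_run:
        for command in commands:
            # check_call raises CalledProcessError if a load fails,
            # so later batches are not loaded after a failure.
            subprocess.check_call(command)
    return commands
```

Calling load_all("/srv/chronam/batches", dry_run=False) from inside the activated virtualenv would then run the same command as above for each batch in turn.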

Caching

After loading data, you will need to clear the cache. If you are using a reverse proxy (like Varnish) you will also need to clear that, as well as any CDN you have. Below is a list of URLs that should be cleared based on what content you are loading.

All pages that contain an LCCN are tagged with that LCCN in the cache headers. This allows for purging by specific LCCN tag if there is an update to a batch.

List of URLs to purge when loading a new batch

  • All URLs tagged with lccn=<LCCN>
  • All URLs matching these patterns:
    chroniclingamerica.loc.gov/tabs
    chroniclingamerica.loc.gov/sitemap*
    chroniclingamerica.loc.gov/frontpages*
    chroniclingamerica.loc.gov/titles*
    chroniclingamerica.loc.gov/states*
    chroniclingamerica.loc.gov/counties*
    chroniclingamerica.loc.gov/states_counties*
    chroniclingamerica.loc.gov/cities*
    chroniclingamerica.loc.gov/batches/summary*
    chroniclingamerica.loc.gov/reels*
    chroniclingamerica.loc.gov/reel*
    chroniclingamerica.loc.gov/essays*
    

List of URLs to purge when loading new awardees

  • All URLs matching chroniclingamerica.loc.gov/awardees*

List of URLs to purge when loading new basic data

  • All URLs matching chroniclingamerica.loc.gov/institutions*

List of URLs to purge when loading code

  • All URLs matching these patterns:
    chroniclingamerica.loc.gov/ocr
    chroniclingamerica.loc.gov/about
    chroniclingamerica.loc.gov/about/api
    chroniclingamerica.loc.gov/help
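
To check whether a given cached URL falls under the new-batch patterns listed above, the patterns can be treated as shell-style globs. This is only an illustrative sketch: it assumes the trailing * means "any suffix", which may differ from how your proxy or CDN interprets purge patterns, and the function name needs_purge is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the "loading a new batch" list above.
BATCH_PURGE_PATTERNS = [
    "chroniclingamerica.loc.gov/tabs",
    "chroniclingamerica.loc.gov/sitemap*",
    "chroniclingamerica.loc.gov/frontpages*",
    "chroniclingamerica.loc.gov/titles*",
    "chroniclingamerica.loc.gov/states*",
    "chroniclingamerica.loc.gov/counties*",
    "chroniclingamerica.loc.gov/states_counties*",
    "chroniclingamerica.loc.gov/cities*",
    "chroniclingamerica.loc.gov/batches/summary*",
    "chroniclingamerica.loc.gov/reels*",
    "chroniclingamerica.loc.gov/reel*",
    "chroniclingamerica.loc.gov/essays*",
]


def needs_purge(url, patterns=BATCH_PURGE_PATTERNS):
    """Return True if a scheme-less URL matches any of the glob patterns."""
    # fnmatchcase does case-sensitive matching; * also matches "/" characters.
    return any(fnmatchcase(url, pattern) for pattern in patterns)
```

Pages tagged with lccn=<LCCN> are not covered by this check, since they are purged by cache tag rather than by URL pattern.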
    

Run Unit Tests

For the unit tests to work you will need:

  • The batch_uuml_thys_ver01 batch available locally. You can use the wget command in the Load Data section to get it.
  • A local Solr instance running
  • A local MySQL database
  • Access to the Essay Editor Feed

After that you should be able to:

cd /opt/chronam/
django-admin.py test chronam.core.tests --settings=chronam.settings_test
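
Before running the tests it can help to confirm that the prerequisites above are in place. The sketch below is not part of chronam: the function names are hypothetical, and the ports are the usual defaults for Solr (8983) and MySQL (3306), which may not match your installation. It does not check access to the Essay Editor Feed.

```python
import os
import socket

# Assumed defaults; adjust to match your Solr and MySQL configuration.
DEFAULT_SERVICES = {"Solr": ("localhost", 8983), "MySQL": ("localhost", 3306)}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, OSError):
        return False
    sock.close()
    return True


def preflight(batch_dir, services=DEFAULT_SERVICES):
    """Map each prerequisite name to True (available) or False (missing)."""
    results = {"batch directory": os.path.isdir(batch_dir)}
    for name, (host, port) in services.items():
        results[name] = port_open(host, port)
    return results
```

For example, preflight("/srv/chronam/batches/batch_uuml_thys_ver01") returns a dictionary whose False entries point at what still needs to be set up. An open port only shows that something is listening, not that the service is configured correctly.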

License

This software is in the Public Domain.