A project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects. Project mailing list can be found at http://listserv.loc.gov/archives/chronam-users.html.

chronam

chronam is the Django application that the Library of Congress uses to make its Chronicling America website. The Chronicling America website makes millions of pages of historic American newspapers that have been digitized by the National Digital Newspaper Program (NDNP) browsable and searchable on the Web. A little bit of background is needed to understand why this software is being made available.

NDNP is actually a partnership between the Library of Congress, the National Endowment for the Humanities (NEH), and cultural heritage organizations (awardees) across the United States who have applied for grants to help digitize newspapers in their state. Awardees digitize newspaper microfilm according to a set of specifications and then ship the data back to the Library of Congress where it is loaded into Chronicling America.

Awardee institutions are able to use this data however they want, including creating their own websites that highlight their newspaper content in the local context of their own collections. The idea of making chronam available here on GitHub is to provide a technical option to these awardees, or to other interested parties, who want to publish their own websites of NDNP newspaper content. chronam provides a core set of functionality for loading, modeling and indexing NDNP data, while allowing you to customize the look and feel of the website to suit the needs of your organization.

The NDNP data is in the Public Domain and is itself available on the Web for anyone to use. The hope is that the chronam software can be useful for others who want to work with and/or publish the content.

Install

System-level dependencies can be installed by following the operating-system-specific instructions in install_redhat.md or install_ubuntu.md.

After you have installed the system-level dependencies you will need to install some application-specific dependencies and configure the application.

First you will need to set up the local Python environment and install some Python dependencies:

cd /opt/chronam/
virtualenv -p /usr/bin/python2.7 ENV
source /opt/chronam/ENV/bin/activate
cp conf/chronam.pth ENV/lib/python2.7/site-packages/chronam.pth
pip install -r requirements.pip
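
As a quick sanity check (not part of the original instructions), you can confirm that Python and Django now resolve from the new virtualenv; the exact version printed will depend on what requirements.pip pins:

which python  # should point at /opt/chronam/ENV/bin/python
python -c "import django; print(django.get_version())"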

Next you need to create some directories for data:

mkdir /opt/chronam/data/batches
mkdir /opt/chronam/data/cache
mkdir /opt/chronam/data/bib

And you will need a MySQL database. If this is a new server, you will need to start MySQL and assign it a root password:

sudo service mysqld start
/usr/bin/mysqladmin -u root password 'pick_a_real_password' # replace with a real password

You will probably want to change the password 'pick_one' in the example below to something else:

echo "DROP DATABASE IF EXISTS chronam; CREATE DATABASE chronam CHARACTER SET utf8; GRANT ALL ON chronam.* to 'chronam'@'localhost' identified by 'pick_one'; GRANT ALL ON test_chronam.* TO 'chronam'@'localhost' identified by 'pick_one';" | mysql -u root -p

You will need to use the settings template to create your application settings. Add your database password to the settings.py file:

cp /opt/chronam/settings_template.py /opt/chronam/settings.py
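
The exact contents of settings_template.py can change between releases, but the database section follows the standard Django DATABASES convention. A rough sketch of the entry you would edit in /opt/chronam/settings.py, assuming the database, user and password created above:

# database settings -- adjust names and values to match what settings_template.py actually contains
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'chronam',
        'USER': 'chronam',
        'PASSWORD': 'pick_one',  # the password used in the GRANT statements above
        'HOST': 'localhost',
    }
}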

For Django management commands to work you will need to have the DJANGO_SETTINGS_MODULE environment variable set. You may want to add this to your ~/.profile so you do not need to remember to do it every time you log in.

export DJANGO_SETTINGS_MODULE=chronam.settings
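
For example, to make this persistent you could append it to your shell profile:

echo 'export DJANGO_SETTINGS_MODULE=chronam.settings' >> ~/.profile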

Next you will need to initialize the database schema and load some initial data:

django-admin.py migrate
django-admin.py loaddata initial_data
django-admin.py chronam_sync --skip-essays

And finally you will need to collect static files (stylesheets, images) so that they can be served by Apache in production:

django-admin.py collectstatic --noinput

Load Data

As mentioned above, the NDNP data that awardees create and ship to the Library of Congress is in the public domain and is made available on the Web as batches. Each batch contains newspaper issues for one or more newspaper titles. To use chronam you will need to have some of this batch data to load. If you are an awardee you probably have this data on hand already, but if not you can use a tool like wget to bulk download the batches. For example:

cd /opt/chronam/data/
wget --recursive --no-host-directories --cut-dirs 1 --reject "index.html*" --include-directories /data/batches/batch_uuml_thys_ver01/ http://chroniclingamerica.loc.gov/data/batches/batch_uuml_thys_ver01/
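
The download can take a while. Once it finishes, you can spot-check that the batch landed where the loader will look for it:

ls /opt/chronam/data/batches/batch_uuml_thys_ver01/
du -sh /opt/chronam/data/batches/batch_uuml_thys_ver01/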

In order to load data you will need to run the load_batch management command by passing it the full path to the batch directory. So assuming you have downloaded batch_uuml_thys_ver01 you will want to:

django-admin.py load_batch /opt/chronam/data/batches/batch_uuml_thys_ver01

If this is a new server, you may need to start the web server:

sudo service httpd start

After this completes you should be able to view the batch in the batches report via the Web:

http://www.example.org/batches/
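
If you prefer the command line, a quick way to confirm the report is being served (substitute your own hostname):

curl -I http://www.example.org/batches/  # expect an HTTP 200 response once the batch is loaded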

Run Unit Tests

For the unit tests to work you will need to have the batch_uuml_thys_ver01 batch available. You can use the wget command in the previous section to get it. After that you should be able to:

cd /opt/chronam/
django-admin.py test chronam.core.tests --settings=chronam.settings_test

License

This software is in the Public Domain.