GitHub - dpla-attic/ingestion: The DPLA ingestion system

The DPLA Ingestion System


Documentation

Please see the release notes regarding changes and upgrade steps.

Setting up the ingestion server:

Install Python 2.7 if not already installed (http://www.python.org/download/releases/2.7/);

Install pip (http://pip.readthedocs.org/en/latest/installing.html);

Install the ingestion subsystem;

$ pip install --no-deps --ignore-installed -r requirements.txt

Configure an akara.ini file appropriately for your environment;

[Akara]
Port=<port for Akara to run on>
; Recommended LogLevel is one of DEBUG or INFO
LogLevel=<priority>

[CouchDb]
Url=<URL to CouchDB instance, including trailing forward-slash>
Username=<CouchDB username>
Password=<CouchDB password>
SyncQAViews=<True or False; consider False on production>
; Recommended LogLevel is INFO for production; defaults to INFO if not set
LogLevel=<priority>

[Twofishes]
BaseUrl=<URL to Twofishes server, required for geo-enrichments>

[Rackspace]
Username=<Rackspace username>
ApiKey=<Rackspace API key>
DPLAContainer=<Rackspace container for bulk download data>
SitemapContainer=<Rackspace container for sitemap files>

[APITokens]
NYPL=<Your NYPL API token>

[Sitemap]
SitemapURI=<Sitemap URI>
SitemapPath=<Path to local directory for sitemap files>

[Alert]
To=<Comma-separated email addresses to receive alert email>
From=<Email address to send alert email>

[Enrichment]
QueueSize=4
ThreadCount=4

Merge the akara.conf.template and akara.ini file to create the akara.conf file;

$ python setup.py install

Set up and start the Akara server;

$ akara -f akara.conf setup
$ akara -f akara.conf start

Build the database views;

$ python scripts/sync_couch_views.py dpla
$ python scripts/sync_couch_views.py dashboard
$ python scripts/sync_couch_views.py bulk_download
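A misconfigured akara.ini is an easy mistake to make in the setup steps above, so a quick sanity check before merging it into akara.conf may save some debugging. The following is a minimal sketch, not part of the ingestion system: `check_akara_ini` and `EXPECTED_SECTIONS` are hypothetical names, treating every section from the template as required is an assumption, and the code uses Python 3's standard library rather than the Python 2.7 the ingestion system itself runs on.

```python
import configparser

# Sections shown in the akara.ini template above. Treating all of them as
# required is an assumption of this sketch, not a rule of the ingestion system.
EXPECTED_SECTIONS = ["Akara", "CouchDb", "Twofishes", "Rackspace",
                     "APITokens", "Sitemap", "Alert", "Enrichment"]


def check_akara_ini(text):
    """Return a list of human-readable problems found in akara.ini text."""
    # Interpolation is disabled so that '%' in passwords is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for section in EXPECTED_SECTIONS:
        if not parser.has_section(section):
            problems.append("missing section [%s]" % section)
    if parser.has_section("CouchDb"):
        # The template asks for a trailing forward-slash on the CouchDB URL.
        url = parser.get("CouchDb", "Url", fallback="")
        if not url.endswith("/"):
            problems.append("[CouchDb] Url should end with a trailing slash")
    if parser.has_section("Akara"):
        port = parser.get("Akara", "Port", fallback="")
        if not port.isdigit():
            problems.append("[Akara] Port should be a number")
    return problems
```

For example, `check_akara_ini(open("akara.ini").read())` returns an empty list when none of these checks fail.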

Testing the ingestion server:

You can test the server by requesting records from this Clemson set;

$ curl "http://localhost:8889/oai.listrecords.json?endpoint=http://repository.clemson.edu/cgi-bin/oai.exe&oaiset=jfb&limit=10"

If you have the endpoint URL but not a set ID, there's a separate service for listing the sets;

$ curl "http://localhost:8889/oai.listsets.json?endpoint=http://repository.clemson.edu/cgi-bin/oai.exe&limit=10"
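The two curl requests above differ only in the service name and the optional `oaiset` parameter, which the sketch below makes explicit. It is an illustration only: `oai_url` and `AKARA_BASE` are hypothetical names, the host and port are copied from the curl examples, and the code assumes Python 3's standard library.

```python
from urllib.parse import urlencode

# Host and port taken from the curl examples; adjust to the Port in akara.ini.
AKARA_BASE = "http://localhost:8889"


def oai_url(service, endpoint, oaiset=None, limit=10):
    """Build a request URL for oai.listrecords.json or oai.listsets.json."""
    params = [("endpoint", endpoint)]
    if oaiset is not None:
        params.append(("oaiset", oaiset))
    params.append(("limit", str(limit)))
    # urlencode percent-encodes the endpoint URL. The curl examples pass it
    # unencoded, so this assumes the server decodes query values as usual.
    return "%s/%s?%s" % (AKARA_BASE, service, urlencode(params))
```

Calling `oai_url("oai.listrecords.json", "http://repository.clemson.edu/cgi-bin/oai.exe", oaiset="jfb")` produces the equivalent of the first curl request, and leaving out `oaiset` with `"oai.listsets.json"` gives the second.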

To run the ingest process, first run the setup.py script (if you have not already done so) and initialize the database and database views, then feed the ingest script a source profile (found in the profiles directory);

$ python setup.py install
$ python scripts/sync_couch_views.py dpla
$ python scripts/sync_couch_views.py dashboard
$ python scripts/ingest_provider.py profiles/clemson.pjs
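The order of these commands matters, since the ingest script relies on the install and view-sync steps having completed. As a rough sketch, the sequence could be wrapped so that it stops at the first failing step. `ingest_commands` and `run_ingest` are hypothetical helpers, not part of the repository, and the sketch assumes it is run from the repository root with `python` resolving to the Python 2.7 interpreter.

```python
import subprocess


def ingest_commands(profile, python="python"):
    """Return the commands from the steps above, in the order they run."""
    return [
        [python, "setup.py", "install"],
        [python, "scripts/sync_couch_views.py", "dpla"],
        [python, "scripts/sync_couch_views.py", "dashboard"],
        [python, "scripts/ingest_provider.py", profile],
    ]


def run_ingest(profile, python="python"):
    """Run each step in turn; check_call raises on the first non-zero exit."""
    for command in ingest_commands(profile, python):
        subprocess.check_call(command)
```

For example, `run_ingest("profiles/clemson.pjs")` runs the same four commands shown above.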

License

This application is released under an AGPLv3 license.
