This repository has been archived by the owner on May 16, 2019. It is now read-only.

Text mining for the Chronicling America collection of the Library of Congress

UUDigitalHumanitieslab/LoCMiner

LoCMiner

The LoCMiner is a simple web application that allows users to search the Chronicling America collection of the Library of Congress and then easily export their data to text mining tools of their choice.

Back-end

The LoCMiner is built on the Flask microframework, with the SQLAlchemy extension as ORM. All requests to the Chronicling America collection are made with the Requests package ("HTTP for Humans").
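As an illustration of the kind of request involved, here is a minimal sketch of a page search against the public Chronicling America search endpoint. It is not taken from the LoCMiner code base: the function names are hypothetical, the parameter set (`andtext`, `format`, `page`, `rows`) is an assumption based on the public search API, and the sketch targets Python 3 rather than the Python 2 interpreter used above.

```python
from urllib.parse import urlencode

# Public page-search endpoint of Chronicling America (assumed, not read from LoCMiner).
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_params(terms, page=1, rows=20):
    """Return query parameters for a full-text page search (hypothetical helper)."""
    return {"andtext": terms, "format": "json", "page": page, "rows": rows}


def build_search_url(terms, page=1, rows=20):
    """Return the full search URL, useful for logging or debugging."""
    return SEARCH_URL + "?" + urlencode(build_search_params(terms, page, rows))


def fetch_page(terms, page=1, rows=20):
    """Fetch one page of results as a dict; needs the third-party Requests package."""
    import requests  # imported here so the URL helpers work without it

    response = requests.get(
        SEARCH_URL, params=build_search_params(terms, page, rows), timeout=30
    )
    response.raise_for_status()
    return response.json()
```

Keeping the parameter building separate from the HTTP call makes the query easy to inspect without touching the network.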

For a local installation, the following steps should be sufficient:

> sudo apt install postgresql

Set up a PostgreSQL database user and place the database configuration in config.py.

> git clone https://github.com/UUDigitalHumanitieslab/LoCMiner.git
> cd LoCMiner
> pip install -r requirements.txt
> python2 run.py

This will start the web interface. To process searches, you also need to start Redis (which usually starts at boot) and Celery (in a separate shell):

> celery -A LoCMiner.tasks worker

You can specify your settings in LoCMiner/config.py. If you want to use the DevelopmentConfig, be sure to change this in both run.py and LoCMiner/factories.py.
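For orientation, a configuration module of this kind might look roughly like the sketch below. This is an assumption, not the contents of the actual LoCMiner/config.py: `SQLALCHEMY_DATABASE_URI` is the standard Flask-SQLAlchemy setting, while the `Config` base class, the `CELERY_BROKER_URL` name, and the credentials are placeholders chosen for illustration.

```python
class Config(object):
    """Base settings shared by all environments (hypothetical layout)."""

    DEBUG = False
    # Standard Flask-SQLAlchemy key; user, password and database name are placeholders.
    SQLALCHEMY_DATABASE_URI = "postgresql://locminer:change-me@localhost/locminer"
    # Assumed setting name for the Redis broker that Celery connects to.
    CELERY_BROKER_URL = "redis://localhost:6379/0"


class DevelopmentConfig(Config):
    """Development overrides: enables Flask's debug mode."""

    DEBUG = True
```

In a Flask application, a class like this is typically loaded with `app.config.from_object(DevelopmentConfig)`.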

The user interface should now be reachable from http://localhost:5000.

Front-end

On the front-end, the PureCSS package is used primarily for the layout. The following JavaScript libraries are employed:

Text Mining

Texcavator

The application integrates with the text mining tool Texcavator. Your saved searches can be indexed to an Elasticsearch cluster via the pyelasticsearch package, after which you can freely search your results with Texcavator.

Voyant

The application can return a simple output file for use in the online text mining tool Voyant.

Export to .csv and .txt

Finally, the application allows for simple exports of both metadata (to a .csv file) and full text (to .txt files).
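The shape of such an export can be sketched as follows. This is a minimal illustration using only the Python 3 standard library, not the application's own export code; the record fields (`id`, `title`, `date`, `text`) and both function names are assumptions.

```python
import csv
import os


def export_metadata(records, csv_path, fields=("id", "title", "date")):
    """Write one CSV row of metadata per record; keys not in `fields` are skipped."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(fields), extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(records)


def export_full_text(records, directory):
    """Write each record's full text to <directory>/<id>.txt and return the paths."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for record in records:
        path = os.path.join(directory, "{}.txt".format(record["id"]))
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(record.get("text", ""))
        paths.append(path)
    return paths
```

Writing one .txt file per record keeps the full text in a form that most text mining tools can read directly, while the .csv file links each file back to its metadata through the shared `id`.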

Demo

A demonstrator is available here. Access is currently limited to a select number of Utrecht University students and employees. If you would like a peek, contact the Digital Humanities lab.
