The LoCMiner is a simple web application that allows users to search the Chronicling America collection of the Library of Congress and then easily export their data to text mining tools of their choice.
The LoCMiner is built with the Flask microframework and the SQLAlchemy extension for object-relational mapping. All requests to the Chronicling America collection are handled by the Requests: HTTP for Humans package.
For a local installation, the following steps should be sufficient:
> sudo apt install postgresql
Set up a PostgreSQL database user and place the database configuration in config.py.
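The exact settings depend on your installation, but the database configuration would look roughly like the sketch below. The user name, password, and database name are placeholders, not values shipped with the project:

```python
# Hypothetical snippet for config.py -- all credentials are placeholders;
# substitute the PostgreSQL user and database you created above.
SQLALCHEMY_DATABASE_URI = "postgresql://locminer:secret@localhost:5432/locminer"

# SQLAlchemy builds its engine from this URI, following the pattern:
#   dialect://user:password@host:port/database
```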
> git clone https://github.com/UUDigitalHumanitieslab/LoCMiner.git
> cd LoCMiner
> pip install -r requirements.txt
> python2 run.py
This will start the web interface. To process searches, you should start Redis (usually booted on start-up) and Celery (in a separate shell):
> celery -A LoCMiner.tasks worker
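Celery needs to know where to find Redis. Assuming a default local Redis instance, the broker settings would look something like this; the key names follow Celery's conventions, and whether LoCMiner reads them from config.py is an assumption:

```python
# Hypothetical Celery/Redis settings -- assumes Redis on its default
# port 6379, database 0. Adjust if your Redis instance runs elsewhere.
CELERY_BROKER_URL = "redis://localhost:6379/0"      # where tasks are queued
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"  # where results are stored
```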
You can specify your settings in LoCMiner/config.py.
If you want to use the DevelopmentConfig, be sure to change this in both run.py and LoCMiner/factories.py.
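A common Flask pattern, which config.py here presumably follows, is to define one class per environment; switching to DevelopmentConfig then means pointing both entry points at that class name. A minimal sketch, assuming this layout (class and attribute names are a Flask convention, not necessarily the project's exact ones):

```python
# Hypothetical config.py layout -- names are illustrative.
class Config(object):
    DEBUG = False
    SQLALCHEMY_DATABASE_URI = "postgresql://localhost/locminer"

class ProductionConfig(Config):
    pass

class DevelopmentConfig(Config):
    DEBUG = True  # enables the interactive debugger and auto-reloading

# run.py and LoCMiner/factories.py would then each reference the same
# class, e.g. app.config.from_object("LoCMiner.config.DevelopmentConfig")
```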
The user interface should now be reachable at http://localhost:5000.
On the front end, the PureCSS framework is used primarily for the layout; several JavaScript libraries are employed as well.
The application integrates with the text mining tool Texcavator: your saved searches can be indexed to an Elasticsearch cluster via the pyelasticsearch package, after which you can freely explore your results in Texcavator.
The application can return a simple output file for use in the online text mining tool Voyant.
Finally, the application allows for simple exports of both metadata (to a .csv-file) and full-text (to .txt-files).
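The metadata export boils down to writing one CSV row per search result. A minimal sketch of that idea using Python's standard csv module; the field names are illustrative, not the application's actual columns:

```python
import csv

# Illustrative search results -- the field names are assumptions.
results = [
    {"title": "The Sun", "date": "1901-05-04", "url": "https://example.org/1"},
    {"title": "Daily Herald", "date": "1902-11-12", "url": "https://example.org/2"},
]

# Write one header row plus one row per result.
with open("metadata.csv", "w") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "date", "url"])
    writer.writeheader()
    writer.writerows(results)
```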
A demonstrator is available here. Access is currently limited to a select number of Utrecht University students and employees. If you would like a peek, contact the Digital Humanities lab.