inpho/corpus-builder

InPhO Corpus Builder

The InPhO Corpus Builder matches a plaintext bibliography to volumes in the HathiTrust.

Prerequisites

  1. Install anystyle.io and its Ruby dependencies:

    sudo apt-get install ruby-dev
    gem install --user-install anystyle
    

    Note: you might need to add the ~/.gem/ruby/2.3.0/bin directory to your PATH. The output of gem install will tell you if so.

  2. Install rython using pip:

    pip install rython
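
If the gem install step reports that its bin directory is not on your PATH, something like the following sketch adds it for the current shell session. It assumes the directory named in the note above (~/.gem/ruby/2.3.0/bin); the version component depends on your Ruby installation, and GEM_BIN is only an illustrative variable name.

```shell
# Assumption: user-installed gem executables live here; adjust the Ruby version to match your system.
GEM_BIN="$HOME/.gem/ruby/2.3.0/bin"

case ":$PATH:" in
  *":$GEM_BIN:"*) ;;                      # already on PATH, nothing to do
  *) export PATH="$GEM_BIN:$PATH" ;;      # prepend for the current session only
esac
```

To make the change permanent, the same lines can go in your shell startup file (for example ~/.bashrc).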
    

Usage

  1. Use parse.py to convert a plaintext bibliography file to the JSON format used by the browser interface:

    python parse.py FILENAME
    
  2. Launch the Corpus Builder:

    python server.py -p 9024
    
  3. Open the Corpus Builder in a browser: http://localhost:9024/

  4. When finished, use extractids.py to create a file with one HathiTrust ID per line for use with corpus download tools:

    python extractids.py www/out.json
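
Before handing the ID list to a download tool, it can be worth a quick sanity check. The sketch below is not part of the Corpus Builder: count_unique_ids is a hypothetical helper, and the file name in the usage line is a placeholder for wherever your ID list ended up. It relies only on the one-ID-per-line format described above.

```shell
# Hypothetical helper: count the distinct, non-blank lines in a one-ID-per-line file.
count_unique_ids() {
  grep -v '^[[:space:]]*$' "$1" |   # drop blank lines
    sort -u |                       # collapse duplicate IDs
    wc -l |                         # count what remains
    tr -d ' '                       # strip padding that some wc implementations add
}
```

Usage, assuming the IDs were saved to a file named ids.txt (a placeholder name): `count_unique_ids ids.txt`. Comparing the result with the number of bibliography entries you matched gives a rough check that nothing was lost or duplicated.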
    
