Skip to content

ConstantineXVI/book

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Taming Text, by Grant Ingersoll, Thomas Morton and Drew Farris is designed to teach software engineers the basic concepts of
 working with text to solve search and Natural Language Processing problems.  The book focuses on teaching using existing
 open source libraries like Apache Solr, Apache Mahout and Apache OpenNLP to manipulate text.  To learn more, visit http://www.manning.com/ingersoll.

Getting Started

Throughout this document, TT_HOME is the directory containing the checkout of the Taming Text code base.

Taming Text uses Maven for building and running the code.  To get started, you will
need:

1. JDK 1.6+
2. Maven 3.0 or higher
3. The OpenNLP English models, available at http://maven.tamingtext.com/opennlp-models/models-1.5.  Place them in the TT_HOME directory in a directory named opennlp-models.
  This can be done by using the following commands on UNIX:
  From the TT_HOME directory:
      mkdir opennlp-models
      cd opennlp-models
      wget -nd -np -r http://maven.tamingtext.com/opennlp-models/models-1.5/
      rm index.html*

4. Get WordNet 3.0 and place it in the TT_HOME directory.
  This can be done by using the following commands on UNIX:
  From the TT_HOME directory:
      wget -nd -np -m http://maven.tamingtext.com/wordnet/
      rm index.html*
      tar -xf Wordnet-3.0.tar.gz


Building the Source

To build the source, in TT_HOME:

1. mvn compile

Running the Tests

1. mvn test

Next Steps
=======

* mvn package // Prepares the jar files, etc. for execution

* Many of the examples can be run via the 'tt' script in the TT_HOME/bin directory. Running this script without arguments will display a list of the example names.

* Some of the samples are powered by pre-configured instances of solr. These can be started with the TT_HOME/bin/start-solr.sh script, which takes a single argument, the name of the instance to start. Available instances include solr-qa, solr-clustering and solr-tagging.

About

Taming Text Book Source Code

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 60.7%
  • Shell 33.6%
  • JavaScript 5.4%
  • Perl 0.3%