TextAnalyzer

This project was developed for a university assignment on Natural Language Processing. The application is written in Python 2.7 with Django 1.9 and consists of:

  • A Tokenization and Morphological Analysis module (called morfo) built on FreeLing and Python 2.7. This app takes raw text and performs the corresponding morphological analysis.
  • A second module (textparser) that covers Syntactic Analysis. Given raw text, it generates syntactic trees using probabilistic models (Stanford and Bikel).

Running this project

To get this project working, we need to set up the morfo and textparser modules. The configuration assumes the following directory layout:

TextAnalyser
│   README.md
│   requirements.txt    
│
└───tkmorfo
        applications
        |
        └───morfo
        |           
        └───textparser
                tools
                |
                └───helpers
                        00-raw
                        00
                        dbparser
                        parseval
                        stanford-parserfull-2015-12-09
                        stanford-postagger-2015-12-09
                        utils.py
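
Before starting the application, it can help to confirm that the helper resources shown in the tree above are in place. The following is a minimal sketch, not part of the project: the function name missing_helpers is hypothetical, and the path to the helpers directory must be supplied by the caller.

```python
import os

# Entries expected under the helpers directory, copied from the tree above.
EXPECTED_HELPERS = [
    "00-raw",
    "00",
    "dbparser",
    "parseval",
    "stanford-parserfull-2015-12-09",
    "stanford-postagger-2015-12-09",
    "utils.py",
]


def missing_helpers(helpers_dir, expected=EXPECTED_HELPERS):
    """Return the expected entries that do not exist under helpers_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(helpers_dir, name))]
```

Calling missing_helpers with the path to your helpers directory returns an empty list when everything is present, and otherwise lists the names that still need to be downloaded or unpacked.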

Setting Up the Docker Container

This project was designed to run inside a container. The first module, Tokenization and Morphological Analysis, depends on FreeLing and Python 2.7. You can find those packages installed on this Docker image.

The second module, Syntactic Analysis, depends on the following resources:

  • Dan Bikel's Parsing Engine: dbparser.tar.gz

  • Penn Treebank based training set: wsj-02-21.mrg.tar.gz

  • Tool to evaluate the accuracy of the model: parseval.tar.gz

  • Test set: 00-raw.tar.gz
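
Once the archives listed above have been downloaded, they need to be unpacked. The following is a rough sketch using Python's standard tarfile module. The function name extract_archives and both directory arguments are assumptions for illustration; the target location should match the helpers directory in the layout shown earlier.

```python
import os
import tarfile

# Archive names as listed above; where they are downloaded to is an assumption.
ARCHIVES = [
    "dbparser.tar.gz",
    "wsj-02-21.mrg.tar.gz",
    "parseval.tar.gz",
    "00-raw.tar.gz",
]


def extract_archives(download_dir, target_dir, archives=ARCHIVES):
    """Extract each archive found in download_dir into target_dir.

    Returns the names of the archives that were not found.
    """
    missing = []
    for name in archives:
        path = os.path.join(download_dir, name)
        if not os.path.isfile(path):
            missing.append(name)
            continue
        with tarfile.open(path, "r:gz") as archive:
            archive.extractall(target_dir)
    return missing
```

The returned list makes it easy to see which downloads are still pending without stopping at the first missing file.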

These files can be found here. Other needed files are:

Running Graphical Applications in a Container

To run the Syntactic Analysis module, the container needs to be able to render graphical UIs, even though it has no physical display. This allows the app to create the parse tree images generated with NLTK. Install Tkinter, the Xvfb virtual display server, and ImageMagick:

apt-get update
apt-get install python-tk
apt-get install xvfb
apt-get install imagemagick

Then you need to run the following commands every time the container starts.

Xvfb :1 -screen 0 1024x768x16 &> xvfb.log  &
DISPLAY=:1.0
export DISPLAY
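
If you prefer to start the virtual display from Python rather than from the shell, the commands above can be approximated with the standard subprocess module. This is only a sketch: the helper names xvfb_command and start_xvfb are hypothetical, and it assumes Xvfb is already installed in the container.

```python
import os
import subprocess


def xvfb_command(display=1, geometry="1024x768x16"):
    """Build the Xvfb argument list for the given display number."""
    return ["Xvfb", ":%d" % display, "-screen", "0", geometry]


def start_xvfb(display=1, geometry="1024x768x16", log_path="xvfb.log"):
    """Start Xvfb in the background and point DISPLAY at it.

    Mirrors the shell commands above; returns the Popen handle so the
    caller can terminate the server later.
    """
    log = open(log_path, "ab")
    proc = subprocess.Popen(xvfb_command(display, geometry),
                            stdout=log, stderr=subprocess.STDOUT)
    # Equivalent to: DISPLAY=:1.0; export DISPLAY
    os.environ["DISPLAY"] = ":%d.0" % display
    return proc
```

Setting DISPLAY through os.environ affects only the current Python process and its children, so this has to run before any code that opens a Tk window.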

Installing Java for the NLTK Stanford POS Tagger and Parser in the Container

echo deb http://http.debian.net/debian jessie-backports main >> /etc/apt/sources.list
apt-get update && apt-get install openjdk-8-jdk
update-alternatives --config java
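
After installation, a quick way to confirm that Java is reachable from Python is to run java -version and check the exit status. The function name java_available below is hypothetical, and the sketch only verifies that the executable runs, not which Java version is selected.

```python
import subprocess


def java_available(executable="java"):
    """Return True if `executable -version` runs and exits with status 0."""
    try:
        proc = subprocess.Popen([executable, "-version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except OSError:
        # The executable is not on PATH (or is not runnable).
        return False
    proc.communicate()
    return proc.returncode == 0
```

If this returns False inside the container, re-run update-alternatives --config java and make sure the OpenJDK 8 entry is selected.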
