A python project that classifies text and parts of speech.
This program requires the NLTK package to be installed. Follow the instructions for your machine from the following link.
NLTK is distributed under the Apache 2.0 license
Python 2.7.10 was used for development. To check your python version run the following.
$ python --version
You will also need the following NLTK models.
- averaged_perceptron_tagger
- punkt
- tagsets
To install these packages start up your python interpreter and run
>>> import nltk
>>> nltk.download()
This will open the downloader utility which you can then use to install the listed models.
This is used to generate the trained data. In particular it will generate pos_pos.txt
and neg_pos.txt
. This data is later used by the miner to find our interesting patterns.
To run the tagger...
$ python tag.py
This is used to generate the trained data. In particular it will generate positive_POS_dict.json
and negative_POS_dict_pos.json
. This data is later used by the classifier to classify test data.
To run the miner...
$ python mine.py
This is used to test the training data. In particular it will run the classifier on Test/positivetestdata/
and Test/negativetestdata/
. The results are published in testResults.txt
.
To run the classifier...
$ python classify.py
An interactive classifier. It will publish the results to testResults.txt
.
To run the interactive classifier...
$ python interactive.py
Enter a file name or a directory > Test/negativetestdata/
...
All of the testing and training data used was from the Stanford link.
It has been extracted and placed in accessible locations for our program.
- Authors: Trevor Blanchard, Brett Small, Samuel Peers, Stefan Harris
- e-mail: umblanc3@myumanitoba.ca, umsmallb@myumanitoba.ca, peerss@myumanitoba.ca, stefanjharris@gmail.com