How to run this program
Required Installation
The program uses a few auxiliary libraries: NLTK, numpy, and pandas. Install these packages with pip or conda before running the program. The csv module is also used, but it is part of the Python standard library and needs no installation.
Training Process
To train the program, pass 3 parameters in this order:
- examples - the training example file.
- hypothesisOut - the file to write the model to. You can choose any name.
- learning-type - the learning algorithm to train, either "dt" (decision tree) or "ada" (AdaBoost)
Usage: python train.py examples hypothesisOut learning-type
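As an illustration of how the three training parameters fit together, here is a minimal sketch using argparse. It is an assumption about how train.py could read its arguments, not a copy of the actual script; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    # Hypothetical parser: the real train.py may read sys.argv differently.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("examples", help="the training example file")
    parser.add_argument("hypothesisOut", help="the file to write the model to")
    parser.add_argument(
        "learning_type",
        metavar="learning-type",
        choices=["dt", "ada"],
        help="dt = decision tree, ada = AdaBoost",
    )
    return parser


# Parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(
    ["processed_data/train.txt", "adaboostclassifier.txt", "ada"]
)
print(args.examples, args.hypothesisOut, args.learning_type)
```

Restricting learning-type with choices makes argparse reject anything other than dt or ada before training starts.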
Training files are available in the processed_data directory, which contains train.txt and test.txt:
- train.txt - for training
- test.txt - for validation, to check the accuracy the classifier achieves on this file
Warning: AdaBoost takes a while to train because it builds a new decision stump in each round. Training 10 stumps may take up to a minute.
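To show why each round costs time, here is a rough sketch of AdaBoost over decision stumps. It is not the code in train.py. It assumes boolean features and labels encoded as +1 and -1 (for example +1 for English and -1 for Dutch), and the names best_stump, adaboost, and predict are made up for this example. Every round scans every feature against every example, which is where the training time goes.

```python
import math


def best_stump(rows, labels, weights):
    """Return (weighted error, feature index, polarity) of the best stump."""
    best = None
    for f in range(len(rows[0])):
        for polarity in (True, False):
            # The stump predicts +1 when the feature equals `polarity`, else -1.
            err = sum(
                w
                for row, y, w in zip(rows, labels, weights)
                if (1 if row[f] == polarity else -1) != y
            )
            if best is None or err < best[0]:
                best = (err, f, polarity)
    return best


def adaboost(rows, labels, rounds=10):
    n = len(rows)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, f, polarity = best_stump(rows, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, f, polarity))
        # Shrink the weight of correctly classified examples, grow the rest.
        weights = [
            w * math.exp(-alpha * y * (1 if row[f] == polarity else -1))
            for row, y, w in zip(rows, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    # Weighted vote of all stumps; ties go to +1.
    score = sum(
        alpha * (1 if row[f] == polarity else -1)
        for alpha, f, polarity in ensemble
    )
    return 1 if score >= 0 else -1
```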
Information: When decision tree training completes, it generates a .py file that contains the classifier written as Python if statements.
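The idea of writing a tree out as if statements can be sketched as follows. This is only an illustration under assumptions: the tree is a nested dict, the feature names contains_the and contains_ij and the labels "en" and "nl" are invented, and the real generated file may look different.

```python
def tree_to_source(node, depth=1):
    """Render a nested tree as indented Python if/else lines."""
    pad = "    " * depth
    if not isinstance(node, dict):
        # Leaf: return the predicted label.
        return f"{pad}return {node!r}\n"
    return (
        f"{pad}if features[{node['feature']!r}]:\n"
        + tree_to_source(node["yes"], depth + 1)
        + f"{pad}else:\n"
        + tree_to_source(node["no"], depth + 1)
    )


def write_classifier(tree):
    # Wrap the if/else body in a function definition.
    return "def classify(features):\n" + tree_to_source(tree)


# Hypothetical tree: feature names and labels are made up for this example.
tree = {
    "feature": "contains_the",
    "yes": "en",
    "no": {"feature": "contains_ij", "yes": "nl", "no": "en"},
}
source = write_classifier(tree)
print(source)
```

Saving source to a .py file would give a classifier that can be imported and called without the training code.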
Example Run:
Training using AdaBoost:
    python train.py processed_data/train.txt adaboostclassifier.txt ada
Training using Decision Tree:
    python train.py processed_data/train.txt decisiontreeClassifier.py dt
Prediction
To run a prediction, you need to include 2 parameters:
- hypothesis - the best classifier generated by decision tree or adaboost
- file - the test file
The program prints to the console whether it predicted English or Dutch.
Usage: python predict.py hypothesis file
Predicting Example Run:
    python predict.py dt processed_data/test1_prof.dat