JLiu1272/LanguagePredictionUsingDecisionTree

LanguagePredictionUsingDecisionTree

How to run this program

Required Installation: The program uses a few auxiliary libraries: NLTK, numpy, and pandas. It also uses csv, which ships with the Python standard library and needs no installation. Install NLTK, numpy, and pandas with pip or conda before running the program.

Training Process

To train this program, you need to include 3 parameters:

  1. examples - the training example file.
  2. hypothesisOut - the file to write the trained model to. You can choose any name.
  3. learning-type - which learning algorithm to train. It is either “dt” (DecisionTree) or “ada” (Adaboost)

Usage: python train.py <examples> <hypothesisOut> <learning-type>
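
The three training parameters could be validated along the following lines. This is a minimal sketch using the standard argparse module, not the repository's actual train.py; the `build_parser` helper and the attribute names are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the three positional training parameters.

    Hypothetical helper: the real train.py may read its arguments differently.
    """
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("examples", help="the training example file")
    parser.add_argument("hypothesisOut", help="the file to write the model to")
    parser.add_argument(
        "learning_type",
        metavar="learning-type",
        choices=["dt", "ada"],  # decision tree or AdaBoost
        help='"dt" (decision tree) or "ada" (AdaBoost)',
    )
    return parser


# Parse the same arguments as the decision tree example run in this README.
args = build_parser().parse_args(
    ["processed_data/train.txt", "decisiontreeClassifier.py", "dt"]
)
```

Restricting `learning_type` with `choices` makes the script exit with a usage message when anything other than dt or ada is passed.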

Training files are available in the processed_data directory, which contains train.txt and test.txt:

  1. train.txt - For training
  2. test.txt - For validation, i.e. checking the accuracy the trained classifier achieves on this file
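
Accuracy here means the fraction of test examples whose predicted language matches the true one. The sketch below shows that calculation only. The `accuracy` function and the label strings are hypothetical, and it makes no assumption about the format of test.txt.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for four test examples: three of four match.
predicted = ["english", "dutch", "dutch", "english"]
actual = ["english", "dutch", "english", "english"]
print(f"accuracy: {accuracy(predicted, actual):.2f}")  # prints "accuracy: 0.75"
```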

Warning: AdaBoost takes a while to train because it builds a new decision stump in every round. Training 10 stumps may take up to a minute.
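
To see where the time goes, here is a minimal sketch of AdaBoost with decision stumps over boolean features. It is a textbook-style illustration, not the code in this repository: the function names, the boolean feature tuples, and the +1/-1 label encoding (for example +1 for English, -1 for Dutch) are all assumptions. Each round scans every feature over every example to pick the stump with the lowest weighted error, which is why training cost grows with the number of rounds.

```python
import math


def stump_output(x, j, polarity):
    """A stump predicts `polarity` if feature j is True, else `-polarity`."""
    return polarity if x[j] else -polarity


def best_stump(X, y, w):
    """Return (weighted_error, feature_index, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for polarity in (1, -1):
            err = sum(
                wi for xi, yi, wi in zip(X, y, w)
                if stump_output(xi, j, polarity) != yi
            )
            if best is None or err < best[0]:
                best = (err, j, polarity)
    return best


def adaboost(X, y, rounds=10):
    """Train `rounds` stumps; return a list of (alpha, feature, polarity)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, j, polarity))
        # Misclassified examples gain weight, correct ones lose weight.
        w = [
            wi * math.exp(-alpha * yi * stump_output(xi, j, polarity))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(a * stump_output(x, j, p) for a, j, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): the label is fully determined by feature 0.
X = [(True, False), (True, True), (False, True), (False, False)]
y = [1, 1, -1, -1]
ensemble = adaboost(X, y, rounds=3)
```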

Information: For the decision tree, training generates a .py file when it completes. The file contains the learned classifier expressed as Python if statements.
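
As an illustration of what a classifier written as if statements can look like, the sketch below turns a small tree into Python source. It is not the repository's generator: the `Node` class, the feature names (`contains_de`, `contains_the`), the label strings, and the generated `classify` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """Internal node: test a boolean feature, then follow one branch."""
    feature: str
    if_true: Union["Node", str]   # subtree or leaf label
    if_false: Union["Node", str]  # subtree or leaf label


def render(tree, indent=1):
    """Return Python source lines for `tree`, indented by `indent` levels."""
    pad = "    " * indent
    if isinstance(tree, str):  # leaf: a language label
        return [f"{pad}return {tree!r}"]
    lines = [f"{pad}if features[{tree.feature!r}]:"]
    lines += render(tree.if_true, indent + 1)
    lines.append(f"{pad}else:")
    lines += render(tree.if_false, indent + 1)
    return lines


def tree_to_source(tree):
    """Wrap the rendered branches in a `classify(features)` function."""
    return "\n".join(["def classify(features):"] + render(tree)) + "\n"


# Hypothetical two-level tree over made-up word-presence features.
example_tree = Node(
    "contains_de", "dutch", Node("contains_the", "english", "dutch")
)
source = tree_to_source(example_tree)
print(source)
```

Writing `source` to a file would give an importable module, which may be why the decision tree example run in this README names its output file with a .py extension.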

Example Run:

Training using AdaBoost: python train.py processed_data/train.txt adaboostclassifier.txt ada

Training using Decision Tree: python train.py processed_data/train.txt decisiontreeClassifier.py dt

Prediction

To run a prediction, you need to include 2 parameters:

  1. hypothesis - the best classifier generated by decision tree or adaboost
  2. file - the test file

The program prints to the console whether it predicted English or Dutch.

Usage: python predict.py <hypothesis> <file>

Predicting Example Run: python predict.py dt processed_data/test1_prof.dat
