LangClassifiers-P2

All of the code was implemented in Python 3.6 with scikit-learn (sklearn) 0.19.

classifiers.py implements most of the classification methods, namely:

- 'NB': our own implementation of Naive Bayes
- 'RF': sklearn's Random Forests Classifier
- 'AB-RF': sklearn's Random Forests Classifier with Adaboost
- 'AB-SGD': sklearn's Stochastic Gradient Descent Classifier with Adaboost
- 'NBM': sklearn's Multinomial Naive Bayes
- 'MLP': sklearn's Multi-layer Perceptron

The classifier type can be selected by changing the line

CLF_type = 'NB'

to match any of the options listed above.
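Since the selection is a plain string, a typo such as 'nb' is easy to make. The sketch below shows one way to fail early on an unknown value. The names `CLASSIFIER_OPTIONS` and `check_clf_type` are hypothetical and are not taken from classifiers.py; only the option strings come from the list above.

```python
# Hypothetical guard, not part of classifiers.py.
# The keys mirror the options listed in this README.
CLASSIFIER_OPTIONS = {
    "NB": "own implementation of Naive Bayes",
    "RF": "sklearn Random Forests Classifier",
    "AB-RF": "sklearn Random Forests Classifier with Adaboost",
    "AB-SGD": "sklearn Stochastic Gradient Descent Classifier with Adaboost",
    "NBM": "sklearn Multinomial Naive Bayes",
    "MLP": "sklearn Multi-layer Perceptron",
}


def check_clf_type(clf_type):
    """Return clf_type unchanged, or raise ValueError listing the valid options."""
    if clf_type not in CLASSIFIER_OPTIONS:
        valid = ", ".join(sorted(CLASSIFIER_OPTIONS))
        raise ValueError(
            "Unknown CLF_type %r; expected one of: %s" % (clf_type, valid)
        )
    return clf_type


CLF_type = check_clf_type("NB")
print(CLF_type, "->", CLASSIFIER_OPTIONS[CLF_type])
```

The check is case-sensitive on purpose, so 'nb' is rejected rather than silently mapped to 'NB'.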

Before running the classifier, you must first compute the feature dictionary using

python featureExtractor.py

After featureExtractor.py has finished, run

python classifiers.py
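The two commands must run in that order. A small wrapper can enforce this; the sketch below is an assumption about how one might chain them and is not a script shipped with the repository. `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names.

```python
import subprocess
import sys

# Order matters: the feature dictionary must exist before classification.
PIPELINE = ["featureExtractor.py", "classifiers.py"]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one argv list per script, in execution order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in turn and stop at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so classifiers.py never starts if feature extraction failed.
        subprocess.run(argv, check=True)


for argv in build_commands():
    print(" ".join(argv))

# Uncomment when running from the folder that contains both scripts:
# run_pipeline(build_commands())
```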

Further information on the functionality of the individual components of the code can be found in each of the respective .py files as comments.

The output .csv file is generated in the same folder as classifiers.py and is named {CLASSIFIER_NAME}.csv, where CLASSIFIER_NAME is a string describing the classifier type and some of its key hyperparameters.
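The exact naming scheme lives in classifiers.py and is not reproduced here. As an illustration only, a name of this kind could be assembled as follows; the function `output_csv_path`, the separator, and the hyperparameter names `trees` and `depth` are all hypothetical.

```python
import os


def output_csv_path(clf_type, hyperparams, folder="."):
    """Build '<folder>/<CLASSIFIER_NAME>.csv' from a type tag and key hyperparameters."""
    # Sort the keys so the same settings always yield the same file name.
    parts = [clf_type] + ["%s%s" % (k, hyperparams[k]) for k in sorted(hyperparams)]
    return os.path.join(folder, "_".join(parts) + ".csv")


print(output_csv_path("RF", {"trees": 100, "depth": 20}))
```

Encoding the hyperparameters in the file name keeps results from different runs from overwriting each other.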

##############################################################################################################################################

The code below runs on Python 2.

./Naive_Bayes_Bonnie --> Please check ./Naive_Bayes_Bonnie/README for running instructions

##############################################################################################################################################

K-Nearest Neighbors

All of the code was implemented in Python 3.

Run featureExtractor.py with the desired datasets. Optional settings:
- dataset file names can be specified in trainSetXFilename, trainSetYFilename, testSetXFilename
- forceBuildNewDictionary and forceBuildNewBestDictionary are toggles for creating a new processed dataset or loading a previously made one
- run the file to make sure the dictionary with the respective name is present in the data folder

Run kNN_kmeans.py. Optional settings:
- adjust parameters (k, num_clust)
- include the name of the dictionary for vectorization as pickle_name
- sort_train and sort_test - toggles for vectorizing new datasets or loading previously processed ones
- raw_train processes the train set without the language sorting; this is required for the validation check
- by default, the script generates the test_y file for a given test_x. Set validation to True if you want to run a validation check in case your test set
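Toggles such as forceBuildNewDictionary, sort_train, and sort_test all follow a "rebuild or load from disk" pattern. The sketch below shows that pattern in general form with pickle. It is not the code used by featureExtractor.py or kNN_kmeans.py; `load_or_build`, `force_rebuild`, and the demo dictionary are hypothetical.

```python
import os
import pickle
import tempfile


def load_or_build(path, build_fn, force_rebuild=False):
    """Return the object cached at `path`, rebuilding it when forced or missing."""
    if not force_rebuild and os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    obj = build_fn()
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj


# Demo in a temporary folder so no project files are touched.
demo_path = os.path.join(tempfile.mkdtemp(), "dictionary.pkl")
first = load_or_build(demo_path, lambda: {"the": 0, "le": 1})
second = load_or_build(demo_path, lambda: {})  # cache hit: the builder is not called
print(first == second)
```

Setting the toggle to True is only needed after the input data or the extraction code changes; otherwise loading the cached file saves the preprocessing time.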

The output file is written as "submission.csv".

Validation: By default, the script generates the test_y file for a given test_x. If you require a validation check, see the commented lines.
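A validation check of this kind usually means comparing predicted labels against known labels for a held-out set. The sketch below is a generic accuracy computation under that assumption; it is not the commented-out code from kNN_kmeans.py, and `accuracy` is a hypothetical helper.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("label lists differ in length")
    if not expected:
        raise ValueError("cannot score an empty label list")
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


# Three of the four labels match.
print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```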
