StackoverflowQuestions

How to use:

Download the training and testing data from the competition and place them in a 'data' folder (create it if it does not exist).
Run scripts/deduplicate.py to remove duplicate samples from the training data set and store the indices of the repeated test samples.
Run 'python trainer.py generatePreprocess' to create the tfidf and cv models.
Run scripts/calculate_distribution.py to create the inverse tag ordering/mapping.
Run 'python trainer.py' to train and save the model.
Run 'python predictor.py' to generate predictions for the non-repeated test samples.
Run scripts/zipper.py to generate the final submission file.
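
The steps above could be chained in a small driver script so they always run in the listed order. This is only a sketch: the names `PIPELINE_STEPS`, `build_commands` and `run_pipeline` are hypothetical and not part of this repository, and it assumes every script is launched with the Python interpreter from the repository root after the data has been placed in the 'data' folder.

```python
import subprocess
import sys

# Steps in the order listed above. Launching each script through the Python
# interpreter from the repository root is an assumption, not something the
# repository documents.
PIPELINE_STEPS = [
    ["scripts/deduplicate.py"],
    ["trainer.py", "generatePreprocess"],
    ["scripts/calculate_distribution.py"],
    ["trainer.py"],
    ["predictor.py"],
    ["scripts/zipper.py"],
]


def build_commands(python=sys.executable):
    """Return the full command line for each step, in order."""
    return [[python, *step] for step in PIPELINE_STEPS]


def run_pipeline(runner=subprocess.run):
    """Run every step in order and stop at the first failing step."""
    for command in build_commands():
        print("Running:", " ".join(command))
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later steps never run on stale output.
        runner(command, check=True)
```

Calling `run_pipeline()` from the repository root would then execute all six steps. Stopping at the first failure matters here because each step reads files written by the one before it.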
