adria-p/StackoverflowQuestions


StackoverflowQuestions

Keyword extraction for the Kaggle competition described at http://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction

How to use:

  • Download the training and testing data from the competition and place them in a 'data' folder (create the folder if it does not exist).
  • Run scripts/deduplicate.py to remove duplicate samples from the training data set and store the indices of the repeated test samples.
  • Run 'python trainer.py generatePreprocess' to create the tfidf and cv models.
  • Run scripts/calculate_distribution.py to create the inverse tag ordering/mapping.
  • Run 'python trainer.py' to generate the model.
  • Run 'python predictor.py' to generate predictions on the non-repeated test samples.
  • Run scripts/zipper.py to generate the final submission file.
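
The steps above can be chained in a small driver script. The following is only a sketch: it assumes every script is launched with the Python interpreter from the repository root, and the names build_commands, run_pipeline and the --run flag are hypothetical helpers that are not part of this repository. By default it only prints the commands.

```python
import subprocess
import sys

# Steps in the order listed above. Launching the helper scripts with the
# Python interpreter from the repository root is an assumption.
STEPS = [
    ["scripts/deduplicate.py"],
    ["trainer.py", "generatePreprocess"],
    ["scripts/calculate_distribution.py"],
    ["trainer.py"],
    ["predictor.py"],
    ["scripts/zipper.py"],
]


def build_commands(python=sys.executable):
    """Return the full argument list for each step (hypothetical helper)."""
    return [[python, *step] for step in STEPS]


def run_pipeline(dry_run=True):
    """Print each command and execute it unless dry_run is True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only execute the steps when --run is given; otherwise just list them.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Stopping at the first failing step matters here because each stage presumably reads files written by the previous one.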
