word sense induction
data
src
README.md
dom4j-2.0.0-ALPHA-2.jar
stanford-corenlp-2012-07-09.jar

README.md

Author: Do Kook Choe

This code is used for experiments described in "Naive Bayes Word Sense Induction."
You can download the SemEval 2010 Word Sense Induction task dataset at: http://www.cs.york.ac.uk/semeval2010_WSI/datasets.html.

USAGE:

  1. cd src/
  2. ./compile.h
  3. ./run.h (with appropriate arguments)

DESCRIPTIONS OF FILES
in src:

  1. *.java are source files.
  2. compile.h compiles source files.
  3. run.h executes Experiment.class.

in data:

  1. smart_common_words.txt contains a list of stopwords from the SMART IR system.
  2. punctuation.txt contains a list of punctuation marks.
  3. nouns.txt and verbs.txt contain lists of target nouns and verbs, respectively. These files are needed to execute Experiment.
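
As an illustration of how word lists like these are typically used, the following is a minimal sketch (not the code in src/) that drops stopwords and punctuation from a tokenized context. The class name ContextFilter and its methods are hypothetical, and it assumes each data file holds one entry per line; in practice the lines could be read with Files.readAllLines from a path such as ../data/smart_common_words.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of this repository.
public class ContextFilter {

    // Build a lookup set from the lines of a word-list file.
    // Assumes one entry per line; blank lines are skipped.
    static Set<String> toWordSet(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String w = line.trim().toLowerCase();
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Keep only tokens that are neither stopwords nor punctuation.
    // Tokens are lowercased before lookup and returned lowercased.
    static List<String> filter(List<String> tokens,
                               Set<String> stopwords,
                               Set<String> punctuation) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String t = token.toLowerCase();
            if (!stopwords.contains(t) && !punctuation.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for smart_common_words.txt and punctuation.txt.
        Set<String> stopwords = toWordSet(Arrays.asList("the", "of", "a"));
        Set<String> punctuation = toWordSet(Arrays.asList(",", "."));

        List<String> tokens = Arrays.asList("The", "bank", "of", "a", "river", ".");
        System.out.println(filter(tokens, stopwords, punctuation));
    }
}
```

Hash sets keep each lookup constant-time, which matters when the same lists are applied to every context in the dataset.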

jars:

  1. dom4j.jar is used to parse XML input. It can be downloaded from http://dom4j.sourceforge.net/.
  2. stanford-corenlp-2012-07-09.jar is used to tokenize sentences and lemmatize words. It can be downloaded from http://nlp.stanford.edu/.