Simon Šuster and Gertjan van Noord (2014) From neighborhood to parenthood: the advantages of dependency representation over bigrams in Brown clustering. COLING.
The following clusters were induced with dep-brown-cluster. If you use dependency Brown clusters (DepBrown), please cite the above paper.
- 1000 clusters, frequency cutoff 3: DepBrown, Brown
- 1000 clusters, frequency cutoff 5: DepBrown, Brown
- 1000 clusters, frequency cutoff 10: DepBrown, Brown
- 1000 clusters, frequency cutoff 20: DepBrown, Brown
- 1000 clusters, frequency cutoff 30: DepBrown, Brown, DepBrownHTML
- 1000 clusters, frequency cutoff 50: DepBrown, Brown
- 3200 clusters, frequency cutoff 50: DepBrown, Brown
- 200 clusters, frequency cutoff 10: DepBrown, Brown
- 400 clusters, frequency cutoff 10: DepBrown, Brown
- 600 clusters, frequency cutoff 10: DepBrown, Brown
- 800 clusters, frequency cutoff 10: DepBrown, Brown
- 1000 clusters, frequency cutoff 1: DepBrown, Brown
- 1000 clusters, frequency cutoff 3: DepBrown, Brown
- 1000 clusters, frequency cutoff 50: DepBrown, Brown, EnDepBrownHTML
- 3200 clusters, frequency cutoff 3: DepBrown, Brown
- 3200 clusters, frequency cutoff 50: DepBrown, Brown
1000 clusters, frequency cutoff 3.
- 10k sentences: DepBrown, Brown
- 50k sentences: DepBrown, Brown
- 100k sentences: DepBrown, Brown
- 200k sentences: DepBrown, Brown
- 400k sentences: DepBrown, Brown
- 600k sentences: DepBrown, Brown
- 800k sentences: DepBrown, Brown
- 1000k sentences: DepBrown, Brown
- 1200k sentences: DepBrown, Brown
- 1400k sentences: DepBrown, Brown
- 1600k sentences: DepBrown, Brown
- 1800k sentences: DepBrown, Brown
- 2000k sentences: DepBrown, Brown
- 2200k sentences: DepBrown, Brown
- 2400k sentences: DepBrown, Brown
- 2600k sentences: DepBrown, Brown
1000 clusters, frequency cutoff 3.
1st order:
2nd order:
Clusters were induced on the data sample from the SoNaR corpus: sentence ids.
To obtain the root forms of words, use the Alpino lexical analyzer with the following setting:
cat text | Alpino -notk batch_command=lex_all
Here, text is the file containing the sentences extracted from the SoNaR corpus based on the given sentence ids.
Dependency tuples are generated by Alpino in this way:
cat sentence_ids | xargs Alpino -treebank_dep_features
Obtaining/removing second-order dependency tuples:
Simply query the tuples for lines beginning with hdpp, which stands for a 2nd-order relation (dep35 denotes a 1st-order relation).
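The prefix query described above can be sketched with grep. This is only an illustration: the helper names keep_second_order and drop_second_order are hypothetical, and the sample lines are placeholders rather than real Alpino output. The only assumption about the tuple file is that each tuple sits on its own line and that 2nd-order tuples start with hdpp.

```shell
# Hypothetical helpers; assumes one tuple per line, with 2nd-order
# tuples identified solely by the line prefix "hdpp".
keep_second_order() { grep '^hdpp' "$1"; }     # obtain 2nd-order tuples
drop_second_order() { grep -v '^hdpp' "$1"; }  # remove 2nd-order tuples

# Placeholder input (NOT real Alpino output), only to exercise the filters.
tuples=$(mktemp)
printf '%s\n' 'hdpp placeholder-a' 'dep35 placeholder-b' 'hdpp placeholder-c' > "$tuples"

second_order=$(keep_second_order "$tuples")
without_second_order=$(drop_second_order "$tuples")

printf '%s\n' "$second_order"
printf '%s\n' "$without_second_order"
rm -f "$tuples"
```

On a real tuple file, redirect the output of either helper to a new file and use that file as clustering input.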
See the full list of results per dependency relation.
Further experimental details and evaluation scripts are available from the authors upon request.