rug-compling/dep-brown-data

Dependency Brown clustering

Data and experimental details for:

Simon Šuster and Gertjan van Noord (2014) From neighborhood to parenthood: the advantages of dependency representation over bigrams in Brown clustering. COLING.

The following clusters were induced with dep-brown-cluster. If you use dependency Brown clusters (DepBrown), please cite the above paper.

Standard and dependency Brown clusters for Dutch

Standard and dependency Brown clusters for English (not in the paper)

Standard and dependency Brown clusters for Dutch: varying amount of data

1000 clusters, frequency cutoff 3.

Relation-specific dependency Brown clusters for Dutch

1000 clusters, frequency cutoff 3.

1st order:

2nd order:

Input data

Clusters were induced on a data sample from the SoNaR corpus, specified by a list of sentence ids.

Data preparation

To obtain the root forms of words, use the Alpino lexical analyzer with the following setting:

cat text | Alpino -notk batch_command=lex_all

Here, text contains the sentences extracted from the SoNaR corpus based on the given sentence ids.

Dependency tuples are generated by Alpino in this way:

cat sentence_ids | xargs Alpino -treebank_dep_features 

Obtaining/removing second-order dependency tuples:

Filter the tuple lines by their prefix: lines beginning with hdpp are 2nd-order relations, and lines beginning with dep35 are 1st-order relations.
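The prefix filtering described above can be sketched with grep. This is a minimal sketch, not the authors' script: the function names are hypothetical, and it assumes one tuple per line with the hdpp or dep35 marker at the very start of the line.

```shell
#!/bin/sh
# Hypothetical helpers for splitting Alpino dependency tuples by order.
# Assumption: one tuple per line, with the relation marker
# (hdpp = 2nd order, dep35 = 1st order) at the start of the line.

# Print only the 2nd-order tuples from the file given as $1.
second_order() {
    grep '^hdpp' "$1"
}

# Print every line except the 2nd-order tuples.
without_second_order() {
    grep -v '^hdpp' "$1"
}

# Print only the 1st-order tuples.
first_order() {
    grep '^dep35' "$1"
}
```

For example, with the tuples stored in a file called tuples.txt (a placeholder name), without_second_order tuples.txt > tuples.1st.txt would write a copy with the 2nd-order tuples removed.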

Dependency-relation selection results

See the full list of results per dependency relation.

Questions?

Further experimental details and evaluation scripts are available from the authors upon request.
