QualsParsing
Writeup: http://arxiv.org/abs/1211.0074
Slides: http://tinyurl.com/alexr-quals-volume-1
Do something cool with shift-reduce dependency parsing.
Use this data here:
http://ilk.uvt.nl/conll/free_data.html
For transition-based (shift-reduce) dependency parsing, what are the learning curves for different machine learning algorithms? What does this say about them, if anything? We want to produce a table like...
|| Bulgarian LAS || 10k sentences || 20k sentences || 1 million sentences ||
|| libsvm || 0.5 || 0.6 || 0.7 ||
|| weka j48 || 0.4 || 0.4 || 0.9 ||
... or whatever.
Think about training set error/accuracy too!
Hook up TiMBL to Weka.
Use MaltParser, which already includes plugins to wrap around other machine learning algorithms. It's just not that hard to add new classes that implement that same interface, is it?
I'm in a hurry, but before I forget it again, a suggestion for a research question on dep. parsing: what if you took Malt, or Rehj's re-implementation, and tried to plug in other machine learners to compare how much influence the ML component has on the results? E.g., how the different learners react to different sizes of the training set, how much parameter tuning they need, and how they influence parsing speed. This would definitely be interesting to know :)
Related links...
http://maltparser.org/conll/conll07/
http://maltparser.org/conll/conllx/
They carefully broke all the old links, but here's the new page: http://depparse.uvt.nl/
http://researchweb.iiit.ac.in/~samar/data/Kolachinaetal-cameraready.pdf
http://www.lrec-conf.org/proceedings/lrec2006/pdf/162_pdf.pdf
http://acl.ldc.upenn.edu/D/D07/D07-1097.pdf
http://w3.msi.vxu.se/~nivre/papers/nivre_hall_2005.pdf
http://acl.ldc.upenn.edu/W/W04/W04-2407.pdf
http://www.maltparser.org/guides/opt/quick-opt.pdf
http://www.ryanmcd.com/courses/esslli2007/
http://www.sussex.ac.uk/Users/davidw/courses/le/resources/seminars/deppar/questions.pdf
http://stp.lingfil.uu.se/~nivre/docs/acl10.pdf
http://stp.ling.uu.se/~nivre/gslt/haulrich.pdf
http://aclweb.org/anthology/P/P10/P10-3010.pdf
http://www.aclweb.org/anthology/P/P11/P11-1068.pdf
http://www.evalita.it/sites/evalita.fbk.eu/files/working_notes2011/Parsing/DEP_PARS_UNIPI.pdf
http://www.ryanmcd.com/papers/multiclustNAACL2012.pdf
http://www.hall.maltparser.org/cv/pub/msireport06050_johan_hall_lic_final.pdf
http://www.hall.maltparser.org/cv/pub/johan_hall_phdthesis.pdf
http://www.maltparser.org/publications.html
http://www.maltparser.org/conll/conllx/
http://w3.msi.vxu.se/users/jha/conllx/
http://www.maltparser.org/conll/conll07/
http://en.wikipedia.org/wiki/Multinomial_logit
http://www.rulequest.com/Personal/
Implement my own shift-reduce parser in Scala. Made non-trivial progress on this, but it seems kind of tedious: there are lots of sub-tasks, and while they're good practice for programming Scala and will get me really familiar with the task, maybe it's too much to do in a month, considering how late it's already getting? (Also, using MaltParser is more reproducible: people already know and like it, so why rewrite?)
Subparts:
- convert it into a Nivre arc-eager parser. (Open question: is this the best parsing algorithm? The right answer is probably to look at what was used for the 2006 and 2007 shared tasks...)
- load up sentences from treebank
- the format is pretty straightforward
- convert from loaded sentences to a series of configurations, so we can produce classifier training samples
- plug in Weka and train those classifiers.
- run the classifiers; produce parses
- calculate LAS (labeled attachment score) and UAS (unlabeled attachment score).
- plot the learning curves for each language, for each type of classifier. Also consider different features for different classifiers.
- http://stp.ling.uu.se/~nivre/gslt/haulrich.pdf
- http://www.cs.waikato.ac.nz/~eibe/pubs/guetlein_et_al.pdf
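For the "load up sentences from treebank" step, a minimal sketch of a CoNLL-X reader in Scala. The `Token` case class and `parseConllX` are my names, not from any existing library; the format itself is the CoNLL-X one (ten tab-separated columns per token, blank line between sentences).

```scala
// Minimal CoNLL-X reader (sketch). Each token line has 10 tab-separated
// fields: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD,
// PDEPREL. Sentences are separated by blank lines. We keep only the
// fields the parser needs.
case class Token(id: Int, form: String, cpos: String, pos: String,
                 head: Int, deprel: String)

def parseConllX(lines: Iterator[String]): List[List[Token]] = {
  val sentences = scala.collection.mutable.ListBuffer[List[Token]]()
  val current = scala.collection.mutable.ListBuffer[Token]()
  for (line <- lines) {
    val trimmed = line.trim
    if (trimmed.isEmpty) {
      // Blank line: close off the current sentence, if any.
      if (current.nonEmpty) { sentences += current.toList; current.clear() }
    } else {
      val f = trimmed.split("\t")
      current += Token(f(0).toInt, f(1), f(3), f(4), f(6).toInt, f(7))
    }
  }
  if (current.nonEmpty) sentences += current.toList
  sentences.toList
}
```

The format really is "pretty straightforward", as noted above; the only care needed is the blank-line sentence boundary and the 1-based token ids (HEAD 0 means the artificial root).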
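For the arc-eager conversion and the "sentences to configurations" step, a sketch of the Nivre arc-eager transition system plus a static oracle for projective gold trees. All names here (`Config`, `step`, `oracle`, `replay`) are mine, not MaltParser's; the transitions and preconditions follow Nivre's description. Each (configuration, transition) pair the oracle emits is one classifier training sample.

```scala
// Arc-eager transitions (sketch).
sealed trait Transition
case object Shift extends Transition
case object Reduce extends Transition
case class LeftArc(label: String) extends Transition
case class RightArc(label: String) extends Transition

// A configuration: stack (top first), remaining buffer, and the arcs
// built so far as a map dependent -> (head, label). Id 0 is the root.
case class Config(stack: List[Int], buffer: List[Int],
                  arcs: Map[Int, (Int, String)])

def step(c: Config, t: Transition): Config = t match {
  case Shift       => Config(c.buffer.head :: c.stack, c.buffer.tail, c.arcs)
  case Reduce      => Config(c.stack.tail, c.buffer, c.arcs)
  case LeftArc(l)  => // stack top gets buffer front as head, is popped
    Config(c.stack.tail, c.buffer, c.arcs + (c.stack.head -> (c.buffer.head, l)))
  case RightArc(l) => // buffer front gets stack top as head, is pushed
    Config(c.buffer.head :: c.stack, c.buffer.tail, c.arcs + (c.buffer.head -> (c.stack.head, l)))
}

// Static oracle for a projective gold tree: pick the transition the gold
// heads/labels dictate in the current configuration.
def oracle(c: Config, goldHead: Map[Int, Int], goldLabel: Map[Int, String]): Transition = {
  val s = c.stack.head
  val b = c.buffer.head
  if (s != 0 && goldHead(s) == b) LeftArc(goldLabel(s))
  else if (goldHead(b) == s) RightArc(goldLabel(b))
  // Reduce only if s already has its head and b still has business with
  // something deeper in the stack.
  else if (c.arcs.contains(s) &&
           c.stack.tail.exists(k => goldHead(b) == k || (k != 0 && goldHead(k) == b)))
    Reduce
  else Shift
}

// Replay the oracle over a sentence of n tokens, returning the arcs built.
def replay(n: Int, goldHead: Map[Int, Int], goldLabel: Map[Int, String]): Map[Int, (Int, String)] = {
  var c = Config(List(0), (1 to n).toList, Map.empty)
  while (c.buffer.nonEmpty) c = step(c, oracle(c, goldHead, goldLabel))
  c.arcs
}
```

At training time you would record features of each configuration together with the oracle's transition; at parsing time the trained classifier replaces `oracle`.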
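And for the evaluation step, the LAS/UAS calculation itself is tiny. A sketch (the function name `attachmentScores` is mine): each token contributes one (head, label) pair, UAS is the fraction of tokens with the correct head, LAS additionally requires the correct label.

```scala
// Attachment scores (sketch): pred and gold are per-token (head, label)
// pairs, aligned by position. Returns (UAS, LAS).
def attachmentScores(pred: Seq[(Int, String)],
                     gold: Seq[(Int, String)]): (Double, Double) = {
  require(pred.length == gold.length && gold.nonEmpty)
  val n = gold.length.toDouble
  // UAS: head correct, label ignored.
  val uas = pred.zip(gold).count { case ((ph, _), (gh, _)) => ph == gh } / n
  // LAS: head and label both correct.
  val las = pred.zip(gold).count { case (p, g) => p == g } / n
  (uas, las)
}
```

For the learning-curve tables above, this would be computed per language and per training-set size; note the CoNLL shared tasks also have conventions about whether punctuation tokens are scored, which is worth checking before comparing against published numbers.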
CategoryQuals