QualsParsing
Writeup: http://arxiv.org/abs/1211.0074
Slides: http://tinyurl.com/alexr-quals-volume-1
Do something cool with shift-reduce dependency parsing.
Use this data here:
http://ilk.uvt.nl/conll/free_data.html
For transition-based (shift-reduce) dependency parsing, what are the learning curves for different machine learning algorithms? What does this say about them, if anything? We want to produce a table like...
|| Bulgarian LAS || 10k sentences || 20k sentences || 1 million sentences ||
|| libsvm || 0.5 || 0.6 || 0.7 ||
|| weka j48 || 0.4 || 0.4 || 0.9 ||
... or whatever.
Think about training set error/accuracy too!
Hook up TiMBL to Weka.
Use MaltParser, which already includes plugins to wrap around other machine learning algorithms. It's just not that hard to add new classes that implement that same interface, is it?
I'm in a hurry, but before I forget it again, a suggestion for a research question on dep. parsing: what if you took Malt, or Rehj's re-implementation, and tried to plug in other machine learners to compare how much influence the ML component has on the results? E.g., how the different learners react to different sizes of the training set, how much parameter tuning they need, and how they influence parsing speed. This would definitely be interesting to know :)
Related links...
http://maltparser.org/conll/conll07/
http://maltparser.org/conll/conllx/
They carefully broke all the old links, but here's the new page: http://depparse.uvt.nl/
http://researchweb.iiit.ac.in/~samar/data/Kolachinaetal-cameraready.pdf
http://www.lrec-conf.org/proceedings/lrec2006/pdf/162_pdf.pdf
http://acl.ldc.upenn.edu/D/D07/D07-1097.pdf
http://w3.msi.vxu.se/~nivre/papers/nivre_hall_2005.pdf
http://acl.ldc.upenn.edu/W/W04/W04-2407.pdf
http://www.maltparser.org/guides/opt/quick-opt.pdf
http://www.ryanmcd.com/courses/esslli2007/
http://www.sussex.ac.uk/Users/davidw/courses/le/resources/seminars/deppar/questions.pdf
http://stp.lingfil.uu.se/~nivre/docs/acl10.pdf
http://stp.ling.uu.se/~nivre/gslt/haulrich.pdf
http://aclweb.org/anthology/P/P10/P10-3010.pdf
http://www.aclweb.org/anthology/P/P11/P11-1068.pdf
http://www.evalita.it/sites/evalita.fbk.eu/files/working_notes2011/Parsing/DEP_PARS_UNIPI.pdf
http://www.ryanmcd.com/papers/multiclustNAACL2012.pdf
http://www.hall.maltparser.org/cv/pub/msireport06050_johan_hall_lic_final.pdf
http://www.hall.maltparser.org/cv/pub/johan_hall_phdthesis.pdf
http://www.maltparser.org/publications.html
http://www.maltparser.org/conll/conllx/
http://w3.msi.vxu.se/users/jha/conllx/
http://www.maltparser.org/conll/conll07/
http://en.wikipedia.org/wiki/Multinomial_logit
http://www.rulequest.com/Personal/
Implement my own shift-reduce parser in Scala. Made non-trivial progress on this, but it seems kind of tedious: there are lots of sub-tasks, and while they're good practice for programming Scala and will get me really familiar with the task, maybe it's too much to do in a month, considering how late it's already getting? (Also, using MaltParser is more reproducible: people already know and like it, so why rewrite?)
Subparts:
- convert it into a Nivre arc-eager parser. (Open question: is this the best parsing algorithm? The right answer is probably to look at what was used for the 2006 and 2007 shared tasks...)
- load up sentences from treebank
- the format is pretty straightforward
- convert from loaded sentences to a series of configurations, so we can produce classifier training samples
- plug in Weka and train those classifiers.
- run the classifiers; produce parses
- calculate LAS (labeled attachment score) and UAS (unlabeled attachment score).
- plot the learning curves for each language, for each type of classifier. Also consider different features for different classifiers.
- http://stp.ling.uu.se/~nivre/gslt/haulrich.pdf
- http://www.cs.waikato.ac.nz/~eibe/pubs/guetlein_et_al.pdf
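For the "load up sentences from treebank" step, a minimal sketch of a CoNLL-X reader in Scala. The `Token` case class and `parseConllX` are my names, not from any existing library; the format itself is the CoNLL-X one (ten tab-separated columns per token, blank line between sentences).

```scala
// Minimal CoNLL-X reader (sketch). Each token line has 10 tab-separated
// fields: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD,
// PDEPREL. Sentences are separated by blank lines. We keep only the
// fields the parser needs.
case class Token(id: Int, form: String, cpos: String, pos: String,
                 head: Int, deprel: String)

def parseConllX(lines: Iterator[String]): List[List[Token]] = {
  val sentences = scala.collection.mutable.ListBuffer[List[Token]]()
  val current = scala.collection.mutable.ListBuffer[Token]()
  for (line <- lines) {
    val trimmed = line.trim
    if (trimmed.isEmpty) {
      // Blank line: close off the current sentence, if any.
      if (current.nonEmpty) { sentences += current.toList; current.clear() }
    } else {
      val f = trimmed.split("\t")
      current += Token(f(0).toInt, f(1), f(3), f(4), f(6).toInt, f(7))
    }
  }
  if (current.nonEmpty) sentences += current.toList
  sentences.toList
}
```

The format really is "pretty straightforward", as noted above; the only care needed is the blank-line sentence boundary and the 1-based token ids (HEAD 0 means the artificial root).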
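For the arc-eager conversion and the "sentences to configurations" step, a sketch of the Nivre arc-eager transition system plus a static oracle for projective gold trees. All names here (`Config`, `step`, `oracle`, `replay`) are mine, not MaltParser's; the transitions and preconditions follow Nivre's description. Each (configuration, transition) pair the oracle emits is one classifier training sample.

```scala
// Arc-eager transitions (sketch).
sealed trait Transition
case object Shift extends Transition
case object Reduce extends Transition
case class LeftArc(label: String) extends Transition
case class RightArc(label: String) extends Transition

// A configuration: stack (top first), remaining buffer, and the arcs
// built so far as a map dependent -> (head, label). Id 0 is the root.
case class Config(stack: List[Int], buffer: List[Int],
                  arcs: Map[Int, (Int, String)])

def step(c: Config, t: Transition): Config = t match {
  case Shift       => Config(c.buffer.head :: c.stack, c.buffer.tail, c.arcs)
  case Reduce      => Config(c.stack.tail, c.buffer, c.arcs)
  case LeftArc(l)  => // stack top gets buffer front as head, is popped
    Config(c.stack.tail, c.buffer, c.arcs + (c.stack.head -> (c.buffer.head, l)))
  case RightArc(l) => // buffer front gets stack top as head, is pushed
    Config(c.buffer.head :: c.stack, c.buffer.tail, c.arcs + (c.buffer.head -> (c.stack.head, l)))
}

// Static oracle for a projective gold tree: pick the transition the gold
// heads/labels dictate in the current configuration.
def oracle(c: Config, goldHead: Map[Int, Int], goldLabel: Map[Int, String]): Transition = {
  val s = c.stack.head
  val b = c.buffer.head
  if (s != 0 && goldHead(s) == b) LeftArc(goldLabel(s))
  else if (goldHead(b) == s) RightArc(goldLabel(b))
  // Reduce only if s already has its head and b still has business with
  // something deeper in the stack.
  else if (c.arcs.contains(s) &&
           c.stack.tail.exists(k => goldHead(b) == k || (k != 0 && goldHead(k) == b)))
    Reduce
  else Shift
}

// Replay the oracle over a sentence of n tokens, returning the arcs built.
def replay(n: Int, goldHead: Map[Int, Int], goldLabel: Map[Int, String]): Map[Int, (Int, String)] = {
  var c = Config(List(0), (1 to n).toList, Map.empty)
  while (c.buffer.nonEmpty) c = step(c, oracle(c, goldHead, goldLabel))
  c.arcs
}
```

At training time you would record features of each configuration together with the oracle's transition; at parsing time the trained classifier replaces `oracle`.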
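And for the evaluation step, the LAS/UAS calculation itself is tiny. A sketch (the function name `attachmentScores` is mine): each token contributes one (head, label) pair, UAS is the fraction of tokens with the correct head, LAS additionally requires the correct label.

```scala
// Attachment scores (sketch): pred and gold are per-token (head, label)
// pairs, aligned by position. Returns (UAS, LAS).
def attachmentScores(pred: Seq[(Int, String)],
                     gold: Seq[(Int, String)]): (Double, Double) = {
  require(pred.length == gold.length && gold.nonEmpty)
  val n = gold.length.toDouble
  // UAS: head correct, label ignored.
  val uas = pred.zip(gold).count { case ((ph, _), (gh, _)) => ph == gh } / n
  // LAS: head and label both correct.
  val las = pred.zip(gold).count { case (p, g) => p == g } / n
  (uas, las)
}
```

For the learning-curve tables above, this would be computed per language and per training-set size; note the CoNLL shared tasks also have conventions about whether punctuation tokens are scored, which is worth checking before comparing against published numbers.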
CategoryQuals