
BerlinFrontiers

anonymous edited this page Oct 9, 2011 · 39 revisions

Discussion: Pushing the Frontiers, Strategies for Dissemination

Expanding our research to new languages and applications and setting new research goals

Moderator: HansUszoreit; Scribe: ValiaKordoni

slides

Shared tasks and resources:

- common benchmark for base coverage: parallel corpora treebanks for the participating languages

- shared tasks for HPSG processing: a. abstract processing exercises, b. processing with respect to concrete applications

- shared tasks for applications

- cross-framework evaluations

Applications:

- information management (relation extraction, incl. event and opinion detection)

- machine translation (in combination with other checking methods)

- grammar checking (in combination with other checking methods)

- dialogue systems (e.g., for web agents and computer games)

- Others?

Steps towards applications:

HU: generation; exploiting application semantics to get at the meaning that applications need.

AC: we should work on resource semantics for different applications.

Steps towards shared tasks and resources:

Shared corpora:

- Europarl
- parallel corpora based on tourist brochures, guides, etc., which have already been translated into many languages but which will also have to be translated into many more
  - AC: we should start by setting up the necessary machinery, even with smaller treebanks, even of different kinds of texts
  - SO: a single coherent corpus
  - DF: we collect the corpus by picking parts/sentences from different kinds of texts for the various participating languages/grammars

HU's proposed strategy, to be adopted for the near future:

a. collect the languages which will participate;
b. identify the people/sites who will be responsible for finding the corpus;
c. choose texts which are not too highly marked stylistically and have been translated into many other languages: city/region descriptions, the cathedral essay (on Francis' suggestion, translated into all the languages we are working on, approx. 800 sentences), novels, Linux/technical documentation, everything;
d. draw up the language matrix and see whether there would still be gaps.

Tasks:

- languages: en (Stanford/Oslo), no (Trondheim), pt (Lisbon), es/ca (Barcelona), ja (Kyoto), de (Saarbrücken), el (Saarbrücken/Athens), sv (Linköping), fr (Toulouse), zh (Saarbrücken), ko (Seoul?);
- Saarbrücken builds the Wiki page by the 1st of September;
- the groups mentioned above submit the texts and translations to the Wiki page by mid-October, accompanying them, in a prose field, with a short description of how the text's various translations correspond to each other;
- the guidelines Wiki subpage is to be created by Oslo.
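
The language matrix in step d. could be kept as a simple table of texts against languages. The following is only a minimal sketch in Python, not something agreed at the meeting: the text names, the `submitted` data, and the helper functions `coverage_matrix` and `find_gaps` are all hypothetical and serve only to show how remaining gaps might be listed.

```python
def coverage_matrix(available, languages):
    """Map each text to a {language: bool} row: is a translation recorded?"""
    return {
        text: {lang: lang in langs for lang in languages}
        for text, langs in available.items()
    }


def find_gaps(matrix):
    """Return sorted (text, language) pairs with no recorded translation."""
    return sorted(
        (text, lang)
        for text, row in matrix.items()
        for lang, present in row.items()
        if not present
    )


# Hypothetical example data: translations submitted so far for two texts.
submitted = {
    "cathedral_essay": {"en", "no", "de", "ja"},
    "city_description": {"en", "de"},
}

# A small subset of the participating languages, for illustration only.
matrix = coverage_matrix(submitted, ["en", "no", "de", "ja"])
gaps = find_gaps(matrix)

for text, lang in gaps:
    print(f"missing: {text} [{lang}]")
```

Each group could extend a table like this as texts are submitted, so that the remaining gaps are visible before the corpus is fixed.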