
A Multilingual Neural RDF Verbalizer #31

Closed
DiegoMoussallem opened this issue Jan 10, 2019 · 16 comments
Labels
gsoc-2019 Google Summer of Code 2019. project-idea This idea has not been accepted yet as a project.

Comments

@DiegoMoussallem
Contributor

Description:

Natural Language Generation (NLG) is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). Despite community agreement on the actual text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A wide range of inputs has been used for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014), semantic representations (Theune et al., 2001) and Semantic Web (SW) data (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014). Presently, the generation of natural language from the SW, more precisely from RDF data, has gained substantial attention (Bouayad-Agha et al., 2014; Staykova, 2014). Several challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016). Moreover, RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). However, English is the only language that has been widely targeted. Even though there are studies which explore the generation of content in languages other than English, to the best of our knowledge, no work has been proposed to train a multilingual neural model for generating texts in different languages from RDF data.

Goals:

In this GSoC project, the candidate is expected to train a multilingual neural model capable of generating natural language sentences from DBpedia RDF triples.
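To make the task concrete, here is a minimal sketch of how a DBpedia triple might be linearized into a source sequence for a multilingual sequence-to-sequence model, with a target-language prefix token in the style of multilingual NMT systems. The token format (`<2en>`, `<S>`, `<P>`, `<O>`) is an assumption for illustration, not part of this proposal.

```python
# Hypothetical sketch: linearizing an RDF triple <s, p, o> into a flat
# token sequence. The desired output language is signaled with a prefix
# token, a common trick in multilingual seq2seq models.

def linearize_triple(subj, pred, obj, target_lang):
    """Turn a triple into a flat source sequence for a neural verbalizer."""
    return f"<2{target_lang}> <S> {subj} <P> {pred} <O> {obj}"

# The same triple can then be paired with references in several languages:
print(linearize_triple("Albert_Einstein", "birthPlace", "Ulm", "en"))
print(linearize_triple("Albert_Einstein", "birthPlace", "Ulm", "de"))
```

A single encoder-decoder trained on such sequences, paired with reference sentences in each language, is one plausible way to share parameters across languages.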

Impact:

The project may allow users to automatically generate, from triples, short summaries about entities that do not have a human-written abstract.

Warm-up tasks:

Mentors

Diego Moussallem

Keywords

NLG, Semantic Web, NLP

@mommi84 mommi84 added project-idea This idea has not been accepted yet as a project. gsoc-2019 Google Summer of Code 2019. labels Jan 11, 2019
@SilentFlame

This problem statement really interests me. I'll read through the papers mentioned and get back with questions if I get stuck.

@DwaraknathT

This problem is similar to what I am working on right now, and it's very interesting. I have created a repo and will upload summaries of the papers mentioned here, along with my own ideas for improvements; please go through them and give feedback. Also, porting the NeuralREG code to PyTorch might be really helpful, and PyText is a boon for data preprocessing.
https://github.com/DwaraknathT/NLG-

@DiegoMoussallem
Contributor Author

@DwaraknathT I have been following your summaries; let me know if you need any help. We can have a talk as well. Looking forward to the next summaries.

@DiegoMoussallem
Contributor Author

@SilentFlame How about you? How is it going with the papers?

@DwaraknathT

DwaraknathT commented Jan 31, 2019

@DiegoMoussallem Thank you, I appreciate any comments or suggestions you have for me. Right now, I'm trying to reimplement the NeuralREG code with Transformers, to see how much gain we might be able to get. The fundamental algorithm is the same; it was mainly a way to understand the code properly. May I contact you on Slack for further discussion?

@DiegoMoussallem
Contributor Author

Of course, you may contact me. I got your point about trying the Transformer, but it might not be necessary at the moment.

@aditya-malte

aditya-malte commented Mar 10, 2019

I find this problem interesting. We could improve on the state of the art by implementing a Transformer architecture (Vaswani et al.) for these same papers.

@DiegoMoussallem
Contributor Author

Hi @aditya-malte It sounds good. Could you send me a message with the details?

@ovshake

ovshake commented Mar 31, 2019

Hi @DiegoMoussallem! I feel this problem statement aligns well with my current research interests, and I would love to work on it. I agree with @aditya-malte that a Transformer architecture should show improvement, but I feel the extra computational complexity it introduces should be compensated for by using efficient attention models, such as the lightweight and dynamic convolutions mentioned here. Vanilla LSTMs could also be swapped for Quasi-RNNs. This would allow us to generate even longer and more descriptive sentences. I would love to work on this if it is not already taken.

@DiegoMoussallem
Contributor Author

@ovshake Thanks for your interest in this project. Your plan sounds good; I am looking forward to seeing your proposal. Feel free to contact me.

@ovshake

ovshake commented Apr 2, 2019

@DiegoMoussallem Does "multilingual" mean that the model should generate the referring expression in a language other than the language in which the RDF is given, or that the model should handle input RDF in any language (other than the one it was trained on) and produce the referring expression in that same language?

@DiegoMoussallem
Contributor Author

Hi @ovshake, "multilingual" means being able to generate a natural language sentence in multiple languages from a given RDF input <s,p,o>. It is not only about referring expression generation; it must also include a complete verbalization, i.e., with articles, verbs, etc.
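To illustrate the distinction above, here is a toy template sketch (deliberately not the neural model) showing what a "complete verbalization" of one triple looks like in two languages, with articles and verbs, as opposed to a bare referring expression. The templates and predicate coverage are invented for illustration only.

```python
# Toy illustration: full verbalization of an <s, p, o> triple, with
# function words, in two target languages. A neural verbalizer would
# learn this mapping instead of using hand-written templates.

TEMPLATES = {
    "en": "{s} was born in {o}.",
    "de": "{s} wurde in {o} geboren.",
}

def verbalize(subj, pred, obj, lang):
    # Only the birthPlace predicate is covered in this sketch.
    assert pred == "birthPlace"
    return TEMPLATES[lang].format(s=subj.replace("_", " "), o=obj)

print(verbalize("Albert_Einstein", "birthPlace", "Ulm", "en"))
# -> Albert Einstein was born in Ulm.
print(verbalize("Albert_Einstein", "birthPlace", "Ulm", "de"))
# -> Albert Einstein wurde in Ulm geboren.
```

A referring expression generator would only produce "Albert Einstein" (or "he", "the physicist"); the verbalizer must produce the whole sentence.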

@ovshake

ovshake commented Apr 3, 2019

Hi @DiegoMoussallem, I have some questions regarding my proposal. Which platform is most convenient for you, so I can reach out?

@DiegoMoussallem
Contributor Author

Hey @ovshake, feel free to contact me on Skype: diegomoussallem

@ovshake

ovshake commented Apr 4, 2019

I have messaged you on Skype.

@ovshake

ovshake commented Apr 5, 2019

@DiegoMoussallem I have shared my proposal on the GSoC platform. Do review it at your convenience.

@mommi84 closed this as completed on May 5, 2020.