
A Multilingual Neural RDF Verbalizer #31

Closed
DiegoMoussallem opened this issue Jan 10, 2019 · 16 comments
Labels
gsoc-2019 Google Summer of Code 2019. project-idea This idea has not been accepted yet as a project.

Comments

@DiegoMoussallem
Contributor

Description:

Natural Language Generation (NLG) is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). Despite community agreement on the actual text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A wide range of inputs has been used for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014), semantic representations (Theune et al., 2001) and Semantic Web (SW) data (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014). Presently, the generation of natural language from the SW, more precisely from RDF data, has gained substantial attention (Bouayad-Agha et al., 2014; Staykova, 2014). Several challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016). Moreover, RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). However, English is the only language that has been widely targeted. Even though there are studies which explore the generation of content in languages other than English, to the best of our knowledge, no work has been proposed to train a multilingual neural model for generating texts in different languages from RDF data.

Goals:

In this GSoC project, the candidate is expected to train a multilingual neural model capable of generating natural language sentences from DBpedia RDF triples.
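To make the task concrete, here is a minimal sketch of how a DBpedia triple might be linearized into a source sequence for a multilingual sequence-to-sequence model, with a target-language prefix token in the style of multilingual NMT systems. The token format (`<2en>`, `<S>`, `<P>`, `<O>`) is an assumption for illustration, not part of this proposal.

```python
# Hypothetical sketch: linearizing an RDF triple <s, p, o> into a flat
# token sequence. The desired output language is signaled with a prefix
# token, a common trick in multilingual seq2seq models.

def linearize_triple(subj, pred, obj, target_lang):
    """Turn a triple into a flat source sequence for a neural verbalizer."""
    return f"<2{target_lang}> <S> {subj} <P> {pred} <O> {obj}"

# The same triple can then be paired with references in several languages:
print(linearize_triple("Albert_Einstein", "birthPlace", "Ulm", "en"))
print(linearize_triple("Albert_Einstein", "birthPlace", "Ulm", "de"))
```

A single encoder-decoder trained on such sequences, paired with reference sentences in each language, is one plausible way to share parameters across languages.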

Impact:

The project may allow users to automatically generate, from triples, short summaries about entities that do not have a human-written abstract.

Warm-up tasks:

Mentors

Diego Moussallem

Keywords

NLG, Semantic Web, NLP

@mommi84 mommi84 added project-idea This idea has not been accepted yet as a project. gsoc-2019 Google Summer of Code 2019. labels Jan 11, 2019
@SilentFlame

This problem statement really interests me. I'll read through the papers mentioned and get back with questions if I get stuck.

@DwaraknathT

This problem is similar to what I am working on right now, and it's very interesting. I have created a repo and will upload summaries of the papers mentioned here, along with my own ideas for improvements; please go through them and give feedback. Also, porting the NeuralREG code to PyTorch might be really helpful, and PyText is a boon for data preprocessing.
https://github.com/DwaraknathT/NLG-

@DiegoMoussallem
Contributor Author

@DwaraknathT I have been following your summaries; let me know if you need any help. We can have a talk as well. Looking forward to the next summaries.

@DiegoMoussallem
Contributor Author

@SilentFlame How about you? How is it going with the papers?

@DwaraknathT

DwaraknathT commented Jan 31, 2019

@DiegoMoussallem Thank you, I appreciate any comments or suggestions you have for me. Right now, I'm trying to reimplement the NeuralREG code with Transformers, to see how much gain we might be able to get. The fundamental algorithm is the same; it was mainly a way to understand the code properly. May I contact you on Slack for further discussion?

@DiegoMoussallem
Contributor Author

Of course, you may contact me. I got your point about trying the Transformer, but it might not be necessary at the moment.

@aditya-malte

aditya-malte commented Mar 10, 2019

I find this problem interesting. We could improve on the state of the art by implementing a Transformer architecture (Vaswani et al.) for these same papers.

@DiegoMoussallem
Contributor Author

Hi @aditya-malte It sounds good. Could you send me a message with the details?

@ovshake

ovshake commented Mar 31, 2019

Hi @DiegoMoussallem! I feel this problem statement aligns well with my current research interests, and I would love to work on it. I agree with @aditya-malte that a Transformer architecture should show improvement, but I feel the extra computational complexity it introduces should be compensated for by using efficient attention models, such as the lightweight and dynamic convolutions mentioned here. Vanilla LSTMs could also be swapped for Quasi-RNNs. This would allow us to generate even longer and more descriptive sentences. I would love to work on this if it is not already taken.

@DiegoMoussallem
Contributor Author

@ovshake Thanks for your interest in this project. Your plan sounds good; I am looking forward to seeing your proposal. Feel free to contact me.

@ovshake

ovshake commented Apr 2, 2019

@DiegoMoussallem Does "multilingual" mean that the model should generate the referring expression in a language other than the language in which the RDF is given, or that the model should handle input RDF in any language (other than the one it was trained on) and produce the referring expression in that same language?

@DiegoMoussallem
Contributor Author

Hi @ovshake, "multilingual" means being able to generate a natural language sentence in multiple languages from a given RDF input <s,p,o>. It is not only about referring expression generation; it must also include a complete verbalization, i.e., with articles, verbs, etc.
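To illustrate the distinction above, here is a toy template sketch (deliberately not the neural model) showing what a "complete verbalization" of one triple looks like in two languages, with articles and verbs, as opposed to a bare referring expression. The templates and predicate coverage are invented for illustration only.

```python
# Toy illustration: full verbalization of an <s, p, o> triple, with
# function words, in two target languages. A neural verbalizer would
# learn this mapping instead of using hand-written templates.

TEMPLATES = {
    "en": "{s} was born in {o}.",
    "de": "{s} wurde in {o} geboren.",
}

def verbalize(subj, pred, obj, lang):
    # Only the birthPlace predicate is covered in this sketch.
    assert pred == "birthPlace"
    return TEMPLATES[lang].format(s=subj.replace("_", " "), o=obj)

print(verbalize("Albert_Einstein", "birthPlace", "Ulm", "en"))
# -> Albert Einstein was born in Ulm.
print(verbalize("Albert_Einstein", "birthPlace", "Ulm", "de"))
# -> Albert Einstein wurde in Ulm geboren.
```

A referring expression generator would only produce "Albert Einstein" (or "he", "the physicist"); the verbalizer must produce the whole sentence.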

@ovshake

ovshake commented Apr 3, 2019

Hi @DiegoMoussallem, I have some questions regarding my proposal. Which platform is most convenient for you, so I can reach out?

@DiegoMoussallem
Contributor Author

Hey @ovshake, feel free to contact me on Skype: diegomoussallem

@ovshake

ovshake commented Apr 4, 2019

I have messaged you on Skype.

@ovshake

ovshake commented Apr 5, 2019

@DiegoMoussallem I have shared my proposal on the GSoC platform. Do review it at your convenience.

@mommi84 closed this as completed on May 5, 2020.