A Multilingual Neural RDF Verbalizer #31
Comments
This problem statement really interests me. I'll read through the papers mentioned and come back with queries if I get stuck.
This problem is similar to what I am working on right now, and it's very interesting. I have created a repo and will upload summaries of the papers mentioned here, along with my own ideas for improvement; please go through them and give feedback. Also, porting the NeuralREG code to PyTorch might be really helpful, and PyText is a boon for data preprocessing.
@DwaraknathT I have been following your summaries; let me know if you need any help. We can have a talk as well. Looking forward to the next summaries.
@SilentFlame How about you? How is it going with the papers?
@DiegoMoussallem Thank you, I appreciate any comments or suggestions you have for me. Right now, I'm trying to reimplement the NeuralREG code with Transformers, to see how much of a gain we might be able to get. The fundamental algorithm is the same; it was mostly a hunch, and a way to understand the code properly. May I contact you on Slack for further discussions?
Of course, you may contact me. I take your point about trying the Transformer, but it might not be necessary at the moment.
I find this problem interesting. We could improve the state of the art by implementing a Transformer architecture (Vaswani et al., 2017) for these same papers.
Hi @aditya-malte, that sounds good. Could you send me a message with the details?
Hi @DiegoMoussallem! I feel this problem statement aligns well with my current research interests, and I would love to work on it. I agree with @aditya-malte that a Transformer architecture should show improvement, but I feel the extra computational complexity it introduces should be compensated for by introducing efficient attention models such as the lightweight and dynamic convolutional attention mentioned here. Vanilla LSTMs could also be swapped for Quasi-RNNs. This would allow us to generate even longer and more descriptive sentences. I would love to work on this if it is not already taken.
@ovshake Thanks for your interest in this project. Your plan sounds good; I am looking forward to seeing your proposal. Feel free to contact me.
@DiegoMoussallem Does "Multilingual" mean that the model should be able to generate the referring expression in a language other than the one in which the RDF is given, or that the model should be able to handle any language as the input RDF (other than the language it was trained on) and give the referring expression in that same language?
Hi @ovshake, "Multilingual" means being able to generate a natural language sentence in multiple languages from a given RDF input <s, p, o>. It is not only about referring expression generation; it must also include a complete verbalization, i.e., with articles, verbs, etc.
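To make the expected input and output concrete, here is a minimal sketch. The triple, the target sentences, and the prepended language token (a trick borrowed from multilingual NMT) are all illustrative assumptions, not project code:

```python
# Hypothetical example of the task: one DBpedia triple, verbalized in two languages.
triple = ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm")

def linearize(triple, lang):
    """Flatten <s, p, o> into a token sequence, prefixed with a
    target-language token so one model can serve several languages."""
    s, p, o = triple
    return f"<2{lang}> {s} {p} {o}"

print(linearize(triple, "en"))
# <2en> dbr:Albert_Einstein dbo:birthPlace dbr:Ulm
# Desired decoder outputs (hand-written here, not model-generated):
#   en: "Albert Einstein was born in Ulm."
#   pt: "Albert Einstein nasceu em Ulm."
```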
Hi @DiegoMoussallem, I have some questions regarding my proposal. Which platform would be most convenient for reaching out to you?
Hey @ovshake, feel free to contact me on Skype: diegomoussallem
I have messaged you on Skype.
@DiegoMoussallem I have shared my proposal on the GSoC platform. Do review it at your convenience.
Description:
Natural Language Generation (NLG) is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). Despite community agreement on the text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A wide range of inputs has been used for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014), semantic representations (Theune et al., 2001) and Semantic Web (SW) data (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014). Recently, the generation of natural language from SW data, more precisely from RDF data, has gained substantial attention (Bouayad-Agha et al., 2014; Staykova, 2014). Challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016), and RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). However, English is the only language that has been widely targeted. Although there are studies which explore the generation of content in languages other than English, to the best of our knowledge, no work has been proposed to train a multilingual neural model for generating texts in different languages from RDF data.
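As a concrete illustration of the input side, the sketch below fetches a few triples for one entity from the public DBpedia SPARQL endpoint using the SPARQLWrapper library (the entity and the LIMIT are arbitrary choices for this example):

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# Pull a handful of <s, p, o> facts about one entity; triples like these
# are the raw material a verbalizer turns into natural language sentences.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE {
        <http://dbpedia.org/resource/Albert_Einstein> ?p ?o .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```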
Goals:
In this GSoC project, the candidate is expected to train a multilingual neural model capable of generating natural language sentences from DBpedia RDF triples.
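For orientation only, here is a minimal sketch of what such a model could look like, assuming linearized triples as source tokens and using PyTorch's built-in Transformer; the class name, hyperparameters, and random-ID smoke test are placeholders, not the project's actual architecture:

```python
import torch
import torch.nn as nn

class TripleVerbalizer(nn.Module):
    """Toy encoder-decoder: linearized-triple token IDs in, sentence token
    logits out. Multilinguality would come from a target-language token
    prepended to the source sequence."""

    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so the decoder cannot peek at future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                                  tgt_mask=tgt_mask)
        return self.out(hidden)  # (batch, tgt_len, vocab_size) logits

# Smoke test with random token IDs (batch of 2).
model = TripleVerbalizer(vocab_size=1000)
src = torch.randint(0, 1000, (2, 8))
tgt = torch.randint(0, 1000, (2, 10))
print(model(src, tgt).shape)  # torch.Size([2, 10, 1000])
```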
Impact:
The project may allow users to automatically generate short summaries of entities that do not have a human-written abstract, using their triples.
Warm-up tasks:
Read the papers:
NeuralREG: An end-to-end approach to referring expression generation
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
BENGAL: An Automatic Benchmark Generator for Entity Recognition and Linking
Download and get familiar with the code from the papers above:
https://github.com/DiegoMoussallem/RDF2NL
https://github.com/dice-group/RDF2PT
https://github.com/dice-group/BENGAL
https://github.com/ThiagoCF05/NeuralREG
Mentors
Diego Moussallem
Keywords
NLG, Semantic Web, NLP