
Challenge 24 - Knowledge Graph Generation for Enhanced Chatbot and Scientific Literature Synthesis #7

Open
RubenRT7 opened this issue Feb 16, 2024 · 8 comments
Assignees
Labels
ECMWF New feature or request Machine Learning Machine learning for Earth Sciences applications

Comments

@RubenRT7
Contributor

RubenRT7 commented Feb 16, 2024

Challenge 24 - Knowledge Graph Generation for Enhanced Chatbot and Scientific Literature Synthesis

Stream 2 - Machine Learning for Earth Sciences applications

Goal

In this challenge, participants will build a tool to enhance the existing ECMWF chatbot*. The goal is to create a system that can understand scientific texts about weather and use that information to make interactive graphs to facilitate better understanding and exploration of weather-related concepts and phenomena.

Mentors and skills

  • Mentors: Ana Prieto Nemesio, Florian Pinault, Baudouin Raoult (all ECMWF)
  • Skills required:
    • Python
    • Machine Learning
    • Data Science and NLP
    • Knowledge of Graph Theory and Knowledge Representation
    • Possible extra: Confluence plugins and macros (Java, Velocity)

Challenge description

Knowledge graphs structure knowledge as entities and the relationships between them, making it easier to understand and use. Adding knowledge graphs to chatbots and search engines can improve the user experience. Additionally, a tool capable of generating interactive content graphs could simplify the process of synthesizing scientific literature, making it easier to explore and analyze connections between ideas.
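
To make "structuring knowledge" concrete, here is a minimal sketch of a knowledge graph stored as (subject, predicate, object) triples with two simple lookups. The triples, predicate names, and helper functions are illustrative assumptions, not part of any ECMWF dataset or of the chatbot code.

```python
from collections import defaultdict

# Illustrative triples only; not taken from any ECMWF dataset.
TRIPLES = [
    ("IFS", "is_a", "NWP model"),
    ("IFS", "developed_by", "ECMWF"),
    ("ERA5", "is_a", "reanalysis"),
    ("ERA5", "produced_by", "ECMWF"),
]


def build_index(triples):
    """Index triples by subject so an entity's outgoing edges can be looked up directly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def subjects_with(triples, predicate, obj):
    """Return, in input order, all subjects linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


index = build_index(TRIPLES)
print(index["ERA5"])
print(subjects_with(TRIPLES, "developed_by", "ECMWF"))
```

A real system would more likely use a graph database or an RDF store, but the same triple shape underlies most of those options.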

Challenge Tasks

  • Data Acquisition and Preprocessing - Parsing and Entity Extraction:
    Participants will collect relevant datasets, including scientific literature, weather data, domain-specific ontologies, etc.
    They will then preprocess the data to extract key entities, relationships, and metadata necessary for knowledge graph generation. For this, they will leverage existing functionality developed for the company's internal chatbot around data parsing and entity extraction.
  • Knowledge Graph Generation:
    Participants will design and implement algorithms and techniques to generate knowledge graphs from the acquired data, aiming to capture meaningful relationships between entities, concepts, and events.
    They'll refine the graphs by clarifying entity meanings, resolving entity ambiguity, refining relationships, and enriching entity attributes.
  • Interactive Visualization and Exploration:
    Participants will develop interfaces or tools to interactively visualize and explore the generated knowledge graphs. The visualization should facilitate intuitive navigation, querying, and exploration of the underlying knowledge graph.
  • Integration with Chatbot/Search Engine:
    Participants will integrate the developed knowledge graph system with the existing ECMWF ChatBOT. This integration will allow users to access insights and information from the knowledge graphs directly through the chatbot or search engine interface.
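
The first three tasks above can be sketched end to end as follows: dictionary-based entity extraction, alias resolution to a canonical name, sentence-level co-occurrence edges, and export to a nodes/links structure that a browser-based graph viewer could load. This is only a sketch under stated assumptions: the alias table, the sample sentences, and all function names are hypothetical, and co-occurrence is a stand-in for real relation extraction. It does not reflect the existing chatbot's parsing code.

```python
import json
import re
from collections import Counter
from itertools import combinations

# Hypothetical alias table: maps surface forms to one canonical entity name.
ALIASES = {
    "ecmwf": "ECMWF",
    "ifs": "IFS",
    "integrated forecasting system": "IFS",
    "era5": "ERA5",
    "nwp": "NWP",
    "numerical weather prediction": "NWP",
}


def extract_entities(sentence):
    """Return the set of canonical entities whose aliases occur in the sentence."""
    text = sentence.lower()
    found = set()
    for alias, canonical in ALIASES.items():
        # Word boundaries stop short aliases matching inside longer words.
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            found.add(canonical)
    return found


def build_graph(sentences):
    """Count entity mentions and sentence-level co-occurrences."""
    mentions = Counter()
    edges = Counter()
    for sentence in sentences:
        entities = sorted(extract_entities(sentence))
        mentions.update(entities)
        for a, b in combinations(entities, 2):
            edges[(a, b)] += 1
    return mentions, edges


def to_node_link(mentions, edges):
    """Serialise the graph to a plain nodes/links dictionary."""
    return {
        "nodes": [{"id": name, "mentions": n} for name, n in sorted(mentions.items())],
        "links": [
            {"source": a, "target": b, "weight": w}
            for (a, b), w in sorted(edges.items())
        ],
    }


# Hypothetical sample sentences standing in for parsed literature.
SENTENCES = [
    "The Integrated Forecasting System is the NWP model run at ECMWF.",
    "ERA5 was produced with a version of the IFS.",
]

mentions, edges = build_graph(SENTENCES)
graph = to_node_link(mentions, edges)
print(json.dumps(graph, indent=2))
```

Resolving "Integrated Forecasting System" and "IFS" to the same node is the simplest form of the entity disambiguation mentioned in the second task; a trained NER model or an ontology lookup would replace the alias table in practice.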

Dataset
Participants will be provided with datasets containing scientific literature, weather data, domain-specific ontologies, and any other relevant sources, including a preliminary entity dataset extracted during the chatbot development.

@EsperanzaCuartero EsperanzaCuartero changed the title Challenge 07 - Knowledge Graph Generation for Enhanced Chatbot and Scientific Literature Synthesis Challenge 12 - Knowledge Graph Generation for Enhanced Chatbot and Scientific Literature Synthesis Feb 22, 2024
@EsperanzaCuartero EsperanzaCuartero added the Machine Learning Machine learning for Earth Sciences applications label Feb 22, 2024
@EsperanzaCuartero EsperanzaCuartero changed the title Challenge 12 - Knowledge Graph Generation for Enhanced Chatbot and Scientific Literature Synthesis Challenge 24 - Knowledge Graph Generation for Enhanced Chatbot and Scientific Literature Synthesis Feb 23, 2024
@RubenRT7 RubenRT7 added the ECMWF New feature or request label Mar 7, 2024
@Ben-EJ

Ben-EJ commented Mar 21, 2024

Hello, could you clarify the following questions please. What are the challenges users face regarding user experience interacting with the chatbot, and how can a knowledge graph address them effectively? Who is going to use the chatbot, is it going to be the public or researchers, for example? Thank you.

@lincent

lincent commented Mar 21, 2024

Hello, just trying to work out what information will end up in the knowledge graph. Can you give examples of what you mean by weather data? Is it numerical data in forecast GRIB/netCDF format, or more plain-language based?

@anaprietonem

Hello, could you clarify the following questions please. What are the challenges users face regarding user experience interacting with the chatbot, and how can a knowledge graph address them effectively? Who is going to use the chatbot, is it going to be the public or researchers, for example? Thank you.

Hello, thanks for your question. Regarding user experience, one potential outcome we would like from this challenge is for the chatbot to point users to the knowledge graph, so that they can also explore the information interactively. Note, though, that the chatbot-knowledge graph integration would only be explored if there is enough time. We would like to keep the main focus on building an interactive knowledge graph and, where useful, reusing some of the NLP tools that were used to implement the chatbot. The chatbot is public and can be found here: https://chat.ecmwf.int/

@anaprietonem

Hello, just trying to work out what information will end up in the knowledge graph. Can you give examples of what you mean by weather data? Is it numerical data in forecast GRIB/netCDF format, or more plain-language based?

Thanks for the question! By weather data we were referring to plain-language content rather than numerical data.

@tjohnson-scottlogic

Hi, we’d like to try the process of acquiring data, processing data, generating the knowledge graph and then visualising that graph with a simple set of initial data. Could you recommend what we’d start with? Ideally the simplest possible that allows us to test the process. Would it be possible to see that data before we submit our proposal?

@anaprietonem

anaprietonem commented Mar 25, 2024

Hi, we’d like to try the process of acquiring data, processing data, generating the knowledge graph and then visualising that graph with a simple set of initial data. Could you recommend what we’d start with? Ideally the simplest possible that allows us to test the process. Would it be possible to see that data before we submit our proposal?

Hello, unfortunately I don't think it's possible to share the data ahead of the proposal stage. However, you can find examples of technical texts related to NWP at https://www.ecmwf.int/en/publications/technical-memoranda. Another plausible data source would be arXiv papers related to 'AI-based weather forecasting'. Currently, https://www.connectedpapers.com/ generates knowledge graphs for topics like this, but its graphs are based on citations and not on the entities/context of the papers.
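
To illustrate the difference between a citation-based graph and an entity-based one, here is a minimal sketch that links papers by the overlap of their extracted entities (Jaccard similarity) instead of by who cites whom. The paper identifiers, the entity sets, and the 0.3 threshold are all hypothetical; in practice the entity sets would come from an entity extraction step run over the papers.

```python
from itertools import combinations

# Hypothetical paper -> entity sets, standing in for entity extraction output.
PAPER_ENTITIES = {
    "paper_a": {"graph neural network", "ERA5", "medium-range forecast"},
    "paper_b": {"transformer", "ERA5", "medium-range forecast"},
    "paper_c": {"data assimilation", "ensemble"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_links(paper_entities, threshold=0.3):
    """Link every pair of papers whose entity overlap is at or above `threshold`."""
    links = []
    for (p, ents_p), (q, ents_q) in combinations(sorted(paper_entities.items()), 2):
        score = jaccard(ents_p, ents_q)
        if score >= threshold:
            # Keep the shared entities so the edge can be explained to the user.
            links.append((p, q, round(score, 2), sorted(ents_p & ents_q)))
    return links


for link in entity_links(PAPER_ENTITIES):
    print(link)
```

Keeping the shared entities on each edge lets an interactive view show why two papers are connected, which a citation edge alone does not convey.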

@Viet1004

Hello. I have a question. We have a couple of people, but some of them are interested in some projects but not others. And they can have multiple interests. My question is, can we participate in multiple proposal rounds? And if one person applies for multiple proposals and these proposals pass (supposedly), can they withdraw from the project without affecting other teammates?

@trakasa
Contributor

trakasa commented Mar 30, 2024

Hi @Viet1004,
Many thanks for your interest in this challenge and in Code for Earth.

Generally, participants can submit a proposal for more than one challenge. Individuals can also be members of different teams submitting proposals. Please note that you have to submit your proposal by 9 April. After that we enter the selection process, and there won't be any other "proposal rounds".

Withdrawing from a team is possible if you are a member of several selected teams. As you wrote, this should of course not affect the team's ability to work on and deliver the solution described in their proposal.

We highly recommend, though, that you carefully check the challenges you are interested in, get in touch with the mentors with any related questions, and choose the one(s) you have a real interest in.

Last but not least, be sure you have checked the eligibility rules in Article 3 of the Terms & Conditions.

I hope this answers your questions.

Bye, Athina


9 participants