Challenge 21 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes #3

Open
RubenRT7 opened this issue Feb 16, 2024 · 4 comments
Assignees
Labels
ECMWF New feature or request Machine Learning Machine learning for Earth Sciences applications

Comments

@RubenRT7
Contributor

RubenRT7 commented Feb 16, 2024

Challenge 21 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes

Stream 2 - Machine Learning for Earth Sciences applications

Goal

Develop a machine learning “chat” user interface that retrieves data for arbitrary user-specified queries from the Destination Earth Weather Extremes digital twin using Polytope. Time permitting, a secondary goal is to design a web user interface for this chatbot.

Mentors and skills

  • Mentors: Mathilde Leuridan, Adam Warde (all ECMWF)
  • Skills required:
    • Python programming
    • Some experience in ML/NLP/LLM
    • Problem solving

Challenge description

As part of the Destination Earth initiative, ECMWF is implementing a new data access mechanism called Polytope. It lets users extract complex shapes of data, such as 2D country cut-outs or 4D flight paths, from numerical weather prediction (NWP) datacubes instead of retrieving whole fields.

Within the algorithm itself, the Polytope software extracts convex polytopes from datacubes, but the software also comes with a mid-level interface which supports primitive shapes, like boxes or disks, as well as constructive geometry operations, such as taking unions of several shapes.
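To make the idea of primitive shapes and unions concrete, here is a minimal sketch. The `Box`, `Disk` and `Union` classes below are hypothetical stand-ins written for illustration only; they are not the Polytope library's own classes, and the disk uses plain Euclidean distance in degrees as a simplification.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for mid-level shapes (NOT the Polytope library's API).
@dataclass(frozen=True)
class Box:
    axes: Tuple[str, str]
    lower: Tuple[float, float]
    upper: Tuple[float, float]

    def contains(self, point: Tuple[float, float]) -> bool:
        # A point is inside if every coordinate lies within its bounds.
        return all(lo <= p <= hi for lo, p, hi in zip(self.lower, point, self.upper))


@dataclass(frozen=True)
class Disk:
    axes: Tuple[str, str]
    centre: Tuple[float, float]
    radius: float

    def contains(self, point: Tuple[float, float]) -> bool:
        # Simplification: Euclidean distance in degrees, not distance on the sphere.
        dx = point[0] - self.centre[0]
        dy = point[1] - self.centre[1]
        return dx * dx + dy * dy <= self.radius ** 2


@dataclass(frozen=True)
class Union:
    shapes: tuple

    def contains(self, point: Tuple[float, float]) -> bool:
        # Constructive geometry: a union contains a point if any member does.
        return any(shape.contains(point) for shape in self.shapes)


# A region built from two primitives on (latitude, longitude) axes.
region = Union((
    Box(("latitude", "longitude"), (48.0, 0.0), (52.0, 3.0)),
    Disk(("latitude", "longitude"), (51.5, -0.1), 1.0),
))
```

A call such as `region.contains((49.0, 1.0))` then checks membership against the combined shape rather than against each primitive separately.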

Building on top of this, we are currently developing a higher-level interface to Polytope which will support a range of more intricate domain-specific shapes like countries, timeseries or vertical profiles for meteorology applications. In order to do this, we are extending the MARS language to include a “feature” keyword which represents the shape we want to retrieve instead of the whole field. Whilst this higher-level interface is much more usable than the Polytope-native shapes, for non-technical stakeholders who would like to use Destination Earth data, defining the exact feature that they want to access might still be challenging.

In this project, we would thus like to go a step further and leverage ML techniques to build a chatbot which transforms more complex user queries, such as “Find the wind speed over tomorrow’s A150 flight from London to Paris”, directly into extended MARS requests with features. These requests will then be executed by Polytope to retrieve exactly the data the user asked for.
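As an illustration of the target of that translation, the sketch below assembles a request with a `feature` entry for the flight example. Everything here is an assumption made for illustration: the `feature` layout (`type`, `points`), the `wind_speed` parameter name, the placeholder date, the approximate city coordinates, and the straight-line interpolation in latitude and longitude standing in for a real flight path.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def interpolate_path(start: Point, end: Point, n: int) -> List[List[float]]:
    """Return n points on a straight line in lat/lon space (a simplification)."""
    if n < 2:
        raise ValueError("a path needs at least two points")
    points = []
    for i in range(n):
        t = i / (n - 1)
        points.append([
            (1 - t) * start[0] + t * end[0],
            (1 - t) * start[1] + t * end[1],
        ])
    return points


def build_request(param: str, date: str, points: List[List[float]]) -> Dict:
    """Assemble a request dict; the key layout is hypothetical, not a documented schema."""
    return {
        "param": param,
        "date": date,
        "feature": {"type": "path", "points": points},
    }


LONDON = (51.5, -0.1)  # approximate coordinates
PARIS = (48.9, 2.4)    # approximate coordinates

# "2024-04-06" is a placeholder for "tomorrow" in the example query.
request = build_request("wind_speed", "2024-04-06", interpolate_path(LONDON, PARIS, 5))
```

The chatbot's job would be to produce the arguments of `build_request` from free text; the request structure itself stays deterministic and easy to inspect.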

The Polytope feature extraction library already exists and can handle different levels of requests, so the main aim of the project would be to add the final ML layer. This layer could, for example, be implemented using ChatGPT plugins, Poe (a tool for creating chatbots with various LLMs as backends) or any other LLM approach.
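One possible shape for that final layer, sketched under stated assumptions: the LLM is treated as an opaque callable that takes a prompt and returns text, and its output is parsed and validated before anything is sent on. The prompt wording, the `ALLOWED_FEATURES` set and the `fake_llm` stub are all hypothetical; a real backend (ChatGPT, Poe or another model) would replace the stub.

```python
import json
from typing import Callable, Dict

# Hypothetical feature names, loosely based on the shapes mentioned above.
ALLOWED_FEATURES = {"country", "timeseries", "verticalprofile"}

PROMPT_TEMPLATE = (
    "Translate the user query into a JSON object containing a 'feature' object "
    "whose 'type' is one of: {features}.\n"
    "User query: {query}\n"
    "JSON:"
)


def translate(query: str, llm: Callable[[str], str]) -> Dict:
    """Ask the LLM for a request and reject anything that is not well formed."""
    prompt = PROMPT_TEMPLATE.format(
        features=", ".join(sorted(ALLOWED_FEATURES)), query=query
    )
    raw = llm(prompt)
    try:
        request = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("LLM output is not valid JSON") from exc
    if not isinstance(request, dict):
        raise ValueError("LLM output is not a JSON object")
    feature = request.get("feature")
    if not isinstance(feature, dict) or feature.get("type") not in ALLOWED_FEATURES:
        raise ValueError("LLM output has a missing or unsupported feature type")
    return request


def fake_llm(prompt: str) -> str:
    """Stub backend used only to exercise the validation path."""
    return json.dumps({
        "param": "2t",
        "feature": {"type": "timeseries", "points": [[51.5, -0.1]]},
    })


validated = translate("Temperature in London over the next two days", fake_llm)
```

Keeping validation outside the model means a malformed or hallucinated answer fails loudly instead of reaching the data service.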

@EsperanzaCuartero EsperanzaCuartero changed the title from "# Challenge 03 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes" to "Challenge 03 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes" Feb 16, 2024
@EsperanzaCuartero EsperanzaCuartero changed the title from "Challenge 03 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes" to "Challenge 09 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes" Feb 22, 2024
@EsperanzaCuartero EsperanzaCuartero added the Machine Learning Machine learning for Earth Sciences applications label Feb 22, 2024
@EsperanzaCuartero EsperanzaCuartero changed the title from "Challenge 09 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes" to "Challenge 21 - Polly: A Natural Language Processing Interface to Extract Complex Features from Weather Datacubes" Feb 23, 2024
@RubenRT7 RubenRT7 added the ECMWF New feature or request label Mar 7, 2024
@cdrowley

cdrowley commented Apr 5, 2024

Hello @mathleur and @awarde96,

We are a team of six MRes students from the Centre for Geospatial Science CDT, and we're currently preparing our submission for this challenge. To help polish the proposal, we have a couple of questions to ensure our approach aligns with the expectations and best uses the Polytope infrastructure:

  1. Given the types/complexity of the queries we'd aim to process and likely visualise, could you provide some insight into the limitations of the current mid-level interface of Polytope and how you envision the ML layer overcoming these limitations? Specifically, we're interested in understanding the challenges in translating natural language queries into the 'feature' keyword for MARS requests (beyond crafting clear prompts and providing the database schema and response options).

  2. Regarding the transferability and maintenance of the project, we are looking at an open-source solution that can be integrated directly into the ECMWF ecosystem: a web mapping platform as the main front-end, which maps features directly and plots them automatically to help users gain insights, with optional charts alongside. Does that align with your expectations?

@mathleur

mathleur commented Apr 5, 2024

Hi @cdrowley,

Thank you for the questions and we look forward to receiving your proposal!

To answer your questions:

  1. The main limitation of the mid-level API is that it only implements basic shapes like boxes or circles and unions of those shapes, so building more complex shapes, like time series for example, requires some notion of how to convert these simpler base shapes into the spatio-temporal shapes that ECMWF users are typically interested in. For the challenge, a first step could be to build the LLM layer on top of our high-level API (polytope-mars), which already implements geospatial shapes like time series in terms of the mid-level API shapes, so the knowledge of how to separate these complex shapes into Polytope base shapes is not needed within the ML model. It would then be interesting to see if it's possible to build a more complex ML model that can convert arbitrary shapes into the base shapes from the mid-level API.

  2. I think such a web layer would be great and it is definitely something that would be useful to ECMWF! For the sake of transferability and maintainability though, it would probably be best to separate concerns: first build a standalone ML API, and then maybe build a web layer on top of it afterwards, so that both components can still work independently. We are also currently developing some visualisation tools, so it would be interesting to see if we could incorporate those in the web layer as well.
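As a rough illustration of the decomposition described in point 1, the sketch below expresses a time series at one location as a small set of base shapes: a single selected value on each spatial axis and a span on the time axis. The `Select` and `Span` classes and the axis names are hypothetical stand-ins written for this example; they are not taken from polytope-mars or the mid-level API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


# Hypothetical base shapes, for illustration only.
@dataclass(frozen=True)
class Select:
    axis: str
    values: Tuple[float, ...]


@dataclass(frozen=True)
class Span:
    axis: str
    lower: datetime
    upper: datetime


def timeseries_to_base_shapes(
    lat: float, lon: float, start: datetime, end: datetime
) -> List[object]:
    """Decompose 'time series at a point' into one shape per axis."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return [
        Select("latitude", (lat,)),   # one value on the latitude axis
        Select("longitude", (lon,)),  # one value on the longitude axis
        Span("time", start, end),     # every available time in the interval
    ]


shapes = timeseries_to_base_shapes(
    51.5, -0.1, datetime(2024, 4, 5), datetime(2024, 4, 7)
)
```

A high-level API that performs this step means an LLM layer only has to name the feature and its parameters, not work out the per-axis shapes itself.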

I hope this answers your questions, but let us know if something is unclear or you have more questions!

@Oraegbuayomide10

Hello @mathleur and @awarde96,

@mathleur, based on my understanding of your previous response, the goal of this challenge is to integrate an LLM layer (chatbot) with your high-level API to extract crucial information (features) from users' complex queries. These features are then forwarded to MARS to formulate MARS requests, which are subsequently integrated with Polytope. Polytope will use the MARS request to retrieve the custom data required by the user.

Could you confirm if this is the correct workflow, or if there are any additional steps I may have overlooked?

Thank you. I look forward to your response!

@mathleur

mathleur commented Apr 6, 2024

Hi @Oraegbuayomide10,

Thanks for the question!

Indeed, the goal of this challenge is to create an LLM layer on top of our high-level Polytope API.

However, the user requests (or "features") are not forwarded to MARS. Polytope is actually an alternative service to MARS that we are building in parallel so that we can extract non-box shapes of data. This is currently not supported by MARS and the MARS language, so Polytope will enable this capability. Because Polytope will enable these more complex requests, we need to build more specialized APIs to help users request data. For this challenge, we would like to build an LLM layer, as we believe this has the potential to be a great API to help users access the data that they need.

The real focus of this challenge, however, is to translate user queries and prompts directly to the shapes implemented in the mid-level Polytope API that already exists. Note that the main Polytope extraction algorithm already exists, so the challenge is not to reimplement it, but rather to build a new layer that uses this algorithm/service. Time permitting, another part of the challenge could be to create a nice web user interface which users can easily access!

I hope this helps and would be happy to answer any other questions!


7 participants