pangeo-data/egu-2025-course

EGU 2025 SC 4.14: Harnessing the Power of Pangeo

Enhancing Your Scientific Data Analysis Workflow with scalable open source tools

This course is made possible thanks to the Pangeo@EOSC platform — a reference deployment of the Pangeo ecosystem on the European Open Science Cloud — developed with the support of [CESNET](https://www.cesnet.cz/en/) through the [EGI-ACE](https://youtu.be/Vc9SZNa2-Os) and [C-SCALE](https://youtu.be/-jBkR_2_vg8) projects. We gratefully acknowledge their contributions.

The analysis and visualisation of data is fundamental to research across the Earth and space sciences. The Pangeo community has built an ecosystem of tools designed to simplify these workflows, centred around the Xarray library for n-dimensional data handling and Dask for parallel computing. This short course offers a gradual introduction to the Pangeo toolkit, through which participants will learn the skills required to scale their local scientific workflows to cloud computing or large HPC systems with minimal changes to existing code. The course is beginner-friendly but assumes a prior understanding of the Python language. To help you apply what you learn, we will guide you through hands-on Jupyter notebooks that showcase scalable analysis of in-situ, satellite observation and Earth system modelling datasets. By the end of this course, you will understand how to:

  • Efficiently access large public data archives from Cloud storage using the Pangeo ecosystem of open source software and infrastructure.
  • Leverage labelled arrays in Xarray to build accessible, reproducible workflows.
  • Use chunking to scale a scientific data analysis with Dask.
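
As a rough preview of the last two points, here is a minimal sketch that builds a small labelled array with Xarray, selects a region by coordinate value, and splits it into Dask chunks before reducing it. It is not taken from the course notebooks: the data are random numbers standing in for a real gridded dataset, and the variable names, grid spacing and chunk size are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a gridded dataset (assumption: one year of daily
# values on a 2.5-degree grid; real course data would come from cloud storage).
time = pd.date_range("2020-01-01", periods=365, freq="D")
lat = np.linspace(-88.75, 88.75, 72)
lon = np.linspace(1.25, 358.75, 144)
rng = np.random.default_rng(0)
values = 15.0 + rng.standard_normal((time.size, lat.size, lon.size))

# A labelled array: dimensions and coordinates carry names and units.
temperature = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temperature",
    attrs={"units": "degC"},
)

# Select by coordinate label rather than by integer position.
tropics = temperature.sel(lat=slice(-30, 30))

# Split the time axis into chunks of 73 days (an arbitrary choice that
# divides 365 evenly); the array is now backed by a Dask array.
chunked = tropics.chunk({"time": 73})

# Operations on a chunked array are lazy: this only builds a task graph.
time_mean = chunked.mean(dim="time")

# compute() runs the graph, processing chunks in parallel where possible.
result = time_mean.compute()
print(result.dims, result.shape)
```

The same `mean` call works on the unchunked array, which is what makes the move from a laptop-sized workflow to a larger one mostly a matter of choosing chunks. Suitable chunk sizes depend on the dataset and the available memory.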

All the Python packages and training materials used are open-source (e.g., MIT, Apache-2, CC-BY-4). Participants will need a laptop and internet access but will not need to install anything. We will be using the free and open Pangeo@EOSC (European Open Science Cloud) platform for this course. We encourage attendees from all career stages and fields of study (e.g., atmospheric sciences, cryosphere, climate, geodesy, ocean sciences) to join us for this short course. We look forward to an interactive session and will be hosting a Q&A and discussion forum at the end of the course, including opportunities to get more involved in Pangeo and open source software development. Join us to learn about open, reproducible, and scalable Earth science!

Prerequisites

We recommend that learners with no prior knowledge of Python review resources such as the Software Carpentry training material and Project Pythia in advance of this short course. Participants should bring a laptop with an internet connection. No software installation is required, as resources will be accessed online using the Pangeo@EOSC platform. Temporary user accounts will be provided for the course, and we will also show attendees how to request an account on Pangeo@EOSC so they can continue working on the platform after the course.

Set up

If you are participating in this short course, you are welcome to register for Pangeo@EOSC.

First, navigate to https://aai.egi.eu/signup to sign up for an account.

Then, navigate to https://aai.egi.eu/auth/realms/id/account/#/enroll?groupPath=/vo.pangeo.eu to request access.

Lastly, access Pangeo@EOSC at https://pangeo-eosc.vm.fedcloud.eu/ and sign in. Select the quay.io/pangeo/pangeo-notebook option.
