This is material for a tutorial on the COSIMA Cookbook for a 2023 Intersect course on HPC and Data in Computational Fluid Dynamics. It borrows very heavily from Aidan Heerdegen's training workshop and associated hive post (thanks Aidan!)
Lecture slides from the course are here. Contact me if you'd like to see the video recording of my lecture.
This tutorial is aimed at people who have some familiarity with the Python programming language but have never used the COSIMA Recipes for accessing and analysing COSIMA model output.
The tutorial will use the NCI ARE (Australian Research Environment) JupyterLab App. This is a user-friendly way to run Jupyter Notebooks at NCI, which makes it possible to access the data required for this tutorial.
The tutorial will provide hands-on experience in
- using Python in the JupyterLab App environment on NCI ARE
- accessing and manipulating data with the Xarray package
- using Dask to accelerate your analysis and handle larger datasets
- using the COSIMA Cookbook, Data Explorer and COSIMA Recipes
There are three prerequisites: two to access NCI resources, and one to get the Jupyter notebooks.
All prerequisites need to be completed BEFORE the tutorial. We only have 90 minutes, and there will not be time to sort out administration and still follow the tutorial.
You must have an NCI account. If you do not have an NCI account, follow the NCI instructions to create one.
This tutorial will require membership of particular projects at NCI. To participate you will need to follow the instructions sent by Meiyun. Please don't request membership of these projects directly through Mancini. You will be sent 3 project membership invitations which you need to accept WELL BEFORE the tutorial.
The tutorial uses some notebooks in a git repository. You will need to download these notebooks to an NCI filesystem. The recommended location is your home directory on `gadi`, as this is always accessible to an ARE JupyterLab session.
To do this, log in to `gadi` and copy and paste this command:

```
git clone https://github.com/aekiss/HPC-Data-CFD-2023.git
```
This will have created a directory called `HPC-Data-CFD-2023` in your gadi home directory.
Note: this repository is being updated, so you may need to clone a fresh version of this repository (or do a `git pull`, if you know about git) just prior to the tutorial.
This will be the first thing we do in the tutorial, but feel free to test this out before the tutorial day to check it is all working correctly for you.
The instructions from the ACCESS-NRI Intake catalogue docs are excellent and worth reading for background, and of course the NCI ARE Documentation is also available.
Steps:
- Log in to ARE using your NCI username and password: https://are.nci.org.au
- Select the JupyterLab app
For this tutorial there are some recommended settings for your ARE session that have been tested and will work:
| Setting | Value |
| --- | --- |
| Walltime | 2 hours |
| Queue | normalbw |
| Compute Size | Medium |
| Project | choose one, e.g. nf47 |
| Storage | see below |
You must choose one of the projects available to you. This must be a project with compute allocation remaining, as discussed in the prerequisites above.
The storage section must contain all the `/g/data` projects you need to access. Use the following for this tutorial:

```
gdata/hh5+gdata/ik11+gdata/cj50
```
Click on "Advanced Settings" and set the following options:
- Module directories: `/g/data/hh5/public/modules`
- Modules: `conda/analysis3`
Push the Launch button and a session status window will appear.
You may need to wait a few minutes for your queued job to start running; this job is where your JupyterLab application runs.
When your JupyterLab session is ready the screen will change, with an "Open JupyterLab" button. Push it and your JupyterLab session will open in a new window.
In the JupyterLab window use the file browser to navigate to the directory where you downloaded the Jupyter notebook files in the preparation section above. Double-click on the `home` directory in the file browser, and then double-click on the `HPC-Data-CFD-2023` directory.
If you've never used Jupyter or JupyterLab before, you might like to look at this overview of the interface and how to use notebooks.
- Click the big blue + button at the top of the file browser, then click `Python [conda env:analysis3-23.04] *` in the Notebook category. This opens a new untitled Python notebook (running with a specific package environment). You can change the notebook name by right-clicking the notebook tab or in the file browser.
- There will be a code cell at the top. Type `1+1` in it, hold down shift and press return. Shift-return will evaluate a cell (rather than creating a new line in the cell), print the result, and give you a new code cell in which you can try out other Python commands. You can go back and change cells and re-evaluate them with shift-return. You can hide input or output cells by clicking the blue bar that appears in the left margin. Cells can also be rearranged, duplicated, deleted and more.
- You can change a cell type from Code to Markdown using the dropdown menu. Markdown cells can be used to create comments and explanations of your code and analysis, including URLs, images and tables, and also mathematical markup in LaTeX. Try out some of the examples here.
Double-click `xarray_demo.ipynb` in the file browser in the left column and work through the notebook.
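As a small taste of the kind of thing Xarray does (an assumption about the notebook's content, not a substitute for working through it): data arrays carry named dimensions and coordinates, so you can select by label rather than by position.

```python
import numpy as np
import xarray as xr

# A toy DataArray standing in for model output, with labelled
# dimensions ("time", "lon") and coordinate values.
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("time", "lon"),
    coords={"time": [2000, 2001, 2002], "lon": [0.0, 90.0, 180.0, 270.0]},
    name="sst",
)

# Select by coordinate label, not integer position.
year_2001 = da.sel(time=2001)

# Reduce over a named dimension, no axis numbers needed.
time_mean = da.mean("time")
print(year_2001.values)
print(time_mean.values)
```

The same `.sel()` and named-dimension reductions work unchanged on real model output opened from NetCDF files.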
Xarray will use Dask arrays if you set the `chunks` parameter (e.g. `chunks='auto'`) in NetCDF file opening commands such as `xr.open_dataset` and `xr.open_mfdataset`; subsequent calculations will then be automatically parallelised with Dask, without needing any code changes.
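A minimal, self-contained sketch of this behaviour, assuming xarray and dask are installed. Here the Dask-backed array is created with `.chunk()` on an in-memory array (so the example runs anywhere) rather than by passing `chunks=` to a file-opening call, but the lazy, chunked evaluation is the same.

```python
import numpy as np
import xarray as xr

# A small in-memory DataArray standing in for data you would normally
# get from xr.open_dataset(..., chunks=...).
temp = xr.DataArray(
    np.arange(24.0).reshape(4, 6),
    dims=("time", "x"),
    name="temp",
)

# .chunk() converts the underlying NumPy array into a Dask array,
# split here into 2x3 chunks.
chunked = temp.chunk({"time": 2, "x": 3})

# Operations on a Dask-backed array build a lazy task graph...
mean = chunked.mean("time")

# ...and nothing is actually computed until you ask for the result.
result = mean.compute()
print(result.values)
```

The same code without `.chunk()` (or without `chunks=` when opening a file) computes eagerly in memory, which is fine for small data but fails for datasets larger than RAM.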
This is done automatically by `cc.querying.getvar` in the COSIMA Cookbook, but the default chunking scheme can be overridden if you like. Note that you may need to choose a chunking scheme suited to your calculation for optimum performance.
To see examples of Xarray calculations done in parallel with Dask using the COSIMA Cookbook, double-click `Sea_level.ipynb` in the file browser and work through the notebook.
Double-click `Explorer_demo.ipynb` in the file browser and play around with it.
There's more information in `Finding_COSIMA_data.ipynb` if you're interested.
- try out the other notebooks in this repository
- check out the COSIMA Recipes
When finished, make sure you save your work, close the tab, and then Delete your running JupyterLab app; otherwise it will continue to consume compute resources until it reaches its walltime limit.