# Using NetCDF files

The `data_samples/netcdf/E-OBS` folder contains data from the E-OBS dataset, which consists of weather station observations interpolated onto a 0.25° x 0.25° grid over Europe. 

## 1. Explore and manipulate the data in bash/terminal

Before we actually use Python, we are going to learn how to manipulate NetCDF files using the terminal. You can do the following exercises either by switching to the terminal or by running shell commands with `!command` in a Jupyter Notebook cell. 

In [1]:
# Example of running shell commands in Jupyter: Display the content of your home directory
!ls ~

Desktop             Library             Pictures            Zotero
Documents           Movies              Public              slp?.png
Downloads           Music               Softs
Huracan             OneDrive - Nexus365 Teaching


In [2]:
# Task: Display the list of files available in the data folder
!ls ../data_samples/netcdf/E-OBS/

daily                             rr_ens_mean_0.25deg_reg_v29.0e.nc
pp_ens_mean_0.25deg_reg_v29.0e.nc tg_ens_mean_0.25deg_reg_v29.0e.nc


Files with the extension `.nc` are NetCDF files. NetCDF is a standardized binary format suited to multi-dimensional data. Binary files cannot be read as easily as the text files you have seen previously (you can try running `head file.nc`: it will print something, but nothing a human can read). You need specific tools to read this type of file. 

### 1.1 ncdump
`ncdump` is the most basic command to check what is in a NetCDF file. Use the `-h` or `-c` option to display the "header" of the file, which contains all the essential information. Use the `-v <var>` option to display the content of a specific variable. If you do not specify any option, the whole content of the file is displayed, which, in most cases, will be very long. If you make this mistake, you'll likely need to kill the process (Ctrl+C in the terminal, Stop button in Jupyter). 

In [3]:
# Task: Display the header of one of the files in the E-OBS folder
!ncdump -h ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc

netcdf tg_ens_mean_0.25deg_reg_v29.0e {
dimensions:
	time = UNLIMITED ; // (888 currently)
	bnds = 2 ;
	longitude = 464 ;
	latitude = 201 ;
variables:
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	float tg(time, latitude, longitude) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;

// global at

**Question: Which variable does each file contain? What are the associated units? What are the dimensions of the variables?**

**Question: What is the difference between using the `-c` and `-h` options?**

In [4]:
# Task: Display the latitude and longitude variables for one of the files in the E-OBS folder
!ncdump -v latitude ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc
!ncdump -v longitude ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc

netcdf tg_ens_mean_0.25deg_reg_v29.0e {
dimensions:
	time = UNLIMITED ; // (888 currently)
	bnds = 2 ;
	longitude = 464 ;
	latitude = 201 ;
variables:
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	float tg(time, latitude, longitude) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;

// global at

**Question: Over which coordinate box is the data provided?**

### 1.2 ncview
`ncview` is a graphical tool to check the content of a NetCDF file. When you run `ncview file.nc` a new window opens with a graphical interface.

In [5]:
# Task: Run ncview for one of the E-OBS files, and watch how the variable changes over time. (You may need to click on the variable name to display it)
# NB: To get back control over your terminal or notebook, close the window. 
!ncview ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc

Ncview 2.1.8 David W. Pierce  8 March 2017
http://meteora.ucsd.edu:80/~pierce/ncview_home_page.html
Copyright (C) 1993 through 2015, David W. Pierce
Ncview comes with ABSOLUTELY NO WARRANTY; for details type `ncview -w'.
This is free software licensed under the Gnu General Public License version 3; type `ncview -c' for redistribution details.

Note: no Ncview app-defaults file found, using internal defaults
X connection to /private/tmp/com.apple.launchd.WATKgwVKI5/org.xquartz:0 broken (explicit kill or server shutdown).


**Question: What is the range of values that the variable you observed takes? Does it seem sensible?**

**Question: Over which range of dates is the data provided? What is the frequency of the data?**
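The header stores time as integers with `time:units = "days since 1950-01-01 00:00"`, so each value is an offset from that reference date. The following is a minimal sketch of the conversion using only the Python standard library. The function name `decode_days` and the two sample offsets are illustrative assumptions and are not read from the file.

```python
from datetime import datetime, timedelta

# Reference date taken from the header: time:units = "days since 1950-01-01 00:00"
REFERENCE = datetime(1950, 1, 1)


def decode_days(offset_days):
    """Convert a 'days since 1950-01-01' offset into a calendar date."""
    return REFERENCE + timedelta(days=offset_days)


# Illustrative offsets (assumed values, not read from the file)
for offset in (15, 44):
    print(offset, "->", decode_days(offset).date())
```

Applying the same conversion to the first and last values of the `time` variable gives the date range covered by the file.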

### 1.3 nco

`nco` provides a suite of commands for manipulating NetCDF files. Here we introduce the most common ones. You can find the full list of commands in the NCO documentation for future reference.

* `ncks` (NetCDF kitchen sink) extracts a subset from a NetCDF file.
* `ncrcat` concatenates files along the time dimension.
* `ncra` averages variables over time.

Command-line tools require the creation of intermediate files, which we will save in the `tmp` folder.

In [6]:
# Task: Extract Temperature over your favourite European country (You must define a longitude and latitude box around the country)
# NB: This can take a few seconds
!ncks -d latitude,42.5,51.0 -d longitude,-4.7,7.8 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc tmp/T_France.nc

In [7]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the longitude and latitude dimensions are smaller than before, and check that you can see the country you wanted.
!ncdump -h tmp/T_France.nc
!ncview tmp/T_France.nc

netcdf T_France {
dimensions:
	latitude = 34 ;
	longitude = 50 ;
	time = UNLIMITED ; // (888 currently)
	bnds = 2 ;
variables:
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	float tg(time, latitude, longitude) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;

// global attributes:
		:CDI = "Clim

In [8]:
# Task: Compute the time-averaged temperature over this selected area with ncra
!ncra tmp/T_France.nc tmp/T_France_ncra.nc

In [9]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the time dimension has been reduced to 1.
!ncdump -h tmp/T_France_ncra.nc
!ncview tmp/T_France_ncra.nc

netcdf T_France_ncra {
dimensions:
	latitude = 34 ;
	longitude = 50 ;
	time = UNLIMITED ; // (1 currently)
	bnds = 2 ;
variables:
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	float tg(time, latitude, longitude) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
		time:cell_methods = "time: mean" ;
	double time_bnds(time, bnds)

**Question: What is the average temperature over your country of interest?**

### 1.4 cdo
`cdo` is another suite of command-line tools for manipulating NetCDF files. It is more comprehensive than `nco` and, as a result, somewhat more complex. Comprehensive documentation can be found here: http://www.idris.fr/media/ada/cdo.pdf . 
Here again, we introduce only the basic `cdo` functions. 

In [23]:
# Task: Check the dimensions of a file using cdo sinfo
!cdo sinfo ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc

   File format : NetCDF4
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter ID
     1 : unknown  unknown  v instant       1   1     93264   1  F32  : -1            
   Grid coordinates :
     1 : lonlat                   : points=93264 (464x201)
                        longitude : -40.375 to 75.375 by 0.25 degrees_east
                         latitude : 25.375 to 75.375 by 0.25 degrees_north
   Vertical coordinates :
     1 : surface                  : levels=1
   Time coordinate :
                             time : 888 steps
     RefTime =  1950-01-01 00:00:00  Units = days  Calendar = standard  Bounds = true
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1950-01-16 00:00:00  1950-02-14 00:00:00  1950-03-16 00:00:00  1950-04-15 00:00:00
  1950-05-16 00:00:00  1950-06-15 00:00:00  

In [24]:
# Task: Check the variables contained in a file using cdo showname
!cdo showname ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc

 tg
cdo    showname: Processed 1 variable [0.05s 37MB]


In [25]:
# Task: Select the same box as before with cdo sellonlatbox
!cdo sellonlatbox,-4.7,7.8,42.5,51.0 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc tmp/T_France.nc

cdo    sellonlatbox: Processed 82818432 values from 1 variable over 888 timesteps [0.26s 93MB]


In [26]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the longitude and latitude dimensions are smaller than before, and check that you can see the country you wanted.
!ncdump -h tmp/T_France.nc
!ncview tmp/T_France.nc

netcdf T_France {
dimensions:
	time = UNLIMITED ; // (888 currently)
	bnds = 2 ;
	longitude = 50 ;
	latitude = 34 ;
variables:
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	float tg(time, latitude, longitude) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;

// global attributes:
		:CDI = "Clim
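The `latitude = 34` and `longitude = 50` sizes in this header can be cross-checked by hand. The sketch below counts how many grid coordinates fall inside the selection box, assuming the grid reported by `cdo sinfo` (first longitude -40.375, first latitude 25.375, spacing 0.25) and assuming that both box edges are inclusive. The helper name `count_points` is hypothetical.

```python
import math

STEP = 0.25  # grid spacing in degrees, as reported by cdo sinfo


def count_points(first_coord, lower, upper, step=STEP):
    """Count grid coordinates c = first_coord + k * step with lower <= c <= upper."""
    k_min = math.ceil((lower - first_coord) / step)
    k_max = math.floor((upper - first_coord) / step)
    return max(0, k_max - k_min + 1)


n_lat = count_points(25.375, 42.5, 51.0)   # latitude box used for France
n_lon = count_points(-40.375, -4.7, 7.8)   # longitude box used for France
print(n_lat, n_lon, n_lat * n_lon * 888)
```

The last number is the total count of values across the 888 time steps, which can be compared with the `Processed ... values` message that `cdo` prints when it reads the subset file.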

In [27]:
# Task: Compute the average temperature over your box with cdo fldmean
!cdo fldmean tmp/T_France.nc tmp/T_France_fldmean.nc

cdo    fldmean: Processed 1509600 values from 1 variable over 888 timesteps [0.11s 49MB]


In [15]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the longitude and latitude dimensions have shrunk to 1. ncview now displays a time series since the data became one-dimensional.
!ncdump -h tmp/T_France_fldmean.nc
!ncview tmp/T_France_fldmean.nc

netcdf T_France_fldmean {
dimensions:
	time = UNLIMITED ; // (888 currently)
	bnds = 2 ;
	lon = 1 ;
	lat = 1 ;
variables:
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	float tg(time, lat, lon) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;

// global attributes:
		:CDI = "Climate Data Interface version 2.4.0 (https://mpimet.mpg.de/cdi)" ;
		:Conventions = "CF-1.4" ;
		:E

**Question: Can you see an increase in monthly temperature over the 1950-2023 period in your region?**
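`ncra` and `cdo fldmean` both compute an average, but along different axes: `ncra` collapses time and keeps a map, while `fldmean` collapses the map and keeps a time series. The toy sketch below illustrates the difference with plain Python lists and made-up values. It uses a simple unweighted mean, whereas `cdo fldmean` weights each grid cell by its area, so the function `field_mean` here is only an approximation of that operator.

```python
# Toy data: 2 time steps on a 2 x 3 (latitude x longitude) grid, values made up
data = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
]


def time_mean(field):
    """Average over time at each grid point (ncra-like): returns a lat x lon map."""
    n_time = len(field)
    n_lat, n_lon = len(field[0]), len(field[0][0])
    return [
        [sum(field[t][j][i] for t in range(n_time)) / n_time for i in range(n_lon)]
        for j in range(n_lat)
    ]


def field_mean(field):
    """Unweighted average over the grid at each time step: returns a time series."""
    return [
        sum(sum(row) for row in step) / (len(step) * len(step[0]))
        for step in field
    ]


print(time_mean(data))   # one value per grid point
print(field_mean(data))  # one value per time step
```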

In [28]:
# Task: Compute yearly means of the temperature time series using cdo yearmean
!cdo yearmean tmp/T_France_fldmean.nc tmp/T_France_fldmean_yearmean.nc

cdo    yearmean: Processed 888 values from 1 variable over 888 timesteps [0.07s 33MB]


In [17]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the time dimension now shows 74 time steps for the 74 years of the dataset.
!ncdump -h tmp/T_France_fldmean_yearmean.nc
!ncview tmp/T_France_fldmean_yearmean.nc

netcdf T_France_fldmean_yearmean {
dimensions:
	time = UNLIMITED ; // (74 currently)
	bnds = 2 ;
	lon = 1 ;
	lat = 1 ;
variables:
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	float tg(time, lat, lon) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;

// global attributes:
		:CDI = "Climate Data Interface version 2.4.0 (https://mpimet.mpg.de/cdi)" ;
		:Conventions = "CF-1.4

**Question: Can you see an increase in yearly temperature over the 1950-2023 period in your region?**

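`cdo` operators can be chained in a single command. Prefix each chained operator with a dash, and be mindful of the order: operators are applied from right to left, so the one closest to the input file runs first.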
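Because `cdo` applies the operator closest to the input file first, a chain reads in the reverse of the order in which the steps run. A small sketch of that rule: the hypothetical helper `build_cdo_chain` takes the operators in execution order and assembles the command string. It only builds text and does not call `cdo`.

```python
def build_cdo_chain(steps, infile, outfile):
    """Build a chained cdo command from operators listed in execution order."""
    # cdo runs the rightmost operator first, so reverse the list and add dashes
    chained = " ".join(f"-{step}" for step in reversed(steps))
    return f"cdo {chained} {infile} {outfile}"


# Execution order: select the box, average over the box, then average per year
steps = ["sellonlatbox,-4.7,7.8,42.5,51.0", "fldmean", "yearmean"]
command = build_cdo_chain(
    steps,
    "../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc",
    "tmp/chain.nc",
)
print(command)
```

Listing the steps in the order you think about them and letting a helper reverse them is one way to avoid ordering mistakes in longer chains.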

In [21]:
# Task: Run the previous workflow chaining all the commands
!cdo -yearmean -fldmean -sellonlatbox,-4.7,7.8,42.5,51.0 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc tmp/chain.nc

cdo(1) fldmean: Process started
cdo(2) sellonlatbox: Process started
cdo    yearmean: Processed 888 values from 1 variable over 888 timesteps [0.20s 74MB]


In [22]:
# Task: Explore the content of your new file with ncdump and ncview
!ncdump -h tmp/chain.nc
!ncview tmp/chain.nc

netcdf chain {
dimensions:
	time = UNLIMITED ; // (74 currently)
	bnds = 2 ;
	lon = 1 ;
	lat = 1 ;
variables:
	int time(time) ;
		time:standard_name = "time" ;
		time:long_name = "Time in days" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1950-01-01 00:00" ;
		time:calendar = "standard" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
	double lon(lon) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:axis = "X" ;
	double lat(lat) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:axis = "Y" ;
	float tg(time, lat, lon) ;
		tg:standard_name = "air_temperature" ;
		tg:long_name = "mean temperature" ;
		tg:units = "Celsius" ;
		tg:_FillValue = -9999.f ;
		tg:missing_value = -9999.f ;
		tg:cell_methods = "time: mean" ;

// global attributes:
		:CDI = "Climate Data Interface version 2.4.0 (https://mpimet.mpg.de/cdi)" ;
		:Conventions = "CF-1.4" ;
		:E-OBS_version

In [29]:
# Task: Compare the file built step by step and the one with command chaining using cdo diff
# Note that some records may differ, but the tiny differences reflect floating-point rounding rather than a real difference in the results.
!cdo diff tmp/T_France_fldmean_yearmean.nc tmp/chain.nc

               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter ID
     7 : 1956-06-30 00:00:00       0        1       0       1 : F F   9.5367e-07  1.0837e-07 : -1         
    18 : 1967-06-30 00:00:00       0        1       0       1 : F F   9.5367e-07  9.3849e-08 : -1         
    21 : 1970-06-30 00:00:00       0        1       0       1 : F F   9.5367e-07  9.6826e-08 : -1         
    31 : 1980-06-30 00:00:00       0        1       0       1 : F F   9.5367e-07  1.0168e-07 : -1         
    39 : 1988-06-30 00:00:00       0        1       0       1 : F F   9.5367e-07  8.9312e-08 : -1         
    70 : 2019-06-30 00:00:00       0        1       0       1 : F F   9.5367e-07  8.1170e-08 : -1         
  6 of 74 recor

### Wrap-up on terminal use
There are two reasons for using the terminal to explore and manipulate your files before going to Python: 
1. To get a quick look at a file and check that it contains what you want before you open Python.
2. Data manipulation with `nco` and `cdo` (see above) is often much more efficient than in Python. For large files, it is recommended to first reduce the dimensionality and size of the data with command-line tools before you open the files in Python. 

In [18]:
# Task: Compare the size of the full temperature file with the one where you selected only one country
# Note: Pre-processing the file (in this case extracting the region of interest) can greatly reduce file size, 
# making it faster to load and manipulate in Python.
!ls -lhS ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc
!ls -lhS tmp/T_France.nc

-rw-r--r--  1 bourdin  staff   316M  6 Aug 15:05 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc
-rw-r--r--  1 bourdin  staff   5.9M  6 Aug 15:18 tmp/T_France.nc
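These sizes can be estimated from the header alone. `tg` is stored as `float`, i.e. 4 bytes per value, so the array size is roughly time x latitude x longitude x 4 bytes. The sketch below assumes uncompressed storage and ignores coordinate variables and metadata, so it is a lower bound and not an exact file size. The helper name `estimate_mib` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # tg is declared as "float" in the header
N_TIME = 888           # number of time steps in the header


def estimate_mib(n_time, n_lat, n_lon, bytes_per_value=BYTES_PER_FLOAT32):
    """Uncompressed size in MiB of a (time, latitude, longitude) array."""
    return n_time * n_lat * n_lon * bytes_per_value / 2**20


full = estimate_mib(N_TIME, 201, 464)  # full European grid
box = estimate_mib(N_TIME, 34, 50)     # box selected around France
print(f"full grid: {full:.1f} MiB, box: {box:.1f} MiB, ratio: {full / box:.1f}")
```

Both estimates land close to the sizes listed by `ls -lh`, which suggests that the file size is dominated by the `tg` array and scales directly with the number of grid points kept.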


In [19]:
# Task: Remove the files created in the tmp folder
!rm -f tmp/*

As you can see, some tools are redundant, and it is up to you to decide which one works best for you. `nco` and `cdo` contain many more functions, and you will very likely be able to do most of your pre-processing with these tools. Do not forget to always check, step by step, what each function is doing.
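As one instance of checking a step, the `cdo diff` comparison earlier in this section reported a maximum absolute difference of 9.5367e-07 on a few records. The sketch below suggests why this is harmless: it is the gap between two adjacent `float` (32-bit) numbers at that magnitude, so the two files differ only in the last stored bit. It assumes that the relative difference printed by `cdo diff` is the absolute difference divided by the magnitude of the value. The helper `float32_spacing` is hypothetical.

```python
import math


def float32_spacing(value):
    """Gap between adjacent float32 numbers near `value` (normal range only)."""
    exponent = math.floor(math.log2(abs(value)))
    return 2.0 ** (exponent - 23)  # float32 stores 23 fraction bits


max_absdiff = 9.5367e-07  # copied from the cdo diff output
max_reldiff = 1.0837e-07  # same record (1956)

# Rough magnitude of the yearly mean, under the assumption stated above
implied_value = max_absdiff / max_reldiff
print(f"implied value: {implied_value:.1f} degrees Celsius")
print(f"float32 spacing at that value: {float32_spacing(implied_value):.4e}")
```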

## 2. Explore and manipulate the data in Python using `xarray`

`xarray` is a very powerful and intuitive package for manipulating multi-dimensional data in Python. It is designed to work well with NetCDF. 