# Using the terminal for NetCDF files

The terminal is a handy way to run a quick preliminary analysis of a NetCDF file.
You can run the following examples and exercises either in your terminal or in a Jupyter Notebook cell by writing `!command` (as is done below). If you use your terminal, remember to remove the `!` in front of each instruction.

In [1]:
# Example of running shell commands in Jupyter: display the content of your home directory
!ls ~

1. Climate and CO2 concentrations(1).ipynb
1. Climate and CO2 concentrations.ipynb
114.mp4
2. 20th century climate.ipynb
2D_GaussianProcess.ipynb
55001.mp4
55006.mp4
AGU_FDL_submission.docx
AGU_FDL_submission_edit.docx
AGU_FDL_submission_edit_DWP.docx
[1m[34mAnacondaProjects[m[m
[1m[34mApplications[m[m
AuditMaster20190123.xlsx
AuditMaster20190123_FIXED.xlsx
AuditMaster20190123_FIXEDSOMEMORE.xlsx
[1m[34mBattleScribe[m[m
Biomass_effect_draft1_Haochi_will_edit.docx
Britain.pcx
CONTROL
ClimateandCO2concentrations.ipynb
DCC-detect-8f61c9e2b0eb.json
DSE_100.png
DSE_10000.png
DSE_1500.png
DSE_500.png
DSE_5000.png
DTP2017_slides.pdf
[1m[34mDesktop[m[m
[1m[34mDocuments[m[m
[1m[34mDownloads[m[m
EGU2020-20208_presentation.pptx
EGU_abstract_WillJones_edit.docx
Empirical_ship_power.ipynb
GLM_histograms_dev.ipynb
GLM_histograms_full.ipynb
GOES.gif
GSO.14.MPLS_Nov_2018_will.docx
GSO.14.MPLS_Nov_2018_will_PS.docx
HadCRUT.4.5.0.0.annual_ns_avg.txt
Iterative_trajectory.ipynb

In [2]:
# Task: Display the list of file(s) available in the data folder (../../data_samples/netcdf/E-OBS/)
!ls ../../data_samples/netcdf/E-OBS/

UK_monthly.nc


Files with the extension `.nc` are NetCDF files. NetCDF is a standardized binary format designed for multi-dimensional data. Binary files cannot be read as easily as the text files you have seen previously (you can try running `head file.nc`; it will print something, but nothing a human can read). You need dedicated tools to read this type of file.
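
To make "binary format" a little more concrete, here is a minimal Python sketch that guesses the flavour of a NetCDF file from its first bytes. The function names `sniff_netcdf_format` and `sniff_file` are made up for this illustration. The sketch assumes the signature sits at the very start of the file, which is the usual case but not guaranteed for every HDF5-based file.

```python
# Signatures ("magic numbers") found at the start of NetCDF files.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # NetCDF-4 files are stored as HDF5
CLASSIC_VERSIONS = {
    1: "NetCDF classic",
    2: "NetCDF 64-bit offset",
    5: "NetCDF 64-bit data (CDF-5)",
}


def sniff_netcdf_format(first_bytes: bytes) -> str:
    """Guess the NetCDF flavour from the first bytes of a file."""
    if first_bytes.startswith(HDF5_SIGNATURE):
        return "NetCDF-4 (HDF5)"
    if first_bytes[:3] == b"CDF" and len(first_bytes) >= 4:
        # The fourth byte of a classic file holds the format version.
        return CLASSIC_VERSIONS.get(first_bytes[3], "unknown CDF version")
    return "not a NetCDF file"


def sniff_file(path: str) -> str:
    """Hypothetical helper: read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_netcdf_format(handle.read(8))


print(sniff_netcdf_format(b"CDF\x01\x00\x00\x00\x00"))
print(sniff_netcdf_format(HDF5_SIGNATURE))
```

Calling `sniff_file` on the sample file would be the natural next step. The tools introduced in the rest of this section handle these details for you.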

## 1. Reading the file: ncdump
`ncdump` is the most basic command for checking what is in a NetCDF file. Use the `-h` or `-c` option to display the "header" of the file, which contains all the essential information. Use the `-v <var>` option to display the content of a specific variable. If you do not specify any option, the whole content of the file is printed, which in most cases is very long. If you make this mistake, you will likely need to kill the process (Ctrl+C in the terminal, Stop button in Jupyter).
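
If you later want to call `ncdump` from a Python script rather than from a notebook cell, the options above can be wrapped as in the sketch below. The helper names `build_ncdump_command` and `run_ncdump` are hypothetical. Only the `-h` and `-v` options described above are used, and the command is run only if `ncdump` is found on your system.

```python
import shutil
import subprocess


def build_ncdump_command(path, header_only=True, variable=None):
    """Return the ncdump argument list for a header dump or a single variable."""
    command = ["ncdump"]
    if variable is not None:
        command += ["-v", variable]  # header plus the data of one variable
    elif header_only:
        command.append("-h")  # header only, no data
    command.append(path)
    return command


def run_ncdump(path, **options):
    """Run ncdump if it is installed; return its text output, or None otherwise."""
    if shutil.which("ncdump") is None:
        return None
    result = subprocess.run(
        build_ncdump_command(path, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


print(build_ncdump_command("UK_monthly.nc"))
print(build_ncdump_command("UK_monthly.nc", variable="latitude"))
```

Passing `header_only=False` without a variable builds the bare `ncdump file.nc` call, which dumps the whole file, so use it with care.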

In [3]:
# Display the header of the UK_monthly.nc file in the E-OBS folder
!ncdump -h ../../data_samples/netcdf/E-OBS/UK_monthly.nc

netcdf UK_monthly {
dimensions:
	latitude = 38 ;
	longitude = 52 ;
	time = UNLIMITED ; // (888 currently)
variables:
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	float pp(time, latitude, longitude) ;
		pp:standard_name = "air_pressure_at_sea_level" ;
		pp:long_name = "sea level pressure" ;
		pp:units = "hPa" ;
		pp:_FillValue = -9999.f ;
		pp:missing_value = -9999.f ;
		pp:cell_methods = "time: mean" ;
	float rr(time, latitude, longitude) ;
		rr:standard_name = "thickness_of_rainfall_amount" ;
		rr:long_name = "rainfall" ;
		rr:units = "mm" ;
		rr:_FillValue = -9999.f ;
		rr:missing_value = -9999.f ;
		rr:cell_methods = "time: mean" ;
	float tg(time, latitude, longitude) ;
		tg:stan

**Question: Which variables does the file contain? What are the associated units? What are the dimensions of the variables?**

**Question: What is the difference between using the `-c` and `-h` options?**

In [4]:
# Task: Display the latitude and longitude variables of the UK_monthly.nc file in the E-OBS folder
!ncdump -v latitude ../../data_samples/netcdf/E-OBS/UK_monthly.nc
!ncdump -v longitude ../../data_samples/netcdf/E-OBS/UK_monthly.nc

netcdf UK_monthly {
dimensions:
	latitude = 38 ;
	longitude = 52 ;
	time = UNLIMITED ; // (888 currently)
variables:
	double latitude(latitude) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "Latitude values" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
	double longitude(longitude) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "Longitude values" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
	float pp(time, latitude, longitude) ;
		pp:standard_name = "air_pressure_at_sea_level" ;
		pp:long_name = "sea level pressure" ;
		pp:units = "hPa" ;
		pp:_FillValue = -9999.f ;
		pp:missing_value = -9999.f ;
		pp:cell_methods = "time: mean" ;
	float rr(time, latitude, longitude) ;
		rr:standard_name = "thickness_of_rainfall_amount" ;
		rr:long_name = "rainfall" ;
		rr:units = "mm" ;
		rr:_FillValue = -9999.f ;
		rr:missing_value = -9999.f ;
		rr:cell_methods = "time: mean" ;
	float tg(time, latitude, longitude) ;
		tg:stan

**Question: Over which coordinate box is the data provided?**

## 2. Visualising the file: ncview
`ncview` is a graphical tool to check the content of a NetCDF file. When you run `ncview file.nc` a new window opens with a graphical interface.

In [5]:
# Run ncview for one of the E-OBS files, and watch how the variable changes over time.
# (You may need to click on the variable name to display it)
# NB: To get back control over your terminal or notebook, close the window.
!ncview ../../data_samples/netcdf/E-OBS/UK_monthly.nc

/bin/bash: ncview: command not found


**Question: What is the range of values each variable takes? Does it seem sensible?**

**Question: Over which range of dates is the data provided? What is the frequency of the data?**

## 3. nco

`nco` (the NetCDF Operators) provides a suite of commands for manipulating NetCDF files. Here we introduce the most common ones; the full list is given in the NCO User Guide.

* `ncks` (NetCDF kitchen sink) extracts subsets from a NetCDF file.
* `ncrcat` concatenates files along the time dimension.
* `ncra` averages variables over time.

Command-line tools write their results to intermediary files, which we will save in the `tmp` folder. Make sure this folder exists (for example with `mkdir -p tmp`) before running the cells below.

### Subsetting particular point or slice with `ncks`
`ncks -d dim_name,value(,value2) file_in.nc file_out.nc`

NB: If a value is written as an integer, `ncks` reads it as an index (zero-based by default) rather than as a coordinate value. If it is written with a decimal point, `ncks` looks for the closest coordinate value. Therefore, if you want to extract a specific round latitude or longitude, remember to still write `35.0` to be sure to get the value closest to 35, and not the value at index 35.
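
The index-versus-value rule can be mimicked in a few lines of Python. This is only a sketch of the rule as described above, not NCO's implementation. The function name `resolve_hyperslab_argument` and the latitude grid (38 points starting at 49.0 with a 0.25 degree step) are made up for the illustration, and indices are assumed to be zero-based.

```python
def resolve_hyperslab_argument(coordinates, argument):
    """Return the zero-based index selected by a single `-d dim,value` argument.

    An argument with a decimal point is treated as a coordinate value and
    matched to the closest coordinate; otherwise it is treated as an index.
    """
    if "." in argument:
        target = float(argument)
        return min(range(len(coordinates)), key=lambda i: abs(coordinates[i] - target))
    return int(argument)


# Hypothetical latitude axis: 49.0, 49.25, ..., 58.25 (38 values).
latitudes = [49.0 + 0.25 * i for i in range(38)]

for argument in ("35", "35.0", "51.75"):
    index = resolve_hyperslab_argument(latitudes, argument)
    print(argument, "-> index", index, "-> latitude", latitudes[index])
```

With this grid, `35` picks the value stored at index 35, whereas `35.0` falls back to the southernmost latitude because that is the closest coordinate to 35 degrees north.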

In [6]:
# Extract a given time step
!ncks -d time,10 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/time_step_10.nc
# Extract a time slice
!ncks -d time,10,20 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/time_step_10_20.nc

/bin/bash: ncks: command not found
/bin/bash: ncks: command not found


In [7]:
# Extract the values for Oxford
!ncks -d latitude,51.75 -d longitude,-1.26 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Oxford_nco.nc

/bin/bash: ncks: command not found


In [8]:
# Extract the data over Ireland
!ncks -d latitude,51.0,55.5 -d longitude,-11.0,-5.0 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Ireland_nco.nc

/bin/bash: ncks: command not found


In [9]:
# Task: Explore the content of the new files with ncdump and ncview
# Note the changes in dimensions. Check that it is indeed Ireland that has been selected.
!ncdump -h tmp/Oxford_nco.nc
!ncview tmp/Oxford_nco.nc

ncdump: tmp/Oxford_nco.nc: No such file or directory
/bin/bash: ncview: command not found


### Statistical operations
With `ncra` you can average over the whole time span of a file by running `ncra file_in.nc file_out.nc`. It computes the average of every variable in the file, unless you select one with `-v`: `ncra -v var file_in.nc file_out.nc`
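
As a rough picture of what a time average does to a `(time, latitude, longitude)` variable, here is a pure-Python sketch on nested lists. It is not how `ncra` is implemented. The function names are hypothetical, and skipping the fill value (`-9999`, as in the header shown earlier) is an assumption about how missing data are treated.

```python
FILL_VALUE = -9999.0  # taken from the _FillValue attribute in the header


def time_mean(series, fill_value=FILL_VALUE):
    """Average one grid cell over time, ignoring fill values (assumed behaviour)."""
    valid = [value for value in series if value != fill_value]
    return sum(valid) / len(valid) if valid else fill_value


def time_mean_field(data, fill_value=FILL_VALUE):
    """Collapse data[time][lat][lon] to a single [lat][lon] field."""
    n_lat, n_lon = len(data[0]), len(data[0][0])
    return [
        [time_mean([step[j][i] for step in data], fill_value) for i in range(n_lon)]
        for j in range(n_lat)
    ]


# Toy variable: 2 time steps, 1 latitude, 2 longitudes, one missing value.
toy = [
    [[1.0, 2.0]],
    [[3.0, FILL_VALUE]],
]
print(time_mean_field(toy))
```

The spatial dimensions are unchanged and only the time axis is collapsed, which is what you should see in the averaged file: a time dimension of length 1.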

In [10]:
# Compute the time average of all variables over Ireland
!ncra tmp/Ireland_nco.nc tmp/Ireland_avg_nco.nc

/bin/bash: ncra: command not found


In [11]:
# Task: Compute the average for only precipitation over Ireland
!ncra -v rr tmp/Ireland_nco.nc tmp/Ireland_rr_avg_nco.nc

/bin/bash: ncra: command not found


In [12]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the time dimension has been reduced to 1.
!ncdump -h tmp/Ireland_avg_nco.nc
!ncview tmp/Ireland_avg_nco.nc

ncdump: tmp/Ireland_avg_nco.nc: No such file or directory
/bin/bash: ncview: command not found


**Question: What are the rainiest and driest places in Ireland?**

## 4. cdo
`cdo` is another suite of command-line tools for manipulating NetCDF files. It is more comprehensive than `nco`, but for that reason also less simple. Comprehensive documentation can be found here: http://www.idris.fr/media/ada/cdo.pdf.
Here again, we introduce only the basic `cdo` functions.

A `cdo` command always starts with `cdo`, followed by one or several operators, then the input file(s), and finally the output file.

### Exploring the file

In [13]:
# Task: Check the dimensions of a file using cdo sinfo
!cdo sinfo ../../data_samples/netcdf/E-OBS/UK_monthly.nc


In [14]:
# Check the variables contained in a file using cdo showname
!cdo showname ../../data_samples/netcdf/E-OBS/UK_monthly.nc


### Subsetting

In [15]:
# Select values for Oxford (nearest grid point to latitude 51.75, longitude -1.26)
!cdo -remapnn,lon=-1.26/lat=51.75 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Oxford_cdo.nc


In [16]:
# Select the same box as before with cdo sellonlatbox
!cdo sellonlatbox,-11,-5,51,55.5 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Ireland_cdo.nc


In [17]:
# Task: Explore the content of your new files with ncdump and ncview
!ncdump -h tmp/Oxford_cdo.nc
!ncview tmp/Oxford_cdo.nc

ncdump: tmp/Oxford_cdo.nc: No such file or directory
/bin/bash: ncview: command not found


**Question: Can you see an increase in temperature over the period in Oxford?**

### Statistical operations

In [18]:
# Compute the yearly averaged time series using cdo yearmean
!cdo yearmean tmp/Oxford_cdo.nc tmp/Oxford_yearly_cdo.nc


cdo yearmean: Open failed on >tmp/Oxford_cdo.nc<
No such file or directory


In [19]:
# Visualize the new file with ncview
!ncview tmp/Oxford_yearly_cdo.nc

/bin/bash: ncview: command not found


**Question: Can you see an increase in yearly temperature over the 1950-2023 period in Oxford?**

In [20]:
# Compute the average (over time) temperature in Ireland
!cdo timmean tmp/Ireland_cdo.nc tmp/Ireland_avg_cdo.nc


cdo timmean: Open failed on >tmp/Ireland_cdo.nc<
No such file or directory


In [21]:
# Task: Explore the content of your new file with ncdump and ncview. Check that you get the same values as with nco before.
!ncdump -h tmp/Ireland_avg_cdo.nc
!ncview tmp/Ireland_avg_cdo.nc

ncdump: tmp/Ireland_avg_cdo.nc: No such file or directory
/bin/bash: ncview: command not found


In [22]:
# Task: Explore the content of your new file with ncdump and ncview
# Note that the time dimension now shows 74 time steps for the 74 years of the dataset.
!ncdump -h tmp/T_France_fldmean_yearmean.nc
!ncview tmp/T_France_fldmean_yearmean.nc

ncdump: tmp/T_France_fldmean_yearmean.nc: No such file or directory
/bin/bash: ncview: command not found


### Chaining commands

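`cdo` operators can be chained in a single call. Prefix each chained operator with a dash, and be mindful of the order: operators are applied from right to left, so the operator closest to the input file runs first.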
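
The order of a chained call can be pictured as function composition. The Python sketch below is only an analogy and does not run `cdo`. The helpers `chain` and `traced` are made up for this illustration, and the traced operators do nothing except record when they are called.

```python
from functools import reduce


def chain(*operators):
    """Compose operators as a chained cdo call reads: the rightmost one runs first."""
    def run(data):
        return reduce(lambda value, operator: operator(value), reversed(operators), data)
    return run


trace = []


def traced(name):
    """Build a do-nothing operator that only records when it is called."""
    def operator(data):
        trace.append(name)
        return data
    return operator


# Mirrors the shape of: cdo -yearmean -fldmean -sellonlatbox,... in.nc out.nc
workflow = chain(traced("yearmean"), traced("fldmean"), traced("sellonlatbox"))
workflow("in.nc")
print(trace)

# The order matters as soon as the operators do not commute.
double = lambda x: 2 * x
add_one = lambda x: x + 1
print(chain(double, add_one)(3))  # double(add_one(3))
print(chain(add_one, double)(3))  # add_one(double(3))
```

The trace shows the box selection running first and the yearly mean last, even though `yearmean` is written first on the command line.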

In [23]:
# Task: Run the previous workflow chaining all the commands
!cdo -yearmean -fldmean -sellonlatbox,-4.7,7.8,42.5,51.0 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc tmp/chain.nc

cdo yearmean: Started child process "fldmean -sellonlatbox,-4.7,7.8,42.5,51.0 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc (pipe1.1)".
cdo(2) fldmean: Started child process "sellonlatbox,-4.7,7.8,42.5,51.0 ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc (pipe2.1)".

cdo(3) sellonlatbox: Open failed on >../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc<
No such file or directory


In [24]:
# Task: Explore the content of your new file with ncdump and ncview
!ncdump -h tmp/chain.nc
!ncview tmp/chain.nc

ncdump: tmp/chain.nc: No such file or directory
/bin/bash: ncview: command not found


In [25]:
# Task: Compare the file built step by step and the one with command chaining using cdo diff
# Note: some records might differ, but the small values point to numerical differences in the averaging rather than a real difference in the outcomes.
!cdo diff tmp/T_France_fldmean_yearmean.nc tmp/chain.nc


cdo diff: Open failed on >tmp/T_France_fldmean_yearmean.nc<
No such file or directory


### Wrap-up on terminal use
There are two reasons for using the terminal to explore and manipulate your files before going to Python:
1. To get a quick look at a file and check that it contains what you want before you open Python.
2. Data manipulation with `nco` and `cdo` (see above) is often much more efficient than in Python. For large files, it is recommended to first reduce the data dimensionality and size with command-line tools before you open the files in Python.
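
To get a feel for how much subsetting can save, the uncompressed size of a variable can be estimated from its dimensions. The sketch below uses the dimensions of `UK_monthly.nc` from the header shown earlier (888 time steps, 38 latitudes, 52 longitudes, 4-byte floats). The function name `estimated_bytes` and the 18 x 24 regional box are hypothetical. The estimate ignores compression, the file header and the coordinate variables.

```python
def estimated_bytes(n_time, n_lat, n_lon, n_variables=1, bytes_per_value=4):
    """Rough uncompressed size of gridded variables stored as 4-byte floats."""
    return n_time * n_lat * n_lon * n_variables * bytes_per_value


full_grid = estimated_bytes(888, 38, 52)      # one variable, full domain
regional_box = estimated_bytes(888, 18, 24)   # hypothetical 18 x 24 subset
single_step = estimated_bytes(1, 38, 52)      # one time step, full domain

for label, size in [("full", full_grid), ("box", regional_box), ("step", single_step)]:
    print(f"{label}: {size} bytes ({size / 1024**2:.2f} MiB)")
```

Because the size scales with the product of the dimensions, halving both spatial dimensions already divides the data volume by about four.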

In [26]:
# Task: Compare the size of the full temperature file with that of the file where you selected only one country
# Note: Pre-processing the file (in this case extracting the region of interest) can greatly reduce the file size,
# making it faster to load and manipulate in Python.
!ls -lhS ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc
!ls -lhS tmp/T_France.nc

ls: ../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc: No such file or directory
ls: tmp/T_France.nc: No such file or directory


In [27]:
# Task: Remove the files created in the tmp folder
!rm -f tmp/*

As you can see, some tools are redundant, and it is up to you to decide which one works best for you. `nco` and `cdo` contain many more functions, and you will likely be able to do most of your pre-processing with these tools. Always check, step by step, what each function is doing.