# Using the terminal for NetCDF files

Using the terminal is a very handy way to do a quick preliminary analysis of a NetCDF file. 
You can run the following examples and exercises either in your terminal or in a Jupyter Notebook cell by writing `!command` (as is done below). If you use your terminal, remember to remove the `!` in front of each command.

In [None]:
# Example of running shell commands in Jupyter: display the content of your home directory
!ls ~

In [None]:
# Display the list of file(s) available in the data folder (../../data_samples/netcdf/E-OBS/)
!ls ../../data_samples/netcdf/E-OBS/

Files with the `.nc` extension are NetCDF files. NetCDF is a standardized binary format suited to multi-dimensional data. Binary files cannot be read as easily as the text files you have seen previously (you can try running `head file.nc`; it will print something, but nothing human-readable). You need specific tools to read this type of file.

## 1. Reading the file: ncdump
`ncdump` is the most basic command to check what is in a NetCDF file. Use the `-h` or `-c` option to display the "header" of the file, which contains all the essential information. Use the `-v <var>` option to display the content of a specific variable. If you do not specify any option, the whole content of the file is displayed, which, in most cases, will be very long. If you make this mistake, you'll likely need to kill the process (Ctrl+C in the terminal, Stop button in Jupyter). 

In [None]:
# Display the header of the UK_monthly.nc file in the E-OBS folder
!ncdump -h ../../data_samples/netcdf/E-OBS/UK_monthly.nc

**Question: Which variable does each file contain? What are the associated units? What are the dimensions of the variables?**

**Question: What is the difference between using the `-c` and `-h` options?**

In [None]:
# Display the latitude and longitude variables of the UK_monthly.nc file in the E-OBS folder
!ncdump -v latitude ../../data_samples/netcdf/E-OBS/UK_monthly.nc
!ncdump -v longitude ../../data_samples/netcdf/E-OBS/UK_monthly.nc

**Question: Over which coordinate box is the data provided?**

## 2. Visualising the file: ncview
`ncview` is a graphical tool to check the content of a NetCDF file. When you run `ncview file.nc` a new window opens with a graphical interface.

In [None]:
# Run ncview for one of the E-OBS files, and watch how the variable changes over time.
# (You may need to click on the variable name to display it)
# NB: To get back control over your terminal or notebook, close the window. 
!ncview ../../data_samples/netcdf/E-OBS/UK_monthly.nc

**Question: What is the range of values each variable takes? Does it seem sensible?**

**Question: Over which range of dates is the data provided? What is the frequency of the data?**

## 3. nco

`nco` provides a suite of commands for manipulating NetCDF files. Here we introduce the most common ones; the full list of operators is described in the NCO documentation, for future reference.

* `ncks` (NetCDF kitchen sink) is used to extract a subset of a NetCDF file.
* `ncrcat` is used to concatenate files along the time dimension.
* `ncra` is used to average variables over time.

Command-line tools require intermediate files, which we will save in the `tmp` folder. If this folder does not exist yet, create it first (for example with `mkdir -p tmp`).

### Subsetting particular point or slice with `ncks`
`ncks -d dim_name,value(,value2) file_in.nc file_out.nc`

NB: If the value is an integer, `ncks` reads it as an index (it looks for the nth value), whereas if the value contains a decimal point, it looks for the closest coordinate value. Therefore, if you want to extract a round latitude or longitude, remember to write `35.0` to be sure to get the value closest to 35, and not the 35th value.
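
The index-versus-value rule can be illustrated with a small Python sketch. This is not how `ncks` is implemented: the helper `resolve_limit` and the 0.25° latitude axis are made up for the illustration, and the sketch assumes zero-based indexing.

```python
def resolve_limit(coords, text):
    """Return the position selected by a single limit (illustrative only)."""
    if "." in text:
        # Decimal point: treat the limit as a coordinate value, pick the closest one
        target = float(text)
        return min(range(len(coords)), key=lambda i: abs(coords[i] - target))
    # No decimal point: treat the limit as an index (assumed zero-based here)
    return int(text)


# Hypothetical latitude axis: 25.0, 25.25, 25.5, ... (0.25-degree spacing)
latitudes = [25.0 + 0.25 * i for i in range(200)]

idx_as_index = resolve_limit(latitudes, "35")
idx_as_value = resolve_limit(latitudes, "35.0")

print(idx_as_index, latitudes[idx_as_index])  # position 35 holds latitude 33.75
print(idx_as_value, latitudes[idx_as_value])  # latitude 35.0 sits at position 40
```

On this made-up axis, `35` and `35.0` select two different grid rows, which is exactly the pitfall described above.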

In [None]:
# Extract a given time step
!ncks -O -d time,10 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/time_step_10.nc
# Extract a time slice
!ncks -O -d time,10,20 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/time_step_10_20.nc

In [None]:
# Extract the values for Oxford
!ncks -O -d latitude,51.75 -d longitude,-1.26 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Oxford_nco.nc

In [None]:
# Extract the data over Ireland
!ncks -O -d latitude,51.0,55.5 -d longitude,-11.0,-5.0 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Ireland_nco.nc

In [None]:
# Explore the content of the new files with ncdump and ncview
# Note the changes in dimensions. Repeat with tmp/Ireland_nco.nc to check that Ireland has indeed been selected.
!ncdump -h tmp/Oxford_nco.nc
!ncview tmp/Oxford_nco.nc

### Statistical operations
With `ncra` you can average over the whole time period of a file by running `ncra file_in.nc file_out.nc`. It computes the average of all variables in the file, unless you select specific ones with `-v`: `ncra -v var file_in.nc file_out.nc`.
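
To make clear what averaging over time does to the data, here is a minimal pure-Python sketch on a made-up series of three 2x2 grids. `time_mean` is a hypothetical helper, not part of NCO, and it ignores details such as missing values.

```python
def time_mean(records):
    """Average a list of 2D grids (one grid per time step) cell by cell."""
    n_steps = len(records)
    n_rows = len(records[0])
    n_cols = len(records[0][0])
    mean_grid = [
        [sum(step[r][c] for step in records) / n_steps for c in range(n_cols)]
        for r in range(n_rows)
    ]
    # Keep a time axis of length 1 in the result
    return [mean_grid]


# Made-up data: 3 time steps on a 2x2 grid
monthly = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 4.0], [5.0, 6.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

averaged = time_mean(monthly)
print(len(monthly), "time steps ->", len(averaged), "time step")
print(averaged[0])
```

The spatial grid keeps its shape, while the time axis shrinks to a single step.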

In [None]:
# Compute the time average of all variables over Ireland
!ncra -O tmp/Ireland_nco.nc tmp/Ireland_avg_nco.nc

In [None]:
# Task: Compute the average for only precipitation over Ireland
!ncra -O -v rr tmp/Ireland_nco.nc tmp/Ireland_rr_avg_nco.nc

In [None]:
# Explore the content of your new file with ncdump and ncview
# Note that the time dimension has been reduced to 1.
!ncdump -h tmp/Ireland_avg_nco.nc
!ncview tmp/Ireland_avg_nco.nc

**Question: What are the rainiest and driest places in Ireland?**

## 4. cdo
`cdo` is another suite of command-line tools for manipulating NetCDF files. It is more comprehensive than `nco`, but also more complex. Comprehensive documentation can be found here: http://www.idris.fr/media/ada/cdo.pdf .
Here again, we introduce only the basic `cdo` functions.

A `cdo` command always starts with `cdo`, followed by one or several operators, then the input file(s), and finally the output file.

### Exploring the file

In [None]:
# Check the dimensions of a file using cdo sinfo
!cdo sinfo ../../data_samples/netcdf/E-OBS/UK_monthly.nc

In [None]:
# Check the grid attributes
!cdo griddes ../../data_samples/netcdf/E-OBS/UK_monthly.nc

In [None]:
# Check the variables contained in a file using cdo showname
!cdo showname ../../data_samples/netcdf/E-OBS/UK_monthly.nc

### Subsetting

In [None]:
# Select the values for Oxford (nearest grid point to lon=-1.26, lat=51.75)
!cdo -remapnn,lon=-1.26/lat=51.75 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Oxford_cdo.nc

In [None]:
# Select the same box as before with cdo sellonlatbox
!cdo sellonlatbox,-11,-5,51,55.5 ../../data_samples/netcdf/E-OBS/UK_monthly.nc tmp/Ireland_cdo.nc

In [None]:
# Explore the content of your new files with ncdump and ncview
!ncdump -h tmp/Oxford_cdo.nc
!ncview tmp/Oxford_cdo.nc

**Question: Can you see an increase in temperature over the period in Oxford?**

### Statistical operations

In [None]:
# Compute the yearly averaged time series using cdo yearmean
!cdo yearmean tmp/Oxford_cdo.nc tmp/Oxford_yearly_cdo.nc

In [None]:
# Visualize the new file with ncview
!ncview tmp/Oxford_yearly_cdo.nc

**Question: Can you see an increase in yearly temperature over the 1950-2023 period in Oxford?**

In [None]:
# Compute the average (over time) temperature in Ireland
!cdo timmean tmp/Ireland_cdo.nc tmp/Ireland_avg_cdo.nc

In [None]:
# Explore the content of your new file with ncdump and ncview. Check that you get the same values as with nco before.
!ncdump -h tmp/Ireland_avg_cdo.nc
!ncview tmp/Ireland_avg_cdo.nc

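In [None]:
# Build the yearly mean temperature series for France step by step (box selection, spatial mean, yearly mean), then explore it
# (steps reconstructed from the chained command below; T_France_fldmean.nc is an arbitrary intermediate file name)
# Note that the time dimension now shows 74 time steps for the 74 years of the dataset.
!cdo sellonlatbox,-4.7,7.8,42.5,51.0 ../../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc tmp/T_France.nc
!cdo fldmean tmp/T_France.nc tmp/T_France_fldmean.nc
!cdo yearmean tmp/T_France_fldmean.nc tmp/T_France_fldmean_yearmean.nc
!ncdump -h tmp/T_France_fldmean_yearmean.nc
!ncview tmp/T_France_fldmean_yearmean.nc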

### Chaining commands

`cdo` operators can be chained: prefix each operator with a dash, and be mindful of the order, since operators are applied from right to left (the one closest to the input file runs first).
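
Because the operator written last is the one applied first, it can help to list the steps in the order you want them executed and reverse them when writing the command. The sketch below does this with a hypothetical helper, `chain_cdo`; the file names `input.nc` and `output.nc` are placeholders, and the snippet only builds the command string without running `cdo`.

```python
def chain_cdo(steps, infile, outfile):
    """Build a chained cdo command from operators listed in execution order."""
    # The right-most operator runs first, so reverse the list and add the dashes
    operators = ["-" + step for step in reversed(steps)]
    return " ".join(["cdo", *operators, infile, outfile])


# Steps in the order they should be executed
steps = ["sellonlatbox,-4.7,7.8,42.5,51.0", "fldmean", "yearmean"]

command = chain_cdo(steps, "input.nc", "output.nc")
print(command)
```

Here the box selection comes first in `steps` but last among the operators in the printed command.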

In [None]:
# Run the previous workflow chaining all the commands
!cdo -yearmean -fldmean -sellonlatbox,-4.7,7.8,42.5,51.0 ../../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc tmp/chain.nc

In [None]:
# Explore the content of your new file with ncdump and ncview
!ncdump -h tmp/chain.nc
!ncview tmp/chain.nc

In [None]:
# Compare the file built step by step and the one with command chaining using cdo diff
# Note that some records might differ, but the small values point to rounding errors rather than a real difference in the results.
!cdo diff tmp/T_France_fldmean_yearmean.nc tmp/chain.nc

### Wrap-up on terminal use
There are two reasons for using the terminal to explore and manipulate your files before moving to Python:
1. To get a quick look at a file and check that it contains what you want before you open Python.
2. Data manipulation with `nco` and `cdo` (see above) is often much more efficient than in Python. For heavy files, it is recommended to first reduce the data dimensionality and size with command-line tools before opening the files in Python.

In [None]:
# Compare the size of the full temperature file with the one where you selected only one country
# Note: Pre-processing the file (here, extracting the region of interest) can greatly reduce file size,
# making it faster to load and manipulate in Python.
!ls -lhS ../../data_samples/netcdf/E-OBS/tg_ens_mean_0.25deg_reg_v29.0e.nc
!ls -lhS tmp/T_France.nc
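
The same comparison can be made from Python if you prefer a single number. The sketch below uses a hypothetical helper, `size_ratio`, and two stand-in files of arbitrary size written to a temporary folder; they are not real NetCDF files and only serve to make the example runnable.

```python
import os
import tempfile


def size_ratio(full_path, subset_path):
    """Return how many times larger the full file is than the subset."""
    return os.path.getsize(full_path) / os.path.getsize(subset_path)


with tempfile.TemporaryDirectory() as folder:
    # Stand-in files with arbitrary sizes (4000 and 500 bytes)
    full_file = os.path.join(folder, "full.nc")
    subset_file = os.path.join(folder, "subset.nc")
    with open(full_file, "wb") as handle:
        handle.write(b"\0" * 4000)
    with open(subset_file, "wb") as handle:
        handle.write(b"\0" * 500)
    ratio = size_ratio(full_file, subset_file)

print(f"The full file is {ratio:.1f} times larger than the subset")
```

With the notebook's files, you would pass the paths of the full temperature file and of `tmp/T_France.nc` instead.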

In [None]:
# Remove the files created in the tmp folder
!rm -f tmp/*

As you can see, some tools are redundant, and it is up to you to decide which works best for you. `nco` and `cdo` contain many more functions, and you will very likely be able to do most of your pre-processing with them. Always check step by step what each function is doing.