# Climate Model and Climate data workshop, hands-on session

This Jupyter notebook is part of the climate model and climate data workshop. It showcases how to use climate model data in R. However, it is not an R course!

Use the kernel "Teaching (R SLAC 2021)".

Here we will use the ncdf4 (https://search.r-project.org/CRAN/refmans/ncdf4/html/00Index.html) library.

## 1. Loading and inspecting data

In [None]:
#-----------------------------------------------------------------------------#
# Load libraries
#-----------------------------------------------------------------------------#
library(ncdf4)

In [None]:
# get working directory
WD <- getwd()

# define directory where data is
indir <- "/net/co2/c2sm-data/rlorenz/climate_model_data_workshop/"
# define output directory in case you want to save any data
outdir <- paste(WD, "/data/", sep="")
print(outdir)

In [None]:
#-----------------------------------------------------------------------------#
# Read data
#-----------------------------------------------------------------------------#
file <- paste(indir, "tas_Amon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc", sep="")
ex.nc <- nc_open(file)
print(paste("The file has",ex.nc$nvars,"variables"))
print(ex.nc)

In [None]:
# This illustrates how to read all the data from a variable (the 5th) if you do not know the variable name
v5 <- ex.nc$var[[5]]
data1 <- ncvar_get( ex.nc, v5 )	# by default, reads ALL the data

print(paste("Var 5 has name",v5$name,"and",v5$ndim, "dimensions.",
    "Dimensions are:"))
print(v5$varsize)

In [None]:
# Read variable "tas" directly if variable name is known
tas <- ncvar_get(ex.nc, "tas")
print(dim(tas))

❓ **Questions**

1. How many years are covered in the dataset?
2. What is the data frequency?
3. How many variables does the dataset contain?

### 1.1 Exercise
Calculate the grid spacing in latitudinal and longitudinal direction.

In [None]:
# Read longitude and latitude dimension
lon <- ncvar_get(ex.nc, "lon")
lat <- ncvar_get(ex.nc, "lat")

lat_spacing <- lat[2] - lat[1] 
lon_spacing <- lon[2] - lon[1]

print(paste("The latitudinal spacing is", lat_spacing))
print(paste("The longitudinal spacing is", lon_spacing))
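
The two-point difference above assumes an evenly spaced grid. Many climate models use Gaussian grids, where the latitude spacing varies slightly, so it can be worth checking all neighbouring differences with `diff`. The following is a minimal sketch on synthetic coordinates; the coordinate values and the helper `is_regular` are assumptions for illustration and are not read from the workshop file.

```r
# Synthetic coordinates (assumed values, not taken from the workshop file)
lon <- seq(0, 358.125, by = 1.875)        # evenly spaced longitudes
lat <- c(-88.57, -86.72, -84.86, -82.99)  # a few Gaussian-like latitudes

# Differences between all neighbouring grid points
lon_steps <- diff(lon)
lat_steps <- diff(lat)

# Hypothetical helper: TRUE if all steps agree within a tolerance
is_regular <- function(steps, tol = 1e-6) {
  diff(range(steps)) < tol
}

print(is_regular(lon_steps))
print(is_regular(lat_steps))
print(range(lat_steps))  # smallest and largest latitude step
```

With the real data, the same check can be applied to the lon and lat vectors read above.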

## 2. Subsetting data

In [None]:
# Read time dimension
time <- ncvar_get(ex.nc, "time")
time_units <- ncatt_get(ex.nc, "time", "units") # units attribute of the time axis
# extract the base date: the third token of a units string such as "days since 1950-01-01 00:00"
base_date <- as.character(lapply(strsplit(as.character(time_units$value),
                                          split = " "), "[", 3))
# time holds day offsets from base_date; no format argument is needed for numeric input
time_d <- as.Date(time, origin = as.Date(base_date))
time.y <- format(time_d, "%Y") # extract the years only from time_d

Y.Start <- time.y[1] # first year in timeseries
                     # -> can be used as info for output file
Y.End <- time.y[length(time.y)] # last year in timeseries

In [None]:
# print start and end years, convert to integers
print(Y.Start)
print(Y.End)
first_year <- strtoi(Y.Start, base = 0L)
last_year <- strtoi(Y.End, base = 0L)
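
The conversion from numeric time values to dates can be tried out without the data file. The sketch below uses an assumed units string and assumed day offsets, chosen only for illustration. It is valid only for time axes given in days on a standard or proleptic Gregorian calendar; calendars such as `360_day` or `noleap`, which some models use, need dedicated handling.

```r
# Assumed example values (not read from the workshop file)
time_units <- "days since 2010-01-01 00:00"
time <- c(15.5, 45, 74.5)  # offsets in days, roughly mid-month

# Split the units string into tokens: unit, "since", base date, time of day
parts <- strsplit(time_units, split = " ")[[1]]
stopifnot(parts[1] == "days", parts[2] == "since")

# Convert the offsets to dates relative to the base date
base_date <- as.Date(parts[3])
time_d <- as.Date(time, origin = base_date)
print(time_d)

# Years as integers, e.g. for use in output file names
years <- as.integer(format(time_d, "%Y"))
print(range(years))
```

The explicit check on the first two tokens is a cheap way to notice files whose time axis is given in hours or months instead of days.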

### Subset data for a specific year and print the dataset

#### Hint
R's which function can be used to find the indices where a specific condition is met, see:

- https://stat.ethz.ch/R-manual/R-devel/library/base/html/which.html

Check ncvar_get and which parameters it accepts:
- https://www.rdocumentation.org/packages/ncdf4/versions/1.21/topics/ncvar_get

In [None]:
# Select the time slice you are interested in (e.g., time = 2012)
time_index <- which(time.y == 2012)
print(time_index)

# Read the variable data for the selected time slice
variable_name <- "tas"
data <- ncvar_get(ex.nc, variable_name, start = c(1, 1, time_index[1]), count = c(-1, -1, length(time_index)))
print(dim(data))
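
In `ncvar_get`, `start` gives the first index to read along each dimension and `count` the number of values to read, where `-1` means all values along that dimension. The request therefore only describes a contiguous block. The sketch below mimics this with a plain in-memory array; the array, its size and the years vector are synthetic assumptions, not the workshop data.

```r
# Synthetic stand-in for a lon x lat x time array: 4 x 3 grid, 24 months
years <- rep(c("2011", "2012"), each = 12)
arr <- array(seq_len(4 * 3 * 24), dim = c(4, 3, 24))

# Time steps that belong to 2012
idx <- which(years == "2012")

# start/count can only describe a contiguous block, so check that first
stopifnot(all(diff(idx) == 1))
start <- c(1, 1, idx[1])
count <- c(-1, -1, length(idx))

# In-memory equivalent of reading with this start and count
last <- idx[1] + length(idx) - 1
sub <- arr[, , idx[1]:last]
print(dim(sub))
```

If the selected indices were not contiguous (for example all Januaries), one option would be to read the full time range and subset the array afterwards.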

In [None]:
# this is an example how a netcdf can be saved from R
print("save monthly data to netcdf")

#define dimensions for output file
dim1 <- ncdim_def("longitude", "degrees_east", as.double(lon))
dim2 <- ncdim_def("latitude", "degrees_north", as.double(lat))
dimT <- ncdim_def("time", "months", time_index, unlim = FALSE)
missval <- -9999

# define the EMPTY netcdf variable "tas"
var1 <- ncvar_def("tas", "K", list(dim1, dim2, dimT), missval,
                  longname = "temperature")

# associate the netcdf variable with a netcdf file  
file.out <- paste(outdir, "T_2012.nc", sep = "")
print(file.out)
# in case the output directory does not exist yet, create it
if (!file.exists(outdir)){
   dir.create(file.path(outdir), showWarnings = FALSE)
}

# create the netcdf file
nc.ex <- nc_create(file.out, list(var1))

# put data into file
ncvar_put(nc.ex, var1, data) # one could write a subset of the data using
                           # start = c(1, 1, 1), count = c(nlon, nlat, t2)
ncatt_put(nc.ex , var1, '_FillValue', missval )

# close the file so that all data is written to disk
nc_close(nc.ex)

#### Exercise 2.1

Calculate the average temperature for the year 2012 and print the result.

In [None]:
data_mean <- mean(data)
print(paste("The mean temperature for 2012 is ", round(data_mean, digits = 2)," K.", sep=""))
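
Note that a plain `mean` gives every grid cell the same weight, although cells on a regular latitude-longitude grid cover less area towards the poles. A common approximation is to weight each cell by the cosine of its latitude. The sketch below illustrates this on a tiny synthetic field; the coordinates and temperatures are assumptions chosen so that the effect is easy to follow, and it is not the solution prescribed by the workshop.

```r
# Synthetic grid and field (assumed values for illustration)
lat <- c(-60, 0, 60)
lon <- c(0, 90, 180, 270)

# field[lon, lat]: 250 K at 60S and 60N, 300 K at the equator
field <- matrix(rep(c(250, 300, 250), each = length(lon)),
                nrow = length(lon))

# Cosine-of-latitude weights, repeated for every longitude
w <- cos(lat * pi / 180)
w_mat <- matrix(w, nrow = length(lon), ncol = length(lat), byrow = TRUE)

# Unweighted mean versus area-weighted mean
plain_mean <- mean(field)
weighted_mean <- sum(field * w_mat) / sum(w_mat)

print(round(plain_mean, 2))
print(round(weighted_mean, 2))
```

The weighted mean is higher here because the warm equatorial cells represent more area than the cold high-latitude cells.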

### Subset data for a specific region using latitude and longitude bounds

In [None]:
# read data but only for polygon
min_lon <-  5. 
max_lon <- 16. 
# latitude is -90 to 90
min_lat <- 45.
max_lat <- 55.

# find start indices for these longitudes and latitudes
lon_index <- which(lon > min_lon & lon < max_lon)
print(lon_index)

lat_index <- which(lat > min_lat & lat < max_lat)
print(lat_index)

print(time_index)

In [None]:
# Read the variable data for the selected latitudes and longitudes for the time of interest
data_reg <- ncvar_get(ex.nc, variable_name,
                      start = c(lon_index[1], lat_index[1], time_index[1]),
                      count = c(length(lon_index), length(lat_index), length(time_index)))
print(data_reg)

In [None]:
# Close the NetCDF file
nc_close(ex.nc)

#### Exercise 2.2

Calculate and print the mean temperature over space and time for the selected region.

In [None]:
data_reg_mean <- mean(data_reg)
print(paste("The mean temperature for the selected region is ", round(data_reg_mean, digits = 2)," K.", sep=""))

#### Exercise 2.3

Calculate and print the mean temperature over the selected region for each time step, i.e. as a time series.

In [None]:
data_reg_mean_timeseries <- apply(data_reg, 3, mean, na.rm = TRUE)
print(data_reg_mean_timeseries)
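
The second argument of `apply` names the dimension(s) that are kept; the function is applied over all remaining dimensions. A minimal sketch on a small synthetic array (assumed values, not the workshop data) shows the difference between a time series of spatial means and a map of time means:

```r
# Synthetic lon x lat x time array: 2 x 2 grid, 3 time steps
# Every cell holds 1 at step 1, 2 at step 2 and 3 at step 3
arr <- array(rep(c(1, 2, 3), each = 4), dim = c(2, 2, 3))

# Keep dimension 3 (time): one spatial mean per time step
per_time <- apply(arr, 3, mean, na.rm = TRUE)
print(per_time)

# Keep dimensions 1 and 2 (lon, lat): one time mean per grid cell
per_cell <- apply(arr, c(1, 2), mean, na.rm = TRUE)
print(per_cell)
```

The argument `na.rm = TRUE` is passed on to `mean`, so missing values are skipped instead of turning the result into `NA`.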

#### You can also use packages like "raster" to work with netcdf data

In [None]:
# load necessary packages
library(raster)

# get working directory and load function eurocentric.r
getwd()
source("eurocentric.r")

In [None]:
nc_file <- paste(indir, "tas_Amon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_201001-201412.nc", sep="")
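# "+ 0" presumably forces raster to read the values into memory
# instead of only keeping a link to the file
nc_data <- brick(nc_file) + 0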
print(nc_data)

In [None]:
nc_data <- convert.to.eurocentric(nc_data)
print(nc_data)

❓ **Question**

What does the function convert.to.eurocentric do?

In [None]:
# Extract the time dimension
nc <- nc_open(nc_file)

time <- ncvar_get(nc, "time")
time_units <- ncatt_get(nc, "time", "units")$value
print(time_units)
nc_close(nc)

# extract the base date: the third token of a units string such as "days since 1950-01-01 00:00"
base_date <- as.character(lapply(strsplit(as.character(time_units),
                                          split = " "), "[", 3))
print(base_date)
# time holds day offsets from base_date; no format argument is needed for numeric input
time_dates <- as.Date(time, origin = as.Date(base_date))
print(time_dates)

In [None]:
# add time as date to rasterbrick
raster_ts <- setZ(nc_data, time_dates)
print(raster_ts)