# Python example for Climatic Research Unit (CRU) time-series (TS) data

## CRU TS data

The following guide will assist with the manipulation of the Climatic Research Unit (CRU) gridded time-series (TS) dataset. For more information on this data, or to understand the ways in which it can be downloaded, please see the CEDA CRU data user guide.

The code below goes at the beginning of the script. It imports the packages (tools) that Python needs for the steps that follow.

```python
import numpy as np               # array calculations
import matplotlib.pyplot as plt  # plotting
import netCDF4                   # reading NetCDF files
```

## Manipulation of temperature data

Now we can read the data into the program so it can be viewed, manipulated and displayed as required. The text within the quotation marks is the file path. Because the data is held on the CEDA Archive, we can open the file directly from its archive path.

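```python
# Path to the CRU TS 4.02 monthly temperature file on the CEDA Archive
filename = "/badc/cru/data/cru_ts/cru_ts_4.02/data/tmp/cru_ts4.02.1901.2017.tmp.dat.nc"
data = netCDF4.Dataset(filename)  # open the file for reading
```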

To understand a bit more about what's inside the NetCDF file, you can print the variable names:

```python
print(data.variables.keys())
```

```
odict_keys(['lon', 'lat', 'time', 'tmp', 'stn'])
```


The `.keys()` method returns only the variable names; printing `data.variables` on its own also shows each variable's metadata. Each dimension in the file also has a variable of the same name, so the names printed are:
<ul>
<li>'lat' for latitudes</li>
<li>'lon' for longitudes</li>
<li>'time' for time</li>
<li>'tmp' for near-surface temperature</li>
<li>'stn' for the number of stations contributing to each value</li>
</ul>

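Next, the temperature, longitude, latitude and time variables are assigned to Python variables so they can be used in the rest of the script. The `[:]` reads the values into memory as arrays; `time` is left as a netCDF variable so that its `units` and `calendar` attributes remain available for converting dates later on.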

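```python
temp = data.variables['tmp'][:]  # temperature, dimensions (time, lat, lon)
lon = data.variables['lon'][:]   # longitudes of the grid-box centres
lat = data.variables['lat'][:]   # latitudes of the grid-box centres
time = data.variables['time']    # kept as a netCDF variable to keep units/calendar
```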

#### 1. Climatology - global temperature data averaged over time (1901-2017)

The temperature variable set as `temp` above is 3-dimensional: it varies with time, latitude and longitude. To produce a map, the temperature values need to be averaged across the entire time period of the dataset, which leaves one average value per grid point. The line below averages `temp` along the time axis (axis 0). A climatology is usually calculated over a fixed 30-year period such as 1961-1990, so this 1901-2017 mean is similar to, rather than identical to, a standard climatology; to view specific years instead, slice the time axis before averaging.

```python
# Mean over the time axis: one value per (lat, lon) grid point
temp_av_1901_2017 = np.mean(temp[:, :, :], axis=0)
```
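The short sketch below uses small synthetic data rather than the CRU file to illustrate two points about the line above: averaging along axis 0 leaves one value per grid point, and because netCDF4 returns masked arrays by default, a box that is masked at every time step (as ocean boxes are in CRU TS) stays masked in the result. The array sizes and values are arbitrary assumptions, and `temp_demo` is a hypothetical name chosen so it does not overwrite `temp`.

```python
import numpy as np
import numpy.ma as ma

# Synthetic stand-in for 'tmp' (arbitrary sizes): 24 months on a 2 x 3 grid,
# with dimensions ordered (time, lat, lon) like the CRU variable.
values = np.arange(24 * 2 * 3, dtype=float).reshape(24, 2, 3)

# Mask the box at [lat 0, lon 0] at every time step, like an ocean box.
mask = np.zeros(values.shape, dtype=bool)
mask[:, 0, 0] = True
temp_demo = ma.masked_array(values, mask=mask)

# Average along the time axis: one value per (lat, lon) grid point.
climatology = np.mean(temp_demo, axis=0)

print(climatology.shape)                   # (2, 3)
print(ma.getmaskarray(climatology)[0, 0])  # True: the masked box stays masked
```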

#### 2. Global temperature anomaly of 2017 (with respect to 1961-1990 reference period)

The code below slices the temperature data to obtain the 1961-1990 reference period and the year 2017 on its own, so that we can calculate the 2017 anomaly relative to the reference period. First, we convert the time values to dates so we can identify the indices needed to slice out 2017 (the same approach works for 1961-1990 or any other year).

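```python
# Convert the numeric time values to dates using the file's units and calendar.
# Newer netCDF4/cftime versions may return cftime datetime objects instead of
# datetime.datetime; the dates themselves are the same.
time_convert = netCDF4.num2date(time[:], time.units, time.calendar)
print(time_convert[1392:1404])  # the 12 months of 2017
```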

```
[datetime.datetime(2017, 1, 16, 0, 0) datetime.datetime(2017, 2, 15, 0, 0)
 datetime.datetime(2017, 3, 16, 0, 0) datetime.datetime(2017, 4, 16, 0, 0)
 datetime.datetime(2017, 5, 16, 0, 0) datetime.datetime(2017, 6, 16, 0, 0)
 datetime.datetime(2017, 7, 16, 0, 0) datetime.datetime(2017, 8, 16, 0, 0)
 datetime.datetime(2017, 9, 16, 0, 0)
 datetime.datetime(2017, 10, 16, 0, 0)
 datetime.datetime(2017, 11, 16, 0, 0)
 datetime.datetime(2017, 12, 16, 0, 0)]
```
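Because the series is monthly and, as the dates printed above show, January 2017 sits at index 1392, the slice indices can also be calculated rather than looked up. The `month_index` helper below is a hypothetical convenience function, not part of netCDF4; it assumes the series starts in January 1901 with no missing months.

```python
def month_index(year, month, start_year=1901):
    """Return the 0-based index of (year, month) in a gap-free monthly series
    that starts in January of start_year (hypothetical helper)."""
    return (year - start_year) * 12 + (month - 1)


# Slice bounds; the stop index is exclusive, so it is January of the next year.
start_2017, stop_2017 = month_index(2017, 1), month_index(2018, 1)
start_base, stop_base = month_index(1961, 1), month_index(1991, 1)

print(start_2017, stop_2017)  # 1392 1404
print(start_base, stop_base)  # 720 1080
```

For example, `temp[start_2017:stop_2017, :, :]` selects the same months as `temp[1392:1404, :, :]`.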


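```python
# 1961-1990 reference period: indices 720 (Jan 1961) to 1079 (Dec 1990)
temp_1961_1990 = np.mean(temp[720:1080, :, :], axis=0)

# 2017: indices 1392 (Jan 2017) to 1403 (Dec 2017)
temp_2017 = np.mean(temp[1392:1404, :, :], axis=0)

# Anomaly map: 2017 mean minus the reference-period mean at each grid point
temp_2017_anom = temp_2017 - temp_1961_1990
```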

#### 3. Annual mean temperature anomaly averaged globally

For a global time-series graph, the data needs to be averaged in a different way: across all grid points rather than across time, giving one global average value per time step. The line of code below does exactly this. Note that it is a simple unweighted mean, so every grid box counts equally even though boxes near the poles cover less area than boxes near the equator.

```python
# Mean over latitude and longitude (axes 1 and 2): one value per month
global_average = np.mean(temp[:, :, :], axis=(1, 2))
```
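The global average above gives every grid box equal weight. A common refinement, not used in this guide, is to weight each box by the cosine of its latitude, which is proportional to the area of a box on a regular latitude-longitude grid. The sketch below compares the unweighted and weighted means on a small synthetic field (three latitude bands, arbitrary values); applying it to `temp` would assume that `lat` holds the grid-box centre latitudes read in earlier. `band_lat` and `field` are hypothetical names.

```python
import numpy as np
import numpy.ma as ma

# Synthetic (time, lat, lon) field: 2 time steps, 3 latitude bands, 4 longitudes.
band_lat = np.array([-60.0, 0.0, 60.0])  # grid-box centre latitudes (degrees)
values = np.zeros((2, band_lat.size, 4))
values[:, 0, :] = 10.0  # southern band
values[:, 1, :] = 30.0  # equatorial band
values[:, 2, :] = 10.0  # northern band
field = ma.masked_array(values)

# Unweighted mean, as in the line above: every box counts equally.
unweighted = np.mean(field, axis=(1, 2))

# cos(latitude) weights, broadcast to the full (time, lat, lon) shape.
w = np.cos(np.deg2rad(band_lat))[np.newaxis, :, np.newaxis] * np.ones(field.shape)
# Give masked boxes (e.g. ocean in CRU TS) no weight.
w = ma.masked_array(w, mask=ma.getmaskarray(field))

weighted = (field * w).sum(axis=(1, 2)) / w.sum(axis=(1, 2))

print(unweighted)  # about 16.67 at each time step
print(weighted)    # about 20.0: the equatorial band counts for more
```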

To remove the seasonal cycle, an annual average is calculated from the monthly data. The code below reshapes the global average into an array of shape (117, 12), because the dataset covers 117 years (1901-2017) of 12 months each, and then averages each year. These annual averages are saved as `annual_temp`.

```python
# Rows are years, columns are months (Jan-Dec); average across the months
annual_temp = np.mean(np.reshape(global_average, (117, 12)), axis=1)
```
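The reshape works because NumPy fills the new array row by row, so each row of 12 holds consecutive months; each row is therefore one calendar year provided the series starts in January and contains whole years. The sketch below checks this on a hypothetical three-year series whose values encode the year and month.

```python
import numpy as np

# Hypothetical monthly series covering 3 whole years, starting in January.
# Each value is 100 * (year index) + (month number), e.g. 102 is February
# of year index 1 (the second year).
monthly = np.array(
    [100 * year + month for year in range(3) for month in range(1, 13)],
    dtype=float,
)

# The series must contain whole years for the reshape to line up.
assert monthly.size % 12 == 0
n_years = monthly.size // 12

# reshape fills row by row (C order): each row is 12 consecutive months.
by_year = np.reshape(monthly, (n_years, 12))
annual = np.mean(by_year, axis=1)

print(by_year[1, :3])  # first three months of the second year
print(annual)          # one mean per year
```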

It is useful to look at the temperature values as an anomaly relative to a reference period. The following code calculates the annual temperature anomaly relative to the 1961-1990 average. The first line calculates the average temperature for 1961-1990 by slicing `annual_temp` with `60:90` and averaging the result: Python slices exclude the end index, so this selects indices 60 to 89, which are the years 1961 to 1990. The second line subtracts this 1961-1990 average from each of the annual temperature values calculated above and saves the result as `temp_anomaly`.

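```python
# 1961-1990 mean of the annual values (indices 60-89)
av_1961_1990 = np.mean(annual_temp[60:90])

# Annual anomaly relative to 1961-1990
temp_anomaly = annual_temp - av_1961_1990
```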
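To confirm which years the slice `60:90` picks out, the sketch below builds the matching array of calendar years, assuming `annual_temp` covers 1901-2017 with one value per year. It also shows an alternative that selects the reference period by year with a boolean mask, which avoids counting indices by hand.

```python
import numpy as np

# Assumed calendar year of each entry in annual_temp (1901-2017 inclusive).
years = np.arange(1901, 2018)

# Slicing stops before index 90, so 60:90 covers indices 60..89.
baseline_years = years[60:90]
print(len(years), len(baseline_years))        # 117 30
print(baseline_years[0], baseline_years[-1])  # 1961 1990

# Alternative: select the reference period by year instead of by index.
in_baseline = (years >= 1961) & (years <= 1990)
print(np.array_equal(np.flatnonzero(in_baseline), np.arange(60, 90)))  # True
```

With the real data, `annual_temp[in_baseline]` would select the same 30 values as `annual_temp[60:90]`.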

## Manipulation of precipitation data

#### 1. Extracting a spatial subset

Now we are going to use precipitation data. We can check that the precipitation file is in the archive with the command below (in a Jupyter notebook, the `!` prefix runs a shell command):

```
! ls /badc/cru/data/cru_ts/cru_ts_4.02/data/pre/cru_ts4.02.1901.2017.pre.dat.nc
```

```
/badc/cru/data/cru_ts/cru_ts_4.02/data/pre/cru_ts4.02.1901.2017.pre.dat.nc
```


Using Python, we will now read in the precipitation data and subset it to obtain the Europe region only. There are many different ways to obtain a regional subset of the data; this is just one method.

First we read in the precipitation data and name variables as we have done before.

To create a regional subset, here Europe, the code below sets latitude (`lat_bnds`) and longitude (`lon_bnds`) boundaries. These boundaries are then used to find the indices of the grid points that fall inside the region.

When we select the precipitation data (`pre`), we slice it with these latitude and longitude indices, so that the data is restricted to the area we want. Because we want to export the result to a CSV file, which holds a 2-dimensional table, we extract only one month of data: the final line of code slices out the first time step, January 1901.

```python
rain_ds = netCDF4.Dataset('/badc/cru/data/cru_ts/cru_ts_4.02/data/pre/cru_ts4.02.1901.2017.pre.dat.nc')

latitude = rain_ds.variables['lat'][:]
longitude = rain_ds.variables['lon'][:]

# Europe: 35 to 70 degrees N, 10 degrees W to 30 degrees E
lat_bnds, lon_bnds = [35, 70], [-10, 30]

# Indices of the grid points that fall inside the boundaries
lat_index = np.where((latitude > lat_bnds[0]) & (latitude < lat_bnds[1]))[0]
lon_index = np.where((longitude > lon_bnds[0]) & (longitude < lon_bnds[1]))[0]

rain = rain_ds.variables['pre']

# First time step (January 1901) for the European grid points only
rain_eu = rain[0, lat_index, lon_index]
```
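The last line relies on netCDF4 variables treating each integer index array independently (orthogonal indexing). If you first read the whole variable into a NumPy array, the same expression would instead try to pair the two index arrays element by element, which fails when they differ in length; `np.ix_` gives the rectangular latitude-by-longitude block. The sketch below shows this, together with one way to write the month to CSV. It uses a synthetic field on an assumed 0.5-degree grid with box centres on .25 and .75 degrees; all variable names and the CSV file name in the comment are hypothetical.

```python
import io

import numpy as np
import numpy.ma as ma

# Assumed stand-in for the CRU TS grid: 0.5-degree boxes centred on x.25/x.75.
grid_lat = np.arange(-89.75, 90.0, 0.5)     # 360 values
grid_lon = np.arange(-179.75, 180.0, 0.5)   # 720 values

# Synthetic (time, lat, lon) field in which each value equals its latitude.
values = np.broadcast_to(grid_lat[np.newaxis, :, np.newaxis], (2, 360, 720)).copy()
demo_rain = ma.masked_array(values, mask=np.zeros(values.shape, dtype=bool))

lat_bnds, lon_bnds = [35, 70], [-10, 30]
lat_idx = np.where((grid_lat > lat_bnds[0]) & (grid_lat < lat_bnds[1]))[0]
lon_idx = np.where((grid_lon > lon_bnds[0]) & (grid_lon < lon_bnds[1]))[0]

# Mask one box inside the region, standing in for a sea grid box.
demo_rain[0, lat_idx[0], lon_idx[0]] = ma.masked

# On a NumPy array, np.ix_ selects every (lat, lon) combination.
demo_eu = demo_rain[0][np.ix_(lat_idx, lon_idx)]
print(demo_eu.shape)  # (70, 80)

# Write the month as CSV: one row per latitude, one column per longitude.
# Masked boxes are written as NaN instead of the underlying fill value.
# With a real file you would pass a name such as "rain_eu_1901_01.csv".
buffer = io.StringIO()
np.savetxt(buffer, demo_eu.filled(np.nan), delimiter=",", fmt="%.2f")
first_line = buffer.getvalue().splitlines()[0]
print(first_line.split(",")[:3])  # ['nan', '35.25', '35.25']
```

Filling masked values before saving matters because `np.savetxt` writes the underlying data of a masked array, not its mask.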

#### 2. Extracting a temporal subset

This is similar to some of the data manipulation we have done previously.

Using the precipitation data from above, the code below calculates the mean monthly precipitation for 2017 (January to December) over the whole globe.

```python
# Indices 1392-1403 are January-December 2017; average them at each grid point
rain_2017 = np.mean(rain[1392:1404, :, :], axis=0)
```
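CRU TS stores precipitation as monthly totals (check the variable's `units` attribute, e.g. `rain.units`), so the mean over 12 months is the mean monthly precipitation rather than the annual total. The sketch below contrasts the two for a single hypothetical grid box with made-up values; for the gridded data, the annual total would be `np.sum(rain[1392:1404, :, :], axis=0)` instead.

```python
import numpy as np

# Hypothetical monthly precipitation totals (mm/month) for one grid box, Jan-Dec.
monthly_pre = np.array([80, 60, 55, 50, 45, 40, 35, 40, 50, 70, 90, 95], dtype=float)

mean_monthly = np.mean(monthly_pre)  # mean monthly precipitation, mm/month
annual_total = np.sum(monthly_pre)   # total for the year, mm/year

print(f"mean: {mean_monthly:.1f} mm/month, total: {annual_total:.0f} mm/year")
```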