# Download and preprocess NCEP and CMIP6 testing data

In [1]:
import os
import iris
import xarray
import urllib.request

## Downloading data

In [2]:
DATADIR = "ncep_data"
os.makedirs(DATADIR, exist_ok=True)  # does not fail if the directory already exists

Let's download and preprocess some data. We will use CMIP6 data from the Climate Data Store (CDS) as climate model input and the NCEP/DOE Reanalysis II at daily resolution as observations.

### 1. Download climate model data

We will request climate data from the Climate Data Store (CDS) using the CDS API, setting the API credentials manually. You need to define two variables, URL and KEY, which together form your CDS API credentials. The KEY string consists of your personal user ID and your CDS API key. To obtain it, first register or log in to the CDS (http://cds.climate.copernicus.eu), then visit https://cds.climate.copernicus.eu/api-how-to and copy the string of characters listed after "key:". Replace the ######### below with this string.

In [3]:
URL = 'https://cds.climate.copernicus.eu/api/v2'
KEY = '#########'  # replace with your own key, in the form '<user ID>:<API key>'
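Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming you have exported the key as an environment variable beforehand. The variable name `CDSAPI_KEY` and the helper `load_cds_key` are illustrative choices made here, not something this notebook defines.

```python
import os


def load_cds_key(env_var="CDSAPI_KEY", placeholder="#########"):
    """Return the CDS API key stored in an environment variable.

    `env_var` and `placeholder` are assumptions for this sketch.
    Raises RuntimeError if the variable is unset, empty, or still the placeholder.
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == placeholder:
        raise RuntimeError(
            f"Set the environment variable {env_var} to your CDS API key"
        )
    return key


# Usage (uncomment once the variable is set in your shell):
# KEY = load_cds_key()
```

With this in place the key never appears in the notebook itself, so the file can be committed or shared without exposing credentials.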

Let's choose a model and the variables we are interested in. Here we use a CMIP6 model. Unfortunately, not all variables are available at daily temporal resolution in CMIP6 on the CDS. Feel free to change the API request below to CMIP5, for example, which has more variables at daily resolution: https://cds.climate.copernicus.eu/cdsapp#!/dataset/projections-cmip5-daily-single-levels?tab=overview

In [12]:
# choose model
model = 'mpi_esm1_2_lr'

# choose variables to extract (not all variables available at daily resolution for all cmip6 models at the moment)
variables = ['near_surface_air_temperature', 'daily_maximum_near_surface_air_temperature', 'daily_minimum_near_surface_air_temperature', 'precipitation', 'near_surface_specific_humidity']

# choose area to extract, given as [north, west, south, east] in degrees
area = [80, 3, 20, 30]

# choose a historical period to extract
period_hist = '1979-01-01/2015-12-31'

# choose a future period to extract:
period_fut = '2050-01-01/2070-12-31'
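Because a request can sit in the CDS queue for a while, it can be worth checking the request parameters locally before submitting. The sketch below assumes the bounding box is ordered north, west, south, east (which matches the `lat` and `lon` bounds used for the observations later) and that periods are written as `start/end` ISO dates. The helper names `check_area` and `parse_period` are hypothetical.

```python
from datetime import date


def check_area(area):
    """Validate a bounding box given as [north, west, south, east] in degrees."""
    north, west, south, east = area
    if not (-90 <= south < north <= 90):
        raise ValueError("expected -90 <= south < north <= 90")
    if not (-180 <= west < east <= 360):
        raise ValueError("expected -180 <= west < east <= 360")
    return {"north": north, "west": west, "south": south, "east": east}


def parse_period(period):
    """Split 'YYYY-MM-DD/YYYY-MM-DD' into two dates and check their order."""
    start_text, end_text = period.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if start > end:
        raise ValueError("period start is after period end")
    return start, end


print(check_area([80, 3, 20, 30]))
print(parse_period("1979-01-01/2015-12-31"))
```

A swapped latitude pair, such as `[20, 3, 80, 30]`, raises a `ValueError` here instead of producing an unexpected region after the download has finished.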

Let's install and import `cdsapi` (uncomment the first line if the package is not installed yet):

In [13]:
#!pip install cdsapi
import cdsapi

### 1.1. Download historical climate model data

In [14]:
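# download historical climate model data
# note: the output filenames contain 'ipsl', but the model requested is the one set in `model` above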

c = cdsapi.Client(url=URL, key=KEY)

for v in variables:
    c.retrieve(
        'projections-cmip6',
        {
            'temporal_resolution': 'daily',
            'experiment': 'historical',
            'level': 'single_levels',
            'variable': v,
            'model': model,
            'date': period_hist,
            'area': area,
            'format': 'zip',
        },
        f'{DATADIR}/cmip6_daily_1979-2015_ipsl_historical_{v}.zip')

2022-08-20 23:53:17,115 INFO Welcome to the CDS
2022-08-20 23:53:17,116 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6
2022-08-20 23:53:17,196 INFO Request is queued
2022-08-20 23:53:18,253 INFO Request is running
2022-08-20 23:53:49,817 INFO Request is completed
2022-08-20 23:53:49,820 INFO Downloading https://download-0018.copernicus-climate.eu/cache-compute-0018/cache/data7/adaptor.esgf_wps.retrieve-1661036025.6290557-30698-19-7d30f7e4-0dc8-4872-a6da-089802265007.zip to ncep_data/cmip6_daily_1979-2015_ipsl_historical_daily_minimum_near_surface_air_temperature.zip (12.7M)
2022-08-20 23:53:51,339 INFO Download rate 8.4M/s                               


### 1.2. Download future climate model data

In [15]:
# download future climate model data

c = cdsapi.Client(url=URL, key=KEY)

for v in variables:
    c.retrieve(
        'projections-cmip6',
        {
            'temporal_resolution': 'daily',
            'experiment': 'ssp5_8_5',
            'level': 'single_levels',
            'variable': v,
            'model': model,
            'date': period_fut,
            'area': area,
            'format': 'zip',
        },
        f'{DATADIR}/cmip6_daily_2050-2070_ipsl_ssp5_8_5_{v}.zip')

2022-08-20 23:53:51,471 INFO Welcome to the CDS
2022-08-20 23:53:51,473 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6
2022-08-20 23:53:51,563 INFO Request is queued
2022-08-20 23:53:52,623 INFO Request is running
2022-08-20 23:54:12,704 INFO Request is completed
2022-08-20 23:54:12,706 INFO Downloading https://download-0002-clone.copernicus-climate.eu/cache-compute-0002/cache/data2/adaptor.esgf_wps.retrieve-1661036051.087838-16171-20-0a86fd39-5382-45c4-802b-6af1f6634fca.zip to ncep_data/cmip6_daily_2050-2070_ipsl_ssp5_8_5_daily_minimum_near_surface_air_temperature.zip (6.7M)
2022-08-20 23:54:13,873 INFO Download rate 5.7M/s                               
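Both requests above deliver zip archives, so the files inside have to be extracted before they can be opened. A minimal sketch using only the standard library follows. The helper name `extract_archives` is hypothetical, and the contents of the archives are not shown in this notebook, so files with identical names in different archives would overwrite each other.

```python
import zipfile
from pathlib import Path


def extract_archives(directory, pattern="*.zip"):
    """Extract every archive in `directory` matching `pattern` into `directory`.

    Returns the sorted list of member names that were extracted.
    """
    directory = Path(directory)
    extracted = []
    for archive in sorted(directory.glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())
    return sorted(extracted)


# Usage with the directory from this notebook:
# extract_archives(DATADIR, "cmip6_daily_*.zip")
```

Restricting the pattern to `cmip6_daily_*.zip` keeps the helper from touching any other archives that may be stored in the same directory.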


### 1.3. Download observations

As the observational reference, we use the NCEP/DOE Reanalysis II at daily temporal resolution:

In [16]:
# Variable name. Needs to be one of the NCEP-names in https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Dailies/gaussian_grid/
variables = ["air.2m.gauss", "tmax.2m.gauss", "tmin.2m.gauss", "prate.sfc.gauss", "shum.2m.gauss"]

lat = [20, 80]
lon = [3, 30]

# Range of years to download (the end of range() is exclusive, so this covers 1979 and 1980)
years = list(range(1979, 1981))

for variable in variables:
    print(f"Requesting {variable}")
    
    # Download data year by year
    filenames_for_cleanup = []
    for year in years:
        url = f"https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Dailies/gaussian_grid/{variable}.{year}.nc"
        filename = f"{DATADIR}/{variable}_{year}.nc"
        # Download nc file
        urllib.request.urlretrieve(url, filename)
        # Append filename to list of filenames to cleanup
        filenames_for_cleanup.append(filename)

    # Combine the yearly files for this variable; passing the explicit list avoids
    # a glob that would also match a combined file left over from an earlier run
    combined_data = xarray.open_mfdataset(filenames_for_cleanup, combine="nested", concat_dim="time")
    # Latitude is sliced from lat[1] to lat[0], i.e. north to south, which assumes
    # that the files store latitude in descending order
    combined_data = combined_data.sel(lon=slice(lon[0], lon[1]), lat=slice(lat[1], lat[0]))
    combined_data.to_netcdf(f"{DATADIR}/{variable}_{min(years)}_{max(years)}.nc")
    # Close the dataset so the yearly files are released before they are deleted
    combined_data.close()

    # Cleanup
    for filename in filenames_for_cleanup:
        os.remove(filename)

Requesting tmin.2m.gauss
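The yearly files are fetched one after another, so a dropped connection part-way through means starting over. A small sketch of a download step that skips files already on disk, using the same URL pattern as the cell above. The helper names `ncep_url` and `download_if_missing` are hypothetical, and checking only for a non-empty file is a simplifying assumption that does not detect a truncated download.

```python
import os
import urllib.request

BASE_URL = "https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Dailies/gaussian_grid"


def ncep_url(variable, year):
    """Build the download URL for one variable and one year."""
    return f"{BASE_URL}/{variable}.{year}.nc"


def download_if_missing(url, filename):
    """Download `url` to `filename` unless a non-empty file already exists.

    Returns True if a download happened, False if the file was reused.
    """
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return False
    urllib.request.urlretrieve(url, filename)
    return True


# Usage inside the loop over years:
# download_if_missing(ncep_url(variable, year), f"{DATADIR}/{variable}_{year}.nc")
```

Since the cell above deletes the yearly files once they are combined, this mainly helps when a run is interrupted before the cleanup step.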


## 2. Regrid data 

Now that the climate model data and the observations have the same temporal resolution, we need to make sure they also share the same spatial grid, so we regrid the datasets. We will use iris for that, although xarray-based solutions exist as well.