<img width='50' src='https://carbonplan-assets.s3.amazonaws.com/monogram/dark-small.png' style='margin-left:0px;margin-top:20px'/>

# Accessing CarbonPlan CMIP6 downscaled climate datasets

Authors: Oriana Chegwidden and Max Jones

This notebook offers users examples of accessing and working with CarbonPlan's downscaled climate datasets. The dataset collection is further described in an [explainer article](https://carbonplan.org/research/cmip6-downscaling-explainer). Monthly and annual summaries of the data products are visible in an [interactive mapping tool](https://cmip6.carbonplan.org/). We recommend using Python to interact with the datasets. Below we show examples of reading the data, performing basic visualization, and downloading subsets in space and time. We welcome further requests for interaction and encourage [feedback via GitHub](https://github.com/carbonplan/cmip6-downscaling/issues)!

In [None]:
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np
import regionmask
import cartopy.crs as ccrs
from carbonplan import styles  # noqa: F401
import intake

In [None]:
from cmip6_downscaling.analysis.analysis import (
    grab_big_city_data,
    load_big_cities,
)
from cmip6_downscaling.analysis.plot import plot_city_data

xr.set_options(keep_attrs=True)

## Loading the data

Let's load in the catalog of datasets available in this release.

In [None]:
cat = intake.open_esm_datastore(
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/catalog/osn-rechunked-global-downscaled-cmip6.json"
)

We can inspect the contents by downscaling method. Here we look at the first few entries for `GARD-SV`.

In [None]:
cat_subset = cat.search(method="GARD-SV")
cat_subset.df.head()

Now let's specify what models we're interested in. We're going to select a daily maximum temperature run from the `MRI-ESM2-0` GCM and the `SSP2-4.5` future scenario, downscaled using the `GARD-MV` method.

In [None]:
cat_subset = cat.search(
    method="GARD-MV",
    source_id="MRI-ESM2-0",
    experiment_id="ssp245",
    variable_id="tasmax",
)
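
To build some intuition for what the search is doing, here is a minimal sketch in plain Python. It treats the catalog as a list of rows and keeps only the rows that match every keyword. The rows and the `search_rows` helper are hypothetical, made up for illustration; this is not intake-esm's implementation, and the real catalog has more entries and columns.

```python
# Hypothetical catalog rows for illustration; the real catalog has more
# entries and more columns than shown here.
catalog_rows = [
    {"method": "GARD-MV", "source_id": "MRI-ESM2-0", "experiment_id": "ssp245", "variable_id": "tasmax"},
    {"method": "GARD-MV", "source_id": "MRI-ESM2-0", "experiment_id": "ssp245", "variable_id": "pr"},
    {"method": "GARD-SV", "source_id": "MRI-ESM2-0", "experiment_id": "ssp245", "variable_id": "tasmax"},
    {"method": "GARD-MV", "source_id": "MRI-ESM2-0", "experiment_id": "ssp370", "variable_id": "tasmax"},
]


def search_rows(rows, **criteria):
    """Return the rows whose fields match every keyword criterion."""
    return [
        row
        for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


matches = search_rows(
    catalog_rows,
    method="GARD-MV",
    source_id="MRI-ESM2-0",
    experiment_id="ssp245",
    variable_id="tasmax",
)
print(len(matches))
```

Each extra keyword narrows the result further, which is why combining all four above leaves a single row in this toy example.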

In [None]:
dsets = cat_subset.to_dataset_dict()
dsets

And now let's load that dataset into the notebook.

In [None]:
ds = dsets["ScenarioMIP.MRI.MRI-ESM2-0.ssp245.day.GARD-MV"]

In [None]:
china_region = {'lat': slice(18, 54), 'lon': slice(17, 135)}

In [None]:
ds.tasmax.isel(time=0).sel(china_region).plot()

In [None]:
ds.sel(china_region).tasmax.plot()

## Visualizing the data

The temperature data are in units of Kelvin. Let's convert to Celsius to make it easier to interpret.

In [None]:
ds -= 273.15

### Plotting maps

Let's load in a land mask and a projection for mapping.

In [None]:
land = regionmask.defined_regions.natural_earth_v5_0_0.land_110
projection = ccrs.PlateCarree()

We'll mask out the ocean values and load data for a single timestep (August 1, 2089) and a region of interest. We'll start with the East Africa region we reference in Figure 1 of the companion web article.

In [None]:
east_africa_region = {"lat": slice(-3, 17), "lon": slice(17, 57)}
east_africa_tasmax = ds.tasmax.sel(time="2089-08-01").sel(**east_africa_region).load()
east_africa_tasmax = east_africa_tasmax.where(land.mask(east_africa_tasmax) == 0)
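
If the masking step looks opaque, the sketch below reproduces the idea with small NumPy arrays. It assumes the land mask is 0 over land and NaN everywhere else, which is what the `== 0` comparison above suggests. The arrays are made up for illustration.

```python
import numpy as np

# Made-up 2 x 3 temperature grid (degrees Celsius).
temps = np.array([[30.0, 31.0, 29.0],
                  [28.0, 27.0, 26.0]])

# Stand-in for the output of land.mask(...): 0 on land, NaN elsewhere.
# This is an assumption that mirrors the `== 0` comparison above.
mask = np.array([[0.0, np.nan, 0.0],
                 [np.nan, np.nan, 0.0]])

# Keep values where the mask equals 0 and set everything else to NaN.
# NaN == 0 evaluates to False, so the ocean cells are dropped.
land_only = np.where(mask == 0, temps, np.nan)
print(land_only)
```

The xarray `where` call used above does the same thing while keeping the coordinates attached.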

And now let's plot that single timeslice.

In [None]:
fig, ax = plt.subplots(subplot_kw=dict(projection=projection), figsize=(10, 4))
east_africa_tasmax.plot(
    cbar_kwargs=dict(label=r"Maximum temperature $^\circ$C"),
    cmap="warm_dark",
)
ax.coastlines()
ax.set_xticks([20, 30, 40, 50], crs=projection)
ax.set_xlabel(r"Longitude ($^\circ$E)")
ax.set_yticks([0, 5, 10, 15], crs=projection)
ax.set_ylabel(r"Latitude ($^\circ$N)")
plt.show()

Now let's do the same thing but for the whole globe. And while we're at it, let's average that daily data over a 30-year period at the end of the century (the '2080s'). Caution: this could take a while (i.e., minutes) because it's a lot of data! We'll define two time slices here; we'll use the other one (the '2030s') later in the notebook.

In [None]:
time_slices = {"2030s": slice("2020", "2049"), "2080s": slice("2070", "2099")}

In [None]:
# Calculate the 30-year mean for the 2080s (2070-2099).
tasmax_2080s = ds.tasmax.sel(time=time_slices["2080s"]).mean(dim="time").load()
# Mask out the ocean as above. The data are already in memory, so no second load is needed.
tasmax_2080s = tasmax_2080s.where(land.mask(tasmax_2080s) == 0)

In [None]:
fig, ax = plt.subplots(subplot_kw=dict(projection=projection), figsize=(10, 4))
tasmax_2080s.plot(
    cbar_kwargs=dict(label=r"Mean daily maximum temperature $^\circ$C"),
    cmap="warm_dark",
)
ax.coastlines()
plt.show()

### Plotting timeseries

Let's look at the data at some individual points! After all, one of the main goals of downscaling is to provide more local information. We'll grab timeseries from 20 big cities around the world to explore what climate change might look like for them.

In [None]:
big_cities = load_big_cities(num_cities=20, add_additional_cities=False, plot=True)

In [None]:
[downscaled_cities] = grab_big_city_data([ds], big_cities)

Let's plot a timeseries of the daily data at just one of those 20 cities. Let's look at Tokyo.

In [None]:
ts = downscaled_cities.sel(cities="Tokyo").tasmax
ts.plot()
plt.ylabel(r"Daily maximum temperature $^\circ$C")
plt.show()

That's a lot of daily data, though. We can make it clearer by summarizing it into a seasonal cycle. Let's compare 30-year periods near the start and end of the 21st century to see this model's projection of changes in temperature at this location.

In [None]:
fig, ax = plt.subplots()
for label, time_slice in time_slices.items():
    ts.sel(time=time_slice).groupby("time.month").mean().plot(label=label)
ax.set_xticks(np.arange(1, 13))
ax.set_xticklabels(["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"])
plt.legend()
plt.ylabel(r"Mean daily maximum temperature $^\circ$C")
plt.xlabel("")
plt.show()
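
For readers new to `groupby("time.month").mean()`, here is a rough sketch of the same idea using only the standard library: pool every day from every year into its calendar month, then average each pool. The `monthly_climatology` helper and the daily series are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def monthly_climatology(dates, values):
    """Average values by calendar month, pooling all years together."""
    by_month = defaultdict(list)
    for day, value in zip(dates, values):
        by_month[day.month].append(value)
    return {month: mean(vals) for month, vals in sorted(by_month.items())}


# Made-up daily series: two full years where each value equals its month number.
start = date(2021, 1, 1)
dates = [start + timedelta(days=i) for i in range(730)]
values = [float(day.month) for day in dates]

climatology = monthly_climatology(dates, values)
print(climatology[1], climatology[7])
```

The result has one value per month, which is why the plot above has twelve points per line no matter how many years go into each time slice.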

Now let's do that for all of the cities we grabbed to get a sense of how population centers around the world will fare. We'll first plot the seasonal cycle to show the projected change from the 2030s to the 2080s.

In [None]:
plot_city_data(
    downscaled_cities.tasmax,
    time_slices=time_slices,
    aggregation="seasonal_cycle",
    ylabel=r"Mean daily maximum temperature $^\circ$C",
)

Now let's look at the annual means for the entire 21st century.

In [None]:
plot_city_data(
    downscaled_cities.tasmax,
    aggregation="annual",
    ylabel=r"Mean daily maximum temperature $^\circ$C",
)

## Downloading the data

And now let's download one of these daily timeseries to work with on our own computer.

In [None]:
ts.to_dataframe().reset_index().drop(columns=["member_id", "cities", "lat", "lon"]).set_index(
    "time"
).to_csv("tokyo.csv")
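
Once the file is on your computer, you can read it back without any of the libraries used in this notebook. The sketch below uses a few made-up rows in place of the real file, and it assumes the export above leaves a `time` column and a `tasmax` column with ISO-style timestamps. Check the header and timestamp format of your own file before relying on it.

```python
import csv
import io
from datetime import datetime

# Made-up rows standing in for the contents of tokyo.csv. The column names
# and timestamp format are assumptions about what the export above writes.
sample_csv = io.StringIO(
    "time,tasmax\n"
    "2089-08-01 12:00:00,33.5\n"
    "2089-08-02 12:00:00,34.1\n"
    "2089-08-03 12:00:00,31.8\n"
)

records = [
    (datetime.fromisoformat(row["time"]), float(row["tasmax"]))
    for row in csv.DictReader(sample_csv)
]

# Find the hottest day in the sample.
hottest_time, hottest_value = max(records, key=lambda record: record[1])
print(hottest_time.date(), hottest_value)
```

To read the real file, swap `sample_csv` for `open("tokyo.csv", newline="")`.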

We can also save a regional subset locally as a NetCDF file. Note that even a regional subset might be quite large, so we'll first check how big it is.

In [None]:
print("Dataset is {} GB".format(ds.sel(**east_africa_region).nbytes * 1e-9))
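
You can also estimate the size before touching the data. For a single variable it is roughly the number of time steps, times the number of grid cells, times the bytes per value. The grid spacing, record length, and 4-byte values below are hypothetical placeholders rather than properties read from the dataset, and `estimate_gb` is a made-up helper.

```python
def estimate_gb(n_time, n_lat, n_lon, bytes_per_value=4):
    """Estimate the in-memory size of one variable in gigabytes (1e9 bytes)."""
    return n_time * n_lat * n_lon * bytes_per_value * 1e-9


# Hypothetical inputs: a 0.25-degree grid, 85 years of daily data with
# 365-day years, and 4-byte (float32) values. Check the real dataset's
# coordinates and dtype before relying on these numbers.
grid_spacing = 0.25
n_lat = int((17 - (-3)) / grid_spacing)  # 20 degrees of latitude
n_lon = int((57 - 17) / grid_spacing)  # 40 degrees of longitude
n_time = 85 * 365

print("Estimated size: {:.2f} GB".format(estimate_gb(n_time, n_lat, n_lon)))
```

The `nbytes` value reported above is the authoritative number, since it reflects the dataset's actual shape, dtype, and coordinates.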

If you want to save it locally, switch the flag to `True`.

In [None]:
save_subset = False
if save_subset:
    ds.sel(**east_africa_region).to_netcdf("region.nc")