# Understanding Climate Data on the Analytics Engine
This notebook walks through how to use the different kinds of climate data available on the Analytics Engine: weather observations, reanalysis products, and climate model output.
* Weather observations are point measurements tied to a single station location, and represent the actual measured values of weather variables. They provide highly localized weather information and are limited by instrumentation constraints.
* Reanalysis products are reconstructions of the historical weather observation period. Limitations...
* Climate model output is a representation of a time period. It is intentionally not designed to recreate the trends seen in weather observations; instead, each simulation captures one possible realization of those trends. The variability across individual climate model realizations is what lets us determine the range of potential future outcomes.

**Intended Application**: As a user, I want to be able to understand the **strengths and weaknesses of comparing observations, reanalysis, and model output** by:
1. Visually comparing observations with reanalysis
2. Visually comparing observations with climate model output

**Runtime**: This notebook is in active development, and the runtime may vary as complexity is added. 

### Step 0: Set-up

In [None]:
import climakitae as ck
import xarray as xr
import pandas as pd
import numpy as np

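# importing these modules registers the .hvplot accessor on pandas and xarray objects
import hvplot.pandas  # noqa
import hvplot.xarray  # noqa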
pd.options.plotting.backend = 'holoviews'

### Step 1: Select data
First we will retrieve precipitation data from three LOCA2-Hybrid models (FGOALS-g3, EC-Earth3, and CNRM-ESM2-1) for a single location.

In [None]:
selections = ck.Select()

selections.downscaling_method = 'Statistical'
selections.variable = 'Precipitation (total)'
selections.timescale = 'monthly'
selections.units = 'inches'
selections.resolution = '3 km'
selections.time_slice = (1950, 2002)
selections.area_subset = 'lat/lon'
selections.cached_area = ['coordinate selection']
selections.area_average = 'No'
selections.latitude = (34.067 - 0.02, 34.067 + 0.02) # specifically at station coordinates, with small buffer
selections.longitude = (-117.65 - 0.02, -117.65 + 0.02) # specifically at station coordinates, with small buffer
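
The latitude and longitude ranges above are the station coordinates with a ±0.02° buffer on each side. As a minimal sketch, the same bounds could be built with a small helper. The function name `point_bounds` and its `buffer_deg` parameter are hypothetical and are not part of `climakitae`. For scale, 0.02° of latitude is roughly 2 km.

```python
def point_bounds(lat, lon, buffer_deg=0.02):
    """Return ((lat_min, lat_max), (lon_min, lon_max)) around a point.

    Hypothetical helper for illustration; not a climakitae function.
    """
    lat_range = (lat - buffer_deg, lat + buffer_deg)
    lon_range = (lon - buffer_deg, lon + buffer_deg)
    return lat_range, lon_range


# station coordinates taken from the selection cell above
lat_bounds, lon_bounds = point_bounds(34.067, -117.65)
print(lat_bounds, lon_bounds)
```

The resulting tuples could then be assigned to `selections.latitude` and `selections.longitude` in place of the inline arithmetic.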

In [None]:
# retrieving data
ds = selections.retrieve()

# subset for models of interest
mdls = ['LOCA2_EC-Earth3_r1i1p1f1', 'LOCA2_FGOALS-g3_r1i1p1f1', 'LOCA2_CNRM-ESM2-1_r1i1p1f2']
ds = ds.sel(simulation = mdls)

# loading into memory -- will take a few minutes! 
ds = ck.load(ds)
ds = ds.squeeze()
ds

Read in weather observations for comparison. In this example, we are looking at observations from a weather station near Ontario, in San Bernardino County.

In [None]:
wx_obs = pd.read_csv('1026_data_cleaned.csv') ## we'll use the "total_precipitation_in" column for comparison

# adding an easy to interpret time (month-year) column so we can compare side by side
wx_obs['day'] = 1 # using first of the month for ease
wx_obs['time'] = pd.to_datetime(wx_obs[['year', 'month', 'day']])
wx_obs = wx_obs.drop(columns=['year', 'month', 'day']) # minor cleanup
wx_obs

### Step 2: Visualize trends between observations and model output

In [None]:
models_to_plot = ds.hvplot.line(x='time', by='simulation', title='Observations to Model Comparison')
obs = wx_obs.hvplot(x='time', y='total_precipitation_in', color='black', label='Observations')

models_to_plot * obs

In [None]:
# let's zoom in on a particular year to really focus our analysis
# you can always use the zoom tool too!
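
One way to zoom in on a single year is to subset the observations by calendar year before plotting. The sketch below uses a small made-up table (`toy_obs`, with placeholder values) that mimics the `time` and `total_precipitation_in` columns of `wx_obs`. The year 1998 is an arbitrary choice for illustration.

```python
import pandas as pd

# made-up monthly table standing in for wx_obs (values are placeholders)
toy_obs = pd.DataFrame({
    "time": pd.date_range("1997-01-01", "1999-12-01", freq="MS"),
    "total_precipitation_in": [0.5] * 36,
})

year = 1998  # arbitrary example year

# keep only the rows whose timestamp falls in the chosen year
one_year = toy_obs.loc[toy_obs["time"].dt.year == year]
print(one_year)
```

For the model data, xarray supports partial datetime-string selection, so something like `ds.sel(time=str(year))` should give the matching subset, assuming `ds` has a datetime-like `time` coordinate.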

Key takeaways here:
* Climate model output trends may look very different from observations! This is okay -- it's by design!
* A time-ordered series from a climate model is not expected to match the observed timeseries event for event.
* *Coming soon* Looking at reanalysis will illustrate those historical observed trends better, because that is how reanalysis datasets are intended to be used!
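
To illustrate the takeaways above, here is a minimal sketch with entirely made-up numbers. Two toy series share the same seasonal cycle and the same set of wet and dry years, but the years occur in a different order. Month by month they disagree, yet their monthly climatologies are identical. The names `obs_toy` and `model_toy` and all values are assumptions for illustration, not Analytics Engine data.

```python
import numpy as np
import pandas as pd

# ten years of monthly time steps
time = pd.date_range("1950-01-01", "1959-12-01", freq="MS")

# made-up seasonal cycle (inches per month, January to December)
seasonal = np.array([3.0, 3.2, 2.5, 1.0, 0.3, 0.1, 0.0, 0.1, 0.3, 0.6, 1.2, 2.2])

# made-up scaling per year: dry years < 1, wet years > 1
wet_factor = np.linspace(0.5, 1.4, 10)


def build_series(year_factors):
    """Scale the seasonal cycle by one factor per year and flatten to monthly."""
    values = np.outer(year_factors, seasonal).ravel()
    return pd.Series(values, index=time)


obs_toy = build_series(wet_factor)          # dry years first
model_toy = build_series(wet_factor[::-1])  # same years, reversed order

# monthly climatology: mean over all years for each calendar month
clim_obs = obs_toy.groupby(obs_toy.index.month).mean()
clim_model = model_toy.groupby(model_toy.index.month).mean()

max_monthly_mismatch = (obs_toy - model_toy).abs().max()
max_clim_mismatch = (clim_obs - clim_model).abs().max()
print("largest month-by-month difference:", max_monthly_mismatch)
print("largest climatology difference:", max_clim_mismatch)
```

Real model output will not reproduce the observed climatology exactly, but the same idea applies: compare statistics over a period rather than individual time steps.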

In [None]:
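# first we'll keep only precipitation totals above 1 mm to remove the "drizzle signal"
# the data are in inches (see selections.units), so express the threshold in inches
threshold_in = 1.0 / 25.4  # 1 mm in inches

# mask (rather than clip) model values at or below the threshold
ds = ds.where(ds > threshold_in)

# mask <= 1 mm in weather obs
valid_obs = wx_obs.loc[wx_obs['total_precipitation_in'] > threshold_in]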
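
When removing values below a threshold, note that clipping and masking behave differently. Clipping raises small values up to the threshold, so they still appear in a histogram as a spike at the threshold. Masking replaces them with missing values, so they drop out. The sketch below shows the difference on a made-up pandas Series; xarray objects offer the analogous `.clip(min=...)` and `.where(...)` methods. The 1 mm threshold is converted to inches on the assumption that the data are in inches.

```python
import pandas as pd

threshold_in = 1.0 / 25.4  # 1 mm expressed in inches

# made-up precipitation totals in inches
toy = pd.Series([0.0, 0.01, 0.5, 2.0])

# clipping keeps every entry, raising small values up to the threshold
clipped = toy.clip(lower=threshold_in)

# masking turns values at or below the threshold into NaN
masked = toy.where(toy > threshold_in)

print("clipped:", clipped.tolist())
print("masked: ", masked.tolist())
print("values kept after masking:", masked.count())
```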

In [None]:
# manual version: build one histogram per model and overlay them
m1 = ds.sel(simulation='LOCA2_EC-Earth3_r1i1p1f1').hvplot.hist(title='Observations to Model Comparison', alpha=0.3)
m2 = ds.sel(simulation='LOCA2_FGOALS-g3_r1i1p1f1').hvplot.hist(alpha=0.3)
m3 = ds.sel(simulation='LOCA2_CNRM-ESM2-1_r1i1p1f2').hvplot.hist(alpha=0.3)

obs_hist = valid_obs['total_precipitation_in'].hvplot.hist(color='black', label='Observations')

m1 * m2 * m3 * obs_hist

In [None]:
# slight improvement: loop over the simulations and label each histogram with its model name
plot = None
for sim in ds.simulation.values:
    curr_plot = ds.sel(simulation=sim).hvplot.hist(alpha=0.3, title='Obs to Model Comparison', label=str(sim))
    plot = curr_plot if plot is None else plot * curr_plot
    
plot * obs_hist
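
Overlaid histograms are easiest to compare when every dataset uses the same bin edges and is normalized, since the observations and each model may contain different numbers of values. A minimal sketch with made-up numbers, using NumPy directly rather than `hvplot`:

```python
import numpy as np

# made-up monthly totals in inches, for illustration only
obs_vals = np.array([1.2, 1.5, 2.0, 3.1, 4.8])
model_vals = np.array([1.1, 1.3, 1.4, 2.2, 2.9, 3.5, 6.0])

# shared bin edges: six 1-inch-wide bins from 0 to 6 inches
edges = np.linspace(0.0, 6.0, 7)

# density=True normalizes each histogram so it integrates to 1
obs_density, _ = np.histogram(obs_vals, bins=edges, density=True)
model_density, _ = np.histogram(model_vals, bins=edges, density=True)

bin_width = edges[1] - edges[0]
print("obs density:  ", obs_density)
print("model density:", model_density)
```

With shared edges, each bin of `obs_density` can be compared directly with the same bin of `model_density`, regardless of sample size.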

Some final key takeaway messages -- *in development*