Here we demonstrate how to open a WRF data file with Xarray and the manipulations needed to bring the dataset in line with the CF conventions that Xarray understands.

In [ ]:
import xarray as xr
import matplotlib.pyplot as plt
import pandas as pd
from ngallery_utils import DATASETS

Take a look at the WRF dataset

In [ ]:
path = DATASETS.fetch("T2_RR_F_2014_08.nc")
ds_wrf = xr.open_dataset(path)
ds_wrf

# Problem #1: Time in bytes.

Currently our time information is stored in the `Times` variable as byte strings. We need to create a time coordinate with dtype datetime64.

Let's take a look at our time variable so far:

In [ ]:
da_time = ds_wrf['Times']
da_time

If the time coordinate is not of dtype datetime64, you cannot use some of the time-aware functionality in Xarray (see blog post [here](https://ncar.github.io/xdev/posts/time/)).

For example, the `xarray.DataArray.sel` functionality will fail.
```
prc_oct14_oct15 = ds_wrf['PREC_ACC_NC'].sel(Times=slice('2014-10-01', '2015-09-30'))
```
returns the error:
```
ValueError: dimensions or multi-index levels ['Times'] do not exist
```

This isn't that surprising. `.sel` works along a dimension, and our variable `Times` is just that, a variable, with dimension `Time`. If we use `.sel` along the `Time` dimension we see:

```
prc_oct14_oct15 = ds_wrf['PREC_ACC_NC'].sel(Time=slice('2014-10-01', '2015-09-30'))
```
returns the error:
```
TypeError: 'str' object cannot be interpreted as an integer
```

This is because `Time` is a dimension without a coordinate, so it is just an integer index and can only be selected by position.

In [ ]:
ds_wrf['Time']
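
To make the difference concrete, here is a minimal sketch on synthetic data (the array values, the 2x3 grid, and the daily dates are made up for illustration and are not taken from the WRF file). It suggests that a dimension without a coordinate only supports positional selection, while the same dimension with a datetime64 coordinate accepts date-string slices:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a WRF variable: 4 time steps on a 2x3 grid.
data = np.arange(24.0).reshape(4, 2, 3)
da = xr.DataArray(
    data, dims=("Time", "south_north", "west_east"), name="PREC_ACC_NC"
)

# "Time" has no coordinate here, so only positional selection is available.
first_two = da.isel(Time=slice(0, 2))

# Attach a datetime64 coordinate (made-up daily dates) to allow label-based selection.
times = pd.date_range("2014-08-01", periods=4, freq="D")
da_labeled = da.rename({"Time": "time"}).assign_coords(time=times)

# Date-string slices include both end points.
subset = da_labeled.sel(time=slice("2014-08-02", "2014-08-03"))
print(subset.sizes["time"])
```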

So let's convert our `Times` variable to datetime64 using `pandas.to_datetime`:

```
time_datetime = pd.to_datetime(da_time)
```
returns the error:
```
TypeError: <class 'bytes'> is not convertible to datetime
```

The argument types that `to_datetime` accepts are:
- integer
- float
- string
- datetime
- list
- tuple
- 1-d array
- Series

So let's convert to a string!

In [ ]:
time_strs = [str(i.values)[1:] for i in da_time]
time_strs

But it isn't that simple!
```
time_datetime = pd.to_datetime(time_strs)
```

returns the error:
```
ValueError: ('Unknown string format:', "'2014-08-01_00:00:00'")
```
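
The quoted value in the error message comes from how `str()` renders a bytes object. The following sketch uses a plain Python bytes literal as a stand-in for one element of `Times` (an assumption for illustration; the real elements are NumPy values) to show what the `[1:]` slice keeps and what `.decode()` would return instead:

```python
# Example value in the style of WRF's Times variable (stand-in, not read from the file).
raw = b"2014-08-01_00:00:00"

# str() on bytes gives the repr, including the b prefix and the quotes.
as_repr = str(raw)

# Dropping the first character removes the b prefix but keeps the quotes.
trimmed = as_repr[1:]

# Decoding gives the bare text with no prefix and no quotes.
decoded = raw.decode("ascii")

print(as_repr)
print(trimmed)
print(decoded)
```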

We need to remove those pesky underscores.

In [ ]:
time_strs = [str(i.values)[1:].replace("_"," ") for i in da_time]
time_strs

In [ ]:
time_datetime = pd.to_datetime(time_strs)
time_datetime
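
A possible alternative to cleaning the strings by hand is to decode the bytes and pass `pandas.to_datetime` an explicit `format` that includes the underscore. This is only a sketch under assumptions: the two sample timestamps are made up, and the format string assumes every value looks like `YYYY-MM-DD_hh:mm:ss`.

```python
import numpy as np
import pandas as pd

# Made-up byte strings in the same layout as the WRF Times values.
raw_times = np.array([b"2014-08-01_00:00:00", b"2014-08-01_01:00:00"])

# Casting a bytes array to str decodes each element to text.
decoded = raw_times.astype(str)

# An explicit format avoids both the underscore problem and format guessing.
time_index = pd.to_datetime(decoded, format="%Y-%m-%d_%H:%M:%S")
print(time_index)
```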

Now we have our time values. Let's rename our dimension `Time` to `time` to match conventions, assign our new `time` coordinate, and drop the `Times` variable.

In [ ]:
ds_wrf_timedim = ds_wrf.rename({'Time':'time'})
ds_wrf_timedim

In [ ]:
ds_wrf_timecoord = ds_wrf_timedim.assign(time=time_datetime)
ds_wrf_timecoord

In [ ]:
ds_wrf_dropped_Times = ds_wrf_timecoord.drop_vars('Times')
ds_wrf_dropped_Times

A review of the steps for dealing with WRF time:
```
da_time = ds_wrf['Times']
time_strs = [str(i.values)[1:].replace("_"," ") for i in da_time]
time_datetime = pd.to_datetime(time_strs)

ds_wrf_timedim = ds_wrf.rename({'Time':'time'})
ds_wrf_timecoord = ds_wrf_timedim.assign(time=time_datetime)
ds_wrf_dropped_Times = ds_wrf_timecoord.drop_vars('Times')
```
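
If these steps need to be repeated for several files, they could be wrapped in a small function. The sketch below is one way to do that; `add_time_coord` is a hypothetical helper name, and the two-step dataset (with a made-up `T2` variable) is synthetic so that the example runs without the WRF file.

```python
import numpy as np
import pandas as pd
import xarray as xr


def add_time_coord(ds):
    """Hypothetical helper: replace the byte-string Times variable with a
    datetime64 coordinate on a dimension renamed from Time to time."""
    # Decode each byte string and swap the underscore for a space.
    time_strs = [t.decode("ascii").replace("_", " ") for t in ds["Times"].values]
    time_datetime = pd.to_datetime(time_strs)
    return (
        ds.rename({"Time": "time"})
        .assign_coords(time=time_datetime)
        .drop_vars("Times")
    )


# Synthetic stand-in for a WRF dataset (values are made up).
ds_demo = xr.Dataset(
    {
        "Times": ("Time", np.array([b"2014-08-01_00:00:00", b"2014-08-01_01:00:00"])),
        "T2": ("Time", np.array([290.0, 291.5])),
    }
)
ds_fixed = add_time_coord(ds_demo)
print(ds_fixed)
```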

# Problem #2: No coordinates.

We need to pull in our lat/lon information from a separate geo file.

In [ ]:
file_geo = 'wrfinput_d02'
ds_geo = xr.open_dataset(file_geo)
ds_geo

What are our coordinates here? 
 - XLAT and XLONG -- these are our Latitude and Longitude values
 - XLAT_U and XLONG_U -- Lat and Long with a staggered west-east grid
 - XLAT_V and XLONG_V -- Lat and Long with a staggered north-south grid
 
We're going to use `XLAT` and `XLONG`.

In [ ]:
ds_geo.coords['XLAT']

`XLAT` and `XLONG` have 3 dimensions (a unit dimension `Time` which we will squeeze out, `south_north` which we will rename `y`, `west_east` which we will rename `x`).

If we assign the coords as is:
```
ds_wrf_w_latlon = ds_wrf.assign_coords(lat=ds_geo.coords['XLAT'], lon=ds_geo.coords['XLONG'])
```
we get the error:
```
ValueError: conflicting sizes for dimension 'Time': length 1 on 'XLAT' and length 38375 on 'PREC_ACC_NC'
```
So we remove the unit dimension `Time` with `.squeeze`:

In [ ]:
ds_wrf_w_latlon = ds_wrf_dropped_Times.assign_coords(lat=ds_geo.coords['XLAT'].squeeze('Time'), lon=ds_geo.coords['XLONG'].squeeze('Time'))
ds_wrf_w_latlon
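
The effect of `.squeeze` can be checked on a small synthetic example. The arrays below are made up (zeros and ones on a 2x3 grid) and only mimic the dimension layout described above; they are not the contents of the geo file.

```python
import numpy as np
import xarray as xr

# Stand-ins for XLAT and XLONG: a length-1 Time dimension plus a 2x3 grid.
geo_dims = ("Time", "south_north", "west_east")
xlat = xr.DataArray(np.zeros((1, 2, 3)), dims=geo_dims)
xlon = xr.DataArray(np.zeros((1, 2, 3)), dims=geo_dims)

# squeeze drops the length-1 dimension and leaves a 2-D array.
xlat_2d = xlat.squeeze("Time")
xlon_2d = xlon.squeeze("Time")

# Stand-in for a data variable with 4 time steps on the same grid.
field = xr.DataArray(
    np.ones((4, 2, 3)), dims=("time", "south_north", "west_east")
)

# The 2-D arrays attach as coordinates without introducing a Time dimension.
field = field.assign_coords(lat=xlat_2d, lon=xlon_2d)
print(field.coords)
```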

In [ ]:
da_land = ds_geo.LANDMASK.squeeze('Time')
da_lake = ds_geo.LAKEMASK.squeeze('Time')

ds_wrf_w_masks = ds_wrf_w_latlon.assign_coords(landmask=da_land, lakemask=da_lake)
ds_wrf_w_masks

To follow conventions let's rename `south_north` to `y` and `west_east` to `x`.

In [ ]:
ds_wrf_rename_latlon = ds_wrf_w_masks.rename({'south_north':'y', 'west_east':'x'})
ds_wrf_rename_latlon

We now have duplicate coordinates (`XLAT` and `lat`, `XLONG` and `lon`), so let's drop the `XLAT` and `XLONG` coordinates.

In [ ]:
ds_wrf_dropxlatlon = ds_wrf_rename_latlon.drop_vars(['XLAT', 'XLONG'])
ds_wrf_dropxlatlon

And voilà, we have our WRF dataset in a format usable with the rest of the xarray and Pangeo tools.
Let's review the steps:

```
ds_wrf_w_latlon = ds_wrf_dropped_Times.assign_coords(
        lat=ds_geo.coords['XLAT'].squeeze('Time'),
        lon=ds_geo.coords['XLONG'].squeeze('Time'),
        landmask=ds_geo.LANDMASK.squeeze('Time'), 
        lakemask=ds_geo.LAKEMASK.squeeze('Time'))
ds_wrf_rename_latlon = ds_wrf_w_latlon.rename({'south_north':'y', 'west_east':'x'})
ds_wrf_dropxlatlon = ds_wrf_rename_latlon.drop_vars(['XLAT', 'XLONG'])
```


    

Let's plot masked mean precipitation values from October 2014 through September 2015.

In [ ]:
ds_wrf_cf = ds_wrf_dropxlatlon

prc = ds_wrf_cf['PREC_ACC_NC'].sel(time=slice('2014-10-01', '2015-09-30'))
prc_mean = prc.mean('time')

da_mask = ds_wrf_cf['landmask'].where(ds_wrf_cf['lakemask'].values == 0, other=1)
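
The mask expression above keeps `landmask` wherever `lakemask` is 0 and fills in 1 everywhere else. Assuming both masks use 1 for "is land" or "is lake" and 0 otherwise (an assumption about the file's encoding, not something shown in this notebook), the result is 1 over land and lakes and 0 over the remaining water. A small sketch with made-up 2x2 masks illustrates the logic:

```python
import numpy as np
import xarray as xr

dims = ("y", "x")
# Made-up masks: one land cell, one lake cell, two open-water cells.
landmask = xr.DataArray(np.array([[1, 0], [0, 0]]), dims=dims)
lakemask = xr.DataArray(np.array([[0, 1], [0, 0]]), dims=dims)

# Keep landmask where there is no lake; use 1 where there is a lake.
combined = landmask.where(lakemask.values == 0, other=1)

# Made-up precipitation field; cells outside the mask become NaN.
prc = xr.DataArray(np.array([[2.0, 3.0], [4.0, 5.0]]), dims=dims)
masked = prc.where(combined.values == 1)
print(combined.values)
print(masked.values)
```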

In [ ]:
fig = plt.figure(figsize=(10, 8))

prc_mean.where(da_mask.values == 1).plot(x='lon', y='lat')

Further reading:
https://www.unidata.ucar.edu/blogs/developer/en/entry/wrf_goes_cf_two