# Access CCMP data on Pangeo

## [More info](https://podaac.jpl.nasa.gov/MEaSUREs-CCMP?sections=about)

The Cross-Calibrated Multi-Platform (CCMP) Ocean Surface Wind Vector Analyses is part of the NASA Making Earth System Data Records for Use in Research Environments (MEaSUREs) Program. MEaSUREs develops consistent global- and continental-scale Earth System Data Records by supporting projects that produce data using proven algorithms and input.

CCMP (Atlas et al., 2011) provides a consistent, gap-free long-term time-series of ocean surface wind vector analysis fields from July 1987 through June 2011. The CCMP datasets combine cross-calibrated satellite winds using a Variational Analysis Method (VAM) to produce a high-resolution (0.25 degree) gridded analysis.

Reference: Atlas, R., R. N. Hoffman, J. Ardizzone, S. M. Leidner, J. C. Jusem, D. K. Smith, D. Gombos, 2011: A cross-calibrated, multiplatform ocean surface wind velocity product for meteorological and oceanographic applications. Bull. Amer. Meteor. Soc., 92, 157-174. doi: 10.1175/2010BAMS2946.1

Data are near-real-time (NRT) from 4/1/2019 to present.

## Credit:
- [Chelle Gentemann](mailto:cgentemann@faralloninstitute.org), [Farallon Institute](http://www.faralloninstitute.org/), [Twitter](https://twitter.com/ChelleGentemann) - creation of Zarr data store and tutorial
- [Charles Blackmon Luca](mailto:blackmon@ldeo.columbia.edu), [LDEO](https://www.ldeo.columbia.edu/) - help with moving to Pangeo storage and intake update
- [Willi Rath](mailto:wrath@geomar.de), [GEOMAR](https://www.geomar.de/en/), [Twitter](https://twitter.com/RathWilli) - motivated CG to move data to Pangeo!

In [None]:
#libs for reading data
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import intake
import dask

#libs for dask gateway
from dask_gateway import Gateway
from dask.distributed import Client

### Start a cluster, a group of computers that will work together.

(A cluster is the key to big data analysis on the Cloud.)

- This will set up a [dask kubernetes](https://docs.dask.org/en/latest/setup/kubernetes.html) cluster for your analysis and give you a path that you can paste into the top of the Dask dashboard to visualize parts of your cluster.  
- You don't need to paste the link below into the Dask dashboard for this to work, but it will help you visualize progress.
- Try 20 workers to start (during the tutorial), but you can increase the number later to speed things up.

In [None]:
gateway = Gateway()
cluster = gateway.new_cluster()
cluster.adapt(minimum=1, maximum=80)
client = Client(cluster)
cluster

**☝️ Don’t forget to click the link above or copy it to the Dask dashboard ![images.png](attachment:images.png) on the left to view the scheduler dashboard!**

### Initialize Dataset

Here we load the dataset from the zarr store. Note that this very large dataset (273 GB) initializes nearly instantly, and we can see the full list of variables and coordinates.

### Examine Metadata

For those unfamiliar with this dataset, the variable metadata is very helpful for understanding what the variables actually represent.
Printing the dataset shows the dimensions, coordinates, and data variables, with clickable icons at the end of each line that reveal more metadata and the size.

In [None]:
%%time
cat_pangeo = intake.open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml")

ds_ccmp = cat_pangeo.atmosphere.nasa_ccmp_wind_vectors.to_dask()

ds_ccmp['wspd'] = np.sqrt(ds_ccmp.uwnd**2 + ds_ccmp.vwnd**2)

ds_ccmp

# Plot a timeseries of the average wind speed over a region

In [None]:
%%time

# Average over the region, leaving a time series of wind speed
ds_ccmp.sel(latitude=slice(30,50),longitude=slice(200,210)).mean(['latitude','longitude']).wspd.plot()
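
The regional mean here gives every grid cell the same weight. On a regular latitude-longitude grid the cells shrink toward the poles, so an area-weighted mean is often preferred. The following is a minimal sketch on synthetic data, assuming cosine-of-latitude weights and xarray's `weighted` API; the array `wspd` and all of its values are made up for illustration and are not CCMP data.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a wind speed field (hypothetical values, not CCMP data)
lat = np.arange(30.125, 50, 0.25)
lon = np.arange(200.125, 210, 0.25)
time = np.arange(4)
rng = np.random.default_rng(0)
wspd = xr.DataArray(
    rng.uniform(0, 15, size=(time.size, lat.size, lon.size)),
    coords={"time": time, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
    name="wspd",
)

# Grid-cell area on a regular lat/lon grid is proportional to cos(latitude)
weights = np.cos(np.deg2rad(wspd.latitude))

# Area-weighted regional mean, one value per time step
weighted_mean = wspd.weighted(weights).mean(["latitude", "longitude"])

# Unweighted mean for comparison
simple_mean = wspd.mean(["latitude", "longitude"])
```

For a 20-degree band at mid-latitudes the two means differ only slightly, but the gap grows for regions that span a wide latitude range.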

# Plot a map of the annual average wind speed

In [None]:
%%time

# Average over all time steps in 2000, leaving a latitude-longitude map
ds_ccmp.sel(time=slice('2000-01-01','2000-12-31')).mean('time').wspd.plot()

# Make a Hovmoller type plot

In [None]:
%%time

ds_ccmp.sel(latitude=0.125,longitude=slice(120,275)).wspd.plot(vmin=3,vmax=15,cmap='seismic')

# Make timeseries of global wind speed trends
- create climatology
- remove annual cycle
- calculate anomaly
- calculate trends

In [None]:
ccmp_climatology = ds_ccmp.groupby('time.dayofyear').mean('time',keep_attrs=True,skipna=False)
ccmp_anomaly = ds_ccmp.groupby('time.dayofyear')-ccmp_climatology
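
The climatology and anomaly steps can be tried on a small synthetic series before running them on the full dataset. This sketch assumes a made-up daily series with an annual cycle and a small linear trend; the names `wspd`, `trend_per_day`, and the numbers are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series: annual cycle plus a small linear trend (made-up numbers)
time = pd.date_range("2001-01-01", "2004-12-31", freq="D")
days = np.arange(time.size)
doy = time.dayofyear.to_numpy()
annual_cycle = 2.0 * np.sin(2 * np.pi * doy / 365.25)
trend_per_day = 0.001
wspd = xr.DataArray(
    7.0 + annual_cycle + trend_per_day * days,
    coords={"time": time},
    dims="time",
    name="wspd",
)

# Day-of-year climatology, then subtract it from each matching day
climatology = wspd.groupby("time.dayofyear").mean("time")
anomaly = wspd.groupby("time.dayofyear") - climatology

# Fit a straight line to the anomaly to estimate the remaining trend
slope = np.polyfit(days, anomaly.values, 1)[0]
```

The fitted slope comes out close to, but somewhat below, `trend_per_day`, because a day-of-year climatology built from a short record absorbs the within-year part of the trend.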


In [None]:
ccmp_anomaly

In [None]:
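# Anomaly time series at one grid point (indices are time, latitude, longitude)
ts = ccmp_anomaly.wspd[:,350,720].load()
ts.plot()

from sklearn.linear_model import LinearRegression
# Linear regression of the anomaly against time
# Assumption: YY is the anomaly at the grid point plotted above
XX = (ts.time.data - ts.time.data[0]) / np.timedelta64(1, 'D')  # days since first time step
YY = ts.data
reg = LinearRegression().fit(XX.reshape(-1, 1), YY)
a = float(reg.coef_[0])    # slope, per day
b = float(reg.intercept_)  # intercept
plt.plot(ts.time.data, a*XX + b, 'r')  # fitted linear trend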


In [None]:
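# Do a first-degree polyfit at every grid point
# (.values loads the whole wspd anomaly into memory; subset first if that is too large)
vals = ccmp_anomaly.wspd.values  # assumed dimension order: (time, latitude, longitude)
nt = vals.shape[0]
regressions = np.polyfit(np.arange(nt), vals.reshape(nt, -1), 1)
# Get the slope coefficients back on the (latitude, longitude) grid
trends = regressions[0,:].reshape(vals.shape[1], vals.shape[2])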
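
The reshape-and-fit pattern for a trend map can be checked on a tiny synthetic cube. This is a sketch under the assumption that the data are ordered (time, latitude, longitude); the names `true_slopes` and `vals` and all values are made up.

```python
import numpy as np

# Synthetic anomaly cube with dimensions (time, latitude, longitude); made-up values
nt, nlat, nlon = 50, 3, 4
t = np.arange(nt)
true_slopes = np.linspace(-0.2, 0.2, nlat * nlon).reshape(nlat, nlon)
vals = true_slopes[np.newaxis, :, :] * t[:, np.newaxis, np.newaxis]

# np.polyfit accepts a 2-D y of shape (time, n_points) and fits each column independently
regressions = np.polyfit(t, vals.reshape(nt, -1), 1)

# Row 0 holds the slopes, row 1 the intercepts; reshape the slopes back to a map
trends = regressions[0, :].reshape(nlat, nlon)
```

Because each column is fit separately, the recovered `trends` array matches `true_slopes` to floating-point precision in this noise-free case.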


In [None]:
ccmp_anomaly

# Compare to buoy data
Data from NDBC buoys, which measure wind speed, are [here](https://dods.ndbc.noaa.gov/) and can be read via a THREDDS server.
- read in NDBC buoy data
- find closest CCMP data and linearly interpolate to buoy location
- examine a timeseries, calculate bias and STD

In [None]:
url='https://dods.ndbc.noaa.gov/thredds/dodsC/data/cwind/51000/51000.ncml'
ds = xr.open_dataset(url)
# Buoy longitude is on -180 to 180 and CCMP is on 0 to 360, so convert before comparing
ds.coords['longitude'] = np.mod(ds['longitude'], 360)
ds_buoy = ds
ds_buoy
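
The longitude conversion relies on the modulo operation wrapping negative values into the 0 to 360 range. A minimal sketch with made-up longitudes:

```python
import numpy as np

# Longitudes on the -180 to 180 convention (made-up values)
lon_180 = np.array([-153.75, -0.25, 0.0, 120.5])

# Map to the 0 to 360 convention used by CCMP
lon_360 = np.mod(lon_180, 360)
```

Values that are already non-negative pass through unchanged, so applying the conversion twice is harmless.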

In [None]:
ccmp_buoy = ds_ccmp.interp(latitude=ds_buoy.latitude,longitude=ds_buoy.longitude,method='linear')
data_bias = (ds_buoy.wind_spd - ccmp_buoy.wspd).mean()
data_std = (ds_buoy.wind_spd - ccmp_buoy.wspd).std()
print(data_bias,data_std)
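
The interpolation and statistics can be verified on synthetic data where the answer is known. Bias here means the mean of buoy minus gridded wind speed, and STD means the standard deviation of that difference. The sketch below assumes a made-up gridded field that is linear in latitude and longitude, a hypothetical buoy position, and invented offsets; none of these are real observations.

```python
import numpy as np
import xarray as xr

# Synthetic gridded field, linear in latitude and longitude (made-up values),
# so linear interpolation reproduces it exactly
lat = np.arange(20.125, 26, 0.25)
lon = np.arange(200.125, 206, 0.25)
time = np.arange(6)
field = (
    5.0
    + 0.1 * lat[np.newaxis, :, np.newaxis]
    + 0.05 * lon[np.newaxis, np.newaxis, :]
    + np.zeros((time.size, 1, 1))
)
grid = xr.DataArray(
    field,
    coords={"time": time, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
)

# Hypothetical buoy position (longitude already on 0 to 360) and observations
buoy_lat, buoy_lon = 23.4, 203.3
truth = 5.0 + 0.1 * buoy_lat + 0.05 * buoy_lon
offsets = np.array([0.5, 0.3, 0.7, 0.5, 0.4, 0.6])  # buoy minus grid, made up
buoy = xr.DataArray(truth + offsets, coords={"time": time}, dims="time")

# Linearly interpolate the grid to the buoy location
grid_at_buoy = grid.interp(latitude=buoy_lat, longitude=buoy_lon, method="linear")

# Bias is the mean difference; STD is the standard deviation of the difference
diff = buoy - grid_at_buoy
data_bias = float(diff.mean())
data_std = float(diff.std())
```

Subtracting two xarray objects aligns them on their shared `time` coordinate, so only time steps present in both series contribute to the statistics.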

In [None]:
ds_buoy.wind_spd.plot(color='b')
ccmp_buoy.wspd.plot(color='r')

In [None]:
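diff, buoy = xr.align(ds_buoy.wind_spd - ccmp_buoy.wspd, ds_buoy.wind_spd)  # keep only matching time steps
plt.scatter(buoy.values.ravel(), diff.values.ravel(), c=buoy.values.ravel())  # assumed axes: x buoy wind speed, y buoy minus CCMP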