# MERRA2 Analysis Process

This Jupyter notebook provides a brief overview of how to use the **geodata** package to download MERRA2 climate data, create geographic-temporal subsets called cutouts, and use those cutouts to generate standalone datasets for separate analysis.

*The following guide assumes you have installed and configured **geodata** and all required dependencies.*

## Step 1 - Setup

Import the package first.

In [None]:
import geodata

Notifications in **geodata** are implemented using `loggers` from the `logging` library.
Launching a logger is recommended so that you can see what **geodata** is doing. The cell below uses `level=logging.INFO`; for debugging, switch to the more verbose `level=logging.DEBUG`:

In [None]:
import logging

logging.basicConfig(level=logging.INFO)
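As a minimal sketch of switching between the two levels mentioned above, the helper below wraps `logging.basicConfig`. The name `configure_logging` and the message format are illustrative choices and are not part of **geodata**.

```python
import logging


def configure_logging(verbose: bool = False) -> int:
    """Configure the root logger and return the level that was applied."""
    level = logging.DEBUG if verbose else logging.INFO
    # force=True (Python 3.8+) replaces handlers that Jupyter or an earlier
    # cell may already have attached; without it basicConfig can be a no-op.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level


applied_level = configure_logging(verbose=True)
logging.getLogger("example").debug("debug messages are now visible")
```

Calling `configure_logging()` without arguments restores the quieter `INFO` level.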

## Step 2 - Download

Assuming you have previously created an Earthdata Login profile and approved the GES DISC app, you can download MERRA2 data from the source as follows.

First, define a dataset object for the data you wish to download:

In [None]:
DS = geodata.Dataset(
    module="merra2",
    weather_data_config="surface_flux_monthly",
    years=slice(2010, 2010),
    months=slice(1, 7),
)

* Use `module` to specify the data source. In this example, it is "merra2".
* Use `weather_data_config` to specify the dataset.  In this example, it is the [MERRA2 monthly mean, single-level surface flux diagnostics](https://disc.gsfc.nasa.gov/datasets/M2TMNXFLX_5.12.4/summary).
    * To download the [MERRA2 hourly, single-level surface flux diagnostics](https://disc.gsfc.nasa.gov/datasets/M2T1NXFLX_5.12.4/summary), specify `weather_data_config = "surface_flux_hourly"`.
* Use `years=slice()` and `months=slice()` to specify the years and months for download.  In each parameter, the first value indicates the start period, and the second value the end period.
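To make the start and end semantics concrete, the sketch below expands a pair of slices into the periods they cover. It assumes both endpoints are inclusive (as `years=slice(2010, 2010)` suggests) and that the month range is applied within each year. `expand_periods` is a hypothetical helper for illustration, not a **geodata** function.

```python
from typing import List, Tuple


def expand_periods(years: slice, months: slice) -> List[Tuple[int, int]]:
    """List every (year, month) pair covered, treating both endpoints as inclusive."""
    return [
        (year, month)
        for year in range(years.start, years.stop + 1)
        for month in range(months.start, months.stop + 1)
    ]


periods = expand_periods(slice(2010, 2010), slice(1, 7))
print(periods[0], periods[-1], len(periods))  # (2010, 1) (2010, 7) 7
```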

Use the code block below to begin the download.

When a `Dataset` object is created, **geodata** checks whether the specified data has already been downloaded by looking for MERRA2 data files in the `merra2` directory configured in `src/geodata/config.py`.  Downloaded data is placed into subdirectories by year and then, for daily files, by month (i.e. `2011/01`, `2011/02`, `2012/01`, etc.).  Monthly files are simply placed in the month's folder.  If downloaded data is found, the `prepared` attribute is set to `True` when the `Dataset` object is created.

Accordingly, the snippet below saves you the trouble of accidentally redownloading data if it is already present in the correct subdirectories.

In [None]:
if not DS.prepared:
    DS.get_data()
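The year and month directory layout described above can be inspected by hand. The sketch below is only an illustration of that layout, not the check **geodata** itself performs. `missing_month_dirs`, the temporary directory, and the file name `placeholder.nc` are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_month_dirs(root: Path, year: int, months: range) -> List[str]:
    """Return the 'YYYY/MM' subdirectories that are absent or contain no files."""
    missing = []
    for month in months:
        month_dir = root / f"{year}" / f"{month:02d}"
        if not month_dir.is_dir() or not any(month_dir.iterdir()):
            missing.append(f"{year}/{month:02d}")
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for the configured `merra2` directory
    january = root / "2011" / "01"
    january.mkdir(parents=True)
    (january / "placeholder.nc").touch()  # hypothetical file name
    missing = missing_month_dirs(root, 2011, range(1, 3))

print(missing)  # ['2011/02']
```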

Finally, in order to use the downloaded MERRA2 data with **geodata**, run:

In [None]:
DS.trim_variables()

`trim_variables()` subsets and resaves the downloaded files so that only those variables needed to generate **geodata** outputs are kept.

## Step 3 - Create Cutout

A cutout is a subset of downloaded data based on specified time periods and geographic coordinates.  Cutouts are saved to the cutout directory specified in `src/geodata/config.py` and can be used to generate multiple outputs.

*Note (04/02/2020): There is a known issue with MERRA2-based cutouts where running `cutout.prepare(overwrite=True)` on an existing cutout prevents the cutout from being used to generate outputs.  A workaround is to manually delete the problem cutout and recreate it from scratch.  A fix is planned pending investigation.*

To create a cutout, run the following:

In [None]:
cutout = geodata.Cutout(
    name="tokyo-2010-test",
    module="merra2",
    weather_data_config="surface_flux_monthly",
    xs=slice(138.5, 139.5),
    ys=slice(35, 36),
    years=slice(2010, 2010),
    months=slice(7, 7),
)
cutout.prepare()

The above code creates a cutout for July 2010 for a geographic area roughly corresponding to the Tokyo metropolitan area. Walking through the parameters:

* `name` will be the name of the directory created in the cutouts folder where **geodata** will place the data files corresponding to the cutout.
* `module` indicates the source for the data from which the cutout is created.
* `weather_data_config` indicates the specific dataset from the source.  For MERRA2, the available options are `surface_flux_hourly` and `surface_flux_monthly`.
* Use `xs=slice()` and `ys=slice()` to define a geographical range for the cutout.
* Use `years=slice()` and `months=slice()` to define a temporal range for the cutout.  Naturally, the indicated time range must be present within the source data.
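To get a feel for how many grid cells a geographic range covers, the sketch below applies the `xs` and `ys` bounds from the Tokyo example to the MERRA2 grid (0.625° spacing in longitude, 0.5° in latitude). It assumes cells are selected by their centre coordinates with inclusive bounds; **geodata** may select cells differently, and `select_inclusive` is a hypothetical helper.

```python
# MERRA2 cell centres: 0.625 degree spacing in longitude, 0.5 degree in latitude.
LONGITUDES = [-180.0 + 0.625 * i for i in range(576)]
LATITUDES = [-90.0 + 0.5 * j for j in range(361)]


def select_inclusive(values, bounds: slice):
    """Keep the coordinates that fall within bounds.start..bounds.stop (inclusive)."""
    return [v for v in values if bounds.start <= v <= bounds.stop]


xs = select_inclusive(LONGITUDES, slice(138.5, 139.5))
ys = select_inclusive(LATITUDES, slice(35, 36))
print(xs)  # [138.75, 139.375]
print(ys)  # [35.0, 35.5, 36.0]
print(len(xs) * len(ys))  # 6
```

Under these assumptions the example cutout spans only a handful of cells, which is why it makes a quick test case.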

`geodata.Cutout()` only defines the cutout object in memory.  To actually create the cutout files, run `prepare()`.  
As with `get_data()`, `prepare()` first checks whether a cutout matching the same specification has already been created, and exits without recreating it if one exists.  To override this behavior and force the cutout to be recalculated, run `prepare(overwrite=True)`.

To verify the results of the cutout, you can print some attributes to the console as follows.

Basic information:

In [None]:
cutout

Name:

In [None]:
cutout.name

Coordinates:

In [None]:
cutout.coords

All metadata:

In [None]:
cutout.meta

Information about the variable config used to download the data:

In [None]:
cutout.dataset_module.weather_data_config

For MERRA2, you can confirm which variables were downloaded as follows:

In [None]:
cutout.dataset_module.weather_data_config["surface_flux_monthly"]["variables"]

## Step 4 - Generate Outputs

**geodata** currently supports the following wind outputs using MERRA2 surface flux diagnostic data:
* Wind generation time-series (`wind`)
* Wind speed time-series (`windspd`)
* Wind power density time-series (`windwpd`)

### Wind Generation Time-series
Convert wind speeds into wind energy generation for a given turbine using the following code:

In [None]:
ds_wind = geodata.convert.wind(
    cutout, turbine="Suzlon_S82_1.5_MW", smooth=True, var_height="lml"
)

Going over the parameters:

* `cutout` - **Cutout** - A cutout object created by `geodata.Cutout()`
* `turbine` - **string or dict** - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve.  For a full list of currently supported turbines, see [the list of Turbines here.](https://github.com/east-winds/geodata/tree/master/geodata/resources/windturbine)
* `smooth` - **bool or dict** - If True, smooth the power curve with a Gaussian kernel as determined for the Danish wind fleet, with Delta_v = 1.27 and sigma = 2.29. Passing a dict allows these values to be tuned.

*Note:* You can also specify all of the general conversion arguments documented in the `convert_and_aggregate` function (e.g. `var_height='lml'`).
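To illustrate what the `smooth` option does conceptually, the sketch below convolves a toy power curve with a Gaussian kernel. It assumes that Delta_v is the kernel's mean and sigma its standard deviation, both in m/s; that reading of the parameters is an assumption, and the implementation inside **geodata** may differ. The function names and the toy curve are illustrative only.

```python
import math
from typing import List, Sequence


def gaussian_kernel(v: float, delta_v: float = 1.27, sigma: float = 2.29) -> float:
    """Normal density with mean delta_v and standard deviation sigma (m/s)."""
    return math.exp(-((v - delta_v) ** 2) / (2 * sigma**2)) / (
        sigma * math.sqrt(2 * math.pi)
    )


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation, holding the end values outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("xs must be sorted in increasing order")


def smooth_power_curve(
    speeds: Sequence[float], power: Sequence[float], step: float = 0.1
) -> List[float]:
    """Convolve the power curve with the kernel on a regular grid of wind speeds."""
    grid = [i * step for i in range(-500, 501)]  # -50 m/s to 50 m/s
    power_on_grid = [interpolate(u, speeds, power) for u in grid]
    return [
        step * sum(p * gaussian_kernel(v - u) for u, p in zip(grid, power_on_grid))
        for v in speeds
    ]


# Toy power curve in MW (illustrative values, not a real turbine).
speeds = [0.0, 3.0, 12.0, 25.0, 26.0]
power = [0.0, 0.0, 1.5, 1.5, 0.0]
smoothed = smooth_power_curve(speeds, power)
print([round(p, 3) for p in smoothed])
```

In this sketch the smoothed curve produces some power at the original cut-in speed and stays below rated power at the original rated speed, which is the kind of fleet-level averaging the option is meant to capture.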

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

In [None]:
ds_wind

To convert this array to a more conventional dataframe, run:

In [None]:
df_wind = ds_wind.to_dataframe(name="wind")

which converts the xarray dataset into a pandas dataframe:

In [None]:
df_wind

To output the data to a csv for separate analysis:

In [None]:
df_wind.to_csv("merra2_wind_data.csv")
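Once a CSV has been written, it can be loaded again for separate analysis. The sketch below round-trips a small synthetic dataframe through CSV in memory and averages over the grid cells for each time step. The index names `time`, `y`, and `x` and the values are assumptions made for illustration; check the columns of your own file before reusing this.

```python
import io

import pandas as pd

# Synthetic stand-in for a converted output: one time step, 2 x 2 grid cells.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2010-07-01"]), [35.0, 35.5], [138.75, 139.375]],
    names=["time", "y", "x"],
)
df = pd.DataFrame({"wind": [0.10, 0.20, 0.30, 0.40]}, index=index)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# Read the CSV back, restoring the index columns and parsing the timestamps.
restored = pd.read_csv(buffer, parse_dates=["time"], index_col=["time", "y", "x"])
mean_by_time = restored.groupby(level="time")["wind"].mean()
print(mean_by_time)
```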

### Wind Speed Time-series

Extract wind speeds at a given height (in m/s):

In [None]:
ds_windspd = geodata.convert.windspd(
    cutout, turbine="Vestas_V66_1750kW", var_height="lml"
)

Going over the parameters:

* `cutout` - **Cutout** - A cutout object created by `geodata.Cutout()`
* `**params` - Must include one of the following:
    - `turbine` - **string or dict** - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve.  For a full list of currently supported turbines, see [the list of Turbines here.](https://github.com/east-winds/geodata/tree/master/geodata/resources/windturbine)
    - `hub_height` - **num** - Extrapolation height (m)
    
*Note:* You can also specify all of the general conversion arguments documented in the `convert_and_aggregate` function (e.g. `var_height='lml'`).
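This guide does not state how wind speeds are extrapolated to the requested height. As a hedged illustration of the general idea, the sketch below uses the textbook logarithmic wind profile. The function name, the reference height, and the roughness length are all assumptions and do not describe **geodata**'s internals.

```python
import math


def extrapolate_wind_speed(
    speed_ref: float, height_ref: float, hub_height: float, roughness: float
) -> float:
    """Scale a wind speed from height_ref to hub_height with the logarithmic profile."""
    return (
        speed_ref
        * math.log(hub_height / roughness)
        / math.log(height_ref / roughness)
    )


# Example values only: 5 m/s at 10 m, an 80 m hub, roughness length 0.03 m.
speed_at_hub = extrapolate_wind_speed(5.0, 10.0, 80.0, 0.03)
print(round(speed_at_hub, 2))
```

Under this profile the speed increases with height above the reference level, and smoother surfaces (smaller roughness length) give a smaller increase.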

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

In [None]:
ds_windspd

To convert this array to a more conventional dataframe, run:

In [None]:
df_windspd = ds_windspd.to_dataframe(name="windspd")

which converts the xarray dataset into a pandas dataframe:

In [None]:
df_windspd

To output the data to a csv for separate analysis:

In [None]:
df_windspd.to_csv("merra2_windspd_data.csv")

### Wind Power Density Time-series

Extract wind power density at a given height, according to:
**WPD = 0.5 * Density * Windspd^3**

In [None]:
ds_windwpd = geodata.convert.windwpd(
    cutout, turbine="Vestas_V66_1750kW", var_height="lml"
)
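As a quick worked instance of the WPD formula above, the sketch below evaluates it for a single pair of values. The inputs (standard sea-level air density of 1.225 kg/m^3 and a 10 m/s wind speed) are example values, and `wind_power_density` is an illustrative helper rather than a **geodata** function.

```python
def wind_power_density(air_density: float, wind_speed: float) -> float:
    """Return wind power density in W/m^2 for density in kg/m^3 and speed in m/s."""
    return 0.5 * air_density * wind_speed**3


# 1.225 kg/m^3 is the standard sea-level air density; 10 m/s is an example speed.
wpd = wind_power_density(1.225, 10.0)
print(f"{wpd:.1f} W/m^2")  # 612.5 W/m^2
```

Because the speed is cubed, doubling the wind speed multiplies the power density by eight.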

Going over the parameters:

* `cutout` - **Cutout** - A cutout object created by `geodata.Cutout()`
* `**params` - Must include one of the following:
    - `turbine` - **string or dict** - Name of a turbine known by the reatlas client or a turbineconfig dictionary with the keys 'hub_height' for the hub height and 'V', 'POW' defining the power curve.  For a full list of currently supported turbines, see [the list of Turbines here.](https://github.com/east-winds/geodata/tree/master/geodata/resources/windturbine)
    - `hub_height` - **num** - Extrapolation height (m)
    
*Note:* You can also specify all of the general conversion arguments documented in the `convert_and_aggregate` function (e.g. `var_height='lml'`).

The convert function returns an xarray dataset, which is an in-memory representation of a NetCDF file.

In [None]:
ds_windwpd

To convert this array to a more conventional dataframe, run:

In [None]:
df_windwpd = ds_windwpd.to_dataframe(name="windwpd")

which converts the xarray dataset into a pandas dataframe:

In [None]:
df_windwpd

To output the data to a csv for separate analysis:

In [None]:
df_windwpd.to_csv("merra2_windwpd_data.csv")