# Calculating a Typical Meteorological Year
<br>This notebook walks through the process of calculating a [Typical Meteorological Year](https://nsrdb.nrel.gov/data-sets/tmy), an hourly dataset used for applications in energy and building systems modeling. Because this represents average rather than extreme conditions, an TMY dataset is not suited for designing systems to meet the worst-case conditions occurring at a location. 

The TMY methodology here mirrors that of the Sandia/NREL TMY3 methodology, and uses historic and projected downscaled climate data available through the Cal-Adapt: Analytics Engine catalog. As this methodology heavily weights the solar radiation input data, be aware that the final selection of "typical" months may not be typical for other variables. 

**Intended Application** As a user, I want to <span style="color:#FF0000">**generate a typical meteorological year file**</span> for a location of interest:
- Understand the methods that are involved in generating a TMY dataset
- Visualize the TMY dataset across all input variables
- Export the TMY dataset for available models for input into my workflow

**Note**: 
1. This notebook is a **full demonstration** of the Typical Meteorological Year methodology, for full transparency. For practical generation of a TMY dataset, a user <span style="color:#FF0000">**only needs to provide 2 elements**</span>: the **location**, and **reference time period**. These selections are highlighted below for you. 
2. The Analytics Engine at present has an *Average Meteorological Year* functionality. The methods shown throughout this notebook will soon replace the underlying backend `climakitae` code for the AMY in order to better address our user needs, i.e., we are working to replace the AMY with the TMY methods. We provide this walkthrough to demonstrate confidence in the "AMY to TMY" conversion process for our users in the meantime. 

**Runtime**: With the default settings, this notebook takes approximately **50 minutes** to run from start to finish. Modifications to selections may increase the runtime.

In [1]:
%reload_ext autoreload
%autoreload 2

### Step 0: Set-up

Import the [climakitae](https://github.com/cal-adapt/climakitae) library and other dependencies.

In [2]:
from climakitae.explore.typical_meteorological_year import TMY
from climakitaegui.explore.typical_meteorological_year import plot_one_var_cdf
from climakitae.util.utils import read_csv_file
import pandas as pd

import warnings

warnings.filterwarnings("ignore")

### Step 1: Grab and process all required input data

The [TMY3 method](https://www.nrel.gov/docs/fy08osti/43156.pdf) selects a "typical" month based on ten daily variables: max, min, and mean air and dew point temperatures, max and mean wind speed, global irradiance and direct irradiance.  

#### Step 1a: Select location of interest
TMYs are calculated for a specific location of interest, like a building or power plant. Here, we will use a known weather station location, via their latitude and longitude to extract the data that we need to calculate the TMY. In the example below, we will look specifically at Los Angeles International Airport, but will note in the code below how you can provide your own location coordinates too. 

First we display a list of available stations:

In [3]:
# read in station file of CA HadISD stations
stn_file=read_csv_file("data/hadisd_stations.csv")
# Display station names
list(stn_file["station"])

['Bakersfield Meadows Field (KBFL)',
 'Blythe Asos (KBLH)',
 'Burbank-Glendale-Pasadena Airport (KBUR)',
 'Needles Airport (KEED)',
 'Fresno Yosemite International Airport (KFAT)',
 'Imperial County Airport (KIPL)',
 'Los Angeles International Airport (KLAX)',
 'Long Beach Daugherty Field (KLGB)',
 'Modesto City-County Airport (KMOD)',
 'San Diego Miramar Wscmo (KNKX)',
 'Oakland Metro International Airport (KOAK)',
 'Oxnard Ventura County Airport (KOXR)',
 'Palm Springs Regional Airport (KPSP)',
 'Riverside Municipal Airport (KRAL)',
 'Red Bluff Municipal Airport (KRBL)',
 'Sacramento Executive Airport (KSAC)',
 'San Diego Lindbergh Field (KSAN)',
 'Santa Barbara Municipal Airport (KSBA)',
 'San Luis Obispo Airport (KSBP)',
 'Gillespie Field Airport (KSEE)',
 'San Francisco International Airport (KSFO)',
 'San Jose International Airport (KSJC)',
 'Santa Ana John Wayne Airport (KSNA)',
 'Desert Resorts Regional Airport (KTRM)',
 'Ukiah Municipal Airport (KUKI)',
 'Lancaster William J F

In the following cell we set the `stn_name` variable to the chosen station. The name must match the version that appears in the station list above.

In [4]:
stn_name = "Los Angeles International Airport (KLAX)"

Alternatively, we can choose a latitude and longitude. In this case, we do not set the `stn_name` variable.

In [5]:
# Uncomment (delete the # character) to set latitude and longitude.
# latitude = 37.9
# longitude = -122.06

#### Step 1b: Select time frame of interest
The second required input for generating a TMY dataset is the **time frame of interest**. The recommended minimum number of input years for a TMY dataset is 15-20 years worth of daily data; we will use 30 years to represent a standard climatological period. For data post-2014, the available scenario is SSP 3-7.0, although scenario selection in the near-future is relatively independent.

We will also process the data for our designated station location (latitude, and longitude) at 9 km over the <span style="color:#FF0000">1990-2020 period</span> as an example.

In [6]:
# selected reference period
start_year = 1990
end_year = 2020

### Step 2: Generate the TMY data outputs

Generally, the following data is outputted using the TMY months:
- Date & time (UTC)
- Air temperature at 2m [°C]
- Dew point temperature [°C]
- Relative humidity [%]
- Global horizontal irradiance [W/m2]
- Direct normal irradiance [W/m2]
- Diffuse horizontal irradiance [W/m2]
- Downwelling infrared radiation [W/m2]
- Wind speed at 10m [m/s]
- Wind direction at 10m [°]
- Surface air pressure [Pa]

We can use the TMY object to set up, run, and output TMY results to file. The first step is to initialize the object with your desired reference period (`start_year` and `end_year`) and location (`stn_name`). `True` is the default value for `verbose` but we have also set it explicitly for demonstration.

In [7]:
# Initialize TMY object
#tmy = TMY(start_year, end_year, station_name=stn_name, verbose=False)
tmy = TMY(gwl=1.0, station_name=stn_name, verbose=False)
# Or, using latitude and longitude (the nearest lat/lon cell on the data grid will be selected):
# tmy = TMY(start_year, end_year, latitude=latitude, longitude=longitude, verbose=True)

Initializing TMY object for Los Angeles International Airport (KLAX)


We can run the entire TMY workflow with a single command, as shown below. This will write 4 TMY files, one for each model.  
The runtime for this command can reach up to **30 minutes**. Because we set `verbose` to True, the TMY object will print updates as different parts of the workflow initialize.

In [9]:
%%time
tmy.generate_tmy()

Running TMY workflow. Expected overall runtime: 40 minutes
Loading data from catalog. Expected runtime: 7 minutes
  Getting air temperature... Data converted to America/Los_Angeles timezone.
  Getting dew point temperature... Data converted to America/Los_Angeles timezone.
  Getting wind speed... Data converted to America/Los_Angeles timezone.
  Getting global irradiance... Data converted to America/Los_Angeles timezone.
  Getting direct normal irradiance... Data converted to America/Los_Angeles timezone.
Assembling TMY data to export. Expected runtime: 30 minutes
  Getting Air Temperature at 2m... Data converted to America/Los_Angeles timezone.
  Getting Dew point temperature... Data converted to America/Los_Angeles timezone.
  Getting Relative humidity... Data converted to America/Los_Angeles timezone.
  Getting Instantaneous downwelling shortwave flux at bottom... Data converted to America/Los_Angeles timezone.
  Getting Shortwave surface downward direct normal irradiance... Data co

  0%|          | 0/12 [00:00<?, ?it/s]

Calculating TMY for simulation: WRF_MPI-ESM1-2-HR_r3i1p1f1_historical+ssp370


  0%|          | 0/12 [00:00<?, ?it/s]

Calculating TMY for simulation: WRF_TaiESM1_r1i1p1f1_historical+ssp370


  0%|          | 0/12 [00:00<?, ?it/s]

Calculating TMY for simulation: WRF_MIROC6_r1i1p1f1_historical+ssp370


  0%|          | 0/12 [00:00<?, ?it/s]

Exporting TMY to file.


KeyError: 'scenario'

Let's observe what the TMY data looks like for one of the simulations. The data we want is saved in the class attribute `tmy_data_to_export`.

In [None]:
simulation = "WRF_MPI-ESM1-2-HR_r3i1p1f1"
tmy.tmy_data_to_export[simulation].head(5)

Next, we visualize the TMY data itself. We show the results for the same simulation:

In [None]:
tmy.show_tmy_data_to_export(simulation)

### 3. Explore the TMY process

### 3a. Models
It is important to note that not all models in the Cal-Adapt: Analytics Engine have the solar variables critical for TMY file generation - in fact, only 4 do! The TMY class subsets our variables to ensure that the same 4 models are selected for consistency. The simulations are shown below:

In [None]:
tmy.simulations

### 3b. How top months are chosen

Now that we've run the TMY workflow, we can investigate the Cumulative Distribution Function results that were used to select top months.

In the plot below, we'll display maximum air temperature to assess the climatological CDF pattern, but you can modify the variable here to one of your choosing to see the pattern too! Also select a different month by moving the slider bar to see the pattern throughout the year. 

In [None]:
# Choose your desired variable
var = "Daily max air temperature"

# Make the plot
cdf_plot = plot_one_var_cdf(tmy.cdf_climatology, var)
display(cdf_plot)

Like the climatology CDF figure above, let's check out the individual months next. You can modify the variable, and month-year to display too. 

In [None]:
# Choose your desired variable
var = "Daily max air temperature"

# Make the plot
cdf_plot_mon_yr = plot_one_var_cdf(tmy.cdf_monthly, var)
display(cdf_plot_mon_yr)

If you want to inspect the top months without running the entire TMY workflow, you can! The TMY workflow can be run in a step-by-step manner.

We'll demonstrate this with a fresh TMY object that does not have data loaded.

In [None]:
start_year = 1990
end_year = 2020
latitude = 37.9
longitude = -122.06
tmy = TMY(start_year, end_year, latitude=latitude, longitude=longitude, verbose=False)

The method load_all_variables() will load the five variables needed for the TMY analysis into memory. The loaded data can be accessed via the class variable `all_vars`.

The following variables from the Analytics Engine catalog are used:  
* Air Temperature at 2m
* Wind speed at 10m  
* Dew point temperature  
* Instantaneous downwelling shortwave flux at bottom  
* Shortwave surface downward direct normal irradiance  

In [None]:
tmy.load_all_variables()  # Load model datasets
tmy.all_vars

The `set_cdf_climatology` method uses the loaded data to calculate the CDF for the baseline period climatology. The results can be accessed via the class variable `cdf_climatology`. See these links to the source code for <span style="color: blue;">[set_cdf_climatology](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L641)</span> and <span style="color: blue;">[get_cdf](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L90)</span>.

In [None]:
tmy.set_cdf_climatology()
tmy.cdf_climatology

After the climatology CDF is obtained, we calculate the monthly CDF for all months and simulations. This is done in the method `set_cdf_monthly`. The results can be accessed via the class variable `cdf_monthly`. See these links to the source code for <span style="color: blue;">[set_cdf_monthly](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L651)</span> and <span style="color: blue;">[get_cdf_monthly](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L105)</span>.

In [None]:
tmy.set_cdf_monthly()
tmy.cdf_monthly

When both the climatology CDF and monthly CDFs are available, we can get the weighted Finkelstein-Schafer (F-S) statistic in `set_weighted_statistic`. The results can be accessed via class variable `weighted_fs_sum`.  See this link to the source code for <span style="color: blue;">[set_weighted_statistic](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L661)</span>.

In [None]:
tmy.set_weighted_statistic()
tmy.weighted_fs_sum

Finally, the weighted F-S statistic is used to choose the top months for each model to include in the TMY. This is done in method `set_top_months`. The resulting pandas dataframe table can be accessed via class variable `top_months`. See these links to the source code for <span style="color: blue;">[set_top_months](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L673)</span> and <span style="color: blue;">[get_top_months](https://github.com/cal-adapt/climakitae/blob/improve/tmy-refactor/climakitae/explore/typical_meteorological_year.py#L222)</span>.

In [None]:
tmy.set_top_months()
tmy.top_months

A convenience function is provided to allow you to run the entire workflow for selecting top months without running the full TMY analysis. The call shown below will run every step between `load_all_variables` and `set_top_months` (uncomment to call).

In [None]:
# This method runs the whole workflow up to set_top_df()
# tmy.get_candidate_months()

Once the top months have been generated and examined, the analysis can be continued using the following function calls (uncomment to run):

In [None]:
# Run analysis
# tmy.run_tmy_analysis()

# Look at results
# tmy.tmy_data_to_export()

# Write to file
# tmy.export_tmy_data()

In [None]:
tmy.all_vars

In [None]:
data=tmy.all_vars

In [None]:
data.sel(time=data.time.dt.day.isin([1]))