# Exercise: Weather API

## Aim: Use a Weather API to create and graph NetCDF files

### Issues covered:

- Request and get data from a weather API service
- Read and retrieve information from a JSON response
- Write contents to a NetCDF file
- Read a collection of NetCDF files and plot a time series graph

## 1. Let's get data from a web API on the internet

We will use the NOAA National Weather Service in the US as our data source:

![](https://www.weather.gov/css/images/header.png)

The service has a web API that allows you to request forecast data for a given grid point in the USA. Details of the API are documented at:

https://www.weather.gov/documentation/services-web-api

Use the endpoint `https://api.weather.gov/` as the base URL.

First, we want to get a station ID based on some latitude/longitude coordinates. To do so, we will use the `points/{latitude,longitude}` endpoint of the API.

**Choose the latitude and longitude of your favourite US location (this API covers the US only; latitude is in degrees North and longitude in degrees East). The extent of the USA is approximately:**
- Longitude: -120, -80
- Latitude:  30, 48

Once you have queried the `points` API you will get back a `station ID`. The `station ID` can be used to get a weather forecast for your location of interest, using the `gridpoints/{station ID}/{grid co-ordinates}` endpoint.
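As a rough illustration of how the two endpoints fit together, the sketch below only assembles the request URLs as strings; it makes no network calls. The helper names `points_url` and `gridpoints_url` are hypothetical, and the grid values passed in at the end are made-up placeholders rather than real API output.

```python
BASE_URL = "https://api.weather.gov/"


def points_url(latitude, longitude):
    """Build the URL for the `points` endpoint (hypothetical helper)."""
    return f"{BASE_URL}points/{latitude},{longitude}"


def gridpoints_url(station_id, grid_x, grid_y):
    """Build the URL for the `gridpoints` endpoint (hypothetical helper)."""
    return f"{BASE_URL}gridpoints/{station_id}/{grid_x},{grid_y}"


# Coordinates in degrees North and degrees East (negative for the US).
first_request = points_url(40.7306, -73.9352)

# "ABC", 10 and 20 are placeholders, not values returned by the service.
second_request = gridpoints_url("ABC", 10, 20)

print(first_request)
print(second_request)
```
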

Import the `requests` library which is great for downloading content from external URLs.

In [16]:
import requests

You can use the `requests` library to access the web API. Fill in the ellipses with the `latitude` (degrees North) and `longitude` (degrees East, so use a negative value) of a location in the US.
If successful, the response code should be 200.

In [22]:
url = 'https://api.weather.gov/'
latitude = 40.7306
longitude = -73.9352

#Hint: use the requests library to GET from the url: https://api.weather.gov/points/{LAT},{LON}
response = requests.get(f'{url}points/{latitude},{longitude}')
response.status_code




200

With the `requests` library, the results from the web API can be parsed from JSON format. Once parsed, a JSON document is an ordinary Python dictionary (which may contain nested lists and dictionaries).

Use dictionary indexing to extract the values of the station ID and the X/Y coordinates:

- get `gridID`
- get `gridX`
- get `gridY`
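
The indexing pattern can be tried offline on a small hand-written JSON string. The sketch below mimics only the shape of the nesting (`properties` containing `gridId`, `gridX` and `gridY`); the values are invented for illustration.

```python
import json

# Hand-written sample; the values are invented, not real API output.
sample_text = '{"properties": {"gridId": "ABC", "gridX": 10, "gridY": 20}}'

# json.loads turns the JSON text into nested Python dictionaries.
sample = json.loads(sample_text)

properties = sample["properties"]
grid_id = properties["gridId"]
grid_x = properties["gridX"]
grid_y = properties["gridY"]

# .get() returns None instead of raising KeyError when a key is absent.
missing = properties.get("doesNotExist")

print(grid_id, grid_x, grid_y, missing)
```
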

In [23]:
# hint: you can view the JSON by pasting the URL directly into your browser address bar

response = response.json()

gridID = response['properties']['gridId']
gridX = response['properties']['gridX']
gridY = response['properties']['gridY']

With your `gridID`, `gridX`, and `gridY`, use the `gridpoints` API endpoint to request a weather forecast for that location. Print the status code.
If everything is working, you should get another 200 status code.

In [24]:
response = requests.get(f'{url}gridpoints/{gridID}/{gridX},{gridY}')
response.status_code

200

Can you use the JSON response data to get the forecast temperature values? Use dictionary indexing to get the `values` from `temperature` in `properties`.

In [26]:
data = response.json()
forecast = data['properties']['temperature']['values']


The code below extracts the coordinates of the station you have chosen.

In [27]:
# GeoJSON positions are ordered [longitude, latitude]
coords = data['geometry']['coordinates'][0][0]
x = coords[1]  # latitude
y = coords[0]  # longitude

## 2. Let's format that data and write it to NetCDF

### Formatting the data

First, format your forecast data to get the datetime and air temperature as separate
lists.

In [28]:
from datetime import datetime as dt

Loop through your `forecast` values and collect the temperatures (`value`) and datetimes (`validTime`) into lists.
`forecast` is a list of dictionaries, where each dictionary describes one time instance.
Fill in the ellipses to convert each `validTime` string to a Python `datetime` object and assign it to the variable `date`. Get each `value` and assign it to the variable `temp`. These values will then be appended to the `timeseries` and `temps` lists.
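
A minimal sketch of the conversion, run on one made-up `validTime` string of the form `<start>/<duration>`. It assumes the start time always carries a UTC offset such as `+00:00`, which `%z` can parse.

```python
from datetime import datetime

# Made-up example string: start time, a slash, then an ISO 8601 duration.
valid_time = "2024-01-15T06:00:00+00:00/PT2H"

# Keep only the start time; the duration after '/' is not needed here.
start_text = valid_time.split("/")[0]

# %z parses the UTC offset, so the result is a timezone-aware datetime.
date = datetime.strptime(start_text, "%Y-%m-%dT%H:%M:%S%z")

print(date.isoformat())
```
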

In [33]:
# Use the datetime module to convert the times from the data to a datetime object.
# Hint: look at the validTime string and see how you can turn the string to datetime
# using strptime. Each validTime has the form '<start>/<duration>'; the format of
# the start datetime is '%Y-%m-%dT%H:%M:%S%z'.

timeseries = []
temps = []

for item in forecast:
    date = item['validTime']
    date = dt.strptime(date.split('/')[0],'%Y-%m-%dT%H:%M:%S%z')
    temp = item['value']
    timeseries.append(date)
    temps.append(temp)

Format the `timeseries` list and convert it to relative time in seconds from the start of the time series. When using NetCDF and the CF Metadata Conventions, time is stored as an offset from a base time rather than as an absolute time.

If you are stuck, take a look at the 'Time series' slide in the [`logging data from serial ports`](https://github.com/ncasuk/ncas-isc/blob/master/python/presentations/logging-data-from-serial-ports/LDFSP_Slides.pdf) presentation.
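
One way to check the idea is a round trip: turn datetimes into offsets in seconds, then add the offsets back onto the base time. The three datetimes below are made up for illustration, and the variable names are not the ones used in the exercise.

```python
from datetime import datetime, timedelta, timezone

# Made-up time series; in the exercise this comes from the forecast.
sample_times = [
    datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc),
]

base = sample_times[0]

# Subtracting two datetimes gives a timedelta; total_seconds() gives a float.
offsets = [(t - base).total_seconds() for t in sample_times]

# The units string records the base time that the offsets are relative to.
units = "seconds since " + base.strftime("%Y-%m-%d %H:%M:%S")

# Round trip: base time plus each offset recovers the original datetimes.
recovered = [base + timedelta(seconds=s) for s in offsets]

print(offsets)
print(units)
```
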

In [38]:
base_time = timeseries[0]
time_values = []



for t in timeseries:
    value = t-base_time
    ts = value.total_seconds()
    time_values.append(ts)

time_units = "seconds since " + base_time.strftime('%Y-%m-%d %H:%M:%S')

print(time_values)
print(time_units)

[0.0, 7200.0, 10800.0, 18000.0, 21600.0, 28800.0, 36000.0, 39600.0, 43200.0, 46800.0, 50400.0, 54000.0, 57600.0, 61200.0, 68400.0, 75600.0, 82800.0, 90000.0, 104400.0, 108000.0, 111600.0, 115200.0, 118800.0, 122400.0, 126000.0, 133200.0, 136800.0, 140400.0, 144000.0, 151200.0, 158400.0, 162000.0, 169200.0, 176400.0, 180000.0, 187200.0, 190800.0, 194400.0, 198000.0, 201600.0, 205200.0, 208800.0, 212400.0, 223200.0, 230400.0, 244800.0, 252000.0, 255600.0, 262800.0, 266400.0, 277200.0, 280800.0, 284400.0, 288000.0, 291600.0, 295200.0, 306000.0, 309600.0, 313200.0, 316800.0, 320400.0, 327600.0, 331200.0, 338400.0, 342000.0, 360000.0, 363600.0, 367200.0, 370800.0, 374400.0, 378000.0, 381600.0, 399600.0, 403200.0, 406800.0, 410400.0, 414000.0, 428400.0, 435600.0, 439200.0, 450000.0, 453600.0, 457200.0, 460800.0, 464400.0, 468000.0, 478800.0, 482400.0, 486000.0, 489600.0, 493200.0, 500400.0, 507600.0, 518400.0, 525600.0, 536400.0, 540000.0, 543600.0, 547200.0, 550800.0, 554400.0, 565200.0, 56

Convert the `temps` list from degrees C to Kelvin. As per the CF Conventions, the canonical unit for air temperature is K. Create a new list, called `temp_values`, which holds the temperatures in Kelvin.
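
The conversion is a constant offset: the value in Kelvin is the value in degrees Celsius plus 273.15. A compact way to write it is a list comprehension, sketched here on invented values; `sample_celsius` and `sample_kelvin` are illustrative names.

```python
# Invented temperatures in degrees Celsius.
sample_celsius = [7.5, 0.0, -3.0]

# Offset between the Celsius and Kelvin scales.
KELVIN_OFFSET = 273.15

sample_kelvin = [c + KELVIN_OFFSET for c in sample_celsius]

# Round for display only; floating-point sums are not always exact.
print([round(k, 2) for k in sample_kelvin])
```
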

In [40]:
temp_values = []

for temp in temps:
    t = temp + 273.15
    temp_values.append(t)
    
print(temp_values)
    

[280.92777777777775, 280.3722222222222, 280.92777777777775, 282.59444444444443, 281.4833333333333, 280.92777777777775, 280.3722222222222, 279.81666666666666, 279.26111111111106, 278.7055555555555, 278.15, 277.0388888888889, 276.4833333333333, 275.3722222222222, 274.81666666666666, 274.26111111111106, 273.7055555555555, 273.15, 274.26111111111106, 274.81666666666666, 275.92777777777775, 276.4833333333333, 277.59444444444443, 278.15, 278.7055555555555, 278.15, 277.59444444444443, 276.4833333333333, 275.3722222222222, 274.81666666666666, 274.26111111111106, 273.7055555555555, 273.15, 273.7055555555555, 273.15, 274.26111111111106, 274.81666666666666, 275.92777777777775, 276.4833333333333, 277.0388888888889, 277.59444444444443, 278.15, 278.7055555555555, 278.15, 277.59444444444443, 277.0388888888889, 276.4833333333333, 275.92777777777775, 275.3722222222222, 274.81666666666666, 275.3722222222222, 275.92777777777775, 276.4833333333333, 277.59444444444443, 278.15, 278.7055555555555, 278.15, 27

### Create a netCDF4 Dataset and write the contents to a file

Import the `Dataset` class from the `netCDF4` library. You can go on to create an *instance* of this class which will contain:
- variables
- coordinate variables
- dimensions
- global attributes

When you create the instance of `Dataset`, you will give it a file name which will be written to when you close the `Dataset`.

Also import `numpy` as `np`. This will be used to construct the data arrays from the existing lists that currently hold the weather data and coordinate information.


In [42]:
from netCDF4 import Dataset
import numpy as np

#### Quick aside, let's make sure we have a `DATA_DIR` to write to

Since this is a group exercise, everyone should be writing to the same output directory. Let's set some Python variables that can be used below:
1. `USER` - used in the output file names to ensure every NetCDF file is unique.
2. `HOME_DIR` - your `$HOME` directory
3. `MY_DATA_DIR` - the directory where you will write your NetCDF file.
4. `GROUP_DATA_DIR` - the directory where all the NetCDF files will eventually be collected/available.

Since `GROUP_DATA_DIR` is not writeable directly from the Notebook Service, we have set up a job to replicate files from `MY_DATA_DIR` to `GROUP_DATA_DIR` (which runs once per minute).

In [44]:
import os
USER = os.environ["JUPYTERHUB_USER"]

HOME_DIR = f"/home/users/{USER}"
MY_DATA_DIR = os.path.join(HOME_DIR, "weather-api-outputs")

# Create MY_DATA_DIR if it doesn't exist
if not os.path.isdir(MY_DATA_DIR):
    os.mkdir(MY_DATA_DIR)

# All NetCDF will be automatically copied here (once per minute)
GROUP_DATA_DIR = "/gws/pw/j05/workshop/weather-api-data"

# The output file will initially be written to your HOME_DIR (then you will move
# it when complete)
filename = f"{gridID}-{USER}-temps.nc"
outfile = f"{HOME_DIR}/{filename}"

#### Back to our NetCDF file

Create the output file, as a `netCDF4 Dataset` instance, using the `outfile` defined above.

If you need help, have a look at the 'Create the NetCDF dimensions & variables' slide in the [`logging data from serial ports`](https://github.com/ncasuk/ncas-isc/blob/master/python/presentations/logging-data-from-serial-ports/LDFSP_Slides.pdf) presentation.

In [None]:
dataset = ...

# Use the link below for the full solutions
#https://github.com/ncasuk/ncas-isc/blob/master/python/notebooks/solutions/ex16_weather_api_solutions.ipynb

#### Start by defining some dimensions

Create NetCDF *dimensions*:
- `time_dim`: *unlimited* length
- `lat_dim`: length 1
- `lon_dim`: length 1

In [6]:
time_dim = ...
lat_dim = ...
lon_dim = ...

#### Now define the coordinate variables and then the temperature variable

Create the `time` *variable* with the following properties:
- type: numpy float (`np.float64`)
- variable id: `time`
- dimensions: (`time`,)
- set the array using the `time_values` list
- `units`: `time_units` defined earlier
- `standard_name`: `time`
- `calendar`: `standard`

In [None]:
time_var = ...
time_var[:] = ...
time_var.units = ...
time_var.standard_name = ...
time_var.calendar = ...

Create the `lat` *variable* with the following properties:
- type: numpy float (`np.float64`)
- variable id: `lat`
- dimensions: (`lat`,)
- set the array of length 1 using the `gridY` value
- `units`: `degrees_north`
- `standard_name`: `latitude`

In [None]:
lat_var = ...
lat_var[:] = ...
lat_var.units = ...
lat_var.standard_name = ...

Create the `lon` *variable* with the following properties:
- type: numpy float (`np.float64`)
- variable id: `lon`
- dimensions: (`lon`,)
- set the array of length 1 using the `gridX` value
- `units`: `degrees_east`
- `standard_name`: `longitude`

In [None]:
lon_var = ...
lon_var[:] = ...
lon_var.units = ...
lon_var.standard_name = ...

Create the `temp` *variable* with the following properties:
- type: numpy float (`np.float64`)
- variable id: `temp`
- dimensions: (`time`,)
- set the array using the `temp_values` list
- `long_name`: `air temperature (K)`
- `units`: `K`
- `standard_name`: `air_temperature`
- `coordinates`: `lon lat` - to relate the longitude and latitude to this variable

In [None]:
temp_var = ...
temp_var[:] = ...
temp_var.var_id = ...
temp_var.long_name = ...
temp_var.units = ...
temp_var.standard_name = ...
temp_var.coordinates = ...

#### Add some global attributes

The [CF Metadata Conventions](https://cfconventions.org/cf-conventions/cf-conventions.html#_overview) recommend a set of global attributes to "provide human readable documentation of the file contents":
- title
- history
- institution
- source
- references
- comment

Add each of the above to your `Dataset` instance. Here are some suggested values (but you can say whatever you like):
- title: Air Temperature forecasts for `<gridID>`
- history: File created on: `<YYYY-MM-DD>`
- institution: NCAS-ISC
- source: NOAA Weather API Service
- references: https://www.weather.gov/documentation/services-web-api
- comment: The ISC course is teaching me about Python and NetCDF!

You can add any other global attributes that you wish to.
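
For the `history` value, the date can be generated rather than typed by hand. A small sketch, assuming a UTC date is acceptable; `history_text` is an illustrative name.

```python
from datetime import datetime, timezone

# Today's date in UTC, formatted as YYYY-MM-DD.
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")

history_text = f"File created on: {today}"
print(history_text)
```
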

In [None]:
dataset.title = ...
dataset.history = ...
dataset.institution = ...
dataset.source = ...
dataset.references = ...
dataset.comment = ...

#### Finally, close the `Dataset` to save the file

Save your NetCDF file by closing the dataset.

In [None]:
dataset.close()

We can check it is there using `os.path.isfile(...)`:

In [None]:
os.path.isfile(outfile)

### IMPORTANT: Move the file to your MY_DATA_DIR so it gets copied to the GROUP_DATA_DIR

Since we cannot write directly to the `GROUP_DATA_DIR`, move the file from your `HOME_DIR` to your `MY_DATA_DIR`.
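
The move can be rehearsed safely in temporary directories. This sketch uses `shutil.move` as an alternative to `os.rename`: `os.rename` can fail when the source and destination are on different filesystems, whereas `shutil.move` falls back to copying. All paths and the file name here are stand-ins.

```python
import os
import shutil
import tempfile

# Stand-ins for HOME_DIR and MY_DATA_DIR, created under a temporary directory.
demo_home = tempfile.mkdtemp()
demo_data_dir = os.path.join(demo_home, "weather-api-outputs")
os.makedirs(demo_data_dir, exist_ok=True)  # no error if it already exists

# Create an empty placeholder file to move.
demo_name = "ABC-someuser-temps.nc"
source = os.path.join(demo_home, demo_name)
with open(source, "wb"):
    pass

# shutil.move returns the path of the moved file.
destination = shutil.move(source, os.path.join(demo_data_dir, demo_name))

print(os.path.isfile(source))       # False: the file has left its old location
print(os.path.isfile(destination))  # True: the file is in the data directory
```
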

In [None]:
os.rename(outfile, f"{MY_DATA_DIR}/{filename}")

## 3. Find all the NetCDF files written during this exercise

To find all the `.nc` files in a group workspace, we will use the `glob` module in Python.
Glob lets us find all files matching a pattern, in our case:

`{GROUP_DATA_DIR}/*.nc`
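
Pattern matching with `glob` can be tried on a throwaway directory first. The sketch below creates three empty files with invented names in a temporary directory and matches only those ending in `.nc`.

```python
import os
import tempfile
from glob import glob

# Temporary directory standing in for the group workspace.
demo_dir = tempfile.mkdtemp()
for name in ["ABC-user1-temps.nc", "XYZ-user2-temps.nc", "notes.txt"]:
    with open(os.path.join(demo_dir, name), "w"):
        pass

# '*' matches any run of characters within a file name.
# glob returns paths in arbitrary order, so sort for a stable result.
matches = sorted(glob(os.path.join(demo_dir, "*.nc")))
names = [os.path.basename(p) for p in matches]

print(names)
```
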

In [None]:
from glob import glob

Can you use glob to make a list of file paths of all NetCDF files in the
group workspace?

In [None]:
filepaths = glob(f"{...}*temps.nc")

## 4. Create a time-series graph of all the forecasts

Now that we have a list of NetCDF file paths, we can open them and extract their data.

To start, let us make a plot using matplotlib.

In [None]:
from netCDF4 import num2date
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
%matplotlib inline

Create a figure and axis using matplotlib's `subplots` function.

In [None]:
fig, ax = ...

Can you set the x-axis locators (ticks) using the `dates` module from matplotlib?
- set the major locator to days.
- set the minor locator to every 6 hours.
- set the x-axis formatter to Day-Month for each day.
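
A sketch of date-based ticks on invented data, for reference. It uses `ax.plot`, which also accepts `datetime` values, rather than `plot_date`. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display; leave it out in a notebook.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Invented data: one value every 3 hours over two days.
start = datetime(2024, 1, 15)
times = [start + timedelta(hours=3 * i) for i in range(17)]
values = [275.0 + (i % 8) for i in range(17)]

fig, ax = plt.subplots()

# "-" draws a solid line with no markers.
ax.plot(times, values, "-", label="invented series")

# Major ticks at each day, minor ticks every 6 hours.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_minor_locator(mdates.HourLocator(interval=6))

# Label each major tick as day-month.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m"))

ax.legend()
fig.tight_layout()
```
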

In [None]:
# In the matplotlib.dates module, as mdates, look at the DayLocator and HourLocator.
fmt_day = ...
fmt_six_hours = ...

ax.xaxis.set_major_locator(fmt_day)
ax.xaxis.set_minor_locator(fmt_six_hours)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m'))

Label the axis, `ax`, on the plot:
- label the x-axis as `Date`
- label the y-axis as `Air Temperature / K`
- set a title on your plot

In [None]:
...

Open each NetCDF file and extract the `temp`, `time`, `lat` and `lon` variables from the file. Then use the matplotlib `plot_date` function to plot the graph.

- set the label of the plot to the `<lat>, <lon>` values named in the `coordinates` attribute of the `temp` variable.

Replace the ellipses with your plotting code. The `for` loop works through all the shared NetCDF files in the workspace, where `f` is a file path and `filepaths` is the list of data files.

If you need help, look at the 'Plotting data with matplotlib' slide in the [`logging data from serial ports`](https://github.com/ncasuk/ncas-isc/blob/master/python/presentations/logging-data-from-serial-ports/LDFSP_Slides.pdf) presentation.

Plot a line graph using matplotlib:

- you will need to set the format string to `-` (a solid line), otherwise you will get a scatter graph.
- set the label of the plot to a string: `<lat>, <lon>`.

In [None]:
for f in filepaths:
    ...

Finally, show the plot with a legend (you might want to enable tight layout)
and save the plot to your `MY_DATA_DIR` directory.

In [None]:
...

### Save the graph to a PNG file

In [None]:
fig.savefig(f"{MY_DATA_DIR}/{gridID}-{USER}-temps.png")