## Extracting Monthly Temperature Data from Array

### Author: Ryan Gan
### Date: 2018-05-28

Extracting gridded monthly temperature values for the North American region. This is my first attempt at doing this in Python.

In [1]:
# import dataset from netCDF as nc_open; array storage system
import netCDF4 as nc
from netCDF4 import Dataset as nc_open
# import numpy as np; for working with array data
import numpy as np
# import pandas as pd; working with data.frames
import pandas as pd
# Matplotlib for additional customization
from matplotlib import pyplot as plt 
%matplotlib inline
#import mpl_toolkits # i'd like basemap but doesn't seem to be available for py3
# Seaborn for plotting and styling
import seaborn as sns

Open connection to monthly temperature netCDF file.

In [2]:
temp_nc = nc_open("./data/air.mon.mean.nc")

IOError: No such file or directory
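
The error above means the relative path did not resolve from the notebook's working directory. A minimal sketch of a check that could run before opening the file, assuming only the standard library; `resolve_data_file` is a hypothetical helper, not part of netCDF4:

```python
from pathlib import Path

def resolve_data_file(path_str):
    """Hypothetical helper: return the absolute path if the file exists,
    otherwise raise an error that also reports the working directory."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; current working directory is {Path.cwd()}"
        )
    return path

# Example use (commented out because it raises when the file is missing):
# temp_nc = nc_open(str(resolve_data_file("./data/air.mon.mean.nc")))
```

Reporting the working directory in the message makes it easier to see whether the notebook was started from a different folder than expected.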

Print out summary of temperature NetCDF file. It looks like x/lon = 144, y/lat = 73, and 843 months.

In [None]:
print(temp_nc)
# print out details of each variable
for v in temp_nc.variables:
    print(temp_nc.variables[v])

### Extracting Grid Locations of Temperature Grid

I'm going to see if I can extract the grid locations and plot them.

In [None]:
# extract latitude
lat = temp_nc.variables['lat'][:]
# extract longitude
lon = temp_nc.variables['lon'][:]
# extract time
time = temp_nc.variables['time'][:]
# first time step of temperature values (Python indexing starts at 0)
temp = temp_nc.variables['air'][0,:,:]
# check dimension of shape
np.shape(temp)

Print out the min, mean, and max of the temperature grid.

In [None]:
print(np.min(temp), np.mean(temp), np.max(temp))

Using Seaborn to draw a heatmap of the first temperature matrix. The outline of the continents is visible in the temperature values. You can see the Rocky Mountains and the Andes running down the left (western) side of the Americas. The poles are much colder, and the Australian outback looks hot too.

In [None]:
sns.heatmap(temp)

I want to subset the array to the spatial extent of the continental United States. The longitude bounds should be -124.848974 to -66.885444. These are negative (degrees west) values, so I will need to add 360 degrees to them to match the degrees-east (0 to 360) convention of the nc file. The latitude bounds should be 24.396308 to 49.384358.

In [None]:
# lat bounds
latbounds = [24.4, 49.4]
# lon bounds converted from degrees west (negative) to degrees east (0 to 360)
lonbounds = [-124.8 + 360, -66.9 + 360]

# latitude indices; latitude appears to be stored north to south,
# so the southern bound gets the larger (upper) index
lat_ui = np.argmin(np.abs(lat - latbounds[0]))
lat_li = np.argmin(np.abs(lat - latbounds[1]))

# longitude lower and upper bounds
lon_li = np.argmin(np.abs(lon - lonbounds[0]))
lon_ui = np.argmin(np.abs(lon - lonbounds[1]))
# print index
print(lat_li, lat_ui, lon_li, lon_ui)
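
The nearest-index lookup and the longitude conversion can be checked on a synthetic grid. This is a sketch under assumptions: a regular 2.5-degree grid with latitude running from 90 to -90 and longitude from 0 to 357.5, which is consistent with the 73 by 144 dimensions noted earlier but is not read from the file. `nearest_index` and `west_to_east` are hypothetical helper names.

```python
import numpy as np

# Assumed grid (not read from the file): 2.5-degree spacing,
# latitude stored north to south, longitude in degrees east (0 to 360).
lat = np.arange(90, -92.5, -2.5)   # 73 values
lon = np.arange(0, 360, 2.5)       # 144 values

def nearest_index(axis_values, target):
    """Hypothetical helper: index of the grid value closest to target."""
    return int(np.argmin(np.abs(axis_values - target)))

def west_to_east(lon_deg):
    """Convert a longitude in -180..180 to the 0..360 degrees-east convention."""
    return lon_deg % 360

lat_li = nearest_index(lat, 49.4)   # northern bound, smaller index
lat_ui = nearest_index(lat, 24.4)   # southern bound, larger index
lon_li = nearest_index(lon, west_to_east(-124.8))
lon_ui = nearest_index(lon, west_to_east(-66.9))

print(lat_li, lat_ui, lon_li, lon_ui)
# number of grid cells selected by the half-open slices
print(lat[lat_li:lat_ui].size * lon[lon_li:lon_ui].size)
```

Because a Python slice excludes its upper index, the assumed grid yields 10 latitude rows by 23 longitude columns, or 230 cells, which matches the 230 used in the reshape steps further down.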

Subset latitude and longitude grid points.

In [None]:
# subset us lon
lon_us = lon[lon_li:lon_ui]-360
# subset us lat
lat_us = lat[lat_li:lat_ui]
# print coordinate ranges of the subset
print(np.min(lon_us), np.max(lon_us), np.min(lat_us), np.max(lat_us))

Plot the grid points over the US.

In [None]:
# extract lonlat grid
lons, lats = np.meshgrid(lon_us, lat_us)
# plot
plt.plot(lons, lats, marker='.', color='k', linestyle='none')
plt.show()

In [None]:
# subset first time step of temp to us
temp_us = temp_nc.variables['air'][0, lat_li:lat_ui, lon_li:lon_ui]
np.shape(temp_us)

Plot heatmap of US temperature to make sure subset looks right.

In [None]:
sns.heatmap(temp_us)

Attempt to flatten the lon-lat grid into a two-column array with one row per grid cell.

In [None]:
# stack the lon and lat meshes, then flatten to 230 rows of (lon, lat) pairs
us_grid = np.array(np.meshgrid(lon_us, lat_us)).reshape(2, 230).T
# check dimensions
np.shape(us_grid)

In [None]:
# extract temp grid as a 230 by 843 matrix (grid cells by months)
us_temp = np.array(temp_nc.variables['air']
    [:, lat_li:lat_ui, lon_li:lon_ui]).reshape(843,230).T #.T is for transpose
np.shape(us_temp)
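
The coordinate array and the temperature matrix are flattened separately, so row k of one must describe the same grid cell as row k of the other. A small sketch that checks this on toy data, assuming made-up sizes and values in place of the 843 by 10 by 23 subset:

```python
import numpy as np

# Toy dimensions standing in for (months, lats, lons); values are made up
n_time, n_lat, n_lon = 3, 2, 4
lat_toy = np.array([50.0, 47.5])
lon_toy = np.array([-125.0, -122.5, -120.0, -117.5])
temp_toy = np.arange(n_time * n_lat * n_lon, dtype=float).reshape(n_time, n_lat, n_lon)

# Same operations as the notebook cells, with toy sizes
grid = np.array(np.meshgrid(lon_toy, lat_toy)).reshape(2, n_lat * n_lon).T
cells = temp_toy.reshape(n_time, n_lat * n_lon).T

# Row k of both arrays should describe the cell at (lat index i, lon index j)
for i in range(n_lat):
    for j in range(n_lon):
        k = i * n_lon + j
        assert grid[k, 0] == lon_toy[j] and grid[k, 1] == lat_toy[i]
        assert np.array_equal(cells[k], temp_toy[:, i, j])

print(grid.shape, cells.shape)
```

Both flattenings use row-major order with longitude varying fastest, which is why the rows line up without any extra sorting.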

Creating a data frame of US temperature and coordinates using pandas. I'm going to create a sequential numeric vector to serve as the grid cell id.

In [None]:
# create grid id 1 to 230 and save as type string
grid_id = pd.DataFrame(data = np.arange(1, 231, 1).T).astype('str')
grid_id.columns = ['grid_id'] 
# head and tail
print(grid_id.head(), grid_id.tail())

Binding/concatenating grid id variable with lon and lat coordinates.

In [None]:
# create grid dataframe from coordinate array
grid_df = pd.DataFrame(data = us_grid)
# name columns
grid_df.columns = ['lon', 'lat']
# concat dataframes
grid_df = pd.concat([grid_id, grid_df], axis=1)
# view first rows
grid_df.head()
# write grid coords
#grid_df.to_csv('./data/temp_grid.csv')

Assigning year and month as column header for temperature values.

In [None]:
# convert numeric time values from the nc file to dates; define units
date = nc.num2date(time, 'hours since 1800-01-01 00:00:0.0')
# time series of dates, indexed by date
ts = pd.Series(date, index = date)
# view first couple observations
ts.head()
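
As a cross-check on the time units, the same conversion can be sketched with pandas alone. Depending on the installed version, `num2date` may return cftime objects rather than standard datetimes, so a pandas-only version can be a useful comparison. The hour offsets below are illustrative values, not read from the file.

```python
import numpy as np
import pandas as pd

# Illustrative hour offsets (assumed, not read from the file)
hours = np.array([1297320.0, 1298064.0])

# Interpret each value as hours elapsed since 1800-01-01 00:00
dates = pd.Timestamp("1800-01-01") + pd.to_timedelta(hours, unit="h")
print(dates)
```

The result is a `DatetimeIndex`, so attributes such as `.year` and `.month` are available directly.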

Convert temperature array to dataframe.

In [None]:
# array to dataframe
temp_df = pd.DataFrame(data = us_temp)
# dim of temp dataframe
print(temp_df.shape)
# label each column with the date of its month
temp_df.columns = ts.dt.date
temp_df.head()

Bind temperature values to the coordinates and grid id, then melt/gather the date columns into rows.

In [None]:
# concat grid id and temp
temp_wide_df = pd.concat([grid_df, temp_df], axis = 1)
temp_wide_df.head()

In [None]:
# wide to long
temp_long_df = pd.melt(temp_wide_df, id_vars = ['grid_id', 'lon', 'lat'],
                      var_name = "date", value_name = "temp_c")
# view head of final row-wise dataset
temp_long_df.head()
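
To make the wide-to-long step concrete, here is a toy version of the same `pd.melt` call. The two grid cells, their coordinates, and the temperatures are made up for illustration.

```python
import pandas as pd

# Toy wide table: one row per grid cell, one column per month (made-up values)
wide = pd.DataFrame({
    "grid_id": ["1", "2"],
    "lon": [-125.0, -122.5],
    "lat": [50.0, 50.0],
    "1948-01-01": [-5.0, -4.0],
    "1948-02-01": [-3.0, -2.0],
})

# Each date column becomes a block of rows, one row per grid cell
long_df = pd.melt(wide, id_vars=["grid_id", "lon", "lat"],
                  var_name="date", value_name="temp_c")
print(long_df)
```

The long table has one row per grid cell per month, so 2 cells and 2 months give 4 rows; by the same logic the full dataset would have 230 times 843 rows.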

In [None]:
# describe dataframe
temp_long_df.describe()

In [None]:
# print min and max date
print(temp_long_df['date'].min(), temp_long_df['date'].max())