# Geospatial Data Science - EEPS 440/460

# Lecture 4

# Visualizing Data

![image.png](https://matplotlib.org/_static/logo2.png)

The following Matplotlib lecture is adapted from the Scipy lecture notes (https://scipy-lectures.org/) and the official Matplotlib tutorials (https://matplotlib.org/tutorials). For more information, I encourage you to check out these resources.

Matplotlib is the primary tool for exploratory and publication-quality graphs in Python. Becoming an expert in NumPy and Matplotlib will take you far.

## Getting started

When running Matplotlib in a Jupyter notebook, we use the `%matplotlib inline` magic command to tell Matplotlib to display the figures we create directly inside the notebook.

In [1]:
%matplotlib inline

Within `matplotlib`, `pyplot` is the module most commonly used to make graphs. Its interface is modeled on MATLAB's plotting commands.

In [2]:
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')
import numpy as np

## Plotting with default settings

In [None]:
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y1 = np.cos(x)
y2 = np.sin(x)
plt.plot(x,y1)
plt.plot(x,y2)
plt.show()

## Changing colors, line widths, and line styles

In [None]:
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="green",  linewidth=5, linestyle="--")
plt.show()

## Setting limits

In [None]:
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="red",  linewidth=2.5, linestyle="--")
plt.xlim([-10,10])
plt.ylim([-2,2])
plt.show()

## Setting ticks

In [None]:
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="red",  linewidth=2.5, linestyle="--")
plt.xlim([-5,5]);plt.ylim([-2,2])
plt.xticks([-np.pi,0,np.pi])
plt.yticks([-1,-0.5,0,0.5,1])
plt.show()

## Setting tick labels

In [None]:
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="red",  linewidth=2.5, linestyle="--")
plt.xlim([-5,5]);plt.ylim([-1.5,1.5])
plt.xticks([-np.pi,0,np.pi],[r'$-\pi$','0',r'$\pi$'],fontsize=20) #mathtext (LaTeX-like) labels inside pyplot
plt.yticks([-1,-0.5,0,0.5,1],fontsize=15)
plt.show()

## Moving spines

In [None]:
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="red",  linewidth=2.5, linestyle="--")
plt.xlim([-5,5]);plt.ylim([-1.5,1.5])
plt.xticks([-np.pi,0,np.pi],[r'$-\pi$','0',r'$\pi$'],fontsize=20) #mathtext (LaTeX-like) labels inside pyplot
plt.yticks([-1,-0.5,0,0.5,1],fontsize=15)
ax = plt.gca()  # gca stands for 'get current axes'
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.spines['bottom'].set_position(('data',0))
ax.spines['left'].set_position(('data',0))
plt.show()

## Adding a legend

In [None]:
plt.xlim([-5,5]);plt.ylim([-1.5,1.5])
plt.xticks([-np.pi,0,np.pi],[r'$-\pi$','0',r'$\pi$'],fontsize=20) #mathtext (LaTeX-like) labels inside pyplot
plt.yticks([-1,-0.5,0,0.5,1],fontsize=15)
ax = plt.gca()  # gca stands for 'get current axes'
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.spines['bottom'].set_position(('data',0))
ax.spines['left'].set_position(('data',0))
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="red",  linewidth=2.5, linestyle="--")
plt.legend([r'$\cos(x)$',r'$\sin(x)$'],fontsize=10)
plt.show()
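Passing `label=` to each plotting call and then calling `legend()` with no list keeps each legend entry attached to its own line, so the legend cannot get out of order if the plotting calls are rearranged. A minimal sketch of that alternative (the Agg backend line only lets it run as a plain script):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
fig, ax = plt.subplots()
# Attach the legend text to each line at the moment it is drawn
ax.plot(x, np.cos(x), color="blue", linewidth=2.5, linestyle="-", label=r"$\cos(x)$")
ax.plot(x, np.sin(x), color="red", linewidth=2.5, linestyle="--", label=r"$\sin(x)$")
leg = ax.legend(fontsize=15)  # collects the labels automatically, in drawing order
labels = [t.get_text() for t in leg.get_texts()]
print(labels)
```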

## Saving plot

In [None]:
plt.plot(x, y1, color="blue", linewidth=2.5, linestyle="-")
plt.plot(x, y2, color="red",  linewidth=2.5, linestyle="--")
plt.xlim([-5,5]);plt.ylim([-1.5,1.5])
plt.xticks([-np.pi,0,np.pi],[r'$-\pi$','0',r'$\pi$'],fontsize=20) #mathtext (LaTeX-like) labels inside pyplot
plt.yticks([-1,-0.5,0,0.5,1],fontsize=15)
ax = plt.gca()  # gca stands for 'get current axes'
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.spines['bottom'].set_position(('data',0))
ax.spines['left'].set_position(('data',0))
plt.legend([r'$\cos(x)$',r'$\sin(x)$'],fontsize=15)
plt.savefig('../Workspace/test.png',transparent=True)
plt.clf()
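`savefig` raises an error if the target folder does not exist, so it can help to create it first; `dpi` sets the raster resolution and `bbox_inches='tight'` trims the empty margin around the plot. A sketch that writes to a hypothetical `eeps_figures` folder in the system temp directory (swap in your own path):

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.cos(x), label=r"$\cos(x)$")
ax.legend()

# Hypothetical output folder; create it so savefig does not fail
out_dir = os.path.join(tempfile.gettempdir(), "eeps_figures")
os.makedirs(out_dir, exist_ok=True)
out_path = os.path.join(out_dir, "test.png")
# 300 dpi raster, whitespace trimmed, transparent background
fig.savefig(out_path, dpi=300, bbox_inches="tight", transparent=True)
plt.close(fig)  # release the figure's memory
```

Changing the extension to `.pdf` or `.svg` writes a vector file instead of a PNG.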

## Figures, Subplots, Axes and Ticks

* Up to this point, we have let pyplot create a default figure and axes for us. This is a fast way to explore data, but it becomes limiting when we want to make presentation- or publication-quality figures.

* Now we will explore how to control the figure, subplots, and axes directly.

* Subplots position the plots in a regular grid, while axes allow placement anywhere within the figure. Subplots are safer, but axes give you more freedom (and the responsibility that comes with it...).
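One common way to control the figure and axes directly is Matplotlib's object-oriented interface: `plt.subplots()` returns a `Figure` and an `Axes`, and you call methods on those objects instead of on `plt`. A sketch of the sine/cosine plot from above written that way; most `plt.xxx` calls have an `ax.set_xxx` counterpart:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(8, 4))           # explicit Figure and Axes objects
ax.plot(x, np.cos(x), color="blue", linewidth=2.5)
ax.plot(x, np.sin(x), color="red", linewidth=2.5, linestyle="--")
ax.set_xlim(-5, 5)                               # plt.xlim   -> ax.set_xlim
ax.set_ylim(-1.5, 1.5)                           # plt.ylim   -> ax.set_ylim
ax.set_xticks([-np.pi, 0, np.pi])                # plt.xticks -> ax.set_xticks
ax.set_xticklabels([r"$-\pi$", "0", r"$\pi$"])   #            + ax.set_xticklabels
ax.set_title("Figure and Axes handled explicitly")
```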

## Figures (there are many more args...)

In [None]:
fig = plt.figure(figsize=(10,2)) #Note that there are many arguments that you can pass here
plt.plot([3,4,5,])
plt.show()

In [None]:
fig = plt.figure(figsize=(3,3)) #Note that there are many arguments that you can pass here
plt.plot([3,4,5,])
plt.show()

## Subplots

With subplots you can arrange plots in a rectangular grid. `plt.subplot(nrows, ncols, index)` takes the number of rows and columns in the grid and the position of the current panel, counted from 1, left to right and top to bottom.

In [None]:
plt.figure(figsize=(8,8))
plt.subplot(2,2,1)
plt.plot(np.random.randn(10))
plt.subplot(2,2,2)
plt.plot(np.random.randn(10))
plt.subplot(2,2,3)
plt.plot(np.random.randn(10))
plt.subplot(2,2,4)
plt.plot(np.random.randn(10))
plt.show()
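`plt.subplots` (with an "s") creates the whole grid in one call and returns the axes as a NumPy array, which avoids counting panel indices by hand. A sketch of the same 2 by 2 layout; the optional `sharex`/`sharey` arguments link the axis limits across panels:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# axes is a 2-D array of Axes objects with shape (rows, columns)
fig, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True, sharey=True)
for ax in axes.flat:                      # visit the panels row by row
    ax.plot(rng.standard_normal(10))
axes[0, 1].set_title("row 0, column 1")   # address one panel by [row, col]
fig.tight_layout()
```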

## Advanced subplots: Gridspec

The `gridspec` module within Matplotlib gives you much more control over the layout, including panels that span several rows or columns.

In [14]:
import matplotlib.gridspec as gridspec

In [None]:
#Create the figure
plt.figure(figsize=(10,5))
#Define the grid spec (2 rows x 3 columns)
G = gridspec.GridSpec(2,3)
#Create the plots for each different section of the figure
plt.subplot(G[:,0])
plt.plot(np.random.randn(10))
plt.subplot(G[0,1:])
plt.plot(np.random.randn(10))
plt.subplot(G[1,1])
plt.plot(np.random.randn(10)) 
plt.subplot(G[1,2])
plt.plot(np.random.randn(10))
#Show the plot
plt.show()
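`GridSpec` also accepts `width_ratios` and `height_ratios` to make some columns or rows larger than others. A sketch reproducing the layout above with a first column twice as wide, using `fig.add_subplot` (the object-oriented counterpart of `plt.subplot`); the ratio values are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np

rng = np.random.default_rng(0)
fig = plt.figure(figsize=(10, 5))
# Same 2 x 3 grid as above, but column 0 is twice as wide as columns 1 and 2
G = gridspec.GridSpec(2, 3, figure=fig, width_ratios=[2, 1, 1])
ax_left = fig.add_subplot(G[:, 0])   # both rows of column 0
ax_top = fig.add_subplot(G[0, 1:])   # columns 1-2 of row 0
ax_bl = fig.add_subplot(G[1, 1])
ax_br = fig.add_subplot(G[1, 2])
for ax in (ax_left, ax_top, ax_bl, ax_br):
    ax.plot(rng.standard_normal(10))
```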

## Axes

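Axes are similar to subplots but can be placed at any location in the figure: `plt.axes` takes a rectangle `[left, bottom, width, height]` given as fractions of the figure width and height.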

**With great power comes great responsibility!**

In [None]:
#Define the first axis
plt.axes([0, 0, 1, 1])
plt.plot(np.linspace(0,1,10)/2)
#Define the second axis
plt.axes([0.07, 0.55, .4, .4])
#Create a histogram from a randomly sampled distribution
np.random.seed(1)
plt.hist(np.random.randn(10000),density=True,bins=100)
plt.show()
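`fig.add_axes` is the object-oriented counterpart of `plt.axes` and takes the same `[left, bottom, width, height]` rectangle, where (0, 0) is the lower-left corner of the figure and (1, 1) the upper-right. A sketch of the main-plot-plus-inset layout with some margin left for tick labels (the numbers are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(6, 4))
# [left, bottom, width, height] in figure fractions (0 to 1)
ax_main = fig.add_axes([0.1, 0.1, 0.85, 0.85])
ax_main.plot(np.linspace(0, 1, 10) / 2)
# Inset in the upper-left part of the main axes
ax_inset = fig.add_axes([0.2, 0.55, 0.35, 0.35])
rng = np.random.default_rng(1)
ax_inset.hist(rng.standard_normal(10000), bins=100, density=True)
```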

Let's see what is possible now. The following examples are adapted from http://scipy-lectures.org/intro/matplotlib/index.html.

## Plot and filled plots

In [None]:
n = 256
X = np.linspace(-np.pi, np.pi, n, endpoint=True)
Y = np.sin(2 * X)
plt.axes([0.025, 0.025, 0.95, 0.95])
plt.plot(X, Y + 1, color='blue', alpha=1.00)
plt.fill_between(X, 1, Y + 1, color='blue', alpha=.25)
plt.plot(X, Y - 1, color='blue', alpha=1.00)
plt.fill_between(X, -1, Y - 1, (Y - 1) > -1, color='blue', alpha=.25)
plt.fill_between(X, -1, Y - 1, (Y - 1) < -1, color='red',  alpha=.25)
plt.xlim(-np.pi, np.pi)
plt.xticks(())
plt.ylim(-2.5, 2.5)
plt.yticks(())   
plt.show()

## Scatter plot

In [None]:
n = 1024
X = np.random.normal(0, 1, n)
Y = np.random.normal(0, 1, n)
T = np.arctan2(Y, X)
plt.axes([0.025, 0.025, 0.95, 0.95])
plt.scatter(X, Y, s=75, c=T, alpha=0.1)
plt.xlim(-1.5, 1.5)
plt.xticks(())
plt.ylim(-1.5, 1.5)
plt.yticks(())
plt.show()

## Bar Plots

In [None]:
n = 12
X = np.arange(n)
Y1 = (1 - X / float(n)) * np.random.uniform(0.5, 1.0, n)
Y2 = (1 - X / float(n)) * np.random.uniform(0.5, 1.0, n)
plt.bar(X, +Y1, facecolor='#9999ff', edgecolor='white')
plt.bar(X, -Y2, facecolor='#ff9999', edgecolor='white')
for xi, yi in zip(X, Y1):
    #Label each bar with its height; bars are centered on xi by default
    plt.text(xi, yi + 0.05, '%.2f' % yi, ha='center', va='bottom')
plt.ylim(-1.25, +1.25)
plt.show()

## Contour plot

In [None]:
def f(x, y):
    return (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 -y ** 2)

n = 256
x = np.linspace(-3, 3, n)
y = np.linspace(-3, 3, n)
X, Y = np.meshgrid(x, y)
plt.contourf(X, Y, f(X, Y), 8, alpha=.75, cmap='jet')
C = plt.contour(X, Y, f(X, Y), 8, colors='black')
plt.clabel(C, inline=True, fontsize=10)
plt.show()

## Imshow: Plotting a 2d array

In [None]:
def f(x, y):
    return (1 - x / 2 + x ** 5 + y ** 3 ) * np.exp(-x ** 2 - y ** 2)

n = 10
x = np.linspace(-3, 3, 4 * n)
y = np.linspace(-3, 3, 3 * n)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.axes([0.025, 0.025, 0.95, 0.95])
plt.imshow(Z, interpolation='nearest', cmap=plt.get_cmap('terrain'), origin='lower')
plt.colorbar(shrink=.92)
plt.xticks(())
plt.yticks(())
plt.show()

## Pie charts

In [None]:
n = 20
Z = np.ones(n)
Z[-1] *= 2
plt.axes([0.025, 0.025, 0.95, 0.95])
plt.pie(Z, explode=Z*.05, colors = ['%f' % (i/float(n)) for i in range(n)])
plt.axis('equal')
plt.xticks(())
plt.yticks(())
plt.show()

## Histogram

In [None]:
np.random.seed(1)
x = np.random.randn(10000)
plt.hist(x, bins=50, density=True, facecolor='r', alpha=0.5)
plt.xlabel('Values',fontsize=15)
plt.ylabel('Probability density',fontsize=15)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.title('Standard normal distribution',fontsize=20)
plt.text(1,0.35, r'$\mu=0,\ \sigma=1$',fontsize=20)
plt.show()
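With `density=True` the bar heights are a probability density rather than a probability: each count is divided by the total number of samples times the bin width, so the bar *areas* sum to one. `plt.hist` computes its bars with `np.histogram`, so a quick check of that normalization is:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.standard_normal(10000)
# Same binning that plt.hist(samples, bins=50, density=True) would use
counts, edges = np.histogram(samples, bins=50, density=True)
widths = np.diff(edges)
area = np.sum(counts * widths)  # integral of the histogram density
print(area)
```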

## Environmental data

Let's now explore some plots with actual environmental data. We will use the ERA-Interim data that we used last lecture.

## Comparing 2015 temperature at different sites

Start by defining the coordinates (latitude and longitude in degrees) of three sites.

In [24]:
sites = {'seattle':{'lat':47.6062,'lon':-122.3321},
         'paris':{'lat':48.8566,'lon':2.3522},
         'santiago':{'lat':-33.4489,'lon':-70.6693}}

Let's now open the ERA-Interim file, read in the coordinate and time metadata, and take a quick look at the first time step of 2 m temperature (`t2m`).

In [None]:
import netCDF4 as nc
file = '../data/era-interim/era_interim_monthly_197901_201512_upscaled.nc'
fp = nc.Dataset(file)
lats = fp['lat'][:]
lons = fp['lon'][:]
times = fp['time']
plt.pcolormesh(fp['t2m'][0,:,:])
plt.show()

Let's plot the lat and lon arrays:

In [None]:
plt.figure(figsize=(10,3))
plt.subplot(121)
plt.title('latitudes',fontsize=15)
plt.plot(lats,lw=3)
plt.subplot(122)
plt.title('longitudes',fontsize=15)
plt.plot(lons,lw=3)
plt.show()

Let's now find the indices (`ilat`, `ilon`) of the grid cell closest to each site and add them to the dictionary.

In [None]:
#Look for the closest grid cell center for each site
for site in sites:
    lat = sites[site]['lat']
    lon = sites[site]['lon']
    #Convert to the 0-360 convention (assumes the file stores longitudes as 0-360)
    if lon < 0:
        lon = 360 + lon
    sites[site]['ilat'] = int(np.argmin(np.abs(lats - lat)))
    sites[site]['ilon'] = int(np.argmin(np.abs(lons - lon)))
print(sites)
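`np.abs(lons - lon)` ignores that 0° and 360° are the same meridian, so a site just west of Greenwich (e.g. 359.9° on a 0-360 grid) would be matched to the cell near 357.5° instead of the one at 0°. A sketch of a wrap-aware lookup on a hypothetical 2.5° grid; `nearest_index` and the `demo_` arrays are illustrative names, not part of the lecture's data:

```python
import numpy as np

# Hypothetical 2.5-degree grid with longitudes stored as 0-360
demo_lats = np.arange(-90, 90.1, 2.5)
demo_lons = np.arange(0, 360, 2.5)

def nearest_index(lat, lon, lats, lons):
    """Return (ilat, ilon) of the grid-cell center closest to (lat, lon)."""
    lon = lon % 360                      # map -180..180 longitudes onto 0..360
    ilat = int(np.argmin(np.abs(lats - lat)))
    dlon = np.abs(lons - lon)
    dlon = np.minimum(dlon, 360 - dlon)  # distance measured the short way around
    ilon = int(np.argmin(dlon))
    return ilat, ilon

print(nearest_index(47.6062, -122.3321, demo_lats, demo_lons))  # Seattle
print(nearest_index(48.8566, 359.9, demo_lats, demo_lons))      # 0.1 deg west of Greenwich
```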

Convert the numeric time axis to an array of `datetime` objects.

In [None]:
import datetime
#Convert the dates to a datetime array
dates = nc.num2date(times[:],units=times.units,calendar=times.calendar,only_use_cftime_datetimes=False)
print(dates)

Now we can extract the data for the variable `t2m`.

In [None]:
var = 't2m'
#Iterate through all the sites
for site in sites:
    #Extract the data for the given site
    ilat = sites[site]['ilat']
    ilon = sites[site]['ilon']
    sites[site][var] = fp[var][:,ilat,ilon]
print(sites)

Let's make a plot only for 2015

In [None]:
#Calculate the Boolean mask for dates within 2015
m = (dates >= datetime.datetime(2015,1,1)) & (dates <= datetime.datetime(2015,12,31))
#Assemble the subsetted dates array
dates_subset = dates[m]

plt.figure(figsize=(10,7))
plt.plot(dates_subset,sites['seattle']['t2m'][m],lw=7)
plt.plot(dates_subset,sites['paris']['t2m'][m],lw=7)
plt.plot(dates_subset,sites['santiago']['t2m'][m],lw=7)
plt.xticks(fontsize=20,rotation=35)
plt.yticks(fontsize=20,rotation=35)
plt.xlabel('date',fontsize=30)
plt.ylabel(r'$^\circ$C',fontsize=30)
plt.grid(True)
plt.legend(['Seattle','Paris','Santiago'],fontsize=25)
plt.show()
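The mask above ends at `datetime.datetime(2015,12,31)`, which is midnight at the *start* of 31 December. That works when monthly values are stamped on the first of the month, but a December value stamped later on the 31st would be dropped. A half-open interval ending at 1 January 2016 avoids the edge case whatever the stamping convention. A sketch on hypothetical end-of-month time stamps (`demo_dates` is illustrative, not the ERA-Interim time axis):

```python
import calendar
import datetime
import numpy as np

# Hypothetical monthly stamps at noon on the last day of each month, 2014-2016
demo_dates = np.array([datetime.datetime(y, mo, calendar.monthrange(y, mo)[1], 12)
                       for y in range(2014, 2017) for mo in range(1, 13)])

start = datetime.datetime(2015, 1, 1)
# Closed interval ending at midnight on 31 Dec misses the December stamp
m_closed = (demo_dates >= start) & (demo_dates <= datetime.datetime(2015, 12, 31))
# Half-open interval [2015-01-01, 2016-01-01) keeps all twelve 2015 months
m_open = (demo_dates >= start) & (demo_dates < datetime.datetime(2016, 1, 1))
print(m_closed.sum(), m_open.sum())
```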

Let's create some scatter plots to compare the three sites

In [None]:
fig = plt.figure(figsize=(10,10))
i = 0
for site1 in sites:
    for site2 in sites:
        i = i + 1
        plt.subplot(3,3,i)
        plt.plot(sites[site1]['t2m'],sites[site2]['t2m'],'bo',alpha=0.2)
        plt.plot(sites[site1]['t2m'],sites[site1]['t2m'],'r',lw=3)
        plt.xticks(fontsize=15)
        plt.yticks(fontsize=15)
        plt.title('%s (x) vs %s (y)' % (site1,site2),fontsize=15)
        if i in [7,8,9]:plt.xlabel(r'$^\circ$C',fontsize=20)
        if i in [1,4,7]:plt.ylabel(r'$^\circ$C',fontsize=20)
fig.tight_layout()
plt.show()
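The counter `i` and the hard-coded index lists (`[7,8,9]`, `[1,4,7]`) only work for exactly three sites. A sketch that lets `plt.subplots` and `itertools.product` do the bookkeeping for any number of sites; it uses made-up sinusoidal series (`demo_sites`, hypothetical values) in place of the ERA-Interim data, and draws the 1:1 line from two end points instead of through every data point:

```python
import itertools
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly temperatures (deg C): amplitude, phase, mean per site
rng = np.random.default_rng(0)
months = np.arange(120)
params = {'seattle': (6, np.pi, 11), 'paris': (8, np.pi, 12), 'santiago': (6, 0.0, 15)}
demo_sites = {}
for name, (amp, phase, mean) in params.items():
    noise = rng.normal(0, 1, months.size)
    demo_sites[name] = {'t2m': mean + amp * np.cos(2 * np.pi * months / 12 + phase) + noise}

names = list(demo_sites)
fig, axes = plt.subplots(len(names), len(names), figsize=(10, 10))
# product() yields (row site, column site) pairs in the same order as axes.flat
for ax, (site1, site2) in zip(axes.flat, itertools.product(names, names)):
    t1, t2 = demo_sites[site1]['t2m'], demo_sites[site2]['t2m']
    ax.plot(t1, t2, 'bo', alpha=0.2)
    lims = [min(t1.min(), t2.min()), max(t1.max(), t2.max())]
    ax.plot(lims, lims, 'r', lw=3)   # 1:1 reference line
    ax.set_title('%s (x) vs %s (y)' % (site1, site2), fontsize=12)
for ax in axes[-1, :]:
    ax.set_xlabel(r'$^\circ$C')      # bottom row only
for ax in axes[:, 0]:
    ax.set_ylabel(r'$^\circ$C')      # left column only
fig.tight_layout()
```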

Hexagonal bins (`plt.hexbin`) show the same comparison as a 2-D density, which is easier to read than overlapping markers when there are many points.

In [None]:
fig = plt.figure(figsize=(10,10))
i = 0
for site1 in sites:
    for site2 in sites:
        i = i + 1
        plt.subplot(3,3,i)
        plt.hexbin(sites[site1]['t2m'],sites[site2]['t2m'],gridsize=10,cmap=plt.get_cmap('binary'))
        plt.plot(sites[site1]['t2m'],sites[site1]['t2m'],'r',lw=3)
        plt.xticks(fontsize=15)
        plt.yticks(fontsize=15)
        plt.title('%s (x) vs %s (y)' % (site1,site2),fontsize=15)
        if i in [7,8,9]:plt.xlabel(r'$^\circ$C',fontsize=20)
        if i in [1,4,7]:plt.ylabel(r'$^\circ$C',fontsize=20)
fig.tight_layout()
plt.show()

# Let's look at making maps now

We will start with the example from last lecture.

In [33]:
data = fp['t2m'][:]

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(np.flipud(data[-1,:,:]),cmap=plt.get_cmap('RdBu_r'))
plt.axis('off')
plt.title('ERA interim (%02d/%04d)' % (dates[-1].month,dates[-1].year),fontsize=20)
cb = plt.colorbar(orientation='horizontal',shrink=0.8,pad=0.05)
cb.ax.tick_params(labelsize=15)
cb.set_label(r'Monthly temperature ($^\circ$C)',fontsize=20)
plt.show()
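`np.flipud` plus `plt.axis('off')` works, but the axes then carry no geographic information. Passing `origin='lower'` and an `extent` to `imshow` draws the array in longitude/latitude coordinates instead (`extent` gives the outer pixel edges, hence the half-cell padding). A sketch on a hypothetical 2.5° grid with a made-up field; it assumes latitudes ascend from south to north, as the use of `flipud` above implies:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical grid and temperature-like field (deg C)
demo_lats = np.arange(-90, 90.1, 2.5)
demo_lons = np.arange(0, 360, 2.5)
LON, LAT = np.meshgrid(demo_lons, demo_lats)
field = 30 * np.cos(np.deg2rad(LAT)) - 10

dlon = demo_lons[1] - demo_lons[0]
dlat = demo_lats[1] - demo_lats[0]
fig, ax = plt.subplots(figsize=(10, 5))
# origin='lower' puts row 0 (southernmost) at the bottom, so no flipud is needed
im = ax.imshow(field, origin='lower', cmap='RdBu_r',
               extent=[demo_lons[0] - dlon / 2, demo_lons[-1] + dlon / 2,
                       demo_lats[0] - dlat / 2, demo_lats[-1] + dlat / 2])
ax.set_xlabel('Longitude (degrees east)')
ax.set_ylabel('Latitude (degrees north)')
cb = fig.colorbar(im, ax=ax, orientation='horizontal', shrink=0.8, pad=0.15)
cb.set_label(r'Temperature ($^\circ$C)')
```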

Let's now map the monthly climatology: for each calendar month, the average over all years from 1979 to 2015.

In [None]:
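fig = plt.figure(figsize=(10,10))
#Compute the climatology for every month first so all panels can share one color scale
clim = []
for month in range(1,13):
    #Extract the data for a given month from all years (the series starts in January)
    tmp = data[month-1::12,:,:]
    #Calculate the temporal mean of that data
    clim.append(np.mean(tmp,axis=0))
vmin = min(c.min() for c in clim)
vmax = max(c.max() for c in clim)
for month in range(1,13):
    plt.subplot(4,3,month)
    im = plt.imshow(np.flipud(clim[month-1]),cmap=plt.get_cmap('RdBu_r'),vmin=vmin,vmax=vmax)
    plt.axis('off')
    plt.title('Month: %d' % month,fontsize=20)
#Place a single colorbar; it is valid for every panel because they share vmin/vmax
cb_ax = fig.add_axes([0.92, 0.2, 0.03, 0.6])
cb = fig.colorbar(im, cax=cb_ax)
cb.ax.tick_params(labelsize=15)
plt.show()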
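Because the series starts in January and covers whole years (1979-2015 is 37 × 12 = 444 months, assuming no gaps), the strided slice `data[month-1::12]` picks out one calendar month. The same monthly climatology can be computed in one step by reshaping the time axis into (year, month). A sketch on random data with that time length; `demo_data` and its small grid are hypothetical:

```python
import numpy as np

# Hypothetical monthly series: 37 years x 12 months on a 4 x 5 grid, starting in January
rng = np.random.default_rng(0)
nyears, nlat, nlon = 37, 4, 5
demo_data = rng.normal(size=(nyears * 12, nlat, nlon))

# Strided slicing, as in the loop above
clim_slices = np.stack([demo_data[month - 1::12].mean(axis=0) for month in range(1, 13)])
# One reshape: (year, month, lat, lon), then average over the year axis
clim_reshape = demo_data.reshape(nyears, 12, nlat, nlon).mean(axis=0)
print(clim_reshape.shape)  # (12, 4, 5)
```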

We could also draw filled contours instead of showing every pixel value.

In [None]:
plt.figure(figsize=(10,10))
tmp = np.mean(data,axis=0)
plt.contourf(tmp,levels=100,cmap=plt.get_cmap('RdBu_r'))
plt.axis('off')
plt.title('Annual average',fontsize=20)
cb = plt.colorbar(orientation='horizontal',shrink=0.8,pad=0.05)
cb.ax.tick_params(labelsize=15)
cb.set_label(r'Mean temperature ($^\circ$C)',fontsize=20)
plt.show()

## Colormaps

We can quickly change the colormap that we use when we create a plot. The following colormaps come from [here](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html). If you are feeling adventurous, you can create your own.

![image](https://matplotlib.org/3.1.0/_images/sphx_glr_colormaps_002.png)
![image](https://matplotlib.org/3.1.0/_images/sphx_glr_colormaps_003.png)
![image](https://matplotlib.org/3.1.0/_images/sphx_glr_colormaps_004.png)
![image](https://matplotlib.org/3.1.0/_images/sphx_glr_colormaps_001.png)
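One way to create your own colormap is `LinearSegmentedColormap.from_list`, which interpolates between a list of colors. A sketch of a hypothetical blue-white-red map; a colormap object maps values in [0, 1] to RGBA tuples, and `.reversed()` gives the flipped version (built-in maps expose the same thing through the `_r` suffix):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; omit in Jupyter
from matplotlib.colors import LinearSegmentedColormap
import numpy as np

# Hypothetical diverging colormap: blue -> white -> red, 256 levels
cmap = LinearSegmentedColormap.from_list('blue_white_red', ['blue', 'white', 'red'], N=256)
print(cmap(0.0), cmap(1.0))   # RGBA at the two ends
cmap_r = cmap.reversed()      # red -> white -> blue
```

The object can be passed directly to plotting functions, e.g. `plt.pcolormesh(tmp, cmap=cmap)`.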

Let's explore some colormap examples

In [None]:
#Adding _r at the end of a colormap name reverses it
cmaps = ['RdBu_r','RdPu_r','Greens','binary_r','terrain','viridis']
#The long-term mean field is the same for every panel, so compute it once
tmp = np.mean(data,axis=0)
for cmap in cmaps:
    plt.figure(figsize=(10,10))
    #pcolormesh is slower than imshow but gives you more control
    #We will prefer pcolormesh when we start projecting the data
    plt.pcolormesh(tmp,cmap=plt.get_cmap(cmap))
    plt.axis('off')
    plt.title(cmap,fontsize=25)
    cb = plt.colorbar(orientation='horizontal',shrink=0.8,pad=0.05)
    cb.ax.tick_params(labelsize=15)
    cb.set_label(r'Mean temperature ($^\circ$C)',fontsize=20)
    plt.show()

* Think of what you want to show and then figure out how to do it. 
* Google, tutorials, and these Jupyter Notebooks are your friends.
* The only way to really learn this is to make plot after plot after plot...
* You will find all things matplotlib [here](https://matplotlib.org).