# Lesson 2
## Gridded Data

### The tools of the trade

Two of the most common tools for working with numerical data in grids (or arrays) are NumPy and Pandas. Libraries built on top of NumPy, such as SciPy, add functionality that is helpful in the weather domain.

### NumPy 

If you're a visual learner, there is a wonderful tutorial here: [NumPy Illustrated](https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d). 1-D arrays are simple enough to understand, and even 2-D arrays make a lot of sense to anyone who has spent a decent amount of time dealing with images or maps of some kind. Weather data can make things more complex, however, by adding a 3rd, 4th, 5th, and sometimes even more dimensions to our data. NumPy can spare us from thinking too hard about how to apply equations to these datasets, but only if you understand how to use it properly.
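
To make the idea of higher-dimensional data concrete, here is a minimal sketch. The array name `temps_k`, its shape, and the axis order (time, level, latitude, longitude) are assumptions chosen for illustration, not something taken from a real dataset.

```python
import numpy as np

# Hypothetical 4-D temperature field in Kelvin.
# Assumed axis order: (time, level, latitude, longitude)
temps_k = np.full((4, 3, 5, 10), 273.15)

# One expression converts every value, however many dimensions there are
temps_c = temps_k - 273.15

print(temps_k.ndim)    # number of dimensions
print(temps_c.shape)   # arithmetic leaves the shape unchanged

# Reducing along an axis removes it: averaging over time
# leaves (level, latitude, longitude)
time_mean = temps_c.mean(axis=0)
print(time_mean.shape)
```

The same expression would work unchanged on a 1-D or 2-D array, which is the main reason to let NumPy handle the looping.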

In this section we'll explore some basic NumPy functionality and use Pandas and Matplotlib to visualize what we're doing.

In [None]:
# Import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# make a simple 1-d array of data

one_d_array = np.array((0, 1, 2, 3, 4, 5))

In [None]:
one_d_array

In [None]:
pd.DataFrame(one_d_array)

In [None]:
# Now we'll render a raster of the data with matplotlib, but in order to do this we must add a 2nd dimension
# so we wrap the array inside a list
fig, ax = plt.subplots(figsize=(13, 3))

pos = plt.imshow([one_d_array], vmin=0, vmax=20)
fig.colorbar(pos)

In [None]:
# Multiply every element by 2, modifying the array in place
one_d_array *= 2

In [None]:
def plot_grid(data):
    """Render a 2-D array as a raster image with a colorbar."""
    fig, ax = plt.subplots(figsize=(18, 10))

    pos = plt.imshow(data)
    fig.colorbar(pos)

In [None]:
# Render the doubled array; as before, we wrap it inside a list
# to add the 2nd dimension that imshow needs
fig, ax = plt.subplots(figsize=(13, 3))

pos = plt.imshow([one_d_array])
fig.colorbar(pos)

In [None]:
# We can also apply more advanced logic to our array
# Let's multiply two arrays together

another_array = np.array((21, 24, 25, 26, 28, 22))

another_array

In [None]:
one_d_array * another_array
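
The `*` operator multiplies arrays element by element, so the shapes have to be compatible. The sketch below repeats the multiplication with stand-in arrays `a` and `b` (names chosen here for illustration) and shows what happens when the lengths do not match.

```python
import numpy as np

a = np.array((0, 2, 4, 6, 8, 10))
b = np.array((21, 24, 25, 26, 28, 22))

# Element-wise product: a[0]*b[0], a[1]*b[1], ...
product = a * b
print(product)

# Arrays of incompatible shapes cannot be combined
try:
    a * np.array((1, 2, 3))
except ValueError as err:
    print("shape mismatch:", err)
```

This is not a matrix or dot product; for that, NumPy provides `np.dot` and the `@` operator.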

In [None]:
# Now we'll load some real weather data into an array and play with some more advanced features
# np.fromfile reads raw binary values (float64 by default); the file stores no shape information
with open("../sample_array.np", "rb") as f:
    temperature_data = np.fromfile(f)

In [None]:
temperature_data

In [None]:
# now let's visualize it
fig, ax = plt.subplots(figsize=(13, 3))

pos = plt.imshow(temperature_data)
fig.colorbar(pos)

In [None]:
# whoops! The data is still 1-D, which imshow cannot render
# so we reshape it into a 2-D grid of 721 rows by 1440 columns
print(temperature_data.shape)
temperature_data = temperature_data.reshape((721, 1440))
print(temperature_data.shape)
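
Why did the data come back 1-D? A raw binary file written with `tofile` holds only the values, so `np.fromfile` has no shape to restore. The round trip below is a small sketch of that behavior; the file name `demo_array.np` and the 3 by 4 grid are made up for illustration.

```python
import os
import tempfile

import numpy as np

grid = np.arange(12, dtype=np.float64).reshape((3, 4))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_array.np")  # hypothetical file name
    grid.tofile(path)                          # writes raw values only
    flat = np.fromfile(path, dtype=np.float64)  # dtype must match the writer

print(flat.shape)  # the 2-D shape was not stored, so this is 1-D

# Restore the grid by supplying the shape ourselves
restored = flat.reshape((3, 4))

# -1 asks NumPy to infer that dimension from the total size
inferred = flat.reshape((3, -1))
print(inferred.shape)
```

`reshape` raises a `ValueError` if the requested shape does not account for every element, which is a useful sanity check on the dimensions you were given for a file.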

In [None]:
# now let's visualize it
fig, ax = plt.subplots(figsize=(18, 10))

pos = plt.imshow(temperature_data)
fig.colorbar(pos)

In [None]:
# The data is loaded in Kelvin, so let's convert it to Celsius

temperature_data -= 273.15

In [None]:
plot_grid(temperature_data)
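
In-place operators such as `-=` write the result back into the existing array, so the result has to fit that array's dtype. The sketch below uses two made-up arrays, `kelvin_int` and `kelvin_float`, to show the difference.

```python
import numpy as np

kelvin_int = np.array([273, 283, 293])               # integer dtype
kelvin_float = np.array([273.15, 283.15, 293.15])    # float64 dtype

# Works: the float result fits in the float array
kelvin_float -= 273.15

# Fails: a float result cannot be stored in an integer array
try:
    kelvin_int -= 273.15
except TypeError as err:
    print("in-place conversion failed:", err)

# The non-in-place form creates a new float array instead
celsius = kelvin_int - 273.15
print(celsius.dtype)
```

Because an in-place operation changes the original array, re-running a notebook cell that contains one applies the subtraction a second time. Assigning to a new name, as with `celsius` here, avoids that.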

In [None]:
# what if we wanted to apply an equation to only specific data?
# let's look at numpy's "where" feature

sample_array = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])
np.where(sample_array > 3)

In [None]:
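# called with only a condition, numpy.where returns a tuple of index arrays (one per dimension)
# marking every location in your array where the condition evaluates to true
# these indices can be fed back in to access those specific points in an array
# first, here is how to access a single point by its row and column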

sample_array[0, 1]

In [None]:
# you can also access multiple points in the array
sample_array[[0,0], [1,2]]

In [None]:
# let's get every value that is greater than 3 now
sample_array[np.where(sample_array > 3)]

In [None]:
# now let's use this to replace all values larger than 3 with a 0
sample_array[np.where(sample_array > 3)] = 0
sample_array
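
Two related idioms are worth knowing alongside `np.where(condition)`: indexing directly with the boolean array, and the three-argument form `np.where(condition, x, y)`, which builds a new array rather than editing the original. This sketch uses a fresh copy of the sample values under the name `sample`.

```python
import numpy as np

sample = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])

# One-argument form: a tuple of row indices and column indices
rows, cols = np.where(sample > 3)
print(rows, cols)

# A boolean array can be used as an index directly
mask = sample > 3
values = sample[mask]
print(values)

# Three-argument form: take 0 where the mask is True, else the original value
replaced = np.where(mask, 0, sample)
print(replaced)
print(sample)  # the original array is left untouched
```

The three-argument form is handy when you want to keep the unmodified data around for comparison.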

In [None]:
# we can also mask out the data
# masked arrays let calculations ignore the masked values
# and let you exclude those values from a render

plot_grid(sample_array)

In [None]:
sample_masked = np.ma.masked_where(sample_array == 0, sample_array)
sample_masked

In [None]:
# a masked array comes back in two parts: the data itself and a boolean mask array
# functions in numpy.ma honor the mask; plain numpy and scipy functions do not always, so check before relying on it

# when rendered, masked values will just be missing/transparent
plot_grid(sample_masked)

In [None]:
# next we'll apply this logic to our temperature array and mask out anywhere the temperature is above freezing
temperature_masked = np.ma.masked_where(temperature_data > 0, temperature_data)
plot_grid(temperature_masked)
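
Masking affects calculations as well as rendering. The sketch below uses a small made-up array, `temps_c`, in place of the real temperature grid to show the two parts of a masked array and how statistics skip the masked values.

```python
import numpy as np

# Hypothetical temperatures in Celsius
temps_c = np.array([[-5.0, 2.0, -1.0], [4.0, -3.0, 6.0]])

# Hide every value above freezing
below_freezing = np.ma.masked_where(temps_c > 0, temps_c)

print(below_freezing.data)   # the underlying values, still intact
print(below_freezing.mask)   # True wherever a value is hidden

# Statistics only consider the unmasked values
print(below_freezing.count())  # number of unmasked values
print(below_freezing.mean())   # mean of -5, -1 and -3

# Convert back to a plain array, replacing masked values with NaN
filled = below_freezing.filled(np.nan)
print(filled)
```

Masking does not delete anything: the original values remain in `.data`, so a mask can be dropped or replaced later without reloading the file.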