# Numerical Data in Python

*February 13, 2023*

**Jumpstart Comprehension Check**

```python
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
```

**What output would be produced by each of the following Python expressions?** Try to answer without typing anything.

1. `months[2:4]`
2. `list(range(len(months)))`
3. `months[4:]`
4. `list(enumerate(months))`
5. `months[-2]`

In [4]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
list(range(len(months)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [5]:
months[4:]

['May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

In [6]:
list(enumerate(months))

[(0, 'Jan'),
 (1, 'Feb'),
 (2, 'Mar'),
 (3, 'Apr'),
 (4, 'May'),
 (5, 'Jun'),
 (6, 'Jul'),
 (7, 'Aug'),
 (8, 'Sep'),
 (9, 'Oct'),
 (10, 'Nov'),
 (11, 'Dec')]

In [7]:
months[-3:-1]

['Oct', 'Nov']

---

## Introducing NumPy

In [13]:
# np.ndarray
arr = np.array([1,2,3])
arr

array([1, 2, 3])

In [14]:
arr1 = np.array([1, False, 3])
arr1

array([1, 0, 3])

In [23]:
np.dtype(arr1[1]) # dtype of the second element of the array (False was coerced to the integer 0)

dtype('int32')
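
The coercion seen above (`False` becoming `0`) happens because a NumPy array stores a single dtype for all of its elements. Here is a minimal sketch with made-up values; the variable names are illustrative and not from the lesson, and the exact integer width (`int32` or `int64`) depends on the platform:

```python
import numpy as np

# All elements of an array share one dtype, so mixed inputs are promoted
ints_and_bool = np.array([1, False, 3])    # bool is promoted to integer
ints_and_float = np.array([1, 2.5, 3])     # integers are promoted to float
ints_and_str = np.array([1, "two", 3])     # everything becomes a string

print(ints_and_bool.dtype.kind)    # 'i' means signed integer
print(ints_and_float.dtype)        # float64
print(ints_and_str.dtype.kind)     # 'U' means Unicode string
```

Checking `arr.dtype` directly is usually the quickest way to see what type an array ended up with.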

In [24]:
xy = [[42.0, 45.5], [-118.1, -118.2]]
xy

[[42.0, 45.5], [-118.1, -118.2]]

In [25]:
arr = np.array(xy)
arr

array([[  42. ,   45.5],
       [-118.1, -118.2]])

In [26]:
arr[0]

array([42. , 45.5])

In [29]:
arr.ndim

2

In [30]:
arr.shape

(2, 2)
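
To make `ndim` and `shape` concrete, the sketch below compares a 1-D and a 2-D array built from the same coordinate values used above. The names `flat` and `grid` are illustrative only:

```python
import numpy as np

flat = np.array([42.0, 45.5])                       # 1-D: a single axis
grid = np.array([[42.0, 45.5], [-118.1, -118.2]])   # 2-D: rows x columns

print(flat.ndim, flat.shape)   # 1 (2,)
print(grid.ndim, grid.shape)   # 2 (2, 2)
print(grid.size)               # 4 elements in total

# Indexing with a single integer selects one row and drops the first axis
print(grid[0].shape)           # (2,)
```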

---

## Working with NumPy Arrays

For the rest of this lesson, we'll be working with [data on near-surface air temperatures from the NOAA Climate Prediction Center (CPC)](http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.GHCN_CAMS/.gridded/.deg0p5/index.html). In the array loaded below, each row is one year (1948 through 2022) and each column is one month.

In [8]:
import pandas as pd

temps = pd.read_csv(
    'http://files.ntsg.umt.edu/data/GIS_Programming/data/NOAA_NCEP_CPC_gridded_deg0p5_1948-2022_Utqiagvik.txt',
    header = None).to_numpy()

In [11]:
import numpy as np

In [31]:
temps.shape

(75, 12)

In [32]:
# First year of data
temps[0]

array([-27.52, -25.63, -27.88, -16.84, -10.27,  -0.7 ,   4.9 ,   2.36,
        -3.82, -11.5 , -21.82, -30.65])

In [33]:
# First month of first year
temps[0,0]

-27.52

In [34]:
# Second month of first five years
temps[0:5, 1]

array([-25.63, -30.53, -32.76, -31.  , -32.15])

In [36]:
# Every year in January
temps[:,0]

array([-27.52, -25.03, -20.86, -33.64, -28.22, -29.13, -28.01, -24.26,
       -28.72, -20.38, -26.03, -27.57, -28.86, -22.84, -20.94, -22.14,
       -28.38, -28.54, -27.94, -25.05, -23.9 , -26.26, -26.7 , -31.04,
       -28.07, -27.04, -25.43, -31.55, -27.58, -20.83, -20.67, -20.32,
       -26.34, -18.03, -25.74, -30.18, -27.27, -22.49, -27.68, -25.08,
       -23.53, -31.53, -30.05, -25.87, -28.84, -25.1 , -25.03, -25.35,
       -24.79, -28.25, -27.98, -29.31, -26.31, -22.29, -25.8 , -23.65,
       -24.35, -23.12, -26.99, -25.91, -26.89, -26.07, -27.62, -23.35,
       -30.97, -23.  , -16.91, -24.29, -16.56, -24.61, -24.08, -23.47,
       -29.37, -21.65, -28.45])

In [38]:
# Last two years in October
temps[-2:, -3]

array([-5.88, -7.04])

![](numpy-matrix-indexing.png)

*Image is from a presentation by Mauricio Sevilla.*
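
The same indexing patterns can be tried on a small array where every value is easy to predict. This is a sketch on a hypothetical 3 x 4 array (`demo` is not part of the lesson data):

```python
import numpy as np

# Hypothetical stand-in for the temperature table: 3 "years" x 4 "months"
demo = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_row = demo[0]              # whole first row: [0 1 2 3]
one_value = demo[0, 0]           # row 0, column 0: 0
first_col = demo[:, 0]           # every row, first column: [0 4 8]
tail_rows = demo[-2:, -3]        # last two rows, third-from-last column: [5 9]

print(first_row, one_value, first_col, tail_rows)
```

The pattern is always `array[rows, columns]`, and a bare `:` means "everything along this axis".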

---

### Challenge: Working with Multi-dimensional Arrays

1. What's the average July temperature in Utqiagvik over the years?
2. What was the minimum monthly temperature in 1979? Recall that the years of this data extend from 1948 through 2022.

In [50]:
avg = np.mean(temps[:, 6])
avg

4.907466666666667

In [49]:
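# Row index = year - 1948, so 1979 is row 31; row 30 (used here) is 1978
# The result is named min_temp to avoid shadowing Python's built-in min()
min_temp = np.min(temps[30, :])
min_temp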

-25.55
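
Picking the row for a given year is an easy place for an off-by-one slip, because row 0 is 1948. A small helper makes the conversion explicit. Both `year_to_row` and the `table` array below are hypothetical and not part of the lesson; the table is filled so that every value in a row equals that row's year:

```python
import numpy as np

FIRST_YEAR = 1948  # the first row of the table, per the lesson


def year_to_row(year, first_year=FIRST_YEAR):
    """Convert a calendar year to a zero-based row index."""
    return year - first_year


# Hypothetical 75 x 12 table in which each row holds its own year
years = np.arange(FIRST_YEAR, FIRST_YEAR + 75)
table = np.repeat(years[:, np.newaxis], 12, axis=1)

print(year_to_row(1979))              # 31
print(table[year_to_row(1979), 0])    # 1979
print(table[-1, 0])                   # 2022, the last year in a 75-row table
```

With the real data, the same helper would be used as `temps[year_to_row(1979), :].min()`.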

---

## Calculations on NumPy Arrays

In [51]:
# Arithmetic on a NumPy array is applied to every element of the array
# For example, the average temperature for each month over the first two years:
(temps[0] + temps[1]) / 2

array([-26.275, -28.08 , -24.505, -20.69 , -10.295,  -0.96 ,   4.595,
         4.06 ,  -1.285,  -8.945, -18.15 , -27.805])
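
Element-wise arithmetic works the same way with a scalar or with another array of matching shape. The sketch below uses made-up Celsius values (`celsius` is illustrative, not the lesson data):

```python
import numpy as np

# Made-up Celsius values: 2 rows x 3 columns
celsius = np.array([[-27.5, -10.0, 5.0],
                    [-25.0,   0.0, 4.0]])

# Scalar arithmetic is applied to every element
fahrenheit = celsius * 9 / 5 + 32

# Array-with-array arithmetic pairs up elements by position
row_mean = (celsius[0] + celsius[1]) / 2

print(fahrenheit)
print(row_mean)    # [-26.25  -5.     4.5 ]
```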

In [56]:
# Average temperature for each month across all years
temps.mean(axis = 0) # axis names the axis to COLLAPSE
# axis = 0 collapses the rows (years), leaving one value per column (month)

array([-25.70133333, -27.22226667, -25.85093333, -17.79946667,
        -6.47933333,   1.8008    ,   4.90746667,   3.85333333,
        -0.16373333,  -8.76786667, -17.87333333, -23.7628    ])

In [57]:
# Average temperature for each year: axis = 1 collapses the columns (months)
temps.mean(axis = 1)

array([-14.11416667, -12.275     , -11.1975    , -12.40833333,
       -13.42333333, -13.47583333, -12.45583333, -14.54416667,
       -14.25083333, -11.965     , -11.7575    , -13.62916667,
       -12.84333333, -13.56      , -11.19166667, -12.0875    ,
       -14.75666667, -12.9475    , -13.39333333, -12.0125    ,
       -12.53666667, -12.5225    , -13.59333333, -14.04416667,
       -12.80333333, -12.22916667, -14.545     , -14.32666667,
       -12.84833333, -11.90416667, -11.62333333, -11.24      ,
       -12.6825    , -11.41333333, -13.1425    , -13.5475    ,
       -14.82583333, -12.68416667, -12.11416667, -12.42833333,
       -13.025     , -11.045     , -12.39833333, -12.865     ,
       -12.47833333, -10.4775    , -12.75083333, -11.45083333,
       -11.77      , -11.22833333,  -9.32083333, -12.25666667,
       -11.6675    , -11.5725    , -10.03833333, -10.465     ,
       -10.59166667, -10.3175    , -10.75416667,  -9.75      ,
       -11.05      , -10.775     , -10.21833333, -10.46

It can be difficult to remember which `axis` to use when calculating a summary. Here's a helpful visual representation.

![](numpy-axis.jpg)

*Image courtesy of Alex Riley*
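
One way to check which axis is which is to look at the shape of the result: the axis you name is the one that disappears. A minimal sketch on a hypothetical 2 x 3 array (`small` is not part of the lesson data):

```python
import numpy as np

# Hypothetical table: 2 "years" (rows) x 3 "months" (columns)
small = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0]])

per_month = small.mean(axis=0)   # collapses the rows:    shape (3,), [2. 3. 4.]
per_year = small.mean(axis=1)    # collapses the columns: shape (2,), [2. 4.]
overall = small.mean()           # no axis given: a single number, 3.0

print(per_month.shape, per_year.shape, overall)
```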

---

### Challenge: Statistical Summary of an Array

What's the minimum, maximum, and mean monthly temperature for August in Utqiagvik?

In [60]:
print(temps[:, 7].min())
print(temps[:, 7].max())
print(temps[:, 7].mean())

-0.07
7.0
3.8533333333333326


---

## Sorting and Filtering Arrays

In [65]:
# In which months is the temperature at or below -5 °C?
temps <= -5

# This kind of comparison is very useful for rasters: it creates
# a whole new raster classified by threshold values of pixels
# e.g. burned vs. unburned area based on dNBR
# e.g. height thresholds for vegetation in the HIZ

array([[ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True,  True],
       [ True,  True,  True,  True,  True, False, False, False, False,
         True,  True

In [66]:
# Find where the condition is True (i.e. which months of which years had average temperatures <= -5 °C)
np.argwhere(temps <= -5)

array([[ 0,  0],
       [ 0,  1],
       [ 0,  2],
       ...,
       [74,  9],
       [74, 10],
       [74, 11]], dtype=int64)
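
A boolean array like the one above can also be used to filter, count, and reclassify values. The sketch below uses a small made-up array (`temps_demo` is illustrative, not the lesson data), and the 1/0 class codes are an arbitrary choice:

```python
import numpy as np

# Made-up temperatures: 2 "years" x 3 "months"
temps_demo = np.array([[-27.5, -10.0, 5.0],
                       [-25.0,  -5.0, 4.0]])

mask = temps_demo <= -5            # boolean array with the same shape
cold_values = temps_demo[mask]     # 1-D array of the values where mask is True
n_cold = mask.sum()                # True counts as 1, so this counts matches
positions = np.argwhere(mask)      # one (row, column) pair per match
classes = np.where(mask, 1, 0)     # reclassify: 1 where cold, 0 elsewhere

print(cold_values)    # [-27.5 -10.  -25.   -5. ]
print(n_cold)         # 4
print(classes)
```

The `np.where(mask, 1, 0)` step is the array version of the raster thresholding idea: every cell is assigned a class based on whether it meets the condition.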

---

## More Resources

- [Visual introduction to NumPy](https://jmsevillam.github.io/slides/Python/Numpy.slides.html#/)