# Indexing and masking on xarray

In [None]:
#Import xarray
import xarray as xr
#Other useful Python libraries
import os
import matplotlib.pyplot as plt
import numpy as np

# Use the plain-text display style, which matches how the output looks in Sublime
xr.set_options(display_style="text")

Load the SST data and save the necessary variables.

In [None]:
#Change the working directory to the folder that holds the data
dataPath = '/Users/hellenfellow/Dropbox (AMNH)/AMNH-Ankitha/Teaching_2021/Lessons/Unit2_Ocean_Temperature/Data'
os.chdir(dataPath)
os.getcwd()

In [None]:
# Load the sea surface temperature dataset
fileName = 'HadISST_sst.nc'
data = xr.open_dataset(fileName)

#Save the necessary variables
sst = data.sst
lat = data.sst.latitude
lon = data.sst.longitude
time = data.sst.time

### Refresher on list indexing

For the provided list, find
1. the first and last element
1. the second half of the list
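
As a reminder of the syntax involved, here is a minimal sketch on a separate, made-up list (`letters` is a hypothetical name, not part of the exercise), showing positive and negative indices and a slice from the midpoint:

```python
# Hypothetical example list (not the exercise's test_list)
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first = letters[0]              # index 0 is the first element
last = letters[-1]              # negative indices count from the end
middle = len(letters) // 2      # integer division gives the midpoint index
second_half = letters[middle:]  # slice from the midpoint to the end

print(first, last, second_half)
```

The same pattern should carry over to `test_list` once it is defined in the next cell.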

In [None]:
test_list = [i for i in range(2,42,2)] #list comprehension: even numbers from 2 to 40, same result as appending values in a loop
test_list

In [None]:
#first and last element



In [None]:
#second half of list



### Indexing NumPy arrays

Lists work well for one-dimensional data. For multidimensional data, we use NumPy arrays instead.

In [None]:
#Create dataset
x = np.arange(-5,5)
y = x
X, Y = np.meshgrid(x,y)
data = X**2 + Y**2
data

Our data looks a little bit like a grid. Say we want the element in the **first** row and **second** column, i.e., 41.

In [None]:
data[0,1] #data[first row, second column]

Find the index of the value 0 in the dataset.

Print out the first 3 columns. 

*Hint:* To output everything in a dimension, you would use `:`
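
To illustrate the `:` syntax, here is a small sketch on a made-up 3x4 array (`grid` is a hypothetical name, separate from the `data` array above). It also uses `np.argwhere`, which is one of several ways to look up where a value sits:

```python
import numpy as np

# Hypothetical 3x4 grid, separate from the exercise's `data`
grid = np.arange(12).reshape(3, 4)

element = grid[1, 2]               # row index 1, column index 2
first_two_cols = grid[:, :2]       # every row, columns 0 and 1
location = np.argwhere(grid == 7)  # (row, column) index pairs where the value is 7

print(element)
print(first_two_cols)
print(location)
```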

### Indexing on xarray

xarray stores its multidimensional data as NumPy arrays but adds labeled dimensions and coordinates on top.

First find the three dimensions of the SST dataset.

*Hint*: There's a pretty handy xarray method to do this!

Let's find the SST for the first time step in the dataset (Jan 1870, or the 0th value of the time dimension) over the entire Earth, i.e., all the latitude and longitude values.

This follows the same syntax as the 2D data we just worked with but now you have 3 dimensions!
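
A minimal sketch of positional indexing in three dimensions, using a small synthetic array in place of the SST data (`toy` and its coordinate values are made up for illustration; the dimension order time, latitude, longitude is assumed to match `sst`):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the SST data: 3 months x 4 latitudes x 5 longitudes
toy = xr.DataArray(
    np.arange(60.0).reshape(3, 4, 5),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.date_range("2000-01-01", periods=3, freq="MS"),
        "latitude": [1.5, 0.5, -0.5, -1.5],  # north to south, like the SST grid
        "longitude": [-2.0, -1.0, 0.0, 1.0, 2.0],
    },
)

first_month = toy[0, :, :]  # first time step, every latitude and longitude
two_rows = toy[0, 1:3, :]   # first time step, latitude indices 1 and 2
same_rows = toy.isel(time=0, latitude=slice(1, 3))  # same selection, by dimension name

print(first_month.shape)
print(two_rows.latitude.values)
```

`isel` does the same job as the bracket syntax but names each dimension, which can help when the dimension order is easy to forget.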

Find the data for the first time period and the latitude values at indices 89 and 90 (counting from 0, as above). This is all the Jan 1870 temperature data along the equator.

Figuring out which coordinates these index values correspond to can be tricky. xarray makes this easier with the `loc[]` indexer, where you specify the range of actual coordinate values you want. The latitude values closest to the equator are 0.5N (0.5) and 0.5S (-0.5).

Using `loc[]`, the result above would be:

In [None]:
sst.loc['1870-01-16T11:59:59.505615234',0.5:-0.5,:] #the latitude values go from positive to negative or North to South

In [None]:
#or even (making use of the powerful time indexing)
sst.loc['1870-01',0.5:-0.5,:]
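
The same label-based selection can be tried without the SST file. The sketch below rebuilds a small synthetic array (`toy` and its coordinates are made up for illustration) and compares `loc[]` with the equivalent `sel` call, which names the dimensions explicitly:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the SST data: 3 months x 4 latitudes x 5 longitudes
toy = xr.DataArray(
    np.arange(60.0).reshape(3, 4, 5),
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.date_range("2000-01-01", periods=3, freq="MS"),
        "latitude": [1.5, 0.5, -0.5, -1.5],  # descending, so slices run north to south
        "longitude": [-2.0, -1.0, 0.0, 1.0, 2.0],
    },
)

# Label-based selection: a month given as a string, latitudes given as values
equator_loc = toy.loc["2000-01", 0.5:-0.5, :]

# Equivalent selection with sel, naming each dimension
equator_sel = toy.sel(time="2000-01", latitude=slice(0.5, -0.5))

print(equator_loc.latitude.values)
```

Unlike positional slices, label-based slices include both endpoints, so both 0.5 and -0.5 are returned.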

What about all temperature values along the equator for the year 1870?

Since we're studying the impact of the North Atlantic Oscillation on the ocean, we want to identify a good region over which to analyze ocean temperature. What might that be?

Find the SST for this region.

### Masking on xarray

You might have noticed that parts of the SST data are marked with NaN (or Not a Number). These are the Earth's land masses, the regions that generally appear white on a plot of the data. This data is **masked**.

We can create our own masks using the `where` method. Here, rather than specifying a range of values to index over, you specify a condition your data must meet. Values that fail the condition are replaced with NaN. For example, we can keep all temperature values not equal to 30&deg;C.

Python conditions
* equals: `a == b`
* not equals: `a != b`
* less than: `a < b`
* less than or equal to: `a <= b`
* greater than: `a > b`
* greater than or equal to: `a >= b`

In [None]:
sst.where(sst != 30)
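
To see what `where` does to individual values, here is a small sketch on made-up temperatures (`temps` is a hypothetical array, not the SST data). Values that fail the condition become NaN, and values that were already NaN stay NaN:

```python
import numpy as np
import xarray as xr

# Hypothetical temperatures in degrees C, with one value already missing
temps = xr.DataArray(
    [[28.0, 30.0, 31.5], [np.nan, 29.0, 30.0]],
    dims=("latitude", "longitude"),
)

# Keep values above 29.5 and mask everything else as NaN
warm = temps.where(temps > 29.5)

print(warm.values)
print(int(warm.count()))  # count() ignores NaN, so this is the number of values kept
```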

Find all the SST values less than or equal to 0&deg;C.

Plot the mean of this masked SST data using a contour plot.

### Challenge

Plot the mean SST for the Northern Hemisphere.

*Hint*: You can set your conditions for SST based on your latitude values
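
As a sketch of a coordinate-based condition (on a made-up array `toy`, with a different condition from the one the challenge asks for), the mask below keeps only the rows within one degree of the equator. The condition is built from the latitude coordinate rather than from the data values:

```python
import numpy as np
import xarray as xr

# Hypothetical 4 latitudes x 2 longitudes array
toy = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    dims=("latitude", "longitude"),
    coords={"latitude": [1.5, 0.5, -0.5, -1.5], "longitude": [0.0, 1.0]},
)

# The condition only involves latitude; xarray broadcasts it across longitude
near_equator = toy.where(abs(toy.latitude) < 1)

print(near_equator.values)
print(float(near_equator.mean()))  # mean() skips the NaN values by default
```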