# Python Bootcamp Day 6
## Multi-dimensional data with xarray, part 2

# Challenge 

"The World Ocean Atlas (WOA) is a collection of objectively analyzed, quality controlled temperature, salinity, oxygen, phosphate, silicate, and nitrate means based on profile data from the World Ocean Database (WOD). It can be used to create boundary and/or initial conditions for a variety of ocean models, verify numerical simulations of the ocean, and corroborate satellite data." - [NOAA National Centers for Environmental Information](https://www.ncei.noaa.gov/products/world-ocean-atlas)

In this challenge, we will explore WOA temperature and oxygen climatologies. A climatology indicates the average environmental conditions over a long period of time, generally 30 years or more. Here, we are looking at annual means, so all seasonal variability has been removed. 
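
To make the idea of a climatology concrete, here is a minimal sketch using made-up monthly temperatures at a single location (not WOA data). The variable names, the 15 °C baseline, and the 5 °C seasonal amplitude are all assumptions chosen for illustration. Averaging over the whole record removes the seasonal cycle, which is what an annual-mean climatology does.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly temperatures for 30 years at one location (made-up values)
time = pd.date_range("1981-01-01", periods=30 * 12, freq="MS")
seasonal_cycle = 5.0 * np.sin(2 * np.pi * (time.month.to_numpy() - 1) / 12)
temperature = xr.DataArray(
    15.0 + seasonal_cycle,
    coords={"time": time},
    dims="time",
    name="temperature",
)

# Annual-mean climatology: a single long-term average, seasonal cycle removed
annual_climatology = temperature.mean("time")

# Monthly climatology, for comparison: one long-term mean per calendar month
monthly_climatology = temperature.groupby("time.month").mean()
```

The monthly climatology keeps the seasonal cycle (12 values), while the annual mean collapses it to a single number.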

### Tasks: 

1) **Find Data** 
    * Starting from this website, navigate to the OPeNDAP links for temperature and dissolved oxygen, and open both datasets in your notebook using `xarray.open_dataset()`: https://www.ncei.noaa.gov/access/world-ocean-atlas-2018/
    * Hint: Select 1 degree NetCDF output for "Averaged Decades Years", then select the annual file. (For the oxygen dataset, only one time period is available.)
<br><br>
2) **Understand the Datasets**
    * Look through the NetCDF metadata. Select the variables of interest.
    * You may find the dataset documentation useful: https://www.ncei.noaa.gov/sites/default/files/2020-04/woa18documentation.pdf
    * Note the different time periods for temperature and oxygen (1981-2010 vs. 1955-2010). This is not ideal, but we'll have to work with the data that is easily accessible. 
<br><br>
3) **Merge Temperature and Oxygen into one dataset**
    * Hint: https://docs.xarray.dev/en/latest/generated/xarray.Dataset.merge.html
<br><br>
4) **Select surface-level values**

5) **Make maps of temperature and oxygen annual mean climatologies**

6) **Calculate correlation between temperature datasets**
    * Import the mean SST NetCDF file that we exported earlier
    * Calculate the Pearson correlation coefficient between the two temperature datasets
    * Hint: https://docs.xarray.dev/en/stable/generated/xarray.corr.html?highlight=corr
    * How well do the climatologies match up?
<br><br>
7) **Calculate correlation between WOA temperature and oxygen**
    * First, make a scatter plot of temperature vs. oxygen. Does this relationship make sense based on our knowledge of chemistry?
    * Calculate the Pearson correlation coefficient
<br><br>
8) **Calculate the line of best fit**
    * Hint: 

        ```python
        # Import numpy and the stats module
        import numpy as np
        from scipy import stats
        # Create 1-D numpy arrays
        x = woa_data_surface.t_an.to_numpy().flatten()
        y = woa_data_surface.o_an.to_numpy().flatten()
        # Remove np.nan values by masking (keep only points where both x and y are valid)
        mask = ~np.isnan(x) & ~np.isnan(y)
        slope, intercept, r_value, p_value, std_err = stats.linregress(x[mask], y[mask])
        ```
<br>
9) **Add the line of best fit to your scatter plot**

10) **Visualize spatially**
    * While most points sit close to the line of best fit, we see some points where our expected oxygen does not match what we observe. Why might this be the case? To investigate, let's plot these points spatially so that we can see if spatial patterns emerge. 
    * Use the line of best fit to calculate the 'expected_o2', and add this as a new variable in your dataset. This is the oxygen concentration that we would estimate based on temperature measurements alone.
    * Now, calculate the difference between the 'expected_o2' and the measured value. 
    * Plot this difference on a map. What spatial trends do you observe? Any ideas about why the observed oxygen differs from the expected value in these regions?
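
The last few tasks can be sketched on synthetic data before you try them on the real thing. This is only one possible approach, not a reference solution. The dataset `demo`, its made-up values, the fake "land" patch, and the variable name `o2_difference` are assumptions for illustration; `t_an` and `o_an` mirror the variable names used in the hint above. The difference is taken here as expected minus observed, which is a sign convention you are free to flip.

```python
import numpy as np
import xarray as xr
from scipy import stats

# Synthetic stand-in for a merged surface dataset (made-up values, not WOA data)
rng = np.random.default_rng(0)
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
t_an = 28.0 * np.cos(np.deg2rad(lat))[:, None] + rng.normal(0.0, 1.0, (lat.size, lon.size))
t_an[80:100, 150:200] = np.nan  # pretend this patch is land
o_an = 350.0 - 5.0 * t_an + rng.normal(0.0, 5.0, t_an.shape)  # NaNs carry over

demo = xr.Dataset(
    {"t_an": (("lat", "lon"), t_an), "o_an": (("lat", "lon"), o_an)},
    coords={"lat": lat, "lon": lon},
)

# Pearson correlation; xr.corr only uses points where both inputs are valid
r = xr.corr(demo.t_an, demo.o_an)

# Fit the line of best fit on the non-missing points
x = demo.t_an.to_numpy().flatten()
y = demo.o_an.to_numpy().flatten()
mask = ~np.isnan(x) & ~np.isnan(y)
fit = stats.linregress(x[mask], y[mask])

# Oxygen predicted from temperature alone; lat/lon coordinates and NaNs are kept
demo["expected_o2"] = fit.slope * demo.t_an + fit.intercept

# Expected minus observed (assumed sign convention)
demo["o2_difference"] = demo.expected_o2 - demo.o_an
```

Because `expected_o2` is computed with ordinary xarray arithmetic, it stays on the same latitude/longitude grid, so calling `demo.o2_difference.plot()` should draw it as a map (this requires matplotlib).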