# **Geophysics 310 Lab 1: Python, Linear Regression, and Sea Level Rise** 

The aim of this first 310 lab is to become familiar with using Python to analyse and visualise scientific data. We will also fit some models to sea level data to explain trends over time. 

**Lab work and submission:** Complete all the exercises within the notebook itself, unless otherwise stated. Make sure you save the notebook (ideally to your Google Drive) before beginning, and save regularly throughout. You are encouraged to work on the problems together, but everyone will submit their own work. Submit your .ipynb notebook file along with any other files or data through Canvas.

First, we import the Python libraries we'll use in this notebook:

In [None]:
import pandas as pd # pandas is a suite of tools to use on (csv) data
import matplotlib.pyplot as plt # plotting tools
import numpy as np # numerical tools
import scipy.stats # this package has a pre-made linear regression function 


## Sea Level Data
Sea level measurements for the Auckland harbour are averaged over each year to obtain an annual mean sea level and its standard deviation. The Python library [pandas](https://pandas.pydata.org) can read CSV data for us without much programming. We will then convert the *pandas* data into a [NumPy](http://numpy.org) array (a two-dimensional array that behaves like a matrix), so we can easily work with the data throughout the rest of this notebook.

In [None]:
url="https://www.linz.govt.nz/sites/default/files/data/Auckland_Annual_MSL.csv"
harbour_data = pd.read_csv(url,delimiter=',',names=['year','MSL','err','comment'],skiprows=2)
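To see what the `names=` and `skiprows=` arguments do, here is a minimal sketch that parses a small CSV string instead of the LINZ file. The two header lines are an assumption chosen to match `skiprows=2` (we have not reproduced the real file's layout), and all the numbers are made up.

```python
import io
import pandas as pd

# Hypothetical CSV text: two header lines, then year, mean sea level,
# error, and comment columns (values are invented for illustration)
csv_text = """Auckland annual mean sea level
Year,MSL,Err,Comment
1900,1.60,0.03,
1901,1.62,0.03,gauge check
"""

# skiprows=2 drops the two header lines; names= supplies our own column labels
df = pd.read_csv(io.StringIO(csv_text), delimiter=',',
                 names=['year', 'MSL', 'err', 'comment'], skiprows=2)
print(df)
print(df.dtypes)
```

An empty trailing field becomes `NaN`, and a text column such as `comment` is stored with `object` dtype.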

There are many things you can do with this "data frame", which we called `harbour_data`. Here, we only use pandas to read the data in; next, we convert the data frame to a NumPy array. A NumPy array is like a matrix of values, and you can slice it in any way you want to extract information:

In [None]:
harbour_data_matrix = harbour_data.to_numpy()
# check shape of matrix
print(harbour_data_matrix.shape)
# print tenth row
print(harbour_data_matrix[9]) # remember, python starts counting at 0!
# print first column
print(harbour_data_matrix[:,0]) #format for indexing is [rows,columns]
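The same slicing rules can be tried on a small, made-up array. This sketch assumes the array has `object` dtype, which is what `DataFrame.to_numpy()` returns when one of the columns (such as `comment`) holds text, and shows converting a column to floating point with `.astype(float)` before doing arithmetic with it:

```python
import numpy as np

# A toy 3x4 object array (invented values) standing in for a data frame
# whose last column contains text
table = np.array([[1900, 1.60, 0.03, None],
                  [1901, 1.62, 0.03, 'gauge check'],
                  [1902, 1.59, 0.03, None]], dtype=object)

print(table.shape)   # (3, 4): 3 rows, 4 columns
print(table[1])      # second row (index 1)

raw_col = table[:, 1]          # all rows, column index 1
print(raw_col.dtype)           # object
col = raw_col.astype(float)    # convert to float64 for numerical work
print(col.dtype)               # float64
```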

#### Exercise 1

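In our case, the matrix has all the time data in the first column, all the sea level heights in the second column, and the standard deviations in the third column. In the cell below, create three variables ```time```, ```height```, and ```errors```, and assign the correct data to each. (If `harbour_data_matrix.dtype` is `object`, which happens when a column such as `comment` holds text, convert each variable to floating point with `.astype(float)` before calculating with it.)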

In [None]:
time   =  #Extract the time data
height =  #Extract the height data
errors =  #Extract the standard deviations

Next, we will plot sea level in the Auckland Harbour, as a function of time, using [matplotlib](https://matplotlib.org/):

In [None]:
plt.errorbar(time, height, yerr=errors, color='r', ecolor='gray', marker='o', linestyle='')
plt.grid()
plt.xlabel('Date (year)')
plt.ylabel('Mean Sea Level (m)')
plt.title('Princes Wharf, Auckland (New Zealand)')
plt.axis('tight')
plt.savefig('test.png', bbox_inches='tight')
plt.show()

### Linear regression

The mean sea level goes up and down from year to year, but it clearly has an upward trend. Let's find the best-fitting line through these data.
There are many ways to do such a linear regression, or more advanced polynomial fitting, in Python. Here is one example, using the linear regression function in `scipy.stats`:

In [None]:
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(time, height)
print(slope, intercept)
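`scipy.stats.linregress` performs an ordinary least-squares fit, which minimises the sum of squared vertical distances between the data and the line. For a single predictor the solution has a closed form: $m = \sum_i (t_i - \bar t)(h_i - \bar h) / \sum_i (t_i - \bar t)^2$ and $b = \bar h - m \bar t$. The sketch below checks this against `linregress` on a synthetic series (a made-up 1.5 mm/yr trend plus a small wiggle, not the Auckland data) and converts the slope to mm/yr, assuming heights are in metres and time is in years.

```python
import numpy as np
import scipy.stats

# Synthetic, made-up series: 1.5 mm/yr trend plus a small deterministic wiggle
t = np.arange(1900, 2021, dtype=float)
h = 1.60 + 0.0015 * (t - 1900) + 0.02 * np.sin(0.7 * t)

# Closed-form ordinary least squares for h = slope * t + intercept
t_mean, h_mean = t.mean(), h.mean()
slope_ls = np.sum((t - t_mean) * (h - h_mean)) / np.sum((t - t_mean) ** 2)
intercept_ls = h_mean - slope_ls * t_mean

res = scipy.stats.linregress(t, h)
print(slope_ls, res.slope)           # should agree to rounding error
print(intercept_ls, res.intercept)

# With heights in metres and time in years, slope * 1000 is in mm/yr
print(f"{slope_ls * 1000:.2f} mm/yr")
```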

#### Exercise 2
Replot the data, adding the best-fitting line through these data. 

In [None]:
# write your code here

### Residuals or misfit
#### Exercise 3
The line looks like a reasonable representation of the data, but how well does it fit from a quantitative point of view? Let's compute the mean and standard deviation of the residuals (the observed mean sea level minus the value of the best-fitting straight line in the same year). Compute and print these values below:
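As a sketch of the residual calculation, here is the idea applied to a synthetic series (made up for illustration; the variable names are not those of the exercise). Note that `np.std` uses `ddof=0` by default; `ddof=1` gives the sample standard deviation.

```python
import numpy as np
import scipy.stats

# Synthetic, made-up data (not the Auckland record): trend plus wiggle
t = np.arange(1900, 2021, dtype=float)
h = 1.60 + 0.0015 * (t - 1900) + 0.02 * np.sin(0.7 * t)

fit = scipy.stats.linregress(t, h)
predicted = fit.slope * t + fit.intercept   # value of the line at each time
residuals = h - predicted                   # observed minus predicted

print("mean of residuals:", residuals.mean())      # zero up to rounding
print("std of residuals:", residuals.std(ddof=1))  # sample standard deviation
```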

In [None]:
# Write code here

#### Exercise 4
If the regression did its job, the mean is practically zero (for a least-squares line with an intercept it is zero up to rounding error). This means that observations above the line are balanced by observations below it. What did you find for the standard deviation? What factors can you think of that contribute to the standard deviation? Type your responses in the box below. (Note: the scientists who collected these data estimate that the standard error in each of these annual means for sea level in Auckland is 2.5 cm.)

<font color='red'>*Type your answers here*</font>

If we felt that the linear model was a "poor" fit, we could always fit the data better with a model that has more degrees of freedom. Does this experiment warrant a quadratic term? Or even higher-order polynomials? Maybe not over these 100 years of data, but if the rise is due to climate change and there are positive feedbacks, perhaps our model should allow for an accelerating rise. In any case:

**Given enough degrees of freedom in the model, we can fit any data perfectly!**
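One way to see this is to fit polynomials of increasing degree to a handful of points: with $N$ points at distinct times, a polynomial of degree $N-1$ has $N$ coefficients and can pass through every point. The five data points below are made up for illustration, and time is shifted to years since 2000 to keep the powers small.

```python
import numpy as np

# Five made-up (year, height) points, for illustration only
t = np.array([2000.0, 2005.0, 2010.0, 2015.0, 2020.0])
h = np.array([1.70, 1.74, 1.71, 1.78, 1.80])
x = t - 2000.0  # years since 2000

rms_by_degree = []
for degree in range(1, len(t)):
    coeffs = np.polyfit(x, h, degree)
    resid = h - np.polyval(coeffs, x)
    rms = np.sqrt(np.mean(resid ** 2))
    rms_by_degree.append(rms)
    print(f"degree {degree}: RMS misfit = {rms:.2e}")
```

The misfit never increases as the degree goes up, and at degree 4 it is zero to rounding error, even though the "data" contain no real cubic or quartic signal.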

If we decided to model the data with a polynomial instead of a straight line, then we could use the NumPy [polyfit](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html) function. Here, we fit with a third-degree polynomial:

In [None]:
fit = np.polyfit(time, height, 3) # Least-squares fit of a degree-3 polynomial; coefficients are returned highest power first

# Evaluate the fitted polynomial at each year (np.polyval(fit, time) gives the same result)
curve = (fit[0] * time ** 3) + (fit[1] * time ** 2) + (fit[2] * time) + fit[3]
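The explicit expression for `curve` is exactly what `np.polyval` computes, because `np.polyfit` returns coefficients from the highest power down. Raising calendar years to the third power also produces large, nearly collinear columns, so fitting in terms of years since a reference year (here the mean year, an arbitrary choice) can improve numerical conditioning without changing the fitted curve. A sketch on a synthetic, exactly cubic series (not the Auckland data):

```python
import numpy as np

# Synthetic, made-up series that is exactly cubic in time
time = np.arange(1900, 2021, dtype=float)
height = 1.60 + 0.0015 * (time - 1900) + 2e-7 * (time - 1900) ** 3

fit = np.polyfit(time, height, 3)
curve_manual = (fit[0] * time ** 3) + (fit[1] * time ** 2) + (fit[2] * time) + fit[3]
curve_polyval = np.polyval(fit, time)  # same evaluation, highest power first

# Fit against years since the mean year instead of raw calendar years
t0 = time.mean()
fit_centred = np.polyfit(time - t0, height, 3)
curve_centred = np.polyval(fit_centred, time - t0)

print(np.max(np.abs(curve_manual - curve_polyval)))
print(np.max(np.abs(curve_polyval - curve_centred)))
```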

### Sea level rise in a time of climate change
#### Exercise 5
Australian scientists have shown that historical data also support our linear regression result of a [1 to 2 mm/yr rise in sea level](https://en.wikipedia.org/wiki/Sea_level_rise) averaged over the last 100+ years. However, according to the latest IPCC assessment, tide gauge and satellite data from the last decade(s) indicate that sea level may now be rising at double this rate! With this in mind, have another look at the Auckland data: most sea level values in the 2000s fall *above* the regression line. Of course, it would take more than data from a single tide gauge to establish that this departure is significant, especially given that the Auckland tide gauge has been moved to a new site three times since 2000. However, for the sake of a fitting exercise, write code to fit these data with a polynomial (see the example above) and [an exponential function](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html). Plot the data with the linear, polynomial, and exponential fits on the same plot, and calculate the residuals for each of the fits.
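`scipy.optimize.curve_fit` fits an arbitrary model function by non-linear least squares, but unlike `polyfit` it needs a reasonable starting guess `p0`. The sketch below assumes one possible exponential form, $h(t) = a\,e^{b(t - t_0)} + c$, with a reference year $t_0 = 1900$ chosen to keep the exponential well scaled. The model, the starting guess, and the synthetic data are all illustrative assumptions, not the Auckland record. It also compares the RMS misfit of the exponential and linear fits.

```python
import numpy as np
from scipy.optimize import curve_fit

t0 = 1900.0  # reference year (an arbitrary, assumed choice)

def exp_model(t, a, b, c):
    """Assumed exponential sea level model: a * exp(b * (t - t0)) + c."""
    return a * np.exp(b * (t - t0)) + c

# Synthetic, made-up data: an exponential curve plus a small deterministic wiggle
t = np.arange(1900, 2021, dtype=float)
h = exp_model(t, 0.05, 0.02, 1.60) + 0.005 * np.sin(0.7 * t)

# Non-linear least squares, starting from a rough guess for (a, b, c)
params, pcov = curve_fit(exp_model, t, h, p0=(0.1, 0.015, 1.6))
resid_exp = h - exp_model(t, *params)

# Straight-line fit for comparison
lin = np.polyfit(t, h, 1)
resid_lin = h - np.polyval(lin, t)

rms_exp = np.sqrt(np.mean(resid_exp ** 2))
rms_lin = np.sqrt(np.mean(resid_lin ** 2))
print("fitted a, b, c:", params)
print("RMS misfit, exponential:", rms_exp)
print("RMS misfit, linear:", rms_lin)
```

If `curve_fit` fails to converge on real data, a better `p0` (for example, read off a plot) is usually the first thing to try.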

In [1]:
# Write code here

#### Exercise 6
Are the polynomial residuals smaller than those of the linear fit? What about the exponential residuals? Are any of the residuals closer to the reported standard error in the data? What do you conclude?

<font color='red'>*Type your answers here*</font>

### Using multiple datasets to understand a trend
As we have seen, it is often difficult to confirm or reject particular models based on one dataset. This is why geophysicists combine multiple different types of data to build a better picture of what is physically happening. However, it often takes a bit of geophysics detective work to prove or disprove a particular hypothesis...

#### Exercise 7
The array below gives the average sea level height for each year from 2011 to 2020, recorded at a location somewhere in New Zealand. Write some code to plot the data.

In [None]:
years = range(2011,2021)
sea_level = np.array([2.99, 3.2, 3.28, 3.29, 3.31, 3.21, 2.32, 2.19, 2.25, 2.22])

# Plot the data here

Is this trend different from what you would expect? What do you think could be causing this discrepancy? (Note: measurement error has been ruled out as a possible cause.)

<font color='red'>*Type your answer here*</font>

#### Exercise 8
To confirm or rule out our hypothesis for what we observe, we need a second type of data. Locate and download a second dataset that will help you explain the trend in the sea level data, and explain why you chose it in the space below. You may find these links helpful:

[Map of sea level gauge locations in New Zealand](https://www.linz.govt.nz/sea/tides/sea-level-data/sea-level-data-downloads)

[Map of geodetic sensors in New Zealand](https://www.geonet.org.nz/data/gnss/map)

[Instructions for how to download data from GeoNet](https://fits.geonet.org.nz/api-docs/endpoint/observation)

<font color='red'>*Type your answer here*</font>

#### Exercise 9
Plot your dataset with the sea level data and explain how they combine to help you understand the physical processes at work.
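When two datasets share a time axis but have different units, matplotlib's `Axes.twinx()` gives the second dataset its own y-axis. A minimal sketch with two made-up series (the values and axis labels are placeholders, not real data):

```python
import numpy as np
import matplotlib.pyplot as plt

# Two hypothetical series sharing the same years but with different units
years = np.arange(2011, 2021)
series_a = np.linspace(3.0, 3.3, years.size)   # made-up values (m)
series_b = np.linspace(0.0, 30.0, years.size)  # made-up values (mm)

fig, ax1 = plt.subplots()
ax1.plot(years, series_a, 'o-', color='tab:blue')
ax1.set_xlabel('Year')
ax1.set_ylabel('Sea level (m)', color='tab:blue')

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(years, series_b, 's--', color='tab:red')
ax2.set_ylabel('Second dataset (mm)', color='tab:red')

fig.tight_layout()
plt.show()
```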

In [None]:
# Make a plot here

<font color='red'>*Type your answer here*</font>