# Precipitation exercises
***

## <font color=steelblue>Exercise 2 - Double-mass curve</font>

<font color=steelblue>The table *2MassCurve* in the file *RainfallData.xlsx* provides annual precipitation measured over a 17-year period at five gages in a region. Gage C was moved at the end of 1977. Carry out a double-mass curve analysis to check the consistency of the record at gage C, and make appropriate adjustments to correct any inconsistencies.</font>

In [None]:
import numpy as np

import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
sns.set_context('notebook')

## Introduction
A **double mass curve** is a plot of the cumulative data of one variable against the cumulative data of another variable (or against the average cumulative values of the same variable in different locations) during the same period.
    
<img src="img/Double mass curve.JPG" alt="Mountain View" style="width:450px">

> <font color=grey>Double-mass curve of precipitation data. *[(Double-Mass Curves. USGS, 1960)](https://pubs.usgs.gov/wsp/1541b/report.pdf)*.</font>
    
If no change occurred during the period, the plot is a straight line whose slope is the constant of proportionality between the two series. A break in the slope means that the constant of proportionality has changed.

The double-mass curve, when applied to precipitation, takes the form $Y=m \cdot X$, where $m$ is the slope. This form implies that the line has no intercept: both cumulative series start from zero.
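To illustrate how a change in the constant of proportionality shows up as a break in the slope, here is a minimal sketch. All values are made up for illustration and are not the exercise data. The arrays `x_annual` and `m` are hypothetical: a reference series and a ratio that drops from 1.2 to 0.9 after the third year.

```python
import numpy as np

# Hypothetical annual totals (mm) for the reference series; not the exercise data
x_annual = np.array([800.0, 950.0, 700.0, 1020.0, 880.0, 910.0])

# Assumed constants of proportionality: 1.2 for the first three years, 0.9 afterwards
m = np.array([1.2, 1.2, 1.2, 0.9, 0.9, 0.9])
y_annual = m * x_annual

# Cumulative series: the coordinates of the double-mass curve
X = np.cumsum(x_annual)
Y = np.cumsum(y_annual)

# Slope of each segment of the curve, the first one starting at the origin (0, 0)
slopes = np.diff(Y, prepend=0.0) / np.diff(X, prepend=0.0)
print(slopes.round(2))
```

The segment slopes reproduce the assumed ratios, so the curve is a straight line through the origin for the first three years and bends where the ratio changes.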

### Import data

In [None]:
# import data from sheet '2MassCurve' in file 'RainfallData.xlsx'


### The double-mass curve

In [None]:
# compute annual average across all gages


__Visualize the data__
We will first create a scatter plot comparing the annual precipitation series at gage C against the average across gages.

In [None]:
# scatter plot of annual precipitation


This type of plot shows a large dispersion, so it is not well suited to spotting anomalies. We can see neither trends nor the years with possible errors in the data set.

Instead, we will plot a __double-mass curve__. This plot is created from the series of __accumulated precipitation__. Because precipitation is never negative, the accumulated series never decreases, so the curve is continuous and rising, which makes it easier to identify anomalies in the precipitation records.

The function `cumsum` in `NumPy` calculates the accumulated series from a series of data. For instance:
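As a minimal sketch with made-up values (the array `annual` is hypothetical, not the exercise data), the accumulated series of four annual totals can be computed like this:

```python
import numpy as np

# Hypothetical annual precipitation (mm); not the exercise data
annual = np.array([800, 950, 700, 1020])

# Each element is the sum of all values up to and including that year
accumulated = np.cumsum(annual)
print(accumulated)  # values: 800, 1750, 2450, 3470
```

pandas objects offer the same operation as a method (`Series.cumsum()` and `DataFrame.cumsum()`), which is convenient when the data are stored in a `DataFrame`.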

In [None]:
# annual series of accumulated precipitation


To avoid duplicating data, we will not store the accumulated series; instead, we will call `cumsum` whenever we need it in the rest of the exercise.

Let's plot the double-mass curve for station C.

In [None]:
plt.figure(figsize=(5, 5))
# line of slope 1

# double-mass curve

# year 1978


# configuration
plt.axis('equal')
plt.xlabel('average across gages')
plt.ylabel('gage C')
plt.legend();

We can clearly observe a break in the line from 1978 onwards, matching the relocation of station C.

### Correct series
To correct the series we must decide whether we trust the data before or after the break in the double-mass curve. For that, we would need further information about the location and the instruments used. In this exercise, we will assume that the data up to 1978 are the most reliable, and we will correct the series from 1979 on.

The steps are the following:
1. Calculate the slope of the first part of the double-mass curve ($m_1$).
2. Calculate the slope of the second part of the double-mass curve ($m_2$).
3. Correct the series. Assuming that the correct slope is $m_1$, the corrected precipitation $P_c$ for the observed data $P_o$ is:
$$P_c = \frac{m_1}{m_2} \cdot P_o$$
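The correction step above can be sketched with made-up numbers. The slopes `m1` and `m2` and the observations `P_o` below are hypothetical values chosen for illustration; they are not results from the exercise data.

```python
import numpy as np

# Hypothetical values, not taken from the exercise data
m1 = 1.10   # assumed slope before the break (trusted period)
m2 = 0.88   # assumed slope after the break
P_o = np.array([600.0, 720.0, 655.0])  # observed precipitation after the break (mm)

# Correction factor and corrected precipitation, P_c = (m1 / m2) * P_o
factor = m1 / m2
P_c = factor * P_o
print(factor, P_c)
```

With these assumed slopes the factor is 1.25, so every observation after the break is increased by 25 %. Only the period after the break is multiplied; the trusted period is left unchanged.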

Therefore, we need to learn how to fit the slope of a linear regression of the form $y = m\cdot x$. We will first define a function that represents that form of the linear regression and later on we will use the function `scipy.optimize.curve_fit` to fit the slope $m$.
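A minimal sketch of such a fit is shown below, under stated assumptions: the function name `linear_reg` and the series `x` and `y` are hypothetical, and the data are synthetic (a slope of 1.1 plus small made-up deviations), not the exercise data. The closed-form least-squares slope for a line through the origin, $m = \sum x y / \sum x^2$, is computed alongside as a cross-check.

```python
import numpy as np
from scipy.optimize import curve_fit


def linear_reg(x, m):
    """Straight line through the origin: y = m * x."""
    return m * x


# Hypothetical cumulative series; not the exercise data
x = np.cumsum([800.0, 950.0, 700.0, 1020.0, 880.0])
y = 1.1 * x + np.array([5.0, -8.0, 3.0, -4.0, 6.0])  # slope 1.1 plus made-up deviations

# curve_fit returns the optimal parameters and their covariance matrix
popt, pcov = curve_fit(linear_reg, x, y)
m_fit = popt[0]

# Closed-form least-squares slope of a line through the origin, for comparison
m_closed = np.sum(x * y) / np.sum(x * x)
print(m_fit, m_closed)
```

Because `linear_reg` has a single parameter besides `x`, `curve_fit` estimates only the slope and the fitted line is forced through the origin, as the form $y = m \cdot x$ requires.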

In [None]:
# linear regression function


In [None]:
# import function scipy.optimize.curve_fit


#### Fit the regression for the first part
This first time we will do it step by step.

In [None]:
# define x and y in the linear regression


In [None]:
# compute cumulative series


In [None]:
# fit the regression up till 1978


In [None]:
plt.figure(figsize=(5, 5))
# linear regression


# double-mass curve


# configuration
plt.axis('equal')
plt.xlabel('average across gages')
plt.ylabel('gage C')
plt.legend();

#### Fit the regression for the second part

In [None]:
# fit the regression from 1978 onwards


#### Correct the data

In [None]:
# correction factor


In [None]:
# copy the original data in a new column

# multiply the second period by the correction factor


In [None]:
plt.figure(figsize=(5, 5))
# line of slope 1

# double-mass curve


# year 1978


# configuration
plt.axis('equal')
plt.xlabel('average across gages')
plt.ylabel('gage C')
plt.legend();

# save figure
plt.savefig('../output/Ex2_double-mass curve.png', dpi=300)

### Useful links:
[USGS report on double-mass curves](https://pubs.usgs.gov/wsp/1541b/report.pdf)<br>
[`SciPy.optimize.curve_fit` help](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html)