# Homework 5: Detecting Trends and Change Points in Time Series

## In class we analyzed the annual maxima in the Turkey River at Garber, IA to test for trends and change points. Here we will apply the same methods to test for trends and change points in the annual minimum 7-day flows at the same site.

### 1. Compute the annual minimum 7-day flows. Drop the data before WY 1933.

In [None]:
!pip install dataretrieval
!pip install pmdarima
!pip install astropy
!pip install lmoments3
!pip install pymannkendall
!pip install numpy==1.24.0

In [None]:
from google.colab import drive
import numpy as np
import scipy.stats as ss
import pandas as pd
import matplotlib.pyplot as plt
import dataretrieval.nwis as nwis
from scipy.signal import periodogram
import statsmodels.api as sm
import seaborn as sns
import pymannkendall as mk
import statsmodels.formula.api as smf
from astropy.stats import bootstrap as bootstrap
from pmdarima.arima import auto_arima
from statsmodels.tsa.ar_model import AutoReg

# allow access to google drive
drive.mount('/content/drive')

!cp "drive/MyDrive/Colab Notebooks/CE6280/Homeworks/HW4_utils.py" .
from HW4_utils import *

$\color{red}{\text{Run the code below to get the annual 7-day minima, min7dayQ, starting in WY1933, which you will use for the time series analysis.}}$

In [None]:
flow_df = nwis.get_record(sites='05412500', service='dv', parameterCd='00060', start='1932-10-01', end='2024-09-30') # Turkey River at Garber, IA

# find water year of each data point
flow_df['Year'] = flow_df.index.year
flow_df['Month'] = flow_df.index.month
flow_df['WY'] = flow_df['Year']
# October-December belong to the next water year; use .loc to avoid chained assignment
flow_df.loc[flow_df['Month'] >= 10, 'WY'] += 1

# find minimum 7-day flows each year
years = np.arange(np.min(flow_df['WY']), np.max(flow_df['WY'])+1, 1)
min7dayQ = np.zeros(len(years))
for i,year in enumerate(years):
    yearlyData = flow_df.loc[flow_df['WY'] == year, '00060_Mean'].to_numpy()
    sevenDayQ = np.zeros(len(yearlyData)-7+1)
    for j in range(len(yearlyData)-7+1):
        sevenDayQ[j] = np.mean(yearlyData[j:(j+7)])

    min7dayQ[i] = np.min(sevenDayQ)

min7dayQ

### Use the PPCC test to determine if min7dayQ are normally distributed and report your conclusion. If they are not normally distributed, transform them. If you transform the data, what transformation do you choose and why? What is the result of a PPCC test on the transformed data with the normal distribution? (10 pts)
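
As a rough illustration of the idea behind the PPCC test, the sketch below computes the probability plot correlation coefficient with `scipy.stats.probplot` and estimates a lower-tail p-value by Monte Carlo simulation. It is an assumption-laden stand-in, not the PPCC routine from class: `ppcc_normal` is a hypothetical helper, and the synthetic lognormal sample `demo` only takes the place of `min7dayQ`.

```python
import numpy as np
import scipy.stats as ss

def ppcc_normal(x, n_sim=2000, seed=0):
    """Hypothetical helper: PPCC against the normal distribution.

    Returns (r, p_value), where r is the correlation between the sorted data
    and normal quantiles, and p_value is the fraction of simulated normal
    samples of the same size whose r is at least as low as the observed r.
    """
    x = np.asarray(x, dtype=float)
    r = ss.probplot(x, dist="norm")[1][2]  # (slope, intercept, r) -> r
    rng = np.random.default_rng(seed)
    r_null = np.empty(n_sim)
    for k in range(n_sim):
        sim = rng.standard_normal(len(x))
        r_null[k] = ss.probplot(sim, dist="norm")[1][2]
    p_value = np.mean(r_null <= r)  # small r means poor fit, so use the lower tail
    return r, p_value

# Synthetic, right-skewed stand-in for the annual 7-day minima (assumption)
rng = np.random.default_rng(42)
demo = rng.lognormal(mean=5.0, sigma=0.8, size=92)

r_raw, p_raw = ppcc_normal(demo)          # raw data
r_log, p_log = ppcc_normal(np.log(demo))  # log-transformed data
print(f"raw: r={r_raw:.3f}, p={p_raw:.3f}")
print(f"log: r={r_log:.3f}, p={p_log:.3f}")
```

A low p-value means the observed correlation is lower than expected for a normal sample of that size. Whether a log transform is the right choice for the real data has to be judged from the data themselves.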

### 2. Using the raw or transformed time series of seven-day annual minima beginning in WY 1933 (whichever is normally distributed), plot the time series, its periodogram, and its ACF. Based on these plots, does there appear to be a trend or any seasonality in the time series? Explain. (10 pts)
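
The quantities behind the periodogram and ACF plots can be sketched as follows. This is only an illustration on a synthetic AR(1) series `z` (an assumption standing in for the real series). `sample_acf` is a hypothetical helper, and the $\pm 1.96/\sqrt{n}$ band is the usual approximate 95% band for white noise.

```python
import numpy as np
from scipy.signal import periodogram

def sample_acf(x, max_lag):
    """Hypothetical helper: sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])

# Synthetic AR(1) series standing in for the annual series (assumption)
rng = np.random.default_rng(1)
n = 92
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.6 * z[t - 1] + rng.standard_normal()

# Periodogram with one sample per year, so frequencies are in cycles/year
freqs, power = periodogram(z, fs=1.0)

# ACF and the approximate 95% white-noise band
acf = sample_acf(z, max_lag=20)
conf_band = 1.96 / np.sqrt(n)
print("lags outside the band:", np.where(np.abs(acf[1:]) > conf_band)[0] + 1)
```

Power concentrated near zero frequency hints at a trend or long-memory behavior, while an isolated peak at a higher frequency hints at a cycle. Both `power` and `acf` can be passed to `plt.plot` or `plt.stem` for the requested figures.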

### 3. Let's assume there are no seasons. Do the ACF and PACF suggest there is autocorrelation? Which ARMA model would you expect to fit the data best? (5 pts)
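
To make the link between the ACF and PACF concrete, here is a minimal sketch of the Durbin-Levinson recursion, which converts autocorrelations into partial autocorrelations. `pacf_durbin_levinson` is a hypothetical helper written for illustration. The input is the theoretical ACF of an AR(1) process with coefficient 0.6 (an assumption), not the homework data.

```python
import numpy as np

def pacf_durbin_levinson(rho, max_lag):
    """Hypothetical helper: partial autocorrelations for lags 0..max_lag.

    rho is an array of autocorrelations with rho[0] == 1.
    """
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.zeros(0)  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = np.array([phi_kk])
        else:
            num = rho[k] - np.sum(phi_prev * rho[k - 1:0:-1])
            den = 1.0 - np.sum(phi_prev * rho[1:k])
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new one
            phi = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
        phi_prev = phi
    return pacf

# Theoretical ACF of an AR(1) process: rho_k = 0.6**k (assumption)
rho = 0.6 ** np.arange(11)
pacf = pacf_durbin_levinson(rho, max_lag=10)
print(np.round(pacf, 3))
```

For a pure AR($p$) process the PACF cuts off after lag $p$ while the ACF decays gradually. For a pure MA($q$) process the roles are reversed, which is the pattern to look for when hypothesizing an order.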

### 4. Fit the best hypothesized ARMA model from question 3 (including an intercept) and print its summary. Plot the observed and fitted values, the ACF of the residuals, the residuals vs. fitted values, and perform a PPCC test for normality of the residuals. How do the fit and diagnostics look? (15 pts)

### 5. Using regression, test for a trend in the residuals of the ARMA model from question 4. Is the trend parameter statistically significant (report the p-value)? Use the Mann-Kendall test on the residuals of the ARMA model and compare its associated p-value. What do you conclude under each test? (10 pts)
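
The two trend tests can be compared on a toy example as sketched below. The series `resid` is synthetic, with a trend built in (an assumption), and Kendall's tau against time from `scipy.stats.kendalltau` is used as a stand-in for the Mann-Kendall test. The notebook's imported `pymannkendall` package provides `mk.original_test` for the test itself.

```python
import numpy as np
import scipy.stats as ss

# Synthetic residual series with a built-in upward trend (assumption)
rng = np.random.default_rng(7)
t = np.arange(1933, 2025)                      # water years 1933-2024
resid = 0.03 * (t - t.mean()) + rng.standard_normal(len(t))

# Parametric test: OLS slope of residuals on time
ols = ss.linregress(t, resid)

# Nonparametric test: Kendall's tau between time and residuals,
# the rank correlation that underlies the Mann-Kendall statistic
tau, p_tau = ss.kendalltau(t, resid)

print(f"OLS slope = {ols.slope:.4f} per year, p = {ols.pvalue:.4g}")
print(f"Kendall tau = {tau:.3f}, p = {p_tau:.4g}")
```

The regression test assumes normally distributed, independent errors, while the rank-based test does not assume normality. Agreement between the two p-values is reassuring, and disagreement usually points to outliers or skewness.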

### 6. Might there be a change point in the mean/median/distribution of the residuals, and if so, where is it most likely to be? Report your conclusions to this question using regression, the Wilcoxon rank-sum test, and the K-S test, respectively. (15 pts)
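
One way to search for a change point is to split the series at each candidate position and compare the two segments, as sketched below. `scan_change_points` is a hypothetical helper, the series `x` is synthetic with a shift inserted at index 50 (an assumption), and the regression on a step indicator is computed as the equivalent pooled-variance two-sample t-test.

```python
import numpy as np
import scipy.stats as ss

def scan_change_points(x, min_seg=10):
    """Hypothetical helper: p-values of three two-sample tests at each split.

    Returns an array with columns (split index, t-test p, rank-sum p, K-S p).
    Each segment keeps at least min_seg observations.
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for k in range(min_seg, len(x) - min_seg + 1):
        before, after = x[:k], x[k:]
        # OLS on a 0/1 step indicator is equivalent to the pooled t-test
        p_t = ss.ttest_ind(before, after, equal_var=True).pvalue
        p_w = ss.ranksums(before, after).pvalue    # shift in location
        p_ks = ss.ks_2samp(before, after).pvalue   # change in distribution
        rows.append((k, p_t, p_w, p_ks))
    return np.array(rows)

# Synthetic series with a step change at index 50 (assumption)
rng = np.random.default_rng(3)
x = rng.standard_normal(92)
x[50:] += 2.0

res = scan_change_points(x)
best = {name: int(res[np.argmin(res[:, col]), 0])
        for name, col in [("regression", 1), ("rank-sum", 2), ("K-S", 3)]}
print(best)
```

Because the minimum p-value is selected from many candidate splits, it overstates the evidence for a change. Treat the location as the most likely change point rather than as a formally significant one.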

### 7. Use $\texttt{auto_arima}$ to fit another ARMA model to the raw or transformed time series of seven-day annual minima beginning in WY 1933 (whichever is normally distributed). This will find the ARMA model that minimizes the AIC. Plot the observed and fitted values, the ACF of the residuals, the residuals vs. fitted values, and perform a PPCC test for normality of the residuals. How do the fit and diagnostics look? How does the model compare with the model you hypothesized in question 3 and fit in question 4? What does the optimal order of the model ($p$, $d$, $q$) imply? (15 pts)