## Prophet Water Level Outliers

Use groundwater level measurements from IntellusNM.com to identify outliers, trends, seasonality, and upset events.

This notebook contains basic statistical analysis and visualization of the data.

### Data Sources
- summary : Processed file from notebook 1-Data_Prep

### Changes
- 02-19-2024 : Started project

In [1]:
import pandas as pd
from pathlib import Path
from datetime import datetime
import seaborn as sns
import prophet
import plotly
import matplotlib.pyplot as plt
import numpy as np


In [2]:
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot
from prophet.plot import plot_plotly, plot_components_plotly
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error


In [3]:
%matplotlib inline

### File Locations

In [4]:
today = datetime.today()
in_file = Path.cwd() / "data" / "processed" / f"summary_{today:%b-%d-%Y}.pkl"
report_dir = Path.cwd() / "reports"
report_file = report_dir / f"Excel_Analysis_{today:%b-%d-%Y}.xlsx"
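
The file names embed the run date through an f-string format spec. The following is a small sketch of how `%b-%d-%Y` renders, using the project start date from the Changes list as a fixed example. `example_day`, `in_name`, and `report_name` are illustrative names that do not appear in the notebook, and the month abbreviation assumes an English (or default C) locale.

```python
from datetime import datetime

# Fixed date for illustration only (the project start date, 02-19-2024)
example_day = datetime(2024, 2, 19)

# The format spec after the colon is passed to strftime
in_name = f"summary_{example_day:%b-%d-%Y}.pkl"
report_name = f"Excel_Analysis_{example_day:%b-%d-%Y}.xlsx"

print(in_name)      # summary_Feb-19-2024.pkl
print(report_name)  # Excel_Analysis_Feb-19-2024.xlsx

# Without the f prefix the braces are kept literally and no date is substituted
literal_name = "Excel_Analysis_{example_day:%b-%d-%Y}.xlsx"
print("{" in literal_name)  # True
```

Because the input name depends on the run date, the processed file from notebook 1-Data_Prep has to be produced on the same day for `in_file` to resolve.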

In [5]:
df = pd.read_pickle(in_file)
df = df.rename(columns={'Measurement Date Time':'ds', 'Groundwater Elevation':'y'})
df.dtypes

Site ID                                             object
Location ID                                         object
ds                                          datetime64[ns]
Groundwater Measurement                            float64
y                                                  float64
Groundwater Level Comments                          object
Groundwater Level Data Quality Code                 object
Groundwater Level Validation Reason Code            object
dtype: object
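
Prophet expects a datetime column named `ds` and a numeric column named `y`, which is why the two columns are renamed above. A minimal sketch of a sanity check that could be run before fitting follows. `check_prophet_frame` is a hypothetical helper, not part of the notebook or of Prophet, and the sample values are synthetic.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_prophet_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate the ds/y columns and drop incomplete rows."""
    missing = {"ds", "y"} - set(frame.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    if not is_datetime64_any_dtype(frame["ds"]):
        raise TypeError("'ds' must be a datetime column")
    if not is_numeric_dtype(frame["y"]):
        raise TypeError("'y' must be a numeric column")
    # Drop rows without a timestamp or a value and order them by time
    return frame.dropna(subset=["ds", "y"]).sort_values("ds").reset_index(drop=True)


# Synthetic example data, not taken from the IntellusNM dataset
sample = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
    "y": [5900.2, 5900.0, None],
})
clean = check_prophet_frame(sample)
print(len(clean))  # 2
```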

In [6]:
%pip show prophet

Name: prophet
Version: 1.1.5
Summary: Automatic Forecasting Procedure
Home-page: 
Author: 
Author-email: "Sean J. Taylor" <sjtz@pm.me>, Ben Letham <bletham@fb.com>
License: MIT
Location: /Users/paulmark/JupyterNotebooks/Water Level Outliers/.venv/lib/python3.11/site-packages
Requires: cmdstanpy, holidays, importlib-resources, matplotlib, numpy, pandas, tqdm
Required-by: 
Note: you may need to restart the kernel to use updated packages.


### Perform Data Analysis - Loop through locations to identify anomalies

In [14]:
location = df[['Location ID','Site ID']].drop_duplicates().reset_index()

In [15]:
markers = {'N':'X', 'Y':'o'}
hue_order = ['N','Y']
style_order = ['N','Y']
iw = 0.99


for location_id, site_id in zip(location['Location ID'], location['Site ID']):
	# Select the measurements for this location and site
	export_subset = df[(df['Location ID'] == location_id) & (df['Site ID'] == site_id)]

	# Skip locations with 10 or fewer measurements; too few to fit a model
	if len(export_subset) <= 10:
		continue

	# Instantiate a new Prophet model with yearly and weekly seasonality
	model = Prophet(interval_width=iw, yearly_seasonality=True, weekly_seasonality=True)

	# Fit the model on this location's measurements
	model.fit(export_subset)

	# Predict on the same dates to get the in-sample uncertainty bounds
	forecast = model.predict(export_subset)

	# Merge actual and predicted values
	performance = pd.merge(export_subset, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']], on='ds')

	# Flag measurements that fall outside the uncertainty interval
	outside = (performance['y'] < performance['yhat_lower']) | (performance['y'] > performance['yhat_upper'])
	performance['anomaly'] = outside.astype(int)

	anomalies = performance[performance['anomaly'] == 1].sort_values(by='ds')
	if anomalies.empty:
		continue

	# Append to one CSV, writing the header only when the file is first created
	out_file = Path('anomalies.csv')
	anomalies.to_csv(out_file, mode='a', index=True, header=not out_file.exists())
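
The anomaly rule in the loop is a comparison of each measurement against the forecast's lower and upper bounds. The sketch below isolates that step so it can be checked without fitting a model. `flag_anomalies` is a hypothetical helper and the bounds are synthetic values, not Prophet output.

```python
import pandas as pd


def flag_anomalies(performance: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mark rows whose y lies outside [yhat_lower, yhat_upper]."""
    outside = (performance["y"] < performance["yhat_lower"]) | (
        performance["y"] > performance["yhat_upper"]
    )
    # 1 marks an anomaly and 0 a normal measurement, as in the loop
    return performance.assign(anomaly=outside.astype(int))


# Synthetic measurements and bounds for illustration only
sample = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "y": [5900.0, 5912.5, 5899.0],
    "yhat_lower": [5895.0, 5896.0, 5900.0],
    "yhat_upper": [5905.0, 5906.0, 5910.0],
})
flagged = flag_anomalies(sample)
print(flagged["anomaly"].tolist())  # [0, 1, 1]
```

With `interval_width` set to 0.99, the bounds describe a 99% uncertainty interval, so only measurements far from the fitted trend and seasonality are expected to be flagged.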


### Save Excel file into reports directory

Save an Excel file with intermediate results to the reports directory.

In [None]:
writer = pd.ExcelWriter(report_file, engine='xlsxwriter')

In [None]:
df.to_excel(writer, sheet_name='Report')

In [None]:
# ExcelWriter.save() was removed in pandas 2.0; close() writes the file to disk
writer.close()
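
The three cells above can also be combined by using `pd.ExcelWriter` as a context manager, which closes the file when the block exits. This is a minimal sketch under a few assumptions: `save_report` is a hypothetical helper that is not part of the notebook, the engine is left to the pandas default rather than fixed to `xlsxwriter`, and the parent directory is created if it is missing.

```python
from pathlib import Path

import pandas as pd


def save_report(frame: pd.DataFrame, path, sheet_name: str = "Report") -> Path:
    """Hypothetical helper: write one DataFrame to an Excel sheet and return the path."""
    path = Path(path)
    # Create the reports directory if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    # The context manager saves and closes the workbook on exit
    with pd.ExcelWriter(path) as writer:
        frame.to_excel(writer, sheet_name=sheet_name)
    return path
```

In the notebook this would be called as `save_report(df, report_file)`, provided an Excel engine such as `xlsxwriter` or `openpyxl` is installed.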