In [None]:
import pandas as pd

alerts_df = pd.read_parquet("../../data/nps/nps_public_data_alerts.parquet")

Because pandas works differently from SQL, window functions in pandas are mostly used for rolling calculations such as rolling averages. The pandas [documentation](https://pandas.pydata.org/docs/user_guide/window.html) reflects this emphasis, and we'll demonstrate it here.

pandas supports 4 types of windowing operations:
- Rolling window: Generic fixed or variable sliding window over the values.
- Weighted window: Weighted, non-rectangular window supplied by the scipy.signal library.
- Expanding window: Accumulating window over the values.
- Exponentially Weighted window: Accumulating and exponentially weighted window over the values.
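To make the list above concrete, here is a minimal sketch of three of the four window types on a tiny, made-up Series (the values are invented for illustration and are not from the alerts data). The weighted window is left out because it needs SciPy installed.

```python
import pandas as pd

# Toy values, invented for illustration only.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: fixed-size sliding window of 3 rows.
rolling_sum = s.rolling(window=3).sum()

# Expanding: window grows from the first row up to the current row.
expanding_sum = s.expanding().sum()

# Exponentially weighted: recent rows count more than older ones.
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

summary = pd.DataFrame(
    {
        "value": s,
        "rolling_3": rolling_sum,
        "expanding": expanding_sum,
        "ewm": ewm_mean,
    }
)
print(summary)
```

Note that the rolling column starts with `NaN` until the window has 3 rows to work with, while the expanding and exponentially weighted columns produce a value from the first row.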

We'll focus on rolling and expanding windows, since those are the most applicable to data transformation.

In [None]:
alerts_df["alert_date"] = pd.to_datetime(alerts_df["lastIndexedDate"]).dt.date

alerts_df.head()

Let's get a count of alerts by date and category.

In [None]:
num_alerts = (
    alerts_df.groupby(["alert_date", "category"])["description"].count().reset_index()
)

num_alerts.rename(columns={"description": "num_alerts"}, inplace=True)

num_alerts.tail()

If you're in analytics, you're likely aware that _rolling counts or averages_ can be incredibly valuable for monitoring trends over time. Pandas makes this quite easy. It will be helpful to first set the `alert_date` as our index.

In [None]:
num_alerts_reindexed = num_alerts.set_index("alert_date")

# rolling sum over the last 7 rows per category
# (this equals 7 days only if a category has a row for every date)
rolling_alerts_7 = (
    num_alerts_reindexed.groupby(["category"])["num_alerts"]
    .rolling(window=7)
    .sum()
    .reset_index()
)
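An integer `window` counts rows, not calendar days, so gaps in the dates stretch the window over a longer period. As a sketch of the alternative, pandas also accepts a time offset such as `"3D"` when the index is a `DatetimeIndex`. The example below uses a small, hypothetical frame (not the alerts data) and assumes the dates are stored as `datetime64` values, for example via `.dt.normalize()` instead of `.dt.date`.

```python
import pandas as pd

# Hypothetical counts for one category; 2023-01-03 and 2023-01-04 are missing.
toy = pd.DataFrame(
    {
        "alert_date": pd.to_datetime(
            ["2023-01-01", "2023-01-02", "2023-01-05", "2023-01-06"]
        ),
        "category": ["Caution"] * 4,
        "num_alerts": [1, 2, 3, 4],
    }
).set_index("alert_date")

# Row-based: sums the last 3 rows, regardless of how far apart they are.
by_rows = toy.groupby("category")["num_alerts"].rolling(window=3).sum()

# Time-based: sums the rows that fall within the trailing 3 calendar days.
by_days = toy.groupby("category")["num_alerts"].rolling(window="3D").sum()

print(pd.DataFrame({"by_rows": by_rows, "by_days": by_days}))
```

The two columns differ wherever dates are missing, which is a quick way to check whether a row-based window is a safe stand-in for a day-based one on your data.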

It can also be useful to compute rolling sums over several window sizes to compare trends.

In [None]:
rolling_alerts_7 = (
    num_alerts_reindexed.groupby(["category"])["num_alerts"]
    .rolling(window=7)
    .sum()
    .reset_index()
    .rename(columns={"num_alerts": "rolling_7"})
)
rolling_alerts_14 = (
    num_alerts_reindexed.groupby(["category"])["num_alerts"]
    .rolling(window=14)
    .sum()
    .reset_index()
    .rename(columns={"num_alerts": "rolling_14"})
)
rolling_alerts_28 = (
    num_alerts_reindexed.groupby(["category"])["num_alerts"]
    .rolling(window=28)
    .sum()
    .reset_index()
    .rename(columns={"num_alerts": "rolling_28"})
)

rolling_alerts_joined = rolling_alerts_7.merge(
    rolling_alerts_14, on=["alert_date", "category"]
).merge(rolling_alerts_28, on=["alert_date", "category"])

rolling_alerts_joined
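The three near-identical blocks and the chained merges above could also be written as a loop. The following is one possible sketch, not the approach used in this notebook: it uses `groupby(...).transform` so each rolling result lines up with the original rows and no merge is needed. The frame and the window sizes here are made up to keep the example small.

```python
import pandas as pd

# Toy data: two categories, four consecutive days each (values invented).
toy = pd.DataFrame(
    {
        "alert_date": pd.date_range("2023-01-01", periods=4).tolist() * 2,
        "category": ["Caution"] * 4 + ["Information"] * 4,
        "num_alerts": [1, 2, 3, 4, 10, 20, 30, 40],
    }
)

# Rolling windows depend on row order, so sort within each category first.
toy = toy.sort_values(["category", "alert_date"]).reset_index(drop=True)

# Small windows chosen to suit the toy data; swap in 7, 14, 28 for real use.
windows = [2, 3]
for w in windows:
    toy[f"rolling_{w}"] = toy.groupby("category")["num_alerts"].transform(
        lambda s, w=w: s.rolling(window=w).sum()
    )

print(toy)
```

The trade-off is that `transform` with a Python lambda can be slower than the built-in grouped rolling on large frames, so the explicit version above remains a reasonable choice.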

Visualization is outside the scope of this course, but whether you're an analyst or an engineer, plotting your results is an important gut check. You can do this easily with Plotly.

In [None]:
import plotly.express as px
import datetime

plot_df = rolling_alerts_joined[
    rolling_alerts_joined["alert_date"] > datetime.date(2023, 7, 1)
]

plot_cols = ["rolling_7", "rolling_14", "rolling_28"]

categories = ["Information", "Park Closure", "Caution"]

for category in categories:
    fig = px.line(
        data_frame=plot_df[plot_df["category"] == category],
        x="alert_date",
        y=plot_cols,
        title=f"Rolling '{category}' alerts",
    )

    fig.show()

Finally, if we're more interested in accumulating (expanding) windows, such as a running total per category:

In [None]:
num_alerts_reindexed.sort_values(by=["alert_date", "category"], inplace=True)

cumulative_alerts = (
    num_alerts_reindexed.groupby(["category"])["num_alerts"]
    .expanding()
    .sum()
    .reset_index()
    .rename(columns={"num_alerts": "cumulative"})
)
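For a plain running total, an expanding sum should give the same numbers as a grouped `cumsum`. The sketch below checks that on a small, invented frame. The expanding form is still worth knowing because it extends to other aggregations, such as `.mean()` or `.max()`.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "category": ["Caution", "Caution", "Information", "Information"],
        "num_alerts": [1, 2, 10, 20],
    }
)

# Expanding sum per category; drop the added group level to recover the row index.
via_expanding = (
    toy.groupby("category")["num_alerts"]
    .expanding()
    .sum()
    .reset_index(level=0, drop=True)
)

# Grouped cumulative sum, which keeps the original row index by default.
via_cumsum = toy.groupby("category")["num_alerts"].cumsum()

print(pd.DataFrame({"expanding": via_expanding, "cumsum": via_cumsum}))
```

One visible difference is the dtype: the expanding result is returned as floats, while `cumsum` keeps the integer dtype of the input.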

In [None]:
import plotly.express as px

fig = px.line(
    data_frame=cumulative_alerts,
    x="alert_date",
    y="cumulative",
    color="category",
    title="Cumulative Alerts",
)

fig.show()