# Scheduled Query Execution Report

A notebook to report on failed or long-running scheduled queries, providing insights into reliability issues.

Here's a breakdown of the steps:
1. Retrieve Data
2. Convert Table to a DataFrame
3. Create an Interactive Slider Widget & Data Preparation
4. Create a Heatmap for Visualizing Scheduled Query Execution

## 1. Retrieve Data

First, we'll write a SQL query to retrieve the execution history of scheduled queries, including each run's state, timing metrics, and error details.

The data comes from the `snowflake.account_usage.task_history` view.

In [None]:
SELECT 
    name,
    database_name,
    query_id,
    query_text,
    schema_name,
    scheduled_time,
    query_start_time,
    completed_time,
    DATEDIFF('second', query_start_time, completed_time) as execution_time_seconds,
    state,
    error_code,
    error_message
FROM snowflake.account_usage.task_history
-- Look back 90 days so the data covers the slider's maximum window in step 3
WHERE scheduled_time >= DATEADD(days, -90, CURRENT_TIMESTAMP())
ORDER BY scheduled_time DESC;

## 2. Convert Table to a DataFrame

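Next, we'll convert the query result to a pandas DataFrame and store it in `df` so that later cells can use it.

In [None]:
# `sql_data` is the result of the SQL cell above
df = sql_data.to_pandas()
df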

## 3. Create an Interactive Slider Widget & Data Preparation

Here, we'll create an interactive slider for selecting the number of days to analyze. Moving the slider re-filters the DataFrame to that time window, measured back from the most recent scheduled time in the data.

Next, we'll reshape the data by counting executions per task name, hour of day, and state. These counts feed the heatmap in the next step.

In [None]:
import pandas as pd
import streamlit as st
import altair as alt

# Create date filter slider
st.subheader("Select time duration")
days = st.slider('Select number of days to analyze', 
                 min_value=10, 
                 max_value=90, 
                 value=30, 
                 step=10)
    
# Filter data according to day duration
latest_date = pd.to_datetime(df['SCHEDULED_TIME']).max()
cutoff_date = latest_date - pd.Timedelta(days=days)
filtered_df = df[pd.to_datetime(df['SCHEDULED_TIME']) > cutoff_date].copy()
    
# Prepare data for heatmap
filtered_df['HOUR_OF_DAY'] = pd.to_datetime(filtered_df['SCHEDULED_TIME']).dt.hour
filtered_df['HOUR_DISPLAY'] = filtered_df['HOUR_OF_DAY'].apply(lambda x: f"{x:02d}:00")
    
# Calculate frequency count by task name, hour, and state
agg_df = filtered_df.groupby(['NAME', 'HOUR_DISPLAY', 'STATE']).size().reset_index(name='COUNT')

st.warning(f"Analyzing data for the last {days} days!")
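
To see what the filtering and aggregation produce without a Snowflake connection, here is a minimal sketch on a hand-made DataFrame. The sample rows, the task names, and the `aggregate_by_hour` helper are made up for illustration; only the column names (`NAME`, `SCHEDULED_TIME`, `STATE`) match the query in step 1.

```python
import pandas as pd

# Hypothetical sample rows standing in for TASK_HISTORY output.
sample_df = pd.DataFrame({
    "NAME": ["LOAD_ORDERS", "LOAD_ORDERS", "LOAD_ORDERS", "REFRESH_STATS"],
    "SCHEDULED_TIME": [
        "2024-01-31 09:00:00",
        "2024-01-30 09:00:00",
        "2024-01-01 09:00:00",  # falls outside a 10-day window
        "2024-01-31 14:00:00",
    ],
    "STATE": ["SUCCEEDED", "FAILED", "SUCCEEDED", "SUCCEEDED"],
})


def aggregate_by_hour(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Keep the last `days` days of data and count runs per task, hour, and state."""
    scheduled = pd.to_datetime(df["SCHEDULED_TIME"])
    # The window is measured back from the latest scheduled time in the data.
    in_window = scheduled > scheduled.max() - pd.Timedelta(days=days)
    recent = df[in_window].copy()
    recent["HOUR_DISPLAY"] = scheduled[in_window].dt.strftime("%H:00")
    return (
        recent.groupby(["NAME", "HOUR_DISPLAY", "STATE"])
        .size()
        .reset_index(name="COUNT")
    )


agg = aggregate_by_hour(sample_df, days=10)
print(agg)
```

Each row of the result is one heatmap cell: a task, an hour label, a state, and the number of runs that fall into that combination.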

## 4. Create a Heatmap for Visualizing Scheduled Query Execution

Finally, we'll generate a heatmap and a summary statistics table showing how often each task ran in each state (e.g. `SUCCEEDED`, `FAILED`, `SKIPPED`), so that failure-prone tasks and hours are easy to spot.

In [None]:
# Create heatmap
chart = alt.Chart(agg_df).mark_rect(
    stroke='black',
    strokeWidth=1
).encode(
    x=alt.X('HOUR_DISPLAY:O', 
            title='Hour of Day',
            axis=alt.Axis(
                labels=True,
                tickMinStep=1,
                labelOverlap=False
            )),
    y=alt.Y('NAME:N', 
            title='',
            axis=alt.Axis(
                labels=True,
                labelLimit=200,
                tickMinStep=1,
                labelOverlap=False,
                labelPadding=10
            )),
    color=alt.Color('COUNT:Q', 
                    title='Number of Executions'),
    row=alt.Row('STATE:N', 
                title='Task State',
                header=alt.Header(labelAlign='left')),
    tooltip=[
        alt.Tooltip('NAME', title='Task Name'),
        alt.Tooltip('HOUR_DISPLAY', title='Hour'),
        alt.Tooltip('STATE', title='State'),
        alt.Tooltip('COUNT', title='Number of Executions')
    ]
).properties(
    height=100,
    width=450
).configure_view(
    stroke=None,
    continuousWidth=300
).configure_axis(
    labelFontSize=10
)

# Display the chart
st.subheader(f'Task Execution Frequency by State ({days} Days)')
st.altair_chart(chart)

# Optional: Display summary statistics
st.subheader("Summary Statistics")
summary_df = filtered_df.groupby('NAME').agg({
    'STATE': lambda x: pd.Series(x).value_counts().to_dict()
}).reset_index()

# Format the state counts as separate columns
state_counts = pd.json_normalize(summary_df['STATE']).fillna(0).astype(int)
summary_df = pd.concat([summary_df['NAME'], state_counts], axis=1)

st.dataframe(summary_df)
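
The introduction mentions long-running queries, but the cells above only count runs by state. As one possible extension, the sketch below uses `pd.crosstab` to build the same per-task state counts and adds a failure rate and a count of slow runs based on `EXECUTION_TIME_SECONDS`. The `summarize_tasks` helper, the sample rows, and the 300-second threshold are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd


def summarize_tasks(df: pd.DataFrame, slow_threshold_seconds: float = 300) -> pd.DataFrame:
    """Per-task state counts, failure rate, and number of slow runs."""
    # One row per task, one column per state, zero where a state never occurred.
    counts = pd.crosstab(df["NAME"], df["STATE"])
    counts.columns.name = None
    total_runs = counts.sum(axis=1)
    failed = counts["FAILED"] if "FAILED" in counts.columns else 0

    summary = counts.copy()
    summary["FAILURE_RATE"] = (failed / total_runs).round(2)

    # Runs with a missing execution time compare as False and are not counted as slow.
    slow = df["EXECUTION_TIME_SECONDS"] > slow_threshold_seconds
    summary["SLOW_RUNS"] = slow.groupby(df["NAME"]).sum().astype(int)
    return summary.reset_index()


# Hypothetical sample rows; the threshold of 300 seconds is arbitrary.
sample = pd.DataFrame({
    "NAME": ["LOAD_ORDERS", "LOAD_ORDERS", "LOAD_ORDERS", "REFRESH_STATS"],
    "STATE": ["SUCCEEDED", "FAILED", "SUCCEEDED", "SUCCEEDED"],
    "EXECUTION_TIME_SECONDS": [42, 5, 610, 12],
})
summary = summarize_tasks(sample)
print(summary)
```

In the notebook, the same helper could be called on `filtered_df` and shown with `st.dataframe`, with the threshold tuned to whatever counts as slow for your tasks.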

## Want to learn more?

- Snowflake Docs on [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage) and [TASK_HISTORY view](https://docs.snowflake.com/en/sql-reference/account-usage/task_history)
- More about [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake)
- For more inspiration on how to use Streamlit widgets in Notebooks, check out [Streamlit Docs](https://docs.streamlit.io/) and this list of what is currently supported inside [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake#label-notebooks-streamlit-support)
- Check out the [Altair User Guide](https://altair-viz.github.io/user_guide/data.html) for further information on customizing Altair charts