# Query Cost Monitoring

A notebook that breaks down compute costs by individual query, allowing teams to identify high-cost operations.

Here's our four-step process:
1. SQL query to retrieve query cost data
2. Convert SQL table to a Pandas DataFrame
3. Data preparation and filtering (using user input from Streamlit widgets)
4. Data visualization and exploration

## 1. Retrieve Data

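To gain insights on query costs, we'll write a SQL query that retrieves `credits_used` from the `snowflake.account_usage.metering_history` view and joins it with the associated user, database, schema and warehouse information from the `snowflake.account_usage.query_history` view. Because `metering_history` reports credit usage per hourly window, each query is matched to the metering rows whose time window fully contains the query's start and end times.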
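To make the join condition concrete, here is a minimal plain-Python sketch of the time-containment rule used in the `ON` clause. The sample rows are invented for illustration and are not real `ACCOUNT_USAGE` data. The helper name `containment_join` is also hypothetical.

```python
from datetime import datetime

# Hypothetical sample data: (query_id, start_time, end_time) for three queries
queries = [
    ("q1", datetime(2024, 5, 1, 9, 5), datetime(2024, 5, 1, 9, 20)),
    ("q2", datetime(2024, 5, 1, 9, 30), datetime(2024, 5, 1, 9, 45)),
    ("q3", datetime(2024, 5, 1, 9, 50), datetime(2024, 5, 1, 10, 10)),  # crosses 10:00
]

# Hypothetical hourly metering windows: (start_time, end_time, credits_used)
metering = [
    (datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 10), 1.5),
    (datetime(2024, 5, 1, 10), datetime(2024, 5, 1, 11), 0.8),
]


def containment_join(queries, metering):
    """Pair each query with every window where query start >= window start
    and query end <= window end, mirroring the SQL join condition."""
    joined = []
    for query_id, q_start, q_end in queries:
        for m_start, m_end, credits_used in metering:
            if q_start >= m_start and q_end <= m_end:
                joined.append((query_id, credits_used))
    return joined


joined = containment_join(queries, metering)
print(joined)  # [('q1', 1.5), ('q2', 1.5)]
```

Two consequences follow from this condition. `q3`, which crosses the 10:00 boundary, is not fully contained in any window and is dropped. `q1` and `q2` both report the window's full 1.5 credits, so `credits_used` is an hourly figure shared by every query in that window rather than a per-query measurement.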


In [None]:
SELECT
  query_history.query_id,
  query_history.query_text,
  query_history.start_time,
  query_history.end_time,
  query_history.user_name,
  query_history.database_name,
  query_history.schema_name,
  query_history.warehouse_name,
  query_history.warehouse_size,
  metering_history.credits_used,
  query_history.execution_time / 1000 AS execution_time_s  -- milliseconds to seconds
FROM
  snowflake.account_usage.query_history
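  JOIN snowflake.account_usage.metering_history
    ON query_history.warehouse_name = metering_history.name
    AND metering_history.service_type = 'WAREHOUSE_METERING'
    AND query_history.start_time >= metering_history.start_time
    AND query_history.end_time <= metering_history.end_time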
WHERE
  query_history.start_time >= DATEADD (DAY, -7, CURRENT_TIMESTAMP())
ORDER BY
  query_history.query_id;

## 2. Convert Table to a DataFrame

Next, we'll convert the result of the SQL cell above, referenced by its cell name `sql_data`, to a Pandas DataFrame named `py_dataframe`.


In [None]:
py_dataframe = sql_data.to_pandas()

## 3. Create an Interactive Slider Widget & Data Preparation

Here, we'll create an interactive slider for selecting the number of days to analyze (1 to 7), along with select boxes for choosing the grouping variable and the metric to plot. Changing the slider value re-filters the DataFrame to the selected number of days.
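The day filter anchors its window to the most recent `START_TIME` in the data, not to the current clock. Below is a stdlib-only sketch of that rule. The helper `filter_last_n_days` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def filter_last_n_days(start_times, days):
    """Keep timestamps strictly after (latest timestamp - days)."""
    if not start_times:
        return []
    cutoff = max(start_times) - timedelta(days=days)
    return [t for t in start_times if t > cutoff]


# Hypothetical query start times; the latest one is 2024-05-08 12:00
start_times = [
    datetime(2024, 5, 1, 12, 0),  # exactly 7 days before the latest: excluded by ">"
    datetime(2024, 5, 2, 8, 0),
    datetime(2024, 5, 7, 23, 0),
    datetime(2024, 5, 8, 12, 0),
]

print(len(filter_last_n_days(start_times, 7)))  # 3
print(len(filter_last_n_days(start_times, 1)))  # 2
```

Because the comparison is strict (`>`), a row that falls exactly `days` before the newest record is left out of the window.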

Next, we'll reshape the data by counting queries and summing credits for each combination of hour of day and the selected variable. The resulting aggregated DataFrame is used to create the charts in the next step.
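Here is a plain-Python sketch of this reshaping step. It assumes that queries are counted as distinct `QUERY_ID` values, so duplicate rows for the same query count once, and that credits are summed per cell. The helper `aggregate_by_hour` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_hour(rows, var):
    """Count distinct queries and sum credits for each ("HH:00", var value) cell."""
    query_ids = defaultdict(set)
    credits = defaultdict(float)
    for row in rows:
        # Same "HH:00" label format as the HOUR_DISPLAY column
        key = (f"{row['START_TIME'].hour:02d}:00", row[var])
        query_ids[key].add(row["QUERY_ID"])
        credits[key] += row["CREDITS_USED"]
    return {
        key: {"COUNT": len(ids), "TOTAL_CREDITS_USED": credits[key]}
        for key, ids in sorted(query_ids.items())
    }


# Hypothetical rows shaped like records of the filtered DataFrame
rows = [
    {"QUERY_ID": "q1", "START_TIME": datetime(2024, 5, 1, 9, 5), "WAREHOUSE_NAME": "WH_A", "CREDITS_USED": 0.5},
    {"QUERY_ID": "q2", "START_TIME": datetime(2024, 5, 1, 9, 40), "WAREHOUSE_NAME": "WH_A", "CREDITS_USED": 0.5},
    {"QUERY_ID": "q3", "START_TIME": datetime(2024, 5, 1, 14, 0), "WAREHOUSE_NAME": "WH_B", "CREDITS_USED": 0.25},
]

for cell, values in aggregate_by_hour(rows, "WAREHOUSE_NAME").items():
    print(cell, values)
```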


In [None]:
import pandas as pd
import streamlit as st
import altair as alt

# Get data
df = py_dataframe.copy()

# Create date filter slider
st.subheader("Select time duration")

col = st.columns(3)

with col[0]:
    days = st.slider('Select number of days to analyze', 
                     min_value=1, 
                     max_value=7, 
                     value=7, 
                     step=1)
with col[1]:
    var = st.selectbox("Select a variable", ['WAREHOUSE_NAME', 'USER_NAME', 'WAREHOUSE_SIZE'])
with col[2]:
    metric = st.selectbox("Select a metric", ["COUNT", "TOTAL_CREDITS_USED"])

# Filter data according to day duration
df['START_TIME'] = pd.to_datetime(df['START_TIME'])
latest_date = df['START_TIME'].max()
cutoff_date = latest_date - pd.Timedelta(days=days)
filtered_df = df[df['START_TIME'] > cutoff_date].copy()
    
# Prepare data for heatmap: extract the hour of day and format it as "HH:00"
filtered_df['HOUR_OF_DAY'] = filtered_df['START_TIME'].dt.hour
filtered_df['HOUR_DISPLAY'] = filtered_df['HOUR_OF_DAY'].apply(lambda x: f"{x:02d}:00")

# Count distinct queries and sum credits for each (hour, selected variable) cell,
# so each heatmap cell maps to exactly one row
agg_df = (filtered_df.groupby(['HOUR_DISPLAY', var])
          .agg(
              COUNT=('QUERY_ID', 'nunique'),
              TOTAL_CREDITS_USED=('CREDITS_USED', 'sum')
          )
          .reset_index()
)

st.warning(f"Analyzing {var} data for the last {days} days!")



# Initialize the button state in session state
if 'expanded_btn' not in st.session_state:
    st.session_state.expanded_btn = False

# Callback function to toggle the state
def toggle_expand():
    st.session_state.expanded_btn = not st.session_state.expanded_btn

# Create a button that toggles the state through the callback
st.button(
    '⊕ Expand DataFrames' if not st.session_state.expanded_btn else '⊖ Collapse DataFrames',
    on_click=toggle_expand,
    type='secondary' if st.session_state.expanded_btn else 'primary'
)

# Expand both expanders when the toggle is on
expand_value = st.session_state.expanded_btn

with st.expander("See Filtered DataFrame", expanded=expand_value):
    st.dataframe(filtered_df.head(100))
with st.expander("See Heatmap DataFrame", expanded=expand_value):
    st.dataframe(agg_df)


## 4. Create Charts for Visualizing Query Cost and Frequency

Finally, we'll generate a heatmap, a stacked bar chart and a bubble chart to gain insights on query cost and frequency.
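All three charts read the same aggregated rows. The heatmap colors each (hour, variable) cell by the metric, and the bubble chart sizes the same cells. In the stacked bar chart, each hour's bar height is the sum of its segments across variable values. Below is a stdlib-only sketch of those two mappings. The helpers `to_heatmap_grid` and `stacked_bar_heights` are hypothetical, and the counts are invented.

```python
HOURS = [f"{h:02d}:00" for h in range(24)]


def to_heatmap_grid(agg_rows, var, metric):
    """Lay aggregated rows out as {var value: {hour label: metric value}}."""
    grid = {}
    for row in agg_rows:
        cells = grid.setdefault(row[var], dict.fromkeys(HOURS, 0))
        # "+=" so duplicate (hour, var) rows add up instead of overwriting
        cells[row["HOUR_DISPLAY"]] += row[metric]
    return grid


def stacked_bar_heights(agg_rows, metric):
    """Total bar height per hour: stacked segments summed across var values."""
    totals = dict.fromkeys(HOURS, 0)
    for row in agg_rows:
        totals[row["HOUR_DISPLAY"]] += row[metric]
    return totals


# Hypothetical aggregated rows, shaped like agg_df with var = "WAREHOUSE_NAME"
agg_rows = [
    {"HOUR_DISPLAY": "09:00", "WAREHOUSE_NAME": "WH_A", "COUNT": 2},
    {"HOUR_DISPLAY": "09:00", "WAREHOUSE_NAME": "WH_B", "COUNT": 3},
    {"HOUR_DISPLAY": "14:00", "WAREHOUSE_NAME": "WH_B", "COUNT": 1},
]

grid = to_heatmap_grid(agg_rows, "WAREHOUSE_NAME", "COUNT")
heights = stacked_bar_heights(agg_rows, "COUNT")
print(grid["WH_B"]["09:00"], heights["09:00"])  # 3 5
```

Filling hours that have no data with 0 is a choice made only for this sketch. In the Altair heatmap, such hours have no rectangle at all.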

In [None]:
# Heatmap
heatmap = alt.Chart(agg_df).mark_rect(stroke='black', strokeWidth=1).encode(
    x='HOUR_DISPLAY:O',
    y=alt.Y(f'{var}:N', 
            title='',
            axis=alt.Axis(
                labels=True,
                labelLimit=250,
                tickMinStep=1,
                labelOverlap=False,
                labelPadding=10
            )),
    color=f'{metric}:Q',
    tooltip=['HOUR_DISPLAY', var, metric]
).properties(
    title=f'Query Activity Heatmap by Hour and {var}'
)

st.altair_chart(heatmap, use_container_width=True)

In [None]:
# Stacked bar chart of the metric by hour of day, colored by the selected variable
bar_time = alt.Chart(agg_df).mark_bar().encode(
    x='HOUR_DISPLAY:O',
    y=f'{metric}:Q',
    color=alt.Color(f'{var}:N', legend=alt.Legend(orient='bottom')),
    tooltip=['HOUR_DISPLAY', var, metric]
).properties(
    title=f'Query Activity by Hour and {var}',
    height=400
)

st.altair_chart(bar_time, use_container_width=True)


In [None]:
# Bubble plot with bubble size representing the selected metric
bubble = alt.Chart(agg_df).mark_circle().encode(
    x='HOUR_DISPLAY:O',
    y=alt.Y(f'{var}:N', title=''),
    size=alt.Size(f'{metric}:Q', legend=alt.Legend(title=metric)),
    color=alt.Color(f'{var}:N', legend=None),
    tooltip=['HOUR_DISPLAY', var, metric]
).properties(
    title=f'Query Distribution by Hour and {var}',
    height=550
)

st.altair_chart(bubble, use_container_width=True)

## Want to learn more?

- Snowflake Docs on [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage), [METERING_HISTORY view](https://docs.snowflake.com/en/sql-reference/account-usage/metering_history) and [QUERY_HISTORY view](https://docs.snowflake.com/en/sql-reference/account-usage/query_history)
- More about [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake)
- For more inspiration on how to use Streamlit widgets in Notebooks, check out [Streamlit Docs](https://docs.streamlit.io/) and this list of what is currently supported inside [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake#label-notebooks-streamlit-support)
- Check out the [Altair User Guide](https://altair-viz.github.io/user_guide/data.html) for further information on customizing Altair charts
