# Schema Change Tracker

This utility notebook helps you log and track schema changes (e.g., dropped columns, tables and databases) across databases for better governance.

Here's our 4-step process:
1. SQL query to retrieve data
2. Convert SQL table to a Pandas DataFrame
3. Data preparation and filtering (using user input from Streamlit widgets)
4. Data visualization and exploration

## 1. Retrieve Data

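To gain insight into recent schema changes, we'll write SQL queries that retrieve *dropped columns*, *dropped tables* and *dropped databases* from the `COLUMNS`, `TABLE_STORAGE_METRICS` and `DATABASES` views in `SNOWFLAKE.ACCOUNT_USAGE`. Each query is limited to objects dropped in the last 90 days.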


In [None]:
-- Track dropped columns
SELECT
  COLUMN_ID,
  COLUMN_NAME,
  TABLE_ID,
  TABLE_NAME,
  TABLE_SCHEMA_ID,
  TABLE_SCHEMA,
  TABLE_CATALOG_ID,
  TABLE_CATALOG,
  DATA_TYPE,
  CHARACTER_MAXIMUM_LENGTH,
  DELETED
FROM 
  SNOWFLAKE.ACCOUNT_USAGE.COLUMNS
WHERE 
  DELETED >= DATEADD(days, -90, CURRENT_DATE())

In [None]:
-- Track dropped tables
SELECT
  id as table_id,
  table_name,
  table_created,
  table_dropped,
  
  table_schema_id,
  table_schema,
  schema_created,
  schema_dropped,
  
  table_catalog_id,
  table_catalog,
  catalog_created,
  catalog_dropped
FROM
  SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE
  table_dropped >= DATEADD(days, -90, CURRENT_DATE())

In [None]:
-- Track dropped databases
SELECT
  database_id,
  database_name,
  database_owner,
  created,
  last_altered,
  deleted
FROM
  SNOWFLAKE.ACCOUNT_USAGE.DATABASES
WHERE
  deleted >= DATEADD(days, -90, CURRENT_DATE())
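
All three queries share the same 90-day window. As a rough illustration of the cutoff that `DATEADD(days, -90, CURRENT_DATE())` produces, the sketch below computes the equivalent date in plain Python. The helper name `cutoff_date` and the sample date are hypothetical and not part of the notebook.

```python
from datetime import date, timedelta

def cutoff_date(today: date, days: int = 90) -> date:
    """Return the date `days` days before `today` (hypothetical helper)."""
    return today - timedelta(days=days)

# Example with a fixed "today" so the result is reproducible
print(cutoff_date(date(2024, 6, 30)))  # 2024-04-01
```

Rows whose deletion timestamp falls on or after this cutoff are kept by the `WHERE` clauses above.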

## 2. Convert Table to a DataFrame

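Next, we'll convert each query result to a Pandas DataFrame. The SQL cells above are referenced by their cell names (`sql_columns`, `sql_tables` and `sql_databases`).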


In [None]:
sql_columns.to_pandas()

In [None]:
sql_tables.to_pandas()

In [None]:
sql_databases.to_pandas()

## 3. Create an Interactive Widget & Data Preparation

Here, we'll create an interactive widget for selecting the entity of interest (Column, Table, Schema, Catalog or Database). The selection determines which DataFrame and which columns are used in the rest of the notebook.

### 3.1. Create Interactive Widget
We'll use Streamlit's `st.selectbox` to choose the entity type. Based on the selection, we pick the matching DataFrame, the column that holds the deletion timestamp (`date_deleted`) and the column that holds the entity name (`col_name`).


In [None]:
import streamlit as st

st.header("Schema Change Tracker")
snowflake_option = st.selectbox("Select an option", ("Column", 
                                                     "Table", 
                                                     "Schema", 
                                                     "Catalog", 
                                                     "Database"))
# Pick the DataFrame, deletion-date column and name column for the selection
if snowflake_option == "Column":
    df = py_columns.copy()
    date_deleted = "DELETED"
    col_name = "COLUMN_NAME"
elif snowflake_option == "Table":
    df = py_tables.copy()
    date_deleted = "TABLE_DROPPED"
    col_name = "TABLE_NAME"
elif snowflake_option == "Schema":
    # Schema names are returned by the tables query as TABLE_SCHEMA
    df = py_tables.copy()
    date_deleted = "SCHEMA_DROPPED"
    col_name = "TABLE_SCHEMA"
elif snowflake_option == "Catalog":
    # Catalog names are returned by the tables query as TABLE_CATALOG
    df = py_tables.copy()
    date_deleted = "CATALOG_DROPPED"
    col_name = "TABLE_CATALOG"
elif snowflake_option == "Database":
    df = py_databases.copy()
    date_deleted = "DELETED"
    col_name = "DATABASE_NAME"

st.write(f"You selected: `{snowflake_option}`")
st.dataframe(df)
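
The mapping from option to DataFrame and columns can also be written as a lookup table, which keeps the five cases in one place. This is only a sketch of an alternative: `OPTION_CONFIG` and `resolve_option` are hypothetical names, the DataFrames are referred to by name so the example runs without Snowflake, and the schema and catalog name columns are assumed to be `TABLE_SCHEMA` and `TABLE_CATALOG`, as selected by the dropped-tables query.

```python
# Hypothetical lookup: option -> (DataFrame name, deletion-date column, name column)
OPTION_CONFIG = {
    "Column": ("py_columns", "DELETED", "COLUMN_NAME"),
    "Table": ("py_tables", "TABLE_DROPPED", "TABLE_NAME"),
    "Schema": ("py_tables", "SCHEMA_DROPPED", "TABLE_SCHEMA"),
    "Catalog": ("py_tables", "CATALOG_DROPPED", "TABLE_CATALOG"),
    "Database": ("py_databases", "DELETED", "DATABASE_NAME"),
}

def resolve_option(option: str) -> tuple:
    """Return (frame_name, date_deleted, col_name) for a selectbox option."""
    try:
        return OPTION_CONFIG[option]
    except KeyError:
        raise ValueError(f"Unknown option: {option!r}") from None

frame_name, date_deleted, col_name = resolve_option("Schema")
print(frame_name, date_deleted, col_name)  # py_tables SCHEMA_DROPPED TABLE_SCHEMA
```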

### 3.2. Data Filtering

Here, we'll prepare the selected DataFrame in three steps: define `start_date` as the earliest deletion date, add a `WEEK` column holding the ISO week number of each deletion, and apply the `groupby()` method so that the data is aggregated by `WEEK` and `col_name` (the name column chosen in the previous step, e.g. `COLUMN_NAME`, `TABLE_NAME` or `DATABASE_NAME`).

In [None]:
# Data filtering
import pandas as pd

# Get the minimum date from date column
start_date = pd.to_datetime(df[date_deleted]).min()

# Create week numbers for x-axis
df['WEEK'] = pd.to_datetime(df[date_deleted]).dt.isocalendar().week

# Create aggregation for heatmap
agg_df = df.groupby(['WEEK', col_name]).size().reset_index(name='COUNT')
agg_df
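
To see what the aggregation produces, here is a minimal sketch on made-up rows. The sample column names and dates are hypothetical stand-ins for the query result.

```python
import pandas as pd

# Hypothetical sample rows standing in for dropped-column records
sample = pd.DataFrame({
    "COLUMN_NAME": ["EMAIL", "EMAIL", "PHONE"],
    "DELETED": ["2024-03-04", "2024-03-06", "2024-03-12"],
})

# ISO week number of each deletion date
sample["WEEK"] = pd.to_datetime(sample["DELETED"]).dt.isocalendar().week

# One row per (week, name) pair with the number of deletions
counts = sample.groupby(["WEEK", "COLUMN_NAME"]).size().reset_index(name="COUNT")
print(counts)
```

The two `EMAIL` rows fall in ISO week 10 and collapse into a single row with `COUNT` 2, while `PHONE` lands in week 11 with `COUNT` 1.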

Next, we'll print the date range (Monday to Sunday) that each week number corresponds to.

In [None]:
# Week legend
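import pandas as pd
from datetime import datetime

# Get unique (ISO year, ISO week) pairs, skipping missing dates
iso = pd.to_datetime(df[date_deleted]).dropna().dt.isocalendar()
year_weeks = sorted({(int(y), int(w)) for y, w in zip(iso['year'], iso['week'])})

# Create week ranges from the ISO year and week (day 1 is Monday)
for year, week in year_weeks:
    monday = datetime.fromisocalendar(year, week, 1)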
    print(f"Week {week}: {monday.strftime('%b %d')} - {(monday + pd.Timedelta(days=6)).strftime('%b %d')}")
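
ISO week numbers only identify a date range when paired with an ISO year, which matters when the 90-day window crosses a year boundary. The sketch below uses the standard library's `date.fromisocalendar` to show this; `iso_week_range` is a hypothetical helper name.

```python
from datetime import date, timedelta

def iso_week_range(iso_year: int, iso_week: int) -> tuple:
    """Return (Monday, Sunday) of the given ISO week (hypothetical helper)."""
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    return monday, monday + timedelta(days=6)

# ISO week 1 of 2025 starts in the previous calendar year
start, end = iso_week_range(2025, 1)
print(start, end)  # 2024-12-30 2025-01-05
```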

## 4. Create the Heatmap

Finally, we'll visualize the weekly counts as a heatmap, with week numbers on the x-axis and entity names on the y-axis.

In [None]:
# Create the heatmap
import altair as alt


heatmap = alt.Chart(agg_df).mark_rect(stroke='black', strokeWidth=1).encode(
    x=alt.X('WEEK:O', 
            title='Week Number',
            axis=alt.Axis(
                labelAngle=0,
                labelOverlap=False
            )),
    y=alt.Y(f'{col_name}:N', 
            title='',
            axis=alt.Axis(
                labels=True,
                labelLimit=250,
                tickMinStep=1,
                labelOverlap=False,
                labelPadding=10
            )),
    color=alt.Color('COUNT:Q',
                    title=f'Number of Dropped {snowflake_option}s'),
    tooltip=['WEEK', col_name, 'COUNT']
).properties(
    title=f'Dropped {snowflake_option}s by Week (Starting from {start_date.strftime("%Y-%m-%d")})',
    width=800,
    height=df[col_name].nunique()*20 # 20 pixels per unique value
)

st.altair_chart(heatmap, use_container_width=True)

## Want to learn more?

- Snowflake Docs on [Account Usage](https://docs.snowflake.com/en/sql-reference/account-usage), [COLUMNS view](https://docs.snowflake.com/en/sql-reference/account-usage/columns), [TABLES view](https://docs.snowflake.com/en/sql-reference/account-usage/tables) and [DATABASES view](https://docs.snowflake.com/en/sql-reference/account-usage/databases)
- More about [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake)
- For more inspiration on how to use Streamlit widgets in Notebooks, check out [Streamlit Docs](https://docs.streamlit.io/) and this list of what is currently supported inside [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake#label-notebooks-streamlit-support)
- Check out the [Altair User Guide](https://altair-viz.github.io/user_guide/data.html) for further information on customizing Altair charts
