## Time Series Analytics with Pricing Data on Snowflake

This solution demonstrates several advanced time series features using FactSet Tick Data on Snowflake. You will learn to leverage powerful SQL functions such as TIME_SLICE, ASOF JOIN, and RANGE BETWEEN to gain deeper insights into time series trade data.

Import the following packages, which are used in this demo:
- matplotlib=3.8.0
- seaborn=0.13.2


In [None]:
# Import python packages
import streamlit as st
import pandas as pd

# We can also use Snowpark for our analyses!
from snowflake.snowpark.context import get_active_session
session = get_active_session()

# Add a query tag to the session. This helps with debugging and performance monitoring.
session.query_tag = {"origin":"sf_sit", "name":"time_series_analysis", "version":{"major":1, "minor":0}}

## Preview Data
We will be using FactSet Tick History data in this notebook. The data includes high-quality tick data sourced from FactSet’s real-time consolidated feed. In this notebook, we will focus on trade data for META.


In [None]:
SELECT TOP 100 * 
FROM tick_history.public.th_sf_mktplace
WHERE ticker='META' 
AND date = 20221025
AND msg_type = 0 -- trade messages
AND security_type = 1; -- equity

We'll start by building a proper timestamp from the raw date and time columns and filtering for META trades. Later queries reference this result as `{{meta_trades}}`.

In [None]:
SELECT 
    TIMESTAMP_FROM_PARTS(
        SUBSTR(date, 0, 4), -- year
        SUBSTR(date, 5, 2), -- month
        SUBSTR(date, 7, 2), -- day 
        SUBSTR(LPAD(time, 9, 0), 0, 2), -- hour
        SUBSTR(LPAD(time, 9, 0), 3, 2), -- minute
        SUBSTR(LPAD(time, 9, 0), 5, 2), -- second
        RPAD(SUBSTR(LPAD(time, 9, 0), 7, 3), 9, 0) -- nanoseconds
    ) AS trade_timestamp,
    ticker,
    last_price,
    last_vol
FROM tick_history.public.th_sf_mktplace
WHERE ticker = 'META'
AND msg_type=0
AND security_type = 1;
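
To make the string slicing in the query above easier to follow, here is a minimal Python sketch of the same parsing logic. It assumes `date` is a `YYYYMMDD` integer and `time` is an `HHMMSSmmm` integer (hours, minutes, seconds, milliseconds). That layout is inferred from the SUBSTR offsets, not taken from FactSet documentation, and `parse_tick_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def parse_tick_timestamp(date_value: int, time_value: int) -> datetime:
    """Combine a YYYYMMDD date and an HHMMSSmmm time into a datetime.

    The field layout is an assumption inferred from the SUBSTR offsets in the query.
    """
    date_text = str(date_value)           # e.g. "20221025"
    time_text = str(time_value).zfill(9)  # left-pad with zeros, like LPAD(time, 9, 0)
    return datetime(
        year=int(date_text[0:4]),
        month=int(date_text[4:6]),
        day=int(date_text[6:8]),
        hour=int(time_text[0:2]),
        minute=int(time_text[2:4]),
        second=int(time_text[4:6]),
        microsecond=int(time_text[6:9]) * 1000,  # milliseconds -> microseconds
    )


# A morning trade: the leading zero of the hour is missing in the raw integer.
example = parse_tick_timestamp(20221025, 93000123)
print(example.isoformat())
```

The left-padding step matters for trades before 10:00, where the integer time has only eight digits.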

## Prevailing Price

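Let's now find the prevailing trade at a particular time, that is, the most recent trade at or before that time. A `<=` filter in the WHERE clause, combined with a descending sort and LIMIT 1, returns it.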

In [None]:
SELECT *
FROM {{meta_trades}}
WHERE trade_timestamp <= '2022-10-10 12:00:00'
ORDER BY trade_timestamp DESC
LIMIT 1
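
The same lookup can be sketched in plain Python with a binary search over sorted timestamps. The trade rows and the `prevailing_trade` helper below are hypothetical and only illustrate the "latest row at or before a time" idea.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical trades, already sorted by timestamp: (trade_timestamp, last_price).
trades = [
    (datetime(2022, 10, 10, 11, 59, 58), 129.10),
    (datetime(2022, 10, 10, 11, 59, 59), 129.15),
    (datetime(2022, 10, 10, 12, 0, 1), 129.20),
]


def prevailing_trade(sorted_trades, as_of):
    """Return the latest trade at or before `as_of`, or None if there is none."""
    timestamps = [ts for ts, _ in sorted_trades]
    position = bisect_right(timestamps, as_of)  # number of trades with ts <= as_of
    return sorted_trades[position - 1] if position else None


result = prevailing_trade(trades, datetime(2022, 10, 10, 12, 0, 0))
print(result)
```

Here the trade at 12:00:01 is skipped because it happens after the requested time.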

## TIME_SLICE
[TIME_SLICE](https://docs.snowflake.com/en/sql-reference/functions/time_slice) calculates the beginning or end of a “slice” of time, where the length of the slice is a multiple of a standard unit of time (minute, hour, day, etc.). This function can be used to calculate the start and end times of fixed-width “buckets” into which data can be categorized.
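
As a rough illustration of the bucketing idea, the Python sketch below floors a timestamp to the start (or end) of a fixed-width bucket. It assumes buckets are aligned to midnight on 1970-01-01 and only handles fixed-duration units such as minutes or hours; calendar units such as MONTH need different logic. The `time_slice` helper is hypothetical and is not Snowflake's implementation, so check the linked documentation for the exact alignment rules.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for all buckets


def time_slice(ts: datetime, width: timedelta, end: bool = False) -> datetime:
    """Return the start (or end) of the fixed-width bucket containing `ts`."""
    buckets_elapsed = (ts - EPOCH) // width  # whole buckets since the origin
    start = EPOCH + buckets_elapsed * width
    return start + width if end else start


ts = datetime(2022, 9, 19, 14, 37, 5)
hour_start = time_slice(ts, timedelta(hours=1))
quarter_hour_end = time_slice(ts, timedelta(minutes=15), end=True)
print(hour_start, quarter_hour_end)
```

Every timestamp that falls in the same bucket maps to the same value, which is what makes the result usable as a GROUP BY key.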

### Using TIME_SLICE
We will now use [TIME_SLICE](https://docs.snowflake.com/en/sql-reference/functions/time_slice) to get the average weekly trade price and total volume. Snowflake Notebooks allow you to [reference the results](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-develop-run#reference-cells-and-variables-in-sf-notebooks) of other cell queries using Jinja syntax.

In [None]:
SELECT 
    TIME_SLICE(trade_timestamp, 1, 'WEEK', 'START') AS week_starting,
    AVG(last_price) AS average_price,
    SUM(last_vol) AS total_volume
FROM {{meta_trades}}
WHERE ticker='META'
GROUP BY week_starting
ORDER BY week_starting;

### Plot Data
Next, we can use Streamlit charts directly in the notebook to plot the average weekly price.

In [None]:
st.line_chart(weekly_data, x="WEEK_STARTING", y="AVERAGE_PRICE")

### Slice by Month

We can also slice by YEAR, QUARTER, MONTH, WEEK, or DAY with TIME_SLICE. Let's find the average monthly price and total volume.

In [None]:
SELECT 
    TIME_SLICE(trade_timestamp, 1, 'MONTH', 'START') AS month_starting,
    AVG(last_price) AS average_price,
    SUM(last_vol) AS total_volume
FROM {{meta_trades}}
WHERE ticker='META'
GROUP BY month_starting
ORDER BY month_starting;

### Slice by Hour

Let's now slice by hour for a single trading day.

In [None]:
SELECT 
    TIME_SLICE(trade_timestamp, 1, 'HOUR', 'START') AS hour_starting,
    AVG(last_price) AS average_price,
    SUM(last_vol) AS total_volume
FROM {{meta_trades}}
WHERE DATE(trade_timestamp) = '2022-09-19'
AND ticker='META'
GROUP BY hour_starting
ORDER BY hour_starting;

## Transaction Cost 

We will now determine transaction costs by joining trades with the closest preceding price data. To accomplish this, we will use an [ASOF JOIN](https://docs.snowflake.com/en/sql-reference/constructs/asof-join) to join our trade data with closing price data, which we have stored in another table.

In [None]:
SELECT 
    TIMESTAMP_FROM_PARTS(
        SUBSTR(date, 0, 4), -- year
        SUBSTR(date, 5, 2), -- month
        SUBSTR(date, 7, 2), -- day 
        SUBSTR(LPAD(time, 9, 0), 0, 2), -- hour
        SUBSTR(LPAD(time, 9, 0), 3, 2), -- minute
        SUBSTR(LPAD(time, 9, 0), 5, 2), -- second
        RPAD(SUBSTR(LPAD(time, 9, 0), 7, 3), 9, 0) -- nanoseconds
    ) AS timestamp,
    ticker,
    closing_price
FROM raw.closing_prices
WHERE ticker = 'META';

In [None]:
SELECT
    t1.ticker,
    t1.trade_timestamp,
    t1.last_price AS trade_price,
    t2.closing_price,
    trade_price - t2.closing_price AS price_impact,
    t1.last_vol
FROM 
     {{meta_trades}} t1
ASOF JOIN 
     {{meta_closing_prices}} t2
MATCH_CONDITION 
    (t1.trade_timestamp >= t2.timestamp) -- closest closing price at or before each trade
ON 
    t1.ticker = t2.ticker
ORDER BY 
    t1.ticker,
    t1.trade_timestamp;
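
To make the matching rule concrete, here is a minimal plain-Python sketch of a closest-preceding as-of lookup: each trade is paired with the latest closing price whose timestamp is at or before the trade. The rows and the `asof_join` helper are hypothetical, and this illustrates the matching logic only, not how Snowflake executes the join.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical inputs, each sorted by timestamp.
trades = [
    (datetime(2022, 10, 11, 10, 0), 130.00),
    (datetime(2022, 10, 12, 15, 30), 128.50),
]
closing_prices = [
    (datetime(2022, 10, 10, 16, 0), 129.00),
    (datetime(2022, 10, 11, 16, 0), 131.00),
]


def asof_join(left_rows, right_rows):
    """Pair each left row with the latest right row at or before its timestamp."""
    right_times = [ts for ts, _ in right_rows]
    joined = []
    for left_ts, trade_price in left_rows:
        position = bisect_right(right_times, left_ts)  # right rows with ts <= left_ts
        closing_price = right_rows[position - 1][1] if position else None
        impact = None if closing_price is None else trade_price - closing_price
        joined.append((left_ts, trade_price, closing_price, impact))
    return joined


for row in asof_join(trades, closing_prices):
    print(row)
```

In this sketch a trade with no earlier closing price keeps `None` in the price columns rather than being dropped.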


## Plots for Transaction Cost Analysis

Now that we have joined trade and price data, let's summarize the price impact by day. We'll again use TIME_SLICE to get the average daily trade price and closing price so we can plot them.

In [None]:
SELECT 
    TIME_SLICE(trade_timestamp, 1, 'DAY', 'START') AS trade_date,
    AVG(trade_price) AS trade_price,
    AVG(closing_price) AS closing_price,
    AVG(price_impact) AS price_impact,
    SUM(price_impact) AS cumulative_price_impact,
    SUM(last_vol) AS total_volume
FROM {{transaction_cost}}
GROUP BY trade_date
ORDER BY trade_date;

We can also reference SQL cells in Python within the same notebook. Let's convert the daily sampled data to pandas for plotting. 

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

df = transaction_cost_daily.to_pandas()

### Trade Prices vs. Market Prices Over Time

In [None]:
plt.figure(figsize=(14, 7))
sns.lineplot(x='TRADE_DATE', y='TRADE_PRICE', data=df, label='Trade Price', color='blue')
sns.lineplot(x='TRADE_DATE', y='CLOSING_PRICE', data=df, label='Market Price', color='red', linestyle='--')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Trade Prices vs. Market Prices Over Time')
plt.legend()
plt.show()

### Price Impact of Trades

In [None]:
plt.figure(figsize=(14, 7))
sns.scatterplot(x='TRADE_DATE', y='PRICE_IMPACT', data=df, alpha=0.5, color='purple')
plt.axhline(0, color='black', linestyle='--')
plt.xlabel('Date')
plt.ylabel('Price Impact')
plt.title('Price Impact of Trades Over Time')
plt.show()

### Volume vs. Price Impact

In [None]:
plt.figure(figsize=(14, 7))
sns.scatterplot(x='TOTAL_VOLUME', y='PRICE_IMPACT', data=df, alpha=0.5, color='green')
plt.xlabel('Volume')
plt.ylabel('Price Impact')
plt.title('Trade Volume vs. Price Impact')
plt.show()

In [None]:
plt.figure(figsize=(14, 7))
sns.histplot(df['PRICE_IMPACT'], bins=50, color='orange', kde=True)
plt.xlabel('Price Impact')
plt.ylabel('Frequency')
plt.title('Distribution of Price Impacts')
plt.show()

In [None]:
st.line_chart(df.set_index('TRADE_DATE')['CUMULATIVE_PRICE_IMPACT'], use_container_width=True, color = ["#FF0000"])

## RANGE BETWEEN
A range-based [window frame](https://docs.snowflake.com/en/sql-reference/functions-analytic) consists of a logically computed set of rows rather than a physical number of rows as would be expressed in a row-based frame. Let's explore Range Between to create interesting time series metrics on our data.
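
The difference between the two frame types is easiest to see on unevenly spaced data. The Python sketch below computes an average over "the current row and one preceding row" and over "all rows within the last 10 minutes" for the same hypothetical trades; both helper functions are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical (trade_timestamp, last_price) rows, sorted by timestamp.
trades = [
    (datetime(2022, 6, 9, 9, 30), 100.0),
    (datetime(2022, 6, 9, 9, 31), 102.0),
    (datetime(2022, 6, 9, 9, 50), 110.0),
]


def rows_frame_avg(rows, preceding_rows):
    """Average over the current row and a fixed number of preceding rows."""
    result = []
    for i in range(len(rows)):
        window = [price for _, price in rows[max(0, i - preceding_rows): i + 1]]
        result.append(sum(window) / len(window))
    return result


def range_frame_avg(rows, lookback):
    """Average over rows whose timestamp lies within `lookback` of the current row."""
    result = []
    for current_ts, _ in rows:
        window = [p for ts, p in rows if current_ts - lookback <= ts <= current_ts]
        result.append(sum(window) / len(window))
    return result


print(rows_frame_avg(trades, 1))
print(range_frame_avg(trades, timedelta(minutes=10)))
```

The last trade shows the difference: the row-based frame still averages in the 09:31 trade, while the range-based frame sees no other trade in the preceding 10 minutes.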

### Rolling Average

Let's start by computing the 10-minute rolling average price for META trades.

In [None]:
SELECT
    trade_timestamp,
    last_price,
    ticker,
    AVG(last_price) OVER (
        PARTITION BY ticker
        ORDER BY trade_timestamp 
        RANGE BETWEEN INTERVAL '10 MINUTE' PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM {{meta_trades}}
WHERE DATE(trade_timestamp) = '2022-06-09'

### Volume Weighted Average
Let's now look at the volume-weighted average price (VWAP). We'll again use a trailing 10-minute window.

In [None]:
SELECT 
    ticker,
    trade_timestamp,
    last_price,
    last_vol,
    SUM(last_price * last_vol) OVER (
        PARTITION BY ticker 
        ORDER BY trade_timestamp 
        RANGE BETWEEN INTERVAL '10 MINUTE' PRECEDING AND CURRENT ROW
    ) / 
    SUM(last_vol) OVER (
        PARTITION BY ticker 
        ORDER BY trade_timestamp 
        RANGE BETWEEN INTERVAL '10 MINUTE' PRECEDING AND CURRENT ROW
    ) AS volume_weighted_avg
FROM {{meta_trades}}
WHERE DATE(trade_timestamp) = '2022-06-09'
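
For intuition, a trailing VWAP can be sketched in Python as the ratio of two running sums over a sliding window. The rows and the `rolling_vwap` helper are hypothetical. As a simplification, each row only sees rows that appear before it in the list, so trades that share a timestamp are not treated as one group.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical (trade_timestamp, last_price, last_vol) rows, sorted by timestamp.
trades = [
    (datetime(2022, 6, 9, 9, 30), 100.0, 200),
    (datetime(2022, 6, 9, 9, 35), 110.0, 100),
    (datetime(2022, 6, 9, 9, 45), 120.0, 100),
]


def rolling_vwap(rows, lookback):
    """Return (timestamp, VWAP) pairs over a trailing time window."""
    window = deque()
    notional = 0.0  # running sum of price * volume inside the window
    volume = 0      # running sum of volume inside the window
    result = []
    for ts, price, vol in rows:
        window.append((ts, price, vol))
        notional += price * vol
        volume += vol
        # Evict trades that are older than the lookback window.
        while window[0][0] < ts - lookback:
            _, old_price, old_vol = window.popleft()
            notional -= old_price * old_vol
            volume -= old_vol
        result.append((ts, notional / volume if volume else None))
    return result


vwap = rolling_vwap(trades, timedelta(minutes=10))
for ts, value in vwap:
    print(ts, value)
```

Keeping two running sums avoids rescanning the window for every trade, which matters on dense tick data.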

## Time Shifts

Finally, let's look at how we can use time shifts. We'll start by getting the previous trade price using [LAG](https://docs.snowflake.com/en/sql-reference/functions/lag). 

In [None]:
SELECT 
    ticker,
    trade_timestamp,
    last_price,
    LAG(last_price, 1) OVER (
        PARTITION BY ticker 
        ORDER BY trade_timestamp
    ) AS previous_price
FROM {{meta_trades}}
WHERE DATE(trade_timestamp) = '2022-06-09'

We'll now use [LEAD](https://docs.snowflake.com/en/sql-reference/functions/lead) to get the price of the next trade.

In [None]:
SELECT 
    ticker,
    trade_timestamp,
    last_price,
    LEAD(last_price, 1) OVER (
        PARTITION BY ticker 
        ORDER BY trade_timestamp
    ) AS next_price
FROM {{meta_trades}}
WHERE DATE(trade_timestamp) = '2022-06-09'
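
The shifting behavior of both functions can be sketched in a few lines of Python. The `lag` and `lead` helpers and the prices below are hypothetical. They operate on a list that is already ordered by trade time, assume a non-negative offset, and return `None` where no earlier or later row exists.

```python
def lag(values, offset=1):
    """Value from `offset` rows earlier, or None at the start of the list."""
    return [values[i - offset] if i - offset >= 0 else None
            for i in range(len(values))]


def lead(values, offset=1):
    """Value from `offset` rows later, or None at the end of the list."""
    return [values[i + offset] if i + offset < len(values) else None
            for i in range(len(values))]


# Hypothetical last_price values ordered by trade_timestamp.
prices = [129.10, 129.15, 129.20]

previous_price = lag(prices)
next_price = lead(prices)

# A common follow-up: the change from the previous trade to the current one.
price_change = [None if prev is None else round(cur - prev, 2)
                for cur, prev in zip(prices, previous_price)]

print(previous_price)
print(next_price)
print(price_change)
```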