<a href="https://colab.research.google.com/github/antonum/Timescale-Workshops/blob/main/Tutorials/query-bitcoin-data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyzing Bitcoin Data with TimescaleDB

In this notebook, we'll explore Bitcoin price and transaction data using TimescaleDB's specialized time-series capabilities. We'll look at various aspects including:

- Basic price analysis and statistics
- Time-weighted averages and continuous aggregates
- Analyzing specific time periods and trends
- Volatility, moving averages, and year-over-year statistics

This notebook installs a disposable copy of Timescale Community in the notebook environment. When you close the notebook, all data will be lost.

# Set Up the Timescale Connection

By default, this notebook installs Timescale inside the Colab runtime, reachable at `postgres://postgres:password@localhost/postgres`. Optionally, you can use the endpoint of your own Timescale Cloud instance instead.

Try Timescale Cloud for free at: https://console.cloud.timescale.com/signup

In [2]:
import os
### Default connection for in-notebook Timescale ###
TS_CONNECTION="postgres://postgres:password@localhost/postgres"

### Use environment variable ###
#TS_CONNECTION = os.getenv("TS_CONNECTION", "postgres://postgres:password@localhost/postgres")

### Use your own Timescale Cloud instance ###
#TS_CONNECTION="postgres://tsdbadmin:<password>@xxxxxxx.yyyyy.tsdb.cloud.timescale.com:39966/tsdb?sslmode=require"

### Use colab secret ###
#from google.colab import userdata
#TS_CONNECTION=userdata.get('TS_CONNECTION')

### Set environment variable to be used in psql CLI ###
os.environ["TS_CONNECTION"]=TS_CONNECTION
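Pasting a Cloud endpoint makes it easy to produce a malformed connection string, for example one that leaves out the password and the `@` before the host. The sketch below uses a hypothetical helper, `check_pg_url`, built only on the standard library's `urllib.parse`, to flag obvious structural problems before connecting. It checks the URL's shape only. It does not check whether the server is reachable or whether the credentials are valid.

```python
from urllib.parse import urlparse

def check_pg_url(url):
    """Return a list of structural problems in a postgres:// URL (empty if none found)."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme not in ("postgres", "postgresql"):
        problems.append(f"unexpected scheme {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("missing host")
    try:
        parsed.port  # raises ValueError when the port part is not a number
    except ValueError:
        problems.append("port is not an integer")
    if not parsed.path.strip("/"):
        problems.append("missing database name")
    return problems
```

In the notebook, `check_pg_url(TS_CONNECTION)` should return an empty list for the default local endpoint. A URL missing the `@` shows up as a port problem, because everything after the username's colon is parsed as the port.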

In [None]:
#@title Install Timescale
%%bash
set -e # Exit immediately if a command exits with a non-zero status.

# --- Configuration ---
PG_VERSION="17"
PGVECTORSCALE_VERSION="0.7.0"
PG_PASSWORD="password" # Consider using a more secure password

echo "--- 1. Installing Prerequisites & Adding Repositories ---"
# Install essential packages quietly
apt-get -qq -y install gnupg postgresql-common apt-transport-https lsb-release wget > /dev/null 2>&1

# Add the official PostgreSQL repository
# The 'yes |' answers confirmation prompts automatically. Output redirected.
yes | /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh > /dev/null 2>&1

# Add the TimescaleDB repository
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list > /dev/null
# Add the TimescaleDB GPG key using the recommended method (avoids apt-key add)
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/timescaledb.gpg

echo "--- 2. Updating Package List & Installing PostgreSQL + Extensions ---"
# Update package list quietly (should suppress apt-key warnings too)
apt-get -qq update > /dev/null 2>&1

# Install PostgreSQL, TimescaleDB, pgvector, toolkit, and client
apt-get -qq -y install \
  "timescaledb-2-postgresql-${PG_VERSION}" \
  "postgresql-client-${PG_VERSION}" \
  "postgresql-${PG_VERSION}-pgvector" \
  "timescaledb-toolkit-postgresql-${PG_VERSION}" > /dev/null 2>&1

echo "--- 3. Installing pgvectorscale ---"
# Download and install pgvectorscale
wget --quiet "https://github.com/timescale/pgvectorscale/releases/download/${PGVECTORSCALE_VERSION}/pgvectorscale-${PGVECTORSCALE_VERSION}-pg${PG_VERSION}-amd64.zip" -O pgvectorscale.zip
unzip -q pgvectorscale.zip # Use -q for quiet unzip
# Install the .deb package quietly
apt-get -qq -y install "./pgvectorscale-postgresql-${PG_VERSION}_${PGVECTORSCALE_VERSION}-Linux_amd64.deb" > /dev/null 2>&1

# Clean up downloaded files
rm pgvectorscale.zip "./pgvectorscale-postgresql-${PG_VERSION}_${PGVECTORSCALE_VERSION}-Linux_amd64.deb"

echo "--- 4. Configuring PostgreSQL & TimescaleDB ---"
# Tune PostgreSQL for TimescaleDB
timescaledb-tune --quiet --yes  > /dev/null 2>&1

# Restart PostgreSQL service to apply changes
service postgresql restart
sleep 2 # Give the service a moment to restart fully

echo "--- 5. Setting Up Database User and Extensions ---"
# Set the password for the default postgres user
sudo -u postgres psql -c "ALTER USER postgres PASSWORD '${PG_PASSWORD}'" > /dev/null

# Connect as the postgres user and create extensions quietly
psql -d "postgres://postgres:${PG_PASSWORD}@localhost/postgres" > /dev/null <<EOF
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
EOF

echo "--- Installation and Setup Complete ---"



In [None]:
# Optional: Verify extensions are installed
#!psql -d $TS_CONNECTION -c '\dx'

In [None]:
#@title Init psycopg2 connection to Timescale
import pandas as pd
import psycopg2

# establish connection to Timescale
conn = psycopg2.connect(TS_CONNECTION)
cursor = conn.cursor()

# helper function to convert SQL Results to the dataframe
def execute_sql(query, cursor=cursor):
    try:
        cursor.execute(query)
        conn.commit()
        # Check if query returns data (SELECT)
        if cursor.description:  # If description is not None, query returned data
            columns = [desc[0] for desc in cursor.description]
            data = cursor.fetchall()
            df = pd.DataFrame(data, columns=columns)
            return df
        else:
            # Query was likely INSERT, CREATE TABLE, UPDATE, DELETE, etc.
            return f"Rows affected: {cursor.rowcount}"  # Return the number of rows affected

    except psycopg2.Error as e:
        print(f"Error executing SQL query: {e}")
        conn.rollback()  # Rollback changes in case of error
        return None  # Or raise the exception if you prefer
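`execute_sql` relies on a DB-API convention: `cursor.description` is `None` for statements that return no result set, such as `CREATE TABLE` or `INSERT`. The sketch below shows the same pattern without a database server. It uses Python's built-in `sqlite3` module as a stand-in, and `sqlite3` is not Timescale. `run_sql` and `demo_conn` are hypothetical names, and the sketch deliberately uses its own in-memory connection so that the notebook's `conn` is left alone.

```python
import sqlite3

def run_sql(connection, query, params=()):
    """Hypothetical stdlib analogue of execute_sql: list of row dicts for
    result-producing statements, otherwise the number of affected rows."""
    cur = connection.cursor()
    cur.execute(query, params)
    if cur.description is not None:  # statement produced a result set (e.g. SELECT)
        columns = [d[0] for d in cur.description]
        result = [dict(zip(columns, row)) for row in cur.fetchall()]
    else:
        result = cur.rowcount  # DDL/DML: no result set (-1 when not applicable)
    connection.commit()
    return result

# Separate in-memory database, so the notebook's `conn` is not overwritten
demo_conn = sqlite3.connect(":memory:")
run_sql(demo_conn, "CREATE TABLE prices (ts TEXT, price REAL)")
inserted = run_sql(demo_conn, "INSERT INTO prices VALUES (?, ?), (?, ?)",
                   ("2020-01-01", 1.0, "2020-01-02", 2.0))
rows = run_sql(demo_conn, "SELECT ts, price FROM prices ORDER BY ts")
print(inserted, rows)
```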

# Downloading and Loading Bitcoin Data

We'll create a table, `bitcoin_price`, to hold historical Bitcoin price data, and then load a sample dataset into it.

In [None]:
# Create the bitcoin_price table and convert it to a hypertable
query = """
CREATE TABLE IF NOT EXISTS bitcoin_price (
    time TIMESTAMPTZ NOT NULL,
    opening_price DOUBLE PRECISION,
    highest_price DOUBLE PRECISION,
    lowest_price DOUBLE PRECISION,
    closing_price DOUBLE PRECISION,
    volume_btc DOUBLE PRECISION,
    volume_usd DOUBLE PRECISION
);

-- Convert the table to a hypertable to enable TimescaleDB features
SELECT create_hypertable('bitcoin_price', 'time', if_not_exists => TRUE);
"""
execute_sql(query)

## Load Sample Data

Now we'll download and insert sample Bitcoin price data for our analysis.

In [None]:
# Download Bitcoin price data
!wget -q https://raw.githubusercontent.com/timescale/examples/master/bitcoin/data/bitcoin-small.csv

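# Load the data with psql's client-side \copy, which reads the file from the notebook's
# filesystem, so it works for both the local install and a Timescale Cloud instance
!psql -d "$TS_CONNECTION" -c "\copy bitcoin_price FROM 'bitcoin-small.csv' DELIMITER ',' CSV HEADER"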
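`COPY ... CSV HEADER` skips the first line of the file, but it still assigns fields to table columns by position. The file therefore needs the same seven columns, in the same order, as `bitcoin_price`. The notebook doesn't show the column names in bitcoin-small.csv, so the hypothetical `inspect_csv` helper below does two things: it returns the header for a visual check, and it reports the CSV record numbers (1 = header) whose field count doesn't match the table's.

```python
import csv
import os

# Columns of bitcoin_price in table order; COPY assigns CSV fields to them by position
TABLE_COLUMNS = ["time", "opening_price", "highest_price", "lowest_price",
                 "closing_price", "volume_btc", "volume_usd"]

def inspect_csv(path, expected_width=len(TABLE_COLUMNS)):
    """Return (header, record numbers whose field count != expected_width)."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[0] if rows else []
    bad_records = [n for n, row in enumerate(rows, start=1) if len(row) != expected_width]
    return header, bad_records

# Only runs when the sample file has been downloaded into the working directory
if os.path.exists("bitcoin-small.csv"):
    header, bad_records = inspect_csv("bitcoin-small.csv")
    print("CSV header:", header)
    print("Records with an unexpected field count:", bad_records[:10] or "none")
```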

# Basic Queries with Bitcoin Data

Let's start by exploring our Bitcoin dataset with some simple queries.

In [None]:
query = """
-- Get the first and last timestamp in the table
SELECT
  MIN(time) AS first_record,
  MAX(time) AS last_record
FROM bitcoin_price;
"""
execute_sql(query)

## Price Analysis

Let's examine the price trends and basic statistics of Bitcoin.

In [None]:
query = """
-- Basic price statistics
SELECT
  MIN(lowest_price) AS all_time_low,
  MAX(highest_price) AS all_time_high,
  AVG(closing_price) AS average_price
FROM bitcoin_price;
"""
execute_sql(query)

In [None]:
query = """
-- What are the 10 days with the highest trading volume (in USD)?
SELECT
  time,
  opening_price,
  closing_price,
  volume_usd
FROM bitcoin_price
ORDER BY volume_usd DESC
LIMIT 10;
"""
execute_sql(query)

# Time-Weighted Average Price (TWAP)

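A time-weighted average price (TWAP) weights each price by how long it was in effect, which matters when samples are unevenly spaced. The query below reports plain per-bucket averages of the OHLC columns alongside a true TWAP of the closing price. The TWAP comes from the `time_weight()` aggregate in the `timescaledb_toolkit` extension installed during setup.

In [None]:
query = """
-- Per-bucket (30-day) averages, plus a linear time-weighted average of the closing price
SELECT
  time_bucket('30 days', time) AS period,
  AVG(opening_price) AS avg_opening,
  AVG(closing_price) AS avg_closing,
  AVG(highest_price) AS avg_high,
  AVG(lowest_price) AS avg_low,
  average(time_weight('Linear', time, closing_price)) AS twap_closing
FROM bitcoin_price
GROUP BY period
ORDER BY period;
"""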
execute_sql(query)
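To show how a plain average and a time-weighted one differ, here is a minimal pure-Python sketch of a linearly interpolated TWAP. It integrates price over time with the trapezoid rule and then divides by the covered duration. This is our illustration of the idea, similar in spirit to the `'Linear'` method of the Toolkit's `time_weight()` aggregate. It is not Timescale's implementation. `linear_twap` is a hypothetical name, and the prices are made up.

```python
from datetime import datetime, timedelta

def linear_twap(points):
    """Time-weighted average of (timestamp, value) pairs, interpolating linearly
    between consecutive samples (trapezoid rule)."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two samples")
    total_seconds = (pts[-1][0] - pts[0][0]).total_seconds()
    if total_seconds == 0:
        raise ValueError("samples must span a non-zero time range")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two consecutive samples
        area += (v0 + v1) / 2 * (t1 - t0).total_seconds()
    return area / total_seconds

start = datetime(2024, 1, 1)
samples = [(start, 100.0),
           (start + timedelta(hours=1), 200.0),
           (start + timedelta(hours=3), 200.0)]
simple_mean = sum(v for _, v in samples) / len(samples)
print(round(simple_mean, 2), round(linear_twap(samples), 2))  # 166.67 183.33
```

The uneven spacing pulls the two numbers apart: the price sits at 200 for two of the three hours, so the TWAP lands above the plain mean. With evenly spaced daily rows the two are close but not identical, because the trapezoid rule gives the first and last samples half weight.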

# Volatility Analysis

Bitcoin is known for its price volatility. Let's analyze daily price changes.

In [None]:
query = """
-- Calculate daily price volatility (high-low range)
SELECT
  time,
  highest_price,
  lowest_price,
  highest_price - lowest_price AS price_range,
  ((highest_price - lowest_price) / lowest_price) * 100 AS volatility_percent
FROM bitcoin_price
ORDER BY volatility_percent DESC
LIMIT 10;
"""
execute_sql(query)

# Monthly Trends and Patterns

Let's compute the average closing price and total trading volume for each calendar month.

In [None]:
query = """
-- Monthly average prices and volume
SELECT
  DATE_TRUNC('month', time) AS month,
  AVG(closing_price) AS avg_price,
  SUM(volume_btc) AS total_volume_btc,
  SUM(volume_usd) AS total_volume_usd
FROM bitcoin_price
GROUP BY month
ORDER BY month;
"""
execute_sql(query)
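This query groups by calendar month with `DATE_TRUNC`, while the TWAP section uses `time_bucket('30 days', time)`. The two produce different groupings. Fixed-width buckets are counted from an origin, so their boundaries drift relative to month starts. The sketch below mimics fixed-width bucketing in Python. It assumes the default origin that TimescaleDB documents for sub-month bucket widths (midnight UTC on 2000-01-03); check that against your TimescaleDB version. `bucket_start` and `month_start` are hypothetical helpers, not Timescale APIs.

```python
from datetime import datetime, timedelta, timezone

# Assumed default origin for sub-month time_bucket() widths (2000-01-03 00:00 UTC)
ORIGIN = datetime(2000, 1, 3, tzinfo=timezone.utc)

def bucket_start(ts, width, origin=ORIGIN):
    """Start of the fixed-width bucket containing ts: floor((ts - origin) / width)."""
    return origin + ((ts - origin) // width) * width

def month_start(ts):
    """Calendar-month truncation, like DATE_TRUNC('month', ts) in a UTC session."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

ts = datetime(2021, 3, 15, 12, 0, tzinfo=timezone.utc)
print(bucket_start(ts, timedelta(days=30)).date(), month_start(ts).date())
```

For a mid-March timestamp, the 30-day bucket does not start on March 1. Use `time_bucket('1 month', ...)` or `DATE_TRUNC('month', ...)` when you want calendar months.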

# Moving Averages

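Moving averages smooth out short-term price swings and are widely used technical indicators. The window frames below count rows, not days, so `ma_7day` and `ma_30day` span 7 and 30 days only if the table has exactly one row per day with no gaps.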

In [None]:
query = """
-- 7-day and 30-day moving averages for closing price
SELECT
  time,
  closing_price,
  AVG(closing_price) OVER(ORDER BY time ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7day,
  AVG(closing_price) OVER(ORDER BY time ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS ma_30day
FROM bitcoin_price
ORDER BY time DESC
LIMIT 30;
"""
execute_sql(query)
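A `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` frame counts rows, so a missing day silently stretches the "7-day" window. A time-based window avoids that. In SQL, PostgreSQL 11 and later accept `RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW` when ordering by a timestamp. The pure-Python sketch below contrasts the two on made-up daily closes with one missing day, using 3-day windows to keep the example short. The helper names are hypothetical.

```python
from datetime import date, timedelta

def rows_moving_avg(values, n):
    """Mean of each value and up to n-1 preceding values (like ROWS BETWEEN n-1 PRECEDING)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def days_moving_avg(days, values, n_days):
    """Mean of the values dated within the n_days-day span ending at each row's date
    (like RANGE BETWEEN INTERVAL 'n_days-1 days' PRECEDING). O(n^2), fine for a demo."""
    out = []
    for d in days:
        lo = d - timedelta(days=n_days - 1)
        window = [v for dd, v in zip(days, values) if lo <= dd <= d]
        out.append(sum(window) / len(window))
    return out

# Made-up daily closes with 2024-01-03 missing
days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 5)]
closes = [10.0, 20.0, 40.0, 50.0]
print(rows_moving_avg(closes, 3))        # the Jan 4 window reaches back to Jan 1
print(days_moving_avg(days, closes, 3))  # the Jan 4 window covers Jan 2 to Jan 4 only
```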

# Year-over-Year Analysis

Compare Bitcoin's yearly low, yearly high, price range, and average closing price.

In [None]:
query = """
-- Year-over-year comparison
SELECT
  EXTRACT(YEAR FROM time) AS year,
  MIN(lowest_price) AS yearly_low,
  MAX(highest_price) AS yearly_high,
  MAX(highest_price) - MIN(lowest_price) AS price_range,
  AVG(closing_price) AS avg_price
FROM bitcoin_price
GROUP BY year
ORDER BY year;
"""
execute_sql(query)
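The query reports each year on its own. A year-over-year figure goes one step further and compares each year with the one before it. In SQL that is usually done with the `LAG()` window function. The hypothetical helper below shows the arithmetic in Python on `(year, avg_price)` pairs like the ones the query returns, using made-up numbers.

```python
def yoy_change(yearly):
    """Percent change of each year's value versus the immediately preceding year.

    `yearly` is an iterable of (year, value) pairs. Years with no directly
    preceding year (gaps), or whose previous value is zero, are skipped."""
    by_year = dict(yearly)
    return {
        year: (value - by_year[year - 1]) / by_year[year - 1] * 100
        for year, value in sorted(by_year.items())
        if year - 1 in by_year and by_year[year - 1] != 0
    }

print(yoy_change([(2019, 100.0), (2020, 150.0), (2021, 75.0)]))  # {2020: 50.0, 2021: -50.0}
```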

# Continuous Aggregates with TimescaleDB

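Continuous aggregates are TimescaleDB materialized views that are refreshed incrementally. Only the time buckets affected by new or changed data are recomputed, so queries can read precomputed daily statistics instead of re-aggregating the raw table each time.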

In [None]:
query = """
-- Create a continuous aggregate view for daily Bitcoin stats
CREATE MATERIALIZED VIEW IF NOT EXISTS bitcoin_daily_stats
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 day', time) AS bucket,
  AVG(opening_price) AS avg_opening,
  AVG(closing_price) AS avg_closing,
  MAX(highest_price) AS max_price,
  MIN(lowest_price) AS min_price,
  SUM(volume_btc) AS total_volume_btc,
  SUM(volume_usd) AS total_volume_usd
FROM bitcoin_price
GROUP BY bucket;
"""
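# Creating a continuous aggregate WITH DATA (the default) cannot run inside a
# transaction block, so temporarily switch psycopg2 to autocommit mode
conn.autocommit = True
execute_sql(query)
conn.autocommit = False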

In [None]:
query = """
-- Query the continuous aggregate view
SELECT *
FROM bitcoin_daily_stats
ORDER BY bucket DESC
LIMIT 10;
"""
execute_sql(query)
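A continuous aggregate is efficient because of incremental refresh. TimescaleDB records which time ranges of the hypertable have changed, and a refresh recomputes only the affected buckets instead of re-aggregating everything. The toy class below models that idea in a few lines of Python. It is a conceptual sketch with hypothetical names, not a description of how TimescaleDB stores or refreshes data.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone

class ToyDailyAggregate:
    """Toy model of incremental materialization (NOT TimescaleDB's implementation).

    Raw rows are kept per UTC day; `materialized` holds per-day totals. Inserts
    only mark their day as stale, and refresh() recomputes just the stale days."""

    def __init__(self):
        self.raw = defaultdict(list)  # day -> raw volume values
        self.materialized = {}        # day -> precomputed total volume
        self.invalid = set()          # days whose materialized total is stale

    def insert(self, ts, volume):
        day = ts.astimezone(timezone.utc).date()
        self.raw[day].append(volume)
        self.invalid.add(day)

    def refresh(self):
        """Recompute only the stale days; return them in order."""
        recomputed = sorted(self.invalid)
        for day in recomputed:
            self.materialized[day] = sum(self.raw[day])
        self.invalid.clear()
        return recomputed

agg = ToyDailyAggregate()
t0 = datetime(2024, 1, 1, 9, tzinfo=timezone.utc)
agg.insert(t0, 1.0)
agg.insert(t0 + timedelta(days=1), 5.0)
print(agg.refresh())  # both days are recomputed on the first refresh
agg.insert(t0 + timedelta(days=1, hours=2), 1.0)
print(agg.refresh())  # only 2024-01-02 is recomputed
```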

# Identifying Significant Price Movements

Let's find days with large price movements, which could indicate significant market events.

In [None]:
query = """
-- Days with more than 10% price change
SELECT
  time,
  opening_price,
  closing_price,
  (closing_price - opening_price) AS price_change,
  ((closing_price - opening_price) / opening_price) * 100 AS price_change_percent
FROM bitcoin_price
WHERE ABS((closing_price - opening_price) / opening_price) * 100 > 10
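-- An output alias can't be used inside an ORDER BY expression, so repeat the calculation
ORDER BY ABS((closing_price - opening_price) / opening_price) DESC;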
"""
execute_sql(query)

# Visualizing Bitcoin Data

Let's plot the average monthly closing price with matplotlib.

In [None]:
# Fetch monthly data for visualization
query = """
SELECT
  time_bucket('1 month', time) AS month,
  AVG(closing_price) AS avg_price
FROM bitcoin_price
GROUP BY month
ORDER BY month;
"""
monthly_data = execute_sql(query)

# You can now use this data with visualization libraries like matplotlib or plotly
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(monthly_data['month'], monthly_data['avg_price'])
plt.title('Bitcoin Average Monthly Price')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.grid(True)
plt.show()

## Summary

In this notebook, we've explored Bitcoin price data using TimescaleDB's powerful time-series capabilities. We've analyzed price trends, volatility, and trading volumes across different time periods.

Features demonstrated include:
- Basic time-series queries and aggregations
- Time bucketing with `time_bucket()`
- Continuous aggregates
- Moving averages and volatility calculations

TimescaleDB makes these types of analyses efficient and straightforward, even with large volumes of time-series data.

## Basic test with psql

In [7]:
!psql -d $TS_CONNECTION -c "SELECT 'Hello World!' AS greeting;"

   greeting   
--------------
 Hello World!
(1 row)

