Our next dataset to explore will be the trading.prices
table which contains the daily price and volume data for the 2 cryptocurrency tickers: ETH
and BTC
(Ethereum and Bitcoin!)
Before we try to solve our next set of questions below - you can try viewing a few rows from the trading.prices
dataset:
Example Bitcoin price data:
SELECT * FROM trading.prices WHERE ticker = 'BTC' LIMIT 5;
ticker | market_date | price | open | high | low | volume | change |
---|---|---|---|---|---|---|---|
BTC | 2021-08-29 | 48255.0 | 48899.7 | 49621.7 | 48101.9 | 40.96K | -1.31% |
BTC | 2021-08-28 | 48897.1 | 49062.8 | 49289.4 | 48428.5 | 36.73K | -0.34% |
BTC | 2021-08-27 | 49064.3 | 46830.2 | 49142.0 | 46371.5 | 62.47K | 4.77% |
BTC | 2021-08-26 | 46831.6 | 48994.4 | 49347.8 | 46360.4 | 73.79K | -4.41% |
BTC | 2021-08-25 | 48994.5 | 47707.4 | 49230.2 | 47163.3 | 63.54K | 2.68% |
Example Ethereum price data:
SELECT * FROM trading.prices WHERE ticker = 'ETH' LIMIT 5;
ticker | market_date | price | open | high | low | volume | change |
---|---|---|---|---|---|---|---|
ETH | 2021-08-29 | 3177.84 | 3243.96 | 3282.21 | 3162.79 | 582.04K | -2.04% |
ETH | 2021-08-28 | 3243.90 | 3273.78 | 3284.58 | 3212.24 | 466.21K | -0.91% |
ETH | 2021-08-27 | 3273.58 | 3093.78 | 3279.93 | 3063.37 | 839.54K | 5.82% |
ETH | 2021-08-26 | 3093.54 | 3228.03 | 3249.62 | 3057.48 | 118.44K | -4.17% |
ETH | 2021-08-25 | 3228.15 | 3172.12 | 3247.43 | 3080.70 | 923.13K | 1.73% |
Column Name | Description |
---|---|
ticker | one of either BTC or ETH |
market_date | the date for each record |
price | closing price at end of day |
open | the opening price |
high | the highest price for that day |
low | the lowest price for that day |
volume | the total volume traded |
change | % change price in price |
Let's answer a few simple questions to help us better understand the trading.prices
table.
Remember to clear all previous SQL queries from SQLPad before running each new SQL query!
How many total records do we have in the
trading.prices
table?
Click here to reveal the solution!
SELECT
COUNT(*) AS total_records
FROM trading.prices;
total_records |
---|
3404 |
How many records are there per
ticker
value?
Click here to reveal the solution!
SELECT
ticker,
COUNT(*) AS record_count
FROM trading.prices
GROUP BY ticker;
ticker | record_count |
---|---|
BTC | 1702 |
ETH | 1702 |
What is the minimum and maximum
market_date
values?
Click here to reveal the solution!
SELECT
MIN(market_date) AS min_date,
MAX(market_date) AS max_date
FROM trading.prices;
min_date | max_date |
---|---|
2017-01-01 | 2021-08-29 |
Are there differences in the minimum and maximum
market_date
values for each ticker?
Click here to reveal the solution!
SELECT
ticker,
MIN(market_date) AS min_date,
MAX(market_date) AS max_date
FROM trading.prices
GROUP BY ticker;
ticker | min_date | max_date |
---|---|---|
BTC | 2017-01-01 | 2021-08-29 |
ETH | 2017-01-01 | 2021-08-29 |
What is the average of the
price
column for Bitcoin records during the year 2020?
Click here to reveal the solution!
SELECT
AVG(price)
FROM trading.prices
WHERE ticker = 'BTC'
AND market_date BETWEEN '2020-01-01' AND '2020-12-31';
avg |
---|
11111.631147540984 |
What is the monthly average of the
price
column for Ethereum in 2020? Sort the output in chronological order and also round the average price value to 2 decimal places
Click here to reveal the solution!
SELECT
DATE_TRUNC('MON', market_date) AS month_start,
-- need to cast approx. floats to exact numeric types for round!
ROUND(AVG(price)::NUMERIC, 2) AS average_eth_price
FROM trading.prices
WHERE EXTRACT(YEAR FROM market_date) = 2020
AND ticker = 'ETH'
GROUP BY month_start
ORDER BY month_start;
month_start | average_eth_price |
---|---|
2020-01-01 00:00:00+00 | 156.65 |
2020-02-01 00:00:00+00 | 238.76 |
2020-03-01 00:00:00+00 | 160.18 |
2020-04-01 00:00:00+00 | 171.29 |
2020-05-01 00:00:00+00 | 207.45 |
2020-06-01 00:00:00+00 | 235.92 |
2020-07-01 00:00:00+00 | 259.57 |
2020-08-01 00:00:00+00 | 401.73 |
2020-09-01 00:00:00+00 | 367.77 |
2020-10-01 00:00:00+00 | 375.79 |
2020-11-01 00:00:00+00 | 486.73 |
2020-12-01 00:00:00+00 | 622.35 |
Are there any duplicate
market_date
values for anyticker
value in our table?
As you inspect the output from the following SQL query - what is your final answer?
Click here to reveal the solution!
SELECT
ticker,
COUNT(market_date) AS total_count,
COUNT(DISTINCT market_date) AS unique_count
FROM trading.prices
GROUP BY ticker;
ticker | total_count | unique_count |
---|---|---|
BTC | 1702 | 1702 |
ETH | 1702 | 1702 |
How many days from the
trading.prices
table exist where thehigh
price of Bitcoin is over $30,000?
Click here to reveal the solution!
SELECT
COUNT(*) AS row_count
FROM trading.prices
WHERE ticker = 'BTC'
AND high > 30000;
row_count |
---|
240 |
How many "breakout" days were there in 2020 where the
price
column is greater than theopen
column for eachticker
?
Click here to reveal the solution!
SELECT
ticker,
SUM(CASE WHEN price > open THEN 1 ELSE 0 END) AS breakout_days
FROM trading.prices
WHERE DATE_TRUNC('YEAR', market_date) = '2020-01-01'
GROUP BY ticker;
ticker | breakout_days |
---|---|
BTC | 207 |
ETH | 200 |
How many "non_breakout" days were there in 2020 where the
price
column is less than theopen
column for eachticker
?
Click here to reveal the solution!
SELECT
ticker,
SUM(CASE WHEN price < open THEN 1 ELSE 0 END) AS non_breakout_days
FROM trading.prices
-- this another way to specify the year
WHERE market_date >= '2020-01-01' AND market_date <= '2020-12-31'
GROUP BY ticker;
ticker | non_breakout_days |
---|---|
BTC | 159 |
ETH | 166 |
What percentage of days in 2020 were breakout days vs non-breakout days? Round the percentages to 2 decimal places
Click here to reveal the solution!
SELECT
ticker,
ROUND(
SUM(CASE WHEN price > open THEN 1 ELSE 0 END)
/ COUNT(*)::NUMERIC,
2
) AS breakout_percentage,
ROUND(
SUM(CASE WHEN price < open THEN 1 ELSE 0 END)
/ COUNT(*)::NUMERIC,
2
) AS non_breakout_percentage
FROM trading.prices
WHERE market_date >= '2020-01-01' AND market_date <= '2020-12-31'
GROUP BY ticker;
ticker | breakout_percentage | non_breakout_percentage |
---|---|---|
BTC | 0.57 | 0.43 |
ETH | 0.55 | 0.45 |
Date Manipulations
We use a variety of date manipulations in questions 5, 6, 9 and 11 to filter the trading.prices
for 2020 values only.
These are all valid methods to qualify DATE
or TIMESTAMP
values within a range using a WHERE
filter:
market_date BETWEEN '2020-01-01' AND '2020-12-31'
EXTRACT(YEAR FROM market_date) = 2020
DATE_TRUNC('YEAR', market_date) = '2020-01-01'
market_date >= '2020-01-01' AND market_date <= '2020-12-31'
The only additional thing to note is that DATE_TRUNC
returns a TIMESTAMP
data type which can be cast back to a regular DATE
using the ::DATE
notation when used in a SELECT
query.
BETWEEN
Boundaries
An additional note for question 5 - the boundaries for the BETWEEN
clause must be earlier-date-first
AND later-date-second
See what happens when you reverse the order of the DATE
boundaries using the query below - does it match your expectation?
Click here to see the "wrong" code!
SELECT
AVG(price)
FROM trading.prices
WHERE ticker = 'BTC'
AND market_date BETWEEN '2020-12-31' AND '2020-01-01';
Rounding Floats/Doubles
In PostgreSQL - we cannot apply the ROUND
function directly to approximate FLOAT
or DOUBLE PRECISION
data types.
Instead we will need to cast any outputs from functions such as AVG
to an exact NUMERIC
data type before we can use it with other approximation functions such as ROUND
In question 6 - if we were to remove our ::NUMERIC
from our query - we would run into this error:
ERROR: function round(double precision, integer) does not exist
LINE 3: ROUND(AVG(price), 2) AS average_eth_price
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
You can try this yourself by running the below code snippet with the ::NUMERIC
removed:
Click here to see the "wrong" code!
SELECT
DATE_TRUNC('MON', market_date) AS month_start,
ROUND(AVG(price), 2) AS average_eth_price
FROM trading.prices
WHERE EXTRACT(YEAR FROM market_date) = 2020
GROUP BY month_start
ORDER BY month_start;
Integer Floor Division
In question 5 - when dividing values in SQL it is very important to consider the data types of the numerator (the number on top) and the denominator (the number on the bottom)
When there is an INTEGER
/ INTEGER
as there is in this case - SQL will default to FLOOR
division in this case!
You can try running the same query as the solution to question 5 above - but this time remove the 2 instances of ::NUMERIC
and the decimal place rounding to see what happens!
This is a super common error found in SQL queries and we usually recommend casting either the numerator or the denominator as a NUMERIC
type using the shorthand ::NUMERIC
syntax to ensure that you will avoid the dreaded integer floor division!
Click here to see the "wrong" code!
SELECT
ticker,
SUM(CASE WHEN price > open THEN 1 ELSE 0 END) / COUNT(*) AS breakout_percentage,
SUM(CASE WHEN price < open THEN 1 ELSE 0 END) / COUNT(*) AS non_breakout_percentage
FROM trading.prices
WHERE market_date >= '2019-01-01' AND market_date <= '2019-12-31'
GROUP BY ticker;