# **Bitcoin price prediction - Data crawling**
## Big Data Computing final project - A.Y. 2022 - 2023
Prof. Gabriele Tolomei

MSc in Computer Science

La Sapienza, University of Rome

### Author: Corsi Danilo (1742375) - corsi.1742375@studenti.uniroma1.it



---


Description: Bitcoin data retrieval via API calls.

# Global constants, dependencies, libraries and tools

In [3]:
# Main constants
LOCAL_RUNNING = True
ROOT_DIR = "D:/Documents/Repository/BDC/project" if LOCAL_RUNNING else "/content/drive"

In [4]:
if not LOCAL_RUNNING:
    # Point Colaboratory to Google Drive
    from google.colab import drive

    # Define GDrive paths
    drive.mount(ROOT_DIR, force_remount=True)

## Import my utilities

In [5]:
# Set main dir
MAIN_DIR = ROOT_DIR if LOCAL_RUNNING else ROOT_DIR + "/MyDrive/BDC/project"

###################
# --- DATASET --- #
###################

# Datasets dir
DATASET_RAW_DIR = MAIN_DIR + "/datasets/raw"

# Datasets name
DATASET_NAME = "bitcoin_blockchain_data_15min"

# Datasets path
DATASET_RAW = DATASET_RAW_DIR + "/" + DATASET_NAME + ".parquet"

In [6]:
# Useful imports
import pandas as pd
import functools
import plotly.io as pio
import time
from datetime import datetime, timedelta

# Suppression of warnings for better reading
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

pio.renderers.default = 'vscode+colab' # To correctly render plotly plots

# Metrics and parameters
I collected Bitcoin blockchain data through the API of the [Blockchain.com](https://www.blockchain.com/) website and price information from two popular exchanges, [Binance](https://www.binance.com) and [Kraken](https://www.kraken.com/).
I organized the data into 15-minute time frames and retrieved the most relevant data for the last four years, a period that includes both phases of high volatility and phases of sideways price movement.

The features considered fall into several categories, from those that describe price characteristics to those that go into more detail about Bitcoin's blockchain:
   - **Currency Statistics**
      - `ohlcv:` stands for “Open, High, Low, Close and Volume”, the five types of price data most commonly used in financial analysis.
      - `market-price:` the average USD market price across major bitcoin exchanges.
      - `trade-volume-usd:` the total USD value of trading volume on major bitcoin exchanges.
      - `total-bitcoins:` the total number of mined bitcoin that are currently circulating on the network.
      - `market-cap:` the total USD value of bitcoin in circulation.

   - **Block Details**
      - `blocks-size:` the total size of the blockchain minus database indexes in megabytes.
      - `avg-block-size:` the average block size over the past 24 hours in megabytes.
      - `n-transactions-total:` the total number of transactions on the blockchain.
      - `n-transactions-per-block:` the average number of transactions per block over the past 24 hours.

   - **Mining Information**
      - `hash-rate:` the estimated number of terahashes per second the bitcoin network is performing in the last 24 hours.
      - `difficulty:` a relative measure of how difficult it is to mine a new block for the blockchain.
      - `miners-revenue:` total value of coinbase block rewards and transaction fees paid to miners.
      - `transaction-fees-usd:` the total USD value of all transaction fees paid to miners. This does not include coinbase block rewards.

   - **Network Activity**
      - `n-unique-addresses:` the total number of unique addresses used on the blockchain.
      - `n-transactions:` the total number of confirmed transactions per day.
      - `estimated-transaction-volume-usd:` the total estimated value in USD of transactions on the blockchain.

   <img src="https://github.com/CorsiDanilo/bitcoin-price-prediction-with-pyspark/blob/main/notebooks/images/features_group.png?raw=1">

In [5]:
timespan = "4years" # Duration of the data
# end_date = datetime.today() # Get current date (ending date), use this to get a dataset updated to today's date
# Set static date (ending date)
date_string = '2023-11-13'
end_date = datetime.strptime(date_string, '%Y-%m-%d')
start_date = (end_date - timedelta(days=365*4)) # Get the starting date

# Metrics considered
metrics = [
          # Currency Statistics
          "market-price",
          "trade-volume",

          # Block Details
          "blocks-size",
          "avg-block-size",
          "n-transactions-total",
          "n-transactions-per-block",

          # Mining Information
          "hash-rate",
          "difficulty",
          "miners-revenue",
          "transaction-fees-usd",

          # Network Activity
          "n-unique-addresses",
          "n-transactions",
          "estimated-transaction-volume-usd"
]
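Note that `timedelta(days=365*4)` subtracts a fixed 1460 days, so when the window contains a leap day the start date lands one day later than the same calendar day four years earlier. The sketch below compares the two; the `pd.DateOffset` variant is an alternative shown for illustration only, not what the notebook uses.

```python
from datetime import datetime, timedelta

import pandas as pd

end_date = datetime.strptime("2023-11-13", "%Y-%m-%d")

# Fixed-length window, as in the cell above: 4 * 365 = 1460 days
start_fixed = end_date - timedelta(days=365 * 4)

# Calendar-aware alternative (assumption, not used in the notebook):
# exactly four calendar years before the end date
start_calendar = (pd.Timestamp(end_date) - pd.DateOffset(years=4)).to_pydatetime()

print(start_fixed.date(), start_calendar.date())
```

Either choice is reasonable here; the fixed-length window is what produces the 2019-11-14 start date seen in the outputs of this notebook.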

# Data crawling

In [6]:
# Install ccxt trading library that provides a way to connect and trade with various cryptocurrency exchanges and payment processing services worldwide
!pip3 install ccxt
import ccxt

Collecting ccxt
  Downloading ccxt-4.1.79-py2.py3-none-any.whl (4.0 MB)
Collecting aiodns>=1.1.1 (from ccxt)
  Downloading aiodns-3.1.1-py3-none-any.whl (5.4 kB)
Collecting pycares>=4.0.0 (from aiodns>=1.1.1->ccxt)
  Downloading pycares-4.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
Installing collected packages: pycares, aiodns, ccxt
Successfully installed aiodns-3.1.1 ccxt-4.1.79 pycares-4.4.0


In [7]:
# Create an array of dates so that the API is contacted in roughly one-year (360-day) increments
date_array = []

# Calculate the number of days between the start and end dates
num_days = (end_date - start_date).days

# Loop through the dates and add them to the array
for i in range(num_days + 1):
    current_date = start_date + timedelta(days=i)
    if i % 360 == 0:
        date_array.append(current_date)

# Append end_date
date_array.append(end_date)
date_array

[datetime.datetime(2019, 11, 14, 0, 0),
 datetime.datetime(2020, 11, 8, 0, 0),
 datetime.datetime(2021, 11, 3, 0, 0),
 datetime.datetime(2022, 10, 29, 0, 0),
 datetime.datetime(2023, 10, 24, 0, 0),
 datetime.datetime(2023, 11, 13, 0, 0)]
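The same boundaries can be produced by stepping directly in 360-day increments and then pairing consecutive boundaries into request windows. This is only a sketch: `chunk_boundaries` and `windows` are hypothetical names that do not appear in the notebook.

```python
from datetime import datetime, timedelta


def chunk_boundaries(start, end, step_days=360):
    """Return boundaries every `step_days` days from `start`, always closing with `end`."""
    boundaries = []
    current = start
    while current < end:
        boundaries.append(current)
        current += timedelta(days=step_days)
    boundaries.append(end)
    return boundaries


start = datetime(2019, 11, 14)
end = datetime(2023, 11, 13)

boundaries = chunk_boundaries(start, end)

# Pair consecutive boundaries into (window_start, window_end) tuples
windows = list(zip(boundaries[:-1], boundaries[1:]))
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

Iterating over `windows` avoids keeping two manual indices in sync when issuing one request per interval.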

In [8]:
def ohlcv_crawler(exchange_to_use, start, end):
    exchange = exchange_to_use  # Exchange instance to query
    market = 'BTC/USD'  # Bitcoin market
    exchange.enableRateLimit = False  # Disable ccxt's built-in rate limiter

    # Convert dates to milliseconds
    since = exchange.parse8601(start + 'T00:00:00Z')
    till = exchange.parse8601(end + 'T00:00:00Z')

    # Fetch daily OHLCV data
    # Note: the fourth positional argument of fetch_ohlcv is `limit` (maximum number of candles),
    # not an end timestamp, so `till` does not bound the returned date range
    ohlcv = exchange.fetch_ohlcv(market, '1d', since, till)

    # Convert to DataFrame
    dataset = pd.DataFrame(ohlcv, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
    dataset['timestamp'] = pd.to_datetime(dataset['timestamp'], unit='ms')

    return dataset
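Because `fetch_ohlcv` in ccxt takes `since` and `limit` rather than an end timestamp, a date range is usually covered by paging: request a batch, move `since` past the last candle received, and stop once the end of the range is reached. The sketch below illustrates that idea under stated assumptions: `fetch_ohlcv_range` is a hypothetical helper, and `fake_fetch` is a stand-in for `exchange.fetch_ohlcv` so that the example runs without network access.

```python
DAY_MS = 24 * 60 * 60 * 1000  # One day in milliseconds


def fetch_ohlcv_range(fetch, symbol, timeframe, since_ms, until_ms, limit=500):
    """Collect candles with since_ms <= timestamp < until_ms by paging with `since` and `limit`."""
    rows = []
    cursor = since_ms
    while cursor < until_ms:
        batch = fetch(symbol, timeframe, cursor, limit)
        if not batch:
            break  # No more data available
        # Keep only candles that fall before the end of the requested range
        rows.extend(candle for candle in batch if candle[0] < until_ms)
        next_cursor = batch[-1][0] + 1
        if next_cursor <= cursor:
            break  # Guard against a source that does not advance
        cursor = next_cursor
    return rows


def fake_fetch(symbol, timeframe, since, limit):
    """Hypothetical stand-in for exchange.fetch_ohlcv: ten daily candles starting at epoch."""
    candles = [[day * DAY_MS, 1.0, 2.0, 0.5, 1.5, 10.0] for day in range(10)]
    return [candle for candle in candles if candle[0] >= since][:limit]


rows = fetch_ohlcv_range(fake_fetch, "BTC/USD", "1d", 2 * DAY_MS, 8 * DAY_MS, limit=3)
print([row[0] // DAY_MS for row in rows])
```

With a real exchange, `fetch` would be `exchange.fetch_ohlcv`, and a pause between requests (or ccxt's rate limiter) would be advisable.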

In [9]:
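# Fetch OHLCV data
exchange_to_use = ccxt.binanceus()

# Collect one DataFrame per date interval, then concatenate them
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
frames = []
for i in range(3):
  start = date_array[i].strftime('%Y-%m-%d')
  end = date_array[i + 1].strftime('%Y-%m-%d')
  frames.append(ohlcv_crawler(exchange_to_use, start, end))
  time.sleep(5)  # Pause between requests
df0 = pd.concat(frames, ignore_index=True)
df0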

Unnamed: 0,timestamp,open,high,low,close,volume
0,2019-11-14,8773.08,8784.51,8566.84,8637.10,126.897068
1,2019-11-15,8632.95,8766.00,8388.00,8459.07,221.746205
2,2019-11-16,8463.79,8528.44,8430.77,8484.07,50.607771
3,2019-11-17,8485.83,8627.56,8373.96,8508.26,89.357152
4,2019-11-18,8495.72,8501.58,8043.00,8175.14,173.564990
...,...,...,...,...,...,...
2593,2023-07-10,27575.17,28100.00,27450.00,27707.00,84.414950
2594,2023-07-11,27707.00,27797.05,26900.00,26951.45,78.992070
2595,2023-07-12,26911.15,27500.01,26777.00,26777.00,83.336960
2596,2023-07-13,26799.97,27400.00,25450.00,25937.06,129.776650


In [10]:
# Count unique timestamps (fewer than the number of rows means there are duplicates)
len(df0['timestamp'].unique())

1339

In [11]:
# Drop the duplicates in column "timestamp", keep the last value
df0.drop_duplicates(subset="timestamp", keep="last", inplace=True)
df0

Unnamed: 0,timestamp,open,high,low,close,volume
0,2019-11-14,8773.08,8784.51,8566.84,8637.10,126.897068
1,2019-11-15,8632.95,8766.00,8388.00,8459.07,221.746205
2,2019-11-16,8463.79,8528.44,8430.77,8484.07,50.607771
3,2019-11-17,8485.83,8627.56,8373.96,8508.26,89.357152
4,2019-11-18,8495.72,8501.58,8043.00,8175.14,173.564990
...,...,...,...,...,...,...
2593,2023-07-10,27575.17,28100.00,27450.00,27707.00,84.414950
2594,2023-07-11,27707.00,27797.05,26900.00,26951.45,78.992070
2595,2023-07-12,26911.15,27500.01,26777.00,26777.00,83.336960
2596,2023-07-13,26799.97,27400.00,25450.00,25937.06,129.776650


In [12]:
# Count unique timestamps again after dropping the duplicates
len(df0['timestamp'].unique())

1339

In [13]:
# Since I cannot get all the data from the same exchange, I will get the remaining data from another
last_date = df0['timestamp'].tail(1).values[0]

# Compare the last date with our end date
if not last_date == end_date:
  exchange_to_use = ccxt.kraken()
  for i in range(3):
    df0 = df0.append(ohlcv_crawler(exchange_to_use, pd.to_datetime(last_date).strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d')), ignore_index=True)
df0

Unnamed: 0,timestamp,open,high,low,close,volume
0,2019-11-14,8773.08,8784.51,8566.84,8637.10,126.897068
1,2019-11-15,8632.95,8766.00,8388.00,8459.07,221.746205
2,2019-11-16,8463.79,8528.44,8430.77,8484.07,50.607771
3,2019-11-17,8485.83,8627.56,8373.96,8508.26,89.357152
4,2019-11-18,8495.72,8501.58,8043.00,8175.14,173.564990
...,...,...,...,...,...,...
1775,2023-12-03,39477.10,40117.20,39299.00,39975.10,2138.769822
1776,2023-12-04,39975.10,42415.50,39476.00,41984.70,7454.600308
1777,2023-12-05,41983.00,44465.00,41427.10,44080.80,5566.774594
1778,2023-12-06,44080.80,44277.20,43335.50,43761.70,3558.308187


In [14]:
# Count unique timestamps (fewer than the number of rows means there are duplicates)
len(df0['timestamp'].unique())

1485

In [15]:
# Drop the duplicates in column "timestamp", keep the last value
df0.drop_duplicates(subset="timestamp", keep="last", inplace=True)
df0

Unnamed: 0,timestamp,open,high,low,close,volume
0,2019-11-14,8773.08,8784.51,8566.84,8637.10,126.897068
1,2019-11-15,8632.95,8766.00,8388.00,8459.07,221.746205
2,2019-11-16,8463.79,8528.44,8430.77,8484.07,50.607771
3,2019-11-17,8485.83,8627.56,8373.96,8508.26,89.357152
4,2019-11-18,8495.72,8501.58,8043.00,8175.14,173.564990
...,...,...,...,...,...,...
1775,2023-12-03,39477.10,40117.20,39299.00,39975.10,2138.769822
1776,2023-12-04,39975.10,42415.50,39476.00,41984.70,7454.600308
1777,2023-12-05,41983.00,44465.00,41427.10,44080.80,5566.774594
1778,2023-12-06,44080.80,44277.20,43335.50,43761.70,3558.308187


In [16]:
def blockchain_data_crawler(timespan, metric, start, end):
    # API info (one request per metric)
    url = f'https://api.blockchain.info/charts/{metric}?timespan={timespan}&start={start}&format=csv'

    # Obtain data
    data = pd.read_csv(url, names=['timestamp', metric])

    # Transform "timestamp" to datetime type
    data['timestamp'] = pd.to_datetime(data["timestamp"])

    # Select data before the end date (exclusive)
    data = data[(data['timestamp'] < end)]

    return data
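As a small illustration of how the request URL is assembled, the query string can also be built with `urllib.parse.urlencode` instead of string interpolation. The base URL and the `timespan`, `start` and `format` parameters are the ones used in the function above; `build_chart_url` itself is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.blockchain.info/charts"


def build_chart_url(metric, timespan, start, fmt="csv"):
    """Build the chart URL for one metric (hypothetical helper for illustration)."""
    query = urlencode({"timespan": timespan, "start": start, "format": fmt})
    return f"{BASE_URL}/{metric}?{query}"


url = build_chart_url("market-price", "4years", "2019-11-14")
print(url)
```

Using `urlencode` keeps the parameters in one dictionary and escapes any characters that are not URL-safe.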

In [17]:
# Helper that merges two DataFrames on the "timestamp" column
merge = functools.partial(pd.merge, on='timestamp')

# Retrieve blockchain data from the Blockchain.com API (one DataFrame per metric, merged pairwise)
df1 = functools.reduce(merge, [blockchain_data_crawler(timespan, metric, start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d')) for metric in metrics])
df1

Unnamed: 0,timestamp,market-price,trade-volume,blocks-size,avg-block-size,n-transactions-total,n-transactions-per-block,hash-rate,difficulty,miners-revenue,transaction-fees-usd,n-unique-addresses,n-transactions,estimated-transaction-volume-usd
0,2019-11-14,8762.42,5.327823e+07,249030.689529,0.981338,474394716,2071.142012,1.068612e+08,1.272001e+13,1.877947e+07,2.581060e+05,535548.0,350023.0,9.464503e+08
1,2019-11-15,8632.32,6.209435e+07,249196.673568,1.113353,474744608,2057.879747,9.990575e+07,1.272001e+13,1.735128e+07,2.545219e+05,575887.0,325145.0,1.199133e+09
2,2019-11-16,8457.69,1.023465e+08,249372.424907,1.171504,475069560,1915.324324,9.358260e+07,1.272001e+13,1.586438e+07,1.942917e+05,495077.0,283468.0,4.618346e+08
3,2019-11-17,8482.70,2.721844e+07,249545.811754,1.108507,475351799,1716.688312,9.737649e+07,1.272001e+13,1.658248e+07,1.795335e+05,430853.0,264370.0,3.428184e+08
4,2019-11-18,8503.93,4.276261e+07,249716.672832,1.019048,475617345,2176.221429,8.852408e+07,1.272001e+13,1.496338e+07,2.331815e+05,510494.0,304671.0,1.043918e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1448,2023-11-08,35436.28,1.706969e+08,524489.163390,1.653792,915522161,3914.100671,4.626563e+08,6.246347e+13,3.713802e+07,4.180095e+06,837044.0,583201.0,5.080135e+09
1449,2023-11-09,35629.52,1.569445e+08,524735.476915,1.663591,916106091,4010.251748,4.440259e+08,6.246347e+13,4.238651e+07,9.218558e+06,832393.0,573466.0,6.287697e+09
1450,2023-11-10,36696.25,4.387976e+08,524973.519610,1.621323,916679882,3401.500000,4.781817e+08,6.246347e+13,4.158990e+07,6.043066e+06,848433.0,523831.0,5.561959e+09
1451,2023-11-11,37321.65,2.492784e+08,525223.328826,1.661074,917204210,3922.056250,4.968122e+08,6.246347e+13,4.229329e+07,4.284425e+06,894325.0,627529.0,2.477252e+09


In [18]:
# Count unique timestamps (fewer than the number of rows means there are duplicates)
len(df1['timestamp'].unique())

1453
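The `functools.reduce` call above folds the list of per-metric DataFrames into one by merging them pairwise on `timestamp`. Since `pd.merge` defaults to an inner join, only timestamps present in every frame survive. A toy sketch with made-up columns `a`, `b` and `c` (not real metrics):

```python
import functools

import pandas as pd

frames = [
    pd.DataFrame({"timestamp": pd.to_datetime(["2023-01-01", "2023-01-02"]), "a": [1, 2]}),
    pd.DataFrame({"timestamp": pd.to_datetime(["2023-01-01", "2023-01-02"]), "b": [3, 4]}),
    pd.DataFrame({"timestamp": pd.to_datetime(["2023-01-02", "2023-01-03"]), "c": [5, 6]}),
]

# Same pattern as in the notebook: fix the join key, then fold the list pairwise
merge = functools.partial(pd.merge, on="timestamp")
merged = functools.reduce(merge, frames)

print(merged)
```

Only 2023-01-02 appears in all three toy frames, so the merged result has a single row.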

In [19]:
# Retrieve the total circulating supply and the market capitalization
metrics = [
          # Currency Statistics
          "total-bitcoins",
          "market-cap",
]

df2 = functools.reduce(merge, [blockchain_data_crawler(timespan, metric, start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d')) for metric in metrics])
df2

Unnamed: 0,timestamp,total-bitcoins,market-cap
0,2019-11-14 00:00:38,18045587.50,1.581515e+11
1,2019-11-14 19:32:53,18047362.50,1.557758e+11
2,2019-11-15 16:08:02,18049137.50,1.528672e+11
3,2019-11-16 14:47:16,18050900.00,1.532340e+11
4,2019-11-17 14:50:03,18052662.50,1.539892e+11
...,...,...,...
1500,2023-11-08 16:12:45,19536750.00,6.884555e+11
1501,2023-11-09 18:23:03,19537637.50,7.122446e+11
1502,2023-11-10 16:32:17,19538518.75,7.316198e+11
1503,2023-11-11 13:08:52,19539406.25,7.251074e+11


In [20]:
# Count unique timestamps (fewer than the number of rows means there are duplicates)
len(df2['timestamp'].unique())

1505

In [21]:
# Strip the time component (h:m:s) from the timestamps
df2['timestamp'] = pd.to_datetime(df2["timestamp"]).dt.normalize()

# Drop the duplicates in column "timestamp", keep the last value
df2.drop_duplicates(subset="timestamp", keep="last", inplace=True)
df2

Unnamed: 0,timestamp,total-bitcoins,market-cap
1,2019-11-14,18047362.50,1.557758e+11
2,2019-11-15,18049137.50,1.528672e+11
3,2019-11-16,18050900.00,1.532340e+11
4,2019-11-17,18052662.50,1.539892e+11
5,2019-11-18,18054425.00,1.525599e+11
...,...,...,...
1500,2023-11-08,19536750.00,6.884555e+11
1501,2023-11-09,19537637.50,7.122446e+11
1502,2023-11-10,19538518.75,7.316198e+11
1503,2023-11-11,19539406.25,7.251074e+11
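The effect of normalizing and then keeping the last duplicate can be checked on the first three `total-bitcoins` rows shown above: two observations fall on 2019-11-14, and only the later one is kept.

```python
import pandas as pd

# First three rows of the total-bitcoins output shown above
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-11-14 00:00:38",
        "2019-11-14 19:32:53",
        "2019-11-15 16:08:02",
    ]),
    "total-bitcoins": [18045587.50, 18047362.50, 18049137.50],
})

# Set every timestamp to midnight so that rows from the same day share one key
df["timestamp"] = df["timestamp"].dt.normalize()

# Keep the last observation of each day
daily = df.drop_duplicates(subset="timestamp", keep="last")
print(daily)
```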


In [22]:
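# Inner-join the price data and the blockchain data on the timestamp
all_data_tmp = pd.merge(df0, df1, how="inner", on='timestamp')
all_data = pd.merge(all_data_tmp, df2, how="inner", on='timestamp')
# Forward-fill any missing values (equivalent to interpolate(method='ffill'), which newer pandas versions deprecate)
all_data = all_data.ffill()
all_data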

Unnamed: 0,timestamp,open,high,low,close,volume,market-price,trade-volume,blocks-size,avg-block-size,...,n-transactions-per-block,hash-rate,difficulty,miners-revenue,transaction-fees-usd,n-unique-addresses,n-transactions,estimated-transaction-volume-usd,total-bitcoins,market-cap
0,2019-11-14,8773.08,8784.51,8566.84,8637.10,126.897068,8762.42,5.327823e+07,249030.689529,0.981338,...,2071.142012,1.068612e+08,1.272001e+13,1.877947e+07,2.581060e+05,535548.0,350023.0,9.464503e+08,18047362.50,1.557758e+11
1,2019-11-15,8632.95,8766.00,8388.00,8459.07,221.746205,8632.32,6.209435e+07,249196.673568,1.113353,...,2057.879747,9.990575e+07,1.272001e+13,1.735128e+07,2.545219e+05,575887.0,325145.0,1.199133e+09,18049137.50,1.528672e+11
2,2019-11-16,8463.79,8528.44,8430.77,8484.07,50.607771,8457.69,1.023465e+08,249372.424907,1.171504,...,1915.324324,9.358260e+07,1.272001e+13,1.586438e+07,1.942917e+05,495077.0,283468.0,4.618346e+08,18050900.00,1.532340e+11
3,2019-11-17,8485.83,8627.56,8373.96,8508.26,89.357152,8482.70,2.721844e+07,249545.811754,1.108507,...,1716.688312,9.737649e+07,1.272001e+13,1.658248e+07,1.795335e+05,430853.0,264370.0,3.428184e+08,18052662.50,1.539892e+11
4,2019-11-18,8495.72,8501.58,8043.00,8175.14,173.564990,8503.93,4.276261e+07,249716.672832,1.019048,...,2176.221429,8.852408e+07,1.272001e+13,1.496338e+07,2.331815e+05,510494.0,304671.0,1.043918e+09,18054425.00,1.525599e+11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1411,2023-11-08,35449.10,36094.00,35100.00,35626.40,2554.528167,35436.28,1.706969e+08,524489.163390,1.653792,...,3914.100671,4.626563e+08,6.246347e+13,3.713802e+07,4.180095e+06,837044.0,583201.0,5.080135e+09,19536750.00,6.884555e+11
1412,2023-11-09,35626.40,37971.00,35517.30,36694.00,8078.048005,35629.52,1.569445e+08,524735.476915,1.663591,...,4010.251748,4.440259e+08,6.246347e+13,4.238651e+07,9.218558e+06,832393.0,573466.0,6.287697e+09,19537637.50,7.122446e+11
1413,2023-11-10,36702.50,37500.00,36340.10,37311.70,4459.426938,36696.25,4.387976e+08,524973.519610,1.621323,...,3401.500000,4.781817e+08,6.246347e+13,4.158990e+07,6.043066e+06,848433.0,523831.0,5.561959e+09,19538518.75,7.316198e+11
1414,2023-11-11,37311.70,37411.70,36658.00,37139.80,1725.244614,37321.65,2.492784e+08,525223.328826,1.661074,...,3922.056250,4.968122e+08,6.246347e+13,4.229329e+07,4.284425e+06,894325.0,627529.0,2.477252e+09,19539406.25,7.251074e+11


In [23]:
# Check NaN values (select the rows that contain at least one missing value)
all_data[all_data.isnull().any(axis=1)]

Unnamed: 0,timestamp,open,high,low,close,volume,market-price,trade-volume,blocks-size,avg-block-size,...,n-transactions-per-block,hash-rate,difficulty,miners-revenue,transaction-fees-usd,n-unique-addresses,n-transactions,estimated-transaction-volume-usd,total-bitcoins,market-cap


In [24]:
# Count unique timestamps in the merged dataset
len(all_data['timestamp'].unique())

1416
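The merged dataset has fewer rows than either input because an inner join drops every timestamp that is missing from one side. One way to inspect what was dropped is an outer join with `indicator=True`, which adds a `_merge` column. The frames below are toy data with made-up values, used only to show the pattern.

```python
import pandas as pd

left = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-11-10", "2023-11-11", "2023-11-12"]),
    "close": [1.0, 2.0, 3.0],  # Made-up values
})
right = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-11-10", "2023-11-11"]),
    "market-price": [1.1, 2.1],  # Made-up values
})

# Outer join with an indicator column: "both", "left_only" or "right_only"
check = pd.merge(left, right, how="outer", on="timestamp", indicator=True)

# Rows that an inner join would have dropped
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```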

Once I have the daily dataset, I upsample it to a frequency of 15 minutes (`15T`) using the `resample` method.

This means that the data is organized in 15-minute time frames, and interpolation (linear by default) fills the newly created rows by estimating their values from the surrounding known daily values.
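A minimal sketch of this upsampling step on two daily `market-price` values taken from the merged dataset above. The `15min` alias is used here because it is accepted by both older and newer pandas versions; the frame `daily` is a toy subset, not the full dataset.

```python
import pandas as pd

# Two consecutive daily market-price values from the merged dataset
daily = pd.DataFrame(
    {"market-price": [8762.42, 8632.32]},
    index=pd.to_datetime(["2019-11-14", "2019-11-15"]),
)

# Upsample to 15-minute rows; the new rows start empty and are filled linearly
upsampled = daily.resample("15min").interpolate()

print(len(upsampled))
print(upsampled.head())
```

One day contains 96 intervals of 15 minutes, so each step moves the value by 1/96 of the daily difference. The intermediate values are therefore estimates, not observed prices.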

In [25]:
# Rename some columns
all_data.rename(columns={'open': 'opening-price', 'high': 'highest-price', 'low': 'lowest-price', 'close': 'closing-price', 'volume': 'trade-volume-btc', 'trade-volume': 'trade-volume-usd'}, inplace=True)

# Reorder columns: price-related columns first, then all the remaining ones in their current order
first_columns = ['timestamp', 'market-price', 'opening-price', 'highest-price', 'lowest-price', 'closing-price', 'trade-volume-btc', 'total-bitcoins', 'market-cap']
new_columns = first_columns + [col for col in all_data.columns if col not in first_columns]
all_data = all_data.reindex(columns=new_columns)

# Upsample to 15 minutes using (linear) interpolation
# '15T' is the 15-minute offset alias (newer pandas versions spell it '15min')
all_data.set_index('timestamp', inplace=True)
all_data_15m = all_data.resample('15T').interpolate()
all_data_15m

timestamp,market-price,opening-price,highest-price,lowest-price,closing-price,trade-volume-btc,total-bitcoins,market-cap,trade-volume-usd,blocks-size,avg-block-size,n-transactions-total,n-transactions-per-block,hash-rate,difficulty,miners-revenue,transaction-fees-usd,n-unique-addresses,n-transactions,estimated-transaction-volume-usd
2019-11-14 00:00:00,8762.420000,8773.080000,8784.510000,8566.840000,8637.100000,126.897068,1.804736e+07,1.557758e+11,5.327823e+07,249030.689529,0.981338,4.743947e+08,2071.142012,1.068612e+08,1.272001e+13,1.877947e+07,2.581060e+05,535548.000000,350023.000000,9.464503e+08
2019-11-14 00:15:00,8761.064792,8771.620312,8784.317188,8564.977083,8635.245521,127.885080,1.804738e+07,1.557455e+11,5.337006e+07,249032.418529,0.982713,4.743984e+08,2071.003863,1.067888e+08,1.272001e+13,1.876460e+07,2.580687e+05,535968.197917,349763.854167,9.490824e+08
2019-11-14 00:30:00,8759.709583,8770.160625,8784.124375,8563.114167,8633.391042,128.873092,1.804740e+07,1.557152e+11,5.346190e+07,249034.147530,0.984088,4.744020e+08,2070.865715,1.067163e+08,1.272001e+13,1.874972e+07,2.580313e+05,536388.395833,349504.708333,9.517145e+08
2019-11-14 00:45:00,8758.354375,8768.700937,8783.931562,8561.251250,8631.536562,129.861104,1.804742e+07,1.556849e+11,5.355373e+07,249035.876530,0.985463,4.744057e+08,2070.727566,1.066439e+08,1.272001e+13,1.873484e+07,2.579940e+05,536808.593750,349245.562500,9.543466e+08
2019-11-14 01:00:00,8756.999167,8767.241250,8783.738750,8559.388333,8629.682083,130.849115,1.804744e+07,1.556546e+11,5.364557e+07,249037.605531,0.986838,4.744093e+08,2070.589417,1.065714e+08,1.272001e+13,1.871997e+07,2.579566e+05,537228.791667,348986.416667,9.569787e+08
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-11 23:00:00,37147.827500,37147.058333,37235.270833,36724.412500,37057.479167,1195.623304,1.954026e+07,7.242590e+11,1.056250e+08,525477.781422,1.735283,9.178043e+08,4349.889249,5.056973e+08,6.320184e+13,4.409653e+07,4.534400e+06,881531.250000,700168.750000,1.707548e+09
2023-11-11 23:15:00,37145.938125,37145.268750,37233.353125,36725.134375,37056.584375,1189.866550,1.954027e+07,7.242498e+11,1.040635e+08,525480.547211,1.736089,9.178108e+08,4354.539607,5.057939e+08,6.320987e+13,4.411613e+07,4.537117e+06,881392.187500,700958.312500,1.699182e+09
2023-11-11 23:30:00,37144.048750,37143.479167,37231.435417,36725.856250,37055.689583,1184.109797,1.954028e+07,7.242406e+11,1.025021e+08,525483.313000,1.736896,9.178173e+08,4359.189966,5.058905e+08,6.321790e+13,4.413573e+07,4.539834e+06,881253.125000,701747.875000,1.690816e+09
2023-11-11 23:45:00,37142.159375,37141.689583,37229.517708,36726.578125,37054.794792,1178.353043,1.954029e+07,7.242314e+11,1.009406e+08,525486.078789,1.737702,9.178238e+08,4363.840325,5.059870e+08,6.322592e+13,4.415533e+07,4.542551e+06,881114.062500,702537.437500,1.682449e+09


# Saving dataset

In [26]:
# Save the 15m dataset
all_data_15m.to_parquet(DATASET_RAW)

In [7]:
# Export notebook in html format (remember to save the notebook and change the model name)
if LOCAL_RUNNING:
    !jupyter nbconvert --to html 1-data-crawling.ipynb --output 1-data-crawling.ipynb --output-dir='./exports'

[NbConvertApp] Converting notebook 1-data-crawling.ipynb to html
[NbConvertApp] Writing 417893 bytes to ..\exports\1-data-crawling.ipynb.html
