## Introduction & Executive Summary

We collected data on cryptocurrencies and traditional asset classes from several data sources. In this Python notebook, we integrate those sources, which come in different formats, and generate team12_cleandata.csv, the base dataset for the whole project.

Creating team12_cleandata.csv took a significant amount of time at the initial stage of our project. We realized that, unlike a Kaggle competition where clean data is already prepared, real-world research requires data collection and data cleaning to take up a significant part of the work.

**Team 12: Data Scientist track.**

**Team Members:**
*   Deng Lingzhe	e0674597@u.nus.edu
*   GOH ZHEN HAO	e0486543@u.nus.edu
*   NAOYA OHARA	e0395606@u.nus.edu



In [1]:
#This cell is for Google Colab usage with Google Drive.
#Skip it, or change the folder path to match your environment.
from google.colab import drive 
drive.mount('/content/drive', force_remount=True)

COLAB_PATH = '/content/drive/My Drive/IT5006/FinalPackage'
import sys
import os
sys.path.append(COLAB_PATH)
## change directory to the path above
os.chdir(COLAB_PATH)


Mounted at /content/drive


## Importing libraries
Here we are importing all the libraries required for the case study.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

## Cleaning and integrating Crypto data

Loading the dataset

In [3]:
#load crypto data from the local CSV files
btc_df = pd.read_csv('FINAL_DATASETS_29012021/Bitcoin-btc-usd-max-gecko.csv')
eth_df = pd.read_csv('FINAL_DATASETS_29012021/Ethereum-eth-usd-max.csv')
xrp_df = pd.read_csv('FINAL_DATASETS_29012021/xrp-usd-max.csv')
ltc_df = pd.read_csv('FINAL_DATASETS_29012021/LiteCoin-ltc-usd-max.csv')

In [4]:
btc_df.head()

Unnamed: 0,snapped_at,price,market_cap,total_volume
0,2013-04-28 00:00:00 UTC,135.3,1500518000.0,0.0
1,2013-04-29 00:00:00 UTC,141.96,1575032000.0,0.0
2,2013-04-30 00:00:00 UTC,135.3,1501657000.0,0.0
3,2013-05-01 00:00:00 UTC,117.0,1298952000.0,0.0
4,2013-05-02 00:00:00 UTC,103.43,1148668000.0,0.0


Data cleaning

In [5]:
#change column names
btc_df.columns = ['snapped_at', 'btc_price', 'btc_mktcap', 'btc_volume']
eth_df.columns = ['snapped_at', 'eth_price', 'eth_mktcap', 'eth_volume']
xrp_df.columns = ['snapped_at', 'xrp_price', 'xrp_mktcap', 'xrp_volume']
ltc_df.columns = ['snapped_at', 'ltc_price', 'ltc_mktcap', 'ltc_volume']

In [6]:
btc_df.head()

Unnamed: 0,snapped_at,btc_price,btc_mktcap,btc_volume
0,2013-04-28 00:00:00 UTC,135.3,1500518000.0,0.0
1,2013-04-29 00:00:00 UTC,141.96,1575032000.0,0.0
2,2013-04-30 00:00:00 UTC,135.3,1501657000.0,0.0
3,2013-05-01 00:00:00 UTC,117.0,1298952000.0,0.0
4,2013-05-02 00:00:00 UTC,103.43,1148668000.0,0.0


In [7]:
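#change data type of 'snapped_at' column to datetime
#pd.to_datetime parses the ' UTC' suffix; tz_localize(None) then drops the timezone so the column stays timezone-naive
btc_df['snapped_at'] = pd.to_datetime(btc_df['snapped_at'], utc=True).dt.tz_localize(None)
eth_df['snapped_at'] = pd.to_datetime(eth_df['snapped_at'], utc=True).dt.tz_localize(None)
xrp_df['snapped_at'] = pd.to_datetime(xrp_df['snapped_at'], utc=True).dt.tz_localize(None)
ltc_df['snapped_at'] = pd.to_datetime(ltc_df['snapped_at'], utc=True).dt.tz_localize(None)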

In [8]:
#add yyyymmdd column as 'day'
btc_df['day'] = btc_df['snapped_at'].dt.strftime('%Y%m%d').astype(int)
eth_df['day'] = eth_df['snapped_at'].dt.strftime('%Y%m%d').astype(int)
xrp_df['day'] = xrp_df['snapped_at'].dt.strftime('%Y%m%d').astype(int)
ltc_df['day'] = ltc_df['snapped_at'].dt.strftime('%Y%m%d').astype(int)

In [9]:
btc_df.head()

Unnamed: 0,snapped_at,btc_price,btc_mktcap,btc_volume,day
0,2013-04-28,135.3,1500518000.0,0.0,20130428
1,2013-04-29,141.96,1575032000.0,0.0,20130429
2,2013-04-30,135.3,1501657000.0,0.0,20130430
3,2013-05-01,117.0,1298952000.0,0.0,20130501
4,2013-05-02,103.43,1148668000.0,0.0,20130502


In [10]:
#delete 'snapped_at' column
btc_df = btc_df.drop(['snapped_at'], axis=1)
eth_df = eth_df.drop(['snapped_at'], axis=1)
xrp_df = xrp_df.drop(['snapped_at'], axis=1)
ltc_df = ltc_df.drop(['snapped_at'], axis=1)

In [11]:
btc_df.head()

Unnamed: 0,btc_price,btc_mktcap,btc_volume,day
0,135.3,1500518000.0,0.0,20130428
1,141.96,1575032000.0,0.0,20130429
2,135.3,1501657000.0,0.0,20130430
3,117.0,1298952000.0,0.0,20130501
4,103.43,1148668000.0,0.0,20130502
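
The four per-coin steps above (rename the columns, parse the timestamp, derive the integer `day` key, drop `snapped_at`) are repeated for every coin, so they could be folded into one helper. The following is a minimal sketch under assumptions: the function name `clean_coin_frame` is hypothetical, and it runs on a two-row in-memory sample copied from the `btc_df.head()` output rather than on the real CSV files.

```python
import pandas as pd

def clean_coin_frame(raw: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Hypothetical helper: rename columns, add an integer yyyymmdd 'day' key, drop the timestamp."""
    out = raw.copy()
    # Assumes the raw frame has exactly four columns in this order.
    out.columns = ['snapped_at', f'{prefix}_price', f'{prefix}_mktcap', f'{prefix}_volume']
    stamps = pd.to_datetime(out['snapped_at'], utc=True)
    out['day'] = stamps.dt.strftime('%Y%m%d').astype(int)
    return out.drop(columns=['snapped_at'])

# Two rows taken from the btc_df.head() output shown earlier.
sample = pd.DataFrame({
    'snapped_at': ['2013-04-28 00:00:00 UTC', '2013-04-29 00:00:00 UTC'],
    'price': [135.3, 141.96],
    'market_cap': [1500518000.0, 1575032000.0],
    'total_volume': [0.0, 0.0],
})
btc_sample = clean_coin_frame(sample, 'btc')
print(btc_sample)
```

With such a helper, each coin would need a single call per file, and a change to the cleaning logic would only have to be made in one place.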


In [12]:
#merge dataframes
#see the documentation below for details
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge
cryptos_df = btc_df.merge(eth_df, how='outer', on='day')
cryptos_df = cryptos_df.merge(xrp_df, how='outer', on='day')
cryptos_df = cryptos_df.merge(ltc_df, how='outer', on='day')
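
To illustrate what the outer merge on `day` does, here is a small sketch on toy frames (the values are rounded and only loosely based on the data; the frame names are hypothetical). A day that is present in only one source is kept, and the other coins' columns are filled with NaN for that day. The `validate='one_to_one'` argument is an optional safeguard that we assume is desirable here: it raises an error if a `day` value appears more than once in either frame.

```python
from functools import reduce
import pandas as pd

# Toy frames: xrp has one more day than btc, ltc has only one day.
btc = pd.DataFrame({'day': [20210124, 20210125], 'btc_price': [32068.09, 32273.52]})
xrp = pd.DataFrame({'day': [20210124, 20210125, 20210126], 'xrp_price': [0.2714, 0.2736, 0.2687]})
ltc = pd.DataFrame({'day': [20210125], 'ltc_price': [141.74]})

# Chain the outer merges over the list of frames instead of writing one line per coin.
merged = reduce(
    lambda left, right: left.merge(right, how='outer', on='day', validate='one_to_one'),
    [btc, xrp, ltc],
)
merged = merged.sort_values('day').reset_index(drop=True)
print(merged)
```

The row for 20210126 has a value only in `xrp_price`, which is the same pattern as a last row where only one coin has been updated.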

In [13]:
#set day as index, then sort index by day order
cryptos_df = cryptos_df.set_index('day') 
cryptos_df = cryptos_df.sort_index()

In [14]:
cryptos_df.tail()

day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume
20210122,30913.695736,575205500000.0,69415310000.0,1122.912433,130713700000.0,49658520000.0,0.269193,12251470000.0,3432332000.0,130.157888,8633411000.0,7625684000.0
20210123,32957.908783,613271100000.0,69549740000.0,1236.683443,140271900000.0,48657450000.0,0.273186,12348110000.0,3735007000.0,137.947871,9151085000.0,7893968000.0
20210124,32068.087374,596744400000.0,43575410000.0,1231.17638,140773200000.0,30774910000.0,0.271428,12353170000.0,2366507000.0,137.738436,9138226000.0,5145554000.0
20210125,32273.51735,600595400000.0,42810540000.0,1392.539763,158167200000.0,39913980000.0,0.273592,12456250000.0,1991914000.0,141.742214,9404889000.0,4928311000.0
20210126,,,,,,,0.26867,12239550000.0,2148246000.0,,,


In [15]:
#reset index to move 'day' from the index back to a column
cryptos_df = cryptos_df.reset_index()
#change data type of 'day' from int to datetime
cryptos_df['day'] = pd.to_datetime(cryptos_df['day'].astype(str), format='%Y%m%d')

In [16]:
cryptos_df.tail()

Unnamed: 0,day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume
2826,2021-01-22,30913.695736,575205500000.0,69415310000.0,1122.912433,130713700000.0,49658520000.0,0.269193,12251470000.0,3432332000.0,130.157888,8633411000.0,7625684000.0
2827,2021-01-23,32957.908783,613271100000.0,69549740000.0,1236.683443,140271900000.0,48657450000.0,0.273186,12348110000.0,3735007000.0,137.947871,9151085000.0,7893968000.0
2828,2021-01-24,32068.087374,596744400000.0,43575410000.0,1231.17638,140773200000.0,30774910000.0,0.271428,12353170000.0,2366507000.0,137.738436,9138226000.0,5145554000.0
2829,2021-01-25,32273.51735,600595400000.0,42810540000.0,1392.539763,158167200000.0,39913980000.0,0.273592,12456250000.0,1991914000.0,141.742214,9404889000.0,4928311000.0
2830,2021-01-26,,,,,,,0.26867,12239550000.0,2148246000.0,,,


In [17]:
#Limit data to the end of 2020
cryptos_df = cryptos_df[cryptos_df['day'] <= '2020/12/31']

In [18]:
cryptos_df.tail()

Unnamed: 0,day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume
2800,2020-12-27,26476.130137,491978600000.0,41995150000.0,636.742317,72391400000.0,14640530000.0,0.295383,13418020000.0,7431667000.0,129.757721,8586988000.0,8927412000.0
2801,2020-12-28,26423.228792,493427500000.0,56654980000.0,689.659857,78833070000.0,24721300000.0,0.284147,13068760000.0,7010556000.0,127.899088,8553328000.0,10006160000.0
2802,2020-12-29,27125.384121,503712200000.0,42186520000.0,732.957029,83575560000.0,22707050000.0,0.246134,11125030000.0,5788819000.0,130.638658,8661848000.0,7456440000.0
2803,2020-12-30,27424.538955,509680300000.0,38081840000.0,735.590898,83885240000.0,17170430000.0,0.221407,10057610000.0,10590300000.0,129.628167,8581202000.0,5810650000.0
2804,2020-12-31,28837.288529,535967300000.0,43341140000.0,752.855932,85790180000.0,13293870000.0,0.21207,9633436000.0,6788559000.0,129.244151,8556679000.0,5932641000.0


In [19]:
#Some values are blank, so fill them with the previous day's value (forward fill).
cryptos_df = cryptos_df.ffill()
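
A forward fill only carries the last known value forward. Rows before a column's first observation have nothing to copy from, so they stay NaN. The sketch below shows this on a toy frame (the values are made up for illustration).

```python
import numpy as np
import pandas as pd

prices = pd.DataFrame({
    'btc_price': [100.0, np.nan, 102.0, 103.0],   # gap in the middle
    'eth_price': [np.nan, np.nan, 10.0, np.nan],  # starts later, then a gap
})
filled = prices.ffill()
print(filled)
```

In `filled`, the gap in `btc_price` and the trailing gap in `eth_price` take the previous day's value, while the two leading NaN values in `eth_price` remain. This would explain why columns of coins that start later than Bitcoin can still contain nulls after this step.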

Rather than analyzing the raw prices, we can analyze the daily % change.

In [20]:
#adding daily % change data
cryptos_df['btc_price_chg'] = cryptos_df["btc_price"].astype(float).pct_change(1)
cryptos_df['btc_mktcap_chg'] = cryptos_df["btc_mktcap"].astype(float).pct_change(1)
cryptos_df['btc_volume_chg'] = cryptos_df["btc_volume"].astype(float).pct_change(1)
cryptos_df['eth_price_chg'] = cryptos_df["eth_price"].astype(float).pct_change(1)
cryptos_df['eth_mktcap_chg'] = cryptos_df["eth_mktcap"].astype(float).pct_change(1)
cryptos_df['eth_volume_chg'] = cryptos_df["eth_volume"].astype(float).pct_change(1)
cryptos_df['xrp_price_chg'] = cryptos_df["xrp_price"].astype(float).pct_change(1)
cryptos_df['xrp_mktcap_chg'] = cryptos_df["xrp_mktcap"].astype(float).pct_change(1)
cryptos_df['xrp_volume_chg'] = cryptos_df["xrp_volume"].astype(float).pct_change(1)
cryptos_df['ltc_price_chg'] = cryptos_df["ltc_price"].astype(float).pct_change(1)
cryptos_df['ltc_mktcap_chg'] = cryptos_df["ltc_mktcap"].astype(float).pct_change(1)
cryptos_df['ltc_volume_chg'] = cryptos_df["ltc_volume"].astype(float).pct_change(1)
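
One caveat for the volume columns: the earliest rows have a volume of 0.0, and a % change from zero is not finite. The sketch below, on a toy series, shows what `pct_change` returns in that case and one possible way to handle it. Replacing infinite values with NaN is our assumption for illustration, not a step taken in this notebook.

```python
import numpy as np
import pandas as pd

volume = pd.Series([0.0, 0.0, 50.0, 75.0])
chg = volume.pct_change(1)
# Row 0 has no previous value (NaN), 0 -> 0 is 0/0 (NaN),
# 0 -> 50 is 50/0 (inf), and 50 -> 75 is a 50% increase (0.5).
print(chg.tolist())

# Optional: treat infinite changes as missing so that mean and std stay finite.
cleaned = chg.replace([np.inf, -np.inf], np.nan)
print(cleaned.tolist())
```

Infinite values are not counted by `isnull()`, so they are easy to miss until summary statistics such as the mean come out as `inf`.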

In [21]:
#adding rolling 20-day volatilities for each crypto
#Reference: https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.std.html
#multiplying by sqrt(252) converts daily volatility into annualized volatility
#Reference: https://www.fool.com/knowledge-center/how-to-calculate-annualized-volatility.aspx
#e.g. btc_vol20d = 0.5542 means that Bitcoin's annualized volatility is 55.42%, based on the
#standard deviation of the last 20 days of daily % change data
cryptos_df['btc_vol20d'] = cryptos_df["btc_price_chg"].rolling(20).std()*np.sqrt(252)
cryptos_df['eth_vol20d'] = cryptos_df["eth_price_chg"].rolling(20).std()*np.sqrt(252)
cryptos_df['xrp_vol20d'] = cryptos_df["xrp_price_chg"].rolling(20).std()*np.sqrt(252)
cryptos_df['ltc_vol20d'] = cryptos_df["ltc_price_chg"].rolling(20).std()*np.sqrt(252)

#adding rolling 60-day volatilities for each crypto
cryptos_df['btc_vol60d'] = cryptos_df["btc_price_chg"].rolling(60).std()*np.sqrt(252)
cryptos_df['eth_vol60d'] = cryptos_df["eth_price_chg"].rolling(60).std()*np.sqrt(252)
cryptos_df['xrp_vol60d'] = cryptos_df["xrp_price_chg"].rolling(60).std()*np.sqrt(252)
cryptos_df['ltc_vol60d'] = cryptos_df["ltc_price_chg"].rolling(60).std()*np.sqrt(252)
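
The calculation above is the rolling sample standard deviation of daily returns, scaled by the square root of the number of periods per year. As a minimal sketch, it could be wrapped in a helper; the name `annualized_vol` and the `periods_per_year` parameter are hypothetical. The default of 252 matches the code above; it is left as a parameter because the crypto series here have one row per calendar day, so some analyses may prefer 365.

```python
import numpy as np
import pandas as pd

def annualized_vol(returns: pd.Series, window: int, periods_per_year: int = 252) -> pd.Series:
    """Rolling sample standard deviation of daily returns, scaled to a yearly figure."""
    return returns.rolling(window).std() * np.sqrt(periods_per_year)

# Toy daily returns, made up for illustration.
rets = pd.Series([0.01, -0.02, 0.015, 0.0, -0.005])
vol3 = annualized_vol(rets, window=3)
print(vol3)
```

The first `window - 1` rows are NaN because the rolling window is not yet full, which matches the leading nulls in the `*_vol20d` and `*_vol60d` columns.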

In [22]:
cryptos_df.tail()

Unnamed: 0,day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d
2800,2020-12-27,26476.130137,491978600000.0,41995150000.0,636.742317,72391400000.0,14640530000.0,0.295383,13418020000.0,7431667000.0,129.757721,8586988000.0,8927412000.0,0.073163,0.073214,0.146692,0.016419,0.013561,0.115394,-0.071368,-0.047568,-0.434958,0.021838,0.021949,0.086009,0.578714,0.623479,2.214227,1.183245,0.551998,0.660465,1.863215,1.033421
2801,2020-12-28,26423.228792,493427500000.0,56654980000.0,689.659857,78833070000.0,24721300000.0,0.284147,13068760000.0,7010556000.0,127.899088,8553328000.0,10006160000.0,-0.001998,0.002945,0.349084,0.083107,0.088984,0.688553,-0.038039,-0.026029,-0.056664,-0.014324,-0.00392,0.120836,0.576529,0.679078,2.214313,1.188913,0.546791,0.671132,1.864388,1.029946
2802,2020-12-29,27125.384121,503712200000.0,42186520000.0,732.957029,83575560000.0,22707050000.0,0.246134,11125030000.0,5788819000.0,130.638658,8661848000.0,7456440000.0,0.026573,0.020844,-0.255378,0.06278,0.060159,-0.081478,-0.133779,-0.148731,-0.174271,0.02142,0.012687,-0.254815,0.529077,0.647568,2.239522,1.122251,0.547594,0.678655,1.886685,1.02731
2803,2020-12-30,27424.538955,509680300000.0,38081840000.0,735.590898,83885240000.0,17170430000.0,0.221407,10057610000.0,10590300000.0,129.628167,8581202000.0,5810650000.0,0.011029,0.011848,-0.097298,0.003593,0.003705,-0.243828,-0.100461,-0.095947,0.82944,-0.007735,-0.00931,-0.220721,0.529382,0.644061,2.236159,1.127708,0.547492,0.677487,1.89906,1.026195
2804,2020-12-31,28837.288529,535967300000.0,43341140000.0,752.855932,85790180000.0,13293870000.0,0.21207,9633436000.0,6788559000.0,129.244151,8556679000.0,5932641000.0,0.051514,0.051576,0.138105,0.023471,0.022709,-0.225769,-0.042174,-0.042174,-0.358983,-0.002962,-0.002858,0.020994,0.522009,0.62999,2.234735,1.111921,0.553237,0.677847,1.9016,1.026327


In [23]:
#Adding 10-, 50-, 100- and 200-day simple moving averages (SMA), which are often used in currency trading strategies.
#Reference: https://www.investopedia.com/ask/answers/122314/how-do-i-use-moving-average-ma-create-forex-trading-strategy.asp#:~:text=Moving%20averages%20are%20a%20frequently,100%2C%20and%20200%20day%20periods.
#10 days SMA
cryptos_df['btc_SMA10d'] = cryptos_df["btc_price"].rolling(10).mean()
cryptos_df['eth_SMA10d'] = cryptos_df["eth_price"].rolling(10).mean()
cryptos_df['xrp_SMA10d'] = cryptos_df["xrp_price"].rolling(10).mean()
cryptos_df['ltc_SMA10d'] = cryptos_df["ltc_price"].rolling(10).mean()
#50 days SMA
cryptos_df['btc_SMA50d'] = cryptos_df["btc_price"].rolling(50).mean()
cryptos_df['eth_SMA50d'] = cryptos_df["eth_price"].rolling(50).mean()
cryptos_df['xrp_SMA50d'] = cryptos_df["xrp_price"].rolling(50).mean()
cryptos_df['ltc_SMA50d'] = cryptos_df["ltc_price"].rolling(50).mean()
#100 days SMA
cryptos_df['btc_SMA100d'] = cryptos_df["btc_price"].rolling(100).mean()
cryptos_df['eth_SMA100d'] = cryptos_df["eth_price"].rolling(100).mean()
cryptos_df['xrp_SMA100d'] = cryptos_df["xrp_price"].rolling(100).mean()
cryptos_df['ltc_SMA100d'] = cryptos_df["ltc_price"].rolling(100).mean()
#200 days SMA
cryptos_df['btc_SMA200d'] = cryptos_df["btc_price"].rolling(200).mean()
cryptos_df['eth_SMA200d'] = cryptos_df["eth_price"].rolling(200).mean()
cryptos_df['xrp_SMA200d'] = cryptos_df["xrp_price"].rolling(200).mean()
cryptos_df['ltc_SMA200d'] = cryptos_df["ltc_price"].rolling(200).mean()

In [24]:
#calculate the % difference between the price and each SMA, an indicator that is often used in trading strategies.
#If the price is far above the moving average, it indicates that the asset is "overbought", and vice versa.
#10 days SMA
cryptos_df['btc_DiffSMA10d'] = cryptos_df["btc_price"] / cryptos_df['btc_SMA10d'] - 1.0
cryptos_df['eth_DiffSMA10d'] = cryptos_df["eth_price"] / cryptos_df['eth_SMA10d'] - 1.0
cryptos_df['xrp_DiffSMA10d'] = cryptos_df["xrp_price"] / cryptos_df['xrp_SMA10d'] - 1.0
cryptos_df['ltc_DiffSMA10d'] = cryptos_df["ltc_price"] / cryptos_df['ltc_SMA10d']  - 1.0
#50 days SMA
cryptos_df['btc_DiffSMA50d'] = cryptos_df["btc_price"] / cryptos_df['btc_SMA50d'] - 1.0
cryptos_df['eth_DiffSMA50d'] = cryptos_df["eth_price"] / cryptos_df['eth_SMA50d'] - 1.0
cryptos_df['xrp_DiffSMA50d'] = cryptos_df["xrp_price"] / cryptos_df['xrp_SMA50d'] - 1.0
cryptos_df['ltc_DiffSMA50d'] = cryptos_df["ltc_price"] / cryptos_df['ltc_SMA50d']  - 1.0
#100 days SMA
cryptos_df['btc_DiffSMA100d'] = cryptos_df["btc_price"] / cryptos_df['btc_SMA100d'] - 1.0
cryptos_df['eth_DiffSMA100d'] = cryptos_df["eth_price"] / cryptos_df['eth_SMA100d'] - 1.0
cryptos_df['xrp_DiffSMA100d'] = cryptos_df["xrp_price"] / cryptos_df['xrp_SMA100d'] - 1.0
cryptos_df['ltc_DiffSMA100d'] = cryptos_df["ltc_price"] / cryptos_df['ltc_SMA100d']  - 1.0
#200 days SMA
cryptos_df['btc_DiffSMA200d'] = cryptos_df["btc_price"] / cryptos_df['btc_SMA200d'] - 1.0
cryptos_df['eth_DiffSMA200d'] = cryptos_df["eth_price"] / cryptos_df['eth_SMA200d'] - 1.0
cryptos_df['xrp_DiffSMA200d'] = cryptos_df["xrp_price"] / cryptos_df['xrp_SMA200d'] - 1.0
cryptos_df['ltc_DiffSMA200d'] = cryptos_df["ltc_price"] / cryptos_df['ltc_SMA200d']  - 1.0
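
The SMA and DiffSMA columns follow one pattern per coin and window, so they could also be generated in a loop. The sketch below uses a toy frame with short windows so that the result is easy to check by hand; the column naming follows the convention used above, while the data and window sizes are made up.

```python
import pandas as pd

coins = ['btc', 'eth']
windows = [2, 3]  # toy windows; the notebook uses 10, 50, 100 and 200

df = pd.DataFrame({
    'btc_price': [10.0, 12.0, 14.0, 16.0],
    'eth_price': [1.0, 2.0, 3.0, 4.0],
})

for coin in coins:
    for w in windows:
        sma = df[f'{coin}_price'].rolling(w).mean()
        df[f'{coin}_SMA{w}d'] = sma
        # Positive values mean the price is above its moving average.
        df[f'{coin}_DiffSMA{w}d'] = df[f'{coin}_price'] / sma - 1.0

print(df)
```

For example, on the second row the 2-day SMA of `btc_price` is (10 + 12) / 2 = 11, so `btc_DiffSMA2d` is 12 / 11 - 1, roughly 0.09.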

In [25]:
cryptos_df.tail()

Unnamed: 0,day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d,btc_SMA10d,eth_SMA10d,xrp_SMA10d,ltc_SMA10d,btc_SMA50d,eth_SMA50d,xrp_SMA50d,ltc_SMA50d,btc_SMA100d,eth_SMA100d,xrp_SMA100d,ltc_SMA100d,btc_SMA200d,eth_SMA200d,xrp_SMA200d,ltc_SMA200d,btc_DiffSMA10d,eth_DiffSMA10d,xrp_DiffSMA10d,ltc_DiffSMA10d,btc_DiffSMA50d,eth_DiffSMA50d,xrp_DiffSMA50d,ltc_DiffSMA50d,btc_DiffSMA100d,eth_DiffSMA100d,xrp_DiffSMA100d,ltc_DiffSMA100d,btc_DiffSMA200d,eth_DiffSMA200d,xrp_DiffSMA200d,ltc_DiffSMA200d
2800,2020-12-27,26476.130137,491978600000.0,41995150000.0,636.742317,72391400000.0,14640530000.0,0.295383,13418020000.0,7431667000.0,129.757721,8586988000.0,8927412000.0,0.073163,0.073214,0.146692,0.016419,0.013561,0.115394,-0.071368,-0.047568,-0.434958,0.021838,0.021949,0.086009,0.578714,0.623479,2.214227,1.183245,0.551998,0.660465,1.863215,1.033421,23815.983676,630.676429,0.44859,113.675218,19124.40861,556.560606,0.464185,83.825214,15504.202963,465.488615,0.354512,66.945347,12926.582137,391.662966,0.294451,58.54228,0.111696,0.009618,-0.34153,0.141478,0.384416,0.144066,-0.363651,0.547956,0.707674,0.367901,-0.16679,0.938263,1.048193,0.62574,0.003167,1.216479
2801,2020-12-28,26423.228792,493427500000.0,56654980000.0,689.659857,78833070000.0,24721300000.0,0.284147,13068760000.0,7010556000.0,127.899088,8553328000.0,10006160000.0,-0.001998,0.002945,0.349084,0.083107,0.088984,0.688553,-0.038039,-0.026029,-0.056664,-0.014324,-0.00392,0.120836,0.576529,0.679078,2.214313,1.188913,0.546791,0.671132,1.864388,1.029946,24177.740975,635.235888,0.419412,116.278218,19356.504008,561.645436,0.464889,85.204897,15659.163748,468.553485,0.354853,67.740778,13009.323788,393.873556,0.294857,58.948498,0.092874,0.085675,-0.32251,0.09994,0.365083,0.227927,-0.388784,0.501077,0.687397,0.471891,-0.199254,0.888066,1.031099,0.750968,-0.036323,1.169675
2802,2020-12-29,27125.384121,503712200000.0,42186520000.0,732.957029,83575560000.0,22707050000.0,0.246134,11125030000.0,5788819000.0,130.638658,8661848000.0,7456440000.0,0.026573,0.020844,-0.255378,0.06278,0.060159,-0.081478,-0.133779,-0.148731,-0.174271,0.02142,0.012687,-0.254815,0.529077,0.647568,2.239522,1.122251,0.547594,0.678655,1.886685,1.02731,24578.222332,643.089549,0.385678,118.440612,19589.085367,567.197409,0.464738,86.597095,15819.577605,472.028334,0.354795,68.5618,13098.320724,396.378337,0.295138,59.384367,0.103635,0.139743,-0.361814,0.102989,0.384719,0.292243,-0.47038,0.50858,0.714672,0.552782,-0.306264,0.905415,1.070905,0.849135,-0.166035,1.199883
2803,2020-12-30,27424.538955,509680300000.0,38081840000.0,735.590898,83885240000.0,17170430000.0,0.221407,10057610000.0,10590300000.0,129.628167,8581202000.0,5810650000.0,0.011029,0.011848,-0.097298,0.003593,0.003705,-0.243828,-0.100461,-0.095947,0.82944,-0.007735,-0.00931,-0.220721,0.529382,0.644061,2.236159,1.127708,0.547492,0.677487,1.89906,1.026195,24934.290743,650.716758,0.349903,119.33897,19830.869302,573.008202,0.464151,88.004319,15984.589728,475.674564,0.354546,69.387582,13188.095752,398.868324,0.295278,59.808963,0.099872,0.130432,-0.367232,0.086218,0.382922,0.283735,-0.522984,0.472975,0.715686,0.546416,-0.375518,0.868175,1.079492,0.844195,-0.250172,1.16737
2804,2020-12-31,28837.288529,535967300000.0,43341140000.0,752.855932,85790180000.0,13293870000.0,0.21207,9633436000.0,6788559000.0,129.244151,8556679000.0,5932641000.0,0.051514,0.051576,0.138105,0.023471,0.022709,-0.225769,-0.042174,-0.042174,-0.358983,-0.002962,-0.002858,0.020994,0.522009,0.62999,2.234735,1.111921,0.553237,0.677847,1.9016,1.026327,25466.187754,662.050808,0.315103,120.76877,20102.04088,579.068996,0.463321,89.431054,16168.567389,479.790352,0.35435,70.247589,13284.934828,401.441136,0.295374,60.229597,0.132376,0.137157,-0.326984,0.070179,0.434545,0.300114,-0.542283,0.445182,0.78354,0.569135,-0.401524,0.839838,1.170676,0.875383,-0.28203,1.145858


In [26]:
#install library for technical analysis
#Reference: https://pypi.org/project/pandas-ta/
!pip install pandas_ta



In [27]:
#import library for technical analysis
#Reference: https://pypi.org/project/pandas-ta/
import pandas_ta as ta

In [28]:
#Creating data for RSI
#What's RSI? Reference: https://www.investopedia.com/terms/r/rsi.asp
cryptos_df['btc_RSI'] = ta.rsi(cryptos_df["btc_price"])
cryptos_df['eth_RSI'] = ta.rsi(cryptos_df["eth_price"])
cryptos_df['xrp_RSI'] = ta.rsi(cryptos_df["xrp_price"])
cryptos_df['ltc_RSI'] = ta.rsi(cryptos_df["ltc_price"])
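
For readers unfamiliar with RSI, the sketch below shows one common formulation in plain pandas: the average gain and average loss are smoothed with Wilder's method, and RSI = 100 * avg_gain / (avg_gain + avg_loss). This is an illustration only. The function name `rsi_wilder` is hypothetical, the 14-day length is an assumption (the call above does not pass a length, so the library default applies), and the result is not guaranteed to match `ta.rsi` exactly.

```python
import pandas as pd

def rsi_wilder(close: pd.Series, length: int = 14) -> pd.Series:
    """Sketch of RSI with Wilder's smoothing (exponential average with alpha = 1/length)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)     # positive moves, 0 otherwise
    loss = -delta.clip(upper=0.0)    # size of negative moves, 0 otherwise
    avg_gain = gain.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    # Equivalent to 100 - 100 / (1 + avg_gain / avg_loss), without dividing by a zero loss.
    return 100.0 * avg_gain / (avg_gain + avg_loss)

# Toy series: a steadily rising price and a steadily falling price.
rising = pd.Series([float(i) for i in range(1, 31)])
falling = pd.Series([float(i) for i in range(30, 0, -1)])
rsi_up = rsi_wilder(rising)
rsi_down = rsi_wilder(falling)
print(rsi_up.tail(3))
print(rsi_down.tail(3))
```

A series that only rises gives an RSI at the top of the 0 to 100 range, and a series that only falls gives an RSI at the bottom. The first `length` rows are NaN because the averages need that many price changes before they are defined.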

In [29]:
cryptos_df.tail()

Unnamed: 0,day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d,btc_SMA10d,eth_SMA10d,xrp_SMA10d,ltc_SMA10d,btc_SMA50d,eth_SMA50d,xrp_SMA50d,ltc_SMA50d,btc_SMA100d,eth_SMA100d,xrp_SMA100d,ltc_SMA100d,btc_SMA200d,eth_SMA200d,xrp_SMA200d,ltc_SMA200d,btc_DiffSMA10d,eth_DiffSMA10d,xrp_DiffSMA10d,ltc_DiffSMA10d,btc_DiffSMA50d,eth_DiffSMA50d,xrp_DiffSMA50d,ltc_DiffSMA50d,btc_DiffSMA100d,eth_DiffSMA100d,xrp_DiffSMA100d,ltc_DiffSMA100d,btc_DiffSMA200d,eth_DiffSMA200d,xrp_DiffSMA200d,ltc_DiffSMA200d,btc_RSI,eth_RSI,xrp_RSI,ltc_RSI
2800,2020-12-27,26476.130137,491978600000.0,41995150000.0,636.742317,72391400000.0,14640530000.0,0.295383,13418020000.0,7431667000.0,129.757721,8586988000.0,8927412000.0,0.073163,0.073214,0.146692,0.016419,0.013561,0.115394,-0.071368,-0.047568,-0.434958,0.021838,0.021949,0.086009,0.578714,0.623479,2.214227,1.183245,0.551998,0.660465,1.863215,1.033421,23815.983676,630.676429,0.44859,113.675218,19124.40861,556.560606,0.464185,83.825214,15504.202963,465.488615,0.354512,66.945347,12926.582137,391.662966,0.294451,58.54228,0.111696,0.009618,-0.34153,0.141478,0.384416,0.144066,-0.363651,0.547956,0.707674,0.367901,-0.16679,0.938263,1.048193,0.62574,0.003167,1.216479,78.668659,57.648628,34.638574,68.753204
2801,2020-12-28,26423.228792,493427500000.0,56654980000.0,689.659857,78833070000.0,24721300000.0,0.284147,13068760000.0,7010556000.0,127.899088,8553328000.0,10006160000.0,-0.001998,0.002945,0.349084,0.083107,0.088984,0.688553,-0.038039,-0.026029,-0.056664,-0.014324,-0.00392,0.120836,0.576529,0.679078,2.214313,1.188913,0.546791,0.671132,1.864388,1.029946,24177.740975,635.235888,0.419412,116.278218,19356.504008,561.645436,0.464889,85.204897,15659.163748,468.553485,0.354853,67.740778,13009.323788,393.873556,0.294857,58.948498,0.092874,0.085675,-0.32251,0.09994,0.365083,0.227927,-0.388784,0.501077,0.687397,0.471891,-0.199254,0.888066,1.031099,0.750968,-0.036323,1.169675,78.204641,64.769956,33.963885,67.289803
2802,2020-12-29,27125.384121,503712200000.0,42186520000.0,732.957029,83575560000.0,22707050000.0,0.246134,11125030000.0,5788819000.0,130.638658,8661848000.0,7456440000.0,0.026573,0.020844,-0.255378,0.06278,0.060159,-0.081478,-0.133779,-0.148731,-0.174271,0.02142,0.012687,-0.254815,0.529077,0.647568,2.239522,1.122251,0.547594,0.678655,1.886685,1.02731,24578.222332,643.089549,0.385678,118.440612,19589.085367,567.197409,0.464738,86.597095,15819.577605,472.028334,0.354795,68.5618,13098.320724,396.378337,0.295138,59.384367,0.103635,0.139743,-0.361814,0.102989,0.384719,0.292243,-0.47038,0.50858,0.714672,0.552782,-0.306264,0.905415,1.070905,0.849135,-0.166035,1.199883,79.899348,69.31615,31.713353,68.358848
2803,2020-12-30,27424.538955,509680300000.0,38081840000.0,735.590898,83885240000.0,17170430000.0,0.221407,10057610000.0,10590300000.0,129.628167,8581202000.0,5810650000.0,0.011029,0.011848,-0.097298,0.003593,0.003705,-0.243828,-0.100461,-0.095947,0.82944,-0.007735,-0.00931,-0.220721,0.529382,0.644061,2.236159,1.127708,0.547492,0.677487,1.89906,1.026195,24934.290743,650.716758,0.349903,119.33897,19830.869302,573.008202,0.464151,88.004319,15984.589728,475.674564,0.354546,69.387582,13188.095752,398.868324,0.295278,59.808963,0.099872,0.130432,-0.367232,0.086218,0.382922,0.283735,-0.522984,0.472975,0.715686,0.546416,-0.375518,0.868175,1.079492,0.844195,-0.250172,1.16737,80.591759,69.573372,30.306559,67.482775
2804,2020-12-31,28837.288529,535967300000.0,43341140000.0,752.855932,85790180000.0,13293870000.0,0.21207,9633436000.0,6788559000.0,129.244151,8556679000.0,5932641000.0,0.051514,0.051576,0.138105,0.023471,0.022709,-0.225769,-0.042174,-0.042174,-0.358983,-0.002962,-0.002858,0.020994,0.522009,0.62999,2.234735,1.111921,0.553237,0.677847,1.9016,1.026327,25466.187754,662.050808,0.315103,120.76877,20102.04088,579.068996,0.463321,89.431054,16168.567389,479.790352,0.35435,70.247589,13284.934828,401.441136,0.295374,60.229597,0.132376,0.137157,-0.326984,0.070179,0.434545,0.300114,-0.542283,0.445182,0.78354,0.569135,-0.401524,0.839838,1.170676,0.875383,-0.28203,1.145858,83.485009,71.273342,29.769512,67.130674


Verifying dataset

In [30]:
#check shape of table.
cryptos_df.shape

(2805, 69)

In [31]:
#number of non-null values and data type for each column
cryptos_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2805 entries, 0 to 2804
Data columns (total 69 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   day              2805 non-null   datetime64[ns]
 1   btc_price        2805 non-null   float64       
 2   btc_mktcap       2805 non-null   float64       
 3   btc_volume       2805 non-null   float64       
 4   eth_price        1974 non-null   float64       
 5   eth_mktcap       1974 non-null   float64       
 6   eth_volume       1974 non-null   float64       
 7   xrp_price        2707 non-null   float64       
 8   xrp_mktcap       2707 non-null   float64       
 9   xrp_volume       2707 non-null   float64       
 10  ltc_price        2805 non-null   float64       
 11  ltc_mktcap       2805 non-null   float64       
 12  ltc_volume       2805 non-null   float64       
 13  btc_price_chg    2804 non-null   float64       
 14  btc_mktcap_chg   2804 non-null   float64

In [32]:
pd.set_option("display.max_rows", None, "display.max_columns", None)
#checking the number of null values in each column
cryptos_df.isnull().sum()

day                   0
btc_price             0
btc_mktcap            0
btc_volume            0
eth_price           831
eth_mktcap          831
eth_volume          831
xrp_price            98
xrp_mktcap           98
xrp_volume           98
ltc_price             0
ltc_mktcap            0
ltc_volume            0
btc_price_chg         1
btc_mktcap_chg        1
btc_volume_chg      243
eth_price_chg       832
eth_mktcap_chg      832
eth_volume_chg      832
xrp_price_chg        99
xrp_mktcap_chg       99
xrp_volume_chg      243
ltc_price_chg         1
ltc_mktcap_chg        1
ltc_volume_chg      243
btc_vol20d           20
eth_vol20d          851
xrp_vol20d          118
ltc_vol20d           20
btc_vol60d           60
eth_vol60d          891
xrp_vol60d          158
ltc_vol60d           60
btc_SMA10d            9
eth_SMA10d          840
xrp_SMA10d          107
ltc_SMA10d            9
btc_SMA50d           49
eth_SMA50d          880
xrp_SMA50d          147
ltc_SMA50d           49
btc_SMA100d     
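
The null counts above do not say where the nulls sit. Since a forward fill was applied earlier, the remaining nulls in the price columns are presumably the rows before a coin's first observation. The sketch below shows one way to check this on a toy frame; the `report` table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'btc_price': [1.0, 2.0, 3.0, 4.0],
    'eth_price': [np.nan, np.nan, 5.0, 6.0],  # series starts on the third row
})

# Index label of the first non-null value in each column.
first_valid = {col: df[col].first_valid_index() for col in df.columns}

report = pd.DataFrame({
    'nulls': df.isnull().sum(),
    'first_valid_row': pd.Series(first_valid),
    # Nulls that remain after the first observation would be gaps inside the series.
    'interior_nulls': pd.Series(
        {col: int(df.loc[first_valid[col]:, col].isnull().sum()) for col in df.columns}
    ),
})
print(report)
```

If `interior_nulls` is zero for every column, all nulls come from the period before the series starts and not from gaps inside it.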

In [33]:
#checking the number of unique values in each column
cryptos_df.nunique()

day                2805
btc_price          2800
btc_mktcap         2802
btc_volume         2562
eth_price          1973
eth_mktcap         1972
eth_volume         1973
xrp_price          2654
xrp_mktcap         2677
xrp_volume         2546
ltc_price          2801
ltc_mktcap         2801
ltc_volume         2561
btc_price_chg      2803
btc_mktcap_chg     2802
btc_volume_chg     2562
eth_price_chg      1973
eth_mktcap_chg     1972
eth_volume_chg     1973
xrp_price_chg      2682
xrp_mktcap_chg     2683
xrp_volume_chg     2544
ltc_price_chg      2802
ltc_mktcap_chg     2801
ltc_volume_chg     2561
btc_vol20d         2785
eth_vol20d         1954
xrp_vol20d         2686
ltc_vol20d         2785
btc_vol60d         2745
eth_vol60d         1914
xrp_vol60d         2647
ltc_vol60d         2745
btc_SMA10d         2796
eth_SMA10d         1965
xrp_SMA10d         2695
ltc_SMA10d         2796
btc_SMA50d         2756
eth_SMA50d         1925
xrp_SMA50d         2658
ltc_SMA50d         2756
btc_SMA100d     

In [34]:
#checking for duplicated rows
cryptos_df.duplicated().sum()

0

In [35]:
cryptos_df.describe()

Unnamed: 0,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d,btc_SMA10d,eth_SMA10d,xrp_SMA10d,ltc_SMA10d,btc_SMA50d,eth_SMA50d,xrp_SMA50d,ltc_SMA50d,btc_SMA100d,eth_SMA100d,xrp_SMA100d,ltc_SMA100d,btc_SMA200d,eth_SMA200d,xrp_SMA200d,ltc_SMA200d,btc_DiffSMA10d,eth_DiffSMA10d,xrp_DiffSMA10d,ltc_DiffSMA10d,btc_DiffSMA50d,eth_DiffSMA50d,xrp_DiffSMA50d,ltc_DiffSMA50d,btc_DiffSMA100d,eth_DiffSMA100d,xrp_DiffSMA100d,ltc_DiffSMA100d,btc_DiffSMA200d,eth_DiffSMA200d,xrp_DiffSMA200d,ltc_DiffSMA200d,btc_RSI,eth_RSI,xrp_RSI,ltc_RSI
count,2805.0,2805.0,2805.0,1974.0,1974.0,1974.0,2707.0,2707.0,2707.0,2805.0,2805.0,2805.0,2804.0,2804.0,2562.0,1973.0,1973.0,1973.0,2706.0,2706.0,2562.0,2804.0,2804.0,2562.0,2785.0,1954.0,2687.0,2785.0,2745.0,1914.0,2647.0,2745.0,2796.0,1965.0,2698.0,2796.0,2756.0,1925.0,2658.0,2756.0,2706.0,1875.0,2608.0,2706.0,2606.0,1775.0,2508.0,2606.0,2796.0,1965.0,2698.0,2796.0,2756.0,1925.0,2658.0,2756.0,2706.0,1875.0,2608.0,2706.0,2606.0,1775.0,2508.0,2606.0,2791.0,1960.0,2693.0,2791.0
mean,4112.540653,72050990000.0,7585436000.0,222.873881,22874020000.0,4209944000.0,0.198295,8013279000.0,740836900.0,39.351947,2249210000.0,906331000.0,0.002774,0.002958,inf,0.004952,inf,0.195882,0.004049,0.00506,inf,0.003177,0.003663,inf,0.567186,0.862692,0.917078,0.826878,0.59846,0.899498,0.99006,0.881809,4082.657011,222.299823,0.198508,39.268754,3990.620553,220.690909,0.197616,39.138786,3916.971893,220.276491,0.197908,39.215826,3830.725435,221.691189,0.199956,39.611362,0.009922,0.016606,0.010531,0.008248,0.057349,0.107161,0.069645,0.053587,0.121213,0.234561,0.13892,0.109991,0.237287,0.474555,0.225776,0.229812,53.629189,52.695714,48.541601,49.492488
std,4698.437597,84970290000.0,12685450000.0,232.266598,23360220000.0,5936126000.0,0.299517,11849720000.0,1578131000.0,49.957118,2893575000.0,1458300000.0,0.041376,0.041363,,0.064593,,3.717841,0.078876,0.091234,,0.065183,0.065289,,0.309983,0.417892,0.791519,0.593262,0.249866,0.319078,0.683889,0.520694,4600.446917,229.978553,0.294955,49.485524,4324.207463,221.101683,0.272052,47.685516,4104.99083,211.527144,0.255737,46.201552,3839.107256,196.874287,0.239234,43.628625,0.072247,0.11051,0.147078,0.118601,0.22881,0.356164,0.486343,0.405262,0.374118,0.619967,0.809906,0.645595,0.608979,1.075729,1.279386,0.963421,14.539017,14.77217,13.802142,13.342298
min,67.809,771368100.0,0.0,0.432979,0.0,87074.8,0.002686,21944810.0,0.0,1.148851,38286160.0,0.0,-0.351903,-0.357757,-0.995927,-0.530039,-0.482741,-0.989981,-0.598844,-0.406699,-0.899513,-0.42144,-0.420948,-0.992147,0.096839,0.225407,0.142178,0.087057,0.161956,0.402514,0.290551,0.145117,79.01016,0.514582,0.003269,1.337361,94.91404,0.72817,0.003874,1.46382,105.491797,0.816918,0.004498,1.633505,131.147053,1.367936,0.004813,1.920646,-0.36927,-0.463905,-0.562164,-0.444169,-0.488876,-0.496007,-0.574555,-0.600129,-0.499647,-0.558545,-0.732853,-0.653845,-0.602839,-0.725801,-0.834913,-0.757695,10.477495,15.836875,14.349813,9.800285
25%,416.715,5829507000.0,63971590.0,12.657845,1039693000.0,22193610.0,0.006881,217379400.0,410219.5,3.748312,153118900.0,7571770.0,-0.012677,-0.012222,-0.171144,-0.022362,-0.022459,-0.182907,-0.021289,-0.02091,-0.234345,-0.021681,-0.021302,-0.203187,0.351529,0.576943,0.474055,0.474229,0.424015,0.637525,0.530251,0.55871,417.49815,12.685526,0.007003,3.757794,418.25824,12.530069,0.007436,3.792309,418.809437,12.401438,0.007398,3.83598,461.902592,23.954598,0.007391,3.910442,-0.023672,-0.039141,-0.046878,-0.039009,-0.066829,-0.11613,-0.14458,-0.122972,-0.090356,-0.128893,-0.187567,-0.205389,-0.144109,-0.181386,-0.225444,-0.364243,43.548168,42.191522,39.202002,40.559833
50%,1078.274711,17045270000.0,1274169000.0,180.14682,19101100000.0,1295095000.0,0.038061,1276137000.0,25374500.0,20.3577,579183800.0,196726500.0,0.002089,0.002258,-0.020856,0.000521,0.000739,-0.013897,-0.001135,-0.001176,-0.02243,-0.001004,-0.000529,-0.025849,0.500247,0.759013,0.654434,0.682202,0.546845,0.800687,0.745803,0.758105,1060.133939,180.082761,0.040096,21.031885,1111.922911,183.053044,0.055525,19.977547,1064.947305,182.958115,0.102172,19.048296,1204.942182,185.077274,0.113152,16.706857,0.005699,0.005767,-0.007793,-0.002054,0.026236,0.037865,-0.035371,-0.014755,0.05121,0.080925,-0.030567,-0.040324,0.123274,0.184781,-0.088545,-0.010103,52.226967,51.923697,46.679485,48.344966
75%,7508.315669,132963500000.0,6813550000.0,305.261725,30170070000.0,7458829000.0,0.294142,12449170000.0,1121553000.0,57.063465,3477427000.0,1041549000.0,0.018354,0.018474,0.173896,0.029786,0.029941,0.206874,0.019717,0.019349,0.262631,0.019626,0.020113,0.19912,0.693039,1.023225,1.095724,0.99446,0.761878,1.153685,1.143173,1.049813,7474.749073,302.740114,0.29517,57.064047,7697.979891,299.02655,0.28859,56.460334,7976.968139,297.535537,0.314701,55.093544,8009.947244,293.984157,0.335553,58.552665,0.041337,0.062674,0.034409,0.036219,0.136538,0.231083,0.101173,0.106015,0.232878,0.3434,0.138104,0.193707,0.399467,0.713121,0.158031,0.362749,63.074237,62.490929,56.229199,56.524261
max,28837.288529,535967300000.0,81406690000.0,1448.180086,140419500000.0,74747420000.0,3.39845,131653000000.0,25054630000.0,360.661762,19609010000.0,10006160000.0,0.332556,0.332724,inf,0.552358,inf,162.802292,1.413959,2.429747,inf,0.932542,0.935298,inf,1.830729,3.926493,5.684042,5.029283,1.498939,2.399359,3.747411,3.366572,25466.187754,1278.33284,2.743761,317.17434,20102.04088,1031.401368,1.630031,248.98043,16168.567389,887.829676,1.260152,215.853822,13284.934828,725.406266,0.953388,173.169745,0.539251,0.831022,2.210126,2.205423,1.982207,2.416048,5.490504,8.261777,3.41994,3.506569,8.21509,11.658255,5.208597,6.468063,14.922883,13.823928,94.144798,93.080273,97.720733,97.491189


In [36]:
#The % change columns contain infinite values (pct_change divides by the previous value, which can be 0), so replace inf with NaN.
cryptos_df = cryptos_df.replace([np.inf, -np.inf], np.nan)
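
As a minimal sketch of why these infinite values appear: `pct_change` divides by the previous value, so a zero followed by a nonzero value yields `inf`. The series below is made-up illustration data (the names `toy_volume`, `toy_chg` and `toy_chg_clean` are hypothetical), not values from our dataset.

```python
import numpy as np
import pandas as pd

# Toy volume series (illustrative values only): zeros followed by nonzero values
toy_volume = pd.Series([0.0, 100.0, 150.0, 0.0, 50.0])

# 100/0 - 1 and 50/0 - 1 both evaluate to inf
toy_chg = toy_volume.pct_change(1)

# Same replacement as in the cell above: +/-inf becomes NaN
toy_chg_clean = toy_chg.replace([np.inf, -np.inf], np.nan)

print(toy_chg.tolist())
print(toy_chg_clean.tolist())
```

After the replacement, `describe()` can compute mean and std for these columns because NaN values are skipped.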

In [37]:
cryptos_df.describe()

Unnamed: 0,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d,btc_SMA10d,eth_SMA10d,xrp_SMA10d,ltc_SMA10d,btc_SMA50d,eth_SMA50d,xrp_SMA50d,ltc_SMA50d,btc_SMA100d,eth_SMA100d,xrp_SMA100d,ltc_SMA100d,btc_SMA200d,eth_SMA200d,xrp_SMA200d,ltc_SMA200d,btc_DiffSMA10d,eth_DiffSMA10d,xrp_DiffSMA10d,ltc_DiffSMA10d,btc_DiffSMA50d,eth_DiffSMA50d,xrp_DiffSMA50d,ltc_DiffSMA50d,btc_DiffSMA100d,eth_DiffSMA100d,xrp_DiffSMA100d,ltc_DiffSMA100d,btc_DiffSMA200d,eth_DiffSMA200d,xrp_DiffSMA200d,ltc_DiffSMA200d,btc_RSI,eth_RSI,xrp_RSI,ltc_RSI
count,2805.0,2805.0,2805.0,1974.0,1974.0,1974.0,2707.0,2707.0,2707.0,2805.0,2805.0,2805.0,2804.0,2804.0,2561.0,1973.0,1972.0,1973.0,2706.0,2706.0,2561.0,2804.0,2804.0,2561.0,2785.0,1954.0,2687.0,2785.0,2745.0,1914.0,2647.0,2745.0,2796.0,1965.0,2698.0,2796.0,2756.0,1925.0,2658.0,2756.0,2706.0,1875.0,2608.0,2706.0,2606.0,1775.0,2508.0,2606.0,2796.0,1965.0,2698.0,2796.0,2756.0,1925.0,2658.0,2756.0,2706.0,1875.0,2608.0,2706.0,2606.0,1775.0,2508.0,2606.0,2791.0,1960.0,2693.0,2791.0
mean,4112.540653,72050990000.0,7585436000.0,222.873881,22874020000.0,4209944000.0,0.198295,8013279000.0,740836900.0,39.351947,2249210000.0,906331000.0,0.002774,0.002958,0.172983,0.004952,0.005541,0.195882,0.004049,0.00506,0.175522,0.003177,0.003663,0.170377,0.567186,0.862692,0.917078,0.826878,0.59846,0.899498,0.99006,0.881809,4082.657011,222.299823,0.198508,39.268754,3990.620553,220.690909,0.197616,39.138786,3916.971893,220.276491,0.197908,39.215826,3830.725435,221.691189,0.199956,39.611362,0.009922,0.016606,0.010531,0.008248,0.057349,0.107161,0.069645,0.053587,0.121213,0.234561,0.13892,0.109991,0.237287,0.474555,0.225776,0.229812,53.629189,52.695714,48.541601,49.492488
std,4698.437597,84970290000.0,12685450000.0,232.266598,23360220000.0,5936126000.0,0.299517,11849720000.0,1578131000.0,49.957118,2893575000.0,1458300000.0,0.041376,0.041363,4.581505,0.064593,0.063396,3.717841,0.078876,0.091234,0.954116,0.065183,0.065289,1.937697,0.309983,0.417892,0.791519,0.593262,0.249866,0.319078,0.683889,0.520694,4600.446917,229.978553,0.294955,49.485524,4324.207463,221.101683,0.272052,47.685516,4104.99083,211.527144,0.255737,46.201552,3839.107256,196.874287,0.239234,43.628625,0.072247,0.11051,0.147078,0.118601,0.22881,0.356164,0.486343,0.405262,0.374118,0.619967,0.809906,0.645595,0.608979,1.075729,1.279386,0.963421,14.539017,14.77217,13.802142,13.342298
min,67.809,771368100.0,0.0,0.432979,0.0,87074.8,0.002686,21944810.0,0.0,1.148851,38286160.0,0.0,-0.351903,-0.357757,-0.995927,-0.530039,-0.482741,-0.989981,-0.598844,-0.406699,-0.899513,-0.42144,-0.420948,-0.992147,0.096839,0.225407,0.142178,0.087057,0.161956,0.402514,0.290551,0.145117,79.01016,0.514582,0.003269,1.337361,94.91404,0.72817,0.003874,1.46382,105.491797,0.816918,0.004498,1.633505,131.147053,1.367936,0.004813,1.920646,-0.36927,-0.463905,-0.562164,-0.444169,-0.488876,-0.496007,-0.574555,-0.600129,-0.499647,-0.558545,-0.732853,-0.653845,-0.602839,-0.725801,-0.834913,-0.757695,10.477495,15.836875,14.349813,9.800285
25%,416.715,5829507000.0,63971590.0,12.657845,1039693000.0,22193610.0,0.006881,217379400.0,410219.5,3.748312,153118900.0,7571770.0,-0.012677,-0.012222,-0.171271,-0.022362,-0.022475,-0.182907,-0.021289,-0.02091,-0.234449,-0.021681,-0.021302,-0.203246,0.351529,0.576943,0.474055,0.474229,0.424015,0.637525,0.530251,0.55871,417.49815,12.685526,0.007003,3.757794,418.25824,12.530069,0.007436,3.792309,418.809437,12.401438,0.007398,3.83598,461.902592,23.954598,0.007391,3.910442,-0.023672,-0.039141,-0.046878,-0.039009,-0.066829,-0.11613,-0.14458,-0.122972,-0.090356,-0.128893,-0.187567,-0.205389,-0.144109,-0.181386,-0.225444,-0.364243,43.548168,42.191522,39.202002,40.559833
50%,1078.274711,17045270000.0,1274169000.0,180.14682,19101100000.0,1295095000.0,0.038061,1276137000.0,25374500.0,20.3577,579183800.0,196726500.0,0.002089,0.002258,-0.020909,0.000521,0.000735,-0.013897,-0.001135,-0.001176,-0.02252,-0.001004,-0.000529,-0.026138,0.500247,0.759013,0.654434,0.682202,0.546845,0.800687,0.745803,0.758105,1060.133939,180.082761,0.040096,21.031885,1111.922911,183.053044,0.055525,19.977547,1064.947305,182.958115,0.102172,19.048296,1204.942182,185.077274,0.113152,16.706857,0.005699,0.005767,-0.007793,-0.002054,0.026236,0.037865,-0.035371,-0.014755,0.05121,0.080925,-0.030567,-0.040324,0.123274,0.184781,-0.088545,-0.010103,52.226967,51.923697,46.679485,48.344966
75%,7508.315669,132963500000.0,6813550000.0,305.261725,30170070000.0,7458829000.0,0.294142,12449170000.0,1121553000.0,57.063465,3477427000.0,1041549000.0,0.018354,0.018474,0.173748,0.029786,0.029917,0.206874,0.019717,0.019349,0.262478,0.019626,0.020113,0.198581,0.693039,1.023225,1.095724,0.99446,0.761878,1.153685,1.143173,1.049813,7474.749073,302.740114,0.29517,57.064047,7697.979891,299.02655,0.28859,56.460334,7976.968139,297.535537,0.314701,55.093544,8009.947244,293.984157,0.335553,58.552665,0.041337,0.062674,0.034409,0.036219,0.136538,0.231083,0.101173,0.106015,0.232878,0.3434,0.138104,0.193707,0.399467,0.713121,0.158031,0.362749,63.074237,62.490929,56.229199,56.524261
max,28837.288529,535967300000.0,81406690000.0,1448.180086,140419500000.0,74747420000.0,3.39845,131653000000.0,25054630000.0,360.661762,19609010000.0,10006160000.0,0.332556,0.332724,230.400992,0.552358,0.55305,162.802292,1.413959,2.429747,17.763785,0.932542,0.935298,86.427666,1.830729,3.926493,5.684042,5.029283,1.498939,2.399359,3.747411,3.366572,25466.187754,1278.33284,2.743761,317.17434,20102.04088,1031.401368,1.630031,248.98043,16168.567389,887.829676,1.260152,215.853822,13284.934828,725.406266,0.953388,173.169745,0.539251,0.831022,2.210126,2.205423,1.982207,2.416048,5.490504,8.261777,3.41994,3.506569,8.21509,11.658255,5.208597,6.468063,14.922883,13.823928,94.144798,93.080273,97.720733,97.491189


## Cleaning and integrating macro market data

In [38]:
#reading macro market data
spx_df = pd.read_csv('FINAL_DATASETS_29012021/HistoricalQuotes_SPX500MiniFutures.csv')
nasdaq_df = pd.read_csv('FINAL_DATASETS_29012021/HistoricalQuotes_Nasdaq100EminiFutures.csv')
gold_df = pd.read_csv('FINAL_DATASETS_29012021/GLD_US_archive_EN.csv', header=6)
intrate_df = pd.read_csv('FINAL_DATASETS_29012021/FRB_H15.csv', header=[0,5])
dollarindex_df = pd.read_csv('FINAL_DATASETS_29012021/DollarIndex_DX-Y.NYB.csv')

In [39]:
spx_df.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,01/25/2021,3848.4,1725082,3835.5,3853.25,3788.5
1,01/22/2021,3834.25,1138551,3846.75,3849.0,3813.25
2,01/21/2021,3846.0,1029074,3841.5,3853.75,3842.25
3,01/20/2021,3845.0,1199672,3796.75,3852.5,3835.0
4,01/19/2021,3790.6,1262393,3750.0,3797.0,3740.5


In [40]:
nasdaq_df.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,01/25/2021,13475.5,658421,13370.0,13554.5,13189.0
1,01/22/2021,13361.5,404023,13394.0,13402.25,13300.25
2,01/21/2021,13395.5,418687,13288.5,13423.5,13315.75
3,01/20/2021,13294.25,431065,13030.0,13327.0,13242.0
4,01/19/2021,12985.5,478030,12774.25,13032.75,12727.0


In [41]:
gold_df.head()

Unnamed: 0,Date,GLD Close,LBMA Gold Price,NAV per GLD in Gold,NAV/share at 10.30 a.m. NYT,Indicative Price of GLD at 4.15 p.m. NYT,Mid point of bid/ask spread at 4.15 p.m. NYT#,Premium/Discount of GLD mid point v Indicative Value of GLD at 4.15 p.m. NYT,Daily Share Volume,Total Net Asset Value Ounces in the Trust as at 4.15 p.m. NYT,Total Net Asset Value Tonnes in the Trust as at 4.15 p.m. NYT,Total Net Asset Value in the Trust
0,18-Nov-2004,44.38,$442.00,100.0,44.2,44.305,$44.37,0.146%,5992000,260000.0,8.09,114920000.0
1,19-Nov-2004,44.78,$445.60,99.9989,44.55951167,44.694,$44.78,0.192%,11655000,1859994.06,57.85,828806907.2
2,22-Nov-2004,44.75,$447.80,99.9956,44.77803823,44.903,$44.95,0.105%,11976800,2799952.98,87.09,1253785205.5
3,23-Nov-2004,45.05,$448.15,99.9945,44.81255136,44.812,$44.74,-0.160%,3139000,2799952.98,87.09,1254751438.19
4,24-Nov-2004,45.05,$448.60,99.9934,44.85705902,44.952,$45.00,0.095%,6052700,3099933.3,96.42,1390568824.08


In [42]:
intrate_df.head()

Unnamed: 0_level_0,Series Description,"Market yield on U.S. Treasury securities at 1-month constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 3-month constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 6-month constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 1-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 2-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 3-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 5-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 7-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 10-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 20-year constant maturity, quoted on investment basis","Market yield on U.S. Treasury securities at 30-year constant maturity, quoted on investment basis"
Unnamed: 0_level_1,Time Period,RIFLGFCM01_N.B,RIFLGFCM03_N.B,RIFLGFCM06_N.B,RIFLGFCY01_N.B,RIFLGFCY02_N.B,RIFLGFCY03_N.B,RIFLGFCY05_N.B,RIFLGFCY07_N.B,RIFLGFCY10_N.B,RIFLGFCY20_N.B,RIFLGFCY30_N.B
0,1962-01-02,,,,3.22,,3.7,3.88,,4.06,4.07,
1,1962-01-03,,,,3.24,,3.7,3.87,,4.03,4.07,
2,1962-01-04,,,,3.24,,3.69,3.86,,3.99,4.06,
3,1962-01-05,,,,3.26,,3.71,3.89,,4.02,4.07,
4,1962-01-08,,,,3.31,,3.71,3.91,,4.03,4.08,


In [43]:
dollarindex_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2010-01-26,78.169998,78.639999,78.080002,78.43,78.43,0.0
1,2010-01-27,78.550003,78.849998,78.370003,78.68,78.68,0.0
2,2010-01-28,78.660004,79.07,78.540001,78.900002,78.900002,0.0
3,2010-01-29,78.989998,79.5,78.849998,79.459999,79.459999,0.0
4,2010-01-31,,,,,,


Data cleaning

Cleaning spx_df

In [44]:
#copy data
spx_clean_df = spx_df.copy()
#Select data only from date, price, and volume.
spx_clean_df = spx_clean_df[['Date', ' Close/Last', ' Volume']]
#change column names
spx_clean_df.columns = ['day_temp', 'spx_price', 'spx_volume']
#clean ' N/A' data.
spx_clean_df = spx_clean_df.replace(' N/A', np.nan)
#change datatype of day to datetime64 (pd.to_datetime also works on recent pandas versions)
spx_clean_df['day_temp'] = pd.to_datetime(spx_clean_df['day_temp'])
#change datatype of volume as float
spx_clean_df['spx_volume'] = spx_clean_df['spx_volume'].astype("float")
#sort data by ascending order by 'day_temp'
spx_clean_df = spx_clean_df.sort_values(by=['day_temp'])
#limit data between year 2013 and 2020
spx_clean_df = spx_clean_df[(spx_clean_df['day_temp'] >= '2013/01/01') & (spx_clean_df['day_temp'] <= '2020/12/31')]
#add yyyymmdd column as 'day'
spx_clean_df['day'] = spx_clean_df['day_temp'].dt.strftime('%Y%m%d').astype(int)
#delete 'day_temp' column
spx_clean_df = spx_clean_df.drop(['day_temp'], axis=1)
#reset index starting from 0
spx_clean_df = spx_clean_df.reset_index(drop=True)
spx_clean_df.isnull().sum()

spx_price      0
spx_volume    46
day            0
dtype: int64

In [45]:
spx_clean_df.head()

Unnamed: 0,spx_price,spx_volume,day
0,1454.75,4050.0,20130102
1,1453.75,1586.0,20130103
2,1457.7,1509781.0,20130104
3,1456.0,2016.0,20130107
4,1452.25,1814.0,20130108


In [46]:
spx_clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2024 entries, 0 to 2023
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   spx_price   2024 non-null   float64
 1   spx_volume  1978 non-null   float64
 2   day         2024 non-null   int64  
dtypes: float64(2), int64(1)
memory usage: 47.6 KB
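
Since the same steps are repeated for each macro dataset, they could be wrapped in one function. The following is only a sketch under assumptions: `clean_quotes` and its parameters are hypothetical names that do not exist in this notebook, and the toy input uses made-up values shaped like the futures CSV.

```python
import numpy as np
import pandas as pd

def clean_quotes(raw_df, columns, new_names, na_tokens=(' N/A',),
                 start='2013-01-01', end='2020-12-31'):
    """Hypothetical helper mirroring the cleaning steps used for spx_df."""
    df = raw_df[list(columns)].copy()
    df.columns = new_names                       # first name must be 'day_temp'
    df = df.replace(list(na_tokens), np.nan)     # placeholder strings -> NaN
    df['day_temp'] = pd.to_datetime(df['day_temp'])
    for col in new_names[1:]:
        df[col] = df[col].astype(float)
    df = df.sort_values(by='day_temp')
    # keep only rows inside the date range
    df = df[(df['day_temp'] >= start) & (df['day_temp'] <= end)].copy()
    # add yyyymmdd integer key used later for merging
    df['day'] = df['day_temp'].dt.strftime('%Y%m%d').astype(int)
    return df.drop(columns='day_temp').reset_index(drop=True)

# Toy input shaped like the futures CSV (illustrative values only)
toy = pd.DataFrame({
    'Date': ['01/04/2013', '01/02/2013', '12/31/2012'],
    ' Close/Last': [1457.7, 1454.75, 1420.0],
    ' Volume': [' N/A', '4050', '1000'],
})
toy_clean = clean_quotes(toy, ['Date', ' Close/Last', ' Volume'],
                         ['day_temp', 'spx_price', 'spx_volume'])
print(toy_clean)
```

The row from 2012 is dropped by the date filter, and the `' N/A'` volume becomes NaN, which matches the behaviour of the step-by-step cell.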


Data cleaning of nasdaq_df

In [47]:
#The procedure is identical to the one for spx_df.
nasdaq_clean_df = nasdaq_df.copy()
#Select data only from date, price, and volume.
nasdaq_clean_df = nasdaq_clean_df[['Date', ' Close/Last', ' Volume']]
#change column names
#Note: NDX is ticker code of nasdaq 100
nasdaq_clean_df.columns = ['day_temp', 'ndx_price', 'ndx_volume']
#clean ' N/A' data.
nasdaq_clean_df = nasdaq_clean_df.replace(' N/A', np.nan)
#change datatype of day to datetime64 (pd.to_datetime also works on recent pandas versions)
nasdaq_clean_df['day_temp'] = pd.to_datetime(nasdaq_clean_df['day_temp'])
#change datatype of volume as float
nasdaq_clean_df['ndx_volume'] = nasdaq_clean_df['ndx_volume'].astype("float")
#sort data by ascending order by 'day_temp'
nasdaq_clean_df = nasdaq_clean_df.sort_values(by=['day_temp'])
#limit data between year 2013 and 2020
nasdaq_clean_df = nasdaq_clean_df[(nasdaq_clean_df['day_temp'] >= '2013/01/01') & (nasdaq_clean_df['day_temp'] <= '2020/12/31')]
#add yyyymmdd column as 'day'
nasdaq_clean_df['day'] = nasdaq_clean_df['day_temp'].dt.strftime('%Y%m%d').astype(int)
#delete 'day_temp' column
nasdaq_clean_df = nasdaq_clean_df.drop(['day_temp'], axis=1)
#reset index starting from 0
nasdaq_clean_df = nasdaq_clean_df.reset_index(drop=True)
nasdaq_clean_df.isnull().sum()

ndx_price      0
ndx_volume    17
day            0
dtype: int64

In [48]:
nasdaq_clean_df.head()

Unnamed: 0,ndx_price,ndx_volume,day
0,2738.5,464.0,20130102
1,2726.0,71.0,20130103
2,2713.0,212859.0,20130104
3,2718.0,105.0,20130107
4,2715.75,94.0,20130108


In [49]:
nasdaq_clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2025 entries, 0 to 2024
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   ndx_price   2025 non-null   float64
 1   ndx_volume  2008 non-null   float64
 2   day         2025 non-null   int64  
dtypes: float64(2), int64(1)
memory usage: 47.6 KB


Data cleaning of gold (SPDR gold ETF)

In [50]:
#The procedure is basically the same as the one for spx_df,
#but some additional operations are required for data cleaning.
#Also, be careful about the definitions of price and volume for gold (see the notes below).
gold_clean_df = gold_df.copy()
#Select data only from date, price, and volume.
gold_clean_df = gold_clean_df[['Date', ' LBMA Gold Price', ' Daily Share Volume']]
#change column names
#Note: gold_price is the price of physical gold i.e. LBMA Gold Price.
#Note: gold_volume is the volume of SPDR Gold ETF, not physical gold bar itself. 
gold_clean_df.columns = ['day_temp', 'gold_price', 'gold_volume']
#convert ' HOLIDAY' ' NYSE Closed' into nan
gold_clean_df = gold_clean_df.replace([' HOLIDAY',' NYSE Closed',' N/A'], np.nan)
#strip the '$' sign from gold_price, then convert the string into float
#regex=False treats '$' as a literal character, not as the regex end-of-string anchor
gold_clean_df['gold_price'] = gold_clean_df['gold_price'].str.replace('$', '', regex=False)
gold_clean_df['gold_price'] = gold_clean_df['gold_price'].astype("float")
#change datatype of day to datetime64
gold_clean_df['day_temp'] = pd.to_datetime(gold_clean_df['day_temp'])
#change datatype of volume as float
gold_clean_df['gold_volume'] = gold_clean_df['gold_volume'].astype("float")
#sort data by ascending order by 'day_temp'
gold_clean_df = gold_clean_df.sort_values(by=['day_temp'])
#limit data between year 2013 and 2020
gold_clean_df = gold_clean_df[(gold_clean_df['day_temp'] >= '2013/01/01') & (gold_clean_df['day_temp'] <= '2020/12/31')]
#add yyyymmdd column as 'day'
gold_clean_df['day'] = gold_clean_df['day_temp'].dt.strftime('%Y%m%d').astype(int)
#delete 'day_temp' column
gold_clean_df = gold_clean_df.drop(['day_temp'], axis=1)
#reset index starting from 0
gold_clean_df = gold_clean_df.reset_index(drop=True)
gold_clean_df.isnull().sum()

gold_price     66
gold_volume    66
day             0
dtype: int64

In [51]:
gold_clean_df.head()

Unnamed: 0,gold_price,gold_volume,day
0,,,20130101
1,1693.75,10372818.0,20130102
2,1679.5,16026642.0,20130103
3,1648.0,19247822.0,20130104
4,1645.25,8101802.0,20130107


In [52]:
gold_clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2082 entries, 0 to 2081
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   gold_price   2016 non-null   float64
 1   gold_volume  2016 non-null   float64
 2   day          2082 non-null   int64  
dtypes: float64(2), int64(1)
memory usage: 48.9 KB
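
The price conversion for gold can be illustrated on a toy series. This is a sketch of an alternative, not the method used in the cell above: it uses `pd.to_numeric(..., errors='coerce')` so that any non-numeric string becomes NaN without listing each placeholder. The values and the names `toy_price`, `stripped` and `toy_price_float` are made up for illustration.

```python
import pandas as pd

# Toy strings shaped like the ' LBMA Gold Price' column (illustrative values only)
toy_price = pd.Series(['$442.00', '$445.60', ' HOLIDAY', None])

# regex=False makes '$' a literal character rather than the end-of-string anchor
stripped = toy_price.str.replace('$', '', regex=False)

# errors='coerce' turns anything non-numeric (e.g. ' HOLIDAY') into NaN
toy_price_float = pd.to_numeric(stripped, errors='coerce')

print(toy_price_float.tolist())
```

The trade-off is that `errors='coerce'` also hides unexpected strings, whereas an explicit `replace` list followed by `astype("float")` fails loudly on anything not anticipated.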


Data cleaning of us government bond yield

In [53]:
#The procedure is basically the same as the one for spx_df,
#but some additional operations are required for cleaning.
intrate_clean_df = intrate_df.copy()
#drop unnecessary level of column
intrate_clean_df.columns = intrate_clean_df.columns.droplevel()
#Select data only from date and 10 year government bond yield.
intrate_clean_df = intrate_clean_df[['Time Period', 'RIFLGFCY10_N.B']]
#change column names
intrate_clean_df.columns = ['day_temp', 'us10y_yield']
#clean ' N/A' and 'ND' (no data) entries.
intrate_clean_df = intrate_clean_df.replace([' N/A','ND'], np.nan)
#change datatype of day to datetime64
intrate_clean_df['day_temp'] = pd.to_datetime(intrate_clean_df['day_temp'])
#change datatype of yield to float
intrate_clean_df['us10y_yield'] = intrate_clean_df['us10y_yield'].astype("float")
#sort data by ascending order by 'day_temp'
intrate_clean_df = intrate_clean_df.sort_values(by=['day_temp'])
#limit data between year 2013 and 2020
intrate_clean_df = intrate_clean_df[(intrate_clean_df['day_temp'] >= '2013/01/01') & (intrate_clean_df['day_temp'] <= '2020/12/31')]
#add yyyymmdd column as 'day'
intrate_clean_df['day'] = intrate_clean_df['day_temp'].dt.strftime('%Y%m%d').astype(int)
#delete 'day_temp' column
intrate_clean_df = intrate_clean_df.drop(['day_temp'], axis=1)
#reset index starting from 0
intrate_clean_df = intrate_clean_df.reset_index(drop=True)
intrate_clean_df.isnull().sum()

us10y_yield    87
day             0
dtype: int64

In [54]:
intrate_clean_df.head()

Unnamed: 0,us10y_yield,day
0,,20130101
1,1.86,20130102
2,1.92,20130103
3,1.93,20130104
4,1.92,20130107


In [55]:
intrate_clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2088 entries, 0 to 2087
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   us10y_yield  2001 non-null   float64
 1   day          2088 non-null   int64  
dtypes: float64(1), int64(1)
memory usage: 32.8 KB
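
The two-level header handling can be sketched on a tiny in-memory CSV. The text below is a made-up stand-in for the FRB file (it has only two header rows, so `header=[0, 1]` is used instead of `header=[0, 5]`), and `toy_rates` is a hypothetical name.

```python
import io
import numpy as np
import pandas as pd

# Made-up CSV with a description row and a series-code row as headers
raw = io.StringIO(
    "Series Description,10-year yield\n"
    "Time Period,RIFLGFCY10_N.B\n"
    "2013-01-01,ND\n"
    "2013-01-02,1.86\n"
)

# Two header rows produce a two-level (MultiIndex) column index
toy_rates = pd.read_csv(raw, header=[0, 1])

# Drop the top level (the long description), keeping the short series codes
toy_rates.columns = toy_rates.columns.droplevel(0)

# 'ND' marks a day with no data; convert it to NaN before casting to float
toy_rates = toy_rates.replace(['ND'], np.nan)
toy_rates['RIFLGFCY10_N.B'] = toy_rates['RIFLGFCY10_N.B'].astype(float)

print(toy_rates)
```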


Cleaning dollar index

In [56]:
#The procedure is identical to the one for spx_df.
#Note: DXY is the ticker code of US dollar index
dxy_clean_df = dollarindex_df.copy()
#Select data only from date and price. No volume data of DXY.
dxy_clean_df = dxy_clean_df[['Date', 'Adj Close']]
#change column names
dxy_clean_df.columns = ['day_temp', 'dxy_price']
#change datatype of day to datetime64 (pd.to_datetime also works on recent pandas versions)
dxy_clean_df['day_temp'] = pd.to_datetime(dxy_clean_df['day_temp'])
#sort data by ascending order by 'day_temp'
dxy_clean_df = dxy_clean_df.sort_values(by=['day_temp'])
#limit data between year 2013 and 2020
dxy_clean_df = dxy_clean_df[(dxy_clean_df['day_temp'] >= '2013/01/01') & (dxy_clean_df['day_temp'] <= '2020/12/31')]
#add yyyymmdd column as 'day'
dxy_clean_df['day'] = dxy_clean_df['day_temp'].dt.strftime('%Y%m%d').astype(int)
#delete 'day_temp' column
dxy_clean_df = dxy_clean_df.drop(['day_temp'], axis=1)
#reset index starting from 0
dxy_clean_df = dxy_clean_df.reset_index(drop=True)
dxy_clean_df.isnull().sum()

dxy_price    429
day            0
dtype: int64

In [57]:
dxy_clean_df.head()

Unnamed: 0,dxy_price,day
0,79.849998,20130102
1,80.43,20130103
2,80.5,20130104
3,,20130106
4,80.260002,20130107


In [58]:
dxy_clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2433 entries, 0 to 2432
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   dxy_price  2004 non-null   float64
 1   day        2433 non-null   int64  
dtypes: float64(1), int64(1)
memory usage: 38.1 KB


Integrate macro market data

In [59]:
#merge dataframes
#see the documentation below for details
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge
macrodata_df = spx_clean_df.merge(nasdaq_clean_df, how='outer', on='day')
macrodata_df = macrodata_df.merge(gold_clean_df, how='outer', on='day')
macrodata_df = macrodata_df.merge(intrate_clean_df, how='outer', on='day')
macrodata_df = macrodata_df.merge(dxy_clean_df, how='outer', on='day')

#set day as index, then sort index by day order
macrodata_df = macrodata_df.set_index('day') 
macrodata_df = macrodata_df.sort_index()
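
A small sketch of what `how='outer'` does here: every `day` that appears in either table is kept, and the side without data for that day gets NaN. The frames `toy_a` and `toy_b` are made-up examples.

```python
import pandas as pd

# Two toy tables that share only one day (illustrative values only)
toy_a = pd.DataFrame({'day': [20130102, 20130103], 'a_price': [1.0, 2.0]})
toy_b = pd.DataFrame({'day': [20130101, 20130102], 'b_price': [10.0, 20.0]})

# Outer merge keeps the union of days from both tables
toy_merged = toy_a.merge(toy_b, how='outer', on='day')

# Same post-processing as above: index by day, then sort
toy_merged = toy_merged.set_index('day').sort_index()

print(toy_merged)
```

This is why the merged table contains rows that are blank for some assets: each source has its own set of holidays and non-trading days.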

In [60]:
#reset index and 'day' from index to column
macrodata_df = macrodata_df.reset_index()
#change data type of 'day' from int to datetime
macrodata_df['day'] = pd.to_datetime(macrodata_df['day'].astype(str), format='%Y%m%d')

In [61]:
macrodata_df.tail()

Unnamed: 0,day,spx_price,spx_volume,ndx_price,ndx_volume,gold_price,gold_volume,us10y_yield,dxy_price
2500,2020-12-27,,,,,,,,
2501,2020-12-28,3727.5,731435.0,12832.75,341802.0,1875.0,7730071.0,0.94,90.339996
2502,2020-12-29,3719.9,972308.0,12841.0,391072.0,1874.3,5760324.0,0.94,90.010002
2503,2020-12-30,3724.2,738427.0,12841.5,331658.0,1887.6,5557137.0,0.93,89.629997
2504,2020-12-31,3748.75,917212.0,12885.5,277076.0,1887.6,7498356.0,0.93,89.940002


In [62]:
#Some rows are blank (e.g. weekends and holidays), so fill them with the previous day's data.
macrodata_df = macrodata_df.ffill()

The data can be analyzed by daily % change rather than by the price level itself.

In [63]:
#adding daily % change data
macrodata_df['spx_price_chg'] = macrodata_df["spx_price"].astype(float).pct_change(1)
macrodata_df['spx_volume_chg'] = macrodata_df["spx_volume"].astype(float).pct_change(1)
macrodata_df['ndx_price_chg'] = macrodata_df["ndx_price"].astype(float).pct_change(1)
macrodata_df['ndx_volume_chg'] = macrodata_df["ndx_volume"].astype(float).pct_change(1)
macrodata_df['gold_price_chg'] = macrodata_df["gold_price"].astype(float).pct_change(1)
macrodata_df['gold_volume_chg'] = macrodata_df["gold_volume"].astype(float).pct_change(1)
macrodata_df['us10y_yield_chg'] = macrodata_df["us10y_yield"].astype(float).pct_change(1)
macrodata_df['dxy_price_chg'] = macrodata_df["dxy_price"].astype(float).pct_change(1)
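
One consequence of combining forward fill with `pct_change` is that filled days show a change of exactly zero, as in the 2020-12-27 row of the final table. A minimal sketch with made-up prices (`toy` and `toy_filled` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Friday price, blank weekend, Monday price (illustrative values only)
toy = pd.DataFrame(
    {'price': [100.0, np.nan, np.nan, 101.0]},
    index=pd.to_datetime(['2020-12-25', '2020-12-26', '2020-12-27', '2020-12-28']),
)

# Carry the last known price forward into the blank days
toy_filled = toy.ffill()

# Daily % change: 0 on the filled days, 1% on Monday
toy_filled['price_chg'] = toy_filled['price'].pct_change(1)

print(toy_filled)
```

These zero-change days also enter the rolling statistics computed later, which is worth keeping in mind when comparing them with figures based on trading days only.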

In [64]:
#adding rolling 20 days volatilities for each macro market data.
#Reference: https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.std.html
#sqrt(252) converts daily volatility into annualized volatility (252 trading days per year)
#Reference: https://www.fool.com/knowledge-center/how-to-calculate-annualized-volatility.aspx
macrodata_df['spx_vol20d'] = macrodata_df['spx_price_chg'].rolling(20).std()*np.sqrt(252)
macrodata_df['ndx_vol20d'] = macrodata_df['ndx_price_chg'].rolling(20).std()*np.sqrt(252)
macrodata_df['gold_vol20d'] = macrodata_df['gold_price_chg'].rolling(20).std()*np.sqrt(252)
macrodata_df['us10y_vol20d'] = macrodata_df['us10y_yield_chg'].rolling(20).std()*np.sqrt(252)
macrodata_df['dxy_vol20d'] = macrodata_df['dxy_price_chg'].rolling(20).std()*np.sqrt(252)

#adding rolling 60 days volatilities for each macro market data.
macrodata_df['spx_vol60d'] = macrodata_df['spx_price_chg'].rolling(60).std()*np.sqrt(252)
macrodata_df['ndx_vol60d'] = macrodata_df['ndx_price_chg'].rolling(60).std()*np.sqrt(252)
macrodata_df['gold_vol60d'] = macrodata_df['gold_price_chg'].rolling(60).std()*np.sqrt(252)
macrodata_df['us10y_vol60d'] = macrodata_df['us10y_yield_chg'].rolling(60).std()*np.sqrt(252)
macrodata_df['dxy_vol60d'] = macrodata_df['dxy_price_chg'].rolling(60).std()*np.sqrt(252)
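
The rolling volatility calculation can be checked on a toy return series. This sketch assumes the usual convention that annualized volatility is the standard deviation of daily returns multiplied by sqrt(252); the random returns and the names `toy_ret` and `toy_vol20d` are made up.

```python
import numpy as np
import pandas as pd

# 100 made-up daily returns with roughly 1% daily standard deviation
rng = np.random.default_rng(0)
toy_ret = pd.Series(rng.normal(0.0, 0.01, size=100))

# 20-day rolling standard deviation, scaled to an annual figure
toy_vol20d = toy_ret.rolling(20).std() * np.sqrt(252)

# The last value equals the std of the final 20 returns times sqrt(252)
manual_last = toy_ret.iloc[-20:].std() * np.sqrt(252)

print(toy_vol20d.tail())
print(manual_last)
```

The first 19 values are NaN because a 20-day window needs 20 observations, which is why the rolling columns have lower counts in `describe()`.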

In [65]:
#Adding 10,50,100,200 days simple moving average (SMA)
#10 days SMA
macrodata_df["spx_SMA10d"] = macrodata_df["spx_price"].rolling(10).mean()
macrodata_df["ndx_SMA10d"] = macrodata_df["ndx_price"].rolling(10).mean()
macrodata_df["gold_SMA10d"] = macrodata_df["gold_price"].rolling(10).mean()
macrodata_df["us10y_SMA10d"] = macrodata_df["us10y_yield"].rolling(10).mean()
macrodata_df["dxy_SMA10d"] = macrodata_df["dxy_price"].rolling(10).mean()

#50 days SMA
macrodata_df["spx_SMA50d"] = macrodata_df["spx_price"].rolling(50).mean()
macrodata_df["ndx_SMA50d"] = macrodata_df["ndx_price"].rolling(50).mean()
macrodata_df["gold_SMA50d"] = macrodata_df["gold_price"].rolling(50).mean()
macrodata_df["us10y_SMA50d"] = macrodata_df["us10y_yield"].rolling(50).mean()
macrodata_df["dxy_SMA50d"] = macrodata_df["dxy_price"].rolling(50).mean()

#100 days SMA
macrodata_df["spx_SMA100d"] = macrodata_df["spx_price"].rolling(100).mean()
macrodata_df["ndx_SMA100d"] = macrodata_df["ndx_price"].rolling(100).mean()
macrodata_df["gold_SMA100d"] = macrodata_df["gold_price"].rolling(100).mean()
macrodata_df["us10y_SMA100d"] = macrodata_df["us10y_yield"].rolling(100).mean()
macrodata_df["dxy_SMA100d"] = macrodata_df["dxy_price"].rolling(100).mean()

#200 days SMA
macrodata_df["spx_SMA200d"] = macrodata_df["spx_price"].rolling(200).mean()
macrodata_df["ndx_SMA200d"] = macrodata_df["ndx_price"].rolling(200).mean()
macrodata_df["gold_SMA200d"] = macrodata_df["gold_price"].rolling(200).mean()
macrodata_df["us10y_SMA200d"] = macrodata_df["us10y_yield"].rolling(200).mean()
macrodata_df["dxy_SMA200d"] = macrodata_df["dxy_price"].rolling(200).mean()

In [66]:
#calculate the % difference between the price and each SMA, an indicator often used in trading strategies.
#If the price is far above the moving average, it indicates that the asset is "overbought", and vice versa.
#10 days SMA
macrodata_df['spx_DiffSMA10d'] = macrodata_df["spx_price"] / macrodata_df["spx_SMA10d"] - 1.0
macrodata_df['ndx_DiffSMA10d'] = macrodata_df["ndx_price"] / macrodata_df["ndx_SMA10d"] - 1.0
macrodata_df['gold_DiffSMA10d'] = macrodata_df["gold_price"] / macrodata_df["gold_SMA10d"] - 1.0
macrodata_df['us10y_DiffSMA10d'] = macrodata_df["us10y_yield"] / macrodata_df["us10y_SMA10d"]  - 1.0
macrodata_df['dxy_DiffSMA10d'] = macrodata_df["dxy_price"] / macrodata_df["dxy_SMA10d"]  - 1.0

#50 days SMA
macrodata_df['spx_DiffSMA50d'] = macrodata_df["spx_price"] / macrodata_df["spx_SMA50d"] - 1.0
macrodata_df['ndx_DiffSMA50d'] = macrodata_df["ndx_price"] / macrodata_df["ndx_SMA50d"] - 1.0
macrodata_df['gold_DiffSMA50d'] = macrodata_df["gold_price"] / macrodata_df["gold_SMA50d"] - 1.0
macrodata_df['us10y_DiffSMA50d'] = macrodata_df["us10y_yield"] / macrodata_df["us10y_SMA50d"]  - 1.0
macrodata_df['dxy_DiffSMA50d'] = macrodata_df["dxy_price"] / macrodata_df["dxy_SMA50d"]  - 1.0

#100 days SMA
macrodata_df['spx_DiffSMA100d'] = macrodata_df["spx_price"] / macrodata_df["spx_SMA100d"] - 1.0
macrodata_df['ndx_DiffSMA100d'] = macrodata_df["ndx_price"] / macrodata_df["ndx_SMA100d"] - 1.0
macrodata_df['gold_DiffSMA100d'] = macrodata_df["gold_price"] / macrodata_df["gold_SMA100d"] - 1.0
macrodata_df['us10y_DiffSMA100d'] = macrodata_df["us10y_yield"] / macrodata_df["us10y_SMA100d"]  - 1.0
macrodata_df['dxy_DiffSMA100d'] = macrodata_df["dxy_price"] / macrodata_df["dxy_SMA100d"]  - 1.0

#200 days SMA
macrodata_df['spx_DiffSMA200d'] = macrodata_df["spx_price"] / macrodata_df["spx_SMA200d"] - 1.0
macrodata_df['ndx_DiffSMA200d'] = macrodata_df["ndx_price"] / macrodata_df["ndx_SMA200d"] - 1.0
macrodata_df['gold_DiffSMA200d'] = macrodata_df["gold_price"] / macrodata_df["gold_SMA200d"] - 1.0
macrodata_df['us10y_DiffSMA200d'] = macrodata_df["us10y_yield"] / macrodata_df["us10y_SMA200d"]  - 1.0
macrodata_df['dxy_DiffSMA200d'] = macrodata_df["dxy_price"] / macrodata_df["dxy_SMA200d"]  - 1.0
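
The SMA and DiffSMA columns follow one pattern, so they could also be generated in a loop. Below is a sketch on a made-up price series using only two of the window lengths; the frame `toy` is hypothetical, while the column naming follows the convention above.

```python
import pandas as pd

# Made-up price series 1, 2, ..., 60
toy = pd.DataFrame({'spx_price': [float(v) for v in range(1, 61)]})

for window in (10, 50):
    sma_col = f'spx_SMA{window}d'
    # simple moving average over the last `window` rows
    toy[sma_col] = toy['spx_price'].rolling(window).mean()
    # % difference between the price and its moving average
    toy[f'spx_DiffSMA{window}d'] = toy['spx_price'] / toy[sma_col] - 1.0

print(toy.tail())
```

Extending the loop over asset prefixes and all four windows would yield the same columns as the explicit assignments, at the cost of being less easy to read line by line.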

In [67]:
#Creating data for RSI
#What's RSI? Reference: https://www.investopedia.com/terms/r/rsi.asp
macrodata_df['spx_RSI'] = ta.rsi(macrodata_df["spx_price"])
macrodata_df['ndx_RSI'] = ta.rsi(macrodata_df["ndx_price"])
macrodata_df['gold_RSI'] = ta.rsi(macrodata_df["gold_price"])
macrodata_df['us10y_RSI'] = ta.rsi(macrodata_df["us10y_yield"])
macrodata_df['dxy_RSI'] = ta.rsi(macrodata_df["dxy_price"])
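
To make the indicator less of a black box, here is a plain-pandas sketch of the standard Wilder-style RSI with a 14-day length. It is an assumption that this is close to what `ta.rsi` computes by default; the function `rsi_sketch` is hypothetical and is not guaranteed to reproduce the library output digit for digit.

```python
import numpy as np
import pandas as pd

def rsi_sketch(close, length=14):
    """Wilder-style RSI; a sketch, not guaranteed to match ta.rsi exactly."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)       # positive moves, 0 otherwise
    loss = -delta.clip(upper=0.0)      # size of negative moves, 0 otherwise
    # Wilder smoothing is an exponential average with alpha = 1 / length
    avg_gain = gain.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Made-up random-walk price series
rng = np.random.default_rng(0)
toy_close = pd.Series(100.0 + rng.normal(0.0, 1.0, size=200).cumsum())
toy_rsi = rsi_sketch(toy_close)

print(toy_rsi.tail())
```

RSI is bounded between 0 and 100, which is consistent with the min and max values of the RSI columns in the `describe()` output.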

In [68]:
macrodata_df.tail()

Unnamed: 0,day,spx_price,spx_volume,ndx_price,ndx_volume,gold_price,gold_volume,us10y_yield,dxy_price,spx_price_chg,spx_volume_chg,ndx_price_chg,ndx_volume_chg,gold_price_chg,gold_volume_chg,us10y_yield_chg,dxy_price_chg,spx_vol20d,ndx_vol20d,gold_vol20d,us10y_vol20d,dxy_vol20d,spx_vol60d,ndx_vol60d,gold_vol60d,us10y_vol60d,dxy_vol60d,spx_SMA10d,ndx_SMA10d,gold_SMA10d,us10y_SMA10d,dxy_SMA10d,spx_SMA50d,ndx_SMA50d,gold_SMA50d,us10y_SMA50d,dxy_SMA50d,spx_SMA100d,ndx_SMA100d,gold_SMA100d,us10y_SMA100d,dxy_SMA100d,spx_SMA200d,ndx_SMA200d,gold_SMA200d,us10y_SMA200d,dxy_SMA200d,spx_DiffSMA10d,ndx_DiffSMA10d,gold_DiffSMA10d,us10y_DiffSMA10d,dxy_DiffSMA10d,spx_DiffSMA50d,ndx_DiffSMA50d,gold_DiffSMA50d,us10y_DiffSMA50d,dxy_DiffSMA50d,spx_DiffSMA100d,ndx_DiffSMA100d,gold_DiffSMA100d,us10y_DiffSMA100d,dxy_DiffSMA100d,spx_DiffSMA200d,ndx_DiffSMA200d,gold_DiffSMA200d,us10y_DiffSMA200d,dxy_DiffSMA200d,spx_RSI,ndx_RSI,gold_RSI,us10y_RSI,dxy_RSI
2500,2020-12-27,3695.0,355183.0,12704.5,171576.0,1875.0,3429273.0,0.94,90.410004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075605,0.10488,0.111572,0.361602,0.043643,0.145707,0.194912,0.163841,0.656438,0.047256,3694.86,12699.95,1875.93,0.942,90.264001,3601.183,12187.665,1861.551,0.9008,91.720401,3493.81,11816.585,1886.352,0.8164,92.5667,3331.84925,11100.16125,1854.588,0.7431,94.339075,3.8e-05,0.000358,-0.000496,-0.002123,0.001618,0.026052,0.042406,0.007225,0.043517,-0.014287,0.057585,0.075141,-0.006018,0.151396,-0.023299,0.108994,0.144533,0.011006,0.264971,-0.041648,60.200128,64.831582,56.086386,54.541246,39.35111
2501,2020-12-28,3727.5,731435.0,12832.75,341802.0,1875.0,7730071.0,0.94,90.339996,0.008796,1.059319,0.010095,0.992132,0.0,1.254143,0.0,-0.000774,0.075127,0.108894,0.110267,0.302532,0.043696,0.144159,0.193564,0.163838,0.654896,0.047058,3698.24,12716.7,1878.235,0.944,90.253001,3610.438,12223.395,1861.414,0.902,91.646401,3495.2925,11820.7975,1885.6315,0.8192,92.5416,3335.84475,11118.23375,1855.44275,0.74435,94.292125,0.007912,0.009126,-0.001722,-0.004237,0.000964,0.032423,0.049852,0.007299,0.042129,-0.014255,0.066434,0.085608,-0.005638,0.147461,-0.02379,0.117408,0.154208,0.01054,0.262847,-0.041914,66.94325,70.210334,56.086386,54.541246,38.093544
2502,2020-12-29,3719.9,972308.0,12841.0,391072.0,1874.3,5760324.0,0.94,90.010002,-0.002039,0.329316,0.000643,0.144148,-0.000373,-0.254816,0.0,-0.003653,0.075621,0.108821,0.110315,0.302532,0.045371,0.144309,0.193569,0.163671,0.650971,0.047,3698.95,12725.7,1876.59,0.944,90.272001,3619.541,12259.29,1861.263,0.9032,91.565801,3497.8765,11831.2025,1884.97,0.8223,92.5143,3339.80225,11136.3475,1856.294,0.7456,94.243525,0.005664,0.00906,-0.00122,-0.004237,-0.002902,0.027727,0.047451,0.007004,0.040744,-0.016991,0.063474,0.08535,-0.005661,0.143135,-0.027069,0.113808,0.153071,0.0097,0.26073,-0.044921,64.203851,70.522652,55.707426,54.541246,32.776404
2503,2020-12-30,3724.2,738427.0,12841.5,331658.0,1887.6,5557137.0,0.93,89.629997,0.001156,-0.240542,3.9e-05,-0.151926,0.007096,-0.035274,-0.010638,-0.004222,0.075195,0.107674,0.108447,0.284287,0.047027,0.144284,0.193588,0.161881,0.65004,0.046735,3700.745,12738.6,1877.375,0.942,90.233001,3628.015,12294.855,1861.217,0.9044,91.477401,3500.9435,11844.13,1884.583,0.8244,92.4826,3343.8095,11154.16,1857.21825,0.7466,94.190475,0.006338,0.008078,0.005446,-0.012739,-0.006683,0.026512,0.044461,0.014175,0.028306,-0.020195,0.06377,0.084208,0.001601,0.128093,-0.030845,0.11376,0.151275,0.016359,0.245647,-0.048418,65.074671,70.542809,61.08723,51.596833,27.940014
2504,2020-12-31,3748.75,917212.0,12885.5,277076.0,1887.6,7498356.0,0.93,89.940002,0.006592,0.242116,0.003426,-0.164573,0.0,0.34932,0.0,0.003459,0.077824,0.107712,0.107605,0.273369,0.048391,0.144503,0.193317,0.160006,0.643106,0.046655,3704.995,12755.9,1878.16,0.94,90.225002,3635.76,12327.25,1860.803,0.905,91.405201,3504.256,11857.4975,1884.196,0.8265,92.454,3348.29075,11173.19625,1858.14425,0.7478,94.140525,0.01181,0.01016,0.005026,-0.010638,-0.003159,0.031077,0.045286,0.014401,0.027624,-0.01603,0.069771,0.086696,0.001807,0.125227,-0.027192,0.119601,0.153251,0.015852,0.243648,-0.04462,69.61893,72.335582,61.08723,51.596833,36.209526


Verifying the macro market dataset

In [69]:
# Check the shape of the table (rows, columns).
macrodata_df.shape

(2505, 72)

In [70]:
# Show the non-null count and dtype of each column.
macrodata_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2505 entries, 0 to 2504
Data columns (total 72 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   day                2505 non-null   datetime64[ns]
 1   spx_price          2504 non-null   float64       
 2   spx_volume         2504 non-null   float64       
 3   ndx_price          2504 non-null   float64       
 4   ndx_volume         2504 non-null   float64       
 5   gold_price         2504 non-null   float64       
 6   gold_volume        2504 non-null   float64       
 7   us10y_yield        2504 non-null   float64       
 8   dxy_price          2504 non-null   float64       
 9   spx_price_chg      2503 non-null   float64       
 10  spx_volume_chg     2503 non-null   float64       
 11  ndx_price_chg      2503 non-null   float64       
 12  ndx_volume_chg     2503 non-null   float64       
 13  gold_price_chg     2503 non-null   float64       
 14  gold_vol

In [71]:
# Count the null values in each column.
macrodata_df.isnull().sum()

day                    0
spx_price              1
spx_volume             1
ndx_price              1
ndx_volume             1
gold_price             1
gold_volume            1
us10y_yield            1
dxy_price              1
spx_price_chg          2
spx_volume_chg         2
ndx_price_chg          2
ndx_volume_chg         2
gold_price_chg         2
gold_volume_chg        9
us10y_yield_chg        2
dxy_price_chg          2
spx_vol20d            21
ndx_vol20d            21
gold_vol20d           21
us10y_vol20d          21
dxy_vol20d            21
spx_vol60d            61
ndx_vol60d            61
gold_vol60d           61
us10y_vol60d          61
dxy_vol60d            61
spx_SMA10d            10
ndx_SMA10d            10
gold_SMA10d           10
us10y_SMA10d          10
dxy_SMA10d            10
spx_SMA50d            50
ndx_SMA50d            50
gold_SMA50d           50
us10y_SMA50d          50
dxy_SMA50d            50
spx_SMA100d          100
ndx_SMA100d          100
gold_SMA100d         100
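
In the counts above, the 10-, 50- and 100-day SMA columns show 10, 50 and 100 nulls, while the 20- and 60-day volatility columns show 21 and 61. That pattern is consistent with one missing leading price plus the warm-up period of each rolling window. The sketch below reproduces it on a made-up series. It assumes a plain rolling mean and a rolling standard deviation of daily changes; the notebook's own volatility formula may differ (for example by annualizing), which would not change the null counts.

```python
import numpy as np
import pandas as pd

# Hypothetical price series whose first observation is missing.
price = pd.Series([np.nan] + [100.0 + i for i in range(29)])

# A 10-day SMA needs 10 valid prices, so the first 10 rows are NaN.
sma10 = price.rolling(10).mean()

# The daily change is NaN for the missing row and for the row after it.
chg = price.pct_change(fill_method=None)

# A 20-day rolling std of the changes adds 19 warm-up rows to those 2 NaNs.
vol20 = chg.rolling(20).std()

null_counts = {
    "sma10": int(sma10.isna().sum()),
    "chg": int(chg.isna().sum()),
    "vol20": int(vol20.isna().sum()),
}
print(null_counts)
```

Nulls of this kind sit only at the start of each column, so they can be told apart from gaps in the middle of the series by checking where the first valid value appears.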


In [72]:
# Count the unique values in each column.
macrodata_df.nunique()

day                  2505
spx_price            1748
spx_volume           1968
ndx_price            1924
ndx_volume           1973
gold_price           1628
gold_volume          1977
us10y_yield           233
dxy_price            1148
spx_price_chg        1990
spx_volume_chg       1972
ndx_price_chg        2005
ndx_volume_chg       2000
gold_price_chg       1953
gold_volume_chg      1979
us10y_yield_chg      1217
dxy_price_chg        1975
spx_vol20d           2454
ndx_vol20d           2463
gold_vol20d          2445
us10y_vol20d         2365
dxy_vol20d           2458
spx_vol60d           2044
ndx_vol60d           2041
gold_vol60d          2032
us10y_vol60d         1994
dxy_vol60d           2045
spx_SMA10d           2462
ndx_SMA10d           2476
gold_SMA10d          2454
us10y_SMA10d         2175
dxy_SMA10d           2488
spx_SMA50d           2448
ndx_SMA50d           2453
gold_SMA50d          2448
us10y_SMA50d         2360
dxy_SMA50d           2451
spx_SMA100d          2401
ndx_SMA100d 

In [73]:
# Count fully duplicated rows.
macrodata_df.duplicated().sum()

0
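
`duplicated()` with no arguments only flags rows that match on every column, so two rows for the same day with slightly different prices would not be counted. Here `nunique()` already reports 2505 distinct days for 2505 rows, so the dates are unique. As a minimal sketch on made-up data, the key column can also be checked directly:

```python
import pandas as pd

# Hypothetical frame: the same day appears twice with different prices.
frame = pd.DataFrame({
    "day": pd.to_datetime(["2020-12-28", "2020-12-28", "2020-12-29"]),
    "spx_price": [3727.5, 3727.6, 3719.9],
})

# Whole-row comparison: no row is an exact copy of another.
full_row_dupes = int(frame.duplicated().sum())

# Key-only comparison: the second 2020-12-28 row is flagged.
same_day_dupes = int(frame.duplicated(subset="day").sum())

# Equivalent yes/no check on the key column.
day_is_unique = bool(frame["day"].is_unique)

print(full_row_dupes, same_day_dupes, day_is_unique)
```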

In [74]:
macrodata_df.describe()

Unnamed: 0,spx_price,spx_volume,ndx_price,ndx_volume,gold_price,gold_volume,us10y_yield,dxy_price,spx_price_chg,spx_volume_chg,ndx_price_chg,ndx_volume_chg,gold_price_chg,gold_volume_chg,us10y_yield_chg,dxy_price_chg,spx_vol20d,ndx_vol20d,gold_vol20d,us10y_vol20d,dxy_vol20d,spx_vol60d,ndx_vol60d,gold_vol60d,us10y_vol60d,dxy_vol60d,spx_SMA10d,ndx_SMA10d,gold_SMA10d,us10y_SMA10d,dxy_SMA10d,spx_SMA50d,ndx_SMA50d,gold_SMA50d,us10y_SMA50d,dxy_SMA50d,spx_SMA100d,ndx_SMA100d,gold_SMA100d,us10y_SMA100d,dxy_SMA100d,spx_SMA200d,ndx_SMA200d,gold_SMA200d,us10y_SMA200d,dxy_SMA200d,spx_DiffSMA10d,ndx_DiffSMA10d,gold_DiffSMA10d,us10y_DiffSMA10d,dxy_DiffSMA10d,spx_DiffSMA50d,ndx_DiffSMA50d,gold_DiffSMA50d,us10y_DiffSMA50d,dxy_DiffSMA50d,spx_DiffSMA100d,ndx_DiffSMA100d,gold_DiffSMA100d,us10y_DiffSMA100d,dxy_DiffSMA100d,spx_DiffSMA200d,ndx_DiffSMA200d,gold_DiffSMA200d,us10y_DiffSMA200d,dxy_DiffSMA200d,spx_RSI,ndx_RSI,gold_RSI,us10y_RSI,dxy_RSI
count,2504.0,2504.0,2504.0,2504.0,2504.0,2504.0,2504.0,2504.0,2503.0,2503.0,2503.0,2503.0,2503.0,2496.0,2503.0,2503.0,2484.0,2484.0,2484.0,2484.0,2484.0,2444.0,2444.0,2444.0,2444.0,2444.0,2495.0,2495.0,2495.0,2495.0,2495.0,2455.0,2455.0,2455.0,2455.0,2455.0,2405.0,2405.0,2405.0,2405.0,2405.0,2305.0,2305.0,2305.0,2305.0,2305.0,2495.0,2495.0,2495.0,2495.0,2495.0,2455.0,2455.0,2455.0,2455.0,2455.0,2405.0,2405.0,2405.0,2405.0,2405.0,2305.0,2305.0,2305.0,2305.0,2305.0,2490.0,2490.0,2490.0,2490.0,2490.0
mean,2379.304233,1377997.0,5820.723243,303767.9,1346.505292,8495691.0,2.143219,92.586647,0.000424,15.519807,0.000679,26.985114,8.1e-05,inf,9.4e-05,5.5e-05,0.120155,0.147343,0.125194,0.336973,0.056134,0.127484,0.15438,0.128704,0.346216,0.057265,2378.554118,5813.718818,1344.968792,2.145825,92.613332,2375.353551,5784.664678,1338.227293,2.157538,92.730133,2372.576919,5753.422891,1329.788281,2.173492,92.847063,2368.222951,5696.278556,1314.601038,2.208563,93.053734,0.001732,0.002847,0.000269,-0.000932,0.000221,0.009199,0.015521,0.001511,-0.005379,0.001265,0.017794,0.031087,0.004236,-0.010268,0.002427,0.03351,0.061212,0.013872,-0.033915,0.005482,56.699083,57.439794,50.518187,49.499266,51.13598
std,529.277792,738761.8,2308.660912,204458.8,198.968373,5201556.0,0.624826,6.688785,0.009522,118.937357,0.010957,246.163253,0.008631,,0.027467,0.003746,0.093702,0.095856,0.056602,0.285593,0.020039,0.084301,0.083994,0.04796,0.268796,0.016934,524.864588,2288.020931,196.955189,0.621705,6.663339,506.261164,2201.760998,188.635109,0.608214,6.561145,487.161049,2105.500927,177.068752,0.586965,6.452187,455.741751,1921.739827,151.914303,0.528758,6.214281,0.014836,0.016708,0.014437,0.041375,0.006479,0.032688,0.036356,0.032796,0.089345,0.014143,0.041406,0.047255,0.046229,0.131756,0.021504,0.047953,0.059514,0.059557,0.183162,0.033227,12.708157,12.690238,14.25844,12.853016,13.353612
min,1452.25,445.0,2705.0,8.0,1049.4,0.0,0.52,79.129997,-0.103732,-0.999662,-0.108423,-0.999933,-0.091501,-1.0,-0.27027,-0.023702,0.024324,0.039317,0.045422,0.081897,0.019277,0.038493,0.053779,0.055389,0.134703,0.02351,1459.23,2726.475,1063.585,0.555,79.389,1491.732,2738.73,1071.474,0.6286,79.7964,1523.0785,2767.9025,1099.556,0.6595,79.9844,1584.21575,2885.09125,1116.00425,0.697,80.2386,-0.146944,-0.135145,-0.096546,-0.439834,-0.028166,-0.270956,-0.212771,-0.136948,-0.638941,-0.038172,-0.291502,-0.204563,-0.183337,-0.675851,-0.05174,-0.274406,-0.17559,-0.121931,-0.682474,-0.065447,14.816579,16.488986,11.402867,5.049022,14.427694
25%,1982.6875,1054188.0,4169.4375,182126.8,1226.5,5343413.0,1.83,89.292498,-0.001873,-0.149519,-0.002479,-0.15842,-0.003089,-0.215175,-0.010059,-0.001671,0.067032,0.089936,0.08664,0.195672,0.043522,0.082413,0.105722,0.090326,0.205259,0.046748,1979.3875,4192.7625,1226.61,1.825,89.546,1976.4025,4226.4075,1230.0225,1.838,89.8593,1999.664,4289.1925,1222.1575,1.89,90.478899,2012.149,4345.27375,1235.1425,1.99015,91.85735,-0.003666,-0.004789,-0.007787,-0.020709,-0.003689,-0.003067,-0.000815,-0.017935,-0.041945,-0.007967,0.00384,0.009661,-0.024419,-0.06721,-0.010837,0.01596,0.031943,-0.029234,-0.118613,-0.015004,47.96655,48.434488,40.882207,40.470302,41.71308
50%,2260.75,1360594.0,4945.875,271695.0,1286.1,7345108.0,2.25,94.970001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.096247,0.121488,0.110318,0.267455,0.053075,0.106683,0.126129,0.11913,0.266385,0.054074,2259.125,4940.025,1287.15,2.245,94.993999,2258.955,4973.175,1288.461,2.2536,95.346601,2254.7025,5000.0225,1286.1595,2.2625,95.5298,2251.73425,5056.7125,1278.39375,2.27425,95.8887,0.002949,0.0046,0.000464,-0.002185,0.000354,0.014231,0.019659,-0.000418,-0.005039,0.000989,0.023878,0.03715,0.002887,-0.014651,0.001097,0.040952,0.065367,0.011577,-0.032604,0.002409,58.309827,58.487188,50.400562,49.079632,51.133238
75%,2792.85,1748215.0,7301.25,401755.0,1382.875,10287300.0,2.6,97.370003,0.003957,0.156731,0.005174,0.157758,0.003314,0.229412,0.008989,0.001759,0.146479,0.176139,0.146824,0.380045,0.064294,0.149601,0.186627,0.14961,0.370097,0.067548,2794.2125,7299.675,1379.1025,2.599,97.345501,2779.3825,7291.72,1339.8955,2.6015,97.4007,2796.998,7233.035,1330.329,2.6219,97.3651,2755.331,7094.47125,1308.01375,2.6038,97.23635,0.009394,0.012415,0.008577,0.018472,0.004143,0.027235,0.037679,0.022492,0.041484,0.009543,0.03998,0.056088,0.035978,0.052632,0.013037,0.058874,0.093216,0.05567,0.067747,0.022545,66.029821,67.107713,59.868666,58.04062,60.2099
max,3748.75,5499126.0,12885.5,1561506.0,2067.15,93241650.0,3.24,103.290001,0.097951,1915.297659,0.097086,7237.16,0.052675,inf,0.407407,0.020528,0.878947,0.859082,0.397823,2.900352,0.163166,0.594165,0.58576,0.278639,1.837842,0.106003,3704.995,12755.9,2004.6,3.203,103.08,3635.76,12327.25,1955.665,3.1468,101.886399,3504.256,11857.4975,1925.567,3.046,101.3811,3348.29075,11173.19625,1858.14425,2.9873,100.1988,0.085525,0.078206,0.064562,0.380117,0.043906,0.106824,0.131162,0.134966,0.349348,0.05633,0.126827,0.196343,0.170079,0.387905,0.087734,0.169092,0.303996,0.236456,0.468304,0.136585,89.812102,87.805381,93.585308,87.84625,87.297347


In [75]:
# describe() shows infinite values in the % change columns (see gold_volume_chg), so we replace +/-inf with NaN.
macrodata_df = macrodata_df.replace([np.inf, -np.inf], np.nan)
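
The infinite values arise because a percentage change divides by the previous value, and `gold_volume` has a minimum of 0. The sketch below shows the effect on a small made-up series and applies the same replacement as the cell above. The numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical volume series containing a zero.
volume = pd.Series([100.0, 0.0, 50.0, 75.0])

# Row 2 divides 50 by 0, which yields +inf rather than an error.
volume_chg = volume.pct_change()

# Same replacement as in the notebook: treat +/-inf as missing.
cleaned = volume_chg.replace([np.inf, -np.inf], np.nan)

n_inf = int(np.isinf(volume_chg).sum())
n_nan_before = int(volume_chg.isna().sum())
n_nan_after = int(cleaned.isna().sum())
print(n_inf, n_nan_before, n_nan_after)
```

After the replacement, `describe()` skips these rows like any other missing value, which is why the `gold_volume_chg` count drops from 2496 to 2494 and its mean becomes finite.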

In [76]:
macrodata_df.describe()

Unnamed: 0,spx_price,spx_volume,ndx_price,ndx_volume,gold_price,gold_volume,us10y_yield,dxy_price,spx_price_chg,spx_volume_chg,ndx_price_chg,ndx_volume_chg,gold_price_chg,gold_volume_chg,us10y_yield_chg,dxy_price_chg,spx_vol20d,ndx_vol20d,gold_vol20d,us10y_vol20d,dxy_vol20d,spx_vol60d,ndx_vol60d,gold_vol60d,us10y_vol60d,dxy_vol60d,spx_SMA10d,ndx_SMA10d,gold_SMA10d,us10y_SMA10d,dxy_SMA10d,spx_SMA50d,ndx_SMA50d,gold_SMA50d,us10y_SMA50d,dxy_SMA50d,spx_SMA100d,ndx_SMA100d,gold_SMA100d,us10y_SMA100d,dxy_SMA100d,spx_SMA200d,ndx_SMA200d,gold_SMA200d,us10y_SMA200d,dxy_SMA200d,spx_DiffSMA10d,ndx_DiffSMA10d,gold_DiffSMA10d,us10y_DiffSMA10d,dxy_DiffSMA10d,spx_DiffSMA50d,ndx_DiffSMA50d,gold_DiffSMA50d,us10y_DiffSMA50d,dxy_DiffSMA50d,spx_DiffSMA100d,ndx_DiffSMA100d,gold_DiffSMA100d,us10y_DiffSMA100d,dxy_DiffSMA100d,spx_DiffSMA200d,ndx_DiffSMA200d,gold_DiffSMA200d,us10y_DiffSMA200d,dxy_DiffSMA200d,spx_RSI,ndx_RSI,gold_RSI,us10y_RSI,dxy_RSI
count,2504.0,2504.0,2504.0,2504.0,2504.0,2504.0,2504.0,2504.0,2503.0,2503.0,2503.0,2503.0,2503.0,2494.0,2503.0,2503.0,2484.0,2484.0,2484.0,2484.0,2484.0,2444.0,2444.0,2444.0,2444.0,2444.0,2495.0,2495.0,2495.0,2495.0,2495.0,2455.0,2455.0,2455.0,2455.0,2455.0,2405.0,2405.0,2405.0,2405.0,2405.0,2305.0,2305.0,2305.0,2305.0,2305.0,2495.0,2495.0,2495.0,2495.0,2495.0,2455.0,2455.0,2455.0,2455.0,2455.0,2405.0,2405.0,2405.0,2405.0,2405.0,2305.0,2305.0,2305.0,2305.0,2305.0,2490.0,2490.0,2490.0,2490.0,2490.0
mean,2379.304233,1377997.0,5820.723243,303767.9,1346.505292,8495691.0,2.143219,92.586647,0.000424,15.519807,0.000679,26.985114,8.1e-05,0.100752,9.4e-05,5.5e-05,0.120155,0.147343,0.125194,0.336973,0.056134,0.127484,0.15438,0.128704,0.346216,0.057265,2378.554118,5813.718818,1344.968792,2.145825,92.613332,2375.353551,5784.664678,1338.227293,2.157538,92.730133,2372.576919,5753.422891,1329.788281,2.173492,92.847063,2368.222951,5696.278556,1314.601038,2.208563,93.053734,0.001732,0.002847,0.000269,-0.000932,0.000221,0.009199,0.015521,0.001511,-0.005379,0.001265,0.017794,0.031087,0.004236,-0.010268,0.002427,0.03351,0.061212,0.013872,-0.033915,0.005482,56.699083,57.439794,50.518187,49.499266,51.13598
std,529.277792,738761.8,2308.660912,204458.8,198.968373,5201556.0,0.624826,6.688785,0.009522,118.937357,0.010957,246.163253,0.008631,0.566545,0.027467,0.003746,0.093702,0.095856,0.056602,0.285593,0.020039,0.084301,0.083994,0.04796,0.268796,0.016934,524.864588,2288.020931,196.955189,0.621705,6.663339,506.261164,2201.760998,188.635109,0.608214,6.561145,487.161049,2105.500927,177.068752,0.586965,6.452187,455.741751,1921.739827,151.914303,0.528758,6.214281,0.014836,0.016708,0.014437,0.041375,0.006479,0.032688,0.036356,0.032796,0.089345,0.014143,0.041406,0.047255,0.046229,0.131756,0.021504,0.047953,0.059514,0.059557,0.183162,0.033227,12.708157,12.690238,14.25844,12.853016,13.353612
min,1452.25,445.0,2705.0,8.0,1049.4,0.0,0.52,79.129997,-0.103732,-0.999662,-0.108423,-0.999933,-0.091501,-1.0,-0.27027,-0.023702,0.024324,0.039317,0.045422,0.081897,0.019277,0.038493,0.053779,0.055389,0.134703,0.02351,1459.23,2726.475,1063.585,0.555,79.389,1491.732,2738.73,1071.474,0.6286,79.7964,1523.0785,2767.9025,1099.556,0.6595,79.9844,1584.21575,2885.09125,1116.00425,0.697,80.2386,-0.146944,-0.135145,-0.096546,-0.439834,-0.028166,-0.270956,-0.212771,-0.136948,-0.638941,-0.038172,-0.291502,-0.204563,-0.183337,-0.675851,-0.05174,-0.274406,-0.17559,-0.121931,-0.682474,-0.065447,14.816579,16.488986,11.402867,5.049022,14.427694
25%,1982.6875,1054188.0,4169.4375,182126.8,1226.5,5343413.0,1.83,89.292498,-0.001873,-0.149519,-0.002479,-0.15842,-0.003089,-0.215514,-0.010059,-0.001671,0.067032,0.089936,0.08664,0.195672,0.043522,0.082413,0.105722,0.090326,0.205259,0.046748,1979.3875,4192.7625,1226.61,1.825,89.546,1976.4025,4226.4075,1230.0225,1.838,89.8593,1999.664,4289.1925,1222.1575,1.89,90.478899,2012.149,4345.27375,1235.1425,1.99015,91.85735,-0.003666,-0.004789,-0.007787,-0.020709,-0.003689,-0.003067,-0.000815,-0.017935,-0.041945,-0.007967,0.00384,0.009661,-0.024419,-0.06721,-0.010837,0.01596,0.031943,-0.029234,-0.118613,-0.015004,47.96655,48.434488,40.882207,40.470302,41.71308
50%,2260.75,1360594.0,4945.875,271695.0,1286.1,7345108.0,2.25,94.970001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.096247,0.121488,0.110318,0.267455,0.053075,0.106683,0.126129,0.11913,0.266385,0.054074,2259.125,4940.025,1287.15,2.245,94.993999,2258.955,4973.175,1288.461,2.2536,95.346601,2254.7025,5000.0225,1286.1595,2.2625,95.5298,2251.73425,5056.7125,1278.39375,2.27425,95.8887,0.002949,0.0046,0.000464,-0.002185,0.000354,0.014231,0.019659,-0.000418,-0.005039,0.000989,0.023878,0.03715,0.002887,-0.014651,0.001097,0.040952,0.065367,0.011577,-0.032604,0.002409,58.309827,58.487188,50.400562,49.079632,51.133238
75%,2792.85,1748215.0,7301.25,401755.0,1382.875,10287300.0,2.6,97.370003,0.003957,0.156731,0.005174,0.157758,0.003314,0.22881,0.008989,0.001759,0.146479,0.176139,0.146824,0.380045,0.064294,0.149601,0.186627,0.14961,0.370097,0.067548,2794.2125,7299.675,1379.1025,2.599,97.345501,2779.3825,7291.72,1339.8955,2.6015,97.4007,2796.998,7233.035,1330.329,2.6219,97.3651,2755.331,7094.47125,1308.01375,2.6038,97.23635,0.009394,0.012415,0.008577,0.018472,0.004143,0.027235,0.037679,0.022492,0.041484,0.009543,0.03998,0.056088,0.035978,0.052632,0.013037,0.058874,0.093216,0.05567,0.067747,0.022545,66.029821,67.107713,59.868666,58.04062,60.2099
max,3748.75,5499126.0,12885.5,1561506.0,2067.15,93241650.0,3.24,103.290001,0.097951,1915.297659,0.097086,7237.16,0.052675,6.709529,0.407407,0.020528,0.878947,0.859082,0.397823,2.900352,0.163166,0.594165,0.58576,0.278639,1.837842,0.106003,3704.995,12755.9,2004.6,3.203,103.08,3635.76,12327.25,1955.665,3.1468,101.886399,3504.256,11857.4975,1925.567,3.046,101.3811,3348.29075,11173.19625,1858.14425,2.9873,100.1988,0.085525,0.078206,0.064562,0.380117,0.043906,0.106824,0.131162,0.134966,0.349348,0.05633,0.126827,0.196343,0.170079,0.387905,0.087734,0.169092,0.303996,0.236456,0.468304,0.136585,89.812102,87.805381,93.585308,87.84625,87.297347


## Integrating crypto data and macro market data

In [77]:
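# Merge the crypto and macro dataframes on 'day'.
# An inner join keeps only the days present in both dataframes.
# See the pandas documentation for details:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge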
final_df = cryptos_df.merge(macrodata_df, how='inner', on='day')
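
An inner merge silently drops every day that is missing from either side; the merged frame has 2405 rows against 2505 in `macrodata_df`. The sketch below, on two small made-up frames, shows one way to audit such a merge using the `validate` and `indicator` parameters of `DataFrame.merge`. The frame names and values are hypothetical stand-ins for `cryptos_df` and `macrodata_df`.

```python
import pandas as pd

# Hypothetical miniature versions of the two inputs.
crypto = pd.DataFrame({
    "day": pd.to_datetime(["2020-12-28", "2020-12-29", "2020-12-30"]),
    "btc_price": [26423.2, 27125.4, 27424.5],
})
macro = pd.DataFrame({
    "day": pd.to_datetime(["2020-12-29", "2020-12-30", "2020-12-31"]),
    "spx_price": [3719.9, 3724.2, 3748.75],
})

# validate="one_to_one" raises MergeError if either side repeats a day.
inner = crypto.merge(macro, how="inner", on="day", validate="one_to_one")

# An outer merge with indicator=True labels each row's origin,
# which reveals the days the inner merge leaves out.
audit = crypto.merge(macro, how="outer", on="day", indicator=True)
dropped = audit.loc[audit["_merge"] != "both", ["day", "_merge"]]

print(len(inner), len(dropped))
```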

In [78]:
final_df.tail()

Unnamed: 0,day,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d,btc_SMA10d,eth_SMA10d,xrp_SMA10d,ltc_SMA10d,btc_SMA50d,eth_SMA50d,xrp_SMA50d,ltc_SMA50d,btc_SMA100d,eth_SMA100d,xrp_SMA100d,ltc_SMA100d,btc_SMA200d,eth_SMA200d,xrp_SMA200d,ltc_SMA200d,btc_DiffSMA10d,eth_DiffSMA10d,xrp_DiffSMA10d,ltc_DiffSMA10d,btc_DiffSMA50d,eth_DiffSMA50d,xrp_DiffSMA50d,ltc_DiffSMA50d,btc_DiffSMA100d,eth_DiffSMA100d,xrp_DiffSMA100d,ltc_DiffSMA100d,btc_DiffSMA200d,eth_DiffSMA200d,xrp_DiffSMA200d,ltc_DiffSMA200d,btc_RSI,eth_RSI,xrp_RSI,ltc_RSI,spx_price,spx_volume,ndx_price,ndx_volume,gold_price,gold_volume,us10y_yield,dxy_price,spx_price_chg,spx_volume_chg,ndx_price_chg,ndx_volume_chg,gold_price_chg,gold_volume_chg,us10y_yield_chg,dxy_price_chg,spx_vol20d,ndx_vol20d,gold_vol20d,us10y_vol20d,dxy_vol20d,spx_vol60d,ndx_vol60d,gold_vol60d,us10y_vol60d,dxy_vol60d,spx_SMA10d,ndx_SMA10d,gold_SMA10d,us10y_SMA10d,dxy_SMA10d,spx_SMA50d,ndx_SMA50d,gold_SMA50d,us10y_SMA50d,dxy_SMA50d,spx_SMA100d,ndx_SMA100d,gold_SMA100d,us10y_SMA100d,dxy_SMA100d,spx_SMA200d,ndx_SMA200d,gold_SMA200d,us10y_SMA200d,dxy_SMA200d,spx_DiffSMA10d,ndx_DiffSMA10d,gold_DiffSMA10d,us10y_DiffSMA10d,dxy_DiffSMA10d,spx_DiffSMA50d,ndx_DiffSMA50d,gold_DiffSMA50d,us10y_DiffSMA50d,dxy_DiffSMA50d,spx_DiffSMA100d,ndx_DiffSMA100d,gold_DiffSMA100d,us10y_DiffSMA100d,dxy_DiffSMA100d,spx_DiffSMA200d,ndx_DiffSMA200d,gold_DiffSMA200d,us10y_DiffSMA200d,dxy_DiffSMA200d,spx_RSI,ndx_RSI,gold_RSI,us10y_RSI,dxy_RSI
2400,2020-12-27,26476.130137,491978600000.0,41995150000.0,636.742317,72391400000.0,14640530000.0,0.295383,13418020000.0,7431667000.0,129.757721,8586988000.0,8927412000.0,0.073163,0.073214,0.146692,0.016419,0.013561,0.115394,-0.071368,-0.047568,-0.434958,0.021838,0.021949,0.086009,0.578714,0.623479,2.214227,1.183245,0.551998,0.660465,1.863215,1.033421,23815.983676,630.676429,0.44859,113.675218,19124.40861,556.560606,0.464185,83.825214,15504.202963,465.488615,0.354512,66.945347,12926.582137,391.662966,0.294451,58.54228,0.111696,0.009618,-0.34153,0.141478,0.384416,0.144066,-0.363651,0.547956,0.707674,0.367901,-0.16679,0.938263,1.048193,0.62574,0.003167,1.216479,78.668659,57.648628,34.638574,68.753204,3695.0,355183.0,12704.5,171576.0,1875.0,3429273.0,0.94,90.410004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075605,0.10488,0.111572,0.361602,0.043643,0.145707,0.194912,0.163841,0.656438,0.047256,3694.86,12699.95,1875.93,0.942,90.264001,3601.183,12187.665,1861.551,0.9008,91.720401,3493.81,11816.585,1886.352,0.8164,92.5667,3331.84925,11100.16125,1854.588,0.7431,94.339075,3.8e-05,0.000358,-0.000496,-0.002123,0.001618,0.026052,0.042406,0.007225,0.043517,-0.014287,0.057585,0.075141,-0.006018,0.151396,-0.023299,0.108994,0.144533,0.011006,0.264971,-0.041648,60.200128,64.831582,56.086386,54.541246,39.35111
2401,2020-12-28,26423.228792,493427500000.0,56654980000.0,689.659857,78833070000.0,24721300000.0,0.284147,13068760000.0,7010556000.0,127.899088,8553328000.0,10006160000.0,-0.001998,0.002945,0.349084,0.083107,0.088984,0.688553,-0.038039,-0.026029,-0.056664,-0.014324,-0.00392,0.120836,0.576529,0.679078,2.214313,1.188913,0.546791,0.671132,1.864388,1.029946,24177.740975,635.235888,0.419412,116.278218,19356.504008,561.645436,0.464889,85.204897,15659.163748,468.553485,0.354853,67.740778,13009.323788,393.873556,0.294857,58.948498,0.092874,0.085675,-0.32251,0.09994,0.365083,0.227927,-0.388784,0.501077,0.687397,0.471891,-0.199254,0.888066,1.031099,0.750968,-0.036323,1.169675,78.204641,64.769956,33.963885,67.289803,3727.5,731435.0,12832.75,341802.0,1875.0,7730071.0,0.94,90.339996,0.008796,1.059319,0.010095,0.992132,0.0,1.254143,0.0,-0.000774,0.075127,0.108894,0.110267,0.302532,0.043696,0.144159,0.193564,0.163838,0.654896,0.047058,3698.24,12716.7,1878.235,0.944,90.253001,3610.438,12223.395,1861.414,0.902,91.646401,3495.2925,11820.7975,1885.6315,0.8192,92.5416,3335.84475,11118.23375,1855.44275,0.74435,94.292125,0.007912,0.009126,-0.001722,-0.004237,0.000964,0.032423,0.049852,0.007299,0.042129,-0.014255,0.066434,0.085608,-0.005638,0.147461,-0.02379,0.117408,0.154208,0.01054,0.262847,-0.041914,66.94325,70.210334,56.086386,54.541246,38.093544
2402,2020-12-29,27125.384121,503712200000.0,42186520000.0,732.957029,83575560000.0,22707050000.0,0.246134,11125030000.0,5788819000.0,130.638658,8661848000.0,7456440000.0,0.026573,0.020844,-0.255378,0.06278,0.060159,-0.081478,-0.133779,-0.148731,-0.174271,0.02142,0.012687,-0.254815,0.529077,0.647568,2.239522,1.122251,0.547594,0.678655,1.886685,1.02731,24578.222332,643.089549,0.385678,118.440612,19589.085367,567.197409,0.464738,86.597095,15819.577605,472.028334,0.354795,68.5618,13098.320724,396.378337,0.295138,59.384367,0.103635,0.139743,-0.361814,0.102989,0.384719,0.292243,-0.47038,0.50858,0.714672,0.552782,-0.306264,0.905415,1.070905,0.849135,-0.166035,1.199883,79.899348,69.31615,31.713353,68.358848,3719.9,972308.0,12841.0,391072.0,1874.3,5760324.0,0.94,90.010002,-0.002039,0.329316,0.000643,0.144148,-0.000373,-0.254816,0.0,-0.003653,0.075621,0.108821,0.110315,0.302532,0.045371,0.144309,0.193569,0.163671,0.650971,0.047,3698.95,12725.7,1876.59,0.944,90.272001,3619.541,12259.29,1861.263,0.9032,91.565801,3497.8765,11831.2025,1884.97,0.8223,92.5143,3339.80225,11136.3475,1856.294,0.7456,94.243525,0.005664,0.00906,-0.00122,-0.004237,-0.002902,0.027727,0.047451,0.007004,0.040744,-0.016991,0.063474,0.08535,-0.005661,0.143135,-0.027069,0.113808,0.153071,0.0097,0.26073,-0.044921,64.203851,70.522652,55.707426,54.541246,32.776404
2403,2020-12-30,27424.538955,509680300000.0,38081840000.0,735.590898,83885240000.0,17170430000.0,0.221407,10057610000.0,10590300000.0,129.628167,8581202000.0,5810650000.0,0.011029,0.011848,-0.097298,0.003593,0.003705,-0.243828,-0.100461,-0.095947,0.82944,-0.007735,-0.00931,-0.220721,0.529382,0.644061,2.236159,1.127708,0.547492,0.677487,1.89906,1.026195,24934.290743,650.716758,0.349903,119.33897,19830.869302,573.008202,0.464151,88.004319,15984.589728,475.674564,0.354546,69.387582,13188.095752,398.868324,0.295278,59.808963,0.099872,0.130432,-0.367232,0.086218,0.382922,0.283735,-0.522984,0.472975,0.715686,0.546416,-0.375518,0.868175,1.079492,0.844195,-0.250172,1.16737,80.591759,69.573372,30.306559,67.482775,3724.2,738427.0,12841.5,331658.0,1887.6,5557137.0,0.93,89.629997,0.001156,-0.240542,3.9e-05,-0.151926,0.007096,-0.035274,-0.010638,-0.004222,0.075195,0.107674,0.108447,0.284287,0.047027,0.144284,0.193588,0.161881,0.65004,0.046735,3700.745,12738.6,1877.375,0.942,90.233001,3628.015,12294.855,1861.217,0.9044,91.477401,3500.9435,11844.13,1884.583,0.8244,92.4826,3343.8095,11154.16,1857.21825,0.7466,94.190475,0.006338,0.008078,0.005446,-0.012739,-0.006683,0.026512,0.044461,0.014175,0.028306,-0.020195,0.06377,0.084208,0.001601,0.128093,-0.030845,0.11376,0.151275,0.016359,0.245647,-0.048418,65.074671,70.542809,61.08723,51.596833,27.940014
2404,2020-12-31,28837.288529,535967300000.0,43341140000.0,752.855932,85790180000.0,13293870000.0,0.21207,9633436000.0,6788559000.0,129.244151,8556679000.0,5932641000.0,0.051514,0.051576,0.138105,0.023471,0.022709,-0.225769,-0.042174,-0.042174,-0.358983,-0.002962,-0.002858,0.020994,0.522009,0.62999,2.234735,1.111921,0.553237,0.677847,1.9016,1.026327,25466.187754,662.050808,0.315103,120.76877,20102.04088,579.068996,0.463321,89.431054,16168.567389,479.790352,0.35435,70.247589,13284.934828,401.441136,0.295374,60.229597,0.132376,0.137157,-0.326984,0.070179,0.434545,0.300114,-0.542283,0.445182,0.78354,0.569135,-0.401524,0.839838,1.170676,0.875383,-0.28203,1.145858,83.485009,71.273342,29.769512,67.130674,3748.75,917212.0,12885.5,277076.0,1887.6,7498356.0,0.93,89.940002,0.006592,0.242116,0.003426,-0.164573,0.0,0.34932,0.0,0.003459,0.077824,0.107712,0.107605,0.273369,0.048391,0.144503,0.193317,0.160006,0.643106,0.046655,3704.995,12755.9,1878.16,0.94,90.225002,3635.76,12327.25,1860.803,0.905,91.405201,3504.256,11857.4975,1884.196,0.8265,92.454,3348.29075,11173.19625,1858.14425,0.7478,94.140525,0.01181,0.01016,0.005026,-0.010638,-0.003159,0.031077,0.045286,0.014401,0.027624,-0.01603,0.069771,0.086696,0.001807,0.125227,-0.027192,0.119601,0.153251,0.015852,0.243648,-0.04462,69.61893,72.335582,61.08723,51.596833,36.209526


Verifying the merged dataset

In [79]:
# Check the shape of the table (rows, columns).
final_df.shape

(2405, 140)

In [80]:
# Show the non-null count and dtype of each column.
final_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2405 entries, 0 to 2404
Columns: 140 entries, day to dxy_RSI
dtypes: datetime64[ns](1), float64(139)
memory usage: 2.6 MB


In [81]:
# Count the null values in each column.
final_df.isnull().sum()

day                    0
btc_price              0
btc_mktcap             0
btc_volume             0
eth_price            713
eth_mktcap           713
eth_volume           713
xrp_price             84
xrp_mktcap            84
xrp_volume            84
ltc_price              0
ltc_mktcap             0
ltc_volume             0
btc_price_chg          1
btc_mktcap_chg         1
btc_volume_chg       210
eth_price_chg        714
eth_mktcap_chg       714
eth_volume_chg       714
xrp_price_chg         85
xrp_mktcap_chg        85
xrp_volume_chg       210
ltc_price_chg          1
ltc_mktcap_chg         1
ltc_volume_chg       210
btc_vol20d            18
eth_vol20d           730
xrp_vol20d           102
ltc_vol20d            18
btc_vol60d            52
eth_vol60d           764
xrp_vol60d           136
ltc_vol60d            52
btc_SMA10d             8
eth_SMA10d           720
xrp_SMA10d            92
ltc_SMA10d             8
btc_SMA50d            42
eth_SMA50d           755
xrp_SMA50d           126
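
The crypto columns differ widely in null count (for example 713 for `eth_price`, 84 for `xrp_price` and 0 for `btc_price`). One plausible reading, which this section does not confirm, is that each series simply starts on a different date. A minimal sketch of how to check that on made-up data: find the first valid date per column, then count any nulls that occur after it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which one series starts later than the other.
frame = pd.DataFrame({
    "day": pd.date_range("2020-01-01", periods=5, freq="D"),
    "btc_price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "eth_price": [np.nan, np.nan, 3.0, 4.0, 5.0],
}).set_index("day")

# First date on which each column has a value.
first_valid = frame.apply(lambda col: col.first_valid_index())

# Nulls after that date would indicate gaps rather than a late start.
interior_nulls = frame.apply(
    lambda col: int(col.loc[col.first_valid_index():].isna().sum())
)

print(first_valid)
print(interior_nulls)
```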


In [82]:
# Count the unique values in each column.
final_df.nunique()

day                  2405
btc_price            2401
btc_mktcap           2403
btc_volume           2196
eth_price            1692
eth_mktcap           1692
eth_volume           1692
xrp_price            2276
xrp_mktcap           2298
xrp_volume           2183
ltc_price            2401
ltc_mktcap           2402
ltc_volume           2195
btc_price_chg        2403
btc_mktcap_chg       2402
btc_volume_chg       2195
eth_price_chg        1691
eth_mktcap_chg       1690
eth_volume_chg       1691
xrp_price_chg        2298
xrp_mktcap_chg       2299
xrp_volume_chg       2178
ltc_price_chg        2402
ltc_mktcap_chg       2401
ltc_volume_chg       2194
btc_vol20d           2387
eth_vol20d           1675
xrp_vol20d           2302
ltc_vol20d           2387
btc_vol60d           2353
eth_vol60d           1641
xrp_vol60d           2269
ltc_vol60d           2353
btc_SMA10d           2397
eth_SMA10d           1685
xrp_SMA10d           2310
ltc_SMA10d           2397
btc_SMA50d           2363
eth_SMA50d  

In [83]:
# Count fully duplicated rows.
final_df.duplicated().sum()

0

In [84]:
final_df.describe()

Unnamed: 0,btc_price,btc_mktcap,btc_volume,eth_price,eth_mktcap,eth_volume,xrp_price,xrp_mktcap,xrp_volume,ltc_price,ltc_mktcap,ltc_volume,btc_price_chg,btc_mktcap_chg,btc_volume_chg,eth_price_chg,eth_mktcap_chg,eth_volume_chg,xrp_price_chg,xrp_mktcap_chg,xrp_volume_chg,ltc_price_chg,ltc_mktcap_chg,ltc_volume_chg,btc_vol20d,eth_vol20d,xrp_vol20d,ltc_vol20d,btc_vol60d,eth_vol60d,xrp_vol60d,ltc_vol60d,btc_SMA10d,eth_SMA10d,xrp_SMA10d,ltc_SMA10d,btc_SMA50d,eth_SMA50d,xrp_SMA50d,ltc_SMA50d,btc_SMA100d,eth_SMA100d,xrp_SMA100d,ltc_SMA100d,btc_SMA200d,eth_SMA200d,xrp_SMA200d,ltc_SMA200d,btc_DiffSMA10d,eth_DiffSMA10d,xrp_DiffSMA10d,ltc_DiffSMA10d,btc_DiffSMA50d,eth_DiffSMA50d,xrp_DiffSMA50d,ltc_DiffSMA50d,btc_DiffSMA100d,eth_DiffSMA100d,xrp_DiffSMA100d,ltc_DiffSMA100d,btc_DiffSMA200d,eth_DiffSMA200d,xrp_DiffSMA200d,ltc_DiffSMA200d,btc_RSI,eth_RSI,xrp_RSI,ltc_RSI,spx_price,spx_volume,ndx_price,ndx_volume,gold_price,gold_volume,us10y_yield,dxy_price,spx_price_chg,spx_volume_chg,ndx_price_chg,ndx_volume_chg,gold_price_chg,gold_volume_chg,us10y_yield_chg,dxy_price_chg,spx_vol20d,ndx_vol20d,gold_vol20d,us10y_vol20d,dxy_vol20d,spx_vol60d,ndx_vol60d,gold_vol60d,us10y_vol60d,dxy_vol60d,spx_SMA10d,ndx_SMA10d,gold_SMA10d,us10y_SMA10d,dxy_SMA10d,spx_SMA50d,ndx_SMA50d,gold_SMA50d,us10y_SMA50d,dxy_SMA50d,spx_SMA100d,ndx_SMA100d,gold_SMA100d,us10y_SMA100d,dxy_SMA100d,spx_SMA200d,ndx_SMA200d,gold_SMA200d,us10y_SMA200d,dxy_SMA200d,spx_DiffSMA10d,ndx_DiffSMA10d,gold_DiffSMA10d,us10y_DiffSMA10d,dxy_DiffSMA10d,spx_DiffSMA50d,ndx_DiffSMA50d,gold_DiffSMA50d,us10y_DiffSMA50d,dxy_DiffSMA50d,spx_DiffSMA100d,ndx_DiffSMA100d,gold_DiffSMA100d,us10y_DiffSMA100d,dxy_DiffSMA100d,spx_DiffSMA200d,ndx_DiffSMA200d,gold_DiffSMA200d,us10y_DiffSMA200d,dxy_DiffSMA200d,spx_RSI,ndx_RSI,gold_RSI,us10y_RSI,dxy_RSI
count,2405.0,2405.0,2405.0,1692.0,1692.0,1692.0,2321.0,2321.0,2321.0,2405.0,2405.0,2405.0,2404.0,2404.0,2195.0,1691.0,1691.0,1691.0,2320.0,2320.0,2195.0,2404.0,2404.0,2195.0,2387.0,1675.0,2303.0,2387.0,2353.0,1641.0,2269.0,2353.0,2397.0,1685.0,2313.0,2397.0,2363.0,1650.0,2279.0,2363.0,2320.0,1607.0,2236.0,2320.0,2234.0,1522.0,2150.0,2234.0,2397.0,1685.0,2313.0,2397.0,2363.0,1650.0,2279.0,2363.0,2320.0,1607.0,2236.0,2320.0,2234.0,1522.0,2150.0,2234.0,2393.0,1680.0,2309.0,2393.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2396.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2305.0,2305.0,2305.0,2305.0,2305.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2405.0,2305.0,2305.0,2305.0,2305.0,2305.0,2405.0,2405.0,2405.0,2405.0,2405.0
mean,4114.282656,72085520000.0,7581565000.0,223.004546,22889800000.0,4209724000.0,0.198102,8006359000.0,742002500.0,39.345102,2248936000.0,905228800.0,0.002733,0.002918,0.207953,0.004864,0.005168,0.229501,0.004408,0.005625,0.216606,0.002482,0.003004,0.195438,0.56603,0.860814,0.916629,0.82521,0.597932,0.89952,0.989978,0.880906,4084.976595,222.374232,0.198476,39.28854,3991.870939,220.791343,0.19761,39.138261,3918.032414,220.384097,0.197902,39.216026,3831.939635,221.686227,0.199968,39.614568,0.010071,0.017028,0.010718,0.008373,0.05749,0.107999,0.070173,0.054075,0.121263,0.235331,0.139595,0.110576,0.237382,0.475498,0.226452,0.230532,53.650857,52.73599,48.559577,49.501238,2414.572328,1409805.0,5946.41632,313337.0,1336.122869,8299583.0,2.152973,93.047283,0.000407,11.501018,0.000692,19.840178,0.00014,0.099151,0.00013,4.3e-05,0.120995,0.148724,0.125208,0.339619,0.056382,0.128002,0.155207,0.128763,0.347717,0.057391,2410.547651,5927.725229,1335.28516,2.154449,93.033481,2393.028895,5847.413358,1332.637613,2.162011,92.963792,2372.576919,5753.422891,1329.788281,2.173492,92.847063,2368.222951,5696.278556,1314.601038,2.208563,93.053734,0.001664,0.002896,0.00055,-0.000791,0.000169,0.009008,0.015627,0.002322,-0.004629,0.00101,0.017794,0.031087,0.004236,-0.010268,0.002427,0.03351,0.061212,0.013872,-0.033915,0.005482,56.509815,57.570049,50.95623,49.545981,50.880224
std,4704.051431,85089580000.0,12682940000.0,232.057123,23349270000.0,5961083000.0,0.298007,11794150000.0,1591516000.0,49.927728,2892431000.0,1458081000.0,0.041718,0.041711,4.941823,0.064778,0.064682,4.010159,0.080239,0.094157,0.994898,0.066201,0.066345,2.07819,0.309704,0.416561,0.794847,0.591435,0.249778,0.319833,0.684561,0.520054,4606.090507,230.128713,0.294704,49.527335,4327.694127,221.140561,0.272053,47.683982,4107.132667,211.546525,0.255722,46.20115,3840.489334,196.888003,0.239227,43.628809,0.071775,0.112492,0.149088,0.119746,0.227878,0.357708,0.490641,0.408655,0.372778,0.620483,0.815916,0.649171,0.607522,1.075758,1.287129,0.966787,14.538026,14.835662,13.816832,13.355535,510.042246,717532.0,2269.270857,202031.8,195.562655,4634840.0,0.635336,6.414226,0.009632,102.994205,0.011093,221.83727,0.008487,0.557976,0.027836,0.003763,0.094986,0.096995,0.055094,0.289809,0.020273,0.08487,0.084414,0.047819,0.270704,0.017037,507.316133,2251.783078,193.606813,0.631373,6.411312,496.264484,2180.634371,186.471453,0.613686,6.422596,487.161049,2105.500927,177.068752,0.586965,6.452187,455.741751,1921.739827,151.914303,0.528758,6.214281,0.015037,0.016937,0.014059,0.041771,0.006538,0.032974,0.036703,0.032267,0.08987,0.014109,0.041406,0.047255,0.046229,0.131756,0.021504,0.047953,0.059514,0.059557,0.183162,0.033227,12.77596,12.826693,14.122182,12.81996,13.330639
min,68.0831,774804400.0,0.0,0.432979,0.0,87074.8,0.002686,21944810.0,0.0,1.148851,38286160.0,0.0,-0.351903,-0.357757,-0.995927,-0.48331,-0.482741,-0.989981,-0.598844,-0.406699,-0.899513,-0.42144,-0.420948,-0.992147,0.096839,0.225407,0.142178,0.087057,0.161956,0.402514,0.290551,0.145117,79.01016,0.514582,0.003276,1.337361,94.91404,0.728577,0.003874,1.46382,105.491797,0.816918,0.004498,1.634103,131.147053,1.367936,0.004813,1.920646,-0.36927,-0.463905,-0.562164,-0.444169,-0.488876,-0.496007,-0.574555,-0.600129,-0.499647,-0.556866,-0.732853,-0.653845,-0.602839,-0.723483,-0.834913,-0.757695,10.477495,15.836875,14.349813,9.800285,1568.25,445.0,2830.5,8.0,1049.4,0.0,0.52,79.139999,-0.103732,-0.999662,-0.108423,-0.999933,-0.058459,-1.0,-0.27027,-0.023702,0.024324,0.039317,0.045422,0.081897,0.019277,0.038493,0.053779,0.055389,0.134703,0.02351,1561.445,2800.375,1063.585,0.555,79.389,1554.425,2797.075,1071.474,0.6286,79.7964,1523.0785,2767.9025,1099.556,0.6595,79.9844,1584.21575,2885.09125,1116.00425,0.697,80.2386,-0.146944,-0.135145,-0.081261,-0.439834,-0.028166,-0.270956,-0.212771,-0.136948,-0.638941,-0.038172,-0.291502,-0.204563,-0.183337,-0.675851,-0.05174,-0.274406,-0.17559,-0.121931,-0.682474,-0.065447,14.816579,16.488986,13.989584,5.049022,14.427694
25%,416.85,5852736000.0,65970400.0,12.662177,1042574000.0,23016560.0,0.006887,217258700.0,440223.0,3.750715,152978500.0,7668541.0,-0.012635,-0.012172,-0.1578,-0.023281,-0.022992,-0.172084,-0.02084,-0.020336,-0.210377,-0.021997,-0.021686,-0.192863,0.349377,0.575329,0.473138,0.47416,0.42371,0.636714,0.530108,0.558239,417.6182,12.679168,0.007001,3.754133,418.25724,12.539293,0.007434,3.792053,418.939112,12.401438,0.007397,3.835785,462.745821,23.858944,0.007398,3.910442,-0.02316,-0.039033,-0.04652,-0.039103,-0.067357,-0.116126,-0.14322,-0.123092,-0.088855,-0.127257,-0.187691,-0.205177,-0.143771,-0.180279,-0.224708,-0.364713,43.529727,42.132678,39.206354,40.507221,2011.5,1076239.0,4267.0,189033.0,1223.5,5283031.0,1.83,90.269997,-0.001882,-0.147387,-0.002423,-0.15611,-0.003087,-0.213551,-0.009967,-0.001677,0.066545,0.090543,0.086787,0.194899,0.043612,0.082095,0.106355,0.090757,0.204653,0.046719,2016.675,4255.95,1224.245,1.82,90.136,1994.584,4251.165,1228.815,1.8356,90.2656,1999.664,4289.1925,1222.1575,1.89,90.478899,2012.149,4345.27375,1235.1425,1.99015,91.85735,-0.003814,-0.00489,-0.007649,-0.020556,-0.003776,-0.003271,-0.00108,-0.017237,-0.041227,-0.008247,0.00384,0.009661,-0.024419,-0.06721,-0.010837,0.01596,0.031943,-0.029234,-0.118613,-0.015004,47.685018,48.325519,41.374045,40.675947,41.296527
50%,1078.274711,17045270000.0,1275837000.0,180.689036,19139500000.0,1289734000.0,0.038668,1271107000.0,25374500.0,20.3577,579183800.0,198538700.0,0.002066,0.002247,-0.010113,-0.000393,-0.000326,-0.008792,-0.000979,-0.001147,-0.001478,-0.0014,-0.000734,-0.019072,0.499786,0.758069,0.654434,0.681702,0.546253,0.800722,0.7451,0.758104,1065.331541,180.082761,0.040359,20.89254,1111.794083,183.070737,0.053489,19.970132,1064.947305,182.958115,0.102172,19.117613,1204.942182,185.108743,0.113152,16.706857,0.005522,0.005767,-0.007834,-0.002322,0.026366,0.03786,-0.035081,-0.014777,0.050999,0.079337,-0.029989,-0.040815,0.122273,0.184571,-0.089969,-0.010312,52.302641,52.065813,46.696609,48.36553,2341.75,1365228.0,5347.5,281130.0,1283.8,7253340.0,2.27,95.169998,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.096397,0.12216,0.111391,0.268418,0.053394,0.106963,0.126857,0.119978,0.270603,0.054272,2342.4,5337.025,1284.3,2.268,95.243001,2298.55,5154.525,1286.432,2.2636,95.457,2254.7025,5000.0225,1286.1595,2.2625,95.5298,2251.73425,5056.7125,1278.39375,2.27425,95.8887,0.002781,0.004755,0.000797,-0.001848,0.000244,0.014043,0.020401,0.000373,-0.003485,0.000748,0.023878,0.03715,0.002887,-0.014651,0.001097,0.040952,0.065367,0.011577,-0.032604,0.002409,58.136571,58.766585,50.821506,49.163358,50.838553
75%,7538.557687,133091800000.0,6834120000.0,304.615866,30113020000.0,7406125000.0,0.294139,12447470000.0,1119363000.0,57.0756,3475383000.0,1040020000.0,0.018287,0.018437,0.201739,0.029326,0.028774,0.223966,0.02007,0.019726,0.300554,0.019358,0.019899,0.212435,0.691885,1.023211,1.088795,0.989577,0.761813,1.153832,1.144895,1.048385,7473.603178,302.67656,0.295263,57.085983,7680.194788,299.111426,0.288228,56.494825,7977.810134,297.535537,0.314797,55.091356,8010.441598,294.137078,0.335477,58.594597,0.041552,0.063793,0.034256,0.036742,0.136554,0.233961,0.101118,0.107085,0.233056,0.344559,0.138104,0.191243,0.399467,0.717216,0.157768,0.361977,63.146737,62.53508,56.294516,56.547594,2810.75,1758589.0,7376.5,409627.0,1346.25,10133570.0,2.62,97.419998,0.003924,0.150288,0.005244,0.145109,0.003302,0.224622,0.008969,0.001757,0.148275,0.178656,0.147126,0.384919,0.064697,0.151013,0.187271,0.149555,0.3708,0.067867,2807.0,7360.6,1342.45,2.613,97.427001,2788.098,7315.835,1334.118,2.6106,97.4316,2796.998,7233.035,1330.329,2.6219,97.3651,2755.331,7094.47125,1308.01375,2.6038,97.23635,0.009483,0.012629,0.008834,0.018868,0.004113,0.027333,0.038041,0.023223,0.042241,0.009158,0.03998,0.056088,0.035978,0.052632,0.013037,0.058874,0.093216,0.05567,0.067747,0.022545,65.922038,67.34223,60.481103,58.04062,59.899638
max,28837.288529,535967300000.0,81406690000.0,1410.000215,136747200000.0,74747420000.0,3.39845,131653000000.0,25054630000.0,360.661762,19609010000.0,10006160000.0,0.332556,0.332724,230.400992,0.552358,0.55305,162.802292,1.413959,2.429747,17.763785,0.932542,0.935298,86.427666,1.830729,3.926493,5.684042,5.01776,1.498939,2.399359,3.747004,3.366572,25466.187754,1278.33284,2.743761,317.17434,20102.04088,1031.401368,1.630031,248.98043,16168.567389,887.829676,1.260152,215.853822,13284.934828,725.406266,0.953323,173.169745,0.539251,0.831022,2.210126,2.205423,1.982207,2.416048,5.490504,8.261777,3.41994,3.506569,8.21509,11.658255,5.208597,6.468063,14.922883,13.823928,94.144798,93.080273,97.720733,97.491189,3748.75,5499126.0,12885.5,1561506.0,2067.15,47619170.0,3.24,103.290001,0.097951,1915.297659,0.097086,7237.16,0.052675,6.709529,0.407407,0.020528,0.878947,0.859082,0.397823,2.900352,0.163166,0.594165,0.58576,0.278639,1.837842,0.106003,3704.995,12755.9,2004.6,3.203,103.08,3635.76,12327.25,1955.665,3.1468,101.886399,3504.256,11857.4975,1925.567,3.046,101.3811,3348.29075,11173.19625,1858.14425,2.9873,100.1988,0.085525,0.078206,0.064562,0.380117,0.043906,0.106824,0.131162,0.134966,0.349348,0.05633,0.126827,0.196343,0.170079,0.387905,0.087734,0.169092,0.303996,0.236456,0.468304,0.136585,89.812102,87.805381,93.585308,87.84625,87.297347


## Creating and saving team12_cleandata.csv

This file serves as the base data infrastructure for every later step of our project.

In [85]:
#save final_df to a csv file (the index is written as the first column)
final_df.to_csv('team12_cleandata.csv')
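Because `to_csv` writes the index as the first column by default, later notebooks need to restore it when they load the file. The following is a minimal sketch of that round trip, not the project's actual loading code. It uses a small hypothetical stand-in frame (`demo_df`, with made-up column names and a date index) and a temporary file instead of `final_df` and `team12_cleandata.csv`, and it assumes the index holds dates.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for final_df: a tiny date-indexed frame with missing values.
demo_df = pd.DataFrame(
    {
        "btc_price": [135.3, 141.96, 135.3],
        "eth_price": [float("nan"), float("nan"), 0.43],
    },
    index=pd.to_datetime(["2013-04-28", "2013-04-29", "2013-04-30"]),
)
demo_df.index.name = "date"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the notebook itself writes team12_cleandata.csv.
    csv_path = os.path.join(tmp_dir, "demo_cleandata.csv")
    demo_df.to_csv(csv_path)  # index goes into the first column
    # index_col=0 restores the index; parse_dates=True converts it back to dates.
    reloaded_df = pd.read_csv(csv_path, index_col=0, parse_dates=True)

# Basic sanity checks on the reloaded data.
print(reloaded_df.shape)
print(reloaded_df.isna().sum())
```

If the file is loaded without `index_col=0`, the saved index shows up as an extra, usually unnamed, column, so downstream code would see one more column than `final_df` had.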