## Step 0. Import the usual libraries

In [20]:
import pandas as pd
import numpy as np
import poloniex
import datetime
#import ffn

pd.set_option('display.max_rows', 15)

## Step 1. Importing from Poloniex
Poloniex is a cryptocurrency exchange (https://poloniex.com/exchange). To download the data, we use a Python package written by a member of the Poloniex community.

Importing from the API

* To use `poloniex`, we first need to install the package from the console:

`pip install poloniex`
* The `help` function lists the functions included in the package, with short descriptions:

`help(poloniex.poloniex)`
* We only use public market data, so no API key or secret is needed.

The package offers many more features, but everything we need here is public. We want Bitcoin priced in USDT (Tether, a stablecoin pegged to the US dollar), which the API identifies by the currency-pair code `'USDT_BTC'`. The second argument is the period, in seconds, between measurements; we chose daily candles (86400 s = 24 hours).
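The `period` argument is a number of seconds. To our knowledge, Poloniex's legacy public API accepted only a fixed set of candle periods (300, 900, 1800, 7200, 14400 and 86400 s). The helper below is a hypothetical sketch under that assumption, not part of the `poloniex` package: it converts a period into hours and rejects unsupported values.

```python
# Candle periods (in seconds) that, to our knowledge, Poloniex's legacy
# public API accepted for returnChartData; treat this list as an assumption.
VALID_PERIODS = (300, 900, 1800, 7200, 14400, 86400)

SECONDS_PER_HOUR = 60 * 60


def period_in_hours(period):
    """Hypothetical helper: validate a candle period and convert it to hours."""
    if period not in VALID_PERIODS:
        raise ValueError(f"unsupported candle period: {period} s")
    return period / SECONDS_PER_HOUR


print(period_in_hours(86400))  # daily candles -> 24.0
print(period_in_hours(300))    # 5-minute candles -> about 0.083
```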

In [21]:
polo = poloniex.Poloniex()
#btc300 = pd.DataFrame(polo.returnChartData("USDT_BTC", 300))
btc86400 = pd.DataFrame(polo.returnChartData("USDT_BTC", 86400))
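`pd.DataFrame` turns a list of dictionaries into a table with one row per dictionary and one column per key, which is how the candles end up as the columns shown below. This sketch assumes each candle comes back as a dict keyed like those columns; the two records are the first two rows of the output below, typed in by hand rather than fetched.

```python
import pandas as pd

# Two candles copied from the output below; we assume the API returns a
# list of dicts with these keys (one dict per candle).
candles = [
    {"date": 1424304000, "high": 0.33, "low": 225.0, "open": 0.33,
     "close": 244.0, "volume": 46.27631, "quoteVolume": 0.193117,
     "weightedAverage": 239.627778},
    {"date": 1424390400, "high": 245.0, "low": 240.25, "open": 240.250118,
     "close": 240.25, "volume": 55.8949, "quoteVolume": 0.230429,
     "weightedAverage": 242.568479},
]

# One row per dict, one column per key.
frame = pd.DataFrame(candles)
print(frame[["date", "close"]])
```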

Quick look at the data:

In [22]:
btc86400

| | close | date | high | low | open | quoteVolume | volume | weightedAverage |
|---|---|---|---|---|---|---|---|---|
| 0 | 244.000000 | 1424304000 | 0.330000 | 225.000000 | 0.330000 | 0.193117 | 4.627631e+01 | 239.627778 |
| 1 | 240.250000 | 1424390400 | 245.000000 | 240.250000 | 240.250118 | 0.230429 | 5.589490e+01 | 242.568479 |
| 2 | 245.000000 | 1424476800 | 245.000000 | 245.000000 | 245.000000 | 0.060091 | 1.472224e+01 | 245.000000 |
| 3 | 235.000000 | 1424563200 | 249.000000 | 235.000000 | 245.000000 | 0.539055 | 1.291212e+02 | 239.532608 |
| 4 | 235.000000 | 1424649600 | 235.001000 | 235.000000 | 235.000002 | 0.410926 | 9.656756e+01 | 235.000062 |
| 5 | 239.750000 | 1424736000 | 239.750000 | 235.000000 | 235.000000 | 0.626749 | 1.491544e+02 | 237.981177 |
| 6 | 237.750000 | 1424822400 | 239.750000 | 237.750000 | 239.750000 | 0.927550 | 2.215841e+02 | 238.891722 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1303 | 6484.078126 | 1536883200 | 6580.814013 | 6382.791500 | 6483.875043 | 813.653568 | 5.278847e+06 | 6487.830986 |
| 1304 | 6516.581844 | 1536969600 | 6567.515680 | 6474.352125 | 6478.638200 | 378.183861 | 2.464325e+06 | 6516.209888 |

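As a sanity check on the daily period: consecutive `date` values above are 86400 s apart, so, assuming there are no gaps, the number of candles follows from the first timestamp (1424304000) and the last one shown (1536969600): (1536969600 − 1424304000) / 86400 = 1304 intervals, i.e. 1305 candles, which matches the index running from 0 to 1304.

```python
# Timestamps taken from the output above (seconds since the Unix epoch).
first_ts = 1424304000
last_ts = 1536969600
period = 86400  # seconds per daily candle

# The first few timestamps shown above should be exactly one period apart.
head = [1424304000, 1424390400, 1424476800, 1424563200]
steps = {b - a for a, b in zip(head, head[1:])}
print(steps)  # {86400}

# Number of candles if the series has no gaps: intervals + 1.
n_intervals, remainder = divmod(last_ts - first_ts, period)
print(n_intervals + 1, remainder)  # 1305 0
```

If the count did not match the number of rows, some days would be missing from the download.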

## Step 2. Pre-processing

Convert the Unix timestamps (seconds since 1970-01-01) to datetimes:

In [23]:
btc86400["date"] = pd.to_datetime(btc86400["date"], unit="s")
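`pd.to_datetime(..., unit="s")` interprets integers as seconds since 1970-01-01 (UTC) and returns timezone-naive timestamps. A quick check on the first and last `date` values from the raw output:

```python
import pandas as pd

# Unix timestamps (seconds) from the first and last rows of the raw data.
raw = pd.Series([1424304000, 1536969600])

converted = pd.to_datetime(raw, unit="s")
as_text = converted.dt.strftime("%Y-%m-%d").tolist()
print(as_text)  # ['2015-02-19', '2018-09-15']
```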

Setting the date as index:

In [24]:
btc86400.set_index("date", inplace=True)

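Select the period from 2017-04-16 to 2018-05-02. The analysis window starts on 2017-05-01, but the extra days in April are needed to compute the 14-day rolling volatility for the first days of May, and the extra day at the end (2018-05-02) provides the next-day return for the last day kept: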

In [25]:
btc_data = btc86400["2017-04-16":"2018-05-02"]
btc_data

| date | close | high | low | open | quoteVolume | volume | weightedAverage |
|---|---|---|---|---|---|---|---|
| 2017-04-16 | 1212.500000 | 1217.000000 | 1190.000000 | 1193.948393 | 2691.617745 | 3.253026e+06 | 1208.576688 |
| 2017-04-17 | 1239.000000 | 1240.486240 | 1205.260000 | 1212.560000 | 6124.053002 | 7.497022e+06 | 1224.192859 |
| 2017-04-18 | 1268.048100 | 1280.000000 | 1236.500002 | 1239.000000 | 7500.093280 | 9.455076e+06 | 1260.661076 |
| 2017-04-19 | 1262.890000 | 1293.000000 | 1257.000000 | 1265.000000 | 5163.500413 | 6.540120e+06 | 1266.605789 |
| 2017-04-20 | 1306.650000 | 1328.000000 | 1260.000000 | 1262.890000 | 6683.057880 | 8.629089e+06 | 1291.188708 |
| 2017-04-21 | 1328.000000 | 1335.000000 | 1305.550000 | 1306.650000 | 5175.246810 | 6.823844e+06 | 1318.554237 |
| 2017-04-22 | 1346.000000 | 1349.000000 | 1291.634036 | 1328.000000 | 5207.997420 | 6.871016e+06 | 1319.320168 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2018-04-26 | 9286.000000 | 9299.000000 | 8652.585323 | 8876.899999 | 2378.235826 | 2.113739e+07 | 8887.845739 |
| 2018-04-27 | 8920.010000 | 9386.000000 | 8920.000000 | 9286.000000 | 1990.549767 | 1.831713e+07 | 9202.047096 |

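Why start on 2017-04-16? `pct_change()` leaves the first day as NaN, and `rolling(14).std()` needs 14 non-missing returns, so the first complete window ends 14 days after the first return. The sketch below checks this on a synthetic price series over the same dates; the prices are arbitrary placeholders, not Poloniex data.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices on the same calendar as btc_data (values are arbitrary).
dates = pd.date_range("2017-04-16", "2018-05-02", freq="D")
prices = pd.Series(np.linspace(1000.0, 2000.0, len(dates)), index=dates)

returns = prices.pct_change()       # first value is NaN
vol_14 = returns.rolling(14).std()  # needs 14 valid returns in the window

print(returns.first_valid_index().date())  # 2017-04-17
print(vol_14.first_valid_index().date())   # 2017-04-30
```

So every day from 2017-05-01 onwards has a full 14-day volatility window.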

We keep only the close price and the volume, since the other columns are highly correlated with them and are unlikely to add useful information for our purpose. We start from the close price; the volume is added back when the final dataset is assembled.

In [26]:
returns_data = btc_data[["close"]]

Next we compute daily returns, since the level of the close price is not particularly interesting in itself:

In [27]:
returns_data = returns_data.pct_change()

In [28]:
returns_data

| date | close |
|---|---|
| 2017-04-16 | NaN |
| 2017-04-17 | 0.021856 |
| 2017-04-18 | 0.023445 |
| 2017-04-19 | -0.004068 |
| 2017-04-20 | 0.034651 |
| 2017-04-21 | 0.016339 |
| 2017-04-22 | 0.013554 |
| ... | ... |
| 2018-04-26 | 0.048078 |
| 2018-04-27 | -0.039413 |

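`pct_change()` computes simple returns, $r_t = P_t / P_{t-1} - 1$, where $P_t$ is the close on day $t$. Reproducing the first two returns by hand from the close prices in the `btc_data` output (1212.50, 1239.00, 1268.0481):

```python
import pandas as pd

# Close prices for 2017-04-16 .. 2017-04-18, copied from the btc_data output.
close = pd.Series(
    [1212.5, 1239.0, 1268.0481],
    index=pd.to_datetime(["2017-04-16", "2017-04-17", "2017-04-18"]),
)

via_pandas = close.pct_change()
by_hand = close / close.shift(1) - 1  # r_t = P_t / P_{t-1} - 1

print(via_pandas.round(6).tolist())  # [nan, 0.021856, 0.023445]
```

Both give the values in the table above; the first day is NaN because it has no previous close.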

Next we add the volatility over the last 14 days, measured as the rolling standard deviation of the daily returns:

In [29]:
volatility_data = returns_data.rolling(14).std()
volatility_data = volatility_data.rename(columns = {'close': 'volatility_14'})
volatility_data

| date | volatility_14 |
|---|---|
| 2017-04-16 | NaN |
| 2017-04-17 | NaN |
| 2017-04-18 | NaN |
| 2017-04-19 | NaN |
| 2017-04-20 | NaN |
| 2017-04-21 | NaN |
| 2017-04-22 | NaN |
| ... | ... |
| 2018-04-26 | 0.042592 |
| 2018-04-27 | 0.044552 |

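`rolling(14).std()` is the sample standard deviation (pandas uses `ddof=1` by default) of the current day's return and the 13 before it; it is a daily figure and is not annualized. The check below uses synthetic returns drawn with a fixed seed (placeholders, not Poloniex data) and compares the last rolling value with a direct NumPy computation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic daily returns (placeholders, not Poloniex data).
returns = pd.Series(rng.normal(0.0, 0.03, size=60),
                    index=pd.date_range("2017-04-17", periods=60, freq="D"))

vol_14 = returns.rolling(14).std()

# Same quantity by hand: sample std (ddof=1) of the last 14 returns.
manual = np.std(returns.iloc[-14:].to_numpy(), ddof=1)
print(np.isclose(vol_14.iloc[-1], manual))  # True
```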

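Shift the returns up by one row so that each date carries the next day's return, which is the target we want to predict, and drop the missing value this creates on the last day:

In [30]: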
returns_data = returns_data.shift(-1)
returns_data = returns_data.dropna()

Rename the column to make clear that it holds the next day's return:

In [31]:
returns_data.columns = ["return_day+1"]

In [32]:
returns_data

| date | return_day+1 |
|---|---|
| 2017-04-16 | 0.021856 |
| 2017-04-17 | 0.023445 |
| 2017-04-18 | -0.004068 |
| 2017-04-19 | 0.034651 |
| 2017-04-20 | 0.016339 |
| 2017-04-21 | 0.013554 |
| 2017-04-22 | 0.004601 |
| ... | ... |
| 2018-04-25 | 0.048078 |
| 2018-04-26 | -0.039413 |

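`shift(-1)` moves each value up one row, so the row for day $t$ holds the return realized on day $t+1$, and the last row becomes NaN and is dropped. A toy example with made-up returns, only to show the alignment:

```python
import pandas as pd

# Made-up returns, only to show the alignment.
returns = pd.DataFrame(
    {"close": [0.01, 0.02, 0.03]},
    index=pd.to_datetime(["2018-04-30", "2018-05-01", "2018-05-02"]),
)

target = returns.shift(-1).dropna()
target.columns = ["return_day+1"]
print(target)
# 2018-04-30 now holds 0.02 (the 2018-05-01 return), 2018-05-01 holds 0.03,
# and 2018-05-02 is gone because its next-day return is unknown.
```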

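Now we combine the next-day returns with the close price and volume from `btc_data` and with the 14-day volatility.
`dropna()` then removes every row with a missing value: 2018-05-02, which has no next-day return, and the April days where the rolling volatility is not yet defined. Finally we keep May 2017 to May 2018.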

In [33]:
# Align the target, close/volume and volatility on the shared date index.
data = pd.concat([returns_data, btc_data[["close", "volume"]], volatility_data], axis=1)

In [34]:
data = data.dropna()
data = data.loc["2017-05":"2018-05"]  # keep May 2017 through May 2018
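`pd.concat(..., axis=1)` aligns the frames on their date index (an outer join), so a date missing from one input gets NaN in that input's columns; `dropna()` then keeps only dates where the target and every feature are available. A small sketch with made-up values mimicking the three inputs:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-04-29", periods=4, freq="D")  # Apr 29 .. May 2

# Made-up pieces mimicking the three inputs.
target = pd.DataFrame({"return_day+1": [0.01, 0.02, 0.03]}, index=idx[:3])
features = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0],
                         "volume": [10.0, 20.0, 30.0, 40.0]}, index=idx)
vol = pd.DataFrame({"volatility_14": [np.nan, 0.5, 0.6, 0.7]}, index=idx)

combined = pd.concat([target, features, vol], axis=1).dropna()
print(combined.index.strftime("%Y-%m-%d").tolist())
# ['2017-04-30', '2017-05-01']

# Partial-string slicing on the DatetimeIndex keeps May 2017 onwards.
may_onwards = combined.loc["2017-05":]
print(may_onwards.index.strftime("%Y-%m-%d").tolist())  # ['2017-05-01']
```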

Save the dataset to a CSV file for later use and check that it can be read back.

In [35]:
# to_csv does not create folders: the Data/ directory must already exist
data.to_csv("Data/poloniex_data.csv")

In [36]:
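# Read from the same path we wrote to, and parse the index back into datetimes
data = pd.read_csv("Data/poloniex_data.csv", index_col="date", parse_dates=True)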

In [37]:
data

| date | return_day+1 | close | volume | volatility_14 |
|---|---|---|---|---|
| 2017-05-01 | 0.020854 | 1530.000000 | 2.003840e+07 | 0.020241 |
| 2017-05-02 | 0.034389 | 1561.907000 | 1.157105e+07 | 0.020173 |
| 2017-05-03 | -0.007255 | 1615.620000 | 1.506086e+07 | 0.019967 |
| 2017-05-04 | -0.037720 | 1603.898572 | 2.632924e+07 | 0.020404 |
| 2017-05-05 | 0.035510 | 1543.400000 | 3.239718e+07 | 0.024767 |
| 2017-05-06 | 0.013731 | 1598.205817 | 2.139785e+07 | 0.025617 |
| 2017-05-07 | 0.052788 | 1620.150000 | 3.042350e+07 | 0.025512 |
| ... | ... | ... | ... | ... |
| 2018-04-25 | 0.048078 | 8860.024074 | 3.846143e+07 | 0.053795 |
| 2018-04-26 | -0.039413 | 9286.000000 | 2.113739e+07 | 0.042592 |

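Without `parse_dates`, `read_csv` loads the `date` index back as plain strings rather than datetimes. A round trip through a temporary file, standing in for `Data/poloniex_data.csv` and using a tiny placeholder frame, shows the difference:

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for `data`; values are placeholders.
sample = pd.DataFrame(
    {"return_day+1": [0.02, -0.01], "close": [1530.0, 1561.9]},
    index=pd.DatetimeIndex(pd.to_datetime(["2017-05-01", "2017-05-02"]), name="date"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "poloniex_data.csv")
    sample.to_csv(path)

    as_strings = pd.read_csv(path, index_col="date")
    as_dates = pd.read_csv(path, index_col="date", parse_dates=True)

print(isinstance(as_strings.index, pd.DatetimeIndex))  # False
print(isinstance(as_dates.index, pd.DatetimeIndex))    # True
```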
