# Preprocessing

We need to preprocess the data we obtain before it can be used in the rest of the analysis.

In [1]:
from quanvia.finance import utils

asset_list = utils.get_binance_assets()
len(asset_list)

1885

In [2]:
asset_list[:5]

['ETHBTC', 'LTCBTC', 'BNBBTC', 'NEOBTC', 'QTUMETH']

Too many assets for a local machine, so let us restrict ourselves to pairs quoted in USDT (Tether, a USD-pegged stablecoin).

In [3]:
new_asset_list = [x for x in asset_list if x.endswith("USDT")]
len(new_asset_list)

377

In [4]:
new_asset_list[:5]

['BTCUSDT', 'ETHUSDT', 'BNBUSDT', 'BCCUSDT', 'NEOUSDT']

Well, I think for this trial we will just choose a fixed set:

In [5]:
asset_list = ['BNBUSDT','BTCUSDT','ETHUSDT','SOLUSDT','ADAUSDT','XRPUSDT','DOTUSDT','DOGEUSDT']

In [6]:
df = utils.get_binance_data(asset_list)

In [7]:
df

Unnamed: 0,Asset,Open time,Open,High,Low,Close,Volume,Closing time,Quote asset vol,Num traders,Taker buy base asset vol,Taker buy quote asset vol,To be ignored
0,BNBUSDT,1610064000000,43.57280000,43.72200000,40.23130000,42.35600000,3548923.78800000,1610150399999,149951576.85134530,411558,1804018.63200000,76246982.43869240,0
1,BNBUSDT,1610150400000,42.34500000,44.05520000,41.50000000,43.84790000,2720363.63600000,1610236799999,116290473.53861750,294683,1458819.25300000,62429021.61184670,0
2,BNBUSDT,1610236800000,43.84790000,45.16200000,40.00000000,42.40310000,4277406.29000000,1610323199999,185165251.13206460,431771,2147084.16200000,93038871.70609660,0
3,BNBUSDT,1610323200000,42.40330000,42.50940000,35.03740000,38.16740000,6332801.05500000,1610409599999,243017347.96947130,664128,3174405.76900000,121735208.64152900,0
4,BNBUSDT,1610409600000,38.16230000,40.19890000,37.00000000,38.25410000,3261261.81000000,1610495999999,125897888.04795700,342953,1694961.86800000,65468036.38605400,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
361,DOGEUSDT,1641254400000,0.17030000,0.17260000,0.16650000,0.16840000,496439208.00000000,1641340799999,84445791.17980000,127545,258358537.00000000,43963449.95720000,0
362,DOGEUSDT,1641340800000,0.16850000,0.17100000,0.14720000,0.15900000,1084632258.00000000,1641427199999,175586840.15730000,307998,519468771.00000000,84347970.51660000,0
363,DOGEUSDT,1641427200000,0.15910000,0.16210000,0.15390000,0.15990000,705767171.00000000,1641513599999,111252225.62060000,163053,331392118.00000000,52235681.79080000,0
364,DOGEUSDT,1641513600000,0.16000000,0.16040000,0.14890000,0.15500000,1079690606.00000000,1641599999999,165985187.71820000,238457,524873451.00000000,80687647.18760000,0
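
The `Open time` and `Closing time` columns look like Unix timestamps in milliseconds. As a quick sanity check, the sketch below (standard library only, with the two values copied from the first and last rows above) converts them to dates. Reading them as UTC milliseconds is an assumption here, and `ms_to_utc` is a hypothetical helper, not part of `utils`.

```python
from datetime import datetime, timedelta, timezone

def ms_to_utc(ms: int) -> datetime:
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)

first_open = ms_to_utc(1610064000000)  # "Open time" of the first BNBUSDT row
last_close = ms_to_utc(1641599999999)  # "Closing time" of the last DOGEUSDT row

print(first_open.date(), last_close.date())
```

Under that assumption the rows cover one year of daily candles, from 8 January 2021 to 7 January 2022, which is consistent with the row index running from 0 to 364 for each asset.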


We want to estimate the mean return on investment over each one-day interval. For every asset series we therefore compute the relative change between the opening and closing price of each day and take the mean of those daily returns. We also want to understand how the different assets move together, since that helps us diversify the portfolio, so the covariance between their returns is included in the analysis as well.
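
In symbols, a sketch of the quantities computed in the next cell, for asset $i$ on day $t$ out of $T$ days. The symbols $O$, $C$, $r$, $\hat{\mu}$ and $\hat{\Sigma}$ are notation introduced here for illustration, with $O$ and $C$ standing for the `Open` and `Close` columns:

$$
r_{i,t} = \frac{C_{i,t} - O_{i,t}}{O_{i,t}}, \qquad
\hat{\mu}_i = \frac{1}{T}\sum_{t=1}^{T} r_{i,t}, \qquad
\hat{\Sigma}_{ij} = \frac{1}{T-1}\sum_{t=1}^{T} \left(r_{i,t} - \hat{\mu}_i\right)\left(r_{j,t} - \hat{\mu}_j\right)
$$

The $1/(T-1)$ normalization is the default behaviour of `np.cov`.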

In [8]:
import numpy as np

exp_ret = {}
return_list = []
for asset in asset_list:
    # Select this asset's rows once and reuse them for both price columns
    asset_df = df[df["Asset"] == asset]
    open_price = asset_df["Open"].astype("float").to_numpy()
    close_price = asset_df["Close"].astype("float").to_numpy()
    # Daily open-to-close return; its sign gives the direction of the price move
    returns = (close_price - open_price) / open_price
    exp_ret[asset] = returns.mean()
    return_list.append(returns)

# Covariance between asset returns (one row per asset, in asset_list order)
cov_mat = np.cov(np.vstack(return_list))

In [9]:
exp_ret

{'BNBUSDT': 0.009221480138738115,
 'BTCUSDT': 0.0010454720329607131,
 'ETHUSDT': 0.004171347506161309,
 'SOLUSDT': 0.014729386544389053,
 'ADAUSDT': 0.006050362742214893,
 'XRPUSDT': 0.005292852018928388,
 'DOTUSDT': 0.005569563926691609,
 'DOGEUSDT': 0.019111877337155296}

In [10]:
cov_mat

array([[0.00608177, 0.00199172, 0.00260677, 0.00320415, 0.00271352,
        0.00298901, 0.00330249, 0.00219796],
       [0.00199172, 0.00174221, 0.00179836, 0.00147756, 0.00163043,
        0.00183617, 0.00220397, 0.00267943],
       [0.00260677, 0.00179836, 0.00300272, 0.00234359, 0.00222711,
        0.00243351, 0.00311108, 0.00299023],
       [0.00320415, 0.00147756, 0.00234359, 0.0069064 , 0.00232973,
        0.0023053 , 0.00279444, 0.00227812],
       [0.00271352, 0.00163043, 0.00222711, 0.00232973, 0.0044516 ,
        0.00265363, 0.00321785, 0.00336979],
       [0.00298901, 0.00183617, 0.00243351, 0.0023053 , 0.00265363,
        0.0062716 , 0.00319   , 0.0026517 ],
       [0.00330249, 0.00220397, 0.00311108, 0.00279444, 0.00321785,
        0.00319   , 0.00593995, 0.00348343],
       [0.00219796, 0.00267943, 0.00299023, 0.00227812, 0.00336979,
        0.0026517 , 0.00348343, 0.05370026]])
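
Raw covariances are hard to compare across assets whose volatilities differ a lot (the last diagonal entry, the variance of DOGEUSDT, is about an order of magnitude larger than the others). A minimal sketch of rescaling a covariance matrix into correlations follows. It uses only the BTCUSDT/ETHUSDT sub-block copied from the output above, and the helper name `cov_to_corr` is hypothetical.

```python
import numpy as np

def cov_to_corr(cov: np.ndarray) -> np.ndarray:
    """Rescale a covariance matrix so that its diagonal becomes 1."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

# BTCUSDT / ETHUSDT entries copied from cov_mat above (rows and columns 1 and 2)
cov_btc_eth = np.array([
    [0.00174221, 0.00179836],
    [0.00179836, 0.00300272],
])

corr_btc_eth = cov_to_corr(cov_btc_eth)
print(corr_btc_eth.round(3))
```

Applied to the full matrix, `cov_to_corr(cov_mat)` would give the pairwise correlations in `asset_list` order.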

We now have the expected daily return of each cryptocurrency on its own (`exp_ret`) and the covariance of returns between every pair of assets (`cov_mat`), which captures how they move together when we invest in more than one of them.
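
To illustrate how these two outputs feed a diversification analysis, here is a minimal sketch of the expected return and variance of a two-asset portfolio. The equal weights are an arbitrary assumption chosen for illustration, and the numbers are the BTCUSDT and ETHUSDT entries copied (rounded) from the outputs above.

```python
import numpy as np

# Expected daily returns for BTCUSDT and ETHUSDT, rounded from exp_ret above
mu = np.array([0.00104547, 0.00417135])
# Matching 2x2 block of cov_mat
sigma = np.array([
    [0.00174221, 0.00179836],
    [0.00179836, 0.00300272],
])

# Hypothetical equal-weight allocation (weights sum to 1)
w = np.array([0.5, 0.5])

portfolio_return = w @ mu            # w^T mu
portfolio_variance = w @ sigma @ w   # w^T Sigma w

print(portfolio_return, portfolio_variance)
```

The off-diagonal covariance enters the portfolio variance twice, which is why the pairwise covariances matter as much as the individual variances when choosing weights.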