# Leveraged ETF Pairs Selection

* Two strategies will be considered.
    1. Uncorrelated Leveraged ETF Pairs
    2. Inverse Correlated Leveraged ETF Pairs

* Only broad-based index leveraged ETFs will be considered.
    1. Review the available categories and exclude the undesired ones, for example the Inverse, Bear Market, Volatility, and Energy Limited Partnership categories.
    2. Exclude sector ETFs (they belong to a separate strategy).
    3. Manually remove any remaining ETF that is not a broad-based index leveraged ETF.

* Select the top 20 ETFs by 10-day average daily volume

* Calculate the correlation of daily percent price changes between each pair of ETFs

* Narrow down the list of pairs for further analysis

In [5]:
import datetime
import pandas as pd
import numpy as np
#import yfinance as yf # https://github.com/ranaroussi/yfinance

from tqdm import tqdm_notebook

import pymysql
import sqlalchemy as db
from sqlalchemy import create_engine

import matplotlib
import matplotlib.pyplot as plt

# connect to DB
engine = create_engine(
    "mysql+pymysql://root:root@127.0.0.1:8889/trading?unix_socket=/Applications/MAMP/tmp/mysql/mysql.sock")

In [6]:
# ETF categories available
query = "SELECT DISTINCT(category) FROM etf_info;"
result = engine.execute(query)
category_all = [row[0] for row in result]
print(category_all)

['Trading--Leveraged Commodities', 'Energy Limited Partnership', 'Trading--Leveraged Equity', 'Trading--Inverse Equity', 'Trading--Inverse Commodities', 'Trading--Miscellaneous', 'Multicurrency', 'Trading--Leveraged Debt', 'Trading--Inverse Debt', 'Bear Market', 'Large Growth', 'Europe Stock', 'World Stock', 'Trading--Multiasset', 'Trading--Leveraged Real Estate', 'Closed End Fund', 'Volatility']


In [7]:
# Load general ETF info into a DataFrame, excluding the categories we are not interested in
category_excl = ['Trading--Inverse Equity', 'Trading--Inverse Commodities', 'Trading--Inverse Debt', 
                 'Bear Market', 'Energy Limited Partnership']

query = "SELECT * FROM etf_info WHERE category NOT IN ({});".format(
    ', '.join("'{}'".format(str(c)) for c in category_excl))
info_df = pd.read_sql(query, engine, index_col='symbol', parse_dates=True)
print('Number of selected ETFs: {}'.format(info_df.shape[0]))

Number of selected ETFs: 121
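
The query above splices the excluded categories into the SQL string with `format`. A minimal sketch of the same `NOT IN` filter with bound parameters follows. It uses the standard-library `sqlite3` module and a tiny in-memory table purely as stand-ins for the MySQL database, and `select_excluding` is a hypothetical helper, not part of the notebook. SQLite uses `?` placeholders; PyMySQL is understood to use `%s` instead, so the placeholder string would need adapting.

```python
import sqlite3

def select_excluding(conn, excluded):
    """Return (symbol, category) rows whose category is not in `excluded`."""
    # One "?" placeholder per excluded value; the values are passed
    # separately so the driver handles quoting.
    placeholders = ", ".join("?" for _ in excluded)
    sql = ("SELECT symbol, category FROM etf_info "
           "WHERE category NOT IN ({}) ORDER BY symbol".format(placeholders))
    return conn.execute(sql, list(excluded)).fetchall()

# Hypothetical in-memory table standing in for etf_info
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE etf_info (symbol TEXT, category TEXT)")
conn.executemany("INSERT INTO etf_info VALUES (?, ?)", [
    ("TQQQ", "Trading--Leveraged Equity"),
    ("SQQQ", "Trading--Inverse Equity"),
    ("UVXY", "Volatility"),
])

rows = select_excluding(conn, ["Trading--Inverse Equity", "Bear Market"])
print(rows)
```

Only the placeholders are formatted into the SQL text, so a category name containing a quote character cannot break the statement.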


In [8]:
# Manually build the list of sector ETFs to exclude, judged from the ETF names
sector_etf = ['DIG', 'ROM', 'RXL', 'UGE', 'UPW', 'USD', 'UYG', 'UYM', 'UCC', 'URE', 'UXI', 
              'LTL', 'BDD', 'ERX', 'FAS', 'UCO', 'TECL', 'DRN', 'DAG', 'BIB', 'RETL', 'NUGT', 
              'CURE', 'BOIL', 'JNUG', 'LABU', 'GUSH', 'NAIL', 'DPST', 'DUSL', 'UTSL', 'PILL', 
              'NEED', 'WANT', 'TAWK', 'XCOM', 'BNKU', 'WEBL', 'FNGU', 'FNGO']
bear_etf = ['UDN']
short_etf = ['DRR', 'EUO', 'YCS', 'EUFX', 'CROC']

excl_etf = sector_etf + bear_etf + short_etf
print('Number of ETF to be excluded: {}'.format(len(excl_etf)))

Number of ETF to be excluded: 46


In [9]:
# Exclude sector, bear and short ETFs
info_df.drop(excl_etf, inplace=True)
print('Number of remaining ETFs: {}'.format(info_df.shape[0]))

Number of remaining ETFs: 75


In [10]:
# Sort df by 10 day average daily volume
info_df.sort_values(by = 'averageDailyVolume10Day', ascending = False, inplace = True)
info_df.head()

symbol,totalAssets,category,longName,fundInceptionDate,beta3Year,threeYearAverageReturn,fiveYearAverageReturn,averageDailyVolume10Day
UVXY,929582300.0,Volatility,ProShares Ultra VIX Short-Term Futures ETF,2011-10-03,1.49,-0.46,-0.73,27661150
TQQQ,6503733000.0,Trading--Leveraged Equity,ProShares UltraPro QQQ,2010-02-09,3.32,0.53,0.46,26583762
TNA,1205505000.0,Trading--Leveraged Equity,Direxion Daily Small Cap Bull 3X Shares,2008-11-05,3.67,-0.22,-0.1,13300837
SPXL,1516379000.0,Trading--Leveraged Equity,Direxion Daily S&P500 Bull 3X Shares,2008-11-05,3.14,0.12,0.17,7541012
AGQ,246717800.0,Trading--Leveraged Commodities,ProShares Ultra Silver,2008-12-01,1.09,0.03,-0.03,4513075


In [11]:
# Filter on daily volume, the numerical column of particular interest.
# About 50% of the ETFs have a daily volume above 5,618; ETFs with such low volume are difficult to trade.
# A minimum volume of 100,000 is preferred, but a 65,000 minimum is used to keep enough ETFs
# for the correlation analysis (the target was 20; 25 pass this cutoff).
final_info_df = info_df.loc[info_df['averageDailyVolume10Day'] >= 65000]
print("Final Info DF has {} ETFs".format(final_info_df.shape[0]))
final_info_df

Final Info DF has 25 ETFs


symbol,totalAssets,category,longName,fundInceptionDate,beta3Year,threeYearAverageReturn,fiveYearAverageReturn,averageDailyVolume10Day
UVXY,929582300.0,Volatility,ProShares Ultra VIX Short-Term Futures ETF,2011-10-03,1.49,-0.46,-0.73,27661150
TQQQ,6503733000.0,Trading--Leveraged Equity,ProShares UltraPro QQQ,2010-02-09,3.32,0.53,0.46,26583762
TNA,1205505000.0,Trading--Leveraged Equity,Direxion Daily Small Cap Bull 3X Shares,2008-11-05,3.67,-0.22,-0.1,13300837
SPXL,1516379000.0,Trading--Leveraged Equity,Direxion Daily S&P500 Bull 3X Shares,2008-11-05,3.14,0.12,0.17,7541012
AGQ,246717800.0,Trading--Leveraged Commodities,ProShares Ultra Silver,2008-12-01,1.09,0.03,-0.03,4513075
UPRO,1422949000.0,Trading--Leveraged Equity,ProShares UltraPro S&P500,2009-06-23,3.14,0.12,0.17,4352537
NRGU,142262300.0,Trading--Leveraged Equity,MicroSectors U.S. Big Oil Index 3X Leveraged ETNs,2019-04-09,,,,3887025
UDOW,857204200.0,Trading--Leveraged Equity,ProShares UltraPro Dow30,2010-02-09,3.1,0.03,0.14,2182312
QLD,2603412000.0,Trading--Leveraged Equity,ProShares Ultra QQQ,2006-06-19,2.14,0.42,0.36,1860937
SSO,2287876000.0,Trading--Leveraged Equity,ProShares Ultra S&P500,2006-06-19,2.06,0.14,0.16,1815112
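
The threshold filter keeps however many ETFs clear the cutoff. If a fixed count is wanted instead, `DataFrame.nlargest` selects exactly the top N by volume. The sketch below uses a small hypothetical frame (the ticker `XXXX` and the helper `top_by_volume` are made up for illustration); only the column name is taken from `info_df`.

```python
import pandas as pd

# Hypothetical volumes; the column name matches the notebook's info_df
demo_info = pd.DataFrame(
    {"averageDailyVolume10Day": [27661150, 4513075, 52000, 1815112]},
    index=pd.Index(["UVXY", "AGQ", "XXXX", "SSO"], name="symbol"),
)

def top_by_volume(df, n, column="averageDailyVolume10Day"):
    """Return the n rows with the largest values in `column`, largest first."""
    return df.nlargest(n, column)

top3 = top_by_volume(demo_info, 3)
print(top3.index.tolist())
```

With the notebook's data, `top_by_volume(info_df, 20)` would return exactly 20 rows regardless of where the 65,000 cutoff falls.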


In [12]:
# List of ETFs ticker for price history and correlation analysis
tickers = final_info_df.index.tolist()

In [13]:
# Incrementally build a DataFrame of adjusted close prices for the selected top-volume tickers

# Initialize with date as index
query = "SELECT DISTINCT(trade_date) AS date FROM etf_history WHERE ticker IN ({}) ORDER BY trade_date ASC;".format(str(tickers)[1:-1])
hist_df = pd.read_sql_query(sql = query, con = engine, index_col = 'date', parse_dates = True)

# Populate one price column per ticker, aligned on the date index
for ticker in tickers:
    query = "SELECT trade_date AS date, adj_close AS price FROM etf_history WHERE ticker = '{}';".format(ticker)
    hist_df[ticker] = pd.read_sql_query(sql = query, con = engine, index_col = 'date', parse_dates = True)['price']

hist_df.head()

date,UVXY,TQQQ,TNA,SPXL,AGQ,UPRO,NRGU,UDOW,QLD,SSO,...,UGL,UWM,CHAU,EDC,KORU,RUSL,INDL,MIDU,SMHB,RMM
2006-06-21,,,,,,,,,3.961084,14.956375,...,,,,,,,,,,
2006-06-22,,,,,,,,,3.867883,14.797399,...,,,,,,,,,,
2006-06-23,,,,,,,,,3.856369,14.795311,...,,,,,,,,,,
2006-06-26,,,,,,,,,3.870624,14.881072,...,,,,,,,,,,
2006-06-27,,,,,,,,,3.725887,14.617502,...,,,,,,,,,,
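
The loop above issues one query per ticker. An alternative, sketched here under the assumption that all rows are fetched in a single long-format query (date, ticker, price), is to reshape with `DataFrame.pivot`. The rows below are hypothetical stand-ins for the `etf_history` result.

```python
import pandas as pd

# Hypothetical long-format rows, shaped like the result of
# SELECT trade_date AS date, ticker, adj_close AS price FROM etf_history
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-03"]),
    "ticker": ["QLD", "QLD", "TQQQ"],
    "price": [10.0, 10.5, 20.0],
})

# One row per date, one column per ticker; dates a ticker did not trade become NaN
wide_df = long_df.pivot(index="date", columns="ticker", values="price")
print(wide_df)
```

As in `hist_df`, a ticker with a later inception date simply has NaN for the earlier dates.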


In [14]:
# Calculate correlation of daily price change between ETFs
corr_df = hist_df.pct_change().corr()
corr_df.head()

,UVXY,TQQQ,TNA,SPXL,AGQ,UPRO,NRGU,UDOW,QLD,SSO,...,UGL,UWM,CHAU,EDC,KORU,RUSL,INDL,MIDU,SMHB,RMM
UVXY,1.0,-0.748467,-0.72322,-0.773848,-0.166672,-0.770747,-0.490514,-0.735052,-0.745385,-0.770326,...,0.010193,-0.720473,-0.42925,-0.688821,-0.558842,-0.551173,-0.524679,-0.717192,-0.536461,-0.338063
TQQQ,-0.748467,1.0,0.832475,0.93126,0.172208,0.930437,0.481681,0.875642,0.998691,0.930152,...,-0.001508,0.835122,0.49428,0.774427,0.620413,0.577558,0.580346,0.845351,0.499422,0.444123
TNA,-0.72322,0.832475,1.0,0.92138,0.21748,0.905705,0.657388,0.872907,0.84969,0.917012,...,0.029189,0.998289,0.419751,0.785439,0.611802,0.615608,0.597839,0.961522,0.744917,0.501504
SPXL,-0.773848,0.93126,0.92138,1.0,0.212006,0.998488,0.598068,0.972693,0.934651,0.99732,...,0.024363,0.923722,0.48399,0.843309,0.663638,0.647131,0.64071,0.950087,0.613016,0.52788
AGQ,-0.166672,0.172208,0.21748,0.212006,1.0,0.218777,0.135849,0.185515,0.187689,0.208765,...,0.792773,0.216945,0.179742,0.315465,0.237663,0.269713,0.234195,0.209765,0.182058,0.229799
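
To make the calculation concrete, here is a small worked example on hypothetical prices, where ticker `B` starts trading a day later than `A` and moves in the opposite direction each day. `DataFrame.corr()` computes the Pearson correlation and, for each pair, uses only the rows where both returns are present. Passing `fill_method=None` is a choice made for this sketch so that missing prices stay missing rather than being forward-filled.

```python
import pandas as pd

# Hypothetical prices: B moves opposite to A each day and has no first-day price
prices = pd.DataFrame({
    "A": [100.0, 110.0, 99.0, 108.9, 98.01],
    "B": [float("nan"), 50.0, 55.0, 49.5, 54.45],
})

# Daily percent change; fill_method=None keeps missing prices as NaN
returns = prices.pct_change(fill_method=None)

# Pearson correlation; each pair uses only the rows where both returns exist
corr = returns.corr()
print(corr.round(6))
```

Because ETFs in `hist_df` have different inception dates, each pairwise correlation there is based on a different overlapping date range.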


In [15]:
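# Correlation sorted from low to high.
# unstack() lists every pair twice, as (A, B) and (B, A); drop_duplicates() removes the mirrored copy by value.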
corr_sorted = hist_df.pct_change().corr().unstack().sort_values()
print("Top 10 Inverse Correlated ETF Pairs:")
print(corr_sorted.drop_duplicates().head(10))

Top 10 Inverse Correlated ETF Pairs:
SPXL  UVXY   -0.773848
UVXY  UPRO   -0.770747
SSO   UVXY   -0.770326
UVXY  TQQQ   -0.748467
QLD   UVXY   -0.745385
UVXY  UDOW   -0.735052
DDM   UVXY   -0.731325
TNA   UVXY   -0.723220
UVXY  UWM    -0.720473
      URTY   -0.720055
dtype: float64


In [16]:
# Most uncorrelated ETF pairs: sort by absolute correlation, closest to zero first
corr_sorted_abs = corr_sorted.abs().sort_values()
print("Top 10 Uncorrelated ETF Pairs:")
print(corr_sorted_abs.drop_duplicates().head(10))

Top 10 Uncorrelated ETF Pairs:
UGL   TQQQ    0.001508
      REML    0.002725
UDOW  UGL     0.003968
TMF   RMM     0.004213
UGL   CHAU    0.005484
      QLD     0.009662
      DDM     0.009740
UVXY  UGL     0.010193
UGL   NRGU    0.011188
      URTY    0.015014
dtype: float64


In [17]:
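# The final row (UVXY, UVXY) is a self-correlation of 1.0 that survives drop_duplicates(); it is not a real pair
print("Top 10 Directly Correlated ETF Pairs:")
print(corr_sorted.drop_duplicates().tail(10))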

Top 10 Directly Correlated ETF Pairs:
UDOW  SSO     0.973782
SSO   SPXL    0.997320
TNA   URTY    0.997417
UWM   TNA     0.998289
UDOW  DDM     0.998355
UPRO  SPXL    0.998488
UWM   URTY    0.998553
SSO   UPRO    0.998674
QLD   TQQQ    0.998691
UVXY  UVXY    1.000000
dtype: float64
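
Deduplicating by value, as `drop_duplicates()` does, leaves one self-correlation in the list and would also drop a genuinely different pair that happened to have an identical correlation. A minimal sketch of an alternative is to enumerate each unordered pair once with `itertools.combinations`. The helper `unique_pairs` is hypothetical; the three correlations in the demo matrix are copied from the `corr_df` output above.

```python
from itertools import combinations
import pandas as pd

def unique_pairs(corr):
    """Return one correlation per unordered pair, sorted from low to high."""
    # combinations() yields each (a, b) once and never pairs a ticker with itself
    values = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
    return pd.Series(values).sort_values()

tickers_demo = ["UVXY", "TQQQ", "UGL"]
demo_corr = pd.DataFrame(
    [[1.0, -0.748467, 0.010193],
     [-0.748467, 1.0, -0.001508],
     [0.010193, -0.001508, 1.0]],
    index=tickers_demo, columns=tickers_demo,
)

pairs = unique_pairs(demo_corr)
print(pairs)
```

The head of `pairs` gives the most inversely correlated pairs, the tail gives the most directly correlated ones, and `pairs.abs().sort_values()` gives the most uncorrelated ones.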


In [18]:
# Look up the correlations of all pairs that include a specific ETF
print("Pairs of TMF:")
print(corr_sorted.loc['TMF'])

Pairs of TMF:
UPRO   -0.474257
SSO    -0.465573
UDOW   -0.465486
SPXL   -0.464750
URTY   -0.451109
DDM    -0.447868
UWM    -0.437987
MIDU   -0.436973
TNA    -0.434851
TQQQ   -0.418810
QLD    -0.411596
EDC    -0.379954
RUSL   -0.334883
NRGU   -0.275309
INDL   -0.270762
CHAU   -0.238306
KORU   -0.188528
BRZU   -0.159211
SMHB   -0.105572
RMM    -0.004213
AGQ     0.018075
REML    0.101495
UGL     0.188575
UVXY    0.346473
TMF     1.000000
dtype: float64


In [19]:
print("Pairs of TQQQ:")
print(corr_sorted.loc['TQQQ'])

Pairs of TQQQ:
UVXY   -0.748467
TMF    -0.418810
UGL    -0.001508
AGQ     0.172208
REML    0.269829
RMM     0.444123
NRGU    0.481681
CHAU    0.494280
SMHB    0.499422
BRZU    0.511162
RUSL    0.577558
INDL    0.580346
KORU    0.620413
EDC     0.774427
TNA     0.832475
UWM     0.835122
URTY    0.836380
MIDU    0.845351
DDM     0.875476
UDOW    0.875642
SSO     0.930152
UPRO    0.930437
SPXL    0.931260
QLD     0.998691
TQQQ    1.000000
dtype: float64


In [20]:
# Look up specific ETF pairs
print("Correlation between UPRO and TMF: {}".format(corr_sorted.loc['UPRO', 'TMF']))
print("Correlation between TQQQ and TMF: {}".format(corr_sorted.loc['TQQQ', 'TMF']))
print("Correlation between TQQQ and UGL: {}".format(corr_sorted.loc['TQQQ', 'UGL']))
print("Correlation between UPRO and UGL: {}".format(corr_sorted.loc['UPRO', 'UGL']))
print("Correlation between TMF and UGL: {}".format(corr_sorted.loc['TMF', 'UGL']))
print("Correlation between TQQQ and UPRO: {}".format(corr_sorted.loc['TQQQ', 'UPRO']))

Correlation between UPRO and TMF: -0.4742573095924511
Correlation between TQQQ and TMF: -0.4188102933069367
Correlation between TQQQ and UGL: -0.0015084436408201649
Correlation between UPRO and UGL: 0.03480686026835254
Correlation between TMF and UGL: 0.18857472658577473
Correlation between TQQQ and UPRO: 0.9304369252775009


In [21]:
# Disconnect from DB    
engine.dispose()

## Next Steps

### Review ETFs

* TQQQ
* UPRO
* TMF
* UGL

### Review Pairs

* TQQQ and TMF
* **UPRO and TMF**
* TQQQ and UGL

### Review Portfolio

* TQQQ, TMF, UGL
* TQQQ, UPRO, TMF, UGL 