- By: Alex Kwon
- Email: alex.kwon [at] hudsonthames [dot] org
- Reference: [Online Portfolio Selection](https://books.google.com/books/about/Online_Portfolio_Selection.html?id=R2fdCgAAQBAJ) by Dr. Bin Li and Dr. Steven Hoi

# Data Selection for Online Portfolio Selection

## Abstract

Data selection is one of the hardest problems in research. With numerous test sets and vast amounts of resources available to the public, it is tempting to overfit and choose the data that best supports your hypothesis. However, conclusions drawn from such weak models are more sensitive to outliers and apply only within a narrow scope. Online portfolio selection faces the same issues because it depends heavily on the data available.

Traditional papers on online portfolio selection have consistently used the same datasets and built their arguments on improving the performance reported in prior papers. Thomas Cover first used an NYSE dataset that contained 36 stocks from 1962 to 1984. Allan Borodin collected three datasets: 88 stocks from the Toronto Stock Exchange from 1994 to 1998, the 25 largest stocks by market capitalization in the S&P 500 from 1998 to 2003, and 30 stocks from the DJIA from 2001 to 2003. Bin Li and Steven Hoi introduced the MSCI World Index from 2006 to 2010 to add another perspective to the problem.

These datasets have different characteristics: every stock in Cover’s NYSE dataset increased in value, whereas most assets in the DJIA dataset lost value. The S&P 500 data contains both bull and bear market environments, and the TSE stocks come from a less liquid market with a longer bear run. However, this variety does not seem enough to justify the applications and practicality of the newest module.

To complement these older datasets in my research, I will extend the MSCI World Index data to cover 1993 to 2020 and also include the 44 largest US stocks by market capitalization from 2011 to 2020. Through a different lens of selection, I hope to introduce readers to a more practical and familiar set of stocks so that the module can be understood more intuitively.

# Strategy

Over the next couple of weeks, we will be releasing notebooks on the following strategies:

[**Benchmarks**](https://github.com/hudson-and-thames/research/blob/master/Online%20Portfolio%20Selection/Introduction%20to%20Online%20Portfolio%20Selection.ipynb)
- Buy and Hold
- Best Stock
- Constant Rebalanced Portfolio
- Best Constant Rebalanced Portfolio

**Momentum**
- Exponential Gradient
- Follow the Leader
- Follow the Regularized Leader

**Mean Reversion**
- Anti-Correlation
- Passive Aggressive Mean Reversion
- Online Moving Average Reversion
- Robust Median Mean Reversion

**Pattern Matching**
- Nonparametric Histogram/Kernel-Based/Nearest Neighbor Log-Optimal
- Correlation Driven Nonparametric Learning
- Nonparametric Kernel-Based Semi-Log-Optimal/Markowitz/GV

**Meta Algorithm**
- Aggregating Algorithm
- Fast Universalization Algorithm
- Online Gradient Updates
- Online Newton Updates
- Follow the Leading History

**Universal Portfolio**
- Universal Portfolio
- CORN-U
- CORN-K
- SCORN-K
- FCORN-K

## Import Data

We will be using 6 different datasets for the exploration part of this module.
1. 36 NYSE Stocks from 1962 to 1984 by Cover
2. 30 DJIA Stocks from 2001 to 2003 by Borodin
3. 88 TSE Stocks from 1994 to 1998 by Borodin
4. 25 Largest S&P500 Stocks from 1998 to 2003 by Borodin
5. 23 MSCI Developed Market Indices from 1993 to 2020 by Alex Kwon
6. 44 Largest US Stocks by Market Capitalization from 2011 to 2020 by Alex Kwon

### Dataset 1

Datasets #1 to #4 were downloaded from a previous researcher's [portfolio](http://www.cs.technion.ac.il/~rani/portfolios/).

This is the original NYSE data that Thomas Cover used for his papers. Although it covers many sectors and should have been very useful when the papers were published, it is difficult to gauge whether this dataset adds much value now because of its age.

Strategies that worked a year ago can quickly lose their value as the paradigm shifts. This data dates back almost 60 years, and today's markets move in complex ways that data from so long ago cannot capture. Results from this dataset should be taken with a grain of salt.

In [5]:
import pandas as pd
import yfinance as yf

In [22]:
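# Read the text file of price relatives (columns are separated by two spaces).
nyse = pd.read_csv('NYSE.txt', sep="  ", header=None, engine='python')
# Use the trading dates of Alcoa (AA) from Yahoo Finance as the date index.
nyse.index = yf.download('AA', start='1962-07-03', end='1985-01-01')['Adj Close'].index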
# Change column name.
nyse = nyse.rename(columns={0: 'AHP', 1: 'Alcoa', 2: 'American Brands', 3: 'ARCO', 4: 'Coca Cola',\
                            5: 'Commercial Metals', 6: 'Dow Chemical', 7: 'DuPont', 8: 'Espey Manufacturing',\
                            9: 'Exxon', 10: 'Fischbach', 11: 'Ford', 12: 'General Electric', 13: 'GM',\
                            14: 'GT&E', 15 : 'Gulf Oil', 16: 'Hewlett-Packard', 17: 'IBM', 18: 'Ingersoll Rand',\
                            19: 'Iroquois Brands', 20: 'Johnson & Johnson', 21: 'Kimberly Clark', 22: 'Kinark',\
                            23: 'Eastman Kodak', 24: 'Lukens Steel', 25: 'MEI', 26: 'Merck', 27: '3M', 28: 'Mobil',\
                            29: 'Philip Morris', 30: 'P&G', 31: 'Pillsbury', 32: 'Schlumberger', 33: 'Sears',\
                            34: 'Sherwin Williams', 35: 'Texaco'})
# Export to csv.
nyse.to_csv('NYSE.csv')
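
The date index in the cell above is borrowed from a Yahoo Finance download, which only works if the number of trading dates matches the number of rows in the text file exactly. A minimal sketch of a guard for that assumption follows. `attach_dates` is a hypothetical helper name, and the toy frame and the `pd.bdate_range` calendar are stand-ins for the real file and the `yf.download` index.

```python
import pandas as pd


def attach_dates(frame, dates):
    """Return a copy of `frame` indexed by `dates`, failing loudly on a length mismatch."""
    if len(frame) != len(dates):
        raise ValueError(
            f"{len(frame)} rows in the data but {len(dates)} dates in the calendar"
        )
    out = frame.copy()
    out.index = pd.DatetimeIndex(dates, name="Date")
    return out


# Toy stand-ins: three rows of price relatives and three business days.
toy = pd.DataFrame({0: [1.01, 0.99, 1.02], 1: [1.00, 1.03, 0.98]})
calendar = pd.bdate_range("1962-07-03", periods=3)
dated = attach_dates(toy, calendar)
```

With the real data, the same check could be applied to each of datasets #1 to #4 before assigning the index, so that a silent misalignment between rows and dates is caught early.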

In [45]:
# The data holds daily price relatives (today's price / yesterday's price), so the cumulative product gives each asset's price relative to its starting price.
nyse.cumprod()

Date,AHP,Alcoa,American Brands,ARCO,Coca Cola,Commercial Metals,Dow Chemical,DuPont,Espey Manufacturing,Exxon,...,Merck,3M,Mobil,Philip Morris,P&G,Pillsbury,Schlumberger,Sears,Sherwin Williams,Texaco
1962-07-03,1.015150,1.027650,1.041830,1.020830,1.006370,1.049380,1.008470,1.019830,1.054260,0.997510,...,1.031480,1.033770,1.010180,1.014950,1.007750,1.005260,1.011760,1.005780,0.996970,0.997520
1962-07-05,1.030306,1.069126,1.030422,1.015624,1.011150,0.999996,1.016941,1.028325,1.100774,1.002498,...,1.040743,1.036365,1.022898,1.014950,1.009685,1.005260,1.031368,1.015415,0.987878,0.999994
1962-07-06,1.030306,1.043777,1.007608,1.013024,0.996822,0.987646,1.014114,1.022659,1.069765,1.002498,...,1.020375,0.994807,1.020351,0.986714,0.996115,0.989467,1.003913,1.015415,1.015153,0.997514
1962-07-09,1.055559,1.050687,1.034229,1.015627,1.015931,0.987646,1.008465,1.029746,1.069765,1.002498,...,1.031487,1.031167,1.033075,0.983389,1.003864,1.005259,1.003913,1.017345,1.018188,1.019788
1962-07-10,1.088281,1.034559,1.038035,1.020837,1.019111,1.037029,1.036713,1.033999,1.131779,1.019951,...,1.042596,1.025970,1.027982,0.970103,1.015489,1.002625,1.019604,1.032757,1.030305,1.029690
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1984-12-24,13.363088,4.332372,15.975170,16.710408,13.384216,52.126458,8.800101,2.964797,14.203320,13.998756,...,14.277373,6.064718,16.126495,53.804543,6.914765,15.476528,42.562995,4.278911,6.303800,5.388019
1984-12-26,13.363088,4.317685,15.849925,16.710408,13.277142,52.126458,8.760325,2.949321,13.872951,13.998756,...,14.200560,6.055197,15.705755,53.468802,6.945329,15.613031,41.856024,4.262181,6.332986,5.427513
1984-12-27,13.233333,4.303005,15.787317,16.806159,13.250322,52.126458,8.680693,2.980289,13.212321,13.959420,...,14.219731,6.026677,15.705755,52.965126,6.960608,15.749645,41.856024,4.228765,6.362182,5.388055
1984-12-28,13.136068,4.332394,15.787317,16.949852,13.357385,52.394388,8.640849,3.019003,13.460052,14.077377,...,14.315714,6.007692,15.775803,53.804624,6.945295,15.795161,42.421499,4.195316,6.537269,5.407775
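
To make the relationship between prices, price relatives, and the cumulative product concrete, here is a small sketch on made-up numbers. The two assets `A` and `B` and their prices are hypothetical and are not taken from any of the datasets.

```python
import pandas as pd

# Toy closing prices for two hypothetical assets (not taken from any dataset).
prices = pd.DataFrame(
    {"A": [10.0, 11.0, 12.1], "B": [20.0, 19.0, 19.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Price relative on day t: x_t = p_t / p_{t-1}. The first day has no previous price.
relatives = (prices / prices.shift(1)).dropna()

# The cumulative product of the relatives is the value of one unit invested
# at the first price, i.e. p_t / p_0.
wealth = relatives.cumprod()
print(wealth)
```

The last row of `wealth` plays the same role as the last row of `nyse.cumprod()`: a value above 1 means the asset gained over the period, and a value below 1 means it lost.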


### Dataset 2
This is a more recent dataset of companies that are well known to us. The 2001 to 2003 period covers a bear market, which should be useful for seeing how our strategies behave in a downturn. Most of these assets lost value, as the last row shows with values below 1.

In [20]:
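# Read the text file of price relatives (columns are separated by two spaces).
djia = pd.read_csv('DJIA.txt', sep="  ", header=None, engine='python')

# Use the trading dates of Alcoa (AA) from Yahoo Finance as the date index.
djia.index = yf.download('AA', start='2001-01-04', end='2003-01-15')['Adj Close'].index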

# Change column names.
djia = djia.rename(columns={0: 'Alcoa', 1: 'GE', 2: 'Johnson&Johnson', 3: 'Microsoft',\
                            4:'American Express', 5: 'General Motors', 6: 'JP Morgan Chase', 7: 'P&G', 8: 'Boeing',\
                            9: 'Home Depot', 10: 'Coca Cola', 11: 'SBC Comms', 12: 'Citigroup',\
                            13: 'Honeywell', 14: 'McDonalds', 15 : 'AT&T', 16: 'Caterpillar', 17: 'Hewlett-Packard',\
                            18: '3M', 19: 'United Technologies', 20: 'DuPont', 21: 'IBM', 22: 'Philip Morris',\
                            23: 'Walmart', 24: 'Walt Disney', 25: 'Intel', 26: 'Merck', 27: 'ExxonMobil',\
                            28: 'Eastman Kodak', 29: 'International Paper'})
# Export to csv.
djia.to_csv('DJIA.csv')

In [46]:
# The data holds daily price relatives (today's price / yesterday's price), so the cumulative product gives each asset's price relative to its starting price.
djia.cumprod()

Date,Alcoa,GE,Johnson&Johnson,Microsoft,American Express,General Motors,JP Morgan Chase,P&G,Boeing,Home Depot,...,DuPont,IBM,Philip Morris,Walmart,Walt Disney,Intel,Merck,ExxonMobil,Eastman Kodak,International Paper
2001-01-04,1.032426,1.005229,0.978534,1.010430,0.968716,1.042328,1.027059,0.968913,0.975064,1.004860,...,1.027326,0.984783,0.958580,0.961499,1.028332,0.987438,0.953663,0.972085,1.078765,1.038237
2001-01-05,1.013460,0.989542,0.991089,1.024823,0.921702,0.985222,0.966621,1.004154,0.939098,0.967146,...,1.000000,0.993342,0.949822,0.922998,1.014166,0.936605,0.934702,0.976542,1.037037,1.010716
2001-01-08,1.028755,0.952939,0.989874,1.020859,0.874866,0.958949,0.972743,1.025861,0.943095,0.927099,...,1.007718,0.988693,0.995503,0.922998,0.970058,0.933100,0.936834,0.972085,1.003210,0.974184
2001-01-09,0.990517,0.933487,0.996760,1.080726,0.839113,0.988688,0.961683,1.000000,0.942136,0.891913,...,0.968711,0.978125,1.026746,0.902635,0.970058,0.942156,0.942444,0.961764,1.027901,0.933025
2001-01-10,1.001835,0.934742,0.970838,1.103045,0.834644,0.962416,1.006123,0.989012,0.965153,0.922239,...,0.977889,0.987425,1.044497,0.884497,0.978107,0.964067,0.933356,0.950035,1.024691,0.928641
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2003-01-08,0.668400,0.533361,1.128594,1.131415,0.666786,0.697136,0.528738,1.141632,0.535486,0.411353,...,0.909053,0.889676,0.958816,0.855407,0.569221,0.487292,0.661730,0.821018,0.963458,0.862883
2003-01-09,0.688284,0.541728,1.151276,1.164164,0.680908,0.720671,0.539206,1.149404,0.541880,0.417380,...,0.923864,0.919370,0.979408,0.888432,0.582099,0.498393,0.670706,0.838377,0.988890,0.889917
2003-01-10,0.697461,0.536499,1.158364,1.166458,0.682517,0.713556,0.535059,1.154093,0.541081,0.419518,...,0.918440,0.926556,0.975148,0.883299,0.584353,0.508910,0.671042,0.826648,0.993581,0.916951
2003-01-13,0.699602,0.536289,1.134872,1.176262,0.688953,0.725050,0.538811,1.154897,0.548274,0.426905,...,0.912182,0.924760,0.979408,0.877481,0.589183,0.507742,0.670930,0.823364,0.986421,0.923770
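
The claim that most assets lost value can be checked numerically instead of by scanning the last row. The sketch below is one way to do it. `share_of_losers` is a hypothetical helper name, and the toy frame stands in for a frame of price relatives such as `djia`.

```python
import pandas as pd


def share_of_losers(price_relatives):
    """Fraction of assets whose cumulative price relative ends below 1."""
    final_wealth = price_relatives.prod()  # same as the last row of cumprod()
    return float((final_wealth < 1).mean())


# Toy price relatives for three hypothetical assets.
toy = pd.DataFrame(
    {"up": [1.10, 1.05], "down": [0.90, 0.95], "flat_then_down": [1.00, 0.80]}
)
loser_share = share_of_losers(toy)
print(loser_share)
```

Calling `share_of_losers(djia)` on the frame built above would give the corresponding figure for the DJIA dataset, and the same call applies to the other price-relative datasets.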


### Dataset 3
The Toronto Stock Exchange dataset is a collection that may be unfamiliar to most researchers. It is an interesting universe in which half of the stocks decreased in value. With a mix of overperforming and underperforming stocks, it will be interesting to see how our selection strategies perform.

In [24]:
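# Read the text file of price relatives (columns are separated by two spaces).
tse = pd.read_csv('TSE.txt', sep="  ", header=None, engine='python')

# Use the trading dates of TC Energy (TRP) from Yahoo Finance as the date index.
tse.index = yf.download('TRP', start='1994-01-07', end='1999-01-01')['Adj Close'].index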

# Change column name.
tse = tse.rename(columns={0: 'Westcoast Energy', 1: 'Seagram', 2: 'TVX Gold', 3: 'Transcanada', 4: 'Thomson',\
                          5: 'Talisman', 6: 'Trilon', 7: 'Teck', 8: 'TD Bank', 9: 'Transalta',\
                          10: 'Telus', 11: 'Suncor', 12: 'Southam', 13: 'Stelco', 14: 'Shell Canada',\
                          15: 'Slocan Forest', 16: 'RBC', 17: 'Repap Enterprise', 18: 'Rio Algom', 19: 'Ranger Oil',\
                          20: 'Renaissance Energy', 21: 'Rogers Comms', 22: 'QLT', 23: 'Pure Gold Minerals',\
                          24: 'Power Corp', 25 :'Potash', 26 :'Poco Petroleum', 27 :'Placer Dome',\
                          28: 'Petro-Canada', 29 :'Northern Telecom', 30 :'Nova Scotia', 31:'Newbridge Networks',\
                          32: 'Nova Corp', 33: 'National Bank of Canada', 34: 'Inco', 35: 'Methanex', 36: 'Molson',\
                          37: 'Mitel Corp', 38: 'Merrill Lynch', 39: 'Magna Int', 40: 'Moore Corp',\
                          41: 'Macmillan Bloedel', 42: 'Miramar Mining Corp', 43: 'Loewen Group', 44: 'Kinross Gold',\
                          45: 'Imasco', 46: 'Imperial Oil', 47: 'Investors Group', 48: 'Intl Forest Products',\
                          49: 'Hudson\'s Bay', 50: 'Gentra', 51: 'Gulf Canada', 52: 'Franco-Nevada Mining Corp',\
                          53: 'Fletcher Challenge Canada', 54: 'First Australia', 55: 'Extendicare',\
                          56: 'Euro-Nevada Mining Corp', 57: 'Canadian 88 Energy Corp', 58: 'Echo Bay Mines',\
                          59: 'Domtar', 60: 'Dofasco', 61: 'Dundee Bancorp', 62: 'Canadian Occidental Petroleum',\
                          63: 'Canadian Utilities', 64: 'Canadian Tire', 65: 'Canadian Natural Resources',\
                          66: 'Canadian Imperial Bank of Commerce', 67: 'Cominco', 68: 'Cambior', 69: 'CAE',\
                          70: 'Breakwater Resources', 71: 'Bank of Nova Scotia', 72: 'Bank of Montreal',\
                          73: 'BEMA Gold Corp', 74: 'BCE Mobile Comms', 75: 'BC Telecom', 76: 'B.C. Gas', 77: 'BCE',\
                          78: 'Cott Corp', 79: 'Bombardier', 80: 'Anderson', 81: 'AUR Resources',\
                          82: 'Alcan Aluminum', 83: 'Agnico-Eagle Mines', 84: 'Alberta Energy', 85: 'Air Canada',\
                          86: 'Aber Resources', 87: 'Barrick Gold Corp'})
# Export to csv.
tse.to_csv('TSE.csv')

In [47]:
# The data holds daily price relatives (today's price / yesterday's price), so the cumulative product gives each asset's price relative to its starting price.
tse.cumprod()

Date,Westcoast Energy,Seagram,TVX Gold,Transcanada,Thomson,Talisman,Trilon,Teck,TD Bank,Transalta,...,Cott Corp,Bombardier,Anderson,AUR Resources,Alcan Aluminum,Agnico-Eagle Mines,Alberta Energy,Air Canada,Aber Resources,Barrick Gold Corp
1994-01-07,0.977273,1.010790,1.057140,0.981366,1.000000,1.004310,1.014290,1.021510,1.000000,0.991803,...,0.992308,0.988095,1.008660,1.000000,1.018100,1.028990,0.993243,1.025000,1.105260,1.063120
1994-01-10,0.988639,1.035969,1.028569,0.981366,0.992308,1.051723,1.028571,1.037640,0.994152,0.991803,...,1.084612,0.988095,1.038960,1.043480,1.040722,1.050733,0.986486,1.000000,1.144729,1.066448
1994-01-11,0.988639,1.053953,1.028569,0.981366,0.992308,1.043102,1.042858,1.037640,0.999998,0.983607,...,1.023074,0.988095,1.064934,1.065216,1.058820,1.043486,0.993243,1.050000,1.184199,1.066448
1994-01-12,0.982957,1.057547,1.042855,0.968944,0.999998,1.047411,1.114283,1.043015,0.999998,0.991800,...,1.015381,1.011908,1.073592,1.108698,1.054295,1.000007,1.020270,1.100001,1.144726,1.029903
1994-01-13,0.988638,1.093514,1.014284,0.981366,1.007688,1.043100,1.128569,1.026885,1.017538,0.983604,...,0.980766,1.020388,1.090909,1.108698,1.072397,1.000007,1.013513,1.150007,1.131569,1.023258
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1998-12-24,1.709509,1.731276,0.309720,1.425260,2.498877,0.903328,4.154285,0.484345,2.842353,2.029976,...,0.184673,4.382357,0.910744,0.451729,1.624094,0.357779,2.008721,1.199974,2.105369,0.794756
1998-12-28,1.706694,1.729760,0.314291,1.431531,2.488814,0.906779,4.099141,0.495611,2.892180,2.021149,...,0.176781,4.412507,0.900356,0.442871,1.653653,0.366799,1.979006,1.229973,2.092211,0.813941
1998-12-29,1.706694,1.805627,0.310862,1.433421,2.482107,0.887815,4.043995,0.488853,2.969661,2.021149,...,0.164154,4.422568,0.928060,0.451729,1.651682,0.375819,1.967120,1.219974,2.144851,0.829015
1998-12-30,1.706694,1.744934,0.312006,1.423886,2.371417,0.886091,4.043995,0.473084,2.961358,2.007911,...,0.162575,4.312004,0.910745,0.442871,1.645769,0.378825,1.943347,1.219974,2.171168,0.811202


### Dataset 4
This dataset also includes both bull and bear runs during turbulent periods. It is three years longer than the DJIA data and includes many companies that are familiar to us. It will be a good comparison to dataset #6, which looks at a more recent history for most of these companies.

In [26]:
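# Read the text file of price relatives (columns are separated by two spaces).
sp500 = pd.read_csv('SP500.txt', sep="  ", header=None, engine='python')
# Use the trading dates of General Electric (GE) from Yahoo Finance as the date index.
sp500.index = yf.download('GE', start='1998-01-03', end='2003-02-01')['Adj Close'].index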
# Change column name.
sp500 = sp500.rename(columns={0: 'GE', 1: 'Microsoft', 2: 'Walmart', 3: 'ExxonMobil', 4: 'Pfizer',\
                            5: 'Citigroup', 6: 'Johnson & Johnson', 7: 'AIG', 8: 'IBM',\
                            9: 'Merck', 10: 'P&G', 11: 'Intel', 12: 'Bank of America', 13: 'Coca Cola',\
                            14: 'Cisco', 15 : 'Verizon', 16: 'Wells Fargo', 17: 'Amgen', 18: 'Dell',\
                            19: 'PepsiCo', 20: 'SBC Comms', 21: 'Fannie Mae', 22: 'Chevron',\
                            23: 'Viacom', 24: 'Eli Lilly'})
# Export to csv.
sp500.to_csv('SP500.csv')

In [48]:
# The data holds daily price relatives (today's price / yesterday's price), so the cumulative product gives each asset's price relative to its starting price.
sp500.cumprod()

Date,GE,Microsoft,Walmart,ExxonMobil,Pfizer,Citigroup,Johnson & Johnson,AIG,IBM,Merck,...,Verizon,Wells Fargo,Amgen,Dell,PepsiCo,SBC Comms,Fannie Mae,Chevron,Viacom,Eli Lilly
1998-01-05,1.017737,0.994283,1.009524,0.988891,1.054500,0.992991,0.999038,0.993189,1.007693,1.005807,...,0.969759,0.990180,0.996522,1.029154,1.013889,0.971643,0.998926,0.971131,1.033182,1.043518
1998-01-06,1.004220,1.000000,1.015873,0.953535,1.020643,0.997664,0.983668,0.989215,0.996450,0.991870,...,0.993127,0.980360,0.984938,1.031337,0.977431,0.967472,1.018260,0.946271,1.034688,1.025385
1998-01-07,1.012669,0.988085,1.012698,0.983838,1.032205,0.964950,0.993274,0.979569,0.986982,0.976771,...,0.973195,0.963993,0.986095,1.005831,0.996528,0.973311,1.020408,0.979952,1.060332,1.005440
1998-01-08,1.003377,0.995235,0.998415,0.962628,1.025597,0.939250,1.004801,0.974461,0.986392,0.986644,...,0.959449,0.980360,0.989573,1.005094,0.996528,0.964136,1.009667,0.943063,1.045249,0.993654
1998-01-09,0.978882,0.968543,0.969844,0.939394,0.976876,0.891352,0.993274,0.937003,0.947338,0.955865,...,0.951890,0.934534,0.975668,0.970109,0.965278,0.954963,0.978518,0.912590,0.983406,0.966455
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2003-01-27,0.934458,1.499945,2.392382,1.028526,1.181567,1.311742,1.603994,1.410271,1.484878,0.961858,...,0.779655,1.224222,3.765837,2.239057,1.126389,0.663219,1.102469,0.793585,1.816223,0.873545
2003-01-28,0.938512,1.489268,2.447239,1.055678,1.189098,1.325573,1.630430,1.430191,1.516878,1.010918,...,0.792850,1.220294,3.825907,2.314626,1.125833,0.649074,1.111921,0.807699,1.827324,0.883989
2003-01-29,0.933647,1.522519,2.443176,1.094143,1.210898,1.328937,1.610757,1.415379,1.520854,1.029315,...,0.828039,1.227103,3.870403,2.363139,1.125278,0.649074,1.125670,0.837851,1.898274,0.888341
2003-01-30,0.914188,1.471575,2.416255,1.074102,1.177603,1.278471,1.584321,1.359959,1.482605,1.012962,...,0.817483,1.213748,3.780669,2.255850,1.075278,0.636530,1.091298,0.823737,1.823463,0.861215


### Dataset 5

Unfortunately, I am unable to directly share the data that I downloaded from FactSet. However, MSCI monthly data is available to download from their [website](https://www.msci.com/end-of-day-data-country).

I used the MSCI Developed Markets indices from 1993/01/01.

It includes 23 countries:
   - Americas: USA, Canada
   - Europe & Middle East: Austria, Belgium, Denmark, Finland, France, Germany, Ireland, Israel, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom
   - Pacific: Australia, Hong Kong, Japan, New Zealand, Singapore
   
Unlike traditional assets, the world indices capture much more than the price changes of individual assets. Because they give an overarching picture of each country's market state, these indices offer a different setting for applying online portfolio selection (OLPS) strategies.

In [32]:
msci

Date,Australia,Austria,Belgium,Canada,Denmark,Finland,France,Germany,Hong Kong,Ireland,...,Netherlands,New Zealand,Norway,Portugal,Singapore,Spain,Sweden,Switzerland,United Kingdom,USA
1970-01-01,100.000000,100.000000,100.000000,100.000000,100.000000,,100.000000,100.000000,100.000000,,...,100.000000,,100.000000,,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1970-01-02,100.000000,100.000000,100.000000,100.000000,100.000000,,100.000000,100.000000,100.000000,,...,100.000000,,100.000000,,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1970-01-05,100.000000,100.000000,100.000000,100.000000,100.000000,,100.000000,100.000000,100.000000,,...,100.000000,,100.000000,,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1970-01-06,100.000000,100.000000,100.000000,100.000000,100.000000,,100.000000,100.000000,100.000000,,...,100.000000,,100.000000,,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
1970-01-07,100.000000,100.000000,100.000000,100.000000,100.000000,,100.000000,100.000000,100.000000,,...,100.000000,,100.000000,,100.000000,100.000000,100.000000,100.000000,100.000000,100.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-04-20,604.641359,713.888157,1063.931203,1403.175719,10461.116309,431.017403,1505.485004,1652.106274,10416.815332,156.130908,...,2963.467748,168.557033,1691.238719,60.849315,2974.804216,299.903604,6254.447868,5879.344155,860.032055,2692.433355
2020-04-21,581.392916,683.811303,1013.110604,1344.400758,10364.108501,416.329214,1448.298495,1588.740842,10181.002893,152.589494,...,2861.091981,161.992536,1631.854348,59.551533,2898.845186,289.943628,5996.494228,5729.337370,820.542785,2608.716015
2020-04-22,583.727898,704.832919,1020.682569,1382.745761,10388.209613,429.808850,1460.761507,1608.447299,10210.902283,156.841081,...,2930.770175,159.844927,1619.443584,60.773371,2895.229087,293.517799,6166.523111,5774.674583,842.396712,2669.372691
2020-04-23,590.102820,729.285719,1044.098476,1390.229972,10519.731506,434.347830,1475.563583,1624.477846,10254.069670,157.861220,...,2934.834695,161.938275,1669.657666,60.942110,2899.756263,295.120247,6274.965618,5757.703898,854.816085,2668.272058
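
The raw frame starts in 1970 and has gaps for markets that joined later, while the analysis uses data from 1993 onward. A minimal sketch of that preparation step follows. The frame `levels`, the market names `Early` and `Late`, and all values are made up for illustration, and rebasing to 1.0 is an assumption chosen to make the levels comparable with the cumulative products shown for datasets #1 to #4.

```python
import numpy as np
import pandas as pd

# Toy index levels for two hypothetical markets; "Late" has no data at first,
# mirroring the empty cells in the table above.
levels = pd.DataFrame(
    {"Early": [100.0, 110.0, 121.0, 133.1], "Late": [np.nan, 50.0, 55.0, 44.0]},
    index=pd.to_datetime(["1992-12-30", "1992-12-31", "1993-01-04", "1993-01-05"]),
)

# Keep observations from 1993 onward and drop any row that still has gaps.
recent = levels.loc["1993-01-01":].dropna()

# Rebase so that every market starts at 1.0 on the first retained day.
rebased = recent / recent.iloc[0]
print(rebased)
```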


### Dataset 6

For a more recent dataset, I collected the 44 largest US stocks based on market capitalization according to a Financial Times [report](http://media.ft.com/cms/253867ca-1a60-11e0-b003-00144feab49a.pdf).

The companies included are:


Exxon Mobil, Apple, Microsoft, Berkshire Hathaway, General Electric, Walmart, Chevron, IBM, Procter & Gamble, AT&T, Johnson & Johnson, JP Morgan, Wells Fargo, Oracle, Coca-Cola, Google, Pfizer, Citigroup, Bank of America, Intel, Schlumberger, Cisco, Merck, Philip Morris, PepsiCo, ConocoPhillips, Goldman Sachs, McDonald's, Amazon, Qualcomm, Occidental Petroleum, Abbott Laboratories, Walt Disney, 3M, Comcast, Caterpillar, General Motors, Home Depot, Ford, Freeport-McMoRan Copper & Gold, United Parcel Service, Amgen, US Bancorp, American Express

Although they appear in the original report, I did not include United Technologies and Kraft Foods due to M&A activities. I also excluded Hewlett-Packard because of the company split in 2015.

This dataset will be particularly interesting because it also includes the recent bear market. After 10 years of a continuous bull run following the 2008 financial crisis, we will examine which portfolio was the most robust to the rapidly changing market paradigm of the last month.

In [28]:
# Get tickers for the companies.
ticker = ['XOM','AAPL','MSFT','BRK-A','GE','WMT','CVX','IBM','PG','T','JNJ','JPM','WFC','ORCL','KO','GOOGL','PFE','C','BAC',
          'INTC','SLB','CSCO','MRK','PM','PEP','COP','GS','MCD','AMZN','QCOM','OXY','ABT','DIS','MMM','CMCSA',
          'CAT','GM','HD','F','FCX','UPS','AMGN','USB','AXP']

# Download the full adjusted-close history from yfinance.
us_equity = yf.download(ticker)['Adj Close']

# Keep data from 2011 onward and drop rows with any missing price.
us_equity = us_equity.loc['2011-01-01':].dropna()

# Export to csv under its own name so the dataset #4 export (SP500.csv) is not overwritten.
us_equity.to_csv('US_Equity.csv')

In [30]:
us_equity

Date,AAPL,ABT,AMGN,AMZN,AXP,BAC,BRK-A,C,CAT,CMCSA,...,PG,PM,QCOM,SLB,T,UPS,USB,WFC,WMT,XOM
2011-01-03,40.868607,17.085857,45.187981,184.220001,37.719135,12.708498,120498.0,44.922279,70.829796,9.417547,...,48.124630,37.988594,38.717251,64.993568,17.702974,55.169281,21.427851,24.231056,43.356121,54.634018
2011-01-04,41.081905,17.246641,46.164131,185.009995,38.197128,12.753277,120200.0,44.922279,70.498779,9.447018,...,48.258369,37.988594,39.326797,63.424110,17.828276,55.048283,21.284679,24.284765,43.522995,54.890518
2011-01-05,41.417946,17.246641,46.147858,187.419998,39.305443,12.986132,121300.0,45.564026,71.108131,9.573311,...,48.146931,37.878529,40.144650,64.169968,17.887941,55.131470,21.332407,24.837217,43.236923,54.743927
2011-01-06,41.384472,17.210915,46.001446,185.860001,39.034901,12.932396,120600.0,45.380680,70.370895,9.581732,...,48.065197,37.373474,40.638466,62.569408,17.645803,54.821392,20.910847,24.668407,42.879337,55.095707
2011-01-07,41.680836,17.282372,46.351238,185.490005,38.712017,12.762233,119681.0,45.289001,70.513832,9.556470,...,47.924019,36.531727,39.913193,63.369705,17.464201,54.564274,20.751772,24.169670,42.974686,55.396179
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-04-21,268.369995,94.050003,230.750000,2328.120117,81.519997,21.639999,275750.0,41.570000,109.849998,35.709999,...,118.887138,72.279999,71.839996,14.690000,29.870001,100.620003,32.730000,26.840000,129.210007,40.959999
2020-04-22,276.100006,95.480003,229.289993,2363.489990,82.540001,21.799999,279660.0,42.240002,110.639999,35.730000,...,118.609001,73.059998,74.680000,15.340000,29.469999,97.610001,33.250000,26.799999,131.589996,42.130001
2020-04-23,275.029999,93.940002,232.490005,2399.449951,82.459999,21.870001,278750.0,42.459999,112.910004,36.090000,...,119.400002,71.779999,73.809998,16.520000,29.500000,99.449997,33.369999,26.530001,128.529999,43.450001
2020-04-24,282.970001,94.059998,236.279999,2410.219971,83.169998,22.180000,279460.0,43.099998,114.040001,37.160000,...,118.779999,73.669998,76.040001,16.110001,29.709999,100.180000,34.000000,26.920000,129.440002,43.730000
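
The `dropna()` call in the download cell keeps only the dates on which every ticker has a price, so a single late-listed stock can shorten the whole sample. The sketch below shows one way to see where the cleaned sample will begin before dropping anything. The tickers `OLD` and `NEW` and their prices are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy adjusted closes; "NEW" starts trading one day late (hypothetical tickers).
adj_close = pd.DataFrame(
    {"OLD": [10.0, 10.5, 10.2, 10.8], "NEW": [np.nan, 20.0, 21.0, 20.5]},
    index=pd.to_datetime(["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06"]),
)

# First date with a valid price for each ticker.
first_valid = adj_close.apply(lambda column: column.first_valid_index())

# dropna() keeps only rows where every ticker has a price, so the latest
# first-valid date determines where the cleaned sample begins.
cleaned = adj_close.dropna()
rows_lost = len(adj_close) - len(cleaned)
print(first_valid.max(), rows_lost)
```

On the real download, inspecting `first_valid` before filtering would show whether any of the 44 tickers forces the sample to start later than 2011-01-03.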
