# Get Stock Indexes from Yahoo! Finance

*Yixiao Lu (ylu306)*

This script is designed for Google Colab.

Install the Yahoo! Finance package `yfinance`.

In [1]:
!pip install yfinance

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


*Restart the runtime after executing the cell above.*

Import packages.

In [2]:
import yfinance
import numpy as np

## Clean dataset

Download stock indexes from Yahoo! Finance.

The ticker symbols of the stock indexes are taken from https://finance.yahoo.com/world-indices/ (accessed Aug 12, 2022).

In [3]:
stock_indexes = yfinance.download([
    '^GSPC',  # S&P 500
    '^DJI',  # Dow Jones Industrial Average
    '^IXIC',  # NASDAQ Composite
    '^N100',  # Euronext 100 Index
    '^N225',  # Nikkei 225
    '^HSI',  # HANG SENG INDEX
    '^NZ50',  # S&P/NZX 50 INDEX GROSS
    '000001.SS',  # SSE Composite Index
    '399001.SZ',  # Shenzhen Index
    '^BUK100P',  # Cboe UK 100
], start="2016-01-01", end="2019-12-31", timeout=3, auto_adjust=True)

[*********************100%***********************]  10 of 10 completed


We use only the close price. Because `auto_adjust=True` is set, the `Close` column already holds the adjusted close price.

In [4]:
stock_indexes_price = stock_indexes['Close'].copy()
stock_indexes_price

Date,000001.SS,399001.SZ,^BUK100P,^DJI,^GSPC,^HSI,^IXIC,^N100,^N225,^NZ50
2016-01-04,3296.258057,11625.909180,604.640015,17148.939453,2012.660034,21327.119141,4903.089844,885.000000,18450.980469,
2016-01-05,3287.710938,11467.931641,608.780029,17158.660156,2016.709961,21188.720703,4891.430176,890.530029,18374.000000,6278.100098
2016-01-06,3361.840088,11724.749023,603.030029,16906.509766,1990.260010,20980.810547,4835.759766,879.270020,18191.320312,6262.520020
2016-01-07,3125.001953,10760.149414,591.359985,16514.099609,1943.089966,20333.339844,4689.430176,864.859985,17767.339844,6213.390137
2016-01-08,3186.412109,10888.788086,588.890015,16346.450195,1922.030029,20453.710938,4643.629883,850.020020,17697.960938,6158.100098
...,...,...,...,...,...,...,...,...,...,...
2019-12-25,2981.881104,10229.580078,,,,,,,23782.869141,
2019-12-26,3007.354980,10303.719727,,28621.390625,3239.909912,,9022.389648,,23924.919922,
2019-12-27,3005.035889,10233.769531,763.559998,28645.259766,3240.020020,28225.419922,9006.620117,1156.609985,23837.720703,11602.120117
2019-12-30,3040.023926,10365.959961,758.979980,28462.140625,3221.290039,28319.390625,8945.990234,1146.560059,23656.619141,11556.450195
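
The `['Close']` selection above suggests that the downloaded frame has two-level columns, with the price field on the outer level and the ticker on the inner level. The following minimal sketch reproduces that layout on a hypothetical toy frame (the numbers are made up, and the exact column layout returned by `yfinance` is an assumption here) to show what the selection returns.

```python
import pandas as pd

# Hypothetical two-level columns: price field (outer) x ticker (inner).
columns = pd.MultiIndex.from_product([["Close", "Open"], ["^DJI", "^GSPC"]])
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],  # made-up values
    index=pd.to_datetime(["2016-01-04", "2016-01-05"]),
    columns=columns,
)

# Selecting the outer label drops that level and leaves one column per ticker.
close = toy["Close"].copy()
print(close.columns.tolist())
```

The `.copy()` keeps later in-place edits of the close prices from touching the original download.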


The first row has a missing value (`^NZ50`), so we back-fill it from the next available row.

In [5]:
# `fillna(method='bfill')` is deprecated in recent pandas; `bfill()` is the equivalent call.
stock_indexes_price.iloc[0, :] = stock_indexes_price.bfill().iloc[0, :]

Forward-fill the remaining gaps with the previous close price.

In [6]:
# `fillna(method='ffill')` is deprecated in recent pandas; `ffill()` is the equivalent call.
stock_indexes_price.ffill(inplace=True)
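
The two filling steps can be checked on a small example. This is a minimal sketch on a hypothetical toy frame (the column names `A` and `B` and all values are made up): only the first row is back-filled, and every later gap takes the previous close.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices: "B" has no quote on the first and third day.
toy = pd.DataFrame(
    {"A": [10.0, 11.0, np.nan, 12.0], "B": [np.nan, 20.0, np.nan, 21.0]},
    index=pd.date_range("2016-01-04", periods=4, freq="D"),
)

filled = toy.copy()
# Step 1: back-fill only the first row from the next available value.
filled.iloc[0, :] = toy.bfill().iloc[0, :]
# Step 2: fill every remaining gap with the previous close.
filled = filled.ffill()

print(filled)
```

Back-filling only the first row limits the use of future prices to the one place where no earlier price exists.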

Assert no missing values exist after processing.

In [7]:
assert np.sum(np.isnan(stock_indexes_price.values)) == 0
stock_indexes_price

Date,000001.SS,399001.SZ,^BUK100P,^DJI,^GSPC,^HSI,^IXIC,^N100,^N225,^NZ50
2016-01-04,3296.258057,11625.909180,604.640015,17148.939453,2012.660034,21327.119141,4903.089844,885.000000,18450.980469,6278.100098
2016-01-05,3287.710938,11467.931641,608.780029,17158.660156,2016.709961,21188.720703,4891.430176,890.530029,18374.000000,6278.100098
2016-01-06,3361.840088,11724.749023,603.030029,16906.509766,1990.260010,20980.810547,4835.759766,879.270020,18191.320312,6262.520020
2016-01-07,3125.001953,10760.149414,591.359985,16514.099609,1943.089966,20333.339844,4689.430176,864.859985,17767.339844,6213.390137
2016-01-08,3186.412109,10888.788086,588.890015,16346.450195,1922.030029,20453.710938,4643.629883,850.020020,17697.960938,6158.100098
...,...,...,...,...,...,...,...,...,...,...
2019-12-25,2981.881104,10229.580078,762.010010,28515.449219,3223.379883,27864.210938,8952.879883,1154.290039,23782.869141,11642.780273
2019-12-26,3007.354980,10303.719727,762.010010,28621.390625,3239.909912,27864.210938,9022.389648,1154.290039,23924.919922,11642.780273
2019-12-27,3005.035889,10233.769531,763.559998,28645.259766,3240.020020,28225.419922,9006.620117,1156.609985,23837.720703,11602.120117
2019-12-30,3040.023926,10365.959961,758.979980,28462.140625,3221.290039,28319.390625,8945.990234,1146.560059,23656.619141,11556.450195


## Difference

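Compute log returns by taking the first difference of the log prices. Because `np.diff` returns a plain array, the result is indexed by integer position rather than by date and has one row fewer than the price table.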

In [8]:
stock_indexes_log_rrt = stock_indexes_price.apply(lambda x: np.diff(np.log(x)))
stock_indexes_log_rrt

,000001.SS,399001.SZ,^BUK100P,^DJI,^GSPC,^HSI,^IXIC,^N100,^N225,^NZ50
0,-0.002596,-0.013682,0.006824,0.000567,0.002010,-0.006510,-0.002381,0.006229,-0.004181,0.000000
1,0.022297,0.022147,-0.009490,-0.014804,-0.013202,-0.009861,-0.011446,-0.012725,-0.009992,-0.002485
2,-0.073054,-0.085852,-0.019542,-0.023484,-0.023986,-0.031346,-0.030727,-0.016524,-0.023583,-0.007876
3,0.019461,0.011884,-0.004186,-0.010204,-0.010898,0.005902,-0.009815,-0.017308,-0.003913,-0.008938
4,-0.054731,-0.064136,-0.007637,0.003183,0.000853,-0.028023,-0.001215,-0.001483,0.000000,-0.009016
...,...,...,...,...,...,...,...,...,...,...
1034,-0.000268,0.003946,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,-0.002004,0.000000
1035,0.008507,0.007221,0.000000,0.003708,0.005115,0.000000,0.007734,0.000000,0.005955,0.000000
1036,-0.000771,-0.006812,0.002032,0.000834,0.000034,0.012880,-0.001749,0.002008,-0.003651,-0.003498
1037,0.011576,0.012834,-0.006016,-0.006413,-0.005798,0.003324,-0.006754,-0.008727,-0.007626,-0.003944
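
Each entry above is a log return, r_t = ln(P_t) - ln(P_{t-1}). If the dates are needed downstream, one possible variant (a sketch, not the notebook's method) is to difference the log prices with pandas so that the date index is kept. The toy frame below is hypothetical, with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices with a date index.
toy_price = pd.DataFrame(
    {"A": [100.0, 110.0, 99.0], "B": [50.0, 50.0, 55.0]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"]),
)

# Column-wise np.diff of the log prices, as in the cell above (dates are lost).
via_np_diff = np.diff(np.log(toy_price.to_numpy()), axis=0)

# Index-preserving variant: the first row has no predecessor, so it is dropped.
log_returns = np.log(toy_price).diff().dropna()

print(log_returns)
```

Both computations give the same numbers; the second labels each return with the date of the later price.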


## Export

Save the resulting DataFrame as a pickle file.

In [9]:
stock_indexes_log_rrt.to_pickle('stock_indexes.pkl')
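
To load the saved file in a later session, `pandas.read_pickle` can be used. The sketch below round-trips a hypothetical toy frame (made-up values) through a temporary directory, so it does not touch the real `stock_indexes.pkl`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy frame standing in for the log-return table.
toy = pd.DataFrame({"^GSPC": [0.002, -0.013], "^DJI": [0.0006, -0.0148]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stock_indexes.pkl")
    toy.to_pickle(path)              # write, as in the cell above
    restored = pd.read_pickle(path)  # read back into a DataFrame

print(restored.equals(toy))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.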