# Price Data Extraction for Post-Index Rebalancing Arbitrage Strategy

Data source: Refinitiv Datastream via WRDS

This notebook runs the code that extracts the relevant price data, based on the historical records of FTSE100 and FTSE250 rebalancing over the past 10 years (2013Q1 - 2023Q3).
The following stocks are excluded from this exercise:
- Stocks suspended from trading within the analysis period (+/- 20 days from the rebalancing date)
- Stocks whose rebalancing date falls between the announcement date and the ex-date of a corporate action
- Stocks for which we are unable to obtain reliable historical data
- All Q3 2023 rebalancing events; at the time of the study, the 20 days of data after the rebalancing date were not yet available

In [2]:
# Import WRDS library
import wrds
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import os

## Data Extraction

In [3]:
def read_sql_script(fname):
    # Read the whole SQL script into a string; the context manager closes the file
    with open(fname, 'r') as fd:
        return fd.read()


# Get current path
current_dir = os.getcwd()


# Load the SQL query templates from file
# these will be used as global variables
query_historical_prices = read_sql_script('/Users/abigail/Desktop/SMU/QF603/get_historical_prices.sql')
query_shares_outstanding = read_sql_script('/Users/abigail/Desktop/SMU/QF603/get_shares_outstanding.sql')

# Establish live connection; requires user login (passwords will be masked)
db = wrds.Connection() # this will be used as a global variable


def get_historical_prices(isin, start_date, end_date):
    
    print(f'Extracting historical prices for {isin}...')

    df =\
    (
        db
        .raw_sql(
            query_historical_prices.format(isin, start_date, end_date), 
            date_cols = ['trade_date']
            )
    )

    if df.empty:
        print('Dataframe is empty. No results was returned!')
    
    print('--------------------------------------------------')

    return df


Enter your WRDS username [abigail]:abigailcjh
Enter your password:········
WRDS recommends setting up a .pgpass file.
Create .pgpass file now [y/n]?: y
Created .pgpass file successfully.
You can create this file yourself at any time with the create_pgpass_file() function.
Loading library list...
Done


In [6]:
ftse_rebal = pd.read_csv('/Users/abigail/Desktop/SMU/QF603/ftse_10y_rebal_records.csv')
ftse_rebal.head()

# hist_rebal_price = pd.read_csv('')

Unnamed: 0,Post Date,Name,ISIN,FTSE100,FTSE250
0,18/9/2023,888 Holdings,GI000A0F6407,,1.0
1,18/9/2023,Abrdn,GB00BF8Q6K64,-1.0,1.0
2,18/9/2023,Breedon Group,GB00BM8NFJ84,,1.0
3,18/9/2023,CAB Payments Holdings,GB00BMCYKB41,,1.0
4,18/9/2023,Capita,GB00B23K0M20,,-1.0


In [7]:
look_back = 40
look_forward = 40

ftse_rebal["Post Date"] =\
    pd.to_datetime(ftse_rebal["Post Date"], 
                   format = '%d/%m/%Y')

ftse_rebal["start_date"] =\
    (
        ftse_rebal["Post Date"] - timedelta(days = look_back)
    ).dt.strftime('%d/%m/%Y')

# Use look_forward (not look_back) for the end of the window
ftse_rebal["end_date"] =\
    (
        ftse_rebal["Post Date"] + timedelta(days = look_forward)
    ).dt.strftime('%d/%m/%Y')
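
Note that `look_back` and `look_forward` are calendar days, while the analysis period is stated in trading days. As a rough check, the sketch below counts the weekdays in the 40 calendar days before the 18/9/2023 rebalancing date. It is only an approximation: it ignores UK bank holidays, so the true number of trading sessions can be lower.

```python
import pandas as pd
from datetime import timedelta

post_date = pd.Timestamp('2023-09-18')           # rebalancing date from the records above
start_date = post_date - timedelta(days = 40)    # same arithmetic as look_back = 40

# Weekdays from start_date up to the day before the rebalancing date.
# Exchange holidays are NOT removed here (assumption: weekdays only).
weekdays_before = len(pd.bdate_range(start_date, post_date - timedelta(days = 1)))

print(start_date.date(), weekdays_before)
```

A 40-day calendar window therefore leaves some slack over the 20 trading days needed on each side of the rebalancing date.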

In [8]:
target_isins = ftse_rebal["ISIN"]
start_dates = ftse_rebal["start_date"]
end_dates = ftse_rebal["end_date"]   


historical_prices =\
    (
        pd.
        concat(
            map(
                get_historical_prices,
                target_isins,
                start_dates,
                end_dates
            )
        )
    )

Extracting historical prices for GI000A0F6407...
--------------------------------------------------
Extracting historical prices for GB00BF8Q6K64...
--------------------------------------------------
Extracting historical prices for GB00BM8NFJ84...
--------------------------------------------------
Extracting historical prices for GB00BMCYKB41...
--------------------------------------------------
Extracting historical prices for GB00B23K0M20...
--------------------------------------------------
Extracting historical prices for GB00BG5KQW09...
--------------------------------------------------
Extracting historical prices for GB00B14SKR37...
--------------------------------------------------
Extracting historical prices for GB0009633180...
--------------------------------------------------
Extracting historical prices for GB0001826634...
--------------------------------------------------
Extracting historical prices for GG00BMD8MJ76...
--------------------------------------------------


--------------------------------------------------
Extracting historical prices for GG00B1ZBD492...
--------------------------------------------------
Extracting historical prices for GB0031544546...
--------------------------------------------------
Extracting historical prices for JE00B6T5S470...
--------------------------------------------------
Extracting historical prices for GB0009039941...
--------------------------------------------------
Extracting historical prices for GB00B018CS46...
--------------------------------------------------
Extracting historical prices for GB0030474687...
--------------------------------------------------
Extracting historical prices for GB00BMV92D64...
--------------------------------------------------
Extracting historical prices for GB0001500809...
--------------------------------------------------
Extracting historical prices for GB00BYV8MN78...
--------------------------------------------------
Extracting historical prices for GB00BJTNFH41...


--------------------------------------------------
Extracting historical prices for GB0002418548...
--------------------------------------------------
Extracting historical prices for GB00BYYW3C20...
--------------------------------------------------
Extracting historical prices for IM00B5VQMV65...
--------------------------------------------------
Extracting historical prices for GB00BYYTFB60...
--------------------------------------------------
Extracting historical prices for GB00BKP36R26...
--------------------------------------------------
Extracting historical prices for GG00BJL5FH87...
--------------------------------------------------
Extracting historical prices for GB0003450359...
--------------------------------------------------
Extracting historical prices for GB0033195214...
--------------------------------------------------
Extracting historical prices for GB0007388407...
--------------------------------------------------
Extracting historical prices for GB00B1JQDM80...


--------------------------------------------------
Extracting historical prices for GB00B03HDJ73...
--------------------------------------------------
Extracting historical prices for GB00BFZNLB60...
--------------------------------------------------
Extracting historical prices for GB0002945029...
--------------------------------------------------
Extracting historical prices for IM00B5VQMV65...
--------------------------------------------------
Extracting historical prices for GB00B012TP20...
--------------------------------------------------
Extracting historical prices for GB00BKX5CN86...
--------------------------------------------------
Extracting historical prices for GB0004915632...
--------------------------------------------------
Extracting historical prices for GB00BJ62K685...
--------------------------------------------------
Extracting historical prices for GB00BGXQNP29...
--------------------------------------------------
Extracting historical prices for GB00B60BD277...


--------------------------------------------------
Extracting historical prices for GB00BLJNXL82...
--------------------------------------------------
Extracting historical prices for GB0007365546...
--------------------------------------------------
Extracting historical prices for GB00B7FC0762...
--------------------------------------------------
Extracting historical prices for GB00B41H7391...
--------------------------------------------------
Extracting historical prices for BMG702782084...
--------------------------------------------------
Extracting historical prices for GB00B1Z4ST84...
--------------------------------------------------
Extracting historical prices for GB00BDVZYZ77...
--------------------------------------------------
Extracting historical prices for GG00BV54HY67...
--------------------------------------------------
Extracting historical prices for GB00BLRLH124...
--------------------------------------------------
Extracting historical prices for GB00BJTNFH41...


--------------------------------------------------
Extracting historical prices for GB00B1QH8P22...
--------------------------------------------------
Extracting historical prices for GB00BVGBWW93...
--------------------------------------------------
Extracting historical prices for IE0002424939...
--------------------------------------------------
Extracting historical prices for GB00BCKFY513...
--------------------------------------------------
Extracting historical prices for GB00B01FLG62...
--------------------------------------------------
Extracting historical prices for GB00BYRJH519...
--------------------------------------------------
Extracting historical prices for GB0004478896...
--------------------------------------------------
Extracting historical prices for GB00BYXJC278...
--------------------------------------------------
Extracting historical prices for GB00B0HZPV38...
--------------------------------------------------
Extracting historical prices for GB0005758098...


--------------------------------------------------
Extracting historical prices for GB00BNGWY422...
--------------------------------------------------
Extracting historical prices for GB00BMQX2Q65...
--------------------------------------------------
Extracting historical prices for GB0001570810...
--------------------------------------------------
Extracting historical prices for GB00BMHTHT14...
--------------------------------------------------
Extracting historical prices for GB00B1YW4409...
--------------------------------------------------
Extracting historical prices for GI000A0F6407...
--------------------------------------------------
Extracting historical prices for GB00BJTNFH41...
--------------------------------------------------
Extracting historical prices for GB00BKRV3L73...
--------------------------------------------------
Extracting historical prices for GB0004228648...
--------------------------------------------------
Extracting historical prices for GB0006834344...


--------------------------------------------------
Extracting historical prices for GB0004866223...
--------------------------------------------------
Extracting historical prices for GB00B0SWJX34...
--------------------------------------------------
Extracting historical prices for GB00B7FC0762...
--------------------------------------------------
Extracting historical prices for GB00B0D5V538...
--------------------------------------------------
Extracting historical prices for GB00B03HDJ73...
--------------------------------------------------
Extracting historical prices for GB00B1VYCH82 ...
Dataframe is empty. No results was returned!
--------------------------------------------------




In [9]:
historical_prices.head()

Unnamed: 0,trade_date,security_code,security_name,primary_exchange,refinitiv_code,isin_code,currency,open,high,low,close,volume
0,2023-08-09,18982.0,888 HOLDINGS,LON,26862.0,GI000A0F6407,GBP,1.066,1.136,1.063,1.117,358252.0
1,2023-08-10,18982.0,888 HOLDINGS,LON,26862.0,GI000A0F6407,GBP,1.14,1.14,1.091,1.12,284950.0
2,2023-08-11,18982.0,888 HOLDINGS,LON,26862.0,GI000A0F6407,GBP,1.1,1.331,1.098,1.15,1003896.0
3,2023-08-14,18982.0,888 HOLDINGS,LON,26862.0,GI000A0F6407,GBP,1.147,1.16,1.088108,1.096,1088784.0
4,2023-08-15,18982.0,888 HOLDINGS,LON,26862.0,GI000A0F6407,GBP,1.1,1.126,1.001,1.114,1127118.0


## Data Cleaning

In [164]:


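index_close_data = pd.read_csv('/Users/abigail/Desktop/SMU/QF603/FTSE_100_Index_10y.csv', header = 0)

# The dates in this file use two-digit years (e.g. '17/2/12'), so parse them with '%d/%m/%y';
# '%d/%m/%Y' raises "ValueError: time data '17/2/12' does not match format '%d/%m/%Y'"
index_close_data["Date"] =\
    pd.to_datetime(index_close_data["Date"], 
                   format = '%d/%m/%y')

# Locate the index row(s) for a given start date (stored as a 'dd/mm/YYYY' string);
# .index is an attribute and must be subscripted, not called
index_close_data.index[index_close_data["Date"] == pd.to_datetime(lst_start_date[100], format = '%d/%m/%Y')]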

In [159]:
index_close_data.head()

Unnamed: 0,Date,Close,Net,%Chg,Open,Low,High,Volume,Turnover - GBP,Flow
0,17/2/12,5905.07,19.69,0.33%,5885.38,5885.38,5923.62,1101910955,405068.0,-404602.0
1,20/2/12,5945.25,40.18,0.68%,5905.07,5905.07,5956.33,723669610,270963.0,-133639.0
2,21/2/12,5928.2,-17.05,-0.29%,5945.25,5916.58,5948.84,846269111,326004.0,-459643.0
3,22/2/12,5916.55,-11.65,-0.20%,5928.2,5894.6,5937.96,872088943,327016.0,-786659.0
4,23/2/12,5937.89,21.34,0.36%,5916.55,5900.5,5952.47,1071356849,340406.0,-446253.0
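
The `Date` column above uses two-digit years (for example `17/2/12`), which the `%Y` format code does not accept. A small sketch of the difference between the two format codes, using values copied from the output above:

```python
import pandas as pd

raw_dates = pd.Series(['17/2/12', '20/2/12', '21/2/12'])

# %y parses a two-digit year
parsed = pd.to_datetime(raw_dates, format = '%d/%m/%y')

# %Y expects a four-digit year and raises ValueError on this input
try:
    pd.to_datetime(raw_dates, format = '%d/%m/%Y')
    four_digit_ok = True
except ValueError:
    four_digit_ok = False

print(parsed.iloc[0], four_digit_ok)
```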


In [None]:
full_ftse_data = pd.read_csv('/Users/abigail/Desktop/SMU/QF603/historical_prices_ftse_full.csv', header = 0)

full_ftse_data["trade_date"] =\
    (
        full_ftse_data["trade_date"]
        .apply(lambda x: datetime.strptime(x,'%d/%m/%Y'))
    )



In [90]:
# Keep only stocks whose primary listing is on the London Stock Exchange (LON)
# For some ISINs, Datastream returns a primary listing on another exchange
# Those stocks should not be part of the analysis

lse_historical_prices = full_ftse_data.loc[full_ftse_data.primary_exchange == 'LON', :].copy()
lse_historical_prices.close.isna().sum()

0

In [91]:
rebal_round = {
    1 : 'Q4',
    2 : 'Q1',
    3 : 'Q1',
    4 : 'Q1',
    5 : 'Q2',
    6 : 'Q2',
    7 : 'Q2',
    8 : 'Q3',
    9 : 'Q3',
    10 : 'Q3',
    11 : 'Q4',
    12 : 'Q4',
}

In [92]:
lse_historical_prices['year'] = lse_historical_prices['trade_date'].dt.year
lse_historical_prices['month'] = lse_historical_prices['trade_date'].dt.month
lse_historical_prices['rebal'] =\
(
    (lse_historical_prices['year'] 
     - 1*(lse_historical_prices['month'] == 1)).astype(str)
    + lse_historical_prices['month'].map(rebal_round)
)

lse_historical_prices.head()

Unnamed: 0,trade_date,security_code,security_name,primary_exchange,refinitiv_code,isin_code,currency,open,high,low,close,volume,ftse100_close,year,month,rebal
0,2023-08-09,18982,888 HOLDINGS,LON,26862,GI000A0F6407,GBP,1.066,1.136,1.063,1.117,358252.0,7587.3,2023,8,2023Q3
1,2023-08-10,18982,888 HOLDINGS,LON,26862,GI000A0F6407,GBP,1.14,1.14,1.091,1.12,284950.0,7618.6,2023,8,2023Q3
2,2023-08-11,18982,888 HOLDINGS,LON,26862,GI000A0F6407,GBP,1.1,1.331,1.098,1.15,1003896.0,7524.16,2023,8,2023Q3
3,2023-08-14,18982,888 HOLDINGS,LON,26862,GI000A0F6407,GBP,1.147,1.16,1.088108,1.096,1088784.0,7507.15,2023,8,2023Q3
4,2023-08-15,18982,888 HOLDINGS,LON,26862,GI000A0F6407,GBP,1.1,1.126,1.001,1.114,1127118.0,7389.64,2023,8,2023Q3
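
The `rebal` label assigns each trade date to the rebalancing round it surrounds: February to April map to Q1, May to July to Q2, and so on, while January is counted as part of the previous year's Q4 window. A minimal sketch of the same rule as a standalone function (the name `rebal_label` is hypothetical and not part of the notebook):

```python
import pandas as pd

# Same month-to-round mapping as rebal_round in the notebook
rebal_round = {1: 'Q4', 2: 'Q1', 3: 'Q1', 4: 'Q1', 5: 'Q2', 6: 'Q2',
               7: 'Q2', 8: 'Q3', 9: 'Q3', 10: 'Q3', 11: 'Q4', 12: 'Q4'}

def rebal_label(trade_date):
    """Return the rebalancing label, e.g. '2023Q3', for a single trade date."""
    ts = pd.Timestamp(trade_date)
    # January belongs to the Q4 window of the previous year
    year = ts.year - 1 if ts.month == 1 else ts.year
    return f'{year}{rebal_round[ts.month]}'

for d in ['2023-08-09', '2023-01-16', '2022-12-19']:
    print(d, rebal_label(d))
```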


In [93]:
# Remove Q3 2023 Rebal due to incomplete data
lse_historical_prices =\
    lse_historical_prices[lse_historical_prices['rebal'] != '2023Q3']

lse_historical_prices.head()

Unnamed: 0,trade_date,security_code,security_name,primary_exchange,refinitiv_code,isin_code,currency,open,high,low,close,volume,ftse100_close,year,month,rebal
900,2023-05-10,4801,ASOS,LON,9204,GB0030927254,GBP,6.31,6.390098,4.873999,4.873999,3884045.0,7741.33,2023,5,2023Q2
901,2023-05-11,4801,ASOS,LON,9204,GB0030927254,GBP,4.917998,5.159998,4.449998,5.0,6081492.0,7730.58,2023,5,2023Q2
902,2023-05-12,4801,ASOS,LON,9204,GB0030927254,GBP,5.05,5.482,4.964507,5.05,3685914.0,7754.62,2023,5,2023Q2
903,2023-05-15,4801,ASOS,LON,9204,GB0030927254,GBP,4.649998,4.8,3.80595,4.005,5816469.0,7777.7,2023,5,2023Q2
904,2023-05-16,4801,ASOS,LON,9204,GB0030927254,GBP,3.96,4.273999,3.932,3.988999,1900093.0,7751.08,2023,5,2023Q2


In [95]:
lse_historical_prices =\
    lse_historical_prices\
    .sort_values(by = ['security_name', 'trade_date'])\
    .reset_index(drop = True)

In [162]:
# full_ftse_data.head()
lst_start_date[100]

'11/08/2021'

## Get FTSE100/Stock to get Beta
Generate the FTSE 100 Index close based on the trade dates of the historical price DF, then get the stock-index covariance and the index variance.

ERRORS: Tried to get the data for beta, covariance, etc., but kept getting errors
- was using the start date and post date to get the price data in a list and run cov/corr/var
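
For reference, beta is the covariance of the stock with the index divided by the variance of the index. The sketch below uses synthetic numbers, not notebook data. The function name `beta_from_prices` is hypothetical, and computing beta on daily returns rather than on price levels is an assumption that differs from the cells in this section.

```python
import numpy as np

def beta_from_prices(stock_close, index_close):
    """Beta of a stock against an index from two aligned close-price series."""
    stock_close = np.asarray(stock_close, dtype = float)
    index_close = np.asarray(index_close, dtype = float)
    # Simple daily returns (assumption: returns, not price levels)
    stock_ret = np.diff(stock_close) / stock_close[:-1]
    index_ret = np.diff(index_close) / index_close[:-1]
    # np.cov defaults to ddof = 1, so the variance must use ddof = 1 as well
    cov = np.cov(stock_ret, index_ret)[0][1]
    var = np.var(index_ret, ddof = 1)
    return cov / var

# Synthetic index closes, for illustration only
index_close = np.array([100.0, 101.0, 99.0, 102.0, 103.0])
index_ret = np.diff(index_close) / index_close[:-1]
# Synthetic stock whose daily return is exactly twice the index return
stock_close = 50.0 * np.concatenate(([1.0], np.cumprod(1 + 2 * index_ret)))

beta = beta_from_prices(stock_close, index_close)
print(round(beta, 6))
```

Mixing `np.cov` (sample, `ddof = 1`) with `np.var` at its default (`ddof = 0`) would scale beta by (n - 1) / n, which matters for a 20-day window.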

In [133]:
lst_isin = list(ftse_rebal["ISIN"])
lst_start_date = list(ftse_rebal["start_date"])
lst_post_date = list(ftse_rebal["Post Date"])
lst_stock_price = list(full_ftse_data["close"])
lst_index_close = list(full_ftse_data["ftse100_close"])

In [146]:
def get_beta(df, isin, start_date, post_date, stock_price, index_close):
    lst_cov = []
    lst_var = []
    lst_beta = []
    lst_corr = []
    
    for i in range(len(isin)):
        try:
            # trade_date is a datetime column, so convert the 'dd/mm/YYYY' string before comparing
            start_ts = pd.to_datetime(start_date[i], format = '%d/%m/%Y')
            index_start = df.index[(df["trade_date"] == start_ts) & (df["isin_code"] == isin[i])][0]
            index_end = df.index[(df["trade_date"] == post_date[i]) & (df["isin_code"] == isin[i])][0]

            # Slicing the lists by index label assumes df has a default RangeIndex
            lst_stock_price = stock_price[index_start:index_end]
            lst_index_close = index_close[index_start:index_end]

            corr = np.corrcoef(lst_stock_price, lst_index_close)[0][1]
            cov = np.cov(lst_stock_price, lst_index_close)[0][1]
            # np.cov uses ddof = 1 by default, so use the same for the index variance
            var = np.var(lst_index_close, ddof = 1)

            beta = cov / var

            lst_cov.append(cov)
            lst_var.append(var)
            lst_beta.append(beta)
            lst_corr.append(corr)
        
        except IndexError:
            # No row matches the start or post date (e.g. a non-trading day);
            # record missing values rather than 0, which would read as a real beta
            print(i, isin[i], start_date[i], post_date[i])
            lst_cov.append(np.nan)
            lst_var.append(np.nan)
            lst_beta.append(np.nan)
            lst_corr.append(np.nan)

    return [lst_cov, lst_var, lst_beta, lst_corr]
    

In [150]:
# cal_relation = get_beta(full_ftse_data, lst_isin, lst_start_date, lst_post_date, lst_stock_price, lst_index_close)

# ftse_rebal["covariance"] = cal_relation[0]
# ftse_rebal["index_var"] = cal_relation[1]
# ftse_rebal["beta"] = cal_relation[2]
# ftse_rebal["correlation"] = cal_relation[3]


TRIED USING THE CODE YOU WROTE TO GET AROUND IT, BUT IT ENDED UP SKIPPING THE ENTIRE LIST :(
- was using the post date and (post date index - 20) to get the price data in a list and run cov/corr/var

In [152]:
rebal_dates = ftse_rebal["Post Date"].dt.strftime('%d/%m/%Y')

# Stocks that are suspended from trading during the analysis period
suspended = [('GB00BJP5HK17', '19/12/2022'), 
             ('GB00B1VNST91', '18/06/2018'),
             ('GB0007892358', '19/06/2017')]

target_rebal_prices = []


for isin, rebal_date in zip(target_isins, rebal_dates):
    if (isin, rebal_date) in suspended:
        continue

    # Compare as a Timestamp; a bare 'dd/mm/YYYY' string may be parsed month-first
    rebal_ts = pd.to_datetime(rebal_date, format = '%d/%m/%Y')
    sub_df = lse_historical_prices[lse_historical_prices.isin_code == isin]
    rebal_rows = sub_df.index[sub_df.trade_date == rebal_ts]

    if rebal_rows.empty:
        print(f'ISIN {isin} for {rebal_date} is excluded from studies!')
        continue

    rebal_idx = rebal_rows[0]

    for delta in [-20, -5, -3, -1, 3, 5, 10, 20]:
        # Ensure that the prices for the days required exist
        assert sub_df['rebal'].loc[rebal_idx] == sub_df['rebal'].loc[rebal_idx + delta],\
        f'ISIN {isin} has insufficient data around rebal on {rebal_date} for delta {delta} days'

    pre_20_pd = sub_df.close.loc[rebal_idx - 20]
    pre_5_pd = sub_df.close.loc[rebal_idx - 5]
    pre_3_pd = sub_df.close.loc[rebal_idx - 3]
    pre_1_pd = sub_df.close.loc[rebal_idx - 1]
    post_3_pd = sub_df.close.loc[rebal_idx + 3]
    post_5_pd = sub_df.close.loc[rebal_idx + 5]
    post_10_pd = sub_df.close.loc[rebal_idx + 10]
    post_20_pd = sub_df.close.loc[rebal_idx + 20]

    # The 20 trading days before the rebalancing date (.loc slicing is inclusive).
    # rebal_idx indexes lse_historical_prices (sorted and re-indexed), so the window
    # must come from sub_df, not from lists built on full_ftse_data
    window = sub_df.loc[rebal_idx - 20 : rebal_idx - 1]
    lst_stock_price = window['close'].to_numpy()
    lst_index_close = window['ftse100_close'].to_numpy()

    cov = np.cov(lst_stock_price, lst_index_close)[0][1]
    # Variance of the index only, with the same ddof as np.cov
    var = np.var(lst_index_close, ddof = 1)
    beta = cov/var
    corr = np.corrcoef(lst_stock_price, lst_index_close)[0][1]

    target_rebal_prices.append({
        'Name' : sub_df.security_name.values[0],
        'ISIN' : isin,
        'post_date' : rebal_date,
        'pre_twenty_pd' : pre_20_pd,
        'pre_five_pd' : pre_5_pd,
        'pre_three_pd' : pre_3_pd,
        'pre_one_pd' : pre_1_pd,
        'post_three_pd' : post_3_pd,
        'post_five_pd' : post_5_pd,
        'post_ten_pd' : post_10_pd,
        'post_twenty_pd' : post_20_pd,
        'Cov': cov,
        'Var': var,
        'Beta': beta,
        'Corr': corr,
    })
    

ISIN GI000A0F6407 for 18/09/2023 is excluded from studies!
ISIN GB00BF8Q6K64 for 18/09/2023 is excluded from studies!
ISIN GB00BM8NFJ84 for 18/09/2023 is excluded from studies!
ISIN GB00BMCYKB41 for 18/09/2023 is excluded from studies!
ISIN GB00B23K0M20 for 18/09/2023 is excluded from studies!
ISIN GB00BG5KQW09 for 18/09/2023 is excluded from studies!
ISIN GB00B14SKR37 for 18/09/2023 is excluded from studies!
ISIN GB0009633180 for 18/09/2023 is excluded from studies!
ISIN GB0001826634 for 18/09/2023 is excluded from studies!
ISIN GG00BMD8MJ76 for 18/09/2023 is excluded from studies!
ISIN GB00B0LCW083 for 18/09/2023 is excluded from studies!
ISIN BMG4593F1389 for 18/09/2023 is excluded from studies!
ISIN GB00BZ4BQC70 for 18/09/2023 is excluded from studies!
ISIN GB0031274896 for 18/09/2023 is excluded from studies!
ISIN GB00BY7QYJ50 for 18/09/2023 is excluded from studies!
ISIN GB00BMT9K014 for 18/09/2023 is excluded from studies!
ISIN GB0006825383 for 18/09/2023 is excluded from studie

## Identify stocks to remove due to corporate actions

Unnamed: 0,security_code,security_name,primary_exchange,refinitiv_code,isin_code,announceddate,exdate,corpactcode,corporateaction
0,48655.0,SYNTHOMER,LON,46186.0,GB00BNTVWJ75,2023-09-07,2023-09-26,CONS,Consolidation
1,48655.0,SYNTHOMER,LON,46186.0,GB00BNTVWJ75,2023-09-07,2023-09-28,RGHT,Rights Issue
2,48655.0,SYNTHOMER,LON,46186.0,GB00BNTVWJ75,2023-09-07,2023-09-26,WRDN,Write Down
3,6543.0,CAPRICORN ENERGY,LON,46290.0,GB00BQ98V038,2023-04-27,2023-05-16,REVS,Reverse Stock Split
4,6543.0,CAPRICORN ENERGY,LON,46290.0,GB00BQ98V038,2023-04-27,2023-05-16,CAPR,Capital Repayment


False

<class 'pandas.core.frame.DataFrame'>
Int64Index: 499 entries, 0 to 527
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   Name            499 non-null    object        
 1   ISIN            499 non-null    object        
 2   post_date       499 non-null    datetime64[ns]
 3   pre_twenty_pd   499 non-null    float64       
 4   pre_five_pd     499 non-null    float64       
 5   pre_three_pd    499 non-null    float64       
 6   pre_one_pd      499 non-null    float64       
 7   post_three_pd   499 non-null    float64       
 8   post_five_pd    499 non-null    float64       
 9   post_ten_pd     499 non-null    float64       
 10  post_twenty_pd  499 non-null    float64       
 11  Post Date       499 non-null    datetime64[ns]
 12  FTSE100         137 non-null    float64       
 13  FTSE250         488 non-null    float64       
dtypes: datetime64[ns](2), float64(10), object(2)
memory usage: