<b>This project analyzes Federal Reserve (Fed) announcements.
All historic Fed announcements can be found here:
https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm
    
For instance, here are the minutes of the last Fed meeting:
https://www.federalreserve.gov/monetarypolicy/files/fomcminutes20211103.pdf
    
My goal is to predict the direction of the S&P500 index over the week following each meeting, based on the text of the Fed minutes. I will start by scraping the relevant pages with Selenium, then build a classifier on features extracted from the minutes and train it on the historical movements of the S&P500 index.
A second approach is to vectorize the texts with TF-IDF, restricted to the Loughran-McDonald financial dictionary, and calculate the cosine similarity between the vectors of two consecutive meetings. A low similarity means the wording changed substantially between meetings, which may indicate a policy change; this value will be tested for correlation with the S&P500 index movement.
    
Edit: as of 15.02.2022 I have finished only the scraping section, the extraction of nominal Fed market recommendations from the text, and the addition of S&P500 data for the days following each meeting.
Stay tuned for the upcoming model training and prediction using TF-IDF and cosine similarity.</b>
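
The cosine-similarity idea described above can be sketched with the standard library alone. The snippet below is a minimal illustration, not the project's final implementation: `FINANCE_TERMS` is a tiny hypothetical stand-in for the Loughran-McDonald word list, the three texts are made-up placeholders rather than real minutes, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the Loughran-McDonald dictionary (assumption).
FINANCE_TERMS = {"inflation", "risk", "growth", "uncertainty", "tightening", "accommodative"}

def tokenize(text):
    """Lowercase the text and keep only tokens found in the restricted vocabulary."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t in FINANCE_TERMS]

def tfidf_vectors(texts):
    """Return one {term: weight} dict per text, using a smoothed IDF."""
    counts = [Counter(tokenize(t)) for t in texts]
    n_docs = len(texts)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up placeholder texts in chronological order (not real minutes).
sample_minutes = [
    "Inflation risk remains low and growth is solid.",
    "Inflation risk remains low and growth is solid.",
    "Uncertainty rose; tightening was discussed amid inflation.",
]

vectors = tfidf_vectors(sample_minutes)
# Similarity between each meeting and the one before it.
consecutive_similarity = [cosine(vectors[i - 1], vectors[i]) for i in range(1, len(vectors))]
print(consecutive_similarity)
```

Identical consecutive texts score close to 1, while a shift in vocabulary pulls the score toward 0, which is the signal the project intends to compare with the index movement.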

In [1]:
# %pip install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
import chromedriver_autoinstaller
chromedriver_autoinstaller.install()

import time     
from datetime import timedelta

import re

import pandas as pd
import pandas_datareader.data as web

import numpy as np
import datetime

In [2]:
# SELENIUM_URL = 'http://127.0.0.1:4444/wd/hub'

In [3]:
driver = webdriver.Chrome(options=webdriver.ChromeOptions())
driver.implicitly_wait(10)

#### Scrape FOMC meeting dates and meeting URLs

In [4]:
FOMC_URL = 'https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm'

driver.get(FOMC_URL)
time.sleep(7)

In [5]:
# Collect the Selenium WebElements for all links to HTML minutes pages

main_page_urls = driver.find_elements(By.XPATH, "//*[contains(@href,'fomcminutes') and contains(@href,'.htm')]")

In [6]:
# Create a list of minutes URLs:

minute_urls = [elem.get_attribute("href") for elem in main_page_urls]


# Extract meeting dates for S&P label data and df index:

meeting_dates = []

for minute_url in minute_urls:
    date = minute_url[-12:-4]
    meeting_dates.append(date)
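
The slice `minute_url[-12:-4]` relies on every URL ending in an eight-digit date followed by `.htm`. A slightly more defensive variant, sketched below, pulls the date out with a regular expression instead. `extract_meeting_date` is a hypothetical helper name and is not part of the notebook.

```python
import re

def extract_meeting_date(url):
    """Return the YYYYMMDD string embedded in a minutes URL, or None if absent."""
    match = re.search(r"fomcminutes(\d{8})\.htm", url)
    return match.group(1) if match else None

url = "https://www.federalreserve.gov/monetarypolicy/fomcminutes20210127.htm"
print(extract_meeting_date(url))
```

Returning `None` for an unexpected URL makes a malformed link visible instead of silently producing a wrong date string.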

#### Create a df for all meetings' scraped data

In [7]:
# Prepare series for df creation:

full_text_s = []
recommendations_s = []
num_of_recommendations_s = []

for minute_url in minute_urls:
    
    # Begin selenium instance
    driver.get(minute_url) 
    time.sleep(10)
    
    # Extract full text of minutes
    minutes_raw = driver.find_element(By.XPATH, "//div[@id='article']")
    full_text = minutes_raw.text.replace('\n', ' ')
    full_text_s.append(full_text)
    
    # Extract recommendations section into list:
    recs = []
    
    # If recommendations are in ul format (the leading '.' keeps the search inside minutes_raw):
    recommendations_list_of_obj = minutes_raw.find_elements(By.XPATH,
        ".//p[contains(.,'Effective') and contains(.,'Federal Open Market Committee directs the Desk')]//following-sibling::ul//li")
    
    # If recommendations are in p format:
    if len(recommendations_list_of_obj) == 0:
        recommendations_list_of_obj = minutes_raw.find_elements(By.XPATH,
            ".//p[contains(.,'Effective') and contains(.,'Federal Open Market Committee directs the Desk')]")
        recommendations_list_of_obj2 = minutes_raw.find_elements(By.XPATH,
            ".//p[contains(.,'Effective') and contains(.,'Federal Open Market Committee directs the Desk')]//following-sibling::p")
        recommendations_list_of_obj.extend(recommendations_list_of_obj2)
            
    # If recommendations are in unknown format, report which page failed:
    if len(recommendations_list_of_obj) == 0:
        print(f"no recommendations found in {minute_url}")
        
    # Extract text from objects and add them to df series:
    for recommendations_obj in recommendations_list_of_obj:
        recs.append(recommendations_obj.text)
#         print('recs', type(recs), len(recs))
    recommendations_s.append(recs)
    num_of_recommendations_s.append(len(recs))
#     print(recommendations_s)
    print(f"data from {minute_url} was appended")
    
print(len(full_text_s), 'minutes were scraped')

data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20210127.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20210317.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20210428.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20210616.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20210728.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20210922.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20211103.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20211215.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20200129.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes20200315.htm was appended
data from https://www.federalreserve.gov/monetarypolicy/fomcminutes202

In [8]:
print('meeting_dates', len(meeting_dates))
print('full_text_s', len(full_text_s))
print('recommendations_s', len(recommendations_s))
print('num_of_recommendations_s', len(num_of_recommendations_s))

meeting_dates 40
full_text_s 40
recommendations_s 40
num_of_recommendations_s 40


In [9]:
# Cast percent targets in recommendations as floats

def text_to_float(text):
    """Convert a number or simple fraction such as '1/4' to a float."""
    if '/' in text:
        numerator, denominator = text.split('/')
        return int(numerator) / int(denominator)
    return float(text)


def text_to_percent(text):
    """Convert a rate such as '0', '1/4' or '1-1/4' to a float by summing the hyphen-separated parts."""
    return sum(text_to_float(part) for part in text.split('-'))
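
For reference, the two helpers above turn strings such as `1/4` or `1-1/4` into decimal percentages by summing the parts around the hyphen. The sketch below reproduces the same idea with the standard library's `fractions.Fraction`. `parse_percent` is a hypothetical name, and the input is assumed to be limited to plain numbers, simple fractions, and hyphenated mixed fractions.

```python
from fractions import Fraction

def parse_percent(text):
    """Convert '0', '0.5', '1/4' or '1-1/4' to a float by summing the hyphen-separated parts."""
    return float(sum(Fraction(part) for part in text.strip().split("-")))

for raw in ["0", "1/4", "1-1/4", "2-1/2"]:
    print(raw, "->", parse_percent(raw))
```

Because the hyphen is treated as a separator between the whole and fractional parts, this approach would not handle negative rates.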

In [10]:
# Extract nominal targets from recommendations' text:

recs_s = recommendations_s

ffr_1 = []
ffr_2 = []
                                
for i in range(len(recs_s)):
    counter1 = 1
    ffrs = 0
#     print(ffrs)
#     print(f'''ffrs): {ffrs}, len(recs): {len(recs_s[i])}''')
    while counter1 <= len(recs_s[i]):
#         print(counter1)
        rec = recs_s[i][counter1-1].lower()
        
        # step 1 - Add federal fund rate to ffr_1 and ffr_2:

        if 'maintain the federal funds rate' in rec:
#             add_to_ffr(rec, ffr_1, ffr_2)                                     # TODO: export to a func 
#             print(i, 'found text in rec', counter)
            ffr = re.search(r'target range of (.+?) to (.+?) percent', rec)
    #         print(ffr.group(0), ffr.group(1), ffr.group(2))
            ffr_1.append(text_to_percent(ffr.group(1)))
            ffr_2.append(text_to_percent(ffr.group(2)))
            ffrs += 1 
        
        counter1 += 1
        
    # When there is more than one federal funds rate recommendation in the same meeting, take the widest range:
    if ffrs > 1:
        many_list = ffr_1[-ffrs:]
        del ffr_1[-ffrs:]
        ffr_1.append(min(many_list)) # keep lower value
        many_list = ffr_2[-ffrs:]
        del ffr_2[-ffrs:]
        ffr_2.append(max(many_list)) # keep higher value
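
The loop above assumes every meeting contains at least one matching directive. If the regular expression finds nothing, `ffr.group` raises an error, and a meeting with no matching recommendation would leave `ffr_1` and `ffr_2` shorter than the other series. A minimal sketch of a guarded extraction follows. `extract_target_range` and `to_percent` are hypothetical helpers, and the sample sentence is an illustrative paraphrase rather than a quotation from the scraped minutes.

```python
import re
from fractions import Fraction

RANGE_PATTERN = re.compile(r"target range of (.+?) to (.+?) percent")

def to_percent(text):
    """Sum hyphen-separated parts, so '1-1/4' becomes 1.25."""
    return float(sum(Fraction(part) for part in text.split("-")))

def extract_target_range(recommendation):
    """Return (low, high) for a federal funds rate directive, or None when no range is found."""
    rec = recommendation.lower()
    if "maintain the federal funds rate" not in rec:
        return None
    match = RANGE_PATTERN.search(rec)
    if match is None:
        return None
    return to_percent(match.group(1)), to_percent(match.group(2))

sample = ("Undertake open market operations as necessary to maintain the "
          "federal funds rate in a target range of 0 to 1/4 percent.")
print(extract_target_range(sample))
print(extract_target_range("Roll over at auction all principal payments."))
```

A caller could append `float('nan')` to both series whenever the helper returns `None` for every recommendation of a meeting, which keeps all columns the same length.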
        

In [11]:
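# Width of each meeting's target range (ffr_max - ffr_min); despite the name, this is not the change between meetings
ffr_change_s = [ffr_2[i] - ffr_1[i] for i in range(len(ffr_1))]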

In [12]:
print('meeting_dates', len(meeting_dates))
print('full_text_s', len(full_text_s))
print('recommendations_s', len(recommendations_s))
print('num_of_recommendations_s', len(num_of_recommendations_s))
print('num_of_ffrs_s', len(ffr_1), len(ffr_2))

meeting_dates 40
full_text_s 40
recommendations_s 40
num_of_recommendations_s 40
num_of_ffrs_s 40 40


In [13]:
fomc_data = pd.DataFrame(data = {'date': meeting_dates, 'full_text': full_text_s,
                                 'recommendations': recommendations_s, 'num_of_recommendations': num_of_recommendations_s,
                                 'ffr_min': ffr_1, 'ffr_max': ffr_2})
fomc_data = fomc_data.astype({'date':str, 'full_text':str, #'recommendations':str, 
                              'num_of_recommendations':int, 'ffr_min':float, 'ffr_max':float})
# fomc_data.set_index('date', inplace = True)

In [14]:
fomc_data.head()

,date,full_text,recommendations,num_of_recommendations,ffr_min,ffr_max
0,20210127,Minutes of the Federal Open Market Committee J...,[Undertake open market operations as necessary...,8,0.0,0.25
1,20210317,Minutes of the Federal Open Market Committee M...,[Undertake open market operations as necessary...,8,0.0,0.25
2,20210428,Minutes of the Federal Open Market Committee A...,[Undertake open market operations as necessary...,8,0.0,0.25
3,20210616,Minutes of the Federal Open Market Committee A...,[Undertake open market operations as necessary...,8,0.0,0.25
4,20210728,Minutes of the Federal Open Market Committee J...,[Undertake open market operations as necessary...,8,0.0,0.25


In [16]:
import os

def write_df_to_csv(df, destination_folder, file_name):
    print("writing", df.shape[0], "rows")
    # os.path.join inserts the path separator that plain string concatenation would omit
    file_path = os.path.join(destination_folder, file_name + ".csv")
    df.to_csv(file_path, index=False, encoding='utf-8-sig')
    print('done creating file')
    
file_name = "fomc_data"
destination_folder = "/Users/dannystatland/Drive/MBA/text_mining/final_project"
write_df_to_csv(fomc_data, destination_folder, file_name)

writing 40 rows
done creating file


#### Add S&P500 data from the days following each meeting

In [17]:
# Create a list of date objects from the meeting date strings (YYYYMMDD):
meeting_dates_obj_list = [datetime.datetime.strptime(x, '%Y%m%d').date() for x in meeting_dates]

In [18]:
meeting_dates_obj_list[:5]

[datetime.date(2021, 1, 27),
 datetime.date(2021, 3, 17),
 datetime.date(2021, 4, 28),
 datetime.date(2021, 6, 16),
 datetime.date(2021, 7, 28)]

In [19]:
# Create a df to hold the S&P500 values on the meeting dates and the consecutive days:

sp500_values = pd.DataFrame(data = {'date': meeting_dates_obj_list})
for i in range(0,8):
    sp500_values[f'day_{i}'] = np.nan
sp500_values.set_index('date', inplace = True)

In [20]:
sp500_values.head()

date,day_0,day_1,day_2,day_3,day_4,day_5,day_6,day_7
2021-01-27,,,,,,,,
2021-03-17,,,,,,,,
2021-04-28,,,,,,,,
2021-06-16,,,,,,,,
2021-07-28,,,,,,,,


In [21]:
# Add S&P500 values from FRED for each meeting's day and consecutive days:

for meeting_date in meeting_dates_obj_list:
    
    # Create a list of dates starting with the meeting day and ending 7 days later:
    all_dates = [meeting_date + timedelta(days=i) for i in range(0,8)]
    start = all_dates[0]
    end = all_dates[-1]
    
    # Retrieve FRED data for all days in list
    sp500_temp = web.DataReader('sp500', 'fred', start, end)
    
    # Change format of fred reply's index from datetime to date:
    f = lambda x: x.date()
    sp500_temp.index = [f(x) for x in sp500_temp.index]
    
    # For each date with a response, find its index in the consecutive days list and add it to the final df:
    for date_temp in sp500_temp.index:
        if date_temp in all_dates:
            indx = all_dates.index(date_temp)
            sp500_values.loc[meeting_date, sp500_values.columns[indx]] = sp500_temp.loc[date_temp].values[0]
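
The inner loop above locates each returned date in `all_dates` to decide which `day_i` column it fills. The same alignment can be expressed as a date subtraction, as in the sketch below. `align_to_offsets` is a hypothetical helper that works on a plain dict instead of the FRED response, and the prices are copied from the 2021-01-27 row of this notebook's output.

```python
import datetime

def align_to_offsets(meeting_date, prices_by_date, n_days=8):
    """Return a list where position i holds the price i calendar days after the meeting (None if missing)."""
    row = [None] * n_days
    for day, price in prices_by_date.items():
        offset = (day - meeting_date).days
        if 0 <= offset < n_days:
            row[offset] = price
    return row

meeting = datetime.date(2021, 1, 27)
prices = {
    datetime.date(2021, 1, 27): 3750.77,
    datetime.date(2021, 1, 28): 3787.38,
    datetime.date(2021, 1, 29): 3714.24,
    datetime.date(2021, 2, 1): 3773.86,
    datetime.date(2021, 2, 2): 3826.31,
    datetime.date(2021, 2, 3): 3830.17,
}
print(align_to_offsets(meeting, prices))
```

Positions 3 and 4 stay empty because they fall on a weekend, when the index has no value.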

In [22]:
sp500_values.head()

date,day_0,day_1,day_2,day_3,day_4,day_5,day_6,day_7
2021-01-27,3750.77,3787.38,3714.24,,,3773.86,3826.31,3830.17
2021-03-17,3974.12,3915.46,3913.1,,,3940.59,3910.52,3889.14
2021-04-28,4183.18,4211.47,4181.17,,,4192.66,4164.66,4167.59
2021-06-16,4223.7,4221.86,4166.45,,,4224.79,4246.44,4241.84
2021-07-28,4400.64,4419.15,4395.26,,,4387.16,4423.15,4402.66
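
With these values in place, the weekly direction label mentioned in the introduction could be derived by comparing `day_7` with `day_0`. The sketch below shows one possible labeling rule, stated as an assumption rather than the project's final choice: 1 when the index is higher seven calendar days after the meeting, 0 otherwise, and `None` when either value is missing. The two rows are copied from the output above.

```python
def direction_label(day_0, day_7):
    """Return 1 if the index rose over the week, 0 if not, None if a value is missing."""
    if day_0 is None or day_7 is None:
        return None
    return 1 if day_7 > day_0 else 0

# (day_0, day_7) pairs copied from the first two rows of the table.
rows = {
    "2021-01-27": (3750.77, 3830.17),
    "2021-03-17": (3974.12, 3889.14),
}
labels = {date: direction_label(d0, d7) for date, (d0, d7) in rows.items()}
print(labels)
```

Since `day_7` can fall on a market holiday, a fuller version might fall back to the last available value in the window.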
