In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin 
import re
import time
import nltk
from nltk.tokenize import word_tokenize
import warnings
pd.set_option("display.precision", 5)
warnings.filterwarnings("ignore")

# HW 1 TOPICS IN ECONOMICS - GROUP 2

<span style="color:blue">Aman Krishna </span> <br>
<br>
<span style="color:#406A5F">Tim Taylor </span> <br>
<br>
<span style="color:purple">Yazmin Ramirez Delgado </span>

## 1 Measuring Hawkish/Dovish Tone of FOMC Statements (100 points)

### 1. 

Scrape the text of the FOMC statements from January 2000 to present. You will need to use https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm for 2018-2023 and https://www.federalreserve.gov/monetarypolicy/fomc_historical_year.htm for 2000-2017. See Appendix A for hints on scraping.

(a) How many statements do you obtain?

In [19]:
import requests
from bs4 import BeautifulSoup
import statistics

# Collect candidate links from the current FOMC calendar page (2018-2023)
def scrape_statements_new(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    statements = []

    # Keep links whose href contains "fomcpresconf"
    # (on the calendar page this substring appears in press-conference URLs)
    links = soup.find_all('a', href=True)
    for link in links:
        if "fomcpresconf" in link['href']:
            statement_url = urljoin(url, link['href'])  # Combine with base URL
            statements.append(statement_url)

    return statements

# Collect statement links from the archived yearly pages (2000-2017)
def scrape_statements_old(url="https://www.federalreserve.gov/monetarypolicy/fomchistorical", year=2017):
    url = url + str(year) + ".htm"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    statements = []
    for link in soup.find_all('a', href=True):
        if link.text.lower() == "statement":
            statement_url = urljoin(url, link['href'])
            statements.append(statement_url)
            
    return statements

# Function to calculate the number of words in a text
def count_words(text):
    words = text.split()
    return len(words)

# Function to calculate summary statistics
def calculate_summary_statistics(word_counts):
    mean = statistics.mean(word_counts)
    stdev = statistics.stdev(word_counts)
    minimum = min(word_counts)
    q1 = statistics.quantiles(word_counts, n=4)[0]
    median = statistics.median(word_counts)
    q3 = statistics.quantiles(word_counts, n=4)[2]
    maximum = max(word_counts)
    
    return mean, stdev, minimum, q1, median, q3, maximum

# Define the base URLs for the different time periods
base_url_2000_2017 = "https://www.federalreserve.gov/monetarypolicy/fomchistorical"
base_url_2018_2023 = "https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"

# Scrape statements for 2018-2023
statements_2018_2023 = scrape_statements_new(base_url_2018_2023)
statements_2000_2017 = []
for year in range(2000, 2018):
    statements_2000_2017 += scrape_statements_old(base_url_2000_2017, year)

# Combine the statements from both time periods
all_statements = statements_2000_2017 + statements_2018_2023

print("Number of statements: ", len(all_statements))

Number of statements:  196
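
The filter in `scrape_statements_new` keys on the substring `fomcpresconf`, which on the calendar page appears in press-conference links. A minimal sketch of a stricter filter follows, assuming statement pages follow the pattern `/newsevents/pressreleases/monetaryYYYYMMDDa.htm`. That pattern is an assumption about the site layout and should be checked against the live page. The helper names `is_statement_link` and `filter_statement_urls` are hypothetical, and the sample hrefs are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Assumed URL pattern for statement pages; verify against the live calendar page.
STATEMENT_HREF = re.compile(r"/newsevents/pressreleases/monetary\d{8}a\.htm$")

def is_statement_link(href):
    """Return True if href matches the assumed statement-page pattern."""
    return bool(STATEMENT_HREF.search(href))

def filter_statement_urls(base_url, hrefs):
    """Resolve hrefs against base_url, keep statement-like links, drop duplicates."""
    kept = []
    for href in hrefs:
        if is_statement_link(href):
            full_url = urljoin(base_url, href)
            if full_url not in kept:
                kept.append(full_url)
    return kept

# Illustrative hrefs only (not scraped output)
sample_hrefs = [
    "/newsevents/pressreleases/monetary20230201a.htm",   # statement (assumed format)
    "/newsevents/pressreleases/monetary20230201a1.htm",  # implementation note
    "/monetarypolicy/fomcpresconf20230201.htm",          # press conference
    "/monetarypolicy/files/monetary20230201a1.pdf",      # PDF copy
]
base = "https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"
statement_urls = filter_statement_urls(base, sample_hrefs)
print(statement_urls)
```

Swapping a filter like this into the scraper would change the set of pages downloaded, so the statement count and word-count statistics would need to be recomputed.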


(b) Provide summary statistics (mean, standard deviation, minimum, first quartile, median, third quartile, maximum) for the number of words in each statement.

In [22]:
# Initialize a list to store word counts for each statement
word_counts = []

# Loop through the list of statement URLs, download the statements, and count words
for statement_url in all_statements:
    statement_response = requests.get(statement_url)
    statement_soup = BeautifulSoup(statement_response.text, 'html.parser')
    statement_text = statement_soup.get_text()
    word_count = count_words(statement_text)
    word_counts.append(word_count)

# Calculate summary statistics
mean, stdev, minimum, q1, median, q3, maximum = calculate_summary_statistics(word_counts)

# Print the summary statistics
print("Summary Statistics for the Number of Words in FOMC Statements:")
print(f"Mean: {mean}")
print(f"Standard Deviation: {stdev}")
print(f"Minimum: {minimum}")
print(f"First Quartile: {q1}")
print(f"Median: {median}")
print(f"Third Quartile: {q3}")
print(f"Maximum: {maximum}")

Summary Statistics for the Number of Words in FOMC Statements:
Mean: 1112.0561224489795
Standard Deviation: 546.4454667358578
Minimum: 160
First Quartile: 334.0
Median: 1208.5
Third Quartile: 1531.75
Maximum: 1950
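
As a cross-check, the same seven statistics can be read off `pandas.Series.describe()`. The sketch below uses made-up word counts, not the scraped values. Quartiles can differ slightly between the two approaches: `statistics.quantiles` defaults to `method='exclusive'`, while pandas interpolates linearly over the observed range, which corresponds to `method='inclusive'`.

```python
import statistics
import pandas as pd

# Made-up word counts, for illustration only
example_counts = [100, 200, 300, 400, 500]

# describe() reports count, mean, std (ddof=1), min, 25%, 50%, 75%, max
summary = pd.Series(example_counts, name="words").describe()
print(summary)

# The two quartile conventions, side by side
q1_exclusive = statistics.quantiles(example_counts, n=4)[0]
q1_inclusive = statistics.quantiles(example_counts, n=4, method="inclusive")[0]
print("Q1 (exclusive):", q1_exclusive)
print("Q1 (inclusive):", q1_inclusive)
```

Either convention is defensible; what matters is stating which one the reported quartiles use.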


### 2.

Use the methodology described in section 3.1 (pages 4-8) of Tadle (2022) to measure the tone of each statement. Download the Fed Funds Effective Rate from https://fred.stlouisfed.org/series/DFF. Plot both the statement tone and the Fed Funds Effective Rate over time.
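
A dictionary-based tone score of this general kind can be sketched as follows. This is a simplified illustration, not a reproduction of the Tadle (2022) procedure. The four word lists are short placeholders rather than the paper's dictionaries, and the scoring rule (topic sign times modifier polarity, averaged over scored sentences) is an assumption made for this example. The function names are hypothetical.

```python
import re

# Placeholder word lists (assumptions for illustration, not the paper's dictionaries)
HAWKISH_KEYWORDS = {"inflation", "prices", "tightening"}
DOVISH_KEYWORDS = {"unemployment", "slack", "easing"}
POSITIVE_TERMS = {"increase", "increased", "rising", "higher", "strong"}
NEGATIVE_TERMS = {"decrease", "decreased", "falling", "lower", "weak"}

def sentence_score(sentence):
    """Score one sentence as +1 (hawkish), -1 (dovish), 0 (neutral), or None (not scored)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    has_hawk = any(t in HAWKISH_KEYWORDS for t in tokens)
    has_dove = any(t in DOVISH_KEYWORDS for t in tokens)
    if has_hawk == has_dove:
        return None  # no keyword, or both topics mixed in one sentence
    pos = sum(t in POSITIVE_TERMS for t in tokens)
    neg = sum(t in NEGATIVE_TERMS for t in tokens)
    polarity = (pos > neg) - (neg > pos)  # +1, 0, or -1
    topic = 1 if has_hawk else -1
    return topic * polarity

def document_tone(text):
    """Average sentence score over scored sentences; 0.0 if none are scored."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scores = [s for s in map(sentence_score, sentences) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0

hawkish_text = "Inflation has moved higher. Unemployment is lower. The Committee met today."
dovish_text = "Inflation is falling. Unemployment has increased."
print(document_tone(hawkish_text), document_tone(dovish_text))
```

The resulting score lies in [-1, 1], with positive values read as hawkish. Negation handling and multi-word phrases are deliberately left out of this sketch.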

### 3.

Comment on the Tadle (2022) methodology. What do you like about it? What are its shortcomings?

### 4.

Describe and implement a different way to measure hawkish/dovish tone of FOMC statements. How does your alternative measure address some of the shortcomings in the Tadle (2022) method? What is the correlation between the Tadle (2022) measure and your measure?
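
The correlation between two tone measures can be computed once both are aligned on the statement dates. The sketch below uses invented values and hypothetical series names (`tone_dictionary`, `tone_alternative`); only the alignment-then-correlate pattern is the point.

```python
import pandas as pd

# Invented dates and tone values, for illustration only
dates = pd.to_datetime(["2020-01-29", "2020-03-15", "2020-04-29", "2020-06-10"])
tone_a = pd.Series([0.2, -0.6, -0.4, -0.1], index=dates, name="tone_dictionary")
tone_b = pd.Series([0.1, -0.5, -0.5, 0.0], index=dates, name="tone_alternative")

# Align on dates and drop statements missing either measure
both = pd.concat([tone_a, tone_b], axis=1).dropna()

# Pearson correlation between the two measures
corr = both["tone_dictionary"].corr(both["tone_alternative"])
print(f"Pearson correlation: {corr:.3f}")
```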

### 5.

Redo the plot from problem 2, adding your tone measure. Your plot should include the Tadle (2022) measure, your measure and the Fed Funds Effective Rate.

### 6.

Complete this problem twice: once with the Tadle (2022) measure of hawkishness and once with your measure. The steps will guide you through using the Fama and MacBeth (1973) procedure to estimate the monetary policy risk premium in industry returns data. In particular:

(a) Estimate an AR(1) model from the hawkishness data. Compute the residual. We will call this the “text-based monetary policy shock.”
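
Step (a) can be sketched with plain OLS of the tone series on its own first lag. The series below is simulated with an arbitrary persistence of 0.7, and the function name `ar1_residuals` is hypothetical; the actual hawkishness series would take the place of `hawk`.

```python
import numpy as np

def ar1_residuals(series):
    """Fit y_t = c + phi * y_{t-1} + e_t by OLS; return (c, phi, residuals)."""
    y = np.asarray(series, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # intercept and first lag
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ coef
    return coef[0], coef[1], resid

# Simulated AR(1) data (assumed parameters, for illustration only)
rng = np.random.default_rng(0)
n = 500
hawk = np.zeros(n)
for t in range(1, n):
    hawk[t] = 0.7 * hawk[t - 1] + rng.normal(scale=0.1)

c_hat, phi_hat, shocks = ar1_residuals(hawk)
print(f"phi estimate: {phi_hat:.3f}, number of shocks: {len(shocks)}")
```

The residual vector has one fewer element than the input, since the first observation has no lag. The first statement therefore has no shock and drops out of the later regressions.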

(b) Use the value-weighted returns from the daily industry returns file for this problem. For each of the 49 industries, regress returns on the day of the Fed announcement on the text-based monetary policy shock. Create a table with three columns: column 1 has the industry name, column 2 has the OLS regression coefficient and column 3 has the p-value for that coefficient. Sort the table from largest to smallest coefficient. To be concrete, the regression you are running at this stage is:

$ R_{i,t} = \alpha_{i} + \beta_{i} \Delta \mathit{HAWK}_{t} + \epsilon_{i,t} $

for each of the 49 industries, indexed by $i$. Time $t$ here indexes Fed announcement days and $\Delta \mathit{HAWK}_{t}$ is the text-based monetary policy shock at time $t$.
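
The per-industry regressions and the sorted table can be sketched as below. The data are simulated and the industry labels (`Ind_A`, `Ind_B`, `Ind_C`) are hypothetical stand-ins for the 49 industries in the returns file. The p-value uses a large-sample normal approximation to keep the example dependency-free; a Student-t p-value from a regression package would be the exact counterpart.

```python
import math
import numpy as np
import pandas as pd

def first_stage_table(returns, shock):
    """Regress each industry's announcement-day return on the shock.

    returns: DataFrame (announcement days x industries); shock: Series on the same index.
    """
    x = shock.to_numpy(dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    rows = []
    for industry in returns.columns:
        y = returns[industry].to_numpy(dtype=float)
        coef = xtx_inv @ X.T @ y                      # [alpha_i, beta_i]
        resid = y - X @ coef
        sigma2 = resid @ resid / (len(y) - 2)         # residual variance
        se_beta = math.sqrt(sigma2 * xtx_inv[1, 1])
        t_stat = coef[1] / se_beta
        p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation
        rows.append({"industry": industry, "beta": coef[1], "p_value": p_value})
    table = pd.DataFrame(rows)
    return table.sort_values("beta", ascending=False).reset_index(drop=True)

# Simulated example (assumed betas of -1.0, 0.0 and 0.5)
rng = np.random.default_rng(0)
shock = pd.Series(rng.normal(size=200))
returns = pd.DataFrame({
    "Ind_A": -1.0 * shock + rng.normal(scale=0.5, size=200),
    "Ind_B": 0.0 * shock + rng.normal(scale=0.5, size=200),
    "Ind_C": 0.5 * shock + rng.normal(scale=0.5, size=200),
})
table = first_stage_table(returns, shock)
print(table)
```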

(c) Comment on the ordering of the industries. Is it in line with what you would have expected?

(d) Now turn to the monthly returns data. Again use the value-weighted returns. Separately for each month, regress returns of each industry on its “beta” from step (b). To be concrete, for each month, indexed by $T$, you are running the following regression:

$ R_{i,T} = \gamma_{T} + \lambda_{T} \beta_{i} + \eta_{i,T} $

(e) What is the average $\lambda$ across all months? This is the risk premium associated with holding assets exposed to monetary policy risk. Comment on how its sign can be interpreted.

(f) What is the standard deviation of $\lambda$? Use this to compute the $t$-statistic. Is the risk premium significantly different from zero at the 10%, 5% or 1% level?
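
Steps (d) through (f) can be sketched as one cross-sectional OLS per month, followed by the time-series mean, standard deviation, and $t$-statistic of the monthly slopes. The data below are simulated with an arbitrary true premium of 0.3, and the function name `fama_macbeth_lambda` is hypothetical. The sketch assumes no missing returns.

```python
import numpy as np
import pandas as pd

def fama_macbeth_lambda(monthly_returns, betas):
    """Second-stage Fama-MacBeth regressions.

    monthly_returns: DataFrame (months x industries); betas: Series indexed by industry.
    """
    b = betas.reindex(monthly_returns.columns).to_numpy(dtype=float)
    X = np.column_stack([np.ones_like(b), b])
    # lstsq solves every month's cross-sectional regression at once:
    # the right-hand side has one column per month.
    Y = monthly_returns.to_numpy(dtype=float).T
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    lambdas = pd.Series(coefs[1], index=monthly_returns.index, name="lambda")
    lam_mean = lambdas.mean()
    lam_std = lambdas.std(ddof=1)
    t_stat = lam_mean / (lam_std / np.sqrt(len(lambdas)))
    return lambdas, lam_mean, lam_std, t_stat

# Simulated example: 49 industries, 120 months, assumed true lambda of 0.3
rng = np.random.default_rng(1)
industries = [f"Ind_{k}" for k in range(49)]
betas = pd.Series(rng.normal(size=49), index=industries)
months = pd.period_range("2010-01", periods=120, freq="M")
noise = rng.normal(size=(120, 49))
monthly_returns = pd.DataFrame(0.5 + 0.3 * betas.to_numpy() + noise,
                               index=months, columns=industries)

lambdas, lam_mean, lam_std, t_stat = fama_macbeth_lambda(monthly_returns, betas)
print(f"mean lambda: {lam_mean:.3f}, std: {lam_std:.3f}, t-stat: {t_stat:.2f}")
```

The $t$-statistic here divides the mean by the standard error `std / sqrt(number of months)`, and it is compared with the usual two-sided critical values (about 1.645, 1.96, and 2.576 in large samples) for the 10%, 5%, and 1% levels.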

## Reading Response (15 points)

### Optional for extra credit. Must be completed individually.

Read Aruoba and Drechsel (2022) and answer the following questions.

1. Describe the natural language processing technique that the authors use. What do you like about it? What are its shortcomings?

2. Summarize (in your own words) the findings discussed in Section 4.

3. What questions do you have after reading the paper?