# **FED Textual Analysis Software (Beta version)**

We acknowledge the support of Elia Landini, Jessie Cameron & Lina Avril (Pantheon-Sorbonne University) in the development of this project.

### **INTRODUCTION**

The project aims to conduct textual analysis of the Federal Reserve's (FED) Monetary Policy Reports using Python-based software. The report is submitted to Congress semi-annually and contains discussions of the "conduct of monetary policy and economic developments and prospects for the future."

First, we develop a web scraping script to extract textual data from the FED's website. Subsequently, we use the Natural Language Toolkit (NLTK) package to preprocess the text, including tokenization, stemming, and converting words to lowercase. Next, the Loughran-McDonald Sentiment Dictionary is employed to transform the cleaned qualitative text data into a quantitative measure of the FED's communication tone. This communication measure is then regressed against the output gap and inflation gap, obtained via API, to assess the sensitivity of the FED's communication to these macroeconomic variables. Throughout the project, we employ various visualisation and analysis packages to explore the data and conduct preliminary analysis.

Finally, time permitting, we plan to develop a user-friendly interface for easy access to and interpretation of our findings.

### **INSTALL PACKAGES**

In [3]:
!pip install pandas
!pip install matplotlib
!pip install requests-html
!pip install seaborn
!pip install numpy
!pip install schedule
!pip install statsmodels
!pip install reportlab
!pip install scipy
!pip install linearmodels
!pip install openai
!pip install fredapi



### **1. IMPORT TEXTUAL DATA: FED WEB SCRAPING**

The following function retrieves and filters text-based sources on the monetary policy decisions taken by the FED and released to the public at semiannual press conferences.  
Customize the function to scrape articles from the Federal Reserve website within the folder named "Monetary Policy Report".  
The function also includes filtering options to select specific text-based sources by topic and publication type. In our case, however, we are interested only in the semiannual reports on monetary policy.  
Base URL-FED: https://www.federalreserve.gov/monetarypolicy/publications/mpr_default.htm

In [4]:
import requests
from bs4 import BeautifulSoup

In [20]:
# FED scraping function

def fed_get_articles(topic, publication_type, sub_class):
    """Return the URLs linked from the FED Monetary Policy Report page that contain `sub_class`."""

    # Base URL settings
    base_url = f"https://www.federalreserve.gov/{topic}/{publication_type}/mpr_default.htm"
    base_domain = "https://www.federalreserve.gov"

    article_urls = []

    # Extract all the links available on the base page using the requests and BeautifulSoup packages
    response = requests.get(base_url)
    soup = BeautifulSoup(response.content, "html.parser")
    print("Scraping URL:", base_url)

    # Find and filter article URLs, discarding structural links that are not relevant to the analysis
    # We are interested only in full report publications, not summaries. In our case the full reports are stored under the sub-class "testimony"
    # Note that the format of the publication URLs has changed slightly over time (from 2016 onwards)
    for link in soup.find_all("a", href=True):
        article_url = link["href"]

        # To limit the search to the first 5 results, uncomment the following check
        # if len(article_urls) >= 5:
        #     break

        # Turn relative links into absolute URLs
        if not article_url.startswith("http"):
            article_url = base_domain + article_url

        if sub_class in article_url:
            article_urls.append(article_url)

    # Drop the link to the general folder page, which is not an individual publication
    folder_url = f"https://www.federalreserve.gov/newsevents/{sub_class}.htm"
    article_urls = [url for url in article_urls if url != folder_url]

    return article_urls

In [21]:
# Retrieving FED semiannual reports' URLs through the previous function by specifying the value of each parameter to fit our research scope
topic = "monetarypolicy"
publication_type = "publications"
sub_class = "testimony"

print(fed_get_articles(topic, publication_type, sub_class))

Scraping URL: https://www.federalreserve.gov/monetarypolicy/publications/mpr_default.htm
['https://www.federalreserve.gov/newsevents/testimony/powell20240306a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20230307a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20230621a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20220302a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20220622a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20210223a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20210714a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20200211a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20200616a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20190226a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20190710a.htm', 'https://www.federalreserve.gov/newsevents/testimony/powell20180226a.htm', 'https://w

### **Retrieving text-based sources**

In [7]:
import requests
from bs4 import BeautifulSoup

In [22]:
# Retrieve the text of each article from its URL
def fed_article_txt(article_urls):
    articles_text = []

    for article_url in article_urls:

        # Fetch the page at article_url and parse its HTML to extract the visible text
        response = requests.get(article_url)
        soup = BeautifulSoup(response.content, "html.parser")
        article_text = soup.get_text()
        articles_text.append(article_text)
    
    return articles_text

In [30]:
# Article 1 example
article_urls = fed_get_articles(topic, publication_type, sub_class)[0:1]
print(fed_article_txt(article_urls))

Scraping URL: https://www.federalreserve.gov/monetarypolicy/publications/mpr_default.htm
['\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nTestimony by Chair Powell on the semiannual Monetary Policy Report to the Congress  - Federal Reserve Board\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSkip to main content\n\n\n\nBack to Home\nBoard of Governors of the Federal Reserve System\n\n\nStay Connected\n\n\n\n\nFederal Reserve Facebook Page\n\n\n\n\nFederal Reserve Instagram Page\n\n\n\n\nFederal Reserve YouTube Page\n\n\n\n\nFederal Reserve Flickr Page\n\n\n\n\nFederal Reserve LinkedIn Page\n\n\n\n\nFederal Reserve Threads Page\n\n\n\n\nFederal Reserve Twitter Page\n\n\n\n\nSubscribe to RSS\n\n\n\n\nSubscribe to Email\n\n\n\n\nRecent Postings\n\n\nCalendar\n\n\nPublications\n\n\nSite Map\n\n\nA-Z index\n\n\nCareers\n\n\nFAQs\n\n\nVideos\n\n\nContact\n\n\n\n\nSearch\n\n\n\nSubmit Search Button\n\n\n\n\n\nAdvanced\n\n\n\n\nToggle Dropdown Menu\n\n\n\n\n\n\n\nBoard of Governors of the Federal Reserve System\r\n     

### **2. PREPARE AND CLEAN TEXTUAL DATA**

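This section cleans the data so that it is ready for analysis. It involves removing paragraph headings, making textual adjustments, converting the text to lowercase, removing punctuation and stopwords, and lemmatising the remaining words.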
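As a rough illustration of the kind of cleaning covered in the subsections below, the following sketch lowercases a raw string, strips punctuation and drops stop words using only the standard library. The function name `clean_text` and the tiny `STOPWORDS` set are hypothetical and chosen for illustration; the project itself would more likely rely on NLTK's tokenizer, stop-word list and lemmatiser.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a full list (for example NLTK's) would be used in practice.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "that", "has"}


def clean_text(raw_text):
    """Lowercase the text, strip punctuation and drop stop words; return a list of tokens."""
    # Collapse the runs of newlines and spaces left behind by soup.get_text()
    text = re.sub(r"\s+", " ", raw_text).strip()
    # Convert to lowercase
    text = text.lower()
    # Remove ASCII punctuation characters
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Tokenise on whitespace and remove stop words
    return [token for token in text.split() if token not in STOPWORDS]


sample = "\n\nInflation has eased, and the labor market is strong.\n"
print(clean_text(sample))
```

Lemmatisation is left out of this sketch because it needs an external resource such as WordNet.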

#### 2.1 Remove Paragraph Headings

#### 2.2 Textual Adjustments

#### 2.3 Convert Text to Lowercase

#### 2.4 Remove Punctuation

#### 2.5 Remove Stopwords

#### 2.6 Lemmatisation

### **3. PRELIMINARY ANALYSIS OF TEXTUAL DATA**

In [18]:
# Jessie - I have added this to help with structure.
# These analyses need to be done BEFORE Elia's cleaning, using only the raw textual data
# Update the markdown below once you add all the textual analysis variables (the ones in the thesis file and any others you think of!)
# Describe the dictionary used for stopwords etc. and each line of code - the more comments the better


This section focuses on creating new variables to analyse the textual data from the Federal Reserve (FED). Specifically, it calculates:

**1. Word Count**: Total number of words per statement.

**2. Sentence Count**: Total number of sentences per statement.

**3. Average Words per Sentence**: Average number of words per sentence.

**X. Ratio of Complex Words**: Share of complex words (words with three or more syllables) to the total word count.

**X. Ratio of Stop Words**: Proportion of stop words (common words like "and", "the", "is", etc.) to the total word count.

---

After computing these variables, the data are summarised using descriptive statistics tables and visually. This exploratory analysis provides insights into the textual characteristics of the FED data before conducting regression analysis.
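The variables listed above could be computed along the following lines. This is a minimal sketch under stated assumptions: the names `text_variables` and `count_syllables` and the short `STOP_WORDS` set are hypothetical, sentences are split naively on terminal punctuation, and syllables are approximated by counting vowel groups rather than by a proper syllabifier.

```python
import re

# Hypothetical short stop-word list, for illustration only
STOP_WORDS = {"and", "the", "is", "a", "of", "to", "in"}


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (not a linguistic syllabifier)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_variables(text):
    """Compute the basic textual variables for one raw statement."""
    # Naive sentence split on '.', '!' or '?'; abbreviations such as "U.S." are over-split
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_sentences = len(sentences)
    # Complex words: three or more (approximate) syllables
    n_complex = sum(count_syllables(w) >= 3 for w in words)
    n_stop = sum(w.lower() in STOP_WORDS for w in words)
    return {
        "word_count": n_words,
        "sentence_count": n_sentences,
        "avg_words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
        "complex_word_ratio": n_complex / n_words if n_words else 0.0,
        "stop_word_ratio": n_stop / n_words if n_words else 0.0,
    }


example = "Inflation is moderating. The economy is growing."
print(text_variables(example))
```

Applying `text_variables` to each scraped statement would give one row of variables per statement, which can then be collected into a DataFrame for the descriptive statistics.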


#### 3.1 Create Textual Variables 

#### 3.1.1 Number of Meeting Minutes

#### 3.1.2 Word Count for Each Statement

#### 3.1.3 Sentence Count for Each Statement

#### 3.1.X Stop Words in Each Statement

#### 3.1.X Other Variables....

#### 3.2 Descriptive Statistics

In [None]:
# Summary table of data
df.describe().round(2)

#### 3.3 Visualisation of Textual Variables

In [2]:
## Create charts - maybe instead of single charts you could find a package which lets you view charts side-by-side? Could make the code look cleaner? 
# e.g. word count over time, stop word ratio over time etc.

#### 3.4 Word Frequency

In [None]:
# table with top 20 words, word cloud map
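One possible way to build the top-20 table is with `collections.Counter` from the standard library. The helper name `top_words` and the two sample strings are hypothetical; in practice the input would be the list of article texts, ideally after stop-word removal so that common function words do not dominate the ranking.

```python
from collections import Counter
import re


def top_words(texts, n=20):
    """Return the n most frequent lowercase words across a list of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then extract alphabetic tokens (apostrophes kept)
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


docs = ["Inflation remains elevated.", "Inflation expectations remain anchored."]
for word, freq in top_words(docs, n=3):
    print(f"{word:<15}{freq}")
```

The resulting list of `(word, count)` pairs can be passed to `pd.DataFrame` for a table, or used as the frequency input for a word cloud.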

### **4. CREATE COMMUNICATION VARIABLES**

This section converts the qualitative text data into quantitative measures of readability and sentiment for analysis.

#### 4.1 Readability Measure

#### 4.2 Sentiment Measure
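The introduction describes using the Loughran-McDonald Sentiment Dictionary to turn the cleaned text into a tone measure. A minimal sketch of one common dictionary-based construction is shown below. The net-tone formula, the function name `tone_score` and the two miniature word lists are assumptions made for illustration; they stand in for the full Loughran-McDonald positive and negative categories and are not taken from the project.

```python
import re

# Hypothetical mini word lists standing in for the Loughran-McDonald
# "positive" and "negative" categories; the real dictionary is far larger.
POSITIVE = {"strong", "improve", "improved", "gains", "stable"}
NEGATIVE = {"weak", "decline", "declined", "losses", "uncertain"}


def tone_score(text):
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Texts with no dictionary words are treated as neutral
    return (pos - neg) / total if total else 0.0


print(tone_score("Labor market gains were strong, but the outlook is uncertain."))
```

Scaling by the number of dictionary hits keeps the score comparable across statements of different lengths; dividing by the total word count instead is an equally plausible alternative.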

### **5. IMPORT MACROECONOMIC DATA**

This section imports the macroeconomic variables from FRED (Federal Reserve Economic Data) using the `pandas_datareader` package.

In [10]:
# Install Required Packages
!pip install pandas_datareader
import pandas as pd
from pandas_datareader import fred



In [13]:
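# Grab data using FredReader
# GDPC1 is quarterly real GDP; FPCPITOTLZGUSA is annual consumer price inflation,
# so the inflation column is NaN on dates where only the quarterly series has a value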
data = fred.FredReader(symbols=['GDPC1', 
                                'FPCPITOTLZGUSA'], 
                       start='1900-01-01', 
                       end=None).read()

# Save the data to a CSV file
data.to_csv('macro_vars.csv')

# Read the data back from the CSV file to a DataFrame
df_macro_vars = pd.read_csv('macro_vars.csv')

# Check data has imported correctly
print(df_macro_vars.tail())

           DATE      GDPC1  FPCPITOTLZGUSA
304  2023-01-01  22112.329             NaN
305  2023-04-01  22225.350             NaN
306  2023-07-01  22490.692             NaN
307  2023-10-01  22679.255             NaN
308  2024-01-01  22768.866             NaN


In [17]:
# New names for the columns of df_macro_vars, in their current order
new_column_names = ['date', 'GDP', 'CPI']

# Rename the columns
df_macro_vars = df_macro_vars.rename(columns=dict(zip(df_macro_vars.columns, new_column_names)))
df_macro_vars.head(5)

         date       GDP  CPI
0  1947-01-01  2182.681  NaN
1  1947-04-01  2176.892  NaN
2  1947-07-01  2172.432  NaN
3  1947-10-01  2206.452  NaN
4  1948-01-01  2239.682  NaN


### **6. DESCRIPTIVE STATISTICS**

In [None]:
This section does... 

#### 6.1 Summary Table

#### 6.2 Density charts 

#### 6.3 Correlation Analysis

#### 6.4 Visualisation of Variables

### **7. REGRESSION ANALYSIS**

Description of model equations... 

#### 7.1 Model Specification
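The introduction describes regressing the communication measure on the output gap and the inflation gap. A baseline specification consistent with that description might look as follows. The symbols, the linear form and the contemporaneous timing are assumptions made for illustration, not the project's final model.

```latex
\text{Tone}_t = \alpha + \beta_1 \, \text{OutputGap}_t + \beta_2 \, \text{InflationGap}_t + \varepsilon_t
```

Here $\text{Tone}_t$ is the communication measure for the report published in period $t$, and $\beta_1$ and $\beta_2$ capture the sensitivity of the FED's communication to the two macroeconomic gaps.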

#### 7.2 Robustness Tests