# **ECB Text Score Software (Beta version)**

We acknowledge the support of Elia Landini, Jessie Cameron & Lina Abril (Pantheon-Sorbonne University) in the development of this project.

### **Introduction**

The project aims to conduct textual analysis of the European Central Bank's (ECB) Monetary Policy Statements through the deployment of Python-based software. These monetary policy decisions take place every six months and following the meeting, the President and the Vice President of the ECB explain the decision at the press conference and answer questions from journalists. Firstly, we will develop a web scraping script to extract textual data from the ECB's website. Subsequently, we'll use the Natural Language Toolkit (NLTK) package to preprocess the text, including tokenization, stemming, and converting words to lowercase. Next, the Loughran McDonald Sentiment Dictionary will be employed to transform the cleaned qualitative text data into a quantitative measure of the ECB's communication tone. This communication measure will then be regressed against the output gap and inflation gap, obtained via API, to assess the sensitivity of the ECB's communication to these macroeconomic variables. Throughout the project, we'll employ various visualization and analysis packages to explore the data and conduct preliminary analysis. Finally, we plan to develop a user-friendly interface for easy access and interpretation of our findings..

### **Preliminary steps**

In [132]:
!pip install pandas
!pip install matplotlib
!pip install requests-html
!pip install seaborn
!pip install numpy
!pip install schedule
!pip install statsmodels
!pip install reportlab
!pip install scipy
!pip install linearmodels
!pip install openai
!pip install selenium



### **ECB web scraping**

In [133]:
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
import re

In [134]:
# General scraping function
# Customize the function to scrape articles from the ECB/Eurostystem website within the folder named "All news & publications"
# The function is also designed to include filtering options to select specific text-based sources, according to topic, pubblication year, board member & categorization. 
# However, with the following scraping fucntion we cannot directly identify our target articles, as these latter are additionaly clustered into sub-folders. With general_get_articles we aim indeed to identify these mentioned folders (or indexes) to then extrapolate our target articles' URLs. 
# Base URL-ECB: https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?  

def general_get_articles_form1(topic, categorization, year):
    
    # Base URLs settings according to potential different inputs
    if topic=="All" and categorization=="All" and year=="All":
        url = f"https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?"
          
    elif topic=="All" and categorization=="All":
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?year={year}"

    elif topic=="All" and year=="All":
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?name_of_publication={categorization}"
        
    elif categorization=="All" and year=="All":
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?topic={topic}"

    else:
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?topic={topic}&name_of_publication={categorization}&year={year}"
    
    article_urls = []

    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    print("Scraping URL:", url)

    # Find and extract article URLs
    for link in soup.find_all("a", href=True):

        # To limit the research to the first 5 results, we may want to activate this loop
        # if len(article_urls) >= 5:  
            # break

        article_url = link["href"]
        print("Found link:", article_url)
        article_urls.append(article_url)

    return article_urls

In [135]:
# Running the general scraping function to get infos on available folders without restrictions on parameters 
topic = "All"
categorization = "All"
year = "All"

print(general_get_articles_form1(topic, categorization, year))

Scraping URL: https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?
Found link: /home/html/index.en.html
Found link: /home/html/index.en.html
Found link: index.bg.html
Found link: index.cs.html
Found link: index.da.html
Found link: index.de.html
Found link: index.el.html
Found link: index.en.html
Found link: index.es.html
Found link: index.et.html
Found link: index.fi.html
Found link: index.fr.html
Found link: index.ga.html
Found link: index.hr.html
Found link: index.hu.html
Found link: index.it.html
Found link: index.lt.html
Found link: index.lv.html
Found link: index.mt.html
Found link: index.nl.html
Found link: index.pl.html
Found link: index.pt.html
Found link: index.ro.html
Found link: index.sk.html
Found link: index.sl.html
Found link: index.sv.html
Found link: /home/html/index.en.html
Found link: #
Found link: /mopo/html/index.en.html
Found link: /mopo/html/index.en.html
Found link: /ecb/educational/explainers/tell-me/html/what-is-monetary-policy.en.html
Found link: /mop

In [136]:
# Once identified the folder of our interest, we proceed to modify the previous function to return only valuable results in terms of articles'URL retrieving
# Our target folder is "/press/press_conference/monetary-policy-statement/html/index.en.html", but we will construct a borader and customizable function to extract article folders, according to the realated field.

def general_get_articles_form2(topic, categorization, year, field):
    
    # Base URLs settings according to potential different inputs
    if topic=="All" and categorization=="All" and year=="All":
        url = f"https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?"
          
    elif topic=="All" and categorization=="All":
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?year={year}"

    elif topic=="All" and year=="All":
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?name_of_publication={categorization}"
        
    elif categorization=="All" and year=="All":
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?topic={topic}"

    else:
        base_url = "https://www.ecb.europa.eu/press"
        url = f"{base_url}/pubbydate/html/index.en.html?topic={topic}&name_of_publication={categorization}&year={year}"
    
    article_urls = []
    structural_url = "https://www.ecb.europa.eu/"

    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # Filter and extract articles URLs based on field
    for link in soup.find_all("a", href=True):
        article_url = link["href"]
        if f"/press/press_conference/{field}/html/" in article_url:
            article_urls.append(structural_url+article_url)

    return article_urls

In [137]:
# Extract monetary policy statements folder
topic = "All"
categorization = "All"
year = "All"
field = "monetary-policy-statement"

print(general_get_articles_form2(topic, categorization, year, field))

['https://www.ecb.europa.eu//press/press_conference/monetary-policy-statement/html/index.en.html']


In [138]:
# Monetary policy statements scraping function
# With the following function, we aim to retrieve and filter text-based sources concerning monetary policy decisions undertaken by the ECB itself throughout the year.  
# indirect base URL-ECB for our specific analysis (monetary policy statements): https://www.ecb.europa.eu/press/press_conference/monetary-policy-statement/html/index.en.html   

def mps_get_articles(topic, categorization, year, field):
    
    # Base URLs settings 
    base_url_list = general_get_articles_form2(topic, categorization, year, field)
    base_url = base_url_list[0]
    url = f"{base_url}"
    
    article_urls = []

    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    print("Scraping URL:", url)

    # Find and extract article URLs
    for link in soup.find_all("a", href=True, class_="title"):

        # To limit the research to the first 5 results, we may want to activate this loop
        # if len(article_urls) >= 5:  
            # break

        article_url = link["href"]
        print("Found link:", article_url)
        article_urls.append(article_url)
    
    return article_urls

In [139]:
topic = "All"
categorization = "All"
year = "All"
field = "monetary-policy-statement"

print(mps_get_articles(topic, categorization, year, field))

Scraping URL: https://www.ecb.europa.eu//press/press_conference/monetary-policy-statement/html/index.en.html
[]


In [140]:
# 1) Function to retrieve articles related to a specific topic from the internet and returns a list of article URLs.
def get_articles(topic):

    # Customize this function to scrape articles from preferred sources
    if topic == "Japan's GDP":
        url = "https://asia.nikkei.com/Economy/Japan-avoids-recession-as-strong-capex-boosts-Q4-GDP"
    elif topic == "Interest rates in Japan":
        url = "https://www.asahi.com/ajw/articles/15194959"
    elif topic == "Inflation rate in Japan":
        url = "https://mainichi.jp/english/articles/20240311/p2g/00m/0bu/011000c"
    else:
        raise ValueError("Invalid topic")
    
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # Find the first article URL associated with the topic
    article_link = soup.find("a", href=True)
    if article_link:
        article_url = article_link["href"]
        print("Found article URL:", article_url)
        return article_url
    else:
        print("No article URL found.")
        return None

    # Find and extract article URLs
    # article_url = soup.find("a", href=True)["href"]
    # article_urls = [link["href"] for link in soup.find_all("a", href=True)]
    #return article_url

In [141]:
# 2) Function to create a summary out of an article using ChatGPT
# https://github.com/openai/openai-python/blob/main/src/openai/types/chat/completion_create_params.py 

# here we include in a new function our previous output (article_urls) as a parameter
def summarize_article(article_url):

    # Fetch article content (the "response" function takes the HTML text from the URL)
    response = requests.get(article_url)
    soup = BeautifulSoup(response.content, "html.parser")
    article_text = soup.get_text()

    # Summarize using ChatGPT
    MODEL = "gpt-3.5-turbo"
    prompt = f"You are working for the Bank of Italy (Asia-Pacific delegation) and you are ask to summarize the following article in less than 30 words (mandatory), as part of a daily economic report on the Asia-Pacific area (in British English):\n{article_text}"
    summary = openai.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens = 20
    )
    return summary

In [142]:
# 3) Main function to summarize articles on specified topics
def main(topics):

    summarized_info = {}

    # for each topic, we retrieve its respective url
    for topic in topics:
        articles = get_articles(topic)
        summarized_info[topic] = []

    # for each url, we retrieve its respective summary
        for article_url in articles:
            summary = summarize_article(article_url)

            # we want to include in output both the source and its respective summary
            summarized_info[topic].append({
                "source": article_url,
                "summary": summary
            })
    return summarized_info

if __name__ == "__main__":

    # List of topics to search for articles
    topics = ["Inflation rate in Japan", "Interest rates in Japan", "Japan's GDP"]
    summarized_info = main(topics)
    print(summarized_info)

Found article URL: #main


MissingSchema: Invalid URL '#': No scheme supplied. Perhaps you meant https://#?

### CHECK - JESSIE 