# **ECB Text Score Software (Beta version)**

We acknowledge the support of Elia Landini, Jessie Cameron & Lina Avril (Pantheon-Sorbonne University) in the development of this project.

### **Introduction**

This project uses Python-based software to analyse the text of the European Central Bank's (ECB) Monetary Policy Statements. Monetary policy decisions are taken every six weeks; after each meeting, the President and the Vice-President of the ECB explain the decision at a press conference and answer questions from journalists. First, we will develop a web scraping script to extract textual data from the ECB's website. We will then use the Natural Language Toolkit (NLTK) package to preprocess the text: tokenization, stemming, and conversion to lowercase. Next, the Loughran-McDonald Sentiment Dictionary will be used to turn the cleaned text into a quantitative measure of the ECB's communication tone. This measure will be regressed on the output gap and the inflation gap, both obtained via API, to assess how sensitive the ECB's communication is to these macroeconomic variables. Throughout the project, we will use visualization and analysis packages to explore the data and conduct preliminary analysis. Finally, we plan to develop a user-friendly interface for accessing and interpreting our findings.
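The step from cleaned text to a tone measure can be sketched as follows. This is a minimal illustration, not the project's implementation: the two word sets are hypothetical stand-ins for the Loughran-McDonald lists, a regular expression replaces the NLTK tokenizer so the sketch has no dependencies, stemming is omitted, and the net-tone formula (positive minus negative hits, divided by total hits) is an assumption, since the text does not specify how the score is computed.

```python
import re

# Hypothetical stand-ins for the Loughran-McDonald word lists; the real
# dictionary is far larger and has to be obtained separately.
POSITIVE = {"improve", "strong", "gain", "stable"}
NEGATIVE = {"weak", "risk", "decline", "uncertain"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tone_score(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no word matches."""
    tokens = tokenize(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Illustrative sentence, not an actual ECB statement
statement = "Growth remains weak and the outlook is uncertain, but labour markets are strong."
print(tone_score(statement))
```

Under this assumed formula the score lies between -1 (only negative words matched) and 1 (only positive words matched), which makes statements of different lengths comparable.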

### **Preliminary steps**

In [1]:
!pip install pandas
!pip install matplotlib
!pip install requests-html
!pip install seaborn
!pip install numpy
!pip install schedule
!pip install statsmodels
!pip install reportlab
!pip install scipy
!pip install linearmodels
!pip install openai
!pip install wbgapi
!pip install fredapi
!pip install requests beautifulsoup4 nltk



### **ECB web scraping**

In [3]:
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta

In [None]:
def get_articles1(topic):
    """Return the URLs of articles on `topic` published on the selected date."""

    # Customize the function to scrape articles from your preferred sources
    # Example source URL: https://japannews.yomiuri.co.jp
    urls = [
        f"https://japannews.yomiuri.co.jp/?s={topic}",
        f"https://www.asahi.com/ajw/search/results/?keywords={topic}",
    ]
    article_urls = []

    # Date settings: `days` is how many days to go back from today (0 = today)
    day = datetime.today() - timedelta(days=0)
    day_date = day.strftime("%Y%m%d")
    DD_day_date = day.strftime("%d")
    print(day_date, DD_day_date)

    for url in urls:
        response = requests.get(url, timeout=30)
        soup = BeautifulSoup(response.content, "html.parser")
        print("Scraping URL:", url)

        # Find and extract article URLs
        for link in soup.find_all("a", href=True):
            # if len(article_urls) >= 5:  # Limit to the first five articles
            #     break
            article_url = link["href"]
            print("Found link:", article_url)

            # Filter out non-article URLs (e.g., navigation links) and keep only
            # articles published on the selected date. The check sits inside the
            # loop so that every link is tested, not just the last one found.
            if url.startswith("https://japannews.yomiuri.co.jp"):
                if article_url.startswith(f"https://japannews.yomiuri.co.jp/business/economy/{day_date}-"):
                    article_urls.append(article_url)
            elif url.startswith("https://www.asahi.com"):
                if article_url.startswith(f"/ajw/articles/{DD_day_date}"):
                    article_urls.append(article_url)

    return article_urls

In [None]:
japan_inflation = "japan+inflation+rate"
topic = japan_inflation

print(get_articles1(topic))

In [None]:
# 1) Function to retrieve articles related to a specific topic from the internet; returns a list of article URLs.
from urllib.parse import urljoin

def get_articles(topic):

    # Customize this function to scrape articles from preferred sources
    if topic == "Japan's GDP":
        url = "https://asia.nikkei.com/Economy/Japan-avoids-recession-as-strong-capex-boosts-Q4-GDP"
    elif topic == "Interest rates in Japan":
        url = "https://www.asahi.com/ajw/articles/15194959"
    elif topic == "Inflation rate in Japan":
        url = "https://mainichi.jp/english/articles/20240311/p2g/00m/0bu/011000c"
    else:
        raise ValueError("Invalid topic")

    response = requests.get(url, timeout=30)
    soup = BeautifulSoup(response.content, "html.parser")

    # Find the first link on the page associated with the topic
    article_link = soup.find("a", href=True)
    if article_link:
        # Resolve site-relative links against the page URL
        article_url = urljoin(url, article_link["href"])
        print("Found article URL:", article_url)
        # Return a list so that callers can iterate over URLs
        return [article_url]
    else:
        print("No article URL found.")
        return []

In [None]:
# 2) Function to create a summary of an article using ChatGPT
# https://github.com/openai/openai-python/blob/main/src/openai/types/chat/completion_create_params.py
import openai  # the client reads the API key from the OPENAI_API_KEY environment variable

# The URL returned by the previous function is passed in as a parameter
def summarize_article(article_url):

    # Fetch the article content (requests.get returns the page HTML for the URL)
    response = requests.get(article_url, timeout=30)
    soup = BeautifulSoup(response.content, "html.parser")
    article_text = soup.get_text()

    # Summarize using ChatGPT
    MODEL = "gpt-3.5-turbo"
    prompt = f"You are working for the Bank of Italy (Asia-Pacific delegation) and you are asked to summarize the following article in fewer than 30 words (mandatory), as part of a daily economic report on the Asia-Pacific area (in British English):\n{article_text}"
    completion = openai.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=60,  # a limit of 20 tokens would cut off a summary of up to 30 words
    )
    # Return only the summary text rather than the whole response object
    return completion.choices[0].message.content

In [None]:
# 3) Main function to summarize articles on specified topics
def main(topics):

    summarized_info = {}

    # For each topic, retrieve its article URLs
    for topic in topics:
        articles = get_articles(topic) or []
        # Accept a single URL string as well as a list, so that the loop
        # below iterates over URLs rather than over characters
        if isinstance(articles, str):
            articles = [articles]
        summarized_info[topic] = []

        # For each URL, retrieve its summary
        for article_url in articles:
            summary = summarize_article(article_url)

            # Include both the source and its summary in the output
            summarized_info[topic].append({
                "source": article_url,
                "summary": summary
            })
    return summarized_info

if __name__ == "__main__":

    # List of topics to search for articles
    topics = ["Inflation rate in Japan", "Interest rates in Japan", "Japan's GDP"]
    summarized_info = main(topics)
    print(summarized_info)