### LSE Data Analytics Online Career Accelerator

# Employer Project: Bank of England

## Creation of Economic Indicators Dataset

### Introduction

The Bank of England (BOE) plays a crucial role in maintaining economic stability and market confidence through its strategic use of speeches. These speeches influence financial markets, enhance policy predictability, and impact economic outcomes. This project aims to explore the dynamics between BOE speeches and the broader economic context, addressing key questions:

1. __Sentiment Over Time:__ How has sentiment expressed in BOE speeches evolved over time?
2. __Correlation with Events:__ What is the correlation between speech sentiment and significant economic events?
3. __Correlation with Economic Indicators:__ How does speech sentiment relate to economic indicators?
4. __Predictive Power:__ Can speech sentiment predict market behavior?

By analysing speech sentiment, we seek insights into whether the BOE's communication is reactive, proactive, predictive, or prescriptive, and how it influences economic narratives, policies, and market reactions. Understanding this relationship is essential for comprehending the BOE's impact on economic outcomes.

### Data Sourcing

Economic indicators investigated as part of this analysis:

1. GDP
2. GDP Growth
3. Unemployment 
4. Inflation Indices (CPI and RPI) 
5. Bank Rate / Interest Rate
6. Inflation 
7. FTSE 100 Index

- The raw data for the economic indicators were sourced from reputable platforms, including the ONS, World Bank and investing.com. 
- GDP data is in US dollars for standardisation and ease of international comparison. 
- Data collection spanned from 1997 to 2022, aligning with the availability and relevance of BOE speeches.

### Data Cleaning

The team members responsible for data import and cleaning used Excel for a large part of the initial pre-processing, addressing missing values and inaccuracies through functions and filtering techniques. This work was not recorded; however, the final dataset containing the economic indicators is available in our GitHub repository under the title:

[Econ_indicators_final.xlsx]

Python was also used to standardise date formats to YYYY-MM so that the indicators could be merged with the all-speeches dataset, which contains the speeches delivered by the Bank of England.

The scripts below were used to import some of the economic indicators we investigated.

## Bank rates ##

In [24]:
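# Import pandas and load the bank rate data
import pandas as pd

df_bank_rates = pd.read_excel('bank_rates.xlsx')

# Replace NaN with 0 (blank spreadsheet cells, including blank years, become 0)
df_bank_rates = df_bank_rates.fillna(0)

# View the DataFrame
df_bank_rates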

Unnamed: 0,Year,Day,Month,Rate %
0,2006.0,3.0,Aug,4.75
1,0.0,9.0,Nov,5.0
2,0.0,0.0,0,0.0
3,2007.0,11.0,Jan,5.25
4,0.0,10.0,May,5.5
5,0.0,5.0,Jul,5.75
6,0.0,6.0,Dec,5.5
7,0.0,0.0,0,0.0
8,2008.0,7.0,Feb,5.25
9,0.0,10.0,Apr,5.0
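
In the output above, `Year` appears as 0.0 wherever the spreadsheet left the year blank, and spacer rows show as all zeros. As a possible alternative to `fillna(0)`, the sketch below drops fully empty rows and forward-fills the year. The sample frame is typed in by hand from the first rows of the output, on the assumption that the zeros were blank (NaN) cells before filling; the `Date` column is an illustrative addition and not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Hand-typed sample mirroring the first rows of bank_rates.xlsx,
# assuming the zeros in the output were blank cells (NaN) originally.
raw = pd.DataFrame({
    "Year": [2006, np.nan, np.nan, 2007, np.nan],
    "Day": [3, 9, np.nan, 11, 10],
    "Month": ["Aug", "Nov", np.nan, "Jan", "May"],
    "Rate %": [4.75, 5.0, np.nan, 5.25, 5.5],
})

# Drop spacer rows where every column is empty
clean = raw.dropna(how="all").copy()

# Carry the last stated year down to the rows beneath it
clean["Year"] = clean["Year"].ffill().astype(int)

# Build a full date from year, month abbreviation and day (illustrative)
clean["Date"] = pd.to_datetime(
    clean["Year"].astype(str)
    + "-" + clean["Month"]
    + "-" + clean["Day"].astype(int).astype(str),
    format="%Y-%b-%d",
)

print(clean)
```

Keeping the real year on every row means each rate change can later be mapped to a YYYY-MM period, which a year of 0 would not allow.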


## CPIH DATA ## 

In [25]:
# Import the dataset
df_cpih_clean = pd.read_excel('cpih_clean.xlsx')

# View the DataFrame
df_cpih_clean

Unnamed: 0,Year,Quarter,Rate%
0,2018,Q1,2.5
1,2018,Q2,2.2
2,2018,Q3,2.3
3,2018,Q4,2.1
4,2019,Q1,1.8
5,2019,Q2,2.0
6,2019,Q3,1.8
7,2019,Q4,1.4
8,2020,Q1,1.7
9,2020,Q2,0.8


## RPI ##

In [79]:
# Import the dataset
df_RPI_Quaterly = pd.read_excel('RPI_Quaterly.xlsx')

# View the DataFrame
df_RPI_Quaterly

Unnamed: 0,Year,Quater,Rate%
0,2000,Q1,2.3
1,2000,Q2,3.1
2,2000,Q3,3.2
3,2000,Q4,3.1
4,2001,Q1,2.6
...,...,...,...
90,2022,Q3,12.4
91,2022,Q4,13.9
92,2023,Q1,13.6
93,2023,Q2,11.2
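
Quarterly series such as this one need a monthly key before they can sit alongside data in YYYY-MM format. The sketch below shows one way to do that: build a quarterly period from `Year` and `Quater` (spelled as in the output above), then repeat each quarterly value for its three months. Repeating the value across the quarter is an assumption made for illustration, not necessarily how the final dataset was built, and the sample rows are typed in from the output.

```python
import pandas as pd

# Hand-typed sample using the column names shown in the output above
df_q = pd.DataFrame({
    "Year": [2000, 2000],
    "Quater": ["Q1", "Q2"],
    "Rate%": [2.3, 3.1],
})

# Combine year and quarter label into a quarterly period, e.g. 2000Q1
df_q["quarter"] = pd.PeriodIndex(
    df_q["Year"].astype(str) + df_q["Quater"], freq="Q"
)

# List the three months that make up each quarter
df_q["YYYY-MM"] = df_q["quarter"].apply(
    lambda q: list(
        pd.period_range(
            q.asfreq("M", how="start"), q.asfreq("M", how="end"), freq="M"
        )
    )
)

# One row per month, carrying the quarterly rate (assumed behaviour)
df_m = df_q.explode("YYYY-MM")[["YYYY-MM", "Rate%"]].reset_index(drop=True)

print(df_m)
```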


In [80]:
# Import the dataset
df_RPI = pd.read_excel('RPI.xlsx')

# View the DataFrame
df_RPI

Unnamed: 0,Year,Month,Rate%
0,2000,JAN,2.0
1,2000,FEB,2.3
2,2000,MAR,2.6
3,2000,APR,3.0
4,2000,MAY,3.1
...,...,...,...
280,2023,MAY,11.3
281,2023,JUN,10.7
282,2023,JUL,9.0
283,2023,AUG,9.1


## Employment rate data ##

Employment rate (aged 16 to 64, seasonally adjusted): %


In [26]:
# Import the dataset
df_Employment_rate = pd.read_excel('Employment_rate.xlsx')

# View the DataFrame
df_Employment_rate

Unnamed: 0,Year,Quarter,Rate%
0,2018,Q1,75.6
1,2018,Q2,75.5
2,2018,Q3,75.6
3,2018,Q4,75.8
4,2019,Q1,76.1
5,2019,Q2,76.1
6,2019,Q3,76.0
7,2019,Q4,76.5
8,2020,Q1,76.3
9,2020,Q2,75.7


## Unemployment rate ## 
Unemployment rate (aged 16 and over, seasonally adjusted): %






In [28]:
# Import the dataset
df_Unemployment_rate_q = pd.read_excel('Unemployment_rate_quater.xlsx')

# View the DataFrame
df_Unemployment_rate_q


Unnamed: 0,Year,Quarter,Rate%
0,2018,Q1,4.2
1,2018,Q2,4.0
2,2018,Q3,4.1
3,2018,Q4,4.0
4,2019,Q1,3.8
5,2019,Q2,3.9
6,2019,Q3,3.8
7,2019,Q4,3.8
8,2020,Q1,4.0
9,2020,Q2,4.1


In [37]:
# Import the dataset
df_Unemployment_rate = pd.read_excel('Unemployment_rate.xlsx')

# View the DataFrame
df_Unemployment_rate

Unnamed: 0,Year,Month,Rate%
0,1971,FEB,3.8
1,1971,MAR,3.9
2,1971,APR,4.0
3,1971,MAY,4.1
4,1971,JUN,4.1
...,...,...,...
624,2023,FEB,3.9
625,2023,MAR,3.8
626,2023,APR,4.0
627,2023,MAY,4.2


## GDP
Chained volume measures


In [41]:
# Import the dataset
df_GDP = pd.read_excel('GDP.xlsx')

# View the DataFrame
df_GDP

Unnamed: 0,Quarter,GDP%,GDP per head%
0,Q1 2018,0.1,-0.1
1,Q2 2018,0.5,0.4
2,Q3 2018,0.6,0.4
3,Q4 2018,0.2,0.1
4,Q1 2019,0.7,0.5
5,Q2 2019,-0.2,-0.3
6,Q3 2019,0.5,0.4
7,Q4 2019,0.0,-0.1
8,Q1 2020,-2.8,-3.0
9,Q2 2020,-19.5,-19.6
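
The GDP table stores its period as a single text label such as `Q1 2018`, unlike the separate `Year` and `Quarter` columns used elsewhere. A minimal sketch of turning that label into a proper quarterly period is shown below; the `period` column name is hypothetical and the sample rows are typed in from the output above.

```python
import pandas as pd

# Hand-typed sample mirroring the first rows of the GDP output
df_gdp = pd.DataFrame({
    "Quarter": ["Q1 2018", "Q2 2018"],
    "GDP%": [0.1, 0.5],
})

# Pull the quarter number and the year out of labels like "Q1 2018"
parts = df_gdp["Quarter"].str.extract(r"Q(?P<q>[1-4])\s+(?P<year>\d{4})")

# Reassemble as "2018Q1" and convert to a quarterly period
df_gdp["period"] = pd.PeriodIndex(parts["year"] + "Q" + parts["q"], freq="Q")

print(df_gdp)
```

Periods sort chronologically, whereas the raw labels would sort as text (for example, `Q1 2019` before `Q2 2018`).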


## Importing Economic Indicators

In [None]:
# Import pandas
import pandas as pd

# Load the combined indicators file (data from 1997 onwards)
df_uk_metrics = pd.read_excel('CPI_Unemployment_RPI_Bank rates_GDP.xlsx')

# View the DataFrame
df_uk_metrics

In [None]:
# Convert 'Year' and 'Month' to the 'YYYY-MM' format
df_uk_metrics['YYYY-MM'] = pd.to_datetime(df_uk_metrics['Year'].astype(str) + '-' + df_uk_metrics['Month'], format='%Y-%b').dt.to_period('M')

# Reorder columns with 'YYYY-MM' as the first column
df_uk_metrics = df_uk_metrics[['YYYY-MM'] + [col for col in df_uk_metrics.columns if col not in ['Year', 'Month', 'YYYY-MM']]]

# Display the updated DataFrame
df_uk_metrics

In [None]:
# Check for missing values in each column
missing_values = df_uk_metrics.isnull().sum()
print(missing_values)

In [None]:
# Check for NaN values (isna() is an alias of isnull(), so this repeats the check above)
nan_values = df_uk_metrics.isna().sum()
print(nan_values)
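
With the indicators keyed on YYYY-MM, the merge with the speeches dataset mentioned under Data Cleaning could look roughly like the sketch below. The `speeches` frame and its `date` and `sentiment` columns are hypothetical stand-ins, since the speeches dataset is not shown here, and the indicator values are sample rows only. Both keys are converted to plain `YYYY-MM` strings before merging, which is a choice made for this sketch to keep the key types identical.

```python
import pandas as pd

# Stand-in for the indicators table with a monthly period key
indicators = pd.DataFrame({
    "YYYY-MM": pd.PeriodIndex(["2006-08", "2006-11"], freq="M"),
    "Bank Rate %": [4.75, 5.0],
})

# Hypothetical speeches table: one row per speech with a full date
speeches = pd.DataFrame({
    "date": pd.to_datetime(["2006-08-15", "2006-11-02", "2006-12-01"]),
    "sentiment": [0.1, -0.2, 0.05],
})

# Express both keys as 'YYYY-MM' strings so they match exactly
indicators["YYYY-MM"] = indicators["YYYY-MM"].astype(str)
speeches["YYYY-MM"] = speeches["date"].dt.strftime("%Y-%m")

# Left merge keeps every speech, even if no indicator exists for its month
merged = speeches.merge(indicators, on="YYYY-MM", how="left")

print(merged)
```

A left merge leaves NaN in the indicator columns for months without data, so the missing-value checks above can be rerun on the merged frame.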