*This study was conducted for skills demonstration purposes only*

# **Forecasting the UK Construction Sector with Macroeconomic Indicators**
# Section 2. Data Collection

The UK construction sector is a critical component of the national economy and is influenced by macroeconomic indicators such as GDP growth, interest rates, and inflation. To investigate these relationships and develop predictive models, this project requires a reliable dataset that includes both macroeconomic and construction-specific variables. In this section we collect time-series data from 2005 to 2025 from authoritative repositories: the Office for National Statistics (ONS), the Bank of England (BoE), and the Department for Business & Trade. The collection covers 12 key indicators, including the Consumer Price Index (CPIH), GDP, employment rates, construction output, and material prices, at varying frequencies (annual, quarterly, monthly). Custom Python functions retrieve and process the data in multiple formats (CSV, ODS, XLS). Together with the subsequent data preprocessing, this phase lays the foundation for addressing the research questions by securing high-quality data for exploratory analysis and modeling.

### 1. Data Requirements and Data Sources

We will collect time series data (2005–2025) on macroeconomic and construction sector indicators from official statistical sources.


| N | **Indicators** | Frequency | Data Source | Link |
|--|--|--|--|--|
| | **Macroeconomic Indicators (independent variables)** | | | |
|1| Consumer Price Index incl. housing (CPIH) / Inflation | annually, quarterly, monthly | Office for National Statistics (ONS) | [[1]](https://www.ons.gov.uk/economy/inflationandpriceindices/timeseries/l522/mm23) |
|2| GDP growth rate (UK, real) | annually, quarterly | Office for National Statistics (ONS) | [[2]](https://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/ybha/pn2) |
|3| Employment rate or unemployment rate | annually, quarterly, monthly | Office for National Statistics (ONS) | [[3]](https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/timeseries/lf24/lms) |
|4| Interest rate (Bank of England base rate) | monthly | Bank of England | [[4]](https://www.bankofengland.co.uk/boeapps/database/) |
|5| Exchange rate (GBP/USD and GBP/EUR) | monthly | Bank of England | [[5]](https://www.bankofengland.co.uk/boeapps/database/) |
|6| Business investment (gross fixed capital formation) | quarterly | Office for National Statistics (ONS) | [[6]](https://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/kg7p/) |
|7| Government spending | annually, quarterly, monthly | Office for National Statistics (ONS) | [[7]](https://www.ons.gov.uk/economy/governmentpublicsectorandtaxes/publicsectorfinance/timeseries/ebft/pusf) |
| | | | | |
| | **Construction Sector Indicators (dependent variables)** | | | |
|8| Construction output (total, residential, commercial) | annually, quarterly, monthly | Office for National Statistics (ONS) | [[8]](https://www.ons.gov.uk/datasets/output-in-the-construction-industry/editions/time-series/versions/44) |
|9| Construction material prices | monthly | Department for Business & Trade Gov.uk | [[9.1]](https://www.data.gov.uk/dataset/75ee36ed-21f7-4d7b-9e7c-f5bf4546145d/monthly_statistics_of_building_materials_and_components) , [[9.2]](https://webarchive.nationalarchives.gov.uk/ukgwa/timeline/https://www.gov.uk/government/collections/building-materials-and-components-monthly-statistics-2012) |
|10| Number of construction firms | annually | Office for National Statistics (ONS) | [[10]](https://www.ons.gov.uk/businessindustryandtrade/constructionindustry/datasets/constructionstatisticsannualtables) |
|11| Number of employees in the construction sector | annually | Office for National Statistics (ONS) | [[11]](https://www.ons.gov.uk/businessindustryandtrade/constructionindustry/datasets/constructionstatisticsannualtables) |
|12| Number and value of new construction contracts/orders | annually, quarterly | Office for National Statistics (ONS) | [[12]](https://www.ons.gov.uk/businessindustryandtrade/constructionindustry/datasets/newordersintheconstructionindustry/current) |

### 2. Tools and Libraries

In [253]:
pip install odfpy

Note: you may need to restart the kernel to use updated packages.


In [254]:
pip install ezodf odfpy

Note: you may need to restart the kernel to use updated packages.


In [255]:
pip install xlrd

Note: you may need to restart the kernel to use updated packages.


In [256]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
from io import BytesIO
from io import StringIO
import ezodf

### 3. Collecting Data

**Functions for collecting and primary cleaning data:**

*Office for National Statistics (ONS):*

In [257]:
# Download a CSV file and skip the leading metadata rows (number_skiprows) to extract the time series

def get_csv_time_series(download_url, number_skiprows, column_list=None):
    response = requests.get(download_url)
    if not response.ok:
        # Report the failure and return None explicitly so callers can check for it
        print("Download failed:", response.status_code)
        return None

    # Create the DataFrame from the downloaded text
    df = pd.read_csv(StringIO(response.text), skiprows=number_skiprows)
    # Rename the columns only when custom names are supplied
    if column_list is not None:
        df.columns = column_list
    return df
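
To make the role of `number_skiprows` concrete, the sketch below parses a small in-memory sample instead of downloading a file. The sample text only imitates the general layout of a time-series CSV with metadata rows above the observations; its metadata rows and the helper name `parse_csv_text` are illustrative assumptions, not part of the project code.

```python
from io import StringIO

import pandas as pd

# Illustrative sample: metadata rows first, then observations.
# The metadata rows of a real downloaded file may differ.
SAMPLE_TEXT = "\n".join([
    '"Title","Example index"',
    '"Unit","Index, base year = 100"',
    '"Important notes",',
    '"1988","48.2"',
    '"1989","51.0"',
])


def parse_csv_text(text, number_skiprows, column_list=None):
    """Parse CSV text with the same read_csv arguments as get_csv_time_series."""
    df = pd.read_csv(StringIO(text), skiprows=number_skiprows)
    if column_list is not None:
        df.columns = column_list
    return df


# Skipping 2 rows leaves the third metadata row as the header line,
# which is then overwritten by the custom column names.
sample_df = parse_csv_text(SAMPLE_TEXT, 2, ["Date", "Value"])
print(sample_df)
```

The number of rows to skip is therefore one less than the number of metadata rows, because the last metadata row is consumed as the header before the columns are renamed.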

In [258]:
# Collecting ODS files:

def get_ods_time_series(download_url, sheet_name):
    # Download the file into memory
    response = requests.get(download_url)
    ods_data = BytesIO(response.content)
    
    # Load the requested sheet without treating any row as a header
    df = pd.read_excel(ods_data, engine="odf", sheet_name=sheet_name, header=None)

    return df

In [259]:
# Collecting XLS files:

def get_xls_time_series(download_url, sheet_name):
    # Download the file into memory
    response = requests.get(download_url)
    xls_data = BytesIO(response.content)
    
    # Load the requested sheet without treating any row as a header
    df = pd.read_excel(xls_data, sheet_name=sheet_name, header=None)

    return df

*From Bank of England (BoE):*

In [260]:
# Function for scraping an HTML table from the Bank of England website:
def get_time_series_from_BoE(download_url, variable_name):
    # Browser-like request headers
    headers = {
        'accept-language': 'en-US,en;q=0.9',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
    }

    response = requests.get(download_url, headers=headers)
    html = StringIO(response.text)

    df = pd.read_html(html)[0]
    df.columns = ["Date", variable_name]

    # Using explicit format to avoid warning
    df["Date"] = pd.to_datetime(df["Date"], format="%d %b %y")
    df[variable_name] = pd.to_numeric(df[variable_name], errors="coerce")
    df.sort_values("Date", inplace=True)

    return df
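
The two conversions used in this function can be checked on a few hand-written values. The strings below are made-up examples in the day, abbreviated month, two-digit year layout that `format="%d %b %y"` expects; they are not copied from the BoE page.

```python
import pandas as pd

# Made-up inputs: dates in the "%d %b %y" layout and values stored as text
raw_dates = pd.Series(["31 Jan 25", "30 Nov 24", "31 Dec 24"])
raw_values = pd.Series(["4.75", "4.7976", "n/a"])

# An explicit format avoids per-row format inference
dates = pd.to_datetime(raw_dates, format="%d %b %y")
# errors="coerce" turns anything non-numeric into NaN instead of raising
values = pd.to_numeric(raw_values, errors="coerce")

demo = pd.DataFrame({"Date": dates, "Rate": values}).sort_values("Date")
print(demo)
```

Because unparseable values become `NaN` silently, it may be worth counting missing values after the download to spot layout changes on the source page.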

#### Indicator 1: Consumer Price Index incl. housing (CPIH)

- Source: Office for National Statistics (ONS)
- Frequency: annually, quarterly, monthly
- Coverage: Jan 1988 - May 2025
- CPIH INDEX 00: ALL ITEMS 2015=100
- Release date: 18-Jun-2025
- Next release: 16-Jul-2025

In [261]:
# Link for downloading
url_1 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/inflationandpriceindices/timeseries/l522/mm23'
columns = ['Date', 'Consumer Price Index incl. housing (CPIH), 2015=100']

# Downloading data
df_1 = get_csv_time_series(url_1, 7, columns)
print(df_1.head(5))

# Saving data
df_1.to_csv('df_1_raw.csv', index=False)

   Date  Consumer Price Index incl. housing (CPIH), 2015=100
0  1988                                               48.2  
1  1989                                               51.0  
2  1990                                               55.1  
3  1991                                               59.2  
4  1992                                               61.9  
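
The first rows printed above carry annual labels such as `1988`, while the quarterly series printed later in this section uses labels such as `1997 Q2`. Since this source is published at annual, quarterly, and monthly frequency, a single download appears to mix several label styles in the `Date` column. A minimal sketch of separating them by label pattern follows; the function name `split_by_frequency` is hypothetical, and the monthly pattern (`YYYY MON`, for example `2005 JAN`) is an assumption that should be checked against the downloaded file.

```python
import pandas as pd

# Regular expressions for the three label styles.
# The monthly pattern is an assumption about the file layout.
FREQUENCY_PATTERNS = {
    "annual": r"^\d{4}$",
    "quarterly": r"^\d{4} Q[1-4]$",
    "monthly": r"^\d{4} [A-Za-z]{3}$",
}


def split_by_frequency(df, date_col="Date"):
    """Return one DataFrame per frequency, keyed by frequency name."""
    labels = df[date_col].astype(str).str.strip()
    return {
        name: df[labels.str.match(pattern)].reset_index(drop=True)
        for name, pattern in FREQUENCY_PATTERNS.items()
    }


# Small made-up frame that mixes the three label styles
mixed = pd.DataFrame({
    "Date": ["2005", "2006", "2005 Q1", "2005 Q2", "2005 JAN", "2005 FEB"],
    "Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
parts = split_by_frequency(mixed)
print({name: len(part) for name, part in parts.items()})
```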


#### Indicator 2: Gross Domestic Product (GDP): chained volume measures: Seasonally adjusted £m

- Source: Office for National Statistics (ONS)
- Frequency: annually, quarterly
- Coverage: Q1 1955 - Q1 2025
- Release date: 15-May-2025
- Next release: 30-Jun-2025

In [262]:
# Link for downloading
url_2 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/grossdomesticproductgdp/timeseries/abmi/pn2'
columns = ['Date', 'GDP, Seasonally adjusted £m']

# Downloading data
df_2 = get_csv_time_series(url_2, 7, columns)
print(df_2.head(5))

# Saving data
df_2.to_csv('df_2_raw.csv', index=False)

   Date  GDP, Seasonally adjusted £m
0  1948                       422621
1  1949                       436620
2  1950                       451212
3  1951                       467977
4  1952                       474994


#### Indicator 3: Employment rate (aged 16 to 64, seasonally adjusted): %

- Source: Office for National Statistics (ONS)
- Frequency: annually, quarterly, monthly
- Coverage: Feb 1971 - Mar 2025
- Release date: 10-Jun-2025
- Next release: 17-Jul-2025

In [263]:
# Link for downloading
url_3 = 'https://www.ons.gov.uk/generator?format=csv&uri=/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/timeseries/lf24/lms'
columns = ['Date', 'Employment rate (aged 16 to 64, seasonally adjusted), %']

# Downloading data
df_3 = get_csv_time_series(url_3, 7, columns)
print(df_3.head(5))

# Saving data
df_3.to_csv('df_3_raw.csv', index=False)

   Date  Employment rate (aged 16 to 64, seasonally adjusted), %
0  1971                                               71.8      
1  1972                                               72.0      
2  1973                                               72.9      
3  1974                                               73.0      
4  1975                                               72.6      


#### Indicator 4: Month average Bank Rate

- Source: Bank of England (BoE)
- Frequency: monthly
- Coverage: Jan 2000 - Jun 2025
- Release date: 30-Jun-2025
- Next release: 31-Jul-2025

In [264]:
# Link for downloading
url_4 = 'https://www.bankofengland.co.uk/boeapps/database/fromshowcolumns.asp?Travel=NIxIRxSUx&FromSeries=1&ToSeries=50&DAT=RNG&FD=1&FM=Jan&FY=2005&TD=1&TM=Feb&TY=2025&FNY=&CSVF=TT&html.x=257&html.y=39&C=KV&Filter=N'

# Downloading data
df_4 = get_time_series_from_BoE(url_4, 'Month average BoE Rate, %')
print(df_4.tail(5))

# Saving data
df_4.to_csv('df_4_raw.csv', index=False)

          Date  Month average BoE Rate, %
236 2024-09-30                     5.0000
237 2024-10-31                     5.0000
238 2024-11-30                     4.7976
239 2024-12-31                     4.7500
240 2025-01-31                     4.7500


#### Indicators 5.1 and 5.2: Exchange rates (GBP/USD and GBP/EUR)

- Source: Bank of England (BoE)
- Frequency: monthly average
- Coverage: Jan 2005 - Jul 2025
- Link updated: 15-Jul-2025
- Release: daily

**GBP/EUR**

In [265]:
# Link for downloading
url_5_1 = 'https://www.bankofengland.co.uk/boeapps/database/fromshowcolumns.asp?Travel=NIxIRxSUx&FromSeries=1&ToSeries=50&DAT=RNG&FD=1&FM=Jan&FY=2005&TD=16&TM=Jul&TY=2025&FNY=&CSVF=TT&html.x=92&html.y=45&C=5UW&Filter=N'

# Downloading data
df_5_1 = get_time_series_from_BoE(url_5_1, 'GBP/EUR')
print(df_5_1.tail(5))

# Saving data
df_5_1.to_csv('df_5_1_raw.csv', index=False)

          Date  GBP/EUR
241 2025-02-28   1.2039
242 2025-03-31   1.1941
243 2025-04-30   1.1709
244 2025-05-31   1.1852
245 2025-06-30   1.1768


**GBP/USD**

In [266]:
# Link for downloading
url_5_2 = 'https://www.bankofengland.co.uk/boeapps/database/fromshowcolumns.asp?Travel=NIxIRxSUx&FromSeries=1&ToSeries=50&DAT=RNG&FD=1&FM=Jan&FY=2005&TD=31&TM=Dec&TY=2025&FNY=&CSVF=TT&html.x=128&html.y=48&C=5YE&Filter=N'

# Downloading data
df_5_2 = get_time_series_from_BoE(url_5_2, 'GBP/USD')
print(df_5_2.tail(5))

# Saving data
df_5_2.to_csv('df_5_2_raw.csv', index=False)

          Date  GBP/USD
241 2025-02-28   1.2545
242 2025-03-31   1.2911
243 2025-04-30   1.3131
244 2025-05-31   1.3366
245 2025-06-30   1.3566


#### Indicator 6: Business Investment (CVM, Seasonally Adjusted, in £ millions, % change)

- Source: Office for National Statistics (ONS)
- Frequency: quarterly
- Coverage: Q2 1997 - Q1 2025
- Release date: 15-May-2025
- Next release: 30-Jun-2025

In [267]:
# Link for downloading
url_6 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/grossdomesticproductgdp/timeseries/kg7p/pn2'
columns = ['Date', 'Business Investment, CVM, SA, in £m, % change']

# Downloading data
df_6 = get_csv_time_series(url_6, 7, columns)
print(df_6.head(5))

# Saving data
df_6.to_csv('df_6_raw.csv', index=False)

      Date  Business Investment, CVM, SA, in £m, % change
0  1997 Q2                                            4.4
1  1997 Q3                                            6.1
2  1997 Q4                                            2.6
3  1998 Q1                                            1.6
4  1998 Q2                                           -0.4


#### Indicator 7: Total managed expenditure: £m

- Source: Office for National Statistics (ONS)
- Frequency: annually, quarterly, monthly
- Coverage: Apr 1997 - May 2025
- Release date: 20-Jun-2025
- Next release: 20-Jul-2025

In [268]:
# Link for downloading
url_7 = 'https://www.ons.gov.uk/generator?format=csv&uri=/economy/governmentpublicsectorandtaxes/publicsectorfinance/timeseries/ebft/pusf'
columns = ['Date', 'Total managed expenditure: £m']

# Downloading data
df_7 = get_csv_time_series(url_7, 7, columns)
print(df_7.head(5))

# Saving data
df_7.to_csv('df_7_raw.csv', index=False)

   Date  Total managed expenditure: £m
0  1946                           4353
1  1947                           3925
2  1948                           4327
3  1949                           4655
4  1950                           4768


#### Indicator 8: Construction output (Seasonally Adjusted, total, residential, commercial): £m

- Source: Office for National Statistics (ONS)
- Frequency: annually, quarterly, monthly (from 2010)
- Coverage: Q1 1997 - May 2025
- Release date: 11-Jul-2025
- Next release: 14-Aug-2025

In [269]:
# Link for downloading
url_8 = 'https://static.ons.gov.uk/datasets/f829c881-2ec5-403a-b9ed-8a800acd3aa5/output-in-the-construction-industry-time-series-v44-filtered-2025-07-14T12-20-00Z.csv'

# Downloading data
df_8 = get_csv_time_series(url_8, 0)
print(df_8.head(5))

# Saving data
df_8.to_csv('df_8_raw.csv', index=False)

      v4_1  Data Marking years-quarters-months        Time  \
0   1570.0           NaN              2013-aug  2013 - Aug   
1   2588.0           NaN              2013-aug  2013 - Aug   
2    393.0           NaN              2013-aug  2013 - Aug   
3  13367.0           NaN              2013-aug  2013 - Aug   
4   2593.0           NaN              2013-aug  2013 - Aug   

  administrative-geography      Geography  seasonal-adjustment  \
0                K03000001  Great Britain  seasonal-adjustment   
1                K03000001  Great Britain  seasonal-adjustment   
2                K03000001  Great Britain  seasonal-adjustment   
3                K03000001  Great Britain  seasonal-adjustment   
4                K03000001  Great Britain  seasonal-adjustment   

    SeasonalAdjustment construction-series-type SeriesType  \
0  Seasonally adjusted           pounds-million   £million   
1  Seasonally adjusted           pounds-million   £million   
2  Seasonally adjusted           pounds-mill

#### Indicator 9: Construction material price indices

No single dataset covering the full period from 2005 to 2025 was found. Therefore, the dataset for this indicator consists of **4 parts**:

**9_1. Construction material price indices for 2020–2025**

- Source: Department for Business & Trade Gov.uk
- Frequency: monthly
- Coverage: Jan 2020 - Jan 2025
- Release date: 02-Jul-2025
- Next release: 06-Aug-2025
- Base Index: 2015 = 100

In [270]:
# Link for downloading
url_9_1 = "https://assets.publishing.service.gov.uk/media/68385e045150d70c85aafadb/Construction_building_materials_-_tables_May_2025.ods"

# Downloading data
df_9_1_raw = get_ods_time_series(url_9_1, "1a")
print(df_9_1_raw.tail(10))

# Saving data
df_9_1_raw.to_csv('df_9_1_raw.csv', index=False)

       0             1      2      3      4      5
58  2024         April    153  156.2  152.6  153.7
59  2024           May  154.5  156.3    154  154.4
60  2024          June  154.7    156  154.3  154.5
61  2024          July  153.9  155.4  153.5  154.1
62  2024        August  153.6  154.9  153.3  153.8
63  2024     September  153.3  154.4  152.8    153
64  2024       October  152.9  153.7  152.1  152.2
65  2024  November [p]    153    154  152.1  152.9
66  2024  December [p]  152.9  152.9  152.1  152.1
67  2025   January [p]  152.4  153.2  151.8  151.8


**9_2. Construction material price indices for 2015–2020**

- Source: Department for Business & Trade Gov.uk
- Frequency: monthly
- Coverage: Jan 2015 - Dec 2020
- Release date: 03-Feb-2021
- Base Index: 2015 = 100

In [271]:
# Link for downloading
url_9_2 = "https://assets.publishing.service.gov.uk/media/6017be098fa8f53fc62c5886/21-cs2_-_Construction_Building_Materials_-_Tables_January_2021.ods"

# Downloading data
df_9_2_raw = get_ods_time_series(url_9_2, "Table_1")
print(df_9_2_raw.iloc[9:-10, :])

# Saving data
df_9_2_raw.to_csv('df_9_2_raw.csv', index=False)

                     0      1   2      3   4      5   6      7   8      9   \
9               January  100.5 NaN   98.5 NaN  102.8 NaN  107.7 NaN  111.7   
10             February  100.6 NaN   99.1 NaN  103.9 NaN  109.3 NaN  112.4   
11                March  100.6 NaN   98.9 NaN  104.3 NaN  109.3 NaN  113.4   
12                April  100.6 NaN   99.4 NaN  104.8 NaN  109.7 NaN  113.1   
13                  May  100.9 NaN   99.8 NaN  105.2 NaN  110.1 NaN  112.8   
..                  ...    ...  ..    ...  ..    ...  ..    ...  ..    ...   
65                 June  100.6 NaN   99.5 NaN  103.9 NaN  109.7 NaN  112.3   
66                 July  100.6 NaN   99.6 NaN  104.1 NaN  110.1 NaN  112.7   
67               August   99.8 NaN   99.8 NaN  104.4 NaN  110.4 NaN  112.2   
68            September   99.1 NaN  100.2 NaN  105.1 NaN  111.0 NaN  112.3   
69              October   98.7 NaN  100.2 NaN  106.0 NaN  111.0 NaN  111.9   

    10     11   12  
9  NaN  110.9  NaN  
10 NaN  111.4  NaN  


**9_3. Construction material price indices for 2010–2015**

- Source: Department for Business & Trade Gov.uk
- Frequency: monthly
- Coverage: Jan 2010 - Dec 2015
- Release date: 03-Feb-2016
- Base Index: 2010 = 100

In [272]:
# Link for downloading
url_9_3 = "https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/497165/15-313n_-_Construction_Building_Materials_-_Open_Document_January_2016.ods"

# Downloading data
df_9_3_raw = get_ods_time_series(url_9_3, "Table_1")
print(df_9_3_raw.iloc[9:-10, :])

# Saving data
df_9_3_raw.to_csv('df_9_3_raw.csv', index=False)

                             0           1   2           3   4           5   \
9                       January   96.100000 NaN  102.200000 NaN  106.000000   
10                     February   96.500000 NaN  103.100000 NaN  106.400000   
11                        March   96.900000 NaN  103.700000 NaN  107.600000   
12                        April   98.400000 NaN  104.200000 NaN  107.900000   
13                          May   99.900000 NaN  104.800000 NaN  107.900000   
14                         June  100.800000 NaN  105.700000 NaN  107.300000   
15                         July  101.300000 NaN  106.000000 NaN  107.300000   
16                       August  101.700000 NaN  106.600000 NaN  107.200000   
17                    September  102.000000 NaN  106.500000 NaN  107.100000   
18                      October  102.000000 NaN  106.700000 NaN  107.200000   
19                     November  102.300000 NaN  106.500000 NaN  107.300000   
20                     December  102.000000 NaN  106

**9_4. Construction material price indices for 2006–2011**

- Source: Department for Business & Trade Gov.uk
- Frequency: monthly
- Coverage: Jan 2006 - Dec 2011
- Release date: 01-Feb-2012
- Base Index: 2005 = 100

In [273]:
# Link for downloading
url_9_4 = 'https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/16051/Building_materials_and_components_-_January_2012.xls'

# Downloading data
df_9_4_raw = get_xls_time_series(url_9_4, "Table 1")
print(df_9_4_raw.iloc[9:-10, :])

# Saving data
df_9_4_raw.to_csv('df_9_4_raw.csv', index=False)

                             0           1   2           3   4           5   \
9                       January  101.600000 NaN  109.800000 NaN  115.300000   
10                     February  102.600000 NaN  111.700000 NaN  116.800000   
11                        March  103.400000 NaN  112.400000 NaN  117.500000   
12                        April  103.600000 NaN  112.300000 NaN  118.200000   
13                          May  104.600000 NaN  113.700000 NaN  119.000000   
14                         June  105.800000 NaN  114.400000 NaN  118.700000   
15                         July  106.500000 NaN  115.200000 NaN  120.000000   
16                       August  107.600000 NaN  116.300000 NaN  120.500000   
17                    September  107.700000 NaN  116.300000 NaN  120.400000   
18                      October  108.800000 NaN  116.400000 NaN  121.100000   
19                     November  109.500000 NaN  116.000000 NaN  120.400000   
20                     December  110.000000 NaN  115
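
The four parts of this indicator use different base years (2005, 2010, and 2015 = 100), so their index values are not directly comparable. One common way to join such series is to rescale the older one by the ratio of the two series over the dates where they overlap (for example, parts 9_3 and 9_4 both cover 2010 and 2011). The sketch below illustrates that idea on made-up numbers; the function names `rebase_to` and `splice` are hypothetical, and this is not necessarily the method applied in the preprocessing step of this project.

```python
import pandas as pd


def rebase_to(series_old, series_new):
    """Rescale series_old so that it matches series_new on their overlap."""
    overlap = series_old.index.intersection(series_new.index)
    if len(overlap) == 0:
        raise ValueError("The two series share no dates, so no link factor exists.")
    # Link factor: ratio of the mean levels over the overlapping dates
    factor = series_new.loc[overlap].mean() / series_old.loc[overlap].mean()
    return series_old * factor


def splice(series_old, series_new):
    """Join two index series, preferring series_new where both have values."""
    rescaled = rebase_to(series_old, series_new)
    only_old = rescaled[~rescaled.index.isin(series_new.index)]
    return pd.concat([only_old, series_new]).sort_index()


# Made-up monthly indices on two different bases, overlapping in 2015-01
old = pd.Series(
    [96.0, 98.0, 100.0],
    index=pd.period_range("2014-11", periods=3, freq="M"),
)
new = pd.Series(
    [50.0, 51.0],
    index=pd.period_range("2015-01", periods=2, freq="M"),
)

linked = splice(old, new)
print(linked)
```

Averaging over the whole overlap rather than a single month makes the link factor less sensitive to revisions or provisional values in any one observation.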

#### Indicator 10: Number of construction firms

- Source: Office for National Statistics (ONS)
- Frequency: annually
- Coverage: 1997 - 2023
- Release date: 22-Nov-2024
- Next release: unknown

In [274]:
# Link for downloading
url_10 = 'https://www.ons.gov.uk/file?uri=/businessindustryandtrade/constructionindustry/datasets/constructionstatisticsannualtables/2023/csatablesaccessiblefinal.xlsx'

# Downloading data
df_10_raw = get_xls_time_series(url_10, "Table 3.1")
print(df_10_raw.iloc[9:-10, :])

# Saving data
df_10_raw.to_csv('df_10_raw.csv', index=False)

      0      1      2      3      4      5      6      7      8      9   ...  \
9    2-3  47644  47918  49350  48773  50653  50306  53022  55027  57320  ...   
10   4-7  15737  16391  16969  16584  22455  23963  25704  26865  28435  ...   
11  8-13   3787   3988   4148   3790   8044   9819  10508  10982  11599  ...   

         18       19       20       21       22       23       24        25  \
9   66135.0  72128.0  76845.0  82783.0  88297.0  91843.0  94651.0  100218.0   
10  29142.0  30855.0  32339.0  33933.0  35434.0  36071.0  36725.0   38353.0   
11  11455.0  11923.0  12255.0  12665.0  12890.0  12908.0  13269.0   13509.0   

          26        27  
9   103938.0  103337.0  
10   41044.0   40345.0  
11   14378.0   14283.0  

[3 rows x 28 columns]


In [275]:
df_10_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 28 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       22 non-null     object 
 1   1       16 non-null     object 
 2   2       16 non-null     object 
 3   3       16 non-null     object 
 4   4       16 non-null     object 
 5   5       16 non-null     object 
 6   6       16 non-null     object 
 7   7       16 non-null     object 
 8   8       16 non-null     object 
 9   9       16 non-null     object 
 10  10      16 non-null     object 
 11  11      16 non-null     object 
 12  12      16 non-null     object 
 13  13      16 non-null     object 
 14  14      16 non-null     float64
 15  15      16 non-null     float64
 16  16      16 non-null     float64
 17  17      16 non-null     float64
 18  18      16 non-null     float64
 19  19      16 non-null     float64
 20  20      16 non-null     float64
 21  21      16 non-null     float64
 22  22  

#### Indicator 11: Number of employees in the construction sector

- Source: Office for National Statistics (ONS)
- Frequency: annually
- Coverage: 1997 - 2023
- Release date: 22-Nov-2024
- Next release: unknown
- Employees by businesses classified to construction  - 3rd quarter each year: thousands

In [276]:
# Link for downloading
url_11 = 'https://www.ons.gov.uk/file?uri=/businessindustryandtrade/constructionindustry/datasets/constructionstatisticsannualtables/2023/csatablesaccessiblefinal.xlsx'

# Downloading data
df_11_raw = get_xls_time_series(url_11, "Table 3.4")
print(df_11_raw.iloc[9:, :])

# Saving data
df_11_raw.to_csv('df_11_raw.csv', index=False)

                0      1      2      3      4      5      6       7       8   \
9             8-13   39.7   41.6   48.0   45.3   93.7  106.7   112.6    99.9   
10           14-24   56.0   59.5   63.2   65.7   97.9  103.6   104.2    99.7   
11           25-34   34.5   35.1   36.0   40.2   58.6   47.7    56.5    52.8   
12           35-59   51.9   56.6   58.1   53.7   62.6   78.8    82.7    99.5   
13           60-79   27.1   28.8   29.3   28.1   28.8   29.0    37.7    36.0   
14          80-114   28.3   30.9   31.7   29.2   32.2   37.6    45.5    43.8   
15         115-299   68.1   71.1   78.5   68.8   80.9   87.6   100.7    93.4   
16         300-599   45.5   52.1   47.3   43.9   47.0   47.2    60.0    57.6   
17       600-1,199   50.8   49.1   51.6   50.4   44.5   49.3    61.9    54.2   
18  1,200 and over   76.3   82.0   92.3   76.0   98.4  120.7   165.0   149.6   
19       All firms  778.5  813.6  958.8  945.9  972.6  989.9  1142.8  1074.1   

        9   ...      18      19      20

#### Indicator 12: Number and value of new construction contracts/orders
- Source: Office for National Statistics (ONS)
- Frequency: annually, quarterly
- Coverage: 1964 Q1 - 2025 Q1
- Release date: 15-May-2025
- Next release: 14-Aug-2025
- Seasonally adjusted data
- £ million

In [277]:
# Link for downloading
url_12 = 'https://www.ons.gov.uk/file?uri=/businessindustryandtrade/constructionindustry/datasets/newordersintheconstructionindustry/current/bulletindataset7.xlsx'

# Downloading data
df_12_raw = get_xls_time_series(url_12, "Table 2")
print(df_12_raw.iloc[9:-10, :])

# Saving data
df_12_raw.to_csv('df_12_raw.csv', index=False)

                  0      1      2      3     4      5      6      7      8   \
9               1968  14992  25488  40480   NaN  22072  10244   9626  41942   
10              1969  11188  21315  32503   NaN  21375  11043  10024  42442   
11              1970  11079  22312  33391   NaN  19554  10339  11591  41484   
12              1971   9322  25603  34925   NaN  23656   7447  14275  45378   
13              1972   9815  30499  40314   NaN  24731   7449  14371  46551   
..               ...    ...    ...    ...   ...    ...    ...    ...    ...   
296  Jul to Sep 2021    431   3589   4020  1655   1469   1498   3276   7898   
297  Oct to Dec 2021    323   3515   3838  2184   1521   2139   3908   9752   
298  Jan to Mar 2022    337   3414   3750  2378   1307   1350   4059   9094   
299  Apr to Jun 2022    375   3251   3626  2455   1464   1540   2903   8362   
300  Jul to Sep 2022    301   3152   3453  2603   1504   1642   3880   9629   

        9      10     11  
9    82422 -0.096    NaN
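
The quarterly rows in this table are labelled with month ranges such as `Jul to Sep 2021`, while annual rows carry a plain year. A small sketch of turning the quarterly labels into pandas quarterly periods is shown below; the helper name `label_to_quarter` is hypothetical, and labels that do not fit the pattern are returned as `None` so that they can be handled separately.

```python
import pandas as pd

# First month of each calendar quarter mapped to its quarter number
QUARTER_BY_FIRST_MONTH = {"Jan": 1, "Apr": 2, "Jul": 3, "Oct": 4}


def label_to_quarter(label):
    """Convert a label like 'Jul to Sep 2021' to a quarterly Period, else None."""
    parts = str(label).split()
    if (
        len(parts) == 4
        and parts[1] == "to"
        and parts[0] in QUARTER_BY_FIRST_MONTH
        and parts[3].isdigit()
    ):
        quarter = QUARTER_BY_FIRST_MONTH[parts[0]]
        return pd.Period(f"{parts[3]}Q{quarter}", freq="Q")
    return None


# Labels taken from the printed output above: one annual, two quarterly
labels = pd.Series(["1968", "Jul to Sep 2021", "Oct to Dec 2021"])
quarters = labels.map(label_to_quarter)
print(quarters)
```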

### 4. Conclusion
Time-series data for 12 indicators were collected from the ONS, the BoE, and the Department for Business & Trade, covering as much of 2005–2025 as each source provides. Using Python libraries (pandas, requests, and an ODF reader for pandas), data in CSV, ODS, and XLS formats were retrieved and saved as raw files for preprocessing. This phase provides the raw dataset for analyzing macroeconomic impacts on the UK construction sector; the next step is data cleaning and standardization.

### Authors


[Alisa Makhonina](https://www.linkedin.com/in/alisa-makhonina-data-science/) Data scientist with over 8 years of experience in construction cost engineering. Structural Engineering graduate.