# 📘 CA1 - Machine Learning Project

**Student Name:** Diogenes Costa Gomes  
**Student Number:** 2025018  
**Module Title:** Machine Learning  
**Assessment Type:** Continuous Assessment 1 (CA1)  
**Project Topic:** Forecasting Housing Growth Based on ESB Residential Connections in Ireland  
**Institution:** CCT College Dublin  
**Submission Date:** 18/04/2025  
**GitHub Repository:** https://github.com/Diogenes2-dcg86/Machine-Learning_CA1_Diogenes_Costa_Gomes  


# Housing Growth Forecasting - ESB Connections in Ireland

# Project Overview

In this project, we use real government data, specifically the monthly ESB residential connections in Ireland (2014–present), to analyse and predict where housing will grow over time across the country. These connections represent newly built and occupied homes in each Irish county.

# Dataset Details

- **Source**: [data.gov.ie](http://data.gov.ie/dataset/esb-connections-by-area-monthly-2014-to-date)  
- **Transformed version**: Wide format converted to long format to enable predictive analysis  
- **Rows**: 1,155  
- **Columns**: County, Date, Connections  
- **License**: Open Government Licence - Ireland

# Machine Learning Goal

Build and compare regression models that estimate the number of new residential connections per region and period. These estimates can support forecasting of infrastructure needs and housing demand.

# Next Steps

- Exploratory Data Analysis (EDA)
- Model selection and training (e.g., Linear Regression vs Random Forest)
- Hyperparameter tuning and evaluation
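As a rough illustration of the comparison and evaluation steps, the sketch below holds out the last 12 months of each county's series and scores two deliberately simple models by mean absolute error (MAE): a naive "repeat the last value" baseline and a straight-line trend fitted with `numpy.polyfit`. These are stand-ins, not the Linear Regression and Random Forest models planned above; the data are synthetic, and the county names, 48-month span and 12-month hold-out are assumptions chosen for illustration. The important design choice is the time-ordered split: training only on earlier months avoids leaking future values into a forecast.

```python
import numpy as np
import pandas as pd

# Synthetic long-format data (assumption): 48 months for two made-up counties,
# each with a linear upward trend plus random noise.
rng = np.random.default_rng(0)
months = pd.date_range("2014-01-01", periods=48, freq="MS")
frames = []
for county, slope in [("County A", 0.5), ("County B", 1.5)]:
    trend = 20 + slope * np.arange(len(months))
    noise = rng.normal(0, 2, len(months))
    frames.append(pd.DataFrame({
        "County": county,
        "Date": months,
        "Connections": np.round(trend + noise),
    }))
df = pd.concat(frames, ignore_index=True)


def compare_models(df, test_months=12):
    """MAE per model, holding out each county's last `test_months` rows."""
    errors = {"naive_last_value": [], "linear_trend": []}
    for _, g in df.sort_values("Date").groupby("County"):
        y = g["Connections"].to_numpy(dtype=float)
        t = np.arange(len(y), dtype=float)
        train_t, test_t = t[:-test_months], t[-test_months:]
        train_y, test_y = y[:-test_months], y[-test_months:]

        # Model 1: repeat the last observed training value for every test month
        naive_pred = np.full(test_months, train_y[-1])

        # Model 2: straight line fitted on training months only
        # (for deg=1, polyfit returns [slope, intercept])
        slope, intercept = np.polyfit(train_t, train_y, deg=1)
        trend_pred = intercept + slope * test_t

        errors["naive_last_value"].extend(np.abs(test_y - naive_pred))
        errors["linear_trend"].extend(np.abs(test_y - trend_pred))
    return {name: float(np.mean(errs)) for name, errs in errors.items()}


print(compare_models(df))
```

Swapping in scikit-learn models later only changes the two prediction lines; the split and the scoring stay the same.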



# Housing Growth Forecasting - ESB Connections in Ireland

This notebook explores monthly residential electricity connections (ESB) across Irish counties from 2014 to date.
The goal is to predict housing growth per region using machine learning models.

# Step 1: Data Preparation - Reshaping the Raw ESB Dataset into Long Format

This notebook is part of the CA1 Machine Learning Capstone Project at CCT College Dublin. The dataset records, for each county in Ireland, the number of new housing connections per month from 2014 onwards. However, the original dataset is in wide format, with each month as a separate column, which does not suit most analysis techniques or machine learning models. The first task is therefore to clean the data and reshape it into long format (one row per county per period), which we can then use for time-based analysis, plotting and model training.
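To make the wide-versus-long distinction concrete, here is a toy example with two hypothetical counties and two invented month columns. `DataFrame.melt` keeps `County` as the identifier and stacks the month columns into `Date`/`Connections` pairs. The labels `Jan-2014` and `Feb-2014` are assumptions for illustration and may not match the real file's headers.

```python
import pandas as pd

# Hypothetical wide table: one row per county, one column per month (values invented)
wide = pd.DataFrame({
    "County": ["County A", "County B"],
    "Jan-2014": [8, 18],
    "Feb-2014": [10, 15],
})

# melt keeps 'County' fixed and turns every other column into rows,
# one month column at a time
long = wide.melt(id_vars="County", var_name="Date", value_name="Connections")
print(long)
# 4 rows: (County A, Jan-2014, 8), (County B, Jan-2014, 18),
#         (County A, Feb-2014, 10), (County B, Feb-2014, 15)
```

Two counties by two months gives four rows; the real file scales the same way, with one row per county per month column.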

```python
# Import necessary library
import pandas as pd
```


```python
# Step 1: Load the raw CSV file
# header=1 is used to skip the first row which contains metadata
df = pd.read_csv("esb_connections_by_area_monthly_2014_to_date.csv", header=1)
```


```python
# Step 2: Rename the first column to 'County'
df = df.rename(columns={"Unnamed: 0": "County"})
```


```python
# Step 3: Remove the first data row containing 'County Councils', which is not a valid record
df = df[df["County"] != "County Councils"]
```


```python
# Step 4: Drop all columns that are completely empty (optional but useful cleanup)
df = df.dropna(axis=1, how="all")
```

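Steps 1 to 4 can be tried on a tiny in-memory CSV. The layout below is an assumption about the raw file (a metadata line, then a header row whose first cell is empty, then a `County Councils` sub-heading row, plus a trailing empty column), and all values are invented. It shows why the first column arrives as `Unnamed: 0`: pandas gives that placeholder name to any empty header cell, numbered by column position.

```python
import io
import pandas as pd

# Hypothetical miniature of the raw file (layout assumed, values invented):
# line 1 is a metadata title, line 2 holds month labels with an empty first
# cell, and a trailing empty column mimics spreadsheet export padding.
raw = io.StringIO(
    "ESB connections by area,,,\n"
    ",Jan-2014,Feb-2014,\n"
    "County Councils,,,\n"
    "County A,8,10,\n"
    "County B,18,15,\n"
)

df = pd.read_csv(raw, header=1)   # line 2 becomes the header; line 1 is discarded
print(df.columns.tolist())        # ['Unnamed: 0', 'Jan-2014', 'Feb-2014', 'Unnamed: 3']

df = df.rename(columns={"Unnamed: 0": "County"})
df = df[df["County"] != "County Councils"]  # drop the sub-heading row
df = df.dropna(axis=1, how="all")           # drop the all-empty padding column
print(df)
```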

```python
# Step 5: Reshape the data to long format
# This turns all the month-year columns into a single column ('Date') with their corresponding values ('Connections')
df_long = df.melt(id_vars="County", var_name="Date", value_name="Connections")
```


```python
# Step 6: Remove any rows where 'Connections' is missing
df_long = df_long.dropna(subset=["Connections"])
```

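Dropping missing values does not guarantee usable types: after `melt`, `Connections` may still be text and `Date` is only a column label. A minimal sketch of explicit conversion follows, assuming the raw values could contain thousands separators or placeholders such as `..`, and that month labels look like `Jan-2014`. Both are assumptions; the format string in particular should be checked against the actual headers before reuse.

```python
import pandas as pd

# Hypothetical melted rows; labels and values are invented for illustration
df_long = pd.DataFrame({
    "County": ["County A", "County A", "County B"],
    "Date": ["Jan-2014", "Feb-2014", "Jan-2014"],
    "Connections": ["8", "1,204", ".."],
})

# Remove thousands separators, then convert; non-numeric text such as '..' becomes NaN
df_long["Connections"] = pd.to_numeric(
    df_long["Connections"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Parse month labels with an explicit (assumed) format; unparseable labels become NaT
df_long["Date"] = pd.to_datetime(df_long["Date"], format="%b-%Y", errors="coerce")

# Drop rows that failed either conversion
df_long = df_long.dropna(subset=["Connections", "Date"])
print(df_long.dtypes)
```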

```python
# Step 7: Export the cleaned and reshaped data to a new CSV file
df_long.to_csv("esb_connections_long.csv", index=False)
```

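The `index=False` argument in Step 7 matters for the reload in Step 8. By default `to_csv` also writes the row index as an unnamed first column, which `read_csv` then brings back as an extra `Unnamed: 0` column. A small round-trip check on invented data:

```python
import io
import pandas as pd

# Invented sample in the same County/Date/Connections layout
df_long = pd.DataFrame({
    "County": ["County A", "County B"],
    "Date": ["Jan-2014", "Jan-2014"],
    "Connections": [8, 18],
})

# Default to_csv writes the index too; on reload it reappears as 'Unnamed: 0'
with_index = pd.read_csv(io.StringIO(df_long.to_csv()))
print(with_index.columns.tolist())   # ['Unnamed: 0', 'County', 'Date', 'Connections']

# index=False (as in Step 7) gives a clean round trip
without_index = pd.read_csv(io.StringIO(df_long.to_csv(index=False)))
pd.testing.assert_frame_equal(without_index, df_long)
print(without_index.shape)           # (2, 3)
```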

```python
# Step 8: Reload the exported file and check its shape (rows, columns)
df_long = pd.read_csv("esb_connections_long.csv")
df_long.shape
```

(1120, 3)
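The shape alone does not show whether every county has a value for every period, and Step 6 may have removed some rows. The helper `check_long_format` below is hypothetical and sketches a few integrity checks: rows per county, duplicate county/date pairs, and whether the table forms a complete county-by-period panel. The sample frame is invented and deliberately lacks a `Mar-2014` row for `County B`.

```python
import pandas as pd


def check_long_format(df):
    """Hypothetical helper: basic integrity checks on a County/Date/Connections table."""
    per_county = df.groupby("County").size()
    duplicates = int(df.duplicated(subset=["County", "Date"]).sum())
    n_periods = df["Date"].nunique()
    return {
        "rows": len(df),
        "counties": int(per_county.size),
        "periods": int(n_periods),
        "duplicate_pairs": duplicates,
        # With no duplicates, a complete panel has exactly counties x periods rows
        "complete_panel": duplicates == 0 and len(df) == per_county.size * n_periods,
    }


sample = pd.DataFrame({
    "County": ["County A", "County B", "County A", "County B", "County A"],
    "Date": ["Jan-2014", "Jan-2014", "Feb-2014", "Feb-2014", "Mar-2014"],
    "Connections": [8, 18, 10, 15, 9],
})
print(check_long_format(sample))
```

On the real data, `df_long.groupby("County").size()` on its own already shows which counties have fewer periods than the rest.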

```python
# Data Preparation - ESB Connections
df_long.head(10)
```

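|   | County | Date | Connections |
|---|---|---|---|
| 0 | Carlow | 2014 | 8 |
| 1 | Cavan | 2014 | 18 |
| 2 | Clare | 2014 | 31 |
| 3 | Cork | 2014 | 58 |
| 4 | Donegal | 2014 | 29 |
| 5 | D/L Rathdown | 2014 | 61 |
| 6 | Fingal | 2014 | 36 |
| 7 | Galway | 2014 | 36 |
| 8 | Kerry | 2014 | 22 |
| 9 | Kildare | 2014 | 28 |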
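Going the other way, `DataFrame.pivot` turns the long table back into a period-by-county grid, which is convenient for plotting one line per county and for spot-checking the reshape. In the invented sample below, note that while `Date` is still text the rows sort alphabetically (`Feb-2014` before `Jan-2014`), which is one reason to convert it to a real date type before any time-based analysis.

```python
import pandas as pd

# Invented long-format sample (labels and values are hypothetical)
df_long = pd.DataFrame({
    "County": ["County A", "County B", "County A", "County B"],
    "Date": ["Jan-2014", "Jan-2014", "Feb-2014", "Feb-2014"],
    "Connections": [8, 18, 10, 15],
})

# pivot reverses melt: one row per Date, one column per County
wide = df_long.pivot(index="Date", columns="County", values="Connections")
print(wide)
```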
