# Total Production Units for Self-Consumption

Master in Data Science and Engineering - FEUP

**Group 4**

Beatriz Iara Nunes Silva

Inês Clotilde da Costa Neves

Mariana Rocha Cristino

Patrícia Crespo da Silva

## Methodology of Statistical Research

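Based on the given dataset, the six-step statistical investigation method is applied. Because the study starts from a given dataset, the first two steps appear in reverse order in this notebook: the data source is described in Phase 1 and the research questions in Phase 2.

1. **Ask a research question** (Phase 2)
2. **Design a study and collect data** (Phase 1)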
3. **Explore the data**
4. **Draw inference**
5. **Formulate conclusions**
6. **Look back and ahead**

---


## Phase 1: Study Design and Data Collection

This study uses the proposed dataset *Total Production Units for Self-Consumption* (UPAC, from the Portuguese *Unidades de Produção para Autoconsumo*). The data was collected and published by **E-REDES – Distribuição de Eletricidade, S.A.**, the operator of the electricity distribution network in mainland Portugal.

**Dataset link:** [Total Production Units for Self-Consumption (e-Redes)](https://e-redes.opendatasoft.com/explore/dataset/8-unidades-de-producao-para-autoconsumo/information/)

---

## Phase 2: Research questions

### General Research Question

RQ: Compare how seasonal (winter vs. summer), regional, and technical factors shape self-consumption energy production patterns in Portugal between 2023 and 2024.

### Specific Research Questions

• RQ1: Compare the average installed capacity per UPAC across different power levels and municipalities in 2023 and 2024.

• RQ2: Compare the evolution of installed capacity between 2023 and 2024 across residential and industrial UPACs to assess differences in growth patterns.

• RQ3: Compare the total installed capacity for self-consumption across different power scales (installed capacity ranges) and seasons (winter vs. summer) in selected Portuguese districts during 2023 and 2024.
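The research questions contrast years (2023 vs. 2024) and seasons (winter vs. summer), but the dataset records time only as a quarter code such as `2023T1` (the `Quarter` column in Phase 3). A minimal sketch of how such a code could be split into year and season, assuming T1 (January to March) stands for winter and T3 (July to September) for summer. This mapping and the helper `quarter_to_season` are hypothetical choices for illustration, not something defined by the dataset.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping (an assumption, not defined by the dataset):
# T1 (Jan-Mar) stands for winter, T3 (Jul-Sep) for summer; T2 and T4 map to None.
SEASON_BY_QUARTER = {1: "winter", 3: "summer"}

# Quarter codes in the dataset look like '2023T1' (year, 'T', quarter number).
QUARTER_PATTERN = re.compile(r"^(\d{4})T([1-4])$")


def quarter_to_season(code: str) -> Tuple[int, int, Optional[str]]:
    """Split a quarter code such as '2023T1' into (year, quarter, season)."""
    match = QUARTER_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unexpected quarter code: {code!r}")
    year, quarter = int(match.group(1)), int(match.group(2))
    return year, quarter, SEASON_BY_QUARTER.get(quarter)


print(quarter_to_season("2023T1"))  # (2023, 1, 'winter')
print(quarter_to_season("2024T3"))  # (2024, 3, 'summer')
print(quarter_to_season("2024T2"))  # (2024, 2, None)
```

Applied to the full frame, `df['Quarter'].map(quarter_to_season)` would return one tuple per row; keeping only winter and summer then amounts to dropping rows whose season is `None`.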

---


### Imports

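```python
import pandas as pd
from IPython.display import display  # built into Jupyter; imported explicitly so the code also runs as a script
```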


## Phase 3: Exploratory Data Analysis

Read the CSV file (semicolon-separated, with `.` as the decimal mark):

```python
df = pd.read_csv('../Data/UPAC_Total_Production.csv', sep=';', decimal='.')
```
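A wrong `sep` or `decimal` argument does not always raise an error; it can instead load the file into a single column or turn numbers into strings. A minimal sketch of a post-load sanity check, assuming the column names shown in the overview below. The helper `check_loaded_frame`, the list `EXPECTED_NUMERIC`, and the in-memory sample are hypothetical and only mimic the assumed file layout.

```python
import io

import pandas as pd

# Columns expected to be numeric when sep=';' and decimal='.' match the file
# (names taken from the dataset overview; choosing these two is an assumption).
EXPECTED_NUMERIC = ["Number of installations", "Total installed power (kW)"]


def check_loaded_frame(frame: pd.DataFrame) -> None:
    """Raise if the CSV looks like it was parsed with the wrong sep or decimal."""
    missing = [col for col in EXPECTED_NUMERIC if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns (wrong sep?): {missing}")
    for col in EXPECTED_NUMERIC:
        if not pd.api.types.is_numeric_dtype(frame[col]):
            raise TypeError(f"Column {col!r} is not numeric (wrong decimal?)")


# Tiny in-memory sample in an assumed layout similar to the real file.
sample = io.StringIO(
    "Quarter;Installed power range (kW);Number of installations;Total installed power (kW)\n"
    "2023T1;]0, 4];2;3.0\n"
    "2023T1;]0, 4];2;4.32\n"
)
frame = pd.read_csv(sample, sep=";", decimal=".")
check_loaded_frame(frame)  # passes silently: both columns parse as numbers
```

On the notebook's data the check would simply be `check_loaded_frame(df)`. Note that the comma inside `]0, 4]` is harmless here because `;` is the separator.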

### 3.1. Initial Data Overview

First rows of the dataset:

```python
display(df.head())
```

| | Quarter | District | Municipality | Parish | Zip Code | Technology Type | Voltage level | Installed power range (kW) | Number of installations | Total installed power (kW) | DistrictCode | Municipality Code | DistrictMunicipalityParishCode | CPEs (#) | relacao_instalacoes_por_cpe | relacao_potencia_por_cpe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023T1 | Coimbra | Condeixa-a-Nova | Furadouro | 3150 | Solar | BTN | ]0, 4] | 2 | 3.0 | 6 | 604 | 60407 | 9537.0 | 0.00021 | 0.000315 |
| 1 | 2023T1 | Coimbra | Condeixa-a-Nova | Zambujal | 3150 | Solar | BTN | ]0, 4] | 2 | 4.32 | 6 | 604 | 60410 | 9537.0 | 0.00021 | 0.000453 |
| 2 | 2023T1 | Coimbra | Condeixa-a-Nova | Condeixa-a-Velha e Condeixa-a-Nova | 3150 | Não Atribuído | BTN | ]0, 4] | 1 | 1.05 | 6 | 604 | 60411 | 9537.0 | 0.000105 | 0.00011 |
| 3 | 2023T1 | Coimbra | Condeixa-a-Nova | Vila Seca e Bem da Fé | 3150 | Solar | BTN | ]0, 4] | 17 | 28.14 | 6 | 604 | 60413 | 9537.0 | 0.001783 | 0.002951 |
| 4 | 2023T1 | Coimbra | Figueira da Foz | São Pedro | 3090 | Não Atribuído | BTN | ]0, 4] | 2 | 3.28 | 6 | 605 | 60514 | 50436.0 | 4e-05 | 6.5e-05 |
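The `Installed power range (kW)` column encodes power classes as interval labels such as `]0, 4]`, where a bracket facing outward marks an open end, so `]0, 4]` means 0 < P ≤ 4 kW. Since RQ1 and RQ3 group by these power levels, numeric bounds are handy. A minimal sketch of a parser, assuming every label follows the same bracket-number-comma-number-bracket pattern; only `]0, 4]` appears in the rows above, so this should be checked against `df['Installed power range (kW)'].unique()`. `PowerRange` and `parse_power_range` are hypothetical names.

```python
import re
from typing import NamedTuple


class PowerRange(NamedTuple):
    low: float
    high: float
    low_closed: bool   # True if the interval opens with '[' (low is included)
    high_closed: bool  # True if the interval closes with ']' (high is included)

    def contains(self, kw: float) -> bool:
        """Check whether a power value (kW) falls inside this class."""
        above = kw >= self.low if self.low_closed else kw > self.low
        below = kw <= self.high if self.high_closed else kw < self.high
        return above and below


# Bracket, number, comma, number, bracket; anything else is rejected.
RANGE_PATTERN = re.compile(
    r"^([\[\]])\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*([\[\]])$"
)


def parse_power_range(text: str) -> PowerRange:
    """Parse a range label such as ']0, 4]' into numeric bounds."""
    match = RANGE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised power range: {text!r}")
    left, low, high, right = match.groups()
    return PowerRange(float(low), float(high), left == "[", right == "]")


r = parse_power_range("]0, 4]")
print(r)  # PowerRange(low=0.0, high=4.0, low_closed=False, high_closed=True)
print(r.contains(0.0), r.contains(4.0))  # False True
```

Labels that do not match (for example an open-ended top class, if one exists) raise `ValueError` instead of being silently misread.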
4,2023T1,Coimbra,Figueira da Foz,São Pedro,3090,Não Atribuído,BTN,"]0, 4]",2,3.28,6,605,60514,50436.0,4e-05,6.5e-05


Dataset info:

```python
df.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 121294 entries, 0 to 121293
Data columns (total 16 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   Quarter                         121294 non-null  object 
 1   District                        121294 non-null  object 
 2   Municipality                    121294 non-null  object 
 3   Parish                          121294 non-null  object 
 4   Zip Code                        121294 non-null  int64  
 5   Technology Type                 121283 non-null  object 
 6   Voltage level                   121292 non-null  object 
 7   Installed power range (kW)      121294 non-null  object 
 8   Number of installations         121294 non-null  int64  
 9   Total installed power (kW)      121294 non-null  float64
 10  DistrictCode                    121294 non-null  int64  
 11  Municipality Code               121294 non-null  int64  
 12  D
```

Missing Values summary:

```python
missing_df = pd.DataFrame({
    'Missing Values': df.isnull().sum(),
    'Percentage (%)': (df.isnull().sum() / len(df)) * 100
})
# Show only the columns that actually have missing values
display(missing_df[missing_df['Missing Values'] > 0])
```

| | Missing Values | Percentage (%) |
|---|---|---|
| Technology Type | 11 | 0.009069 |
| Voltage level | 2 | 0.001649 |
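The percentage is the missing count divided by the number of rows (121294, from `df.info()`), times 100; for `Technology Type`, 11 / 121294 × 100 ≈ 0.009069 %. A minimal sketch of the same summary written with `isna().mean()`, which gives the share of missing values directly; the helper `missing_summary` and the toy frame are made up for illustration.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, non-zero rows only."""
    summary = pd.DataFrame({
        "Missing Values": frame.isna().sum(),
        "Percentage (%)": frame.isna().mean() * 100,  # mean of booleans = share missing
    })
    return summary[summary["Missing Values"] > 0]


# Made-up toy frame, not taken from the dataset.
toy = pd.DataFrame({
    "Technology Type": ["Solar", None, "Solar", "Eólica"],
    "Voltage level": ["BTN", "BTN", "BTN", "BTN"],
})
print(missing_summary(toy))

# The reported figure for Technology Type, reproduced by hand:
print(round(11 / 121294 * 100, 6))  # 0.009069
```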


Summary statistics:

```python
display(df.describe())
```

| | Zip Code | Number of installations | Total installed power (kW) | DistrictCode | Municipality Code | CPEs (#) | relacao_instalacoes_por_cpe | relacao_potencia_por_cpe |
|---|---|---|---|---|---|---|---|---|
| count | 121294.0 | 121294.0 | 121294.0 | 121294.0 | 121294.0 | 121294.0 | 121294.0 | 121294.0 |
| mean | 4512.469149 | 17.989274 | 128.831695 | 9.463329 | 954.854098 | 45296.78327 | 0.000852 | 0.00619 |
| std | 1666.376721 | 59.894156 | 396.866964 | 5.338584 | 535.108914 | 60637.188219 | 0.002444 | 0.029443 |
| min | 1000.0 | 1.0 | 0.0 | 1.0 | 101.0 | 1260.0 | 3e-06 | 0.0 |
| 25% | 3105.0 | 1.0 | 15.0 | 4.0 | 407.0 | 10873.0 | 3.6e-05 | 0.000496 |
| 50% | 4600.0 | 2.0 | 30.35 | 10.0 | 1012.0 | 27430.0 | 0.000119 | 0.00151 |
| 75% | 5160.0 | 9.0 | 87.4275 | 13.0 | 1318.0 | 57414.0 | 0.000509 | 0.004283 |
| max | 8970.0 | 2227.0 | 19600.0 | 18.0 | 1824.0 | 399456.0 | 0.059168 | 3.203661 |
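The two `relacao_*` columns are not documented in this notebook. The first rows suggest they are `Number of installations / CPEs (#)` and `Total installed power (kW) / CPEs (#)`, where `CPEs (#)` (presumably the number of CPEs, *Códigos de Ponto de Entrega*, i.e. grid delivery points) is constant within a municipality. A minimal sketch that tests this hypothesis on the five rows printed by `df.head()`; the tolerance of 1e-6 is chosen because the displayed values are rounded to about six decimals. The same comparison could be run on `df` itself. `ratios_match` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# The five rows printed by df.head() (only the columns needed here).
head = pd.DataFrame({
    "Number of installations": [2, 2, 1, 17, 2],
    "Total installed power (kW)": [3.0, 4.32, 1.05, 28.14, 3.28],
    "CPEs (#)": [9537.0, 9537.0, 9537.0, 9537.0, 50436.0],
    "relacao_instalacoes_por_cpe": [0.00021, 0.00021, 0.000105, 0.001783, 4e-05],
    "relacao_potencia_por_cpe": [0.000315, 0.000453, 0.00011, 0.002951, 6.5e-05],
})


def ratios_match(frame: pd.DataFrame, atol: float = 1e-6) -> tuple:
    """Check the hypothesis relacao_* == value / CPEs (#), within atol."""
    inst = np.isclose(frame["Number of installations"] / frame["CPEs (#)"],
                      frame["relacao_instalacoes_por_cpe"], atol=atol)
    power = np.isclose(frame["Total installed power (kW)"] / frame["CPEs (#)"],
                       frame["relacao_potencia_por_cpe"], atol=atol)
    return bool(inst.all()), bool(power.all())


print(ratios_match(head))  # (True, True)
```

If the hypothesis holds on the full data, the two columns carry no information beyond the other three and can be recomputed at full precision when needed.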


Number of installations by District:

```python
print(df.groupby('District')['Number of installations'].sum())
```

```
District
Aveiro              180569
Beja                 34205
Braga               269828
Bragança             32217
Castelo Branco       50869
Coimbra             115383
Faro                118203
Guarda               34548
Leiria              145569
Lisboa              255949
Portalegre           24479
Porto               306030
Santarém            139969
Setúbal             209972
Viana do Castelo     61271
Vila Real            55118
Viseu               107050
Évora                40762
Name: Number of installations, dtype: int64
```
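These district totals add up rows from every quarter in the file. Whether that sum is meaningful depends on what each quarterly row counts: if a quarter reports the stock of installations in service rather than new connections, summing across quarters would count the same installation several times. This notebook does not state which interpretation applies, so it is an assumption to verify against the dataset documentation. A minimal sketch of a per-quarter breakdown that sidesteps the question, using a made-up toy frame with the notebook's column names:

```python
import pandas as pd

# Made-up toy rows using the dataset's column names (values are illustrative).
toy = pd.DataFrame({
    "Quarter": ["2023T1", "2023T1", "2023T2", "2023T2"],
    "District": ["Porto", "Faro", "Porto", "Faro"],
    "Number of installations": [10, 4, 12, 5],
})

# One row per district, one column per quarter, summed within each cell.
by_quarter = toy.pivot_table(
    index="District",
    columns="Quarter",
    values="Number of installations",
    aggfunc="sum",
    fill_value=0,
)
print(by_quarter)

# If rows are cumulative stocks, the latest quarter alone gives the current total.
# Codes like '2023T1' sort chronologically as plain strings.
latest = by_quarter[sorted(by_quarter.columns)[-1]]
print(latest.sort_values(ascending=False))
```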


Total installed power (kW) by District:

```python
print(df.groupby('District')['Total installed power (kW)'].sum())
```

```
District
Aveiro              1931737.50
Beja                 397450.20
Braga               1836848.38
Bragança             156470.01
Castelo Branco       408961.60
Coimbra              798073.90
Faro                 725488.42
Guarda               206524.84
Leiria              1368945.83
Lisboa              1806371.79
Portalegre           188540.46
Porto               2282327.34
Santarém            1004364.54
Setúbal              909618.39
Viana do Castelo     358383.77
Vila Real            217550.71
Viseu                701931.05
Évora                326922.83
Name: Total installed power (kW), dtype: float64
```
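RQ1 concerns the average installed capacity per UPAC. Dividing the two district totals above gives a first, crude version of it (kW per installation), subject to the same caveat about summing across quarters. A minimal sketch on made-up toy rows with the notebook's column names; the derived column `avg_kw_per_installation` is a hypothetical name.

```python
import pandas as pd

# Made-up toy rows using the dataset's column names (values are illustrative).
toy = pd.DataFrame({
    "District": ["Porto", "Porto", "Faro"],
    "Number of installations": [2, 8, 5],
    "Total installed power (kW)": [6.0, 44.0, 20.0],
})

totals = toy.groupby("District")[["Number of installations",
                                  "Total installed power (kW)"]].sum()
# Ratio of sums (total kW / total installations), not a mean of per-row ratios,
# so rows that bundle many installations carry proportionally more weight.
totals["avg_kw_per_installation"] = (
    totals["Total installed power (kW)"] / totals["Number of installations"]
)
print(totals)
```

Averaging per-row ratios instead would give each row equal weight regardless of how many installations it bundles, which is usually not what "average capacity per UPAC" means.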


Number of installations by Technology Type:

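```python
print(df.groupby('Technology Type')['Number of installations'].sum())
```

```
Technology Type
Biogás                          42
Biomassa                         7
Cogeração não renovável         11
Eólica                         139
Fotovoltaica                     6
Hídrica                         13
Não Atribuído                28680
Solar                      2153082
Name: Number of installations, dtype: int64
```

The technology labels are kept in Portuguese as they appear in the data: Biogás (biogas), Biomassa (biomass), Cogeração não renovável (non-renewable cogeneration), Eólica (wind), Fotovoltaica (photovoltaic), Hídrica (hydro), Não Atribuído (not assigned), Solar (solar).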
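By default, `groupby` drops rows whose grouping key is missing, so the 11 rows without a `Technology Type` (see the missing-values summary) are silently left out of the totals above. Passing `dropna=False` (available since pandas 1.1) keeps them as a separate `NaN` group. A minimal sketch on made-up toy rows:

```python
import numpy as np
import pandas as pd

# Made-up toy rows using the dataset's column names (values are illustrative).
toy = pd.DataFrame({
    "Technology Type": ["Solar", "Solar", np.nan, "Eólica"],
    "Number of installations": [3, 4, 2, 1],
})

default = toy.groupby("Technology Type")["Number of installations"].sum()
with_missing = toy.groupby("Technology Type", dropna=False)["Number of installations"].sum()

print(default.sum(), with_missing.sum())  # 8 10
print(with_missing)
```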


### 3.2. Data Distributions

### 3.3. Relationships Between Variables

### 3.4. Temporal and Seasonal Trends

### 3.5. Geographical Patterns

### 3.6. Correlation Analysis

## <a id="phase4"></a>Phase 4: Draw inference
## <a id="phase5"></a>Phase 5: Formulate conclusions
## <a id="phase6"></a>Phase 6: Look back and ahead
