# Prosumer Energy Prediction

The term "prosumer" is a portmanteau of "producer" and "consumer". It refers to individuals or entities that not only consume products and services but also produce them. The phenomenon has grown considerably with the advent of the Internet and digital technologies, which have made it easier for people to create and share content as well as consume it.

In the energy sector, the term is especially relevant: it describes people or companies that generate energy, typically from renewable sources such as solar or wind, for their own use and sell the surplus to the electricity grid. This has become more accessible thanks to the falling cost of renewable technologies and the wider availability of financing options. Energy prosumers are therefore key players in the transition toward more distributed and sustainable energy systems, helping to reduce greenhouse gas emissions and promote a circular economy.

In a broader social and economic context, the prosumer concept represents a paradigm shift from the traditional model of production and consumption. Rather than being passive consumers, prosumers take an active part in creating value, which can lead to new business models, greater personalization of products and services, and a stronger emphasis on sustainability.

## Import Libraries and Data

In [17]:
import pandas as pd
import numpy as np

# Visualization Libraries 📊
# ------------------------------
import seaborn as sns
import matplotlib.pyplot as plt
import missingno as msno

# Suppress warnings and widen pandas display options for easier inspection 🔧
# --------------------------------------------------------
from termcolor import colored
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 300)
pd.set_option('display.max_rows', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [3]:
from pathlib import Path
DATA_DIR = Path('C:/Users/giaco/Desktop/repos/prosumer-energy-prediction/data')  # local path; adjust as needed
train = pd.read_csv(DATA_DIR / 'train.csv')
test = pd.read_csv(DATA_DIR / 'example_test_files' / 'test.csv')
submission = pd.read_csv(DATA_DIR / 'example_test_files' / 'sample_submission.csv')

## Data Exploration

In [8]:
# TRAIN DATAFRAME

print("The shape of the TRAIN df is:")
print(train.shape)
train.head()

The shape of the TRAIN df is:
(2018352, 9)


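|   | county | is_business | product_type | target | is_consumption | datetime | data_block_id | row_id | prediction_unit_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0.713 | 0 | 2021-09-01 00:00:00 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 96.59 | 1 | 2021-09-01 00:00:00 | 0 | 1 | 0 |
| 2 | 0 | 0 | 2 | 0.0 | 0 | 2021-09-01 00:00:00 | 0 | 2 | 1 |
| 3 | 0 | 0 | 2 | 17.314 | 1 | 2021-09-01 00:00:00 | 0 | 3 | 1 |
| 4 | 0 | 0 | 3 | 2.904 | 0 | 2021-09-01 00:00:00 | 0 | 4 | 2 |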


- `county` - An ID code for the county.
- `is_business` - Boolean for whether or not the prosumer is a business.
- `product_type` - ID code with the following mapping of codes to contract types: {0: "Combined", 1: "Fixed", 2: "General service", 3: "Spot"}.
- `target` - The consumption or production amount for the relevant segment for the hour. The segments are defined by the county, is_business, and product_type.
- `is_consumption` - Boolean for whether or not this row's target is consumption or production.
- `datetime` - The Estonian time in EET (UTC+2) / EEST (UTC+3).
- `data_block_id` - All rows sharing the same data_block_id will be available at the same forecast time. This is a function of what information is available when forecasts are actually made, at 11 AM each morning. For example, if the forecast weather data_block_id for predictions made on October 31st is 100, then the historic weather data_block_id for October 31st will be 101, as the historic weather data is only actually available the next day.
- `row_id` - A unique identifier for the row.
- `prediction_unit_id` - A unique identifier for the county, is_business, and product_type combination. New prediction units can appear or disappear in the test set.
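
The column descriptions above suggest a few light preparation steps. The following is a minimal sketch, not part of the original notebook: it parses `datetime`, attaches the contract name from the `product_type` mapping, and checks that each `prediction_unit_id` corresponds to a single (county, is_business, product_type) segment. The `sample` frame is a small stand-in built from the first rows of `train` shown earlier, and the `product_name` column and `segments_per_unit` variable are hypothetical names introduced for illustration.

```python
import pandas as pd

# Mapping copied from the product_type description above.
PRODUCT_TYPE_NAMES = {0: "Combined", 1: "Fixed", 2: "General service", 3: "Spot"}

# Small stand-in for `train`, using the first rows printed by train.head().
sample = pd.DataFrame({
    "county": [0, 0, 0, 0, 0],
    "is_business": [0, 0, 0, 0, 0],
    "product_type": [1, 1, 2, 2, 3],
    "target": [0.713, 96.59, 0.0, 17.314, 2.904],
    "is_consumption": [0, 1, 0, 1, 0],
    "datetime": ["2021-09-01 00:00:00"] * 5,
    "prediction_unit_id": [0, 0, 1, 1, 2],
})

# read_csv leaves the timestamps as strings, so convert them explicitly.
sample["datetime"] = pd.to_datetime(sample["datetime"])

# Hypothetical helper column: human-readable contract type.
sample["product_name"] = sample["product_type"].map(PRODUCT_TYPE_NAMES)

# For each prediction unit, count distinct values of the segment columns;
# a value of 1 means the unit maps to exactly one segment.
segments_per_unit = (
    sample.groupby("prediction_unit_id")[["county", "is_business", "product_type"]]
    .nunique()
    .max(axis=1)
)
print(sample[["prediction_unit_id", "product_name", "is_consumption", "target"]])
print((segments_per_unit == 1).all())
```

The same three steps should apply unchanged to the full `train` frame, although the uniqueness check is only as reliable as the description of `prediction_unit_id` it is based on.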

In [12]:
# TEST DATAFRAME

print("The shape of the TEST df is:")
print(test.shape)
test.head()

The shape of the TEST df is:
(12480, 8)


|   | county | is_business | product_type | is_consumption | prediction_datetime | data_block_id | row_id | prediction_unit_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 2023-05-28 00:00:00 | 634 | 2005872 | 0 |
| 1 | 0 | 0 | 1 | 1 | 2023-05-28 00:00:00 | 634 | 2005873 | 0 |
| 2 | 0 | 0 | 2 | 0 | 2023-05-28 00:00:00 | 634 | 2005874 | 1 |
| 3 | 0 | 0 | 2 | 1 | 2023-05-28 00:00:00 | 634 | 2005875 | 1 |
| 4 | 0 | 0 | 3 | 0 | 2023-05-28 00:00:00 | 634 | 2005876 | 2 |
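
Comparing the two outputs, the test frame names its timestamp column `prediction_datetime` instead of `datetime` and has no `target`. A minimal sketch of aligning the schemas follows; the renaming step is an assumption about how one might want to share feature code between the two frames, not something the original notebook does, and `aligned` and `missing_in_test` are hypothetical names.

```python
import pandas as pd

# Column names as shown in the train and test outputs above.
train_cols = ["county", "is_business", "product_type", "target", "is_consumption",
              "datetime", "data_block_id", "row_id", "prediction_unit_id"]
test_cols = ["county", "is_business", "product_type", "is_consumption",
             "prediction_datetime", "data_block_id", "row_id", "prediction_unit_id"]

# One-row stand-in for `test`, taken from the first row of test.head().
test_sample = pd.DataFrame(
    [[0, 0, 1, 0, "2023-05-28 00:00:00", 634, 2005872, 0]], columns=test_cols
)

# Rename the timestamp column so both frames use the same name, then parse it.
aligned = test_sample.rename(columns={"prediction_datetime": "datetime"})
aligned["datetime"] = pd.to_datetime(aligned["datetime"])

# Columns that train has and the aligned test frame still lacks.
missing_in_test = [c for c in train_cols if c not in aligned.columns]
print(missing_in_test)
```

After the rename, the only train column absent from the test frame is the one to be predicted.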


In [16]:
# SUBMISSION DATAFRAME

print(submission.shape)
submission.head()

(12480, 3)


|   | row_id | data_block_id | target |
|---|---|---|---|
| 0 | 2005872 | 634 | 0 |
| 1 | 2005873 | 634 | 0 |
| 2 | 2005874 | 634 | 0 |
| 3 | 2005875 | 634 | 0 |
| 4 | 2005876 | 634 | 0 |
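
The test and submission frames have the same number of rows and, in the rows shown, the same `row_id` values. As a hedged sketch of how a submission might be filled in, the snippet below checks that the `row_id` columns line up and then writes predictions into `target` by `row_id`. The prediction values are placeholders invented for illustration, not model output, and `rows_match`, `predictions`, and `filled` are hypothetical names.

```python
import pandas as pd

# Stand-ins for `test` and `submission`, using the rows printed above.
test_sample = pd.DataFrame({
    "row_id": [2005872, 2005873, 2005874, 2005875, 2005876],
    "data_block_id": [634] * 5,
})
submission_sample = pd.DataFrame({
    "row_id": [2005872, 2005873, 2005874, 2005875, 2005876],
    "data_block_id": [634] * 5,
    "target": [0] * 5,
})

# Check that both frames list the same row_ids in the same order.
rows_match = test_sample["row_id"].tolist() == submission_sample["row_id"].tolist()

# Placeholder predictions keyed by row_id (hypothetical values).
predictions = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=test_sample["row_id"])

# Fill the target column by looking up each row_id in the predictions.
filled = submission_sample.copy()
filled["target"] = filled["row_id"].map(predictions)
print(rows_match)
print(filled)
```

Mapping by `row_id` rather than assigning by position keeps the result correct even if the two frames are ever sorted differently.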


## EDA

*We PROduce and conSUME our output: we are prosuming. The purpose of prosuming is to share.*

Energy systems in most countries are currently transitioning from a centralized to a distributed model. In the long term, they are expected to become smart, connected systems built on advanced emerging information technologies, an arrangement known as the energy Internet (EI). This change of model has already been envisioned in the argument that any electric power system, irrespective of size, can be modelled as a prosumer: an economically motivated entity that consumes, produces, and stores power; operates a power grid; and optimizes the economic decisions regarding its energy utilization.


In [25]:
def check_df(dataframe):
    print("----------------- SHAPE -----------------")
    print(dataframe.shape)
    print("----------------- INFO -----------------")
    dataframe.info()  # info() prints its report directly and returns None
    print("----------------- NA -----------------")
    print(dataframe.isnull().sum())

check_df(train)

----------------- SHAPE -----------------
(2018352, 9)
----------------- INFO -----------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2018352 entries, 0 to 2018351
Data columns (total 9 columns):
 #   Column              Dtype  
---  ------              -----  
 0   county              int64  
 1   is_business         int64  
 2   product_type        int64  
 3   target              float64
 4   is_consumption      int64  
 5   datetime            object 
 6   data_block_id       int64  
 7   row_id              int64  
 8   prediction_unit_id  int64  
dtypes: float64(1), int64(7), object(1)
memory usage: 138.6+ MB
----------------- NA -----------------
county                  0
is_business             0
product_type            0
target                528
is_consumption          0
datetime                0
data_block_id           0
row_id                  0
prediction_unit_id      0
dtype: int64
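
The NA summary shows that `target` is the only column with missing values (528 rows). One possible way to handle them, sketched below under the assumption that rows without a target are simply excluded from training, is to look at where the gaps occur and then drop them. The `sample` frame is a small stand-in: its first two rows come from the `train` output above, while the second hour, including the missing value, is invented for illustration. `missing_by_hour` and `clean` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Stand-in for `train`; the 01:00 rows are hypothetical, added to show a gap.
sample = pd.DataFrame({
    "prediction_unit_id": [0, 0, 0, 0],
    "is_consumption": [0, 1, 0, 1],
    "datetime": pd.to_datetime([
        "2021-09-01 00:00:00", "2021-09-01 00:00:00",
        "2021-09-01 01:00:00", "2021-09-01 01:00:00",
    ]),
    "target": [0.713, 96.59, np.nan, 95.0],
})

# Count missing targets per timestamp to see whether the gaps cluster in time.
missing_by_hour = sample[sample["target"].isna()].groupby("datetime").size()

# Drop rows with no target (assumption: they carry no training signal).
clean = sample.dropna(subset=["target"]).reset_index(drop=True)
print(missing_by_hour)
print(clean.shape)
```

Inspecting `missing_by_hour` on the real data before dropping anything would show whether the missing targets are scattered or concentrated at particular times, which may influence whether dropping or imputing is the better choice.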
