I chose XGBoost as the prediction model for the half-hourly SMP prices for 2025 because gradient-boosted trees perform strongly on structured (tabular) data, including time series described by calendar and lag features. Here are the main reasons:

- Excellent performance on nonlinear relationships

Electricity prices (SMP) are highly volatile and influenced by many nonlinear factors such as demand spikes, seasonal patterns, hour-of-day effects, and unexpected market changes.
XGBoost can capture nonlinear effects and interactions between features, which classical statistical models like ARIMA or exponential smoothing do not model directly.

- Handles many predictors easily

In this task, we can create many useful features:

hour of day

day of week

month / season

lag features (lag-1, lag-2, lag-48, lag-336, etc.)

rolling averages

XGBoost copes well with large feature sets, and its feature-importance scores show which features the trees rely on most.

- Robust to outliers and volatility

Electricity markets often show extreme price spikes.
Tree-based models such as XGBoost are generally less sensitive to extreme feature values than linear models, which makes them a reasonable choice for noisy SMP data.

In [50]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

In [74]:
data = pd.read_excel("SMPprice.xlsx", sheet_name="SMP prices2")


In [75]:
print(data.columns)


Index(['Date', 'Time', 'SMP_NIS_per_MWh', 'SMP_NIS_per_KWh', 'DateTime',
       'Month', 'Weekday', 'ISWEEKEND', 'Holiday', 'DayType', 'Season', 'Hour',
       'Pattern', 'Pattern_key'],
      dtype='object')


In [76]:
data['DateTime'] = pd.to_datetime(
    data['Date'].astype(str) + ' ' + data['Time'].astype(str),
    dayfirst=True  # The dates are day-first (DD/MM/YYYY), so the day must be parsed before the month
)

data.drop(columns=['Weekday', 'ISWEEKEND','DayType', 'Holiday', 'Hour', 'Pattern'], inplace=True)


I combined the date and time columns into a single DateTime column,
so that the hour, day of the week, and month can be extracted later and the data can be sorted or used in time-based calculations.
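
As a small illustration of why `dayfirst=True` matters, the sketch below parses two toy strings (invented for this example, in the same DD/MM/YYYY layout as the `Date` and `Time` columns) both ways and prints the resulting months.

```python
import pandas as pd

# Toy strings in DD/MM/YYYY format, like the 'Date' and 'Time' columns above.
raw = pd.Series(["03/01/2024 00:30:00", "04/01/2024 01:00:00"])

day_first = pd.to_datetime(raw, dayfirst=True)     # read as 3 January and 4 January
month_first = pd.to_datetime(raw, dayfirst=False)  # read as 1 March and 1 April

print(day_first.dt.month.tolist())    # months when the day comes first
print(month_first.dt.month.tolist())  # months when the month comes first
```

Without `dayfirst=True`, any date whose day is 12 or less could silently be read with day and month swapped.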

In [77]:
data['hour'] = data['DateTime'].dt.hour
data['day_of_week'] = data['DateTime'].dt.dayofweek # day_of_week returns an integer from 0 to 6, 0 = Monday, 6 = Sunday
data['month'] = data['DateTime'].dt.month
data['is_weekend'] = data['day_of_week'].isin([5,6]).astype(int) # 1 for Saturday (5) and Sunday (6), 0 otherwise

I extracted basic time features: hour of day, day of week, month, and a weekend flag.

Electricity prices behave differently according to:

Time of day (peak/off-peak)

Days of the week

Months of the year (seasons)

Weekends vs. weekdays

These features help the model learn seasonality.
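
The calendar features can be checked on a small toy range. The sketch below uses three days starting on Friday 5 January 2024; `WEEKEND_DAYS` is a hypothetical parameter, not part of the notebook, added because the days that count as a weekend depend on the market. If the weekend falls on Friday and Saturday, `[4, 5]` would be the matching choice.

```python
import pandas as pd

# Toy half-hourly index: Friday 2024-01-05 through Sunday 2024-01-07.
toy = pd.DataFrame(
    {"DateTime": pd.date_range("2024-01-05", periods=3 * 48, freq="30min")}
)

toy["hour"] = toy["DateTime"].dt.hour
toy["day_of_week"] = toy["DateTime"].dt.dayofweek  # 0 = Monday ... 6 = Sunday
toy["month"] = toy["DateTime"].dt.month

# Hypothetical parameter: weekday numbers treated as weekend.
# [5, 6] = Saturday/Sunday as in the notebook; [4, 5] would be Friday/Saturday.
WEEKEND_DAYS = [5, 6]
toy["is_weekend"] = toy["day_of_week"].isin(WEEKEND_DAYS).astype(int)

# One row per calendar day, to inspect the flags.
daily = toy.groupby(toy["DateTime"].dt.date)[["day_of_week", "is_weekend"]].first()
print(daily)
```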

In [78]:
data['half_hour_index'] = data['DateTime'].dt.hour * 2 + (data['DateTime'].dt.minute >= 30).astype(int)


In [79]:
data.head()

Unnamed: 0,Date,Time,SMP_NIS_per_MWh,SMP_NIS_per_KWh,DateTime,Month,Season,Pattern_key,hour,day_of_week,month,is_weekend,half_hour_index
0,01/01/2024,00:00:00,102.01,0.10201,2024-01-01 00:00:00,January,Winter,0|Regular weekdays|Winter,0,0,1,0,0
1,01/01/2024,00:30:00,100.57,0.10057,2024-01-01 00:30:00,January,Winter,"0,5|Regular weekdays|Winter",0,0,1,0,1
2,01/01/2024,01:00:00,100.58,0.10058,2024-01-01 01:00:00,January,Winter,1|Regular weekdays|Winter,1,0,1,0,2
3,01/01/2024,01:30:00,103.74,0.10374,2024-01-01 01:30:00,January,Winter,"1,5|Regular weekdays|Winter",1,0,1,0,3
4,01/01/2024,02:00:00,107.33,0.10733,2024-01-01 02:00:00,January,Winter,2|Regular weekdays|Winter,2,0,1,0,4


In [80]:
print(data.columns)


Index(['Date', 'Time', 'SMP_NIS_per_MWh', 'SMP_NIS_per_KWh', 'DateTime',
       'Month', 'Season', 'Pattern_key', 'hour', 'day_of_week', 'month',
       'is_weekend', 'half_hour_index'],
      dtype='object')


In [81]:
print(data.dtypes)


Date                       object
Time                       object
SMP_NIS_per_MWh           float64
SMP_NIS_per_KWh           float64
DateTime           datetime64[ns]
Month                      object
Season                     object
Pattern_key                object
hour                        int32
day_of_week                 int32
month                       int32
is_weekend                  int64
half_hour_index             int64
dtype: object


In [82]:
# Lag features
data['lag_1'] = data['SMP_NIS_per_MWh'].shift(1)       # 30 min before
data['lag_2'] = data['SMP_NIS_per_MWh'].shift(2)       # 1 hour before
data['lag_48'] = data['SMP_NIS_per_MWh'].shift(48)     # 1 day before (48 half-hours)
data['lag_336'] = data['SMP_NIS_per_MWh'].shift(336)   # 1 week before (336 half-hours)

# Rolling means, shifted by one step so that they only use past values
data['rolling_2'] = data['SMP_NIS_per_MWh'].shift(1).rolling(2).mean()      # average over the previous hour
data['rolling_4'] = data['SMP_NIS_per_MWh'].shift(1).rolling(4).mean()      # average over the previous 2 hours
data['rolling_48'] = data['SMP_NIS_per_MWh'].shift(1).rolling(48).mean()    # average over the previous day
data['rolling_336'] = data['SMP_NIS_per_MWh'].shift(1).rolling(336).mean()  # average over the previous week (336 half-hours)

I created columns containing the SMP price from:

Half an hour earlier (lag_1)

An hour earlier (lag_2)

A day earlier (lag_48)

A week earlier (lag_336)

Tree models like XGBoost have no built-in notion of time order; each row is treated independently.
The lags give the model a "memory" of recent prices and let it pick up repeating patterns.

I then created moving averages over the most recent values (each shifted by one step, so the current price is never included):

rolling_2 → hourly trend

rolling_4 → 2-hour trend

rolling_48 → daily trend

rolling_336 → weekly trend


Moving averages “smooth out” volatility and help the model identify short- and long-term trends.
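
To make the shift-then-roll pattern concrete, here is a minimal sketch on a five-value toy series (the prices are invented for illustration). It shows that each feature for a given row is built only from earlier rows.

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="price")

features = pd.DataFrame({
    "price": prices,
    "lag_1": prices.shift(1),                        # previous value
    "lag_2": prices.shift(2),                        # value two steps back
    "rolling_2": prices.shift(1).rolling(2).mean(),  # mean of the two previous values
})
print(features)

# Row 3 (price 40): lag_1 = 30, lag_2 = 20, rolling_2 = mean(20, 30) = 25.
```

The first rows are NaN because there is not enough history yet; with a 336-step window, that affects the first 336 rows.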

In [83]:
data.head(500)


Unnamed: 0,Date,Time,SMP_NIS_per_MWh,SMP_NIS_per_KWh,DateTime,Month,Season,Pattern_key,hour,day_of_week,...,is_weekend,half_hour_index,lag_1,lag_2,lag_48,lag_336,rolling_2,rolling_4,rolling_48,rolling_336
0,01/01/2024,00:00:00,102.01,0.10201,2024-01-01 00:00:00,January,Winter,0|Regular weekdays|Winter,0,0,...,0,0,,,,,,,,
1,01/01/2024,00:30:00,100.57,0.10057,2024-01-01 00:30:00,January,Winter,"0,5|Regular weekdays|Winter",0,0,...,0,1,102.01,,,,,,,
2,01/01/2024,01:00:00,100.58,0.10058,2024-01-01 01:00:00,January,Winter,1|Regular weekdays|Winter,1,0,...,0,2,100.57,102.01,,,101.290,,,
3,01/01/2024,01:30:00,103.74,0.10374,2024-01-01 01:30:00,January,Winter,"1,5|Regular weekdays|Winter",1,0,...,0,3,100.58,100.57,,,100.575,,,
4,01/01/2024,02:00:00,107.33,0.10733,2024-01-01 02:00:00,January,Winter,2|Regular weekdays|Winter,2,0,...,0,4,103.74,100.58,,,102.160,101.7250,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,11/01/2024,07:30:00,124.67,0.12467,2024-01-11 07:30:00,January,Winter,"7,5|Regular weekdays|Winter",7,3,...,0,15,163.23,162.00,162.00,103.30,162.615,141.0575,135.795625,128.577589
496,11/01/2024,08:00:00,124.67,0.12467,2024-01-11 08:00:00,January,Winter,8|Regular weekdays|Winter,8,3,...,0,16,124.67,163.23,115.85,98.97,143.950,143.6425,135.017917,128.641190
497,11/01/2024,08:30:00,165.71,0.16571,2024-01-11 08:30:00,January,Winter,"8,5|Regular weekdays|Winter",8,3,...,0,17,124.67,124.67,111.45,91.75,124.670,143.6425,135.201667,128.717679
498,11/01/2024,09:00:00,124.67,0.12467,2024-01-11 09:00:00,January,Winter,9|Regular weekdays|Winter,9,3,...,0,18,165.71,124.67,106.90,91.75,145.190,144.5700,136.332083,128.937798


In [84]:
data.dropna()  # displays the rows without NaN; the result is not assigned, so `data` itself keeps all rows

Unnamed: 0,Date,Time,SMP_NIS_per_MWh,SMP_NIS_per_KWh,DateTime,Month,Season,Pattern_key,hour,day_of_week,...,is_weekend,half_hour_index,lag_1,lag_2,lag_48,lag_336,rolling_2,rolling_4,rolling_48,rolling_336
336,08/01/2024,00:00:00,103.19,0.10319,2024-01-08 00:00:00,January,Winter,0|Regular weekdays|Winter,0,0,...,0,0,113.92,124.67,90.00,102.01,119.295,131.7650,131.462708,127.043452
337,08/01/2024,00:30:00,107.24,0.10724,2024-01-08 00:30:00,January,Winter,"0,5|Regular weekdays|Winter",0,0,...,0,1,103.19,113.92,114.10,100.57,108.555,118.2575,131.737500,127.046964
338,08/01/2024,01:00:00,103.26,0.10326,2024-01-08 01:00:00,January,Winter,1|Regular weekdays|Winter,1,0,...,0,2,107.24,103.19,110.49,100.58,105.215,112.2550,131.594583,127.066815
339,08/01/2024,01:30:00,97.44,0.09744,2024-01-08 01:30:00,January,Winter,"1,5|Regular weekdays|Winter",1,0,...,0,3,103.26,107.24,112.22,103.74,105.250,106.9025,131.443958,127.074792
340,08/01/2024,02:00:00,95.00,0.09500,2024-01-08 02:00:00,January,Winter,2|Regular weekdays|Winter,2,0,...,0,4,97.44,103.26,109.77,107.33,100.350,102.7825,131.136042,127.056042
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17561,31/12/2024,21:30:00,230.00,0.23000,2024-12-31 21:30:00,December,Winter,"21,5|Regular weekdays|Winter",21,1,...,0,43,230.00,312.18,200.00,148.12,271.090,291.6350,177.044792,138.919107
17562,31/12/2024,22:00:00,200.87,0.20087,2024-12-31 22:00:00,December,Winter,22|Regular weekdays|Winter,22,1,...,0,44,230.00,230.00,193.00,147.48,230.000,271.0900,177.669792,139.162798
17563,31/12/2024,22:30:00,193.63,0.19363,2024-12-31 22:30:00,December,Winter,"22,5|Regular weekdays|Winter",22,1,...,0,45,200.87,230.00,191.78,147.59,215.435,243.2625,177.833750,139.321696
17564,31/12/2024,23:00:00,189.71,0.18971,2024-12-31 23:00:00,December,Winter,23|Regular weekdays|Winter,23,1,...,0,46,193.63,200.87,160.00,120.87,197.250,213.6250,177.872292,139.458720


In [85]:
print(data.shape)


(17566, 21)
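
The shape printed above still counts every row, which suggests that the earlier `data.dropna()` call only displayed a filtered copy. The sketch below shows the difference on a toy frame (values invented); `cleaned` is a hypothetical name. XGBoost can handle NaN feature values natively, so training still runs either way, but the first 336 rows then carry incomplete lag information.

```python
import pandas as pd

toy = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
toy["lag_2"] = toy["price"].shift(2)  # the first two rows become NaN

toy.dropna()                          # returns a new frame; `toy` is unchanged
rows_without_assignment = len(toy)

# Keep the filtered result explicitly and rebuild a clean 0..n-1 index.
cleaned = toy.dropna().reset_index(drop=True)
rows_after_assignment = len(cleaned)

print(rows_without_assignment, rows_after_assignment)
```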


In [86]:
X = data[['half_hour_index','day_of_week','month','is_weekend', # 'hour' is left out: half_hour_index already encodes it and also distinguishes each half-hour
          'lag_1','lag_2','lag_48','lag_336',
          'rolling_2','rolling_4','rolling_48','rolling_336']]
y = data['SMP_NIS_per_MWh']

I selected the features (X) and defined the target variable (y).

The model gets a combination of:

Time

Past memory

Trends

This allows it to learn patterns in SMP prices.

In [87]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)


I divided the data into 80% training and 20% testing, without shuffling (shuffle=False).

In time series, shuffling would let the model train on observations that come after the test period, so the time order must be preserved.
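
The idea behind an unshuffled split can be sketched by hand on a toy frame (values invented for illustration). This mirrors what `shuffle=False` does, up to how the split point is rounded: the first rows go to training and the last rows to testing.

```python
import pandas as pd

toy = pd.DataFrame({
    "DateTime": pd.date_range("2024-01-01", periods=10, freq="30min"),
    "price": range(10),
})

split_at = int(len(toy) * 0.8)   # first 80% of the rows for training
train, test = toy.iloc[:split_at], toy.iloc[split_at:]

# Every training timestamp precedes every test timestamp.
print(train["DateTime"].max(), "<", test["DateTime"].min())
```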

In [88]:
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=5,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42
)

model.fit(X_train, y_train)

I trained an XGBoost model on the training part (the first 80%) of the 2024 data.

Gradient-boosted trees suit this task because the lag and rolling features turn the time series into an ordinary tabular regression problem.

XGBRegressor: This is a regression model based on XGBoost, suitable for predicting continuous values (here, for example, the SMP price).

n_estimators=500: the number of trees the model will build. More trees can improve accuracy, but increase computation time.

learning_rate=0.05: the shrinkage factor applied to each new tree's contribution. A small learning rate makes training more stable but usually requires more trees.

max_depth=5: maximum depth of each tree. Greater depth allows the model to capture more complex relationships, but increases the risk of overfitting.

subsample=0.8: fraction of samples used to build each tree. Helps reduce overfitting.

colsample_bytree=0.8: fraction of features used to build each tree. Also helps regularization and reduces overfitting.

random_state=42: ensures reproducible results.
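
To give a feel for how `n_estimators` and `learning_rate` interact, here is a deliberately simplified sketch of plain gradient boosting with one-split trees (stumps) in NumPy. It is not XGBoost's actual algorithm, which adds regularization and second-order gradient information; `fit_stump` and `boost` are hypothetical helper names, and the sine curve is toy data.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on a 1-D feature (squared error)."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda z: np.where(z <= threshold, left_value, right_value)

def boost(x, y, n_estimators, learning_rate):
    """Start from the mean, then add shrunken stumps fitted to the residuals."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_estimators):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * stump(x)
    return pred

x = np.arange(20, dtype=float)
y = np.sin(x / 3.0)

errors = {}
for n in (10, 100):
    errors[n] = float(np.sqrt(np.mean((y - boost(x, y, n, 0.05)) ** 2)))
    print(n, round(errors[n], 4))
```

With a small learning rate each tree corrects only a fraction of the remaining error, so more trees are needed to reach the same training fit. The training error alone says nothing about overfitting, which is why a held-out test set is still needed.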

In [89]:
from sklearn.metrics import mean_squared_error
import numpy as np

y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"RMSE: {rmse:.4f}")

RMSE: 19.0311


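I calculated the RMSE to measure the accuracy of the forecast. It is expressed in the same unit as the target (NIS per MWh).

RMSE is a common error measure for forecasts. Because the test features are built from actual past prices, this value is a one-step-ahead (30-minute) error; a forecast that feeds on its own predictions is likely to have a larger error.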
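
An RMSE value is easier to judge next to a simple baseline. The sketch below computes RMSE by hand and applies it to a persistence forecast (each value predicted by the previous one) on an invented toy series; the `rmse` helper is a hypothetical name, and comparing against persistence is a suggestion rather than something the notebook does.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

history = [90.0, 100.0, 110.0, 120.0, 130.0]  # toy prices
actual = history[1:]        # values to be predicted
persistence = history[:-1]  # naive forecast: repeat the previous value

baseline_rmse = rmse(actual, persistence)
print(baseline_rmse)
```

On the real data, the same function could be applied to `y_test` and `X_test['lag_1']` to see how much the model improves on simply repeating the last price.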

In [90]:
dates_2025 = pd.date_range(start="2025-01-01 00:00", end="2025-12-31 23:30", freq="30min")
df_2025 = pd.DataFrame({'DateTime': dates_2025})

I built a DataFrame with every half-hour of 2025,

so that the forecast covers the whole year without gaps.
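
A quick sanity check on the 2025 grid, as a minimal sketch: a non-leap year should contain 365 × 48 half-hours, with no duplicated timestamps. The `expected` variable is introduced here only for the check.

```python
import pandas as pd

dates_2025 = pd.date_range(start="2025-01-01 00:00", end="2025-12-31 23:30", freq="30min")

expected = 365 * 48  # 2025 is not a leap year
print(len(dates_2025), expected)
print(dates_2025.is_unique)
```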

In [91]:
df_2025['half_hour_index'] = df_2025['DateTime'].dt.hour * 2 + (df_2025['DateTime'].dt.minute >= 30).astype(int)
df_2025['hour'] = df_2025['DateTime'].dt.hour

df_2025['day_of_week'] = df_2025['DateTime'].dt.dayofweek
df_2025['month'] = df_2025['DateTime'].dt.month
df_2025['is_weekend'] = df_2025['day_of_week'].isin([5,6]).astype(int)

In [92]:
import numpy as np

# Retrieve the latest values needed for lags and rolling
last_2024 = data.tail(336).copy()  # for lag_336 and rolling_336

# Create empty columns for 2025
for col in ['lag_1','lag_2','lag_48','lag_336','rolling_2','rolling_4','rolling_48','rolling_336']:
    df_2025[col] = np.nan

In [93]:
# Convert to list for quick access
prev_values = list(data['SMP_NIS_per_MWh'].tail(336))

for i in range(len(df_2025)):
    # lag_1, lag_2
    df_2025.loc[i, 'lag_1'] = prev_values[-1]
    df_2025.loc[i, 'lag_2'] = prev_values[-2]

    # lag_48 and lag_336
    df_2025.loc[i, 'lag_48'] = prev_values[-48]
    df_2025.loc[i, 'lag_336'] = prev_values[-336]

    # rolling_2, rolling_4, rolling_48, rolling_336
    df_2025.loc[i, 'rolling_2'] = np.mean(prev_values[-2:])
    df_2025.loc[i, 'rolling_4'] = np.mean(prev_values[-4:])
    df_2025.loc[i, 'rolling_48'] = np.mean(prev_values[-48:])
    df_2025.loc[i, 'rolling_336'] = np.mean(prev_values[-336:])

    # predict SMP for this line
    features = df_2025.loc[i, ['half_hour_index','day_of_week','month','is_weekend',
                           'lag_1','lag_2','lag_48','lag_336',
                           'rolling_2','rolling_4','rolling_48','rolling_336']].values.reshape(1,-1)

    pred = model.predict(features)[0]
    df_2025.loc[i, 'Predicted_SMP'] = pred

    # Add the prediction to prev_values for future iterations
    prev_values.append(pred)

In the loop, I used each prediction to build the lags and rolling averages for the next step.

For 2025 there are no actual past values, only what the model predicts.

This is a recursive (autoregressive) forecasting method.
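
The recursive idea can be sketched independently of XGBoost. In the example below, `recursive_forecast` and `toy_model` are hypothetical names: `toy_model` stands in for `model.predict` and simply returns the mean of the last two values, and only three of the notebook's features are rebuilt to keep the sketch short.

```python
from collections import deque

import numpy as np

def recursive_forecast(history, n_steps, predict_fn, max_lag=336):
    """Forecast n_steps ahead, feeding each prediction back in as a lag."""
    window = deque(history[-max_lag:], maxlen=max_lag)  # keep only the lags needed
    forecasts = []
    for _ in range(n_steps):
        values = np.fromiter(window, dtype=float)
        features = {
            "lag_1": values[-1],
            "lag_2": values[-2],
            "rolling_2": values[-2:].mean(),
        }
        pred = float(predict_fn(features))
        forecasts.append(pred)
        window.append(pred)  # the prediction becomes the newest "past" value
    return forecasts

# Hypothetical stand-in for model.predict: the mean of the last two values.
def toy_model(features):
    return features["rolling_2"]

result = recursive_forecast([10.0, 20.0], n_steps=3, predict_fn=toy_model, max_lag=2)
print(result)
```

Collecting predictions in a list and keeping a bounded window avoids writing to the DataFrame cell by cell, which tends to be slow over 17,520 steps. Because every step depends on earlier predictions, errors can build up as the horizon grows.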

In [94]:
df_2025['semester'] = df_2025['DateTime'].dt.month.apply(lambda x: 1 if x <= 6 else 2)


I divided the year into two semesters (January to June and July to December) so that the average predicted price can be computed for each.


That is what the question asked for.

In [95]:
semester_avg = df_2025.groupby('semester')['Predicted_SMP'].mean().reset_index()
semester_avg.rename(columns={'Predicted_SMP':'Average_SMP'}, inplace=True)
print(semester_avg)

   semester  Average_SMP
0         1   163.644531
1         2   176.019226


In [96]:
semester_avg.to_excel("Predicted_SMP_2025_Semester.xlsx", index=False)
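
The semester logic can be verified on a few hand-picked dates. This is a minimal sketch with invented prices; it uses `np.where` as a vectorized alternative to the `apply` call, which is a stylistic choice and gives the same labels.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "DateTime": pd.to_datetime(["2025-03-01", "2025-06-30", "2025-07-01", "2025-12-31"]),
    "Predicted_SMP": [100.0, 120.0, 150.0, 170.0],  # toy values
})

# Months 1-6 belong to semester 1, months 7-12 to semester 2.
toy["semester"] = np.where(toy["DateTime"].dt.month <= 6, 1, 2)

toy_avg = toy.groupby("semester")["Predicted_SMP"].mean()
print(toy_avg)
```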


**Summary:**

I built time features to capture seasonality.

I added lags to give the model memory of past values.

I added rolling averages to capture short- and long-term trends.

I trained an XGBoost model on the 2024 data (first 80% for training, last 20% for testing).

I applied recursive (autoregressive) forecasting to all of 2025.

I produced the average for each semester, as requested in the question.


**Additional data to consider:**

Temperature data: Hourly or daily temperatures, as electricity demand is strongly influenced by weather conditions (heating/cooling needs).

Electricity consumption data: Historical and real-time consumption patterns to capture demand fluctuations.

Solar and wind production data: Generation from renewable sources, since these directly affect supply availability.

Gas prices: Market prices for natural gas, which impact marginal production costs and therefore electricity prices.

Special days / events: Holidays, public events, or abnormal days that can significantly alter typical demand patterns.

Grid/network data: Information on network constraints, outages, or transmission capacity that could affect supply-demand balance and prices.

Including these external variables would allow the model to better capture the real-world drivers of electricity price and consumption variations.
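
As a rough sketch of how one such variable could be attached to the price table, the example below joins hourly temperature readings to half-hourly prices with `pd.merge_asof`. The `temperature` frame, its `temperature_c` column, and its values are hypothetical and invented for illustration; only the four prices come from the first rows of the data above.

```python
import pandas as pd

prices = pd.DataFrame({
    "DateTime": pd.date_range("2024-01-01", periods=4, freq="30min"),
    "SMP_NIS_per_MWh": [102.01, 100.57, 100.58, 103.74],
})

# Hypothetical hourly temperature readings (invented values).
temperature = pd.DataFrame({
    "DateTime": pd.date_range("2024-01-01", periods=2, freq="60min"),
    "temperature_c": [11.0, 10.5],
})

# For each half-hour, take the most recent reading at or before that time.
merged = pd.merge_asof(prices, temperature, on="DateTime")
print(merged)
```

For the 2025 forecast, any external feature would also need values for the forecast period, for example from weather forecasts or seasonal averages.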