### API Analysis

![alt text](./images/image.png "Title")
This configuration shows the hourly day-ahead price (the price agreed today for energy delivered tomorrow) for the last two weeks.
When you inspect the network traffic for these dates at hourly resolution, you will find three .json files being fetched from the API.

A request to the API has the following structure:
https://www.smard.de/app/chart_data/4169/DE/4169_DE_hour_[timestamp_in_milliseconds].json

The following requests fetch the data for the corresponding time frames (all times in UTC):

https://www.smard.de/app/chart_data/4169/DE/4169_DE_hour_1728252000000.json:
Sunday, 6 October 2024 22:00:00 -> Sunday, 13 October 2024 21:00:00

https://www.smard.de/app/chart_data/4169/DE/4169_DE_hour_1728856800000.json:
Sunday, 13 October 2024 22:00:00 -> Sunday, 20 October 2024 21:00:00

https://www.smard.de/app/chart_data/4169/DE/4169_DE_hour_1729461600000.json:
Sunday, 20 October 2024 22:00:00 -> Sunday, 27 October 2024 22:00:00


You will find that, for example, the timestamp 1728856800000 maps to the start date Sunday, 13 October 2024 22:00:00 (UTC), which is Monday 00:00:00 in Berlin local time, and that every file contains the data for one week. Interestingly, the site only displays two weeks of data even though it has to fetch three entire weeks. If the above links are broken, it may be due to a shift in daylight saving time (DST), which we will have to take into account.
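
To check which period a file covers, the millisecond timestamp in the file name can be decoded with the standard library. The following is a minimal sketch; `decode_ms` and `week_range` are hypothetical helper names, and the one-week span with hourly steps is taken from the observations above.

```python
from datetime import datetime, timedelta, timezone

def decode_ms(timestamp_ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

def week_range(timestamp_ms: int) -> tuple[datetime, datetime]:
    """Return the first and last hourly slot of a regular (168-hour) week starting at the timestamp."""
    start = decode_ms(timestamp_ms)
    return start, start + timedelta(weeks=1) - timedelta(hours=1)

start, end = week_range(1728856800000)
print(start.isoformat())  # 2024-10-13T22:00:00+00:00
print(end.isoformat())    # 2024-10-20T21:00:00+00:00
```

During summer time (UTC+2), 22:00 UTC on a Sunday corresponds to Monday 00:00 in Berlin, which is why the file names look shifted by two hours when read as UTC.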

Additionally, you will see that each .json file contains around 172 time series entries, roughly one per hour of the week.
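
The scraper below reads these entries from the `series` key of the response, where each entry is a `[timestamp_ms, price]` pair. The sketch below shows how such a payload could be parsed; the sample values are made up for illustration, and the assumption that missing prices arrive as `null` is not confirmed by the source.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload in the shape the scraper expects; the numbers are invented.
sample = '{"series": [[1728856800000, 71.5], [1728860400000, null], [1728864000000, 68.02]]}'

def parse_series(raw: str) -> list[tuple[datetime, float]]:
    """Return (UTC datetime, price) pairs, skipping entries without a numeric price."""
    rows = []
    for ts_ms, price in json.loads(raw)["series"]:
        if price is None:  # assumed marker for a missing value
            continue
        rows.append((datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc), float(price)))
    return rows

rows = parse_series(sample)
print(len(rows))  # 2
```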



### Implementing the scraper
We now want to implement a scraper that fetches the hourly energy prices for the last n weeks. From the analysis above, we know that we have to find the timestamp that marks the start of each week and then fetch the corresponding file.
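
One way to derive these timestamps is to step back week by week in Berlin wall-clock time and convert each Monday midnight to UTC, so that daylight saving time is handled by the time zone database. This is a minimal sketch using the standard-library `zoneinfo` module (Python 3.9+) rather than `pytz`; `week_start_timestamps` is a hypothetical helper, and it assumes that every file starts on Monday 00:00 Europe/Berlin.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")

def week_start_timestamps(today: date, n_weeks: int) -> list[int]:
    """Return millisecond timestamps of the last n_weeks Mondays at 00:00 Berlin time, newest first."""
    monday = today - timedelta(days=today.weekday())
    stamps = []
    for k in range(n_weeks):
        # Attach the zone to the local wall-clock time; the UTC offset valid on that date is applied.
        local_midnight = datetime.combine(monday - timedelta(weeks=k), time.min, tzinfo=BERLIN)
        stamps.append(int(local_midnight.timestamp() * 1000))
    return stamps

print(week_start_timestamps(date(2024, 10, 30), 3))
# [1730070000000, 1729461600000, 1728856800000]
```

The first two values are one week plus one hour apart because the clocks were set back on 27 October 2024.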

In [2]:
import requests
import numpy as np
import logging
from datetime import datetime, timedelta, timezone
import pytz
import time
from pprint import pprint

In [3]:
logging.basicConfig(level=logging.INFO) 

# The cells below log through the root logger (logging.info / logging.warning),
# so the file handler is attached to the root logger to actually receive records.
logger = logging.getLogger()

# Persist warnings and errors to app.log; INFO messages only go to the console.
file_handler = logging.FileHandler("app.log")
file_handler.setLevel(logging.WARNING)
logger.addHandler(file_handler)

In [4]:
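def scrape(url, delay):
    """Fetch `url`, raise on HTTP error codes, then wait `delay` seconds to throttle requests."""
    # The timeout keeps a stalled connection from blocking the notebook indefinitely
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    time.sleep(delay)
    return response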

In [5]:
from datetime import datetime, timedelta
import pytz

# Define Berlin timezone
tz_berlin = pytz.timezone("Europe/Berlin")

# Calculate the most recent Monday 00:00 in Berlin time.
# pytz zones need localize() to get the UTC offset that is valid on that date;
# plain datetime arithmetic would keep today's offset across a DST change.
now = datetime.now(tz_berlin)
monday_naive = (now - timedelta(days=now.weekday())).replace(
    hour=0, minute=0, second=0, microsecond=0, tzinfo=None
)
last_monday_berlin = tz_berlin.localize(monday_naive)

# Convert Berlin time to UTC and get the timestamp in milliseconds
last_monday_utc = last_monday_berlin.astimezone(pytz.UTC)
last_monday_utc_ms = int(last_monday_utc.timestamp() * 1000)

print("Berlin time (local):", last_monday_berlin)
print("UTC time:", last_monday_utc)
print("UTC timestamp (ms):", last_monday_utc_ms)


Berlin time (local): 2024-10-28 00:00:00+01:00
UTC time: 2024-10-27 23:00:00+00:00
UTC timestamp (ms): 1730070000000


In [None]:
# Define constants
delay = 0.5  # seconds to wait after each request
n = 500  # number of weeks
base_url = "https://www.smard.de/app/chart_data/4169/DE/4169_DE_hour_{}.json"
energy_ts_data = []

# Each file starts on Monday 00:00 Berlin time. Stepping back in local wall-clock time and
# localizing every Monday separately yields the correct UTC offset on both sides of a DST change.
monday_naive = last_monday_berlin.replace(tzinfo=None)

for k in range(n):
    # Deriving the week from k (instead of updating a running variable at the end of the loop)
    # ensures that a failed request does not cause the same week to be requested again.
    week_start_berlin = tz_berlin.localize(monday_naive - timedelta(weeks=k))
    week_start_utc = week_start_berlin.astimezone(pytz.UTC)
    week_start_utc_ms = int(week_start_utc.timestamp() * 1000)

    try:
        response = scrape(base_url.format(week_start_utc_ms), delay)
        json_data = response.json()
        logging.info(f"Successfully scraped data for ts: {week_start_berlin} (Europe/Berlin)")
    except requests.exceptions.HTTPError as http_err:
        logging.warning(f"Failed to scrape data for timestamp: {week_start_utc} (UTC)\n\tError: {http_err}")
        continue
    except requests.exceptions.JSONDecodeError as decoder_error:
        logging.warning(f"Failed to deserialize JSON: \n\tError: {decoder_error}")
        continue

    energy_ts_data_week = []

    for ts, price in json_data["series"]:
        # Convert the timestamp first so that it is available for the log message below
        ts_datetime_berlin = str(datetime.fromtimestamp(ts / 1000, tz=tz_berlin))
        try:
            price_float = float(price)
        except (TypeError, ValueError) as e:
            logging.warning(f"Failed to parse non-float value for timestamp {ts_datetime_berlin} (Europe/Berlin)\n\tError: {e}")
            continue

        energy_ts_data_week.append((ts_datetime_berlin, price_float))

    # Prepend the (older) week so that the list stays in chronological order
    energy_ts_data = energy_ts_data_week + energy_ts_data

# Convert the list of tuples to a NumPy array and reverse it (newest entry first)
data = np.array(energy_ts_data)[::-1]

print(data.shape)


In [None]:
data = np.vstack((["Datetime", "hourly day-ahead energy price"], data))
np.savetxt("./day_ahead_energy_prices.csv", data, delimiter=",", fmt="%s")

- Weather:
  - wind
  - sun
  - temperature
- per day energy mix
- gas price per day

In [None]:
start_date = datetime.now()
end_date = datetime(2018, 9, 30)
delta = timedelta(days=1)
delay = 0.2

# end_date = start_date - (10 * delta)

base_url = "https://www.energy-charts.info/charts/energy_pie/data/de/day_pie_{}.json"

current_date = start_date
res = []
while current_date >= end_date:
    # Format the date before the try block so it is always defined in the log messages
    cd_format = current_date.strftime("%Y_%m_%d")
    try:
        response = scrape(base_url.format(cd_format), delay)
        res.append((cd_format, response.json()))
        logging.info(f"Successfully scraped data for date: {cd_format}")
    except requests.exceptions.HTTPError as http_err:
        logging.warning(f"Failed to scrape data for date: {cd_format}\n\tError: {http_err}")
    except requests.exceptions.JSONDecodeError as decoder_error:
        logging.warning(f"Failed to deserialize JSON for date: {cd_format}\n\tError: {decoder_error}")
    current_date -= delta


print(len(res))


INFO:root:Successfully scraped data for date: 2024_10_31
INFO:root:Successfully scraped data for date: 2024_10_30
INFO:root:Successfully scraped data for date: 2024_10_29
INFO:root:Successfully scraped data for date: 2024_10_28
INFO:root:Successfully scraped data for date: 2024_10_27
INFO:root:Successfully scraped data for date: 2024_10_26
INFO:root:Successfully scraped data for date: 2024_10_25
INFO:root:Successfully scraped data for date: 2024_10_24
INFO:root:Successfully scraped data for date: 2024_10_23
INFO:root:Successfully scraped data for date: 2024_10_22
INFO:root:Successfully scraped data for date: 2024_10_21
INFO:root:Successfully scraped data for date: 2024_10_20
INFO:root:Successfully scraped data for date: 2024_10_19
INFO:root:Successfully scraped data for date: 2024_10_18
INFO:root:Successfully scraped data for date: 2024_10_17
INFO:root:Successfully scraped data for date: 2024_10_16
INFO:root:Successfully scraped data for date: 2024_10_15
INFO:root:Successfully scraped 

2224
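
The date loop above can also be expressed as a small generator that yields one URL per day, which keeps the URL construction separate from the download logic. This is a minimal sketch that reuses the URL pattern from the cell above; `daily_urls` is a hypothetical helper and performs no network requests.

```python
from datetime import date, timedelta
from typing import Iterator

BASE_URL = "https://www.energy-charts.info/charts/energy_pie/data/de/day_pie_{}.json"

def daily_urls(start: date, end: date) -> Iterator[tuple[str, str]]:
    """Yield (date label, URL) pairs from start back to end, both inclusive."""
    current = start
    while current >= end:
        label = current.strftime("%Y_%m_%d")
        yield label, BASE_URL.format(label)
        current -= timedelta(days=1)

urls = list(daily_urls(date(2024, 10, 31), date(2024, 10, 29)))
print(urls[0][1])
```

Counting the days from 31 October 2024 back to 30 September 2018 with this generator gives 2224, which matches the number of results printed above.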


### Energy Mix Scraper

In [19]:
exclude_cross_border_e_trading = True
cbet = "Cross border electricity trading"

dtype = [('date', 'U50'), ('e_component', 'U50'), ('value', 'float32')]

# Initialize an empty structured array
array = np.empty(0, dtype=dtype)

for date, data in res:
    sources = []
    for e_source in data:
        name = str(e_source["name"]["en"])

        if exclude_cross_border_e_trading and name == cbet:
            continue

        # Skip entries whose value cannot be converted to a number
        try:
            y_value = float(e_source["y"])
        except (ValueError, TypeError):
            continue
        
        sources.append((date, name, y_value))
    
    # Convert to a structured array with the correct dtype
    arr = np.array(sources, dtype=dtype)
    
    # Normalize the 'value' column
    arr['value'] /= np.sum(arr['value'], axis=0)

    # Append to the main array
    array = np.append(array, arr)

np.savetxt("./daily_market_mix.csv", array, delimiter=",", fmt="%s")
array

array([('2024_10_31', 'Waste renewable', 0.01161428),
       ('2024_10_31', 'Hydro Run-of-River', 0.04549553),
       ('2024_10_31', 'Hydro water reservoir', 0.00522868), ...,
       ('2018_09_30', 'Fossil gas', 0.04589883),
       ('2018_09_30', 'Others', 0.00584625),
       ('2018_09_30', 'Waste non-renewable', 0.0163848 )],
      dtype=[('date', '<U50'), ('e_component', '<U50'), ('value', '<f4')])
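
The normalization step divides each source's value by the day's total, so the shares of one day sum to 1. The following sketch shows this for a single day in plain Python; the payload is invented and only mirrors the `name`/`y` structure that the cell above reads, and `normalized_mix` is a hypothetical helper.

```python
# Hypothetical response for one day, shaped like the entries the cell above iterates over.
day_payload = [
    {"name": {"en": "Wind onshore"}, "y": 300.0},
    {"name": {"en": "Solar"}, "y": 100.0},
    {"name": {"en": "Cross border electricity trading"}, "y": -50.0},
    {"name": {"en": "Fossil gas"}, "y": None},
]

def normalized_mix(payload, excluded=("Cross border electricity trading",)):
    """Return {source name: share of the day's total}, skipping excluded and non-numeric entries."""
    values = {}
    for entry in payload:
        name = entry["name"]["en"]
        if name in excluded or entry["y"] is None:
            continue
        values[name] = float(entry["y"])
    total = sum(values.values())
    if total == 0:
        return {}  # avoid dividing by zero when a day has no usable values
    return {name: value / total for name, value in values.items()}

print(normalized_mix(day_payload))  # {'Wind onshore': 0.75, 'Solar': 0.25}
```

Returning an empty result for a zero total is a design choice of this sketch; the cell above would produce NaN values in that case.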