# Obtaining the data



## 1. Fetching the IDs

I have obtained an API key for ESIOS, the platform that holds the data for the Spanish electrical grid. Now I need to identify which indicator IDs provide the data I'm looking for: total demand, total generation, and generation by source type.
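Since the indicator names are in Spanish, keyword matching is sensitive to accents ("eólica" vs. "eolica"). As a minimal sketch, assuming each indicator is a dict with a `name` key (as in the search cell in this section), the filter could normalise accents before comparing. The helper names `strip_accents` and `filter_indicators` are hypothetical, and the sample list only reuses IDs and names that appear in the results table of this notebook.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Lower-case `text` and drop combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def filter_indicators(indicators, keywords):
    """Return indicators whose 'name' contains any keyword, ignoring accents and case."""
    normalized_keywords = [strip_accents(k) for k in keywords]
    return [
        ind for ind in indicators
        if any(k in strip_accents(ind["name"]) for k in normalized_keywords)
    ]


# Small hand-written sample shaped like entries of the indicator list
sample = [
    {"id": 551, "name": "Generación T.Real eólica"},
    {"id": 549, "name": "Generación T.Real nuclear"},
    {"id": 1295, "name": "Generación T.Real Solar fotovoltaica"},
]

matches = filter_indicators(sample, ["solar", "eolica"])
print([ind["id"] for ind in matches])  # [551, 1295]
```

With this normalisation a single spelling per keyword is enough, so the list does not need both accented and unaccented variants.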

In [1]:
import os
import requests
import pandas as pd
from datetime import datetime, timedelta

# 1. Setup
# Read the API key from the environment instead of hard-coding it in the notebook
ESIOS_TOKEN = os.environ.get("ESIOS_TOKEN", "")
headers = {
    "Accept": "application/json; application/vnd.esios-api-v1+json",
    "Content-Type": "application/json",
    "x-api-key": ESIOS_TOKEN
}

# 2. Define Test Date (Yesterday)
yesterday = datetime.now() - timedelta(days=1)
start_date = yesterday.replace(hour=0, minute=0, second=0).strftime('%Y-%m-%dT%H:%M:%S')
end_date = yesterday.replace(hour=23, minute=59, second=59).strftime('%Y-%m-%dT%H:%M:%S')

print(f"🔎 Exhaustive Search for Solar/Wind Indicators (Testing date: {yesterday.strftime('%Y-%m-%d')})...")

# 3. Fetch All Indicators
url = "https://api.esios.ree.es/indicators"
try:
    # A timeout keeps the cell from hanging if the API does not respond
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    all_indicators = response.json()['indicators']
    print(f"   ✅ Fetched {len(all_indicators)} total indicators.")
except Exception as e:
    print(f"   ❌ Failed to fetch indicator list: {e}")
    all_indicators = []

# 4. Filter Candidates
keywords = ["solar", "fotovoltaica", "eólica", "eolica", "wind"]
candidates = []

for ind in all_indicators:
    name_lower = ind['name'].lower()
    if any(k in name_lower for k in keywords):
        candidates.append(ind)

print(f"   👉 Found {len(candidates)} candidates matching keywords.")

# 5. Validate Candidates
valid_indicators = []

print("\n⚡ Verifying data availability for candidates (this may take a moment)...")
for ind in candidates:
    ind_id = ind['id']
    name = ind['name']

    url_ind = f"https://api.esios.ree.es/indicators/{ind_id}"
    params = {
        "start_date": start_date,
        "end_date": end_date
        # Intentionally omitting geo_ids to cast a wide net
    }

    try:
        r = requests.get(url_ind, headers=headers, params=params, timeout=30)
        if r.status_code == 200:
            data = r.json()
            values = data['indicator']['values']
            if values:
                count = len(values)
                print(f"   ✅ [WORKING] ID {ind_id}: {name} ({count} values)")
                valid_indicators.append({"id": ind_id, "name": name, "count": count})
    except Exception as e:
        # Report the failure instead of silently skipping the candidate
        print(f"   ❌ ID {ind_id}: {e}")

# 6. Summary
print("\n🏆 Final Verified IDs:")
if valid_indicators:
    df_valid = pd.DataFrame(valid_indicators)
    print(df_valid[['id', 'name', 'count']].to_string(index=False))
else:
    print("No working indicators found.")

🔎 Exhaustive Search for Solar/Wind Indicators (Testing date: 2025-12-09)...
   ✅ Fetched 1988 total indicators.
   👉 Found 111 candidates matching keywords.

⚡ Verifying data availability for candidates (this may take a moment)...
   ✅ [WORKING] ID 12: Generación programada PBF Eólica terrestre (24 values)
   ✅ [WORKING] ID 13: Generación programada PBF Eólica marina (24 values)
   ✅ [WORKING] ID 14: Generación programada PBF Solar fotovoltaica (24 values)
   ✅ [WORKING] ID 15: Generación programada PBF Solar térmica (24 values)
   ✅ [WORKING] ID 47: Generación programada PVP Eólica terrestre (24 values)
   ✅ [WORKING] ID 48: Generación programada PVP Eólica marina (24 values)
   ✅ [WORKING] ID 49: Generación programada PVP Solar fotovoltaica (24 values)
   ✅ [WORKING] ID 50: Generación programada PVP Solar térmica (24 values)
   ✅ [WORKING] ID 82: Generación programada P48 Eólica terrestre (96 values)
   ✅ [WORKING] ID 83: Generación prog

## Result - IDs

#### 🔧 Indicators Used & Data Status
The following ESIOS indicators were identified and used for the 7-day data collection period:

| Technology | ID Used | Indicator Name | Status |
| :--- | :--- | :--- | :--- |
| **Solar PV** | 1295 | Generación T.Real Solar fotovoltaica | ✅ Success |
| **Solar Thermal** | 1294 | Generación T.Real Solar térmica | ✅ Success |
| **Wind** | 551 | Generación T.Real eólica | ✅ Success |
| **Nuclear** | 549 | Generación T.Real nuclear | ✅ Success |
| **Hydro** | 546 | Generación T.Real hidráulica | ✅ Success |
| **Coal** | 547 | Generación T.Real carbón | ✅ Success |
| **Demand** | 460 | Previsión diaria de la demanda eléctrica peninsular | ✅ Success (Forecast) |

#### 📝 Notes
*   **Demand:** The search algorithm selected ID 460 (Daily Forecast) based on keyword matching. For future real-time analysis, ID 1293 is recommended.
*   **Gas Combined Cycle:** ID 1746 returned no values for the requested period. This may indicate a lack of generation, a data reporting gap, or the need for an alternative geographic indicator (e.g., National vs. Peninsula).
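
One way to investigate the geographic question is to request an indicator without a `geo_ids[]` filter and count how many values come back per region. The sketch below assumes each entry in `indicator.values` carries `geo_id` and `geo_name` fields; that is an assumption about the response shape, since the code in this notebook only relies on `value` and `datetime`. The helper name `count_values_by_geo` and the sample payload are hypothetical (only `8741` for the Peninsula comes from this notebook).

```python
from collections import Counter


def count_values_by_geo(payload):
    """Count returned values per (geo_id, geo_name) pair in an indicator payload."""
    values = payload.get("indicator", {}).get("values", [])
    # Assumed fields: 'geo_id' and 'geo_name' on every value entry
    return Counter((v.get("geo_id"), v.get("geo_name")) for v in values)


# Hand-written payload for illustration only, not real API output
sample_payload = {
    "indicator": {
        "values": [
            {"value": 100.0, "datetime": "2025-12-09T00:00:00", "geo_id": 8741, "geo_name": "Península"},
            {"value": 120.0, "datetime": "2025-12-09T00:10:00", "geo_id": 8741, "geo_name": "Península"},
            {"value": 5.0, "datetime": "2025-12-09T00:00:00", "geo_id": 1, "geo_name": "Made-up region"},
        ]
    }
}

counts = count_values_by_geo(sample_payload)
for (geo_id, geo_name), n in counts.items():
    print(f"geo_id={geo_id} ({geo_name}): {n} values")
```

If an indicator returns values only for a region other than the one being filtered on, that would explain an empty result for the Peninsula filter.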

## Downloading the data and saving it to a CSV file

Now I know which IDs I need to obtain the data. However, the API imposes strict limits: each request can only cover a limited amount of data. So I loop over small date windows, request each one separately, and then merge the results.
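
The windowing idea can be isolated from the download logic. The following is a minimal sketch of a generator that walks backwards in time in fixed-size windows; the name `backward_windows` and its parameters are hypothetical, and the 10-day window matches the value used in the cell in this section.

```python
from datetime import datetime, timedelta


def backward_windows(end, window_days, n_windows):
    """Yield (start, end) datetime pairs, newest first, each `window_days` long."""
    for _ in range(n_windows):
        start = end - timedelta(days=window_days)
        yield start, end
        end = start  # the next (older) window ends where this one started


windows = list(
    backward_windows(datetime(2025, 12, 9, 23, 59, 59), window_days=10, n_windows=3)
)
for start, end in windows:
    print(start.strftime('%Y-%m-%dT%H:%M:%S'), "->", end.strftime('%Y-%m-%dT%H:%M:%S'))
```

Each `(start, end)` pair can be passed to a download function. If the per-window DataFrames are collected in a list, `pd.concat(frames).sort_index()` is one way to combine them into a single time series.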

In [3]:
import requests
import pandas as pd
from datetime import datetime, timedelta
import time
import os
from google.colab import drive

drive.mount('/content/drive')

# Define IDs globally to avoid resolving them in every loop iteration
# IDs based on previous analysis and ESIOS documentation
ESIOS_INDICATORS = {
    "Total Generation": 10004,
    "Solar PV": 1295,
    "Solar Thermal": 1294,
    "Wind": 551,
    "Nuclear": 549,
    "Hydro": 546,
    "Coal": 547,
    "Gas Combined Cycle": 1746,
    "Demand": 1293  # Demanda real
}

def fetch_esios_data(start_date, end_date, output_folder="/content/drive/MyDrive/Deep Learning/Project 2/Spain"):
    """
    Fetches energy generation data from ESIOS for a given date range.

    Args:
        start_date (str or datetime): Start date (e.g., '2025-11-17').
        end_date (str or datetime): End date (e.g., '2025-11-27').
        output_folder (str): Directory to save the CSV file.
    """

    # 1. Configuration
    # Read the API key from the environment instead of hard-coding it in the notebook
    ESIOS_TOKEN = os.environ.get("ESIOS_TOKEN", "")
    headers = {
        "Accept": "application/json; application/vnd.esios-api-v1+json",
        "Content-Type": "application/json",
        "x-api-key": ESIOS_TOKEN
    }

    # Helper to format dates for API
    def format_date(d):
        if isinstance(d, str):
            try:
                return pd.to_datetime(d).strftime('%Y-%m-%dT%H:%M:%S')
            except (ValueError, TypeError):
                # Not parseable: pass the string through unchanged
                return d
        return d.strftime('%Y-%m-%dT%H:%M:%S')

    api_start = format_date(start_date)
    api_end = format_date(end_date)

    print(f"\n⌚ Processing Data from {api_start} to {api_end}...")

    # Use pre-defined IDs
    final_ids = ESIOS_INDICATORS

    # 2. Download Data
    print(f"\n📥 Downloading data for: {list(final_ids.keys())}")
    dfs = []

    for tech, ind_id in final_ids.items():
        url = f"https://api.esios.ree.es/indicators/{ind_id}"
        params = {
            "start_date": api_start,
            "end_date": api_end,
            "geo_ids[]": 8741 # Peninsula filter
        }

        try:
            time.sleep(0.2) # Simple rate limit
            # A timeout keeps one stalled request from blocking the whole loop
            r = requests.get(url, headers=headers, params=params, timeout=30)
            if r.status_code == 200:
                data = r.json()
                values = data.get('indicator', {}).get('values', [])
                if values:
                    df = pd.DataFrame(values)
                    df['datetime'] = pd.to_datetime(df['datetime'], utc=True)
                    df.set_index('datetime', inplace=True)
                    df = df[['value']].rename(columns={'value': tech})
                    # Resample to 10min
                    df = df.resample('10min').mean()
                    dfs.append(df)
                    print(f"   ✔ {tech} (ID {ind_id}): {len(df)} rows")
                else:
                    print(f"   ⚠️ {tech} (ID {ind_id}): No data returned")
            else:
                print(f"   ❌ {tech} (ID {ind_id}): API Status {r.status_code}")

        except Exception as e:
            print(f"   ❌ {tech} (ID {ind_id}): Error {e}")

    # 3. Merge and Save
    if dfs:
        final_df = pd.concat(dfs, axis=1)
        final_df.sort_index(inplace=True)

        # Ensure output directory exists
        if output_folder:
            try:
                os.makedirs(output_folder, exist_ok=True)
            except OSError:
                # Ignore here: the save step below falls back to the local directory
                pass

        # Construct filename
        fname_start = api_start.replace(':', '-')
        fname_end = api_end.replace(':', '-')
        filename = f"spain_energy_generation_from_{fname_start}_to_{fname_end}.csv"

        if output_folder:
            full_path = os.path.join(output_folder, filename)
        else:
            full_path = filename

        try:
            final_df.to_csv(full_path)
            print(f"\n💾 Saved successfully to {full_path}")
            return final_df
        except Exception as e:
            print(f"\n⚠️ Could not save to {full_path} ({e}). Saving to local directory.")
            final_df.to_csv(filename)
            return final_df
    else:
        print("\n❌ No data collected.")
        return None

WINDOW = 10  # days per window

end_dt = datetime.now().replace(hour=23, minute=59, second=59, microsecond=0) - timedelta(days=1)

for _ in range(5):  # The real run used 100 iterations; 5 are enough to show how it works
    start_dt = end_dt - timedelta(days=WINDOW)

    start_date = start_dt.strftime('%Y-%m-%dT%H:%M:%S')
    end_date = end_dt.strftime('%Y-%m-%dT%H:%M:%S')

    fetch_esios_data(start_date, end_date)

    # next window: move end_dt back by 10 days
    end_dt = start_dt

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

⌚ Processing Data from 2025-11-29T23:59:59 to 2025-12-09T23:59:59...

📥 Downloading data for: ['Total Generation', 'Solar PV', 'Solar Thermal', 'Wind', 'Nuclear', 'Hydro', 'Coal', 'Gas Combined Cycle', 'Demand']
   ✔ Total Generation (ID 10004): 1440 rows
   ✔ Solar PV (ID 1295): 1440 rows
   ✔ Solar Thermal (ID 1294): 1440 rows
   ✔ Wind (ID 551): 1440 rows
   ✔ Nuclear (ID 549): 1440 rows
   ✔ Hydro (ID 546): 1440 rows
   ✔ Coal (ID 547): 1440 rows
   ⚠️ Gas Combined Cycle (ID 1746): No data returned
   ✔ Demand (ID 1293): 1440 rows

💾 Saved successfully to /content/drive/MyDrive/Deep Learning/Project 2/Spain/spain_energy_generation_from_2025-11-29T23-59-59_to_2025-12-09T23-59-59.csv

⌚ Processing Data from 2025-11-19T23:59:59 to 2025-11-29T23:59:59...

📥 Downloading data for: ['Total Generation', 'Solar PV', 'Solar Thermal