# Validation of the PyPSA-Earth stats

## Description
This task aims to develop a notebook that:
- takes as input the `results/{scenarios}/stats.csv` files produced by pypsa-earth (see PR "Create statistics" #579); in the meantime, the data is loaded from `notebooks/validation/temp_stats_csv/stats_merged_20_3_23.csv`
- loads open data on power systems across the world
- creates plots to perform the validation

Plots and tables shall support different aggregation levels (e.g. demand per continent).

Create statistics for:
- demand (See `demand_validation.ipynb`)
- installed capacity by technology (compare with: IRENA, ...)
- renewable sources (compare with: IRENA, ...)
- network characteristics (e.g. line lengths; see https://wiki.openmod-initiative.org/wiki/Transmission_network_datasets)

Plots:
- Compare the statistics of the PyPSA-Earth model with open data

## Public data sources collection
These sources could be helpful:
- [ENTSO-E](https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show)
- [IRENA](https://www.irena.org/data-and-statistics), not working
- [IEA](https://www.iea.org/data-and-statistics)
    - Electricity demand: https://www.iea.org/data-and-statistics/data-product/world-energy-balances-highlights
- [WEC](https://www.worldenergy.org/statistics/), not working
- [WRI](https://www.wri.org/resources/data-sets)
- [UN](https://unstats.un.org/unsd/snaama/)
- [WBG](https://datacatalog.worldbank.org/dataset/world-development-indicators)
- [OECD](https://data.oecd.org/)
- [Eurostat](https://ec.europa.eu/eurostat/data/database)
- [EIA](https://www.eia.gov/outlooks/aeo/data/browser/)
- [Enerdata](https://www.enerdata.net/research/)
- [BP](https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html)
- [USAID](https://www.usaid.gov/what-we-do/energy/global-energy-database), Single countries only?

https://www.usaid.gov/powerafrica/nigeria


## TODO
- DONE: Include continent analysis with country converter coco
- DONE: Continent `Asia` shows high ror and low hydro in PyPSA-Earth, but low ror and high hydro in IRENA. Why? Technology mismatch?
- Include stats on how many of the countries of a specific continent are in the PyPSA-Earth model, to better compare continental data. coco could be helpful here. Is this necessary?
- Fossil fuels such as oil, gas and coal are often reported together as 'Fossil fuels n.e.s.' ('not elsewhere specified') in the IRENA data for Europe. How should this be handled?

## Questions
- 

## Preparation

### Import packages

In [None]:
import logging
import os
import sys

import pypsa
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import country_converter as coco
import geopandas as gpd
import matplotlib

logger = logging.getLogger(__name__)

pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", 70)

In [None]:
nice_names = {
    "nuclear": "Nuclear",
    "oil": "Oil",
    "onwind": "Onshore wind",
    #"ror": "Run of river",
    "solar": "Solar PV",
    "hydro": "Hydro",
    "gas": "Gas",
    "coal_and_lignite": "Coal",
}

### Set main directory to root folder

In [None]:
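# Make the pypsa-earth scripts importable (pypsa-earth is expected in the parent folder of this repository)
module_path = os.path.abspath(os.path.join("../../../"))
scripts_path = os.path.join(module_path, "pypsa-earth", "scripts")

if scripts_path not in sys.path:
    sys.path.append(scripts_path)

from _helpers import sets_path_to_root, country_name_2_two_digits, two_digits_2_name_country

# Change the working directory to the root folder "documentation"
sets_path_to_root("documentation")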

### Load stats data (obtained from pypsa-earth)

In [None]:
# Read it with multilevel column names. Make sure that the country index "NA" is not recognized as NaN
stats = pd.read_csv("/data/davidef/gitsegan/documentation/stats_merged.csv", index_col=0, header=[0,1], keep_default_na=False, na_values="")

In [None]:
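# Run time of each rule for the US; assuming "total_time" is stored in seconds, this converts it to hours
stats.loc["US", stats.columns.get_level_values(1) == "total_time"] / 3600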

### Load public data

In [None]:
EXAMPLE_URL="https://pxweb.irena.org/pxweb/en/IRENASTAT/IRENASTAT__Power%20Capacity%20and%20Generation/ELECCAP_2024_cycle2.px/"

In [None]:
# Read the data "https://pxweb.irena.org/pxweb/en/IRENASTAT/IRENASTAT__Power%20Capacity%20and%20Generation/ELECCAP_2022_cycle2.px/"
# TODO can we download the data directly?
irena_eleccap = pd.read_csv("notebooks/validation/temp_irena/ELECCAP_20230314-165057.csv", encoding="latin-1", skiprows=2)

# Replace ".." in the dataframe with NaN
irena_eleccap = irena_eleccap.replace("..", np.nan)

# Change dtype of column "Installed electricity capacity by country/area (MW)" to float
irena_eleccap["Installed electricity capacity by country/area (MW)"] = irena_eleccap["Installed electricity capacity by country/area (MW)"].astype(float)
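A possibly simpler variant, sketched on a toy CSV: `pd.read_csv` can treat the `..` placeholder as missing while parsing via `na_values`, so the capacity column is numeric without a separate `replace`/`astype` step. The column names mirror the IRENA export used above; the inline rows and numbers are made up for illustration.

```python
import io

import pandas as pd

# Toy stand-in for an IRENASTAT export; ".." marks missing values (made-up numbers)
raw = io.StringIO(
    "Country/area,Technology,Year,Installed electricity capacity by country/area (MW)\n"
    "Germany,Solar photovoltaic,2022,66000.5\n"
    "Germany,Solar photovoltaic,2022,..\n"
    "Germany,Nuclear,2022,4055\n"
)

cap_col = "Installed electricity capacity by country/area (MW)"

# na_values="..": the placeholder becomes NaN during parsing, so the column is parsed as float
irena_toy = pd.read_csv(raw, na_values="..")

print(irena_toy[cap_col].dtype)
print(int(irena_toy[cap_col].isna().sum()))
```

With the real file, `encoding="latin-1"` and `skiprows=2` from the cell above would still be needed.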

In [None]:
# Combine on-grid and off-grid capacities
irena_eleccap = irena_eleccap.groupby(["Country/area", "Year", "Technology"]).sum(numeric_only=True).reset_index()

# Delete the column "Year" since it is not needed anymore
irena_eleccap = irena_eleccap.drop(columns=["Year"])
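When on-grid and off-grid rows are summed, `groupby(...).sum()` treats missing values as zero, so a technology reported as `..` in every row ends up as `0 MW` rather than as missing. A toy sketch (made-up numbers) of the difference, using `min_count=1` to keep all-missing groups as `NaN`:

```python
import numpy as np
import pandas as pd

cap_col = "Installed electricity capacity by country/area (MW)"
toy = pd.DataFrame(
    {
        "Country/area": ["Germany", "Germany", "Germany", "Germany"],
        "Technology": ["Solar photovoltaic", "Solar photovoltaic", "Nuclear", "Nuclear"],
        # e.g. one on-grid and one off-grid row per technology (made-up values)
        cap_col: [100.0, 5.0, np.nan, np.nan],
    }
)

keys = ["Country/area", "Technology"]

# Default behaviour: a group with only NaN values sums to 0
default_sum = toy.groupby(keys)[cap_col].sum()
# min_count=1: a group needs at least one valid value, otherwise the result is NaN
strict_sum = toy.groupby(keys)[cap_col].sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Whether "no data" should count as zero capacity is a modelling choice; `min_count=1` simply keeps the two cases apart.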

In [None]:
# Check data for a single country
irena_eleccap[irena_eleccap["Country/area"] == "Germany"].head(5)

## Validation

### Installed capacity by technology

In [None]:
# Define the technologies to compare
techs = ["CCGT", "OCGT", "nuclear", "onwind", "solar", "ror", "hydro", "oil", "coal", "lignite"]

# Select the rule "add_electricity" and its technologies
stats_capacities = stats["add_electricity"].loc[:, techs]

# Add continent at the beginning of the dataframe
stats_capacities.insert(0, "continent", coco.convert(names = stats_capacities.index, src = 'ISO2', to = 'continent'))

# Replace NaN with zeros
stats_capacities = stats_capacities.fillna(0)

In [None]:
# Combine CCGT and OCGT to "gas", coal and lignite to "coal_and_lignite", and ror and hydro to "hydro"
stats_capacities["gas"] = stats_capacities["CCGT"] + stats_capacities["OCGT"]
stats_capacities["coal_and_lignite"] = stats_capacities["coal"] + stats_capacities["lignite"]
stats_capacities["hydro"] = stats_capacities["ror"] + stats_capacities["hydro"]
stats_capacities = stats_capacities.drop(columns=["CCGT", "OCGT", "coal", "lignite", "ror"])
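The same aggregation (CCGT + OCGT to gas, coal + lignite to coal_and_lignite, ror + hydro to hydro) can also be expressed as a single column mapping. A minimal sketch, assuming the column names used above and applied to the numeric columns only (i.e. without `continent`); the helper `group_columns` is hypothetical and the numbers are made up. Transposing avoids `groupby(axis=1)`, which is deprecated in recent pandas releases.

```python
import pandas as pd

# Mapping from model technologies to comparison groups (names as in the cell above)
tech_groups = {
    "CCGT": "gas",
    "OCGT": "gas",
    "coal": "coal_and_lignite",
    "lignite": "coal_and_lignite",
    "ror": "hydro",
    "hydro": "hydro",
}


def group_columns(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Sum columns that share the same target name; unmapped columns are kept as they are."""
    target = [mapping.get(col, col) for col in df.columns]
    # Group the transposed rows (= original columns) by target name, then transpose back
    return df.T.groupby(target, sort=False).sum().T


toy = pd.DataFrame(
    {
        "CCGT": [10.0, 0.0], "OCGT": [5.0, 2.0],
        "coal": [1.0, 0.0], "lignite": [2.0, 3.0],
        "ror": [4.0, 0.0], "hydro": [6.0, 1.0],
        "solar": [7.0, 8.0],
    },
    index=["DE", "NG"],
)
grouped = group_columns(toy, tech_groups)
print(grouped)
```

Keeping the mapping in one dict makes it easy to reuse for the IRENA side as well.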

In [None]:
# Rename to nice plotting names
stats_capacities.rename(columns=nice_names, inplace=True)

In [None]:
# Add a zero-filled column "Fossil fuels n.e.s." so that the columns match the IRENA categories
stats_capacities["Fossil fuels n.e.s."] = 0

In [None]:
stats_capacities.head()

#### Uniform technology names and dataframe structure

In [None]:
# Map the IRENA technology names to the names used in stats_capacities
uniform_names = {
    "Solar photovoltaic": "solar",
    "Onshore wind energy": "onwind",
    # "Offshore wind energy": "offwind",
    "Renewable hydropower": "hydro",
    "Nuclear": "nuclear",
    "Oil": "oil",
    "Natural gas": "gas",
    "Mixed Hydro Plants": "ror",  # TODO Is this correct? Check IRENA
    "Coal and peat": "coal_and_lignite",
}

In [None]:
# Rename the technologies in irena_eleccap to match the names in stats_capacities using the dict names
irena_eleccap["Technology"] = irena_eleccap["Technology"].replace(uniform_names)

# Transform technologies to columns and have the countries as index
irena_eleccap = irena_eleccap.pivot_table(index=["Country/area"], columns="Technology", values="Installed electricity capacity by country/area (MW)")
# Reset name of columns
irena_eleccap.columns.name = None

# Combine the columns ror and hydro into hydro; fill_value=0 keeps countries that report only one of the two
irena_eleccap["hydro"] = irena_eleccap["ror"].add(irena_eleccap["hydro"], fill_value=0)
del irena_eleccap["ror"]
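After `pivot_table`, a country without any entry for a technology gets `NaN` in that column, and `NaN + x` is `NaN`. A toy sketch (made-up countries and numbers) of why `Series.add(..., fill_value=0)` is a safer way to merge `ror` into `hydro`:

```python
import numpy as np
import pandas as pd

long = pd.DataFrame(
    {
        "Country/area": ["A", "A", "B"],
        "Technology": ["hydro", "ror", "hydro"],
        "capacity": [10.0, 2.0, 7.0],
    }
)
wide = long.pivot_table(index="Country/area", columns="Technology", values="capacity")
# Country "B" has no "ror" entry, so wide.loc["B", "ror"] is NaN

plain_sum = wide["ror"] + wide["hydro"]  # B -> NaN: its hydro capacity is lost
safe_sum = wide["ror"].add(wide["hydro"], fill_value=0)  # B -> 7.0

print(plain_sum)
print(safe_sum)
```

`fill_value` only replaces a missing operand; if both values are missing, the result stays `NaN`.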

In [None]:
# Change the index of irena_eleccap to two digit country name using the function country_name_2_two_digits()
irena_eleccap.index = irena_eleccap.index.map(country_name_2_two_digits)

In [None]:
# Add continent at the beginning of the dataframe
irena_eleccap.insert(0, "continent", coco.convert(names = irena_eleccap.index, src = 'ISO2', to = 'continent'))

# Rename to nice plotting names
irena_eleccap.rename(columns=nice_names, inplace=True)

#### Plot country comparison

In [None]:
def plot_barplot(data_stats, data_irena, area, title=False):
    """Compare the capacities per technology of PyPSA-Earth (data_stats) and IRENA (data_irena) for one area."""

    # Bar positions
    index = np.arange(len(data_stats))
    barWidth = 0.3

    # Create the barplot; capacities are converted from MW to GW
    plt.figure(figsize=(6, 3))
    plt.bar(index - barWidth/2, data_stats/1e3, color=['g'], alpha=1, edgecolor='white', width=barWidth)
    plt.bar(index + barWidth/2, data_irena/1e3, color=['g'], alpha=0.3, edgecolor='white', width=barWidth)

    # Enhance graph
    plt.xticks(index, data_stats.index)
    plt.ylabel("Capacity in GW")
    plt.legend(["PyPSA-Earth 2024", "IRENA 2022"], loc='upper left', ncol=1)

    plt.grid(axis='y', alpha=0.5)
    if title:
        if area == "Global":
            plt.title(f"Electric capacity ({area})")
        else:
            plt.title(f"Electric capacity in {area}")

    # Save and show the figure
    plt.savefig(f"notebooks/validation/temp_results/el_cap_{area}.pdf", bbox_inches='tight')  # TODO add save path
    plt.show()

In [None]:
def is_country_or_continent(area_name):
    """Return "country" if coco recognises area_name as a country, otherwise "continent"."""
    area_name = coco.convert(names=area_name, to='name_short')
    if area_name != "not found":
        return "country"
    else:
        return "continent"  # TODO all unrecognised names are treated as continents, this should be improved
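The TODO above notes that every unrecognised name is treated as a continent. A minimal stdlib-only sketch of a stricter check: `classify_area` is a hypothetical helper, and the continent names are assumed to be the labels produced by `coco.convert(..., to="continent")` in this notebook (the same five used for the Ember comparison). A typo then raises an error instead of silently becoming a "continent".

```python
# Assumed continent labels as returned by coco in this notebook
CONTINENTS = {"Africa", "America", "Asia", "Europe", "Oceania"}


def classify_area(area: str, country_codes: set) -> str:
    """Return "global", "continent" or "country"; raise for unknown names (hypothetical helper)."""
    if area == "Global":
        return "global"
    if area in CONTINENTS:
        return "continent"
    if area in country_codes:
        return "country"
    raise ValueError(f"Unknown area: {area!r}")


# Usage with a toy set of ISO2 codes, e.g. taken from stats_capacities.index
codes = {"DE", "NG", "US"}
print(classify_area("Europe", codes))
print(classify_area("NG", codes))
```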


In [None]:
def area_selection(area, stats_capacities, irena_eleccap, group_fossils=False):
    """Return the PyPSA-Earth and IRENA capacities per technology for a country, a continent or "Global"."""

    _stats_capacities = stats_capacities.copy()
    _irena_eleccap = irena_eleccap.copy()

    if area == "Global":
        # Sum over all countries; the string column "continent" is excluded from the sum
        _stats_capacities = _stats_capacities.drop(columns="continent").sum(axis=0).to_frame("Global").T
        _irena_eleccap = _irena_eleccap.drop(columns="continent").sum(axis=0).to_frame("Global").T

    elif is_country_or_continent(area) == "continent":
        # Group the data by continent
        _irena_eleccap = _irena_eleccap.groupby("continent").sum()
        _stats_capacities = _stats_capacities.groupby("continent").sum()

    else:
        # Single country: the "continent" column is not needed
        _irena_eleccap = _irena_eleccap.drop(columns="continent", errors="ignore")
        _stats_capacities = _stats_capacities.drop(columns="continent", errors="ignore")

    # Select the data for the area, using the PyPSA-Earth technologies for both sources
    data_stats = _stats_capacities.loc[area].copy()
    data_irena = _irena_eleccap.loc[area][data_stats.index].copy()

    if group_fossils:
        fossils = ["Oil", "Gas", "Coal", "Fossil fuels n.e.s."]
        data_stats["Fossil fuels"] = data_stats[fossils].sum()
        data_stats = data_stats.drop(fossils)
        data_irena["Fossil fuels"] = data_irena[fossils].sum()
        data_irena = data_irena.drop(fossils)

    return data_stats, data_irena

### Plot comparison (country, continent or global)

Single country, continent, or global

In [None]:
area = "Global"
data_stats, data_irena = area_selection(area, stats_capacities, irena_eleccap, group_fossils=True)
plot_barplot(data_stats, data_irena, area)

In [None]:
(data_stats/1000).head(10)

In [None]:
(data_irena/1000).head(10)

All continents

In [None]:
# Plot and save all continents
areas = stats_capacities.continent.unique()
areas = areas[areas != "not found"]
for area in areas:
    data_stats, data_irena = area_selection(area, stats_capacities, irena_eleccap, group_fossils=True)
    plot_barplot(data_stats, data_irena, area)
    if area == "Europe":
        print("Fossil fuels such as oil, gas and coal are often reported together as 'Fossil fuels n.e.s.' ('not elsewhere specified') in the IRENA data for Europe.")

In [None]:
stats_capacities

### Plot total capacity by country

In [None]:
stats_capacities.columns.difference(["continent"])

In [None]:
cap_stats_cols = stats_capacities.columns.difference(["continent"])

# Country shapes from the pypsa-earth resources, one geometry per country
stats_cap_country = gpd.GeoDataFrame(index=stats_capacities.index)
stats_cap_country["geometry"] = stats_cap_country.index.map(
    lambda country_code: gpd.read_file("../pypsa-earth/resources/" + country_code + "/shapes/country_shapes.geojson")["geometry"].iloc[0]
)
# Total installed capacity per country (MW)
stats_cap_country["PyPSA Earth 2024"] = stats_capacities[cap_stats_cols].sum(axis=1)

cap_irena_cols = irena_eleccap.columns.difference(["continent"])

# Countries missing from IRENA get 0.001 MW to avoid a division by zero (they then show a very large ratio)
stats_cap_country["IRENA 2022"] = irena_eleccap[cap_irena_cols].sum(axis=1).fillna(0.001)

# Ratio PyPSA-Earth / IRENA in %
stats_cap_country["ratio"] = stats_cap_country["PyPSA Earth 2024"] / stats_cap_country["IRENA 2022"] * 100
stats_cap_country["abs_ratio"] = stats_cap_country["ratio"].abs()
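`fillna(0.001)` avoids a division by zero, but a country missing from IRENA then appears with a huge ratio. A sketch of an alternative (toy numbers; `capacity_ratio_pct` is a hypothetical helper) that leaves such countries as `NaN`, so that they could be drawn as "no data" (e.g. via the `missing_kwds` argument of `GeoDataFrame.plot`) rather than as outliers:

```python
import numpy as np
import pandas as pd


def capacity_ratio_pct(model: pd.Series, reference: pd.Series) -> pd.Series:
    """Model/reference ratio in percent; NaN where the reference is missing or zero."""
    reference = reference.reindex(model.index)
    # where(reference > 0) turns zero and missing reference values into NaN
    return model / reference.where(reference > 0) * 100


model = pd.Series({"DE": 220.0, "NG": 12.0, "XK": 1.5})
reference = pd.Series({"DE": 200.0, "NG": 0.0})  # "XK" is absent from the reference
ratio = capacity_ratio_pct(model, reference)
print(ratio)
```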



In [None]:
stats_cap_country.loc["IT"]

In [None]:
plot_col = "ratio"
k_quantiles = 6

# Color limits for the (commented-out) logarithmic normalisation below
min_val = max(1, stats_cap_country[plot_col].min())
max_val = min(300., stats_cap_country[plot_col].max())

def custom_cmap(x):
    """Map a capacity ratio in % to an RGB color (alternative to the viridis colormap)."""
    if x < 10:
        return (0, 1, 0)  # green
    if x < 30:
        return (0, .5, 0)  # dark green
    if x < 50:
        return (1, 1, 0)  # yellow
    if x < 100:
        return (1.0, 0.6, 0)  # orange
    return (1, 0, 0)  # red

# stats_cap_country["color"] = stats_cap_country["abs_difference_pc"].map(custom_cmap)
ax = stats_cap_country.plot(
    column=plot_col,
    cmap="viridis",
    scheme="UserDefined",
    k=k_quantiles,
    # color="color",
    legend=True,
    legend_kwds={
        "loc": 'center left',
        "bbox_to_anchor": (1, 0.5),
    },
    figsize=(15, 7),
    classification_kwds=dict(bins=[50, 80, 120, 150]),
    #norm=matplotlib.colors.LogNorm(vmin=min_val, vmax=max_val),
)

# Replace the default "low, high" legend labels with "<high%", "low-high%" and ">low%"
leg1 = ax.get_legend()
texts = leg1.get_texts()
for it, eb in enumerate(texts):
    low, high = [float(tt) for tt in eb.get_text().split(sep=",")]
    if it == 0:
        eb.set_text(f"<{high:.0f}%")
    elif it == len(texts) - 1:
        eb.set_text(f">{low:.0f}%")
    else:
        eb.set_text(f"{low:.0f}-{high:.0f}%")

ax.set_title("Installed capacity ratio PyPSA-Earth 2030 / IRENA 2022 in %")
ax.set_axis_off()

plt.savefig("notebooks/validation/temp_results/world_cap_map.pdf", bbox_inches='tight')
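The legend post-processing above can be factored into a small function and checked on its own. A sketch, assuming the legend labels arrive as `"low, high"` strings (as the `split(",")` in the loop above expects); `relabel_bins` is a hypothetical helper.

```python
def relabel_bins(labels: list) -> list:
    """Turn "low, high" legend labels into "<high%", "low-high%" and ">low%" texts."""
    new_labels = []
    last = len(labels) - 1
    for i, text in enumerate(labels):
        low, high = (float(part) for part in text.split(","))
        if i == 0:
            new_labels.append(f"<{high:.0f}%")  # first class: only the upper edge matters
        elif i == last:
            new_labels.append(f">{low:.0f}%")  # last class: only the lower edge matters
        else:
            new_labels.append(f"{low:.0f}-{high:.0f}%")
    return new_labels


# Toy labels as they might look for bins=[50, 80, 120, 150]
example = ["3.20, 50.00", "50.00, 80.00", "80.00, 120.00", "120.00, 150.00", "150.00, 412.70"]
print(relabel_bins(example))
```

In the notebook it could be applied as `for eb, new in zip(texts, relabel_bins([t.get_text() for t in texts])): eb.set_text(new)`, so both maps share one implementation.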

### Demand

In [None]:
stats.head()

In [None]:
# Select the demand of the rules "add_electricity" and "solve_network"
stats_demand = stats["add_electricity"].loc[:, "demand"]
stats_demand_solve = stats["solve_network"].loc[:, "demand"]

In [None]:
# Create a dataframe with the demand of add_electricity and solve_network
stats_demand = pd.concat([stats["add_electricity"].loc[:, "demand"], stats["solve_network"].loc[:, "demand"]], axis=1)
stats_demand.columns = ["demand_add_el", "demand_solve"]

In [None]:
# Boxplot of the relative demand change (in %) from "add_electricity" to "solve_network" across countries
diff = ((stats_demand["demand_solve"] - stats_demand["demand_add_el"]) / stats_demand["demand_add_el"] * 100).dropna()
plt.figure(figsize=(6, 4))
plt.boxplot(diff)
plt.ylabel("Demand change in % (negative = reduction)")
plt.title("Demand change from 'add_electricity' to 'solve_network'")
plt.grid(axis='y', alpha=0.5)
plt.xticks([1], ["Countries"])
plt.show()

In [None]:
diff.describe()

In [None]:
# Add continent at the beginning of the dataframe
stats_demand.insert(0, "continent", coco.convert(names = stats_demand.index, src = 'ISO2', to = 'continent'))

In [None]:
stats_demand.head()

OWID

In [None]:
# Source: https://ourworldindata.org/grapher/electricity-demand?time=2022&country=USA~GBR~FRA~DEU~IND~BRA

In [None]:
owid_el_demand = pd.read_csv("notebooks/validation/temp_owid/electricity-demand.csv", index_col=0)

In [None]:
# Select the year 2021
ember_el_demand = owid_el_demand[owid_el_demand["Year"] == 2021]

# Get the regional aggregates published by Ember
ember_el_demand_continent = ember_el_demand.loc[ember_el_demand.index.str.contains("Ember")].copy()

# Remove the suffix " (Ember)" from the index labels
ember_el_demand_continent.index = ember_el_demand_continent.index.str.replace(r" \(Ember\)", "", regex=True)

# Get country-level data (rows with an ISO3 code)
ember_el_demand_country = ember_el_demand.loc[~ember_el_demand.Code.isna()].copy()

# Add 2-letter country code
ember_el_demand_country["ISO2"] = coco.CountryConverter().pandas_convert(series=ember_el_demand_country.Code, to="ISO2")
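The OWID file mixes countries (rows with an ISO3 `Code`) and regional aggregates (rows without a code, some suffixed with ` (Ember)`). A toy sketch of the split done above, with made-up values; `regex=False` sidesteps escaping the parentheses. Note that an aggregate without the suffix (here "Africa") ends up in neither table.

```python
import pandas as pd

owid_toy = pd.DataFrame(
    {
        "Code": ["DEU", "NGA", None, None],
        "Year": [2021, 2021, 2021, 2021],
        "Electricity demand (TWh)": [500.0, 30.0, 3000.0, 700.0],
    },
    index=["Germany", "Nigeria", "Europe (Ember)", "Africa"],
)

# Regional aggregates published by Ember: keep them and drop the suffix from the label
is_ember = owid_toy.index.str.endswith(" (Ember)")
regions = owid_toy.loc[is_ember].copy()
regions.index = regions.index.str.replace(" (Ember)", "", regex=False)

# Countries: rows that carry an ISO3 code
countries = owid_toy.loc[owid_toy["Code"].notna()].copy()

print(regions.index.tolist())
print(countries.index.tolist())
```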


In [None]:
# Add "America" as the sum of the rows "North America" and "Latin America and Caribbean"
ember_el_demand_continent.loc["America"] = ember_el_demand_continent.loc["North America"] + ember_el_demand_continent.loc["Latin America and Caribbean"]
# The sum above also adds up the years, so reset the year of "America" to 2021
ember_el_demand_continent.loc["America", "Year"] = 2021

In [None]:
ember_el_demand_continent = ember_el_demand_continent.loc[['Europe', 'Asia', 'Africa', 'Oceania', 'America']]

In [None]:
# Create new column continent
ember_el_demand_continent.insert(0, "continent", ember_el_demand_continent.index)

# Sort index alphabetically
ember_el_demand_continent.sort_index(inplace=True)

In [None]:
ember_el_demand_continent.head()

In [None]:
stats_demand_continent = stats_demand.groupby("continent").sum()/1e6 #convert from MWh to TWh

# drop row "not found"
stats_demand_continent.drop(index="not found", errors="ignore", inplace=True)

In [None]:
stats_demand_continent.head(10)

In [None]:
def plot_barplot_demand(stats_demand_continent, ember_el_demand, title=True):
    """Compare the electricity demand per continent of PyPSA-Earth and Ember (both in TWh)."""

    # Align the Ember values with the continents of the PyPSA-Earth data
    ember_el_demand = ember_el_demand.reindex(stats_demand_continent.index)

    # Bar positions
    index = np.arange(len(stats_demand_continent))
    barWidth = 0.3

    # Create a barplot
    plt.figure(figsize=(6, 3))
    plt.bar(index - barWidth/2, stats_demand_continent, color=['g'], alpha=1, edgecolor='white', width=barWidth)
    plt.bar(index + barWidth/2, ember_el_demand, color=['g'], alpha=0.3, edgecolor='white', width=barWidth)

    # Enhance graph
    plt.xticks(index, stats_demand_continent.index)
    plt.ylabel("Electricity demand in TWh")
    plt.legend(["PyPSA-Earth 2030", "Ember 2021"], loc='upper left', ncol=1)

    plt.grid(axis='y', alpha=0.5)
    if title:
        plt.title("Electricity demand")

    # Save and show the figure
    plt.savefig("notebooks/validation/temp_results/el_demand.pdf", bbox_inches='tight')
    plt.show()

plot_barplot_demand(stats_demand_continent["demand_add_el"], ember_el_demand_continent["Electricity demand (TWh)"], title=False)

In [None]:
stats_demand_continent["demand_add_el"]

In [None]:
ember_el_demand_continent["Electricity demand (TWh)"]

### Country maps demand comparison

In [None]:
comparison_demand_country = (
    stats_demand
    .drop(columns="continent")
    .reset_index()
    .rename(columns={"index": "ISO2"})
    .merge(
        ember_el_demand_country.rename(columns={"Electricity demand (TWh)": "demand_iea"})
        .drop(columns=["Year"])
        .rename(columns={"Code": "ISO3"})
        .reset_index(drop=True),
        how="left",
    )
)
for col in ["demand_add_el", "demand_solve"]:
    comparison_demand_country[col] /= 1e6
comparison_demand_country.head()
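A left merge silently keeps model countries that have no Ember match, with `NaN` reference demand. A toy sketch (made-up values) of using `merge(..., indicator=True)` to list such countries before plotting:

```python
import pandas as pd

model = pd.DataFrame({"ISO2": ["DE", "NG", "XK"], "demand_add_el": [520.0, 35.0, 6.0]})
reference = pd.DataFrame({"ISO2": ["DE", "NG"], "demand_iea": [500.0, 30.0]})

merged = model.merge(reference, on="ISO2", how="left", indicator=True)

# "_merge" is "both" for matched rows and "left_only" where the reference has no row
unmatched = merged.loc[merged["_merge"] == "left_only", "ISO2"].tolist()
print(unmatched)
```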

In [None]:
# import cartopy.io.shapereader as shpreader

# shpfilename = shpreader.natural_earth(resolution='110m',
#                                       category='cultural',
#                                       name='admin_0_countries')
# reader = shpreader.Reader(shpfilename)
# countries = reader.records()

# geoms_countries = pd.Series({
#     country.attributes["ADMIN"]: country.geometry
#     for country in reader.records()
#     if comparison_demand_country.ISO2.isin([country.attributes["ISO_A2"]]).any()
# })
# geoms_countries

comparison_demand_country["geometry"] = comparison_demand_country.ISO2.map(
    lambda country_code: gpd.read_file("../pypsa-earth/resources/" + country_code + "/shapes/country_shapes.geojson")["geometry"].iloc[0]
)

comparison_column = "demand_add_el"
comparison_base = "demand_iea"

comparison_demand_country["demand_error"] = (comparison_demand_country[comparison_column] - comparison_demand_country[comparison_base])
comparison_demand_country["demand_error_pc"] = 100*comparison_demand_country["demand_error"]/comparison_demand_country[comparison_base]

comparison_demand_country["absolute_error_pc"] = comparison_demand_country["demand_error_pc"].abs()

comparison_demand_country["ratio"] = (comparison_demand_country[comparison_column] / comparison_demand_country[comparison_base]) * 100
comparison_demand_country["ratio_abs"] = comparison_demand_country["ratio"].abs()


comparison_demand_country = gpd.GeoDataFrame(comparison_demand_country, geometry="geometry")


# comparison_demand_country["geometry"] = [

#     for cc_a3 in comparison_demand_country["ISO3"]
# ]
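For reference, the quantities computed in the cell above, for a country $c$ with model demand $D^{\mathrm{model}}_c$ (`demand_add_el`, in TWh) and reference demand $D^{\mathrm{ref}}_c$ (`demand_iea`, here taken from the Ember data):

$$
\begin{aligned}
e_c &= D^{\mathrm{model}}_c - D^{\mathrm{ref}}_c && \text{(demand\_error, TWh)}\\
e^{\%}_c &= 100\,\frac{e_c}{D^{\mathrm{ref}}_c} && \text{(demand\_error\_pc)}\\
r_c &= 100\,\frac{D^{\mathrm{model}}_c}{D^{\mathrm{ref}}_c} = e^{\%}_c + 100 && \text{(ratio)}
\end{aligned}
$$

A ratio of 100 % therefore corresponds to zero error, and the map bin 80-120 % corresponds to a relative error within ±20 %.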

In [None]:
comparison_demand_country["demand_error"].sort_values(ascending=False).plot(use_index=False)

In [None]:
comparison_demand_country["demand_error_pc"].sort_values(ascending=False).plot(use_index=False)

In [None]:
plot_col = "ratio"
k_quantiles = 6

# Color limits for the (commented-out) logarithmic normalisation below
min_val = max(1, comparison_demand_country[plot_col].min())
max_val = min(300., comparison_demand_country[plot_col].max())

def custom_cmap(x):
    """Map a demand ratio in % to an RGB color, using the same edges as the bins below."""
    if x < 50:
        return (0, 1, 0)  # green
    if x < 80:
        return (0, .5, 0)  # dark green
    if x < 120:
        return (1, 1, 0)  # yellow
    if x < 150:
        return (1.0, 0.6, 0)  # orange
    return (1, 0, 0)  # red

comparison_demand_country["color"] = comparison_demand_country["ratio"].map(custom_cmap)
ax = comparison_demand_country.plot(
    column=plot_col,
    cmap="viridis",
    scheme="UserDefined",
    # k=k_quantiles,
    # color="color",
    legend=True,
    legend_kwds={
        "loc": 'center left',
        "bbox_to_anchor": (1, 0.5),
    },
    figsize=(15, 7),
    classification_kwds=dict(bins=[50, 80, 120, 150]),
    #norm=matplotlib.colors.LogNorm(vmin=min_val, vmax=max_val),
)

# Replace the default "low, high" legend labels with "<high%", "low-high%" and ">low%"
leg1 = ax.get_legend()
texts = leg1.get_texts()
for it, eb in enumerate(texts):
    low, high = [float(tt) for tt in eb.get_text().split(sep=",")]
    if it == 0:
        eb.set_text(f"<{high:.0f}%")
    elif it == len(texts) - 1:
        eb.set_text(f">{low:.0f}%")
    else:
        eb.set_text(f"{low:.0f}-{high:.0f}%")

ax.set_title("Electricity demand ratio PyPSA-Earth 2030 / Ember 2021 in %")
ax.set_axis_off()

plt.savefig("notebooks/validation/temp_results/world_el_map.pdf", bbox_inches='tight')
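The thresholds in `custom_cmap` mirror the `bins=[50, 80, 120, 150]` passed to the classifier. A stdlib sketch (the helpers `ratio_class` and `ratio_colour` are hypothetical) that derives the class from the same bin list with `bisect`, so that colours and bins cannot drift apart. A value equal to an edge goes to the upper class, matching the strict `x < edge` tests in `custom_cmap`; `NaN` falls into the last class, as it does there.

```python
from bisect import bisect_right

BINS = [50, 80, 120, 150]  # same edges as classification_kwds above
COLOURS = [
    (0, 1, 0),      # green
    (0, 0.5, 0),    # dark green
    (1, 1, 0),      # yellow
    (1.0, 0.6, 0),  # orange
    (1, 0, 0),      # red
]


def ratio_class(x: float, bins=BINS) -> int:
    """Index of the class containing x; an edge value goes to the upper class."""
    return bisect_right(bins, x)


def ratio_colour(x: float) -> tuple:
    """RGB colour of the class containing x."""
    return COLOURS[ratio_class(x)]


print([ratio_class(v) for v in (10, 50, 95, 130, 400)])
```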

IEA

In [None]:
# Read the data from the excel file
iea_web = pd.read_excel("notebooks/validation/temp_IEA/World Energy Balances Highlights 2022.xlsx", sheet_name="TimeSeries_1971-2021", skiprows=1, index_col=0)

In [None]:
# Keep only the rows with "Total final consumption (PJ)" in "Flow" and "Electricity" in "Product"
iea_el_demand = iea_web[(iea_web["Flow"] == "Total final consumption (PJ)") & (iea_web["Product"] == "Electricity")]

# Electricity demand of 2020 (the most recent year available), converted from PJ to TWh (1 TWh = 3.6 PJ)
iea_el_demand = pd.DataFrame(iea_el_demand[2020]) / 3.6
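Converting PJ to TWh divides by 3.6 (equivalently, multiplies by 0.2777...), since 1 PJ = 10^15 J and 1 TWh = 10^12 W × 3600 s = 3.6 × 10^15 J. A tiny sketch that makes the conversion explicit (`pj_to_twh` is a hypothetical helper):

```python
J_PER_PJ = 1e15
J_PER_TWH = 3.6e15  # 1 TWh = 1e12 W * 3600 s


def pj_to_twh(value_pj: float) -> float:
    """Convert petajoules to terawatt-hours."""
    return value_pj * J_PER_PJ / J_PER_TWH


print(pj_to_twh(36.0))  # 10.0
```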

In [None]:
# Rename "Non-OECD Asia (including China)" to "Non-OECD Asia (including C)" so that the name matching does not map it to "China"
iea_el_demand.rename(index={"Non-OECD Asia (including China)": "Non-OECD Asia (including C)"}, inplace=True)

# Convert country names to two-letter codes; keep the original name where country_name_2_two_digits() returns "not found"
iea_el_demand.index = [
    code if code != "not found" else name
    for name, code in zip(iea_el_demand.index, iea_el_demand.index.map(country_name_2_two_digits))
]

In [None]:
iea_el_demand.head()

### Networks

In [None]:
# Select the line statistics of the rule "base_network"
stats_network = stats["base_network"].loc[:, ["lines_length", "lines_capacity"]]

In [None]:
# Add continent at the beginning of the dataframe
stats_network.insert(0, "continent", coco.convert(names = stats_network.index, src = 'ISO2', to = 'continent'))

In [None]:
stats_network.head()

GridKit

In [None]:
# Get GridKit data https://zenodo.org/record/47317#.ZBw1KvaZM-U. Manually downloaded and extracted.
gridkit_europe = pd.read_csv("notebooks/validation/temp_gridkit/gridkit_euorpe/gridkit_europe-highvoltage-links.csv")
gridkit_northamerica = pd.read_csv("notebooks/validation/temp_gridkit/gridkit_north_america/gridkit_north_america-highvoltage-links.csv")

In [None]:
gridkit_europe.head()

In [None]:
gridkit_europe_length = gridkit_europe["length_m"].sum() / 1e3 # convert from m to km
gridkit_northamerica_length = gridkit_northamerica["length_m"].sum() / 1e3 # convert from m to km

In [None]:
gridkit_europe_length

In [None]:
gridkit_northamerica_length

Data in Brief: https://www.sciencedirect.com/science/article/pii/S2352340921006351#sec0011

In [None]:
dib_grid = pd.read_excel(
    "notebooks/validation/temp_dib/1-s2.0-S2352340921006351-mmc1.xlsx",
    sheet_name="Grids & transformers 2017",
    index_col=0,
    header=[0,1],
    skiprows=[2],
)
dib_grid["ISO2"] = coco.convert(names = dib_grid.index, to = 'ISO2')
dib_grid.head()

In [None]:
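# Keep the first data column and the ISO2 code
data_trans = dib_grid.iloc[:, [0, -1]].copy()
# coco returns a list when a name matches several countries; such rows are set to Serbia ("RS")
data_trans.loc[data_trans["ISO2"].map(lambda x: isinstance(x, list)), "ISO2"] = "RS"
# Drop names that coco could not match
data_trans = data_trans[data_trans.ISO2 != "not found"]
data_trans = data_trans.set_index("ISO2")
data_trans = data_trans.iloc[:, 0].rename("transdata")
# data_trans.to_csv("data.csv")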

In [None]:
stats_length = stats[("base_network", "lines_length")].copy().rename("pypsa_earth").to_frame()
stats_length["dib"] = data_trans
stats_length.loc["total"] = stats_length.sum()
stats_length["continent"] = coco.convert(names = stats_length.index, to = 'continent')
stats_length["ratio"] = stats_length["pypsa_earth"].div(stats_length["dib"].astype(float))
stats_length.loc[~np.isfinite(stats_length["ratio"]), 'ratio'] = np.nan
stats_length.to_csv("data.csv")

In [None]:
stats_by_continent = stats_length.groupby("continent")[["pypsa_earth", "dib"]].sum()/1000
stats_by_continent["ratio"] = stats_by_continent["pypsa_earth"] / stats_by_continent["dib"]
stats_by_continent
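Appending a `"total"` row to the per-country table means that any later column sum counts every country twice (and coco presumably maps `"total"` to `"not found"`). A toy sketch (made-up numbers) that keeps the per-country table clean and computes continental and global ratios from it. Countries missing from one source are dropped from both sums here so that the ratio compares like with like; this is an assumption about the intended comparison.

```python
import numpy as np
import pandas as pd

lengths = pd.DataFrame(
    {
        "pypsa_earth": [1000.0, 800.0, 300.0],
        "dib": [900.0, np.nan, 250.0],
        "continent": ["Europe", "Europe", "Africa"],
    },
    index=["DE", "FR", "NG"],
)

# Keep only countries covered by both sources before aggregating
both = lengths.dropna(subset=["pypsa_earth", "dib"])

by_continent = both.groupby("continent")[["pypsa_earth", "dib"]].sum()
by_continent["ratio"] = by_continent["pypsa_earth"] / by_continent["dib"]

# Global totals computed on the fly instead of being stored as an extra row
total = both[["pypsa_earth", "dib"]].sum()
total_ratio = total["pypsa_earth"] / total["dib"]

print(by_continent)
print(round(total_ratio, 3))
```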

In [None]:
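# Totals over all countries, as stored in the "total" row (summing all rows would count every country twice)
stats_length.loc["total"]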

## Compare computational time

In [None]:
stats
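This section currently only displays `stats`. A minimal sketch of a possible next step, assuming (as the `/3600` used earlier suggests) that the `total_time` entries are wall-clock seconds per rule, stored in the second column level under each rule name. The toy frame below mimics that two-level layout with made-up numbers.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("base_network", "total_time"),
        ("base_network", "lines_length"),
        ("solve_network", "total_time"),
    ]
)
stats_toy = pd.DataFrame(
    [[120.0, 5000.0, 7200.0], [60.0, 800.0, 1800.0]],
    index=["US", "NG"],
    columns=columns,
)

# Keep only the "total_time" columns and drop the second level -> one column per rule
runtime_h = stats_toy.xs("total_time", axis=1, level=1) / 3600  # seconds -> hours (assumed unit)
runtime_h["all_rules"] = runtime_h.sum(axis=1)

print(runtime_h.sort_values("all_rules", ascending=False))
```

On the real data, `stats.xs("total_time", axis=1, level=1)` would give the same per-rule table, provided every rule stores its time under that name.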