# Stochastic optimisation example

This example optimises a single-node system with four generators (wind, solar, gas and lignite) under uncertainty about the gas price.

In Stage 1, the generator capacities are chosen while the gas price is still unknown. In Stage 2, the generators are dispatched once the gas price has been revealed.

First we solve the problem deterministically for each gas price scenario, as if the price were known in advance. Then we solve it stochastically, weighting the scenarios by their probabilities.

We then show that the expected total system cost of the stochastically optimised capacities is lower than the expected total cost of each set of deterministically optimised capacities.
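
As a compact reference, the two-stage structure described above can be sketched as a single optimisation problem. The notation is introduced here for illustration only and does not appear in the code: $x_g$ is the capacity of generator $g$, $c_g$ its annualised fixed cost, $\pi_s$ the probability of gas price scenario $s$, $y_{s,g,t}$ the dispatch in scenario $s$ at snapshot $t$, $q_{s,g}$ the marginal cost (only the gas entry depends on $s$), $w_t$ the snapshot weighting, $\bar{p}_{g,t}$ the per-unit availability and $d_t$ the load.

$$
\begin{aligned}
\min_{x \ge 0,\; y \ge 0} \quad & \sum_g c_g x_g + \sum_s \pi_s \sum_{g,t} w_t \, q_{s,g} \, y_{s,g,t} \\
\text{s.t.} \quad & y_{s,g,t} \le \bar{p}_{g,t} \, x_g && \forall s, g, t \\
& \sum_g y_{s,g,t} = d_t && \forall s, t
\end{aligned}
$$

The capacities $x$ carry no scenario index because they are decided before the gas price is known; the dispatch $y$ is decided separately for each scenario.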



### Required data

For this example, we need solar and wind generation time series. For convenience, we fetch them directly from the renewables.ninja server, using the data for Germany as an arbitrary example.

The fetched files: 
- PV (1985-2016, SARAH) (6.37 MB)
- Wind (Current fleet, onshore/offshore separate, MERRA-2) (13.93 MB)

See: https://www.renewables.ninja/ 


### Dependencies

In [None]:
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd
import requests
from linopy.expressions import merge
from xarray import DataArray

import pypsa
from pypsa.descriptors import (
    get_bounds_pu,
    nominal_attrs,
)
from pypsa.descriptors import get_switchable_as_dense as get_as_dense
from pypsa.optimization.common import reindex

In [None]:
%matplotlib inline

### Retrieve PV & Wind data

In [None]:
urls = {
    "solar_pu": "https://www.renewables.ninja/country_downloads/DE/ninja_pv_country_DE_sarah_corrected.csv",
    "wind_pu": "https://www.renewables.ninja/country_downloads/DE/ninja_wind_country_DE_current-merra-2_corrected.csv",
}

In [None]:
def fetch_timeseries_data(url):
    """Fetch the time-series data from the renewables.ninja server."""

    response = requests.get(url)
    response.raise_for_status()  # Raise an error for bad responses

    return pd.read_csv(
        StringIO(response.text), skiprows=2, parse_dates=["time"], index_col="time"
    )["national"]

In [None]:
solar_pu = fetch_timeseries_data(urls["solar_pu"])
wind_pu = fetch_timeseries_data(urls["wind_pu"])

### Major settings

In [None]:
scenarios = ["low", "med", "high"]

# this just determines the default scenario when building stochastic model
base_scenario = "low"

# in EUR/MWh_th
gas_prices = {"low": 40, "med": 70, "high": 100}

probability = {"low": 0.4, "med": 0.3, "high": 0.3}

In [None]:
# years for weather data (solar is 1985-2015 inclusive, wind is 1980-2019)
year_start = 2015
year_end = 2015

# 1 is hourly, 3 is 3-hourly
frequency = 3

# Fixed load in MW
load = 1

# https://github.com/ERGO-Code/HiGHS
solver_name = "highs"

cts = ["DE"]

### Prepare data
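
The cost preparation below annualises each investment with an annuity factor and adds fixed operation and maintenance (FOM) as a percentage of the investment. As a minimal sketch of that arithmetic, the snippet here works through the gas CCGT row by hand. The numbers are the ones used in this example; the units in the comments (EUR/MW and EUR/MW/a) are an assumption, since the example does not state them.

```python
def annuity(lifetime, rate):
    """Annuity factor: constant yearly payment per unit of investment."""
    if rate == 0.0:
        return 1 / lifetime
    return rate / (1.0 - 1.0 / (1.0 + rate) ** lifetime)


# Values from the assumptions table of this example (gas CCGT row)
lifetime = 25  # years
discount_rate = 0.03
fom_percent = 3.0  # fixed O&M in % of investment per year
investment = 7e5  # assumed to be in EUR/MW

factor = annuity(lifetime, discount_rate)
fixed_cost = (factor + fom_percent / 100.0) * investment  # assumed EUR/MW/a

print(f"annuity factor: {factor:.4f}")
print(f"annualised fixed cost: {fixed_cost:.0f}")
```

With a 3% discount rate over 25 years the annuity factor is a little under 6% per year, so the annuity and the 3% FOM contribute comparable amounts to the fixed cost.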

In [None]:
assumptions = pd.DataFrame(
    columns=["FOM", "discount rate", "efficiency", "investment", "lifetime"],
    index=["default", "onshore wind", "utility solar PV", "gas CCGT", "lignite"],
)

assumptions.at["default", "FOM"] = 3.0
assumptions.at["default", "discount rate"] = 0.03
assumptions.at["default", "lifetime"] = 25

assumptions.at["onshore wind", "investment"] = 2e6
assumptions.at["utility solar PV", "investment"] = 10e5
assumptions.at["gas CCGT", "investment"] = 7e5
assumptions.at["gas CCGT", "efficiency"] = 0.6

assumptions.at["lignite", "investment"] = 15e5
assumptions.at["lignite", "efficiency"] = 0.3

# fill defaults
assumptions = assumptions.fillna(
    {
        "FOM": assumptions.at["default", "FOM"],
        "discount rate": assumptions.at["default", "discount rate"],
        "lifetime": assumptions.at["default", "lifetime"],
    }
)


def annuity(lifetime, rate):
    if rate == 0.0:
        return 1 / lifetime
    else:
        return rate / (1.0 - 1.0 / (1.0 + rate) ** lifetime)


# annualise investment costs, add FOM
assumptions["fixed"] = [
    (annuity(v["lifetime"], v["discount rate"]) + v["FOM"] / 100.0) * v["investment"]
    for i, v in assumptions.iterrows()
]

assumptions

### Required functions

In [None]:
# prepare base network (without stochastic optimisation)
def prepare_network(cts, gas_price):
    network = pypsa.Network()

    snapshots = pd.date_range(
        f"{year_start}-01-01",
        f"{year_end}-12-31 23:00",
        freq=f"{frequency}h",
    )

    network.set_snapshots(snapshots)

    network.snapshot_weightings = pd.Series(float(frequency), index=network.snapshots)

    for ct in cts:
        network.add("Bus", ct)
        network.add("Load", ct, bus=ct, p_set=load)

        network.add(
            "Generator",
            ct + " solar",
            bus=ct,
            p_max_pu=solar_pu.loc[snapshots],
            p_nom_extendable=True,
            # small cost to fix the curtailment order: wind (0.02) is curtailed before solar (0.01)
            marginal_cost=0.01,
            capital_cost=assumptions.at["utility solar PV", "fixed"],
        )

        network.add(
            "Generator",
            ct + " wind",
            bus=ct,
            p_max_pu=wind_pu.loc[snapshots],
            p_nom_extendable=True,
            # small cost to fix the curtailment order: wind (0.02) is curtailed before solar (0.01)
            marginal_cost=0.02,
            capital_cost=assumptions.at["onshore wind", "fixed"],
        )

        network.add(
            "Generator",
            ct + " gas",
            bus=ct,
            p_nom_extendable=True,
            efficiency=assumptions.at["gas CCGT", "efficiency"],
            marginal_cost=gas_price / assumptions.at["gas CCGT", "efficiency"],
            capital_cost=assumptions.at["gas CCGT", "fixed"],
        )

        network.add(
            "Generator",
            ct + " lignite",
            bus=ct,
            p_nom_extendable=True,
            efficiency=assumptions.at["lignite", "efficiency"],
            marginal_cost=150,
            capital_cost=assumptions.at["lignite", "fixed"],
        )

    return network

In [None]:
# add additional operational scenarios to the base model
def prepare_stochastic_model(n):
    m = n.optimize.create_model()

    nonbase_scenarios = scenarios.copy()
    nonbase_scenarios.remove(base_scenario)

    # we only have generators in this example, which simplifies things
    c = "Generator"
    sns = n.snapshots
    attr = "p"
    active = None
    column = "bus"
    sign = 1
    ext_i = n.get_extendable_i(c)
    min_pu, max_pu = map(DataArray, get_bounds_pu(n, c, sns, ext_i, attr))
    capacity = n.model[f"{c}-{nominal_attrs[c]}"]

    for scenario in nonbase_scenarios:
        # add extra operational variables for each non-base scenario
        dispatch = m.add_variables(
            coords=m["Generator-p"].coords, name=f"Generator-p-{scenario}"
        )
        dispatch = reindex(dispatch, c, ext_i)

        # add dispatch constraints
        lhs = dispatch - max_pu * capacity  # instead of the tuple formulation
        m.add_constraints(lhs, "<=", 0, f"{c}-ext-{attr}-upper-{scenario}", active)

        lhs = dispatch - min_pu * capacity
        m.add_constraints(lhs, ">=", 0, f"{c}-ext-{attr}-lower-{scenario}", active)

        # add nodal balance constraints
        exprs = []
        expr = DataArray(sign) * m[f"{c}-{attr}-{scenario}"]
        buses = n.static(c)[column].rename("Bus")
        expr = expr.groupby(
            buses.to_xarray()
        ).sum()  # for linopy >=0.2, see breaking changes log
        exprs.append(expr)
        lhs = merge(exprs).reindex(Bus=n.buses.index)
        rhs = (
            (-get_as_dense(n, "Load", "p_set", sns) * n.loads.sign)
            .groupby(n.loads.bus, axis=1)
            .sum()
            .reindex(columns=n.buses.index, fill_value=0)
        )
        rhs.index.name = "snapshot"
        rhs = DataArray(rhs)
        mask = None
        m.add_constraints(lhs, "=", rhs, f"Bus-nodal_balance-{scenario}", mask=mask)

    # define the new objective

    objective = []
    weighting = n.snapshot_weightings.objective
    weighting = weighting.loc[sns]
    cost = (
        get_as_dense(n, c, "marginal_cost", sns)
        .loc[:, lambda ds: (ds != 0).all()]
        .mul(weighting, axis=0)
    )

    for scenario in scenarios:
        cost_modified = cost.copy()

        if scenario == base_scenario:
            name = f"{c}-{attr}"
        else:
            name = f"{c}-{attr}-{scenario}"
            cost_modified["DE gas"] = (
                cost_modified["DE gas"]
                * gas_prices[scenario]
                / gas_prices[base_scenario]
            )

        operation = m[name].sel({"snapshot": sns, c: cost.columns})
        objective.append((operation * (probability[scenario] * cost_modified)).sum())

    ext_i = n.get_extendable_i(c)
    cost = n.static(c)["capital_cost"][ext_i]
    objective.append((capacity * cost).sum())

    m.objective = merge(objective)

In [None]:
# Check that network is created correctly:
# gas_price = 30
# n = prepare_network(cts,gas_price)

### First solve capacities for each scenario deterministically

In [None]:
results = None

for scenario in scenarios:
    gas_price = gas_prices[scenario]

    n = prepare_network(cts, gas_price)

    n.optimize(solver_name=solver_name)

    if results is None:
        results = pd.DataFrame(columns=n.generators.index)
        results.index.name = "scenario"

    results.loc[scenario] = n.generators.p_nom_opt

In [None]:
results

### Now solve the full problem stochastically
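
The stochastic objective weights the operating cost of each scenario by its probability. The network is built with the base scenario's gas price, and the gas marginal cost of every other scenario is obtained by rescaling the base value with the ratio of gas prices. The following standalone sketch reproduces only that rescaling, using the prices, probabilities and gas efficiency from this example; the variable names are illustrative.

```python
gas_prices = {"low": 40, "med": 70, "high": 100}  # EUR/MWh_th
probability = {"low": 0.4, "med": 0.3, "high": 0.3}
efficiency = 0.6  # gas CCGT efficiency used in this example
base_scenario = "low"

# marginal cost of electricity from gas in the base scenario (EUR/MWh_el)
base_marginal_cost = gas_prices[base_scenario] / efficiency

# rescale the base marginal cost to obtain the other scenarios
scenario_marginal_cost = {
    s: base_marginal_cost * gas_prices[s] / gas_prices[base_scenario]
    for s in gas_prices
}

# probability-weighted marginal cost, as it enters the expected operating cost
expected_marginal_cost = sum(
    probability[s] * scenario_marginal_cost[s] for s in gas_prices
)

for s, mc in scenario_marginal_cost.items():
    print(f"{s}: {mc:.2f}")
print(f"expected: {expected_marginal_cost:.2f}")
```

Rescaling by the price ratio gives the same result as dividing each scenario's gas price by the efficiency directly, which is why the choice of base scenario does not affect the stochastic solution.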

In [None]:
gas_price = gas_prices[base_scenario]

n = prepare_network(cts, gas_price)

prepare_stochastic_model(n)

n.optimize.solve_model(solver_name=solver_name)

In [None]:
results.loc["stochastic"] = n.generators.p_nom_opt

In [None]:
results

### Now test each set of capacities against realisations of the gas price

In [None]:
for scenario in scenarios:
    gas_price = gas_prices[scenario]
    n = prepare_network(cts, gas_price)
    n.generators.p_nom_extendable = False

    for capacity_scenario in results.index:
        n.generators.p_nom = results.loc[capacity_scenario, n.generators.index]

        print(n.generators.p_nom)

        n.optimize(solver_name=solver_name)

        results.at[capacity_scenario, f"gas-p-{scenario}"] = n.generators_t.p[
            "DE gas"
        ].sum()
        results.at[capacity_scenario, f"lignite-p-{scenario}"] = n.generators_t.p[
            "DE lignite"
        ].sum()

In [None]:
results

In [None]:
for capacity_scenario in results.index:
    for g in n.generators.index:
        results.at[capacity_scenario, f"{g} CC"] = (
            results.at[capacity_scenario, g] * n.generators.at[g, "capital_cost"]
        )

    for scenario in scenarios:
        results.at[capacity_scenario, f"DE gas-{scenario} MC"] = (
            n.snapshot_weightings.objective.mean()
            * gas_prices[scenario]
            / n.generators.at["DE gas", "efficiency"]
            * results.at[capacity_scenario, f"gas-p-{scenario}"]
        )
        results.at[capacity_scenario, f"DE lignite-{scenario} MC"] = (
            n.snapshot_weightings.objective.mean()
            * n.generators.at["DE lignite", "marginal_cost"]
            * results.at[capacity_scenario, f"lignite-p-{scenario}"]
        )

    results.at[capacity_scenario, "DE gas-mean MC"] = sum(
        [
            probability[scenario]
            * results.at[capacity_scenario, f"DE gas-{scenario} MC"]
            for scenario in scenarios
        ]
    )
    results.at[capacity_scenario, "DE lignite-mean MC"] = sum(
        [
            probability[scenario]
            * results.at[capacity_scenario, f"DE lignite-{scenario} MC"]
            for scenario in scenarios
        ]
    )

In [None]:
fig, axes = plt.subplots(1, len(results.index), figsize=(len(results.index) * 4, 4))

colors = {
    "wind": "b",
    "solar": "y",
    "lignite": "black",
    "gas": "brown",
    "gas MC": "orange",
    "lignite MC": "gray",
}

# fig.suptitle('Horizontally stacked subplots')

for i, capacity_scenario in enumerate(results.index):
    ax = axes[i]

    df = pd.DataFrame(index=scenarios + ["mean"])

    for tech in ["solar", "wind", "gas", "lignite"]:
        df[tech] = results.at[capacity_scenario, f"DE {tech} CC"]

    for scenario in scenarios + ["mean"]:
        df.at[scenario, "gas MC"] = results.at[
            capacity_scenario, f"DE gas-{scenario} MC"
        ]
        df.at[scenario, "lignite MC"] = results.at[
            capacity_scenario, f"DE lignite-{scenario} MC"
        ]

    df.plot(kind="bar", stacked=True, ax=ax, color=colors)

    ax.set_title(f"capacity scenario {capacity_scenario}")

    ax.legend(loc="upper left")

    ax.set_ylim([0, 2.5e6])

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(4, 4))

df = (
    results[
        [
            "DE solar CC",
            "DE wind CC",
            "DE gas CC",
            "DE lignite CC",
            "DE gas-mean MC",
            "DE lignite-mean MC",
        ]
    ]
    .rename(columns=lambda x: x[3:-3])
    .rename(columns={"gas-mean": "gas MC", "lignite-mean": "lignite MC"})
)

df.plot(kind="bar", stacked=True, ax=ax, color=colors)

ax.set_xlabel("capacity scenario")

ax.set_title("means of results")
ax.set_ylim([0, 2e6])

# Analysis of a Stochastic Solution

### The Expected Cost of Ignoring Uncertainty (ECIU)

In some of the literature this is also called the Value of the Stochastic Solution (VSS); the two terms can be used interchangeably.

The natural question to ask is: how much does the quality of the decisions really improve if we solve a stochastic problem instead of a deterministic one?

The ECIU measures the value of using a stochastic model (or, equivalently, the expected cost of ignoring uncertainty by using a deterministic model).
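
To make the definition concrete before applying it to the full model, here is a minimal toy sketch in plain Python. It is not the PyPSA model: it has a single period with one unit of demand, and the planner can build one unit of gas and/or lignite capacity. The gas prices, probabilities and gas efficiency are taken from this example; the capital costs and the lignite marginal cost are hypothetical numbers chosen so that hedging pays off.

```python
from itertools import product

gas_prices = {"low": 40, "med": 70, "high": 100}
probability = {"low": 0.4, "med": 0.3, "high": 0.3}
gas_efficiency = 0.6
capital_cost = {"gas": 10.0, "lignite": 12.0}  # hypothetical
lignite_marginal_cost = 100.0  # hypothetical
demand = 1.0

# first-stage choices: build 0 or 1 unit of (gas, lignite), enough to meet demand
portfolios = [p for p in product([0.0, 1.0], repeat=2) if sum(p) >= demand]


def total_cost(portfolio, gas_price):
    """Capital cost plus the cheapest dispatch that meets demand."""
    gas_cap, lignite_cap = portfolio
    capex = gas_cap * capital_cost["gas"] + lignite_cap * capital_cost["lignite"]
    # merit order: dispatch the unit with the lower marginal cost first
    units = sorted(
        [(gas_price / gas_efficiency, gas_cap), (lignite_marginal_cost, lignite_cap)]
    )
    remaining, opex = demand, 0.0
    for marginal_cost, cap in units:
        dispatch = min(cap, remaining)
        opex += marginal_cost * dispatch
        remaining -= dispatch
    return capex + opex


def expected_cost(portfolio):
    return sum(probability[s] * total_cost(portfolio, gas_prices[s]) for s in gas_prices)


# stochastic problem: minimise the expected cost over all scenarios
sp_portfolio = min(portfolios, key=expected_cost)

# naive problem: optimise for the expected gas price only
mean_price = sum(probability[s] * gas_prices[s] for s in gas_prices)
naive_portfolio = min(portfolios, key=lambda p: total_cost(p, mean_price))

# ECIU: expected cost of the naive portfolio minus that of the stochastic one
eciu = expected_cost(naive_portfolio) - expected_cost(sp_portfolio)
print(sp_portfolio, naive_portfolio, round(eciu, 3))
```

In this toy setting the naive planner builds only lignite, whereas the stochastic planner also builds gas so that it can profit from the low-price scenario; the ECIU is the expected cost difference between the two.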


In [None]:
portfolios = pd.DataFrame()
costs = pd.Series(dtype=float)

### Define the naive problem (usually the expected value (EV) problem)

In [None]:
# Can be anything (e.g. the 'med' scenario). A textbook choice is the expected value of the uncertain parameter.

naive_scenario = sum(pd.Series(gas_prices) * pd.Series(probability))
naive_scenario
# naive_scenario = gas_prices["med"]

### solve naive problem (deterministic)

In [None]:
# naive problem (in the literature often called EVP, for Expected Value Problem,
# if the naive assumption is the expected value)
scenario = "naive"
gas_price = naive_scenario

n = prepare_network(cts, gas_price)

n.optimize(solver_name=solver_name)

portfolios[scenario] = n.generators.p_nom_opt
costs[scenario] = n.objective

In [None]:
# pd.set_option("display.precision", 10)
portfolios
# costs

### solve stochastic problem

In [None]:
scenario = "SP"  # SP for Stochastic Problem
gas_price = gas_prices[base_scenario]

n = prepare_network(cts, gas_price)
prepare_stochastic_model(n)

n.optimize.solve_model(solver_name=solver_name)

portfolios[scenario] = n.generators.p_nom_opt
costs[scenario] = n.objective

In [None]:
portfolios

### Solve stochastic problem constrained by the naive solution 

In [None]:
scenario = "SP-constrained"

gas_price = gas_prices[base_scenario]
n = prepare_network(cts, gas_price)
prepare_stochastic_model(n)

# The model has already been built at this point, so the first-stage capacities
# must be fixed in the model itself; changing n.generators would not affect it.
c = "Generator"
ext_i = n.get_extendable_i(c)
capacity = n.model[f"{c}-{nominal_attrs[c]}"]
naive_capacity = DataArray(portfolios.loc[ext_i, "naive"].rename_axis(ext_i.name))
n.model.add_constraints(
    1 * capacity, "=", naive_capacity, f"{c}-ext-{nominal_attrs[c]}-fixed-naive"
)

n.optimize.solve_model(solver_name=solver_name)

In [None]:
# the objective of the stochastic model already includes the capital costs
# of the (now fixed) generator portfolio
n.objective

In [None]:
# just a fixed copy of the naive problem's solution
portfolios[scenario] = n.generators.p_nom_opt
# must be >= the stochastic solution's cost, because dispatch has to cope
# with the suboptimal first-stage decisions
costs[scenario] = n.objective

costs

### Compute ECIU

In [None]:
# ECIU (or VSS) in M euro
eciu = (costs["SP-constrained"] - costs["SP"]) / 1e6
# ECIU in % of stochastic solution
eciu_pp = eciu / (costs["SP"] / 1e6) * 100

print(
    f"ECIU: {round(eciu, 3)} Meuro \nwhich is {round(eciu_pp)}% of stochastic solution's costs"
)

### The Expected Value of Perfect Information (EVPI)

If the system planner knew at the first stage which scenario would play out, they could optimise the expansion plan for that scenario and achieve a lower cost.

The expected value of such solutions (and the corresponding mathematical problem) is known in the literature as the "wait-and-see" (WS) solution (or problem).

The difference between the here-and-now (stochastic) solution and the probability-weighted wait-and-see solutions represents the value of perfect information about the future (i.e. the expected cost saving).

*Modelling perspective*: By how much could the expected costs be reduced if the system planner knew in the first stage exactly which scenario would happen?

*Economic perspective*: An upper bound on the amount that should be paid for improved forecasts.
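
As a minimal illustration of the definition, the following toy sketch computes the EVPI for a single-period system with one unit of demand, in which the planner decides which technologies to build (one unit each). It is not the PyPSA model. The gas prices, probabilities and gas efficiency are taken from this example; the capital costs and the lignite marginal cost are hypothetical.

```python
gas_prices = {"low": 40, "med": 70, "high": 100}
probability = {"low": 0.4, "med": 0.3, "high": 0.3}
gas_efficiency = 0.6
capital_cost = {"gas": 10.0, "lignite": 12.0}  # hypothetical
lignite_marginal_cost = 100.0  # hypothetical

# feasible first-stage choices: which technologies to build
portfolios = [("gas",), ("lignite",), ("gas", "lignite")]


def marginal_cost(tech, gas_price):
    return gas_price / gas_efficiency if tech == "gas" else lignite_marginal_cost


def total_cost(portfolio, gas_price):
    """Capital cost plus the cost of serving one unit of demand from the cheapest built unit."""
    capex = sum(capital_cost[tech] for tech in portfolio)
    opex = min(marginal_cost(tech, gas_price) for tech in portfolio)
    return capex + opex


# wait-and-see: pick the best portfolio separately for each scenario, then average
ws = sum(
    probability[s] * min(total_cost(p, gas_prices[s]) for p in portfolios)
    for s in gas_prices
)

# here-and-now (stochastic): pick one portfolio that is best on average
sp = min(
    sum(probability[s] * total_cost(p, gas_prices[s]) for s in gas_prices)
    for p in portfolios
)

evpi = sp - ws
print(round(ws, 3), round(sp, 3), round(evpi, 3))
```

The only difference between the two quantities is the order of the minimisation and the expectation, which is why the EVPI can never be negative.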

In [None]:
portfolios = pd.DataFrame()
costs = pd.Series(dtype=float)

### Solve Wait-and-See problems
where Wait-and-See (WS) is the standard textbook name for an individual deterministic problem (i.e. running a single scenario).

In [None]:
for scenario in scenarios:
    gas_price = gas_prices[scenario]
    n = prepare_network(cts, gas_price)

    n.optimize(solver_name=solver_name)

    portfolios[scenario] = n.generators.p_nom_opt
    costs[scenario] = n.objective

### compute the expected value of wait-and-see scenario costs

In [None]:
ws = sum(costs * pd.Series(probability))

### solve stochastic problem

In [None]:
scenario = "SP"  # SP for Stochastic Problem
gas_price = gas_prices[base_scenario]

n = prepare_network(cts, gas_price)
prepare_stochastic_model(n)

n.optimize.solve_model(solver_name=solver_name)

portfolios[scenario] = n.generators.p_nom_opt
costs[scenario] = n.objective

### Compute EVPI

In [None]:
# EVPI in M euro
# must be >= 0 because better information cannot make the decision maker worse off
evpi = (costs["SP"] - ws) / 1e6
# EVPI in % of stochastic solution
evpi_pp = evpi / (costs["SP"] / 1e6) * 100

print(
    f"EVPI: {round(evpi, 3)} Meuro \nwhich is {round(evpi_pp)}% of stochastic solution's costs"
)

### Comparing the ECIU and EVPI metrics

ECIU: an investment decision is made while uncertainty is **ignored**.
The ECIU is **the additional expected cost of assuming that the future is certain**.

EVPI: an investment decision is made after uncertainty is **removed**.
The EVPI is **the expected cost of being uncertain about the future**.
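
The two metrics can be placed on a single scale. In the sketch below the symbols are shorthand introduced here, not names from the code: $f(x, s)$ is the total cost of capacities $x$ under scenario $s$, $\pi_s$ is the scenario probability, $x^{\text{naive}}$ is the solution of the naive problem, $\mathrm{WS}$ is the expected wait-and-see cost, $\mathrm{SP}$ is the cost of the stochastic solution, and $\mathrm{EEV}$ is the expected cost of running the naive capacities under uncertainty (the "SP-constrained" case above).

$$
\begin{aligned}
\mathrm{WS} &= \sum_s \pi_s \min_x f(x, s), \qquad
\mathrm{SP} = \min_x \sum_s \pi_s f(x, s), \qquad
\mathrm{EEV} = \sum_s \pi_s f(x^{\text{naive}}, s) \\
\mathrm{EVPI} &= \mathrm{SP} - \mathrm{WS} \ge 0, \qquad
\mathrm{ECIU} = \mathrm{EEV} - \mathrm{SP} \ge 0 \\
\mathrm{WS} &\le \mathrm{SP} \le \mathrm{EEV}
\end{aligned}
$$

The first inequality holds because choosing $x$ separately for each scenario can only lower the cost; the second holds because $x^{\text{naive}}$ is one feasible, but generally suboptimal, choice for the stochastic problem.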