# Using the statistics module in PyPSA

The `statistics` module is used to easily extract information from your networks. This is useful when inspecting your solved networks and creating first visualizations of your results.

With the `statistics` module, you can look at different metrics of your network. The following metrics are implemented:

- Capital expenditure
- Operational expenditure
- Installed capacities
- Optimal capacities
- Supply
- Withdrawal
- Curtailment
- Capacity Factor
- Revenue
- Market value
- Energy balance

Now let's look at an example.

In [1]:
import pypsa
import pandas as pd
import matplotlib.pyplot as plt

First, we open an example network we want to investigate.

In [2]:
n = pypsa.examples.scigrid_de()

INFO:pypsa.io:Imported network scigrid-de.nc has buses, generators, lines, loads, storage_units, transformers


Let's get an overview of all statistics by calling:

In [3]:
n.statistics()

,carrier,Capacity Factor,Capital Expenditure,Curtailment,Dispatch,Installed Capacity,Market Value,Operational Expenditure,Optimal Capacity,Revenue,Supply,Withdrawal
Generator,Brown Coal,,0.0,0.0,0.0,20879.5,,0.0,0.0,0.0,0.0,0.0
Generator,Gas,,0.0,0.0,0.0,23913.13,,0.0,0.0,0.0,0.0,0.0
Generator,Geothermal,,0.0,0.0,0.0,31.7,,0.0,0.0,0.0,0.0,0.0
Generator,Hard Coal,,0.0,0.0,0.0,25312.6,,0.0,0.0,0.0,0.0,0.0
Generator,Multiple,,0.0,0.0,0.0,152.7,,0.0,0.0,0.0,0.0,0.0
Generator,Nuclear,,0.0,0.0,0.0,12068.0,,0.0,0.0,0.0,0.0,0.0
Generator,Oil,,0.0,0.0,0.0,2710.2,,0.0,0.0,0.0,0.0,0.0
Generator,Other,,0.0,0.0,0.0,3027.8,,0.0,0.0,0.0,0.0,0.0
Generator,Run of River,,0.0,0.0,0.0,3999.1,,0.0,0.0,0.0,0.0,0.0
Generator,Solar,,0.0,0.0,0.0,37041.524779,,0.0,0.0,0.0,0.0,0.0


So far, the `statistics` are not very informative because we have not solved the network yet. We can only see that the network already has some installed capacities for different components.

You can see that `statistics` returns a `pandas.DataFrame` with a MultiIndex. The first index level holds the name of the network component (for example *Generator* or *Line*). The `carrier` index level holds the carrier name of the given component. For example, `n.generators` contains the carriers *Brown Coal*, *Gas* and so on.
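
Because the result is an ordinary pandas object, the usual MultiIndex selection tools apply. The following is a minimal sketch on a hand-built frame, not on real `n.statistics()` output: the values are copied from the Installed Capacity column above, the variable names `stats`, `generators` and `gas` are illustrative, and leaving the first index level unnamed is an assumption.

```python
import pandas as pd

# Hand-built stand-in for a slice of the overview table; the values are
# copied from the "Installed Capacity" column shown above.
index = pd.MultiIndex.from_tuples(
    [("Generator", "Brown Coal"), ("Generator", "Gas"), ("Generator", "Nuclear")],
    names=[None, "carrier"],  # assumption: first level unnamed
)
stats = pd.DataFrame(
    {"Installed Capacity": [20879.5, 23913.13, 12068.0]}, index=index
)

# Select all rows of one component via the first index level.
generators = stats.loc["Generator"]

# Select one carrier across all components via the "carrier" level.
gas = stats.xs("Gas", level="carrier")

print(generators)
print(gas)
```

`generators` is indexed by carrier only, while `gas` keeps the component level, which is handy when the same carrier appears in several components.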

Now let's solve the network.

In [4]:
n.optimize()

Index(['2', '5', '10', '12', '13', '15', '18', '20', '22', '24', '26', '30',
       '32', '37', '42', '46', '52', '56', '61', '68', '69', '74', '78', '86',
       '87', '94', '95', '96', '99', '100', '104', '105', '106', '107', '117',
       '120', '123', '124', '125', '128', '129', '138', '143', '156', '157',
       '159', '160', '165', '184', '191', '195', '201', '220', '231', '232',
       '233', '236', '247', '248', '250', '251', '252', '261', '263', '264',
       '267', '272', '279', '281', '282', '292', '303', '307', '308', '312',
       '315', '317', '322', '332', '334', '336', '338', '351', '353', '360',
       '362', '382', '384', '385', '391', '403', '404', '413', '421', '450',
       '458'],
      dtype='object', name='Transformer')

GLPSOL--GLPK LP/MIP Solver 5.0
Parameter(s) specified in the command line:
 --lp /tmp/linopy-problem-sttdqu70.lp --output /tmp/linopy-solve-d56wbo7n.sol
Reading problem data from '/tmp/linopy-problem-sttdqu70.lp'...
142968 rows, 59640 columns, 261202 non-zeros
761240 lines were read
GLPK Simplex Optimizer 5.0
142968 rows, 59640 columns, 261202 non-zeros
Preprocessing...
22930 rows, 38242 columns, 119766 non-zeros
Scaling...
 A: min|aij| =  1.485e-02  max|aij| =  1.974e+02  ratio =  1.329e+04
GM: min|aij| =  1.854e-01  max|aij| =  5.395e+00  ratio =  2.911e+01
EQ: min|aij| =  3.436e-02  max|aij| =  1.000e+00  ratio =  2.911e+01
Constructing initial basis...
Size of triangular part is 22387
      0: obj =   9.374665196e+09 inf =   9.541e+08 (17831)
   8221: obj =   5.726371382e+07 inf =   5.071e+07 (12594) 58
  14847: obj =   4.757747487e+07 inf =   2.597e+07 (8971) 48
  20411: obj =   4.736806099e+07 inf =   1.599e+07 (6410) 38
  25460: obj =   4.164607034e+07 inf =   9.257e+06 (4472) 3

INFO:linopy.constants: Optimization successful: 
Status: ok
Termination condition: optimal
Solution: 59640 primals, 142968 duals
Objective: 6.68e+06
Solver model: not available
Solver message: optimal

INFO:pypsa.optimization.optimize:The shadow-prices of the constraints Generator-fix-p-lower, Generator-fix-p-upper, Line-fix-s-lower, Line-fix-s-upper, Transformer-fix-s-lower, Transformer-fix-s-upper, StorageUnit-fix-p_dispatch-lower, StorageUnit-fix-p_dispatch-upper, StorageUnit-fix-p_store-lower, StorageUnit-fix-p_store-upper, StorageUnit-fix-state_of_charge-lower, StorageUnit-fix-state_of_charge-upper, Kirchhoff-Voltage-Law, StorageUnit-energy_balance were not assigned to the network.


('ok', 'optimal')

Now we can look at the `statistics` of the solved network.

In [5]:
n.statistics()

,carrier,Capacity Factor,Capital Expenditure,Curtailment,Dispatch,Installed Capacity,Market Value,Operational Expenditure,Optimal Capacity,Revenue,Supply,Withdrawal
Generator,Brown Coal,0.571594,0.0,0.0,286430.4,20879.5,14.151949,2864304.0,20879.5,4053548.0,286430.4,0.0
Generator,Gas,0.001311,0.0,0.0,752.1553,23913.13,50.0,37607.77,23913.13,37607.77,752.1553,0.0
Generator,Geothermal,0.242902,0.0,0.0,184.8,31.7,32.02785,4804.8,31.7,5918.747,184.8,0.0
Generator,Hard Coal,0.100441,0.0,0.0,61017.97,25312.6,25.586093,1525449.0,25312.6,1561211.0,61017.97,0.0
Generator,Multiple,0.0,0.0,0.0,0.0,152.7,,0.0,152.7,0.0,0.0,0.0
Generator,Nuclear,0.689185,0.0,0.0,199609.9,12068.0,16.395014,1596879.0,12068.0,3272607.0,199609.9,0.0
Generator,Oil,0.0,0.0,0.0,0.0,2710.2,,0.0,2710.2,0.0,0.0,0.0
Generator,Other,0.011299,0.0,0.0,821.0863,3027.8,34.036685,26274.76,3027.8,27947.06,821.0863,0.0
Generator,Run of River,0.898289,0.0,0.0,86216.38,3999.1,19.937617,258649.1,3999.1,1718949.0,86216.38,0.0
Generator,Solar,0.050194,0.0,2768.990558,44622.07,37041.524779,12.883098,0.0,37041.524779,574870.5,44622.07,0.0


As you can see, much more information is now available. There are still no capital expenditures in the network, because we only performed an operational optimization with this example network.
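
Several columns of the solved overview can be cross-checked against each other by hand. The sketch below does this for the Brown Coal row, using only numbers copied from the table. It assumes that the example covers 24 snapshots weighted as one hour each, which is not stated in the output above; the variable names are illustrative.

```python
# Values copied from the Brown Coal row of the solved overview above.
supply_mwh = 286430.4
optimal_capacity_mw = 20879.5
revenue = 4053548.0
opex = 2864304.0

# Assumption: 24 snapshots, each weighted as one hour.
hours = 24

# Share of the maximum possible output that was actually supplied.
capacity_factor = supply_mwh / (optimal_capacity_mw * hours)

# Revenue earned per MWh supplied.
market_value = revenue / supply_mwh

# Average operating cost per MWh supplied.
opex_per_mwh = opex / supply_mwh

print(round(capacity_factor, 6))
print(round(market_value, 4))
print(round(opex_per_mwh, 2))
```

Under that assumption the first two results agree with the Capacity Factor and Market Value columns up to the rounding of the displayed inputs.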

If you are interested in a specific metric, e.g. curtailment, you can run:

In [6]:
curtailment = n.statistics.curtailment()
curtailment

             carrier      
Generator    Brown Coal           0.000000
             Gas                  0.000000
             Geothermal           0.000000
             Hard Coal            0.000000
             Multiple             0.000000
             Nuclear              0.000000
             Oil                  0.000000
             Other                0.000000
             Run of River         0.000000
             Solar             2768.990558
             Storage Hydro        0.000000
             Waste                0.000000
             Wind Offshore    20134.309955
             Wind Onshore     46228.020186
StorageUnit  Pumped Hydro         0.000000
dtype: float64

Note that when calling a specific metric, the `statistics` module returns a `pandas.Series`.
To find the unit of the data returned by `statistics`, you can inspect the `attrs` attribute of the `DataFrame` or `Series`.

In [7]:
curtailment.attrs

{'name': 'Curtailment', 'unit': 'MWh'}

So the unit of curtailment is given in `MWh`. You can also customize your request.

For this you have various options:
1. You can select the components for which you want the metric with the `comps` argument. Note that `comps` has to be a list of strings.

In [None]:
n.statistics.curtailment(comps=["Generator"])

2. For metrics that have a time dimension, you can choose the aggregation method or decide not to aggregate at all. Use the `aggregate_time` argument to specify what you want to do.

For example, the mean curtailment per time step is:

In [None]:
n.statistics.curtailment(comps=["Generator"], aggregate_time="mean")

Or retrieve the curtailment time series by skipping the aggregation:

In [None]:
n.statistics.curtailment(comps=["Generator"], aggregate_time=False).iloc[:,:4]
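
As a rough mental model of the three `aggregate_time` choices, the sketch below applies plain pandas reductions to a toy table with carriers as rows and snapshots as columns, the layout returned when the time series is not aggregated. The numbers are invented, the names are illustrative, and the sketch assumes every snapshot has a weighting of one hour; it is not PyPSA's internal implementation.

```python
import pandas as pd

# Toy curtailment time series in MW (invented numbers): one row per
# carrier, one column per snapshot.
snapshots = pd.to_datetime(
    ["2011-01-01 00:00", "2011-01-01 01:00", "2011-01-01 02:00", "2011-01-01 03:00"]
)
curtailment_t = pd.DataFrame(
    [[0.0, 10.0, 30.0, 0.0], [5.0, 5.0, 0.0, 10.0]],
    index=pd.Index(["Solar", "Wind Onshore"], name="carrier"),
    columns=snapshots,
)

# Analogous to aggregate_time="sum": total energy over all snapshots.
total = curtailment_t.sum(axis=1)

# Analogous to aggregate_time="mean": average power per snapshot.
mean = curtailment_t.mean(axis=1)

# Analogous to aggregate_time=False followed by .iloc[:, :2]:
# keep the time series and look at the first two snapshots.
first_two = curtailment_t.iloc[:, :2]

print(total)
print(mean)
print(first_two)
```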

3. You can choose how to group the components of the network and how to aggregate the groups. By default, the components are grouped by their carriers and summed. You can change this by providing different `groupby` and `aggregate_groups` arguments.

In [None]:
n.statistics.curtailment(comps=["Generator"], groupby=["bus"], aggregate_groups="max")

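This returns, for every bus in the network, the largest curtailment of any single generator connected to that bus. Since `aggregate_time` is left at its default, each generator's curtailment is still aggregated over all time steps before the maximum is taken.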
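
The interplay of `groupby` and `aggregate_groups` can be pictured with an ordinary pandas `groupby`. The sketch below uses an invented table of per-generator curtailment totals; the generator names, buses and numbers are hypothetical, and the code only mimics the idea rather than reproducing PyPSA's implementation.

```python
import pandas as pd

# Invented per-generator curtailment totals together with the bus and
# carrier of each generator.
generators = pd.DataFrame(
    {
        "bus": ["1", "1", "2"],
        "carrier": ["Solar", "Wind Onshore", "Solar"],
        "curtailment": [40.0, 20.0, 15.0],
    },
    index=pd.Index(["1 Solar", "1 Wind Onshore", "2 Solar"], name="Generator"),
)

# Default-like behaviour: group by carrier and sum within each group.
by_carrier_sum = generators.groupby("carrier")["curtailment"].sum()

# Like groupby=["bus"], aggregate_groups="max": group by bus and keep
# the largest single-generator value in each group.
by_bus_max = generators.groupby("bus")["curtailment"].max()

print(by_carrier_sum)
print(by_bus_max)
```

Bus `1` hosts two generators, so the maximum picks the larger of the two totals instead of adding them up.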

When inspecting your network, it often helps to visualize the tables. Because the results are pandas objects, you can plot them directly, for example the generation (supply) of the generators:

In [None]:
n.statistics.supply(comps=["Generator"]).div(1e3).plot.bar(title="Generation in GWh")

Or you could plot the generation time series of the generators.

In [None]:
fig, ax = plt.subplots()
n.statistics.supply(comps=["Generator"], aggregate_time=False).div(1e3).T.plot.area(
    title="Generation in GW", ax=ax, legend=False
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)

Finally, we want to look at the energy balance of the network. The energy balance is not included in the overview of the statistics module. To calculate it, you can run:

In [None]:
n.statistics.energy_balance()

Note that there is now an additional index level called bus carrier, because an energy balance is defined for every bus carrier. You can list the bus carriers in your network with `n.buses.carrier.unique()`. This network has only one bus carrier, AC, which corresponds to electricity. Sector-coupled networks can have further bus carriers, for example heat or CO$_2$. For many `statistics` functions you therefore have to be careful about the units of the values: they are not always given correctly by the `attrs` of the `DataFrame` or `Series`.
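
One way to avoid mixing units is to evaluate the balance separately for each bus carrier. The sketch below illustrates this on a hand-built series with invented values. The level names (`component`, `carrier`, `bus_carrier`), the carrier labels and the sign convention (supply positive, withdrawal negative) are assumptions made for the illustration, so check them against the index of your own result.

```python
import pandas as pd

# Invented balance for a hypothetical sector-coupled network with an
# electricity ("AC") and a "heat" bus carrier.
index = pd.MultiIndex.from_tuples(
    [
        ("Generator", "Gas", "AC"),
        ("Load", "electricity demand", "AC"),
        ("Link", "heat pump", "AC"),
        ("Link", "heat pump", "heat"),
        ("Load", "heat demand", "heat"),
    ],
    names=["component", "carrier", "bus_carrier"],  # assumed level names
)
balance = pd.Series([100.0, -80.0, -20.0, 60.0, -60.0], index=index)

# Sum within each bus carrier instead of across the whole series, so
# electricity and heat are never added together.
per_bus_carrier = balance.groupby(level="bus_carrier").sum()

# Look at a single bus carrier on its own.
electricity = balance.xs("AC", level="bus_carrier")

print(per_bus_carrier)
print(electricity)
```

In this toy example, supply and withdrawal cancel within each bus carrier, so both sums are zero.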