# Working with pandas in pandapower

All data in pandapower is stored in pandas dataframes, which is why all functionalities that are implemented in pandas (http://pandas.pydata.org/) can be used with pandapower. pandas is a very powerful and widely used tool for data analysis.

In the following, we present just a few of the many ways in which built-in pandas functions can make your life easier when working with pandapower.

<div class="alert alert-warning">
This tutorial refers to pandas functionalities and how to use them with pandapower. If you have not worked with pandas before, this should give you a good overview of the relevant functionality. If you are already familiar with pandas, it might still refresh your memory, but maybe not provide new information.
</div>

We use the mv_oberrhein example from the pandapower networks package for this tutorial.

In [110]:
#Import the pandapower and the networks module:
import pandapower as pp
import pandapower.networks as nw
import pandas as pd

#Import an example network:
net = nw.mv_oberrhein()

## Accessing and Setting Parameters

You can access all parameters of one element with the .loc indexer:

In [111]:
#Detailed information about line 4
net.line.loc[4]

name                                  Line 4
std_type        NA2XS2Y 1x185 RM/25 12/20 kV
from_bus                                 144
to_bus                                   141
length_km                           0.381968
r_ohm_per_km                           0.161
x_ohm_per_km                           0.117
c_nf_per_km                              273
imax_ka                                0.362
df                                         1
type                                      cs
in_service                                 1
parallel                                   1
Name: 4, dtype: object
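Note that .loc selects by index label, not by position. The following minimal sketch uses a small hand-made table as a stand-in for net.line (the values are made up for illustration) and contrasts it with the position-based .iloc:

```python
import pandas as pd

# Hypothetical stand-in for net.line: there is no line 6,
# so index labels and row positions do not coincide.
line = pd.DataFrame(
    {"name": ["Line 4", "Line 5", "Line 7"], "length_km": [0.4, 3.3, 0.9]},
    index=[4, 5, 7],
)

by_label = line.loc[7]      # row whose index label is 7
by_position = line.iloc[2]  # third row, whatever its label is

print(by_label["name"], by_position["name"])
```

Since pandapower elements are referenced by their index, .loc is usually the indexer you want.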

Or one parameter of multiple elements:

In [112]:
#Length information about line 5 and 6
net.line.length_km.loc[[5,6]]

5    3.3028
6    0.3504
Name: length_km, dtype: float64

The values are returned as a pandas series. To get them as a numpy array instead, use the values attribute:

In [113]:
#Length information about line 5 and 6 as numpy array
net.line.length_km.loc[[5,6]].values

array([ 3.3028,  0.3504])
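Recent pandas versions also provide the to_numpy method, which the pandas documentation recommends over the values attribute. A small sketch with a hand-made series in place of net.line.length_km:

```python
import numpy as np
import pandas as pd

# Hand-made series with the two line lengths shown above
length_km = pd.Series([3.3028, 0.3504], index=[5, 6], name="length_km")

# to_numpy() returns the data as a numpy array, just like .values
as_array = length_km.to_numpy()
print(type(as_array), as_array)
```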

Multiple parameters of multiple elements are returned as a pandas dataframe:

In [115]:
#Length, resistance, reactance and capacitance of line 5 and 6
net.line.loc[[5,6], ["length_km", "r_ohm_per_km", "x_ohm_per_km", "c_nf_per_km"]]

|   | length_km | r_ohm_per_km | x_ohm_per_km | c_nf_per_km |
|---|---|---|---|---|
| 5 | 3.3028 | 0.161 | 0.117 | 273.0 |
| 6 | 0.3504 | 0.161 | 0.117 | 273.0 |


For a dataframe, the values attribute yields a two-dimensional array:

In [116]:
#Length, resistance, reactance and capacitance of line 5 and 6 as an array
net.line.loc[[5,6], ["length_km", "r_ohm_per_km", "x_ohm_per_km", "c_nf_per_km"]].values

array([[  3.30280000e+00,   1.61000000e-01,   1.17000000e-01,
          2.73000000e+02],
       [  3.50400000e-01,   1.61000000e-01,   1.17000000e-01,
          2.73000000e+02]])

To access exactly one parameter, .at can be used instead of .loc:

In [117]:
net.line.length_km.at[6]

0.35039999999999999

The .at accessor can only be used for exactly one value, but it is significantly faster than .loc. That is why you should use .at whenever you access a single value.
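The speed difference can be measured with the timeit module from the standard library. The sketch below uses a hand-made table instead of net.line; the absolute times depend on your machine and your pandas version, so treat the numbers as an indication only:

```python
import timeit

import pandas as pd

# Hypothetical stand-in for net.line with 200 rows
line = pd.DataFrame({"length_km": [0.1 * i for i in range(200)]})

# Time 2000 single-value lookups with each accessor
t_at = timeit.timeit(lambda: line.at[6, "length_km"], number=2000)
t_loc = timeit.timeit(lambda: line.loc[6, "length_km"], number=2000)

print("at: %.4f s, loc: %.4f s" % (t_at, t_loc))
```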

Setting parameters works the same way:

In [118]:
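#Set single parameter with .at
#(index the dataframe itself: an assignment through net.line.length_km.at[5]
# is chained indexing and may not write back in recent pandas versions)
net.line.at[5, "length_km"] = 3.2

#Set multiple parameters with .loc
net.line.loc[[4,6], "length_km"] = [1.8, 2.2]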

#Output
print("Line lengths after parameter setting:")
net.line.length_km.loc[[4,5,6]]

Line lengths after parameter setting:


4    1.8
5    3.2
6    2.2
Name: length_km, dtype: float64

## Statistical Evaluations

You can easily find the maximum, minimum and mean values of a column with pandas, e.g.:

In [119]:
print("Maximum Line Length: %.2f km"%net.line.length_km.max())
print("Minimum Line Length: %.2f km"%net.line.length_km.min())
print("Mean Line Length: %.2f km"%net.line.length_km.mean())

Maximum Line Length: 3.20 km
Minimum Line Length: 0.07 km
Mean Line Length: 0.62 km
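If you also want to know which element holds the extreme value, idxmax and idxmin return the corresponding index. A minimal sketch with a hand-made table that reuses the three line lengths set above (it is a stand-in for net.line, not the real network):

```python
import pandas as pd

# Hand-made stand-in for net.line with three lines
line = pd.DataFrame(
    {"name": ["Line 4", "Line 5", "Line 6"], "length_km": [1.8, 3.2, 2.2]},
    index=[4, 5, 6],
)

# Index of the longest line, usable with .at or .loc afterwards
longest = line.length_km.idxmax()
print("Longest line: %s (%s)" % (longest, line.at[longest, "name"]))

# Several statistics in one call
summary = line.length_km.agg(["min", "max", "mean"])
print(summary)
```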


Of course you can also combine these, for example to calculate weighted means:

In [72]:
weighted_mean_r = (net.line.length_km * net.line.r_ohm_per_km).sum() / net.line.length_km.sum()
print("Weighted Mean Resistance: %.2f Ohm per kilometer"%weighted_mean_r)

Weighted Mean Resistance: 0.16 Ohm per kilometer
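The same weighted mean can be computed with numpy.average, which accepts the weights directly. The following sketch checks both variants against each other on a two-line table with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up example: 1 km with 0.2 Ohm/km and 3 km with 0.1 Ohm/km
line = pd.DataFrame({"length_km": [1.0, 3.0], "r_ohm_per_km": [0.2, 0.1]})

# Manual formula as used above
manual = (line.length_km * line.r_ohm_per_km).sum() / line.length_km.sum()

# Equivalent numpy call with the line lengths as weights
with_numpy = np.average(line.r_ohm_per_km, weights=line.length_km)

print("%.3f %.3f" % (manual, with_numpy))
```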


## Iterating over Elements

If you want to iterate over a dataframe, use the iterrows function. It gives you the index of each row and the row as a pandas series.

In [73]:
for lidx, linerow in net.line.loc[[4,5,6]].iterrows():
        print("the line with index %s is %f kilometers long and has a resistance of %f ohm per kilometers" %(lidx, linerow.length_km, linerow.r_ohm_per_km))

the line with index 4 is 1.800000 kilometers long and has a resistance of 0.161000 ohm per kilometers
the line with index 5 is 3.200000 kilometers long and has a resistance of 0.161000 ohm per kilometers
the line with index 6 is 2.200000 kilometers long and has a resistance of 0.161000 ohm per kilometers
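For larger tables, itertuples is an alternative worth knowing: it yields lightweight named tuples instead of building a series for every row, which is typically faster. A minimal sketch on a hand-made table standing in for net.line:

```python
import pandas as pd

# Hand-made stand-in for net.line
line = pd.DataFrame(
    {"length_km": [1.8, 3.2, 2.2], "r_ohm_per_km": [0.161, 0.161, 0.161]},
    index=[4, 5, 6],
)

messages = []
for row in line.itertuples():
    # row.Index holds the dataframe index, the columns are attributes
    messages.append(
        "line %s: %.1f km, %.3f ohm per km"
        % (row.Index, row.length_km, row.r_ohm_per_km)
    )

print("\n".join(messages))
```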


You can also group elements that share the same value in any column of the dataframe with the groupby function of pandas:

In [74]:
for std_type, linetable in net.line.groupby(net.line.std_type):
        print("there are %u lines with standard type %s with an overall length of %f kilometers"%(len(linetable), std_type, linetable.length_km.sum()))

there are 10 lines with standard type 243-AL1/39-ST1A 20.0 with an overall length of 10.761400 kilometers
there are 160 lines with standard type NA2XS2Y 1x185 RM/25 12/20 kV with an overall length of 96.806343 kilometers
there are 11 lines with standard type NA2XS2Y 1x240 RM/25 12/20 kV with an overall length of 4.343042 kilometers


The first variable is now the value of the column you grouped by (here line standard type) and the second variable is a pandas dataframe of all lines that have this value (here all lines with the standard type defined in the first variable).

You can also directly sum up certain values of a groupby, for example

In [75]:
#Select the column before summing so that only length_km is aggregated
net.line.groupby(net.line.std_type).length_km.sum()

std_type
243-AL1/39-ST1A 20.0            10.761400
NA2XS2Y 1x185 RM/25 12/20 kV    96.806343
NA2XS2Y 1x240 RM/25 12/20 kV     4.343042
Name: length_km, dtype: float64

gives you a pandas series with the total line length grouped by standard type.
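Several aggregations per group can be collected in one table with agg. The sketch below uses a hand-made table with made-up standard type names and lengths; the result column names n_lines, total_km and mean_km are chosen freely for this example:

```python
import pandas as pd

# Hand-made stand-in for net.line with two standard types
line = pd.DataFrame(
    {
        "std_type": ["cable", "cable", "overhead"],
        "length_km": [1.0, 2.5, 4.0],
    }
)

# Named aggregation: result column = (source column, aggregation function)
stats = line.groupby("std_type").agg(
    n_lines=("length_km", "size"),
    total_km=("length_km", "sum"),
    mean_km=("length_km", "mean"),
)
print(stats)
```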

## Querying Dataframes

You will often need to select elements with specific characteristics. You can do that with boolean masks:

In [76]:
mask = net.line.length_km > 2.5

The mask variable is now a boolean series that indicates for every line whether it is longer than 2500 meters. We can now select only the lines that are longer than 2500 meters and output their name, length and standard type:

In [77]:
long_lines = net.line[mask]

#Output
long_lines[["name", "length_km", "std_type"]]

|   | name | length_km | std_type |
|---|---|---|---|
| 5 | Line 5 | 3.2 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 45 | Line 45 | 2.611111 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 59 | Line 59 | 3.15539 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 91 | Line 91 | 2.5282 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 162 | Line 162 | 2.5953 | 243-AL1/39-ST1A 20.0 |


or, directly in one step:

In [78]:
net.line[net.line.length_km > 2.5][["name", "length_km", "std_type"]]


|   | name | length_km | std_type |
|---|---|---|---|
| 5 | Line 5 | 3.2 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 45 | Line 45 | 2.611111 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 59 | Line 59 | 3.15539 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 91 | Line 91 | 2.5282 | NA2XS2Y 1x185 RM/25 12/20 kV |
| 162 | Line 162 | 2.5953 | 243-AL1/39-ST1A 20.0 |



You can of course also check for exact values, for example if you want all 110 kV buses:

In [81]:
net.bus[net.bus.vn_kv == 110]

|   | name | vn_kv | type | in_service | zone |
|---|---|---|---|---|---|
| 38 | bus_250 | 110.0 | n | 1.0 |  |
| 177 |  | 110.0 | b | 1.0 |  |


For string queries, use the pandas string functions, for example str.startswith to find loads whose name starts with "MV":

In [82]:
net.load[net.load.name.str.startswith("MV")]

|   | name | bus | p_kw | q_kvar | sn_kva | scaling | in_service | type |
|---|---|---|---|---|---|---|---|---|


or, with str.contains, to find all loads that contain "Load 0" in their name:

In [83]:
net.load[net.load.name.str.contains("Load 0")]

|   | name | bus | p_kw | q_kvar | sn_kva | scaling | in_service | type |
|---|---|---|---|---|---|---|---|---|
| 0 | Load 0 | 63 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
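Two details of str.contains are worth keeping in mind: the pattern is interpreted as a regular expression by default, and elements without a name produce a missing value in the mask, which cannot be used for selection. The sketch below shows the na and regex arguments on a hand-made bus table in which one bus has no name (the table is an illustration, not the real net.bus):

```python
import pandas as pd

# Hand-made stand-in for net.bus; the second bus has no name
bus = pd.DataFrame(
    {"name": ["bus_250", None, "station_1"], "vn_kv": [110.0, 110.0, 20.0]},
    index=[38, 177, 5],
)

# na=False treats missing names as "no match",
# regex=False searches for the literal text instead of a pattern
mask = bus.name.str.contains("bus_", na=False, regex=False)
named_buses = bus[mask]
print(named_buses)
```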


The isin function allows you to check whether the values of a column are contained in a list, for example:

In [84]:
lines = net.line[net.line.from_bus.isin([39,72])]
lines[["name", "from_bus", "to_bus"]]

|   | name | from_bus | to_bus |
|---|---|---|---|
| 147 | Line 147 | 39 | 40 |
| 172 | Line 172 | 72 | 93 |


returns the names of all lines that are connected to one of the buses 39 or 72 at the from bus.

You can also combine certain queries with the logical | (or) and & (and) for more complicated queries, for example:

In [85]:
lines = net.line[(net.line.from_bus.isin([39,72])) | (net.line.to_bus.isin([39,72]))]
lines[["name", "from_bus", "to_bus"]]

|   | name | from_bus | to_bus |
|---|---|---|---|
| 147 | Line 147 | 39 | 40 |
| 149 | Line 149 | 49 | 39 |
| 172 | Line 172 | 72 | 93 |
| 175 | Line 175 | 50 | 72 |


returns the names of all lines that are connected to one of the buses 39 or 72 at the from bus or to bus.
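Masks can also be stored in variables, combined with & and inverted with ~. The sketch below uses a hand-made line table: the bus numbers of lines 147 to 175 follow the output above, whereas line 180 and all lengths are made up for illustration. Note the parentheses around each comparison, which are required because & and | bind more strongly than > or ==:

```python
import pandas as pd

# Hand-made stand-in for net.line
line = pd.DataFrame(
    {
        "from_bus": [39, 49, 72, 50, 10],
        "to_bus": [40, 39, 93, 72, 11],
        "length_km": [0.5, 3.0, 2.8, 0.2, 4.0],
    },
    index=[147, 149, 172, 175, 180],
)

# Lines connected to bus 39 or 72 at either end
touches = line.from_bus.isin([39, 72]) | line.to_bus.isin([39, 72])

# and: connected to one of the buses AND longer than 2.5 km
long_and_touching = line[touches & (line.length_km > 2.5)]

# not: longer than 2.5 km but NOT connected to one of the buses
long_not_touching = line[~touches & (line.length_km > 2.5)]

print(long_and_touching.index.tolist(), long_not_touching.index.tolist())
```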

Please be aware that a boolean query always returns a copy of the dataframe. Let's say you want to set all medium voltage buses out of service and try this:

In [86]:
net.bus[net.bus.vn_kv < 25].in_service = False

You will notice that your pandapower network table did not change:

In [87]:
net.bus.in_service.head()

0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
Name: in_service, dtype: float64

This is because the query returns a copy, and you only change the value in this copy, just as in the following code:

In [88]:
bus_table = net.bus[net.bus.vn_kv < 25]
bus_table.in_service = False

#Output
print(bus_table.in_service.head())
# but still:
print(net.bus.in_service.head())

0    False
1    False
2    False
3    False
4    False
Name: in_service, dtype: bool
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
Name: in_service, dtype: float64


Here bus_table is a copy of a part of the net.bus table. In the example above, you changed that copy, but not the original table. That is why you need to save the index of the copied bus table and then change the values in the original table:

In [89]:
#Get index and store in ns_nodes
ns_nodes = net.bus[net.bus.vn_kv < 25].index

#Change all values where index is in ns_nodes
net.bus.loc[ns_nodes, "in_service"] = False

#Output
print(net.bus.in_service.head())

0    False
1    False
2    False
3    False
4    False
Name: in_service, dtype: object
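The intermediate index can also be skipped by passing the boolean mask directly to .loc together with the column name, which writes to the original table in a single indexing step. A minimal sketch on a hand-made bus table (the values are chosen for illustration):

```python
import pandas as pd

# Hand-made stand-in for net.bus: two 20 kV buses and one 110 kV bus
bus = pd.DataFrame(
    {"vn_kv": [20.0, 20.0, 110.0], "in_service": [True, True, True]}
)

# Row selection (mask) and column selection in ONE .loc call,
# so the assignment reaches the original dataframe
bus.loc[bus.vn_kv < 25, "in_service"] = False

print(bus.in_service.tolist())
```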


## Extending and Customizing the Framework

One of the benefits of pandapower is its data structure, which is easy to customize and extend. For pandapower to be able to run a load flow, the standard parameters of the data structure have to be defined. Beyond that, you can extend the pandapower data structure any way you like.

Say you have a network that is partitioned into two zones, A and B:

In [101]:
net.bus.loc[:int(len(net.bus.index)/2), "zone"] = "A"
net.bus.loc[int(len(net.bus.index)/2):, "zone"] = "B"

The goal is to apply a load scaling factor of 0.8 in Zone A and of 0.6 in Zone B:

In [102]:
for i, load in net.load.iterrows():
    #Look up the zone of the bus the load is connected to
    if net.bus.zone.at[load.bus] == "A":
        net.load.at[i, "scaling"] = 0.8
    elif net.bus.zone.at[load.bus] == "B":
        net.load.at[i, "scaling"] = 0.6
net.load.head(n=5)

|   | name | bus | p_kw | q_kvar | sn_kva | scaling | in_service | type |
|---|---|---|---|---|---|---|---|---|
| 0 | Load 0 | 63 | 250.0 | 50.764665 |  | 0.8 | 1.0 | MV/LV Station |
| 1 | Load 1 | 109 | 630.0 | 127.926956 |  | 0.6 | 1.0 | MV/LV Station |
| 2 | Load 2 | 119 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
| 3 | Load 3 | 14 | 250.0 | 50.764665 |  | 0.8 | 1.0 | MV/LV Station |
| 4 | Load 4 | 46 | 250.0 | 50.764665 |  | 0.8 | 1.0 | MV/LV Station |


However, you could also extend the load table to include the zone of each load like this:

In [103]:
net.load["scaling"] = 1.0 #reset
net.load["zone"] = net.bus.zone.loc[net.load.bus.values].values
net.load.head(n=5)

|   | name | bus | p_kw | q_kvar | sn_kva | scaling | in_service | type | zone |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Load 0 | 63 | 250.0 | 50.764665 |  | 1.0 | 1.0 | MV/LV Station | A |
| 1 | Load 1 | 109 | 630.0 | 127.926956 |  | 1.0 | 1.0 | MV/LV Station | B |
| 2 | Load 2 | 119 | 250.0 | 50.764665 |  | 1.0 | 1.0 | MV/LV Station | B |
| 3 | Load 3 | 14 | 250.0 | 50.764665 |  | 1.0 | 1.0 | MV/LV Station | A |
| 4 | Load 4 | 46 | 250.0 | 50.764665 |  | 1.0 | 1.0 | MV/LV Station | A |


and then set the scaling factor for all loads of a zone at once:

In [104]:
net.load.loc[net.load.zone == "A", "scaling"] = 0.8
net.load.loc[net.load.zone == "B", "scaling"] = 0.6
net.load.head(n=5)

|   | name | bus | p_kw | q_kvar | sn_kva | scaling | in_service | type | zone |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Load 0 | 63 | 250.0 | 50.764665 |  | 0.8 | 1.0 | MV/LV Station | A |
| 1 | Load 1 | 109 | 630.0 | 127.926956 |  | 0.6 | 1.0 | MV/LV Station | B |
| 2 | Load 2 | 119 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station | B |
| 3 | Load 3 | 14 | 250.0 | 50.764665 |  | 0.8 | 1.0 | MV/LV Station | A |
| 4 | Load 4 | 46 | 250.0 | 50.764665 |  | 0.8 | 1.0 | MV/LV Station | A |


Of course for this simple example, the first solution would also be feasible. But the more complicated your code gets, the more important it is to customize the pandapower framework, so your code is more efficient as well as easier to read and debug.
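With more zones, one assignment per zone becomes repetitive. One possible alternative is the map function of pandas, which looks up every value of a series in another series or in a dictionary. The sketch below uses hand-made bus and load tables; the dictionary zone_scaling is a name introduced for this example:

```python
import pandas as pd

# Hand-made stand-ins for net.bus and net.load
bus = pd.DataFrame({"zone": ["A", "A", "B"]}, index=[0, 1, 2])
load = pd.DataFrame({"bus": [2, 0, 1], "scaling": [1.0, 1.0, 1.0]})

# Scaling factor per zone (illustrative values from the example above)
zone_scaling = {"A": 0.8, "B": 0.6}

# Look up the zone of each load bus, then the scaling of each zone
load["zone"] = load.bus.map(bus.zone)
load["scaling"] = load.zone.map(zone_scaling)

print(load)
```

Zones that are missing from the dictionary would result in NaN scaling values, so the dictionary should cover all zones in use.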

## Merging Dataframes

Let's say you want to see the voltage at each bus together with the bus name. The voltage is stored in net.res_bus, whereas the name is stored in net.bus. The two values are in different tables, but they have the same index, which is why you can easily merge them into a new shared dataframe:

In [105]:
net = nw.mv_oberrhein()
pp.runpp(net) #Run a load flow calculation to get result tables

#Store bus name with bus voltage result of loadflow calculation in bus_results
#axis=1 places the series side by side as columns (axis=0 would stack them along the index)
bus_results = pd.concat([net.bus.name, net.res_bus.vm_pu], axis=1)

#Output
bus_results.head(n=8)

|   | name | vm_pu |
|---|---|---|
| 0 | bus_200 | 0.990963 |
| 1 | bus_204 | 0.990837 |
| 2 | bus_203 | 0.991809 |
| 3 | bus_202 | 0.990077 |
| 4 | bus_201 | 0.991523 |
| 5 | bus_207 | 0.997197 |
| 6 | bus_206 | 1.005408 |
| 7 | bus_205 | 0.997228 |


Now bus_results shows you the name and the voltage of each bus in one dataframe.
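pd.concat with axis=1 matches the rows by index label, not by their order in the input. The following sketch demonstrates this with two hand-made series that hold the first three buses from the output above, with the voltages deliberately stored in a different order:

```python
import pandas as pd

# Names in index order 0, 1, 2
names = pd.Series(["bus_200", "bus_204", "bus_203"], index=[0, 1, 2], name="name")

# Voltages stored in a different order: index 2, 0, 1
vm_pu = pd.Series([0.991809, 0.990963, 0.990837], index=[2, 0, 1], name="vm_pu")

# Rows are aligned on the index, so each bus gets its own voltage
bus_results = pd.concat([names, vm_pu], axis=1)
print(bus_results.loc[[0, 1, 2]])
```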

If you want to see the bus voltage of all loads, you can also merge the bus result and the load dataframes. They do not share an index; instead, the index of the bus is given in the "bus" column of the load table. In that case you can merge the dataframes like this (for more information about the arguments, see the [pandas merging documentation][pandas_merg]):

[pandas_merg]: http://pandas.pydata.org/pandas-docs/stable/merging.html

In [106]:
load_bus_results = pd.merge(net.res_bus, net.load, left_index=True, right_on="bus")

#Output
load_bus_results.head(n=8)

|   | vm_pu | va_degree | p_kw_x | q_kvar_x | name | bus | p_kw_y | q_kvar_y | sn_kva | scaling | in_service | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 67 | 0.990963 | -7.111844 | 240.0 | 48.734079 | Load 67 | 0 | 400.0 | 81.223464 |  | 0.6 | 1.0 | MV/LV Station |
| 17 | 0.990837 | -7.116278 | 150.0 | 30.458799 | Load 17 | 1 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
| 128 | 0.991809 | -7.084888 | 150.0 | 30.458799 | Load 128 | 2 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
| 73 | 0.990077 | -7.140966 | 150.0 | 30.458799 | Load 73 | 3 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
| 53 | 0.991523 | -7.093822 | 150.0 | 30.458799 | Load 53 | 4 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
| 61 | 0.997197 | -6.63761 | 150.0 | 30.458799 | Load 61 | 5 | 250.0 | 50.764665 |  | 0.6 | 1.0 | MV/LV Station |
| 142 | 1.005408 | -5.764742 | 300.0 | 60.917598 | Load 142 | 6 | 500.0 | 101.52933 |  | 0.6 | 1.0 | MV Load |
| 101 | 0.990384 | -7.130613 | 240.0 | 48.734079 | Load 101 | 8 | 400.0 | 81.223464 |  | 0.6 | 1.0 | MV/LV Station |


Here net.res_bus is the left and net.load is the right dataframe. To merge the dataframes into one, the index of the left dataframe (left_index=True) and the bus column of the right dataframe (right_on="bus") are used. The resulting dataframe shows you the load table with the bus results of the respective load buses.
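Columns that exist in both tables (here p_kw and q_kvar) receive the default suffixes _x and _y. The suffixes argument of pd.merge lets you choose more descriptive ones. A minimal sketch with hand-made tables standing in for net.res_bus and net.load; the values and the suffix names _res and _load are chosen for illustration:

```python
import pandas as pd

# Hand-made stand-in for net.res_bus (index = bus index)
res_bus = pd.DataFrame(
    {"vm_pu": [0.99, 1.00, 1.01], "p_kw": [240.0, 0.0, 150.0]},
    index=[0, 1, 2],
)

# Hand-made stand-in for net.load; no load is connected to bus 1
load = pd.DataFrame(
    {"name": ["Load 67", "Load 17"], "bus": [0, 2], "p_kw": [400.0, 250.0]},
    index=[67, 17],
)

merged = pd.merge(
    res_bus, load,
    left_index=True, right_on="bus",
    suffixes=("_res", "_load"),  # replaces the default _x / _y
)
print(merged)
```

Because the default merge is an inner join, bus 1 without a load does not appear in the result.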

If you want to know the maximum voltage at a bus that has a load connected to it, you could use:

In [107]:
max_load_voltage = pd.merge(net.res_bus, net.load, left_index=True, right_on="bus").vm_pu.max()

#Output
max_load_voltage

1.0106900295057959

or alternatively with a boolean mask:

In [108]:
max_load_voltage = net.res_bus[net.bus.index.isin(net.load.bus.values)].vm_pu.max()

#Output
max_load_voltage

1.0106900295057959

In this simple case, the boolean mask solution is about 5 times faster than merging the dataframes, so only use merging for more complicated operations or if runtime is not important (e.g. debugging).