<img src="http://openenergy-platform.org/static/OEP_logo_2_no_text.svg" alt="OpenEnergy Platform" height="100" width="100"  align="left"/>

# OpenEnergyPlatform
<br><br>

## Normalizing / Denormalizing

This tutorial explains how to transform any number of columns of a table into rows (normalizing) and how to reverse the operation (denormalizing).
Please report bugs and suggest improvements here: https://github.com/OpenEnergyPlatform/examples/issues

In [1]:
__copyright__ = "Reiner Lemoine Institut"
__license__   = "GNU Affero General Public License Version 3 (AGPL-3.0)"
__url__       = "https://github.com/openego/data_processing/blob/master/LICENSE"
__author__    = "oakca, Ludee"

In [2]:
import pandas as pd

In [3]:
# open df.xlsx
with pd.ExcelFile('df.xlsx') as xls:

    # read Sheet1 of df.xlsx into df
    df = xls.parse('Sheet1')

# show df
df

Unnamed: 0,Site,Process,inst-cap,cap-lo,cap-up,inv-cost,fix-cost,var-cost,wacc,depreciation
0,Mid,Coal plant,0,0,0,600000,0,0.6,0.07,40
1,Mid,Lignite plant,0,0,60000,600000,0,0.6,0.07,40
2,Mid,Gas plant,0,0,80000,450000,0,1.6,0.07,30
3,Mid,Biomass plant,0,0,5000,875000,0,1.4,0.07,25
4,Mid,Wind plant,0,0,13000,1500000,0,0.0,0.07,25
5,Mid,Solar plant,0,0,160000,600000,0,0.0,0.07,25
6,Mid,Hydro plant,0,0,1400,1600000,0,0.0,0.07,50
7,South,Coal plant,0,0,100000,600000,0,0.6,0.07,40
8,South,Lignite plant,0,0,0,600000,0,0.6,0.07,40
9,South,Gas plant,0,0,100000,450000,0,1.6,0.07,30
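
If `df.xlsx` is not at hand, the steps below can still be followed with a small table built in memory. The following is a minimal sketch that copies the first three rows of the output shown above; it is a stand-in for the Excel file, not a replacement for the full data set.

```python
import pandas as pd

# Stand-in for df.xlsx: the first three rows of the table shown above.
df = pd.DataFrame({
    'Site': ['Mid', 'Mid', 'Mid'],
    'Process': ['Coal plant', 'Lignite plant', 'Gas plant'],
    'inst-cap': [0, 0, 0],
    'cap-lo': [0, 0, 0],
    'cap-up': [0, 60000, 80000],
    'inv-cost': [600000, 600000, 450000],
    'fix-cost': [0, 0, 0],
    'var-cost': [0.6, 0.6, 1.6],
    'wacc': [0.07, 0.07, 0.07],
    'depreciation': [40, 40, 30],
})

# show df
print(df)
```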


## Normalizing

In [4]:
# Normalize df with melt: the columns 'Site' and 'Process' are kept as identifiers and all
# other columns are turned into rows. Their names end up in the column 'variable' and their
# values in the column 'value'. We also add an empty column 'unit' (filled below), sort the
# rows by 'Site' and 'Process', and reset the index so that the old index is discarded.
norm = df.melt(['Site', 'Process']).assign(unit='').sort_values(['Site','Process']).reset_index(drop=True)

# assign values for unit
unit = {'inst-cap': 'MW', 'cap-lo': 'MW', 'cap-up': 'MW',
        'inv-cost': '€/MW', 'fix-cost': '€/MW/a', 'var-cost': '€/MWh',
        'wacc': None, 'depreciation': 'a'}

# include units in table
norm['unit'] = norm['variable'].map(unit)

# show normalized df
norm

Unnamed: 0,Site,Process,variable,value,unit
0,Mid,Biomass plant,inst-cap,0.00,MW
1,Mid,Biomass plant,cap-lo,0.00,MW
2,Mid,Biomass plant,cap-up,5000.00,MW
3,Mid,Biomass plant,inv-cost,875000.00,€/MW
4,Mid,Biomass plant,fix-cost,0.00,€/MW/a
5,Mid,Biomass plant,var-cost,1.40,€/MWh
6,Mid,Biomass plant,wacc,0.07,
7,Mid,Biomass plant,depreciation,25.00,a
8,Mid,Coal plant,inst-cap,0.00,MW
9,Mid,Coal plant,cap-lo,0.00,MW


## Denormalizing

In [5]:
# Denormalize norm with pivot_table: this transforms the table back into its original wide form
# (apart from the column order, see the note below)
denorm = norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable').reset_index()

# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1) 

# show denormalized df
denorm

# note: compared with df, the columns of denorm are in alphabetical order, the rows are sorted
# by 'Site' and 'Process', and all values are stored as floats

Unnamed: 0,Site,Process,cap-lo,cap-up,depreciation,fix-cost,inst-cap,inv-cost,var-cost,wacc
0,Mid,Biomass plant,0.0,5000.0,25.0,0.0,0.0,875000.0,1.4,0.07
1,Mid,Coal plant,0.0,0.0,40.0,0.0,0.0,600000.0,0.6,0.07
2,Mid,Gas plant,0.0,80000.0,30.0,0.0,0.0,450000.0,1.6,0.07
3,Mid,Hydro plant,0.0,1400.0,50.0,0.0,0.0,1600000.0,0.0,0.07
4,Mid,Lignite plant,0.0,60000.0,40.0,0.0,0.0,600000.0,0.6,0.07
5,Mid,Solar plant,0.0,160000.0,25.0,0.0,0.0,600000.0,0.0,0.07
6,Mid,Wind plant,0.0,13000.0,25.0,0.0,0.0,1500000.0,0.0,0.07
7,North,Biomass plant,0.0,6000.0,25.0,0.0,0.0,875000.0,1.4,0.07
8,North,Coal plant,0.0,100000.0,40.0,0.0,0.0,600000.0,0.6,0.07
9,North,Gas plant,0.0,100000.0,30.0,0.0,0.0,450000.0,1.6,0.07
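
Whether the round trip really reproduces the original table can be checked programmatically. The sketch below uses a small made-up table and `pandas.testing.assert_frame_equal`; the original is first brought into the same row and column order as the denormalized result, and dtype differences are ignored because the round trip turns integers into floats.

```python
import pandas as pd

# Hypothetical original table, deliberately not sorted by Process.
df = pd.DataFrame({
    'Site': ['Mid', 'Mid'],
    'Process': ['Gas plant', 'Coal plant'],
    'cap-up': [80000, 0],
    'var-cost': [1.6, 0.6],
})

# Normalize and denormalize as in the cells above.
norm = df.melt(['Site', 'Process'])
denorm = (norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable')
              .reset_index()
              .rename_axis(None, axis=1))

# Bring the original into the same row and column order before comparing.
expected = df.sort_values(['Site', 'Process']).reset_index(drop=True)[denorm.columns]

# Raises an AssertionError if the values differ; dtypes are not compared.
pd.testing.assert_frame_equal(denorm, expected,
                              check_dtype=False, check_column_type=False)
print('round trip OK')
```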


In [6]:
# If all values of a variable are NaN, pivot_table omits the corresponding column from the
# denormalized table (dropna=True is the default). In this case, another option is to use unstack():

# denormalizing the norm with unstack
denorm = norm.set_index(['Site', 'Process', 'variable'])['value'].unstack().reset_index()

# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1) 

# show denormalized df
denorm

Unnamed: 0,Site,Process,cap-lo,cap-up,depreciation,fix-cost,inst-cap,inv-cost,var-cost,wacc
0,Mid,Biomass plant,0.0,5000.0,25.0,0.0,0.0,875000.0,1.4,0.07
1,Mid,Coal plant,0.0,0.0,40.0,0.0,0.0,600000.0,0.6,0.07
2,Mid,Gas plant,0.0,80000.0,30.0,0.0,0.0,450000.0,1.6,0.07
3,Mid,Hydro plant,0.0,1400.0,50.0,0.0,0.0,1600000.0,0.0,0.07
4,Mid,Lignite plant,0.0,60000.0,40.0,0.0,0.0,600000.0,0.6,0.07
5,Mid,Solar plant,0.0,160000.0,25.0,0.0,0.0,600000.0,0.0,0.07
6,Mid,Wind plant,0.0,13000.0,25.0,0.0,0.0,1500000.0,0.0,0.07
7,North,Biomass plant,0.0,6000.0,25.0,0.0,0.0,875000.0,1.4,0.07
8,North,Coal plant,0.0,100000.0,40.0,0.0,0.0,600000.0,0.6,0.07
9,North,Gas plant,0.0,100000.0,30.0,0.0,0.0,450000.0,1.6,0.07
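
The difference between the two approaches only shows up when a variable has no valid values at all. The following sketch uses a small made-up long table in which `wacc` is NaN everywhere and compares the resulting columns.

```python
import numpy as np
import pandas as pd

# Hypothetical long table: 'wacc' has no valid value for any process.
norm = pd.DataFrame({
    'Site': ['Mid', 'Mid', 'Mid', 'Mid'],
    'Process': ['Coal plant', 'Coal plant', 'Gas plant', 'Gas plant'],
    'variable': ['cap-up', 'wacc', 'cap-up', 'wacc'],
    'value': [0.0, np.nan, 80000.0, np.nan],
})

# pivot_table drops the all-NaN column ...
with_pivot = (norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable')
                  .reset_index())

# ... while unstack keeps it, filled with NaN.
with_unstack = (norm.set_index(['Site', 'Process', 'variable'])['value']
                    .unstack()
                    .reset_index())

print(list(with_pivot.columns))    # ['Site', 'Process', 'cap-up']
print(list(with_unstack.columns))  # ['Site', 'Process', 'cap-up', 'wacc']
```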


## Please check your final table; depending on your data, the result might not always be as expected.
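
One case worth checking is duplicated entries in the normalized table. The sketch below uses made-up data in which the same combination of `Site`, `Process` and `variable` occurs twice: `pivot_table` then silently averages the values (its default `aggfunc` is the mean), whereas `unstack` raises a `ValueError`. Looking for duplicates with `duplicated` before denormalizing is one possible safeguard, not the only one.

```python
import pandas as pd

# Hypothetical long table with a duplicated ('Mid', 'Coal plant', 'cap-up') entry.
norm = pd.DataFrame({
    'Site': ['Mid', 'Mid', 'Mid'],
    'Process': ['Coal plant', 'Coal plant', 'Coal plant'],
    'variable': ['cap-up', 'cap-up', 'wacc'],
    'value': [100.0, 300.0, 0.07],
})

keys = ['Site', 'Process', 'variable']

# List all rows whose key combination occurs more than once.
duplicates = norm[norm.duplicated(subset=keys, keep=False)]
print(duplicates)

# pivot_table averages the duplicated values without any warning.
pivoted = norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable')
cap_up = pivoted.loc[('Mid', 'Coal plant'), 'cap-up']
print(cap_up)  # 200.0

# unstack refuses to reshape an index with duplicate entries.
try:
    norm.set_index(keys)['value'].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True
print(unstack_failed)  # True
```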