<img src="https://dsiag.ch/images/dsi_rgb.png" alt="dsi logo" width="100" style="position: absolute; right: 0px;"/>

# Electricity production plants - Descriptive Statistics





The data comes from https://opendata.swiss/en/dataset/elektrizitatsproduktionsanlagen. We use the following CSV files:

- ElectricityProductionPlant.csv 
- MainCategoryCatalogue.csv
- SubCategoryCatalogue.csv
- PlantCategoryCatalogue.csv


![Data Overview](http://yuml.me/64ed9f84.png)

### Loading data

We use `pd.read_csv` to read the csv files into a `DataFrame`. 

After reading, we set the index of each table to its ID column (`xtf_id` or `Catalogue_id`), which makes it easier to join the tables.
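
As a minimal, self-contained sketch of the same pattern, the snippet below reads a tiny inline CSV instead of the real file. The two rows are a hypothetical excerpt, and the `index_col` and `parse_dates` arguments are optional additions that are not used in this notebook: `index_col` sets the index while reading, and `parse_dates` converts the date column to a datetime type.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for ElectricityProductionPlant.csv
csv_text = """xtf_id,Municipality,Canton,BeginningOfOperation,TotalPower
5646,Fionnay,VS,1958-03-07,1872000.0
5686,Naters,VS,1969-09-01,349576.0
"""

# index_col sets the index during reading (equivalent to a later .set_index);
# parse_dates turns the listed column into datetime64 values
sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col="xtf_id",
    parse_dates=["BeginningOfOperation"],
)

print(sample.index.name)
print(sample.dtypes)
```

Whether to set the index during or after reading is a matter of taste; both give the same table.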

In [3]:
import pandas as pd

In [4]:
epp = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/ElectricityProductionPlant.csv').set_index('xtf_id')
mainCat = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/MainCategoryCatalogue.csv').set_index('Catalogue_id')
subCat = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/SubCategoryCatalogue.csv').set_index('Catalogue_id')
plantCat = pd.read_csv('../data/ch.bfe.elektrizitaetsproduktionsanlagen/PlantCategoryCatalogue.csv').set_index('Catalogue_id')


[display(d.head()) for d in [epp, mainCat, subCat,  plantCat]]

| xtf_id | Address | PostCode | Municipality | Canton | BeginningOfOperation | InitialPower | TotalPower | MainCategory | SubCategory | PlantCategory | _x | _y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5646 | Rue des Creusets 41 | 1948 | Fionnay | VS | 1958-03-07 | 1872000.0 | 1872000.0 | maincat_1 | subcat_1 | plantcat_6 | 2589880.0 | 1097661.0 |
| 5686 | Binenweg 5 | 3904 | Naters | VS | 1969-09-01 | 349576.0 | 349576.0 | maincat_1 | subcat_1 | plantcat_7 | 2644115.0 | 1131390.0 |
| 5726 | Robbia 504G | 7741 | San Carlo | GR | 1910-11-03 | 29150.0 | 29150.0 | maincat_1 | subcat_1 | plantcat_2 | 2801863.0 | 1136379.0 |
| 5727 | Via Principale 16 | 7744 | Campocologno | GR | 1907-03-01 | 55000.0 | 55000.0 | maincat_1 | subcat_1 | plantcat_7 | 2808646.0 | 1123676.0 |
| 5730 | Büdemli 65B | 7240 | Küblis | GR | 1922-01-01 | 44200.0 | 44200.0 | maincat_1 | subcat_1 | plantcat_7 | 2778481.0 | 1198505.0 |


| Catalogue_id | de | fr | it | en |
|---|---|---|---|---|
| maincat_1 | Wasserkraft | Énergie hydraulique | Forza idrica | Hydroelectric power |
| maincat_2 | Übrige erneuerbare Energien | Autres énergies renouvelables | Altre energie rinnovabili | Other renewable energies |
| maincat_3 | Kernenergie | Énergie nucléaire | Energia nucleare | Nuclear energy |
| maincat_4 | Fossile Energieträger | Agents énergétiques fossiles | Vettori energetici fossili | Fossil fuel |


| Catalogue_id | de | fr | it | en |
|---|---|---|---|---|
| subcat_1 | Wasserkraft | Énergie hydraulique | Forza idrica | Hydroelectric power |
| subcat_2 | Photovoltaik | Photovoltaïque | Energia fotovoltaica | Photovoltaic |
| subcat_3 | Windenergie | Énergie éolienne | Energia eolica | Wind energy |
| subcat_4 | Biomasse | Biomasse | Biomassa | Biomass |
| subcat_5 | Geothermie | Géothermie | Geotermia | Geothermal energy |


| Catalogue_id | de | fr | it | en |
|---|---|---|---|---|
| plantcat_1 | Abwasserkraftwerk | Centrale sur les eaux usées | Centrale ad acqua di scarico | Wastewater power plant |
| plantcat_2 | Ausleitkraftwerk | Centrale de dérivation | Centrale a derivazione | Diversion power plant |
| plantcat_3 | Dotierwasserkraftwerk | Centrale de dotation | Centrale ad acqua di dotazione | Weir plant |
| plantcat_4 | Durchlaufkraftwerk | Centrale au fil de l’eau | Centrale ad acque di deflusso | Continuous power plant |
| plantcat_5 | Trinkwasserkraftwerk | Centrale sur l’eau potable | Centrale ad acqua potabile | Drinking water power plant |


[None, None, None, None]

***

<div class="alert alert-block alert-success">
<b>Exercise: Mode, mean and median</b> 

1. Mode: Which municipality has the most electricity power plants? Select all power plants with this municipality.
2. Mean: What is the mean total power of all power plants in Switzerland?
3. Median: What is the median total power of all power plants in Switzerland? Can you explain why it is so different from the mean?
4. Optional: Calculate the mean and median total power for each main category. Which category shows the largest difference between the two?
    

</div>

***
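
As a hint for the exercise above, here is a minimal sketch of the relevant pandas calls on a small made-up table. The column names match `epp`, but the values are invented for illustration and say nothing about the real data. It also shows why the mean and median can diverge: a single very large plant pulls the mean up while the median barely moves.

```python
import pandas as pd

# Invented toy data; only the column names are taken from the real table
toy = pd.DataFrame({
    "Municipality": ["A", "A", "B", "C", "A"],
    "MainCategory": ["maincat_2", "maincat_2", "maincat_1", "maincat_2", "maincat_1"],
    "TotalPower": [10.0, 20.0, 5000.0, 30.0, 40.0],
})

# Mode: the most frequent value; .mode() returns a Series, so take the first entry
most_common = toy["Municipality"].mode()[0]
plants_there = toy[toy["Municipality"] == most_common]

# Mean and median of the same column
mean_power = toy["TotalPower"].mean()
median_power = toy["TotalPower"].median()

# Mean and median per main category, plus their difference
per_category = toy.groupby("MainCategory")["TotalPower"].agg(["mean", "median"])
per_category["difference"] = per_category["mean"] - per_category["median"]

print(most_common, mean_power, median_power)
print(per_category)
```

In this toy table the one plant with 5000.0 raises the mean to 1020.0, while the median stays at 30.0.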


### Merge Sub Tables

To make the categories easier to read, we join the category names from the catalogue tables into the main power plant table.

In [5]:
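# Language column of the catalogues to use for the names ('de', 'fr', 'it' or 'en')
lang = 'de'

# Left joins keep every plant, even if its category id is missing from a catalogue
epp = epp.merge(mainCat[lang].rename("MainCategoryName"), how='left', left_on='MainCategory', right_index=True)
epp = epp.merge(subCat[lang].rename("SubCategoryName"), how='left', left_on='SubCategory', right_index=True)
epp = epp.merge(plantCat[lang].rename("PlantCategoryName"), how='left', left_on='PlantCategory', right_index=True)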
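
To illustrate what such a left join does, here is a small sketch on invented data. The id `maincat_9` is hypothetical and deliberately absent from the toy catalogue, so its name comes back as `NaN`. The `validate="many_to_one"` argument is an optional safeguard that is not used in this notebook; it makes pandas raise an error if a catalogue id appears more than once.

```python
import pandas as pd

# Invented plants table; maincat_9 is a hypothetical id with no catalogue entry
plants = pd.DataFrame(
    {"MainCategory": ["maincat_1", "maincat_2", "maincat_9"]},
    index=pd.Index([1, 2, 3], name="xtf_id"),
)

# Toy catalogue indexed by Catalogue_id, like mainCat
catalogue = pd.DataFrame(
    {
        "de": ["Wasserkraft", "Übrige erneuerbare Energien"],
        "en": ["Hydroelectric power", "Other renewable energies"],
    },
    index=pd.Index(["maincat_1", "maincat_2"], name="Catalogue_id"),
)

merged = plants.merge(
    catalogue["en"].rename("MainCategoryName"),  # named Series becomes one new column
    how="left",                 # keep every plant row
    left_on="MainCategory",     # key column in the plants table
    right_index=True,           # key is the index of the catalogue
    validate="many_to_one",     # optional: fail if catalogue ids are duplicated
)

# Rows whose category id was not found in the catalogue
unmatched = merged[merged["MainCategoryName"].isna()]
print(merged)
print(unmatched)
```

Checking for unmatched rows after the merge is a cheap way to spot ids that are missing from a catalogue.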


***

<div class="alert alert-block alert-success">
<b>Exercise: Count absolute frequencies for categories</b> 

1. Count the absolute frequencies for main categories. Which category has the most power plants?
2. Count the absolute frequencies for main and sub categories. Which category/sub-category has the most power plants?
3. (Optional) Can you merge the frequencies with the mean values of the previous optional exercise for each category/sub-category in one table?

**Notes**: 
- *use [.groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) to group by one or more categories* 
- *use [.count](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.count.html) to count the frequencies* 
- *use [pd.merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) to join two Series into a DataFrame*
</div>

***


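In [6]:
# Absolute frequency: number of plants per main category (non-missing TotalPower values)
a = epp.groupby('MainCategory')['TotalPower'].count()

In [7]:
# Mean total power per main category
b = epp.groupby('MainCategory')['TotalPower'].mean()

In [8]:
# Join both Series on their shared index; since both are named TotalPower,
# pandas appends the suffixes _x (count) and _y (mean)
pd.merge(a, b, left_index=True, right_index=True)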

| MainCategory | TotalPower_x | TotalPower_y |
|---|---|---|
| maincat_1 | 1453 | 10737.59318 |
| maincat_2 | 111797 | 29.95186 |
| maincat_3 | 4 | 753650.0 |
| maincat_4 | 193 | 1475.295337 |
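
The `_x` and `_y` suffixes in this result are not very descriptive. As an alternative sketch, not the approach used above, named aggregation computes both statistics in a single `groupby` and lets us choose the column names. The data and the names `plants` and `mean_power` are invented for illustration.

```python
import pandas as pd

# Invented toy data, including one missing power value
toy = pd.DataFrame({
    "MainCategory": ["maincat_1", "maincat_1", "maincat_2", "maincat_2", "maincat_2"],
    "TotalPower": [100.0, 300.0, 10.0, 20.0, None],
})

# Named aggregation: keyword = output column name, value = aggregation function
summary = toy.groupby("MainCategory")["TotalPower"].agg(
    plants="count",      # counts non-missing values only
    mean_power="mean",   # missing values are skipped
)

print(summary)
```

Note that `count` ignores missing values, so a category with missing `TotalPower` entries reports fewer plants than it has rows; `size` would count every row instead.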
