# 04-Functions
DataFrames and Series come with powerful functions. We look at the most important ones below. Sometimes, however, **Pandas** lacks a function. In those cases **Numpy** usually helps. Pandas is built on top of Numpy, which is why most **Numpy** functions work well with Pandas DataFrames and Series.

In [27]:
# Besides pandas, we also import Numpy
import pandas as pd
import numpy as np

In [28]:
# Load the dataset
df_pokemon = pd.read_csv('../src/pokemon.csv')
df_pokemon.head()

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


# IF (WENN)
IF (`WENN` in the German version) is a popular function in <font color='green'>**Excel**</font>. The same is possible with Pandas and Numpy, using `np.where()`. Below, we check a condition on one column and, depending on the result, create a new column.

In [29]:
# We already know how to select particularly strong Pokemon (here: HP > 100)
df_pokemon[df_pokemon['HP'] > 100].head(10)

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
8,CharizardMega Charizard Y,Fire,Flying,634,101,104,78,159,115,100,1,False
44,Jigglypuff,Normal,Fairy,270,115,45,20,45,25,20,1,False
45,Wigglytuff,Normal,Fairy,435,140,70,45,85,50,45,1,False
96,Muk,Poison,,500,105,105,75,65,100,50,1,False
120,Rhydon,Ground,Rock,485,105,130,120,45,45,40,1,False
121,Chansey,Normal,,450,250,5,5,35,105,50,1,False
123,Kangaskhan,Normal,,490,105,95,80,40,80,90,1,False
124,KangaskhanMega Kangaskhan,Normal,,590,105,125,100,60,100,100,1,False
142,Lapras,Water,Ice,535,130,85,80,85,95,60,1,False
145,Vaporeon,Water,,525,130,65,60,110,95,65,1,False


In [30]:
# Now we want to record in a new column, Description, whether a Pokemon
# belongs to that group.
df_pokemon['Description'] = np.where(df_pokemon['HP'] > 100, 'Super Strong', '')
df_pokemon.sample(20)

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Description
798,HoopaHoopa Unbound,Psychic,Dark,680,80,160,60,170,130,80,6,True,
432,Turtwig,Grass,,318,55,68,64,45,55,31,4,False,
512,Weavile,Dark,Ice,510,70,120,65,45,85,125,4,False,
494,GarchompMega Garchomp,Dragon,Ground,700,108,170,115,120,95,92,4,False,Super Strong
179,Ledyba,Bug,Flying,265,40,20,30,40,80,55,2,False,
646,Deerling,Normal,Grass,335,60,60,50,40,50,75,5,False,
610,Basculin,Water,,460,70,92,65,80,55,98,5,False,
94,Dewgong,Water,Ice,475,90,70,80,70,95,70,1,False,
286,Zigzagoon,Normal,,240,38,30,41,30,41,60,3,False,
595,Tympole,Water,,294,50,50,40,50,40,64,5,False,


In [31]:
# np.where takes the same kind of condition as filtering: a Series of True/False values.
# Several conditions can therefore be combined with &, exactly as in a filter.
df_pokemon['Description2'] = np.where((df_pokemon['HP'] > 100) & (df_pokemon['Attack'] > 100), 'Super Strong', '')
df_pokemon.sample(20)

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Description,Description2
203,Skiploom,Grass,Flying,340,55,45,50,45,65,80,2,False,,
404,Relicanth,Water,Rock,485,100,90,130,45,65,55,3,False,,
237,Magcargo,Fire,Rock,410,50,50,120,80,80,30,2,False,,
669,Lampent,Ghost,Fire,370,60,40,60,95,60,55,5,False,,
569,Liepard,Dark,,446,64,88,50,88,50,106,5,False,,
462,Combee,Bug,Flying,244,30,30,42,30,42,70,4,False,,
688,Rufflet,Normal,Flying,350,70,83,50,37,50,60,5,False,,
63,Growlithe,Fire,,350,55,70,45,70,50,60,1,False,,
137,PinsirMega Pinsir,Bug,Flying,600,65,155,120,65,90,105,1,False,,
103,Onix,Rock,Ground,385,35,45,160,30,45,70,1,False,,
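
`np.where()` covers a single IF with two outcomes. For several graded outcomes (a nested IF in Excel), `np.select()` is one option. The following is a minimal sketch, not part of the original notebook: the small DataFrame is a stand-in built from rows shown above, and the column name `Description3` and the label `High HP` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for df_pokemon (values copied from rows shown above)
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Jigglypuff", "Muk", "Chansey"],
    "HP": [45, 115, 105, 250],
    "Attack": [49, 45, 105, 5],
})

# Conditions are evaluated in order; the first one that is True wins
conditions = [
    (df["HP"] > 100) & (df["Attack"] > 100),
    df["HP"] > 100,
]
labels = ["Super Strong", "High HP"]  # hypothetical labels

# default is used for rows where no condition matches
df["Description3"] = np.select(conditions, labels, default="")
print(df)
```

Because the first matching condition wins, the stricter condition has to come first in the list.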


# Mathematical functions (aggregating)
Mathematical functions and statistical measures are essential when working with DataFrames. The following values are very easy to compute for a Series.

* Count
* Sum
* Mean
* Median
* Min
* Max

In [32]:
len(df_pokemon)

800

In [33]:
df_pokemon['Total'].sum()

348082

In [34]:
df_pokemon['Total'].sum() / len(df_pokemon)

435.1025

In [35]:
df_pokemon['Total'].mean()

435.1025

In [36]:
df_pokemon['Total'].median()

450.0

In [37]:
df_pokemon['Total'].min()

180

In [38]:
df_pokemon['Total'].max()

780
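
The individual calls above can also be bundled with `.agg()`. A minimal sketch, assuming only the first five `Total` values from the table at the top. It also shows one caveat about counting: `len()` counts rows, while `.count()` skips missing values.

```python
import pandas as pd

# First five "Total" values from the dataset preview above
total = pd.Series([318, 405, 525, 625, 309], name="Total")

# Several aggregations in one call; the result is a Series indexed by function name
summary = total.agg(["count", "sum", "mean", "median", "min", "max"])
print(summary)

# len() counts all rows, .count() only the non-missing values
type2 = pd.Series(["Poison", "Poison", "Poison", "Poison", None], name="Type 2")
print(len(type2), type2.count())
```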

## Summarizing all columns with `.describe()`

In [39]:
# Aggregating all columns at once -> the result is a Series
df_pokemon.max()

Name            Zygarde50% Forme
Type 1                     Water
Total                        780
HP                           255
Attack                       190
Defense                      230
Sp. Atk                      194
Sp. Def                      230
Speed                        180
Generation                     6
Legendary                   True
Description         Super Strong
Description2        Super Strong
dtype: object

In [40]:
# Or all statistical measures for all numeric columns at once
df_pokemon.describe()

Unnamed: 0,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,435.1025,69.2875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,119.96304,25.557461,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,180.0,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,330.0,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,450.0,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,515.0,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,780.0,255.0,190.0,230.0,194.0,230.0,180.0,6.0
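
Two variations that may be useful here, shown as a sketch on a small stand-in DataFrame (rows copied from the preview above). Depending on the pandas version, an aggregation such as `.max()` over text columns that contain missing values may raise an error instead of skipping the column. Passing `numeric_only=True` restricts the aggregation to numeric columns. `describe(include="all")` additionally summarizes the text columns.

```python
import pandas as pd

# Stand-in for df_pokemon with one text column that contains a missing value
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur", "Charmander"],
    "Type 2": ["Poison", "Poison", "Poison", "Poison", None],
    "Total": [318, 405, 525, 625, 309],
    "HP": [45, 60, 80, 80, 39],
})

# Aggregate only the numeric columns
numeric_max = df.max(numeric_only=True)
print(numeric_max)

# Summary including text columns (count, unique, top, freq)
full_summary = df.describe(include="all")
print(full_summary)
```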


# Mathematical functions combined with filters

In [41]:
# Goal: the sum of "Total" for Pokemon of type Grass. First, we filter for the type
df_pokemon[df_pokemon['Type 1'] == 'Grass'].head()

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Description,Description2
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,,
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,,
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,,
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,,
48,Oddish,Grass,Poison,320,45,50,55,75,65,30,1,False,,


In [42]:
df_pokemon[df_pokemon['Type 1'] == 'Grass'].sum()

Name            BulbasaurIvysaurVenusaurVenusaurMega VenusaurO...
Type 1          GrassGrassGrassGrassGrassGrassGrassGrassGrassG...
Total                                                       29480
HP                                                           4709
Attack                                                       5125
Defense                                                      4956
Sp. Atk                                                      5425
Sp. Def                                                      4930
Speed                                                        4335
Generation                                                    235
Legendary                                                       3
Description                              Super StrongSuper Strong
Description2                                                     
dtype: object

In [43]:
df_pokemon[df_pokemon['Type 1'] == 'Grass']['Total'].sum()

29480

In [44]:
df_pokemon[(df_pokemon['Type 1'] == 'Grass') & (df_pokemon['Type 2'] == 'Poison')]

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Description,Description2
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,,
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,,
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,,
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,,
48,Oddish,Grass,Poison,320,45,50,55,75,65,30,1,False,,
49,Gloom,Grass,Poison,395,60,65,70,85,75,40,1,False,,
50,Vileplume,Grass,Poison,490,75,80,85,110,90,50,1,False,,
75,Bellsprout,Grass,Poison,300,50,75,35,70,30,40,1,False,,
76,Weepinbell,Grass,Poison,390,65,90,50,85,45,55,1,False,,
77,Victreebel,Grass,Poison,490,80,105,65,100,70,70,1,False,,


In [45]:
df_pokemon[(df_pokemon['Type 1'] == 'Grass') & (df_pokemon['Type 2'] == 'Poison')]['Total'].sum()

6211
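
The filter-then-sum pattern above can also be written with named conditions and `.loc`, which selects rows and column in a single step. A minimal sketch on a stand-in DataFrame (rows copied from the outputs above); the variable names are chosen for illustration only.

```python
import pandas as pd

# Stand-in for df_pokemon
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Oddish", "Turtwig"],
    "Type 1": ["Grass", "Grass", "Fire", "Grass", "Grass"],
    "Type 2": ["Poison", "Poison", None, "Poison", None],
    "Total": [318, 405, 309, 320, 318],
})

# Name the conditions so they can be reused and combined
is_grass = df["Type 1"] == "Grass"
is_poison = df["Type 2"] == "Poison"

# .loc[rows, column] filters and selects the column in one step
grass_total = df.loc[is_grass, "Total"].sum()
grass_poison_total = df.loc[is_grass & is_poison, "Total"].sum()
print(grass_total, grass_poison_total)
```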

# Combining column values

In [46]:
# We can simply add the values of two columns and get a new Series
df_pokemon['HP'] + df_pokemon['Attack']

0       94
1      122
2      162
3      180
4       91
5      122
6      162
7      208
8      205
9       92
10     122
11     162
12     182
13      75
14      70
15     105
16      75
17      70
18     155
19     215
20      85
21     123
22     163
23     163
24      86
25     136
26     100
27     155
28      95
29     145
      ... 
770    160
771    170
772    125
773    100
774     95
775    143
776    190
777    137
778    113
779    195
780    115
781    110
782    120
783    125
784    155
785    140
786    170
787    185
788    124
789    212
790     70
791    155
792    257
793    257
794    208
795    150
796    210
797    190
798    240
799    190
Length: 800, dtype: int64

In [47]:
df_pokemon['Total 2'] = df_pokemon['HP'] + df_pokemon['Attack']
df_pokemon.sample(10)

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Description,Description2,Total 2
506,Finneon,Water,,330,49,49,56,49,61,66,4,False,,,98
175,Sentret,Normal,,215,35,46,34,35,45,20,2,False,,,81
77,Victreebel,Grass,Poison,490,80,105,65,100,70,70,1,False,,,185
714,KeldeoResolute Forme,Water,Fighting,580,91,72,90,129,90,108,5,False,,,163
156,Articuno,Ice,Flying,580,90,85,100,95,125,85,1,True,,,175
548,Manaphy,Water,,600,100,100,100,100,100,100,4,False,,,200
59,Psyduck,Water,,320,50,52,48,65,50,55,1,False,,,102
446,Kricketot,Bug,,194,37,25,41,25,41,25,4,False,,,62
481,Chingling,Psychic,,285,45,30,50,65,50,45,4,False,,,75
527,GalladeMega Gallade,Psychic,Fighting,618,68,165,95,65,115,110,4,False,,,233
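
When more than two columns are involved, chaining `+` becomes unwieldy. One alternative is to select the columns and sum across each row with `.sum(axis=1)`. The sketch below uses three rows from the dataset preview. For these rows, `Total` equals the sum of the six stat columns, which makes a convenient check. The column name `Total check` is made up for illustration.

```python
import pandas as pd

# Three rows copied from the dataset preview
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "HP": [45, 60, 39],
    "Attack": [49, 62, 52],
    "Defense": [49, 63, 43],
    "Sp. Atk": [65, 80, 60],
    "Sp. Def": [65, 80, 50],
    "Speed": [45, 60, 65],
    "Total": [318, 405, 309],
})

stat_cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]

# axis=1 sums across the columns of each row
df["Total check"] = df[stat_cols].sum(axis=1)
print(df[["Name", "Total", "Total check"]])
```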


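# Rounding
The `.round()` method is the counterpart of `=ROUND()` (`=RUNDEN()` in the German version) in <font color='green'>**Excel**</font>. One difference: values exactly halfway between two candidates are rounded to the nearest even digit, whereas Excel rounds them away from zero.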

In [48]:
(df_pokemon['Total 2']/df_pokemon['Total']).head(10)

0    0.295597
1    0.301235
2    0.308571
3    0.288000
4    0.294498
5    0.301235
6    0.303371
7    0.328076
8    0.323344
9    0.292994
dtype: float64

In [49]:
# We rewrite the example above and store the result in a new column
df_pokemon['Anteil T2'] = df_pokemon['Total 2']/df_pokemon['Total']
df_pokemon['Anteil T2'].head(10)

0    0.295597
1    0.301235
2    0.308571
3    0.288000
4    0.294498
5    0.301235
6    0.303371
7    0.328076
8    0.323344
9    0.292994
Name: Anteil T2, dtype: float64

In [50]:
# Everything in one cell, with the result rounded to 2 decimal places
df_pokemon['Anteil T2'] = df_pokemon['Total 2']/df_pokemon['Total']
df_pokemon['Anteil T2'] = df_pokemon['Anteil T2'].round(2)
df_pokemon.head(10)

Unnamed: 0,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Description,Description2,Total 2,Anteil T2
0,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,,,94,0.3
1,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,,,122,0.3
2,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,,,162,0.31
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,,,180,0.29
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,,,91,0.29
5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False,,,122,0.3
6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False,,,162,0.3
7,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False,,,208,0.33
8,CharizardMega Charizard Y,Fire,Flying,634,101,104,78,159,115,100,1,False,Super Strong,Super Strong,205,0.32
9,Squirtle,Water,,314,44,48,65,50,64,43,1,False,,,92,0.29
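
A short sketch of how `.round()` behaves, using a few values from the outputs above plus some exact halves. Exact halves go to the nearest even number, which can differ from what Excel's rounding returns for the same input.

```python
import pandas as pd

# Two ratios from the output above, rounded to 2 decimal places
ratio = pd.Series([0.295597, 0.308571])
rounded_ratio = ratio.round(2)
print(rounded_ratio.tolist())

# Exact halves are rounded to the nearest even number
halves = pd.Series([0.5, 1.5, 2.5])
rounded_halves = halves.round()
print(rounded_halves.tolist())  # [0.0, 2.0, 2.0]
```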


# Tips and tricks
### Overviews for Series:

* ...of all simple data types a Series can hold: https://docs.scipy.org/doc/numpy/user/basics.types.html
* ...of all functions of a Series: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html