### 📖Content

* [Conditional selections](#chapter1)
    * [Multi-conditional selections](#section_1_1)
* [Get information about a DataFrame](#chapter2)
* [Groupby](#chapter3)
* [Apply functions to DataFrames](#chapter4)
* [Sort Data](#chapter5)
* [Import / Export data](#chapter6)

In [None]:
import pandas as pd
import numpy as np

In [None]:
#Create a DataFrame
random_arr = np.random.randn(5,4)
index = ["A", "B", "C", "D", "E"]
columns = ["W", "X","Y", "Z"]
df = pd.DataFrame(data = random_arr, index = index, columns = columns)
df

Unnamed: 0,W,X,Y,Z
A,-1.421026,0.914904,-1.196348,-0.658324
B,-0.142865,0.535455,-1.337502,-0.249793
C,0.356914,1.428533,-0.950927,-1.924251
D,0.128768,-0.088755,-1.031069,2.604485
E,-0.907819,-2.150992,0.025807,1.523691


***
***

### 📖Conditional selections <a class="anchor" id="chapter1"></a>
Just like in NumPy, we can use the standard comparison operators to check, element by element, whether a condition is True or False.

In [None]:
# >, >=, <, <=, !=, ==
df > 0

Unnamed: 0,W,X,Y,Z
A,False,True,False,False
B,False,True,False,False
C,True,True,False,False
D,True,False,False,True
E,False,False,True,True


We can then use this Boolean mask as a filter for the DataFrame. Values that do not meet the condition are replaced with NaN.

In [None]:
df[df>0]

Unnamed: 0,W,X,Y,Z
A,,0.914904,,
B,,0.535455,,
C,0.356914,1.428533,,
D,0.128768,,,2.604485
E,,,0.025807,1.523691


We can also filter the whole DataFrame by a condition on a single column. This keeps or drops entire rows.

In [None]:
# Only keep rows where the value in column W is > 0
df[df['W']>0]

Unnamed: 0,W,X,Y,Z
C,0.356914,1.428533,-0.950927,-1.924251
D,0.128768,-0.088755,-1.031069,2.604485


In [None]:
# Count how many times a condition is True (each True counts as 1)
sum(df['W']>0)

2

### 📖Multi-conditional selections <a class="anchor" id="section_1_1"></a>
We can combine several conditions. For DataFrames we have to use '&' for the logical 'and' and '|' for the logical 'or'. Python's `and` and `or` keywords do not work element-wise.

In [None]:
condition1 = df['X'] < 0
condition2 = df['Z'] > 0
df[ (condition1) | (condition2) ]

Unnamed: 0,W,X,Y,Z
D,0.128768,-0.088755,-1.031069,2.604485
E,-0.907819,-2.150992,0.025807,1.523691
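
As a minimal sketch of the other operators, the example below uses '&' (and) and '~' (not) on a small frame with fixed values, so the result is reproducible. The frame `demo` and its values are made up for illustration. Each comparison is wrapped in its own parentheses because '&' and '|' bind more tightly than '<' and '>'.

```python
import pandas as pd

# Hypothetical frame with fixed values (not the random data from above)
demo = pd.DataFrame(
    {"X": [0.9, -0.1, -2.2], "Z": [-0.7, 2.6, 1.5]},
    index=["A", "D", "E"],
)

# '&' keeps only the rows where both conditions are True
both = demo[(demo["X"] < 0) & (demo["Z"] > 2)]

# '~' inverts a condition (logical 'not')
x_not_negative = demo[~(demo["X"] < 0)]

print(both)
print(x_not_negative)
```

Only row D satisfies both conditions, and only row A has a non-negative X.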


***
***
### 📖Get information about a DataFrame <a class="anchor" id="chapter2"></a>

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, A to E
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   W       5 non-null      float64
 1   X       5 non-null      float64
 2   Y       5 non-null      float64
 3   Z       5 non-null      float64
dtypes: float64(4)
memory usage: 200.0+ bytes


In [None]:
df.dtypes

W    float64
X    float64
Y    float64
Z    float64
dtype: object

In [None]:
df.describe()

Unnamed: 0,W,X,Y,Z
count,5.0,5.0,5.0,5.0
mean,0.214121,-0.0724,0.412636,-0.004536
std,0.490851,0.780763,0.680274,0.640666
min,-0.437695,-1.011218,-0.588768,-0.978522
25%,0.070558,-0.436185,0.115568,-0.232002
50%,0.132031,-0.25411,0.612295,0.07775
75%,0.401312,0.288945,0.720721,0.49028
max,0.904401,1.050568,1.203365,0.619811


In [None]:
df.columns

Index(['W', 'X', 'Y', 'Z'], dtype='object')

In [None]:
df.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

***
***
### 📖Groupby <a class="anchor" id="chapter3"></a>
![Pandas_Groupby](Pandas_Groupby.png)

In [None]:
# Create a new DataFrame
mydata = {
    'Landuse': ['Urban', 'Pasture', 'Forest', 'Forest', 'Urban', 'Pasture'],
    'Area': [1.3, 0.5, 2.3, 4.3, 2, 1.1]
}

df = pd.DataFrame(mydata)
df

Unnamed: 0,Landuse,Area
0,Urban,1.3
1,Pasture,0.5
2,Forest,2.3
3,Forest,4.3
4,Urban,2.0
5,Pasture,1.1


In [None]:
df.groupby('Landuse').sum() #could also be .mean() etc.


Unnamed: 0_level_0,Area
Landuse,Unnamed: 1_level_1
Forest,6.6
Pasture,1.6
Urban,3.3


In [None]:
df.groupby('Landuse').describe()

Unnamed: 0_level_0,Area,Area,Area,Area,Area,Area,Area,Area
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max
Landuse,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
Forest,2.0,3.3,1.414214,2.3,2.8,3.3,3.8,4.3
Pasture,2.0,0.8,0.424264,0.5,0.65,0.8,0.95,1.1
Urban,2.0,1.65,0.494975,1.3,1.475,1.65,1.825,2.0
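
If only a few statistics are needed, `agg` can compute them in a single call. The sketch below reuses the land-use data from above under the assumed name `landuse`. The choice of statistics is only an example.

```python
import pandas as pd

# Same values as the land-use DataFrame above
landuse = pd.DataFrame({
    "Landuse": ["Urban", "Pasture", "Forest", "Forest", "Urban", "Pasture"],
    "Area": [1.3, 0.5, 2.3, 4.3, 2, 1.1],
})

# Split the rows by 'Landuse', then compute several statistics of 'Area' at once
summary = landuse.groupby("Landuse")["Area"].agg(["sum", "mean", "count"])
print(summary)
```

The result has one row per land-use class and one column per statistic.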


***
***
### 📖Apply functions to DataFrames <a class="anchor" id="chapter4"></a>

In [None]:
mydata = {
    'City': ['Bochum', 'Essen', 'Dortmund', 'Duesseldorf', 'Oberhausen'],
    'Population': [364628, 583109, 587010, 619294, 210829],
    'Area_sqm': [145400000, 210300000, 280700000, 217400000, 77040000]
}
df_pop = pd.DataFrame(mydata)
df_pop

Unnamed: 0,City,Population,Area_sqm
0,Bochum,364628,145400000
1,Essen,583109,210300000
2,Dortmund,587010,280700000
3,Duesseldorf,619294,217400000
4,Oberhausen,210829,77040000


In [None]:
def sqm_to_sqkm(value):
    # Convert square metres to square kilometres (1 km² = 1,000,000 m²)
    return value / 1_000_000

In [None]:
df_pop['Area_sqkm'] = df_pop['Area_sqm'].apply(sqm_to_sqkm)

df_pop

Unnamed: 0,City,Population,Area_sqm,Area_sqkm
0,Bochum,364628,145400000,145.4
1,Essen,583109,210300000,210.3
2,Dortmund,587010,280700000,280.7
3,Duesseldorf,619294,217400000,217.4
4,Oberhausen,210829,77040000,77.04
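
For simple arithmetic, `apply` is not strictly needed, because pandas can divide a whole column at once. `apply` becomes more useful when a function needs several columns of the same row. The sketch below shows both on a shortened copy of the city data. The names `cities`, `density` and `Pop_per_sqkm` are assumptions made for this example.

```python
import pandas as pd

# Two rows of the city data from above
cities = pd.DataFrame({
    "City": ["Bochum", "Essen"],
    "Population": [364628, 583109],
    "Area_sqm": [145400000, 210300000],
})

# Vectorized alternative: divide the whole column in one step
cities["Area_sqkm"] = cities["Area_sqm"] / 1_000_000

# Hypothetical helper that needs two columns of the same row
def density(row):
    return row["Population"] / row["Area_sqkm"]

# axis=1 passes each row (instead of each column) to the function
cities["Pop_per_sqkm"] = cities.apply(density, axis=1)
print(cities)
```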


***
***
### 📖Sort Data <a class="anchor" id="chapter5"></a>

In [None]:
df_pop.sort_values('Population', ascending=True)

Unnamed: 0,City,Population,Area_sqm,Area_sqkm
4,Oberhausen,210829,77040000,77.04
0,Bochum,364628,145400000,145.4
1,Essen,583109,210300000,210.3
2,Dortmund,587010,280700000,280.7
3,Duesseldorf,619294,217400000,217.4
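
`sort_values` returns a new, sorted DataFrame and leaves the original untouched. The sketch below sorts in descending order and then renumbers the index. The name `cities` and the shortened data are assumptions for this example.

```python
import pandas as pd

cities = pd.DataFrame({
    "City": ["Bochum", "Essen", "Dortmund"],
    "Population": [364628, 583109, 587010],
})

# ascending=False puts the largest value first
largest_first = cities.sort_values("Population", ascending=False)

# The old index labels travel with their rows; reset_index renumbers them from 0
largest_first_reset = largest_first.reset_index(drop=True)

print(largest_first)
print(largest_first_reset)
```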


***
***
### 📖Import / Export data <a class="anchor" id="chapter6"></a>
[Source](https://subscription.packtpub.com/book/big-data-and-business-intelligence/9781789959413/1/ch01lvl1sec04/pandas)

[More Reader & Writer](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)

To open files, we need to know their location on our hard disk. We can provide the location either as an absolute or as a relative path. An absolute path starts with the name of the drive and continues through the folder chain until the file is reached (e.g. D:/myseminars/KI/Session_2/Data/MyCSV.csv). A relative path describes the location of a file relative to Python's current working directory. In our case, this is the folder that contains the Python file we are coding in. A relative path can start with './', which stands for the folder we are working in. We can also move one folder up with '../'.

1. Example for an absolute path to our CSV file: "D:/MyAI_Folder/AI_Seminar/01_Python_Basics/02_Python_Pandas/Data/Airmeasurements.csv"
2. Example for a relative path: "./Data/Airmeasurements.csv"
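
To check how a relative path is interpreted, the standard library module `pathlib` can help. This is a minimal sketch. The printed absolute path depends on the folder the code is run from, and the file does not have to exist for the conversion to work.

```python
from pathlib import Path

# Relative path as used in this notebook
relative = Path("./Data/Airmeasurements.csv")

# The current working directory is the starting point for relative paths
print(Path.cwd())

# resolve() turns the relative path into an absolute one
absolute = relative.resolve()
print(absolute)

# exists() tells us whether the file can actually be found there
print(relative.exists())
```

If `exists()` returns False, the working directory is probably not the folder we expected.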


In [None]:
pd.read_csv('./Data/Airmeasurements.csv')

Unnamed: 0,MEASURE POINT;JEAR;MONTH;TIME_FROM;TIME_TO;MEASURED VALUE
0,1;2008;January;20080103;20080201;37
1,1;2008;February;20080201;20080229;55
2,1;2008;March;20080229;20080401;44
3,1;2008;April;20080401;20080430;53
4,1;2008;May;20080430;20080530;55
...,...
4401,1031;2022;February;20220129;20220302;18
4402,1031;2022;March;20220302;20220331;24
4403,1031;2022;April;20220331;20220502;20
4404,1031;2022;May;20220502;20220601;16


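As you can see, pandas loaded the CSV, but not correctly: all values ended up in a single column, because the file uses ';' as the delimiter while pandas expects ',' by default. We need to specify the separator.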

In [None]:
df = pd.read_csv('./Data/Airmeasurements.csv', sep= ';')
df

Unnamed: 0,MEASURE POINT,JEAR,MONTH,TIME_FROM,TIME_TO,MEASURED VALUE
0,1,2008,January,20080103,20080201,37
1,1,2008,February,20080201,20080229,55
2,1,2008,March,20080229,20080401,44
3,1,2008,April,20080401,20080430,53
4,1,2008,May,20080430,20080530,55
...,...,...,...,...,...,...
4401,1031,2022,February,20220129,20220302,18
4402,1031,2022,March,20220302,20220331,24
4403,1031,2022,April,20220331,20220502,20
4404,1031,2022,May,20220502,20220601,16
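
The effect of `sep` can be reproduced without the file. The sketch below reads a short text that mimics the layout of the file shown above. The text `raw` is made up for illustration, and `io.StringIO` only serves to treat it like a file.

```python
import io
import pandas as pd

# Three lines in the same semicolon-separated layout as the CSV file
raw = "MEASURE POINT;JEAR;MONTH\n1;2008;January\n1;2008;February\n"

# Default separator ',': every line ends up in a single column
wrong = pd.read_csv(io.StringIO(raw))

# Separator ';': the values are split into three columns
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)
print(right.shape)
```

Comparing the two shapes is a quick way to check whether the separator was chosen correctly.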


Let's filter the data for measuring point 7 and save it to a new file.

In [None]:
new_df = df[df['MEASURE POINT'] == 7]
new_df

Unnamed: 0,MEASURE POINT,JEAR,MONTH,TIME_FROM,TIME_TO,MEASURED VALUE
756,7,2008,January,20080103,20080201,43
757,7,2008,February,20080201,20080229,58
758,7,2008,March,20080229,20080401,46
759,7,2008,April,20080401,20080430,58
760,7,2008,May,20080430,20080530,58
...,...,...,...,...,...,...
925,7,2022,February,20220129,20220302,29
926,7,2022,March,20220302,20220331,36
927,7,2022,April,20220331,20220502,28
928,7,2022,May,20220502,20220601,26


In [None]:
new_df.to_csv('./Data/Airmeasurements_Station7.csv', sep = ';', index=False) #You can set index=True if you need the index
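
To see what `index=False` does, we can write a small DataFrame and read it back in. This is a sketch with made-up values and a temporary folder, so no file in the Data folder is touched. The names `station` and `target` are assumptions for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two hypothetical rows that keep their old index labels after filtering
station = pd.DataFrame(
    {"MEASURE POINT": [7, 7], "MEASURED VALUE": [43, 58]},
    index=[756, 757],
)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "station7.csv"
    # index=False: the labels 756 and 757 are not written to the file
    station.to_csv(target, sep=";", index=False)
    # Read with the same separator that was used for writing
    reloaded = pd.read_csv(target, sep=";")

print(reloaded)
```

After reloading, the rows are numbered from 0 again, because the old index was not saved.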