# Pandas

Pandas is well suited for manipulating tabular data coming from Excel, CSV, HTML, SQL and similar sources. Its central data structure, the DataFrame, holds the data in labelled rows and columns.

<div>
<img src="attachment:grafik.png" width="500"/>
</div>
Typically, when working with experimental data, the following tasks have to be carried out:

- Reading and writing data
- Accessing data
- Grouping and manipulating data
- Statistics
- Graphical representation (plotting)

For these tasks we will use Pandas!

Let's have a look at the Pandas documentation first! Pandas has become an essential part of the Python ecosystem and is therefore linked in the help menu: go to Help > Pandas Reference to open the Pandas documentation.

We start by importing Pandas into our Jupyter notebook. Pandas is preinstalled with Anaconda, so we can use it right away.

In [1]:
import pandas as pd

## 1. Reading Data

We will read some dummy data. Make sure the Data.xlsx file is in the same folder as the Jupyter notebook (otherwise, add the path to the file name).

Let's have a look at the documentation! -> Reading Excel files

In [2]:
# We assume here that this is a file containing experimentally measured data.
df = pd.read_excel("Data.xlsx")
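If no workbook is at hand, the reading step can still be tried out. The sketch below is only a stand-in: it uses `pd.read_csv` on an in-memory text (the three rows are copied from the beginning of the table shown further down), because `pd.read_excel` needs a real file on disk and typically an extra Excel engine such as openpyxl. The names `csv_text` and `demo` are made up for this illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the measurement file: a few rows of CSV text.
csv_text = """Experiment,Temperature [°C],Resistance [Ohm]
1,-50,0.0
2,-50,0.0
3,0,0.06
"""

# read_csv parses the text into a DataFrame; the first line becomes the column labels.
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)          # (number of rows, number of columns)
print(list(demo.columns))  # column labels taken from the header line
```

With a real workbook, the call would look like the cell above, optionally with `sheet_name=` to pick a specific sheet.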

There are several ways to display the data!

In [3]:
df

Unnamed: 0,Experiment,Temperature [°C],Resistance [Ohm]
0,,,
1,1.0,-50.0,0.0
2,2.0,-50.0,0.0
3,3.0,0.0,0.06
4,4.0,0.0,0.05
5,5.0,5.0,0.039
6,6.0,5.0,0.038
7,7.0,10.0,0.031
8,8.0,10.0,0.03
9,9.0,15.0,0.0185


In [5]:
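# First inspection: head() returns the first five rows, tail() the last five.
# A cell only displays its last expression, so the output below comes from tail().
df.head()
df.tail()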

Unnamed: 0,Experiment,Temperature [°C],Resistance [Ohm]
20,20.0,40.0,0.006
21,21.0,45.0,0.006
22,22.0,45.0,0.006
23,23.0,50.0,0.0055
24,24.0,50.0,0.0054
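A minimal sketch of a first inspection, assuming a small made-up table (`demo`) instead of the measured data. It prints each result explicitly, because a notebook cell only renders its last expression.

```python
import pandas as pd

# Hypothetical miniature data set, only for illustration.
demo = pd.DataFrame(
    {
        "Experiment": [1, 2, 3, 4, 5, 6],
        "Temperature [°C]": [-50.0, -50.0, 0.0, 0.0, 5.0, 5.0],
        "Resistance [Ohm]": [0.0, 0.0, 0.06, 0.05, 0.039, 0.038],
    }
)

first_rows = demo.head(2)  # first two rows
last_rows = demo.tail(2)   # last two rows

print(first_rows)
print(last_rows)
print(demo.shape)   # (number of rows, number of columns)
print(demo.dtypes)  # data type of every column
```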


## 2. Accessing Data in Pandas

| 2.1 Selection by label | 2.2 Selection by position   |
|-------|-------|
|   .loc  | .iloc |

### 2.1 Selection by label

Single label:

In [6]:
df.loc[5]

Experiment          5.000
Temperature [°C]    5.000
Resistance [Ohm]    0.039
Name: 5, dtype: float64

In [7]:
# Label slice: both end points (5 and 7) are included
df.loc[5:7]

Unnamed: 0,Experiment,Temperature [°C],Resistance [Ohm]
5,5.0,5.0,0.039
6,6.0,5.0,0.038
7,7.0,10.0,0.031


In [8]:
df.loc[5,'Experiment']

5.0
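The three forms of `.loc` used above can be sketched on a small made-up table. The string labels `"a"` to `"d"` are an assumption chosen for this example; they make it visible that `.loc` works with labels, not with positions.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Experiment": [1, 2, 3, 4],
        "Temperature [°C]": [-50.0, -50.0, 0.0, 0.0],
    },
    index=["a", "b", "c", "d"],  # hypothetical string labels
)

row = demo.loc["b"]                        # one row, returned as a Series
block = demo.loc["b":"c"]                  # label slice: "b" and "c" are both included
value = demo.loc["c", "Temperature [°C]"]  # single cell: row label, column label

print(row)
print(block)
print(value)
```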

### 2.2 Selection by position

In [11]:
# Here we use integer positions (counted from 0) to access the data
df.iloc[5]

Experiment          5.000
Temperature [°C]    5.000
Resistance [Ohm]    0.039
Name: 5, dtype: float64

In [14]:
# A row position and a column position together select a single cell
df.iloc[5, 0]

5.0
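In the table above, labels and positions happen to be the same numbers, so `.loc[5]` and `.iloc[5]` return the same row. The following sketch assumes a made-up table whose labels (10, 20, 30, 40) differ from the positions (0, 1, 2, 3) to make the difference visible.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Experiment": [1, 2, 3, 4], "Resistance [Ohm]": [0.0, 0.0, 0.06, 0.05]},
    index=[10, 20, 30, 40],  # hypothetical labels that differ from the positions
)

by_position = demo.iloc[1]  # second row (position 1), which carries the label 20
by_label = demo.loc[20]     # the same row, addressed by its label
pos_slice = demo.iloc[1:3]  # positions 1 and 2; the end position 3 is excluded
cell = demo.iloc[2, 1]      # third row, second column

print(pos_slice)
print(cell)
```

Note that a position slice with `.iloc` excludes its end point, while a label slice with `.loc` includes it.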

Let's go to the documentation to find more information!

## 3. Manipulating Data

OK, now we have selected the data we need! How can we change it?

This is easily done by assigning a new value with "=" to the selection.

In [16]:
# Overwrite the cell in row position 5, column position 0 ('Experiment')
df.iloc[5, 0] = 6

In [17]:
df

Unnamed: 0,Experiment,Temperature [°C],Resistance [Ohm]
0,,,
1,1.0,-50.0,0.0
2,2.0,-50.0,0.0
3,3.0,0.0,0.06
4,4.0,0.0,0.05
5,6.0,5.0,0.039
6,6.0,5.0,0.038
7,7.0,10.0,0.031
8,8.0,10.0,0.03
9,9.0,15.0,0.0185
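Assignment works with every selection method shown so far. The sketch below assumes a small made-up table and writes values by position, by label, and, as an extension of the example above, to all rows that match a condition.

```python
import pandas as pd

demo = pd.DataFrame(
    {"Experiment": [1.0, 2.0, 3.0], "Resistance [Ohm]": [0.0, 0.06, 0.05]}
)

demo.iloc[0, 0] = 10.0                  # by position: first row, first column
demo.loc[1, "Resistance [Ohm]"] = 0.07  # by label: row 1, named column

# Every row where the condition holds (here: Experiment greater than 2)
demo.loc[demo["Experiment"] > 2, "Resistance [Ohm]"] = 0.0

print(demo)
```

Selecting the row and the column in a single `.loc` or `.iloc` call, as done here, makes sure the value is written into the DataFrame itself and not into a temporary copy.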


You can also add new data to the DataFrame, for example as a new column.
Example: Let's add the temperature in Kelvin.

In [19]:
df['Temperature [K]'] = df['Temperature [°C]'] + 273.15
df

Unnamed: 0,Experiment,Temperature [°C],Resistance [Ohm],Temperature [K]
0,,,,
1,1.0,-50.0,0.0,223.15
2,2.0,-50.0,0.0,223.15
3,3.0,0.0,0.06,273.15
4,4.0,0.0,0.05,273.15
5,6.0,5.0,0.039,278.15
6,6.0,5.0,0.038,278.15
7,7.0,10.0,0.031,283.15
8,8.0,10.0,0.03,283.15
9,9.0,15.0,0.0185,288.15
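The column arithmetic is applied element by element, so no loop is needed. As a sketch on a made-up table, the following repeats the Kelvin conversion and also shows `assign()`, which returns a new DataFrame instead of changing the existing one. The Fahrenheit column is a hypothetical addition for illustration only.

```python
import pandas as pd

demo = pd.DataFrame({"Temperature [°C]": [-50.0, 0.0, 5.0]})

# Arithmetic on a column is applied to every row at once.
demo["Temperature [K]"] = demo["Temperature [°C]"] + 273.15

# assign() returns a new DataFrame and leaves `demo` unchanged.
with_fahrenheit = demo.assign(
    **{"Temperature [°F]": demo["Temperature [°C]"] * 9 / 5 + 32}
)

print(demo)
print(with_fahrenheit)
```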


## 4. Grouping and Manipulating Data

Grouping data involves one or more of the following steps:

- Splitting the data into groups based on some criteria.
- Applying a function to each group independently (aggregation, filtering or transformation).
- Combining the results into a data structure.
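The three kinds of "apply" step can be sketched as follows. The table is made up for illustration, and the choice of the mean as aggregation and of "at least two measurements" as filter condition are assumptions of this example.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Temperature [°C]": [0.0, 0.0, 5.0, 5.0, 10.0],
        "Resistance [Ohm]": [0.06, 0.05, 0.039, 0.038, 0.031],
    }
)

# Split: one group per temperature, restricted to the resistance column.
groups = demo.groupby("Temperature [°C]")["Resistance [Ohm]"]

# Aggregation: one value per group.
mean_per_group = groups.mean()

# Transformation: one value per original row (here: deviation from the group mean).
deviation = demo["Resistance [Ohm]"] - groups.transform("mean")

# Filtering: keep only the rows of groups with at least two measurements.
repeated = demo.groupby("Temperature [°C]").filter(lambda g: len(g) >= 2)

print(mean_per_group)
print(deviation)
print(repeated)
```

In each case Pandas takes care of the final "combine" step and returns a Series or DataFrame.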



In [26]:
# Group the rows by temperature
grouped = df.groupby('Temperature [°C]')

In [25]:
# The GroupBy object itself has no table view; an aggregation such as mean() produces one
grouped.mean()

Unnamed: 0_level_0,Experiment,Resistance [Ohm],Temperature [K]
Temperature [°C],Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
-50.0,1.5,0.0,223.15
0.0,3.5,0.055,273.15
5.0,6.0,0.0385,278.15
10.0,7.5,0.0305,283.15
15.0,9.5,0.01845,288.15
20.0,15.5,0.0125,293.15
25.0,13.5,0.0085,298.15
30.0,11.5,0.00775,303.15
35.0,17.5,0.0065,308.15
40.0,19.5,0.00625,313.15


In [27]:
grouped.get_group(40.0)

Unnamed: 0,Experiment,Temperature [°C],Resistance [Ohm],Temperature [K]
19,19.0,40.0,0.0065,313.15
20,20.0,40.0,0.006,313.15
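Besides picking a single group with `get_group`, a grouped object can be iterated over. This is a minimal sketch on a made-up table (the values are hypothetical); each iteration step yields the group key together with the matching sub-table.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Experiment": [1, 2, 3, 4],
        "Temperature [°C]": [35.0, 35.0, 40.0, 40.0],
        "Resistance [Ohm]": [0.0066, 0.0064, 0.0065, 0.006],
    }
)
grouped = demo.groupby("Temperature [°C]")

# One sub-table, selected by its group key.
at_40 = grouped.get_group(40.0)

# Iterating yields (key, sub-DataFrame) pairs.
sizes = {key: len(sub) for key, sub in grouped}

print(at_40)
print(sizes)
```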
