# Pandas

Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you create DataFrames, which are tables with labelled rows and columns, and it contains many useful import and export functions.

In [1]:
import pandas as pd

In [40]:
# Create an empty DataFrame
df = pd.DataFrame()

print(df)

Empty DataFrame
Columns: []
Index: []


In [49]:
# Add columns A and B to the empty DataFrame
df["A"] = [2, 1, 3]
df["B"] = [4, 5, 6]

df

   A  B
0  2  4
1  1  5
2  3  6


The example dataframe above has three parts:
- The index (row labels). If you do not define one, it starts at 0 and increases by 1 for each row.
- The column headers
- The values
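
The default index is easiest to see when rows are added. The snippet below is a minimal sketch that rebuilds the same toy data in one step; the extra rows (7, 8) and (9, 10) are made up for illustration. It also shows that `pd.concat` keeps the original labels unless `ignore_index=True` is passed.

```python
import pandas as pd

# Same toy data as above; no index is passed, so pandas numbers the rows 0, 1, 2.
df = pd.DataFrame({"A": [2, 1, 3], "B": [4, 5, 6]})
print(df.index.tolist())  # [0, 1, 2]

# Append a row by assigning to the next free label (len(df) is 3 here).
df.loc[len(df)] = [7, 8]
print(df.index.tolist())  # [0, 1, 2, 3]

# pd.concat keeps the labels of both frames unless ignore_index=True is passed.
extra = pd.DataFrame({"A": [9], "B": [10]})
stacked = pd.concat([df, extra])
renumbered = pd.concat([df, extra], ignore_index=True)
print(stacked.index.tolist())     # [0, 1, 2, 3, 0]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```
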

You can extract each of these three parts from the dataframe:

In [46]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [47]:
df.columns

Index(['A', 'B'], dtype='object')

In [48]:
df.values

array([[2, 4],
       [1, 5],
       [3, 6]], dtype=int64)
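
If plain Python lists or a NumPy array are needed for further processing, something like the following may help. It is a sketch on the same toy data, built here from a dict so that the block runs on its own. It uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import pandas as pd

# Build the same frame in one step from a dict of column name -> values.
df = pd.DataFrame({"A": [2, 1, 3], "B": [4, 5, 6]})

index_labels = df.index.tolist()    # row labels as a plain list
column_names = df.columns.tolist()  # column headers as a plain list
values = df.to_numpy()              # 2-D NumPy array of the cell values

print(index_labels)  # [0, 1, 2]
print(column_names)  # ['A', 'B']
print(values.shape)  # (3, 2)
```
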

## Selecting data
You can use `loc` or `iloc` to select a row or value: `loc` selects by index label, `iloc` selects by integer position.

In [74]:
# First we relabel the index so that it is no longer sorted
df.index = [35, 17, 25]
df

    A  B
35  2  4
17  1  5
25  3  6


In [75]:
# Select the first row
df.iloc[0]

A    2
B    4
Name: 35, dtype: int64

In [76]:
# Select the last row
df.iloc[-1]

A    3
B    6
Name: 25, dtype: int64

In [77]:
# Select rows with index 35 and 25
df.loc[[35, 25]]

    A  B
35  2  4
25  3  6


In [78]:
# Select index 17, column B
df.loc[17, "B"]

5
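
One difference that often causes confusion is slicing: `iloc` excludes the end position, as Python lists do, while `loc` includes the end label. The sketch below rebuilds the relabelled frame so that it runs on its own; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 3], "B": [4, 5, 6]}, index=[35, 17, 25])

# iloc counts positions: rows at positions 0 and 1; end position 2 is excluded.
by_position = df.iloc[0:2]

# loc uses labels: from label 35 up to and including label 25.
by_label = df.loc[35:25]

print(by_position.index.tolist())  # [35, 17]
print(by_label.index.tolist())     # [35, 17, 25]

# The same cell addressed by label and by position.
print(df.loc[25, "A"])  # 3
print(df.iloc[2, 0])    # 3
```
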

## Importing data from CSV
Importing data from a CSV file takes a single call to `pd.read_csv`. All of its options are described in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

In [85]:
hours = pd.read_csv("hours.csv")

hours

        Name  Week 10  Week 11  Week 12  Week 13
0        Bas       40       25       38       21
1      Stijn       32       24       40       28
2      Najib       39       40       37       25
3     Anneli       25       29       34       40
4  Kimberley       40       40       20       27
5      Robin       37       16       40       38
6     Barend       37       36       36       29
7       Itty       26       28       40       39
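
To experiment without the file on disk, `pd.read_csv` also accepts a file-like object. The sketch below assumes that `hours.csv` is a plain comma-separated file with a header row; the two-row excerpt in `csv_text` is reconstructed from the output above and is not the actual file.

```python
import io

import pandas as pd

# Assumed excerpt in the same layout as hours.csv (header row, comma-separated).
csv_text = "Name,Week 10,Week 11\nBas,40,25\nStijn,32,24\n"

# io.StringIO wraps the string so that read_csv can treat it like an open file.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.columns.tolist())  # ['Name', 'Week 10', 'Week 11']
print(sample.shape)             # (2, 3)
print(sample["Week 10"].sum())  # 72
```
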


### Assignment 1
The dataframe above uses the default index, but we want the Name column as the index. There are two ways to do this:
1. use `set_index` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html)
2. pass keyword arguments (kwargs) to `read_csv` (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)

Try both ways below.

In [86]:
# use set_index
hours = hours.set_index(...)

hours

In [87]:
# use read_csv with kwargs
hours = pd.read_csv(...)

hours