## 1. Pandas
This tutorial assumes that you have some basic programming skills. If you are more familiar with R than with Python, don't worry: the two are not that different. First, we will look at the pandas package, which introduces the data frame (similar to R's `data.frame`) as its fundamental data structure.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")

df

Unnamed: 0.1,Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Name
0,0,5.1,3.5,1.4,0.2,Iris-setosa
1,1,4.9,3.0,1.4,0.2,Iris-setosa
2,2,4.7,3.2,1.3,0.2,Iris-setosa
3,3,4.6,3.1,1.5,0.2,Iris-setosa
4,4,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,145,6.7,3.0,5.2,2.3,Iris-virginica
146,146,6.3,2.5,5.0,1.9,Iris-virginica
147,147,6.5,3.0,5.2,2.0,Iris-virginica
148,148,6.2,3.4,5.4,2.3,Iris-virginica


### Selecting rows / columns
Using pandas, we can easily select rows or columns in a data frame. One option is to select a column by its name:

In [None]:
accx = df.loc[:,"SepalLength"]
accx

Here, `:` means "choose all rows". When only a single column is selected, the result is of type `Series`; otherwise it is a data frame again. We can also access columns by their integer position with `.iloc`, where the end of a slice is excluded:

In [None]:
acc = df.iloc[:,1:4]
acc
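
The difference between a `Series` and a data frame result can be checked directly. The following is a minimal sketch on a small toy frame (the frame `toy` and its values are illustrative assumptions, not the contents of `iris.csv`):

```python
import pandas as pd

# Toy stand-in for the Iris frame; the values are illustrative only.
toy = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 6.7],
    "SepalWidth": [3.5, 3.0, 3.0],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

single = toy.loc[:, "SepalLength"]        # one label -> Series
as_frame = toy.loc[:, ["SepalLength"]]    # list of labels -> DataFrame
by_position = toy.iloc[:, 0:2]            # positional slice -> DataFrame

print(type(single).__name__)       # Series
print(type(as_frame).__name__)     # DataFrame
print(list(by_position.columns))   # ['SepalLength', 'SepalWidth']
```

Passing a list with one label is a common way to keep a data frame even when only one column is needed.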

We can access rows in the same way. The following expression gives us a data frame consisting of the first five rows:

In [None]:
df.iloc[0:5,:]

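Row and column selection can be combined. Note that `.loc` slices by label and includes the end point, so `0:5` returns six rows here, not five: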

In [None]:
df.loc[0:5,["SepalLength", "SepalWidth"]]
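
Label-based and position-based slices treat their end point differently, which is easy to overlook. A small sketch on a toy frame with the default integer index (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

# Toy frame with the default integer index 0..6.
toy = pd.DataFrame({"SepalLength": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6]})

by_label = toy.loc[0:5, ["SepalLength"]]   # labels 0..5, end point included
by_position = toy.iloc[0:5, :]             # positions 0..4, end point excluded

print(len(by_label))      # 6
print(len(by_position))   # 5
```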

Another option is to access rows or columns via boolean expressions. For example, the following line returns a data frame that only contains rows whose `SepalLength` is smaller than 5:

In [None]:
df.loc[df.SepalLength < 5,:]
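
Boolean expressions can also be combined. The sketch below assumes a toy frame with two of the Iris columns; conditions are joined with `&` (and) or `|` (or), and each condition needs its own parentheses:

```python
import pandas as pd

# Toy stand-in for the Iris frame; the values are illustrative only.
toy = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 4.7, 6.7],
    "SepalWidth": [3.5, 3.0, 3.2, 3.0],
})

# The mask is itself a boolean Series with one entry per row.
mask = (toy["SepalLength"] < 5) & (toy["SepalWidth"] > 3.0)
subset = toy.loc[mask, :]

print(mask.tolist())        # [False, False, True, False]
print(list(subset.index))   # [2]
```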

### Adding new values
Writing new values to a data frame is also easy. The following expression replaces all values of the column `SepalLength` that are smaller than 5 with the value 0:


In [None]:
df.loc[df.SepalLength < 5,"SepalLength"] = 0
df
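
Because this kind of assignment changes the data frame in place, it can be useful to work on a copy and to check how many rows are affected first. A minimal sketch, assuming a toy frame rather than the real Iris data:

```python
import pandas as pd

# Toy stand-in for the Iris frame; the values are illustrative only.
toy = pd.DataFrame({"SepalLength": [5.1, 4.9, 4.7, 6.7]})

modified = toy.copy()                  # leave the original frame untouched
mask = modified["SepalLength"] < 5
print(mask.sum())                      # number of rows that will change: 2

modified.loc[mask, "SepalLength"] = 0  # overwrite only the masked rows
print(modified["SepalLength"].tolist())   # [5.1, 0.0, 0.0, 6.7]
print(toy["SepalLength"].tolist())        # [5.1, 4.9, 4.7, 6.7]
```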

## 2. Decision Trees in Python 
The following example trains a decision tree on the Iris data, using 2/3 as training data and 1/3 as test data.

In [2]:
import sklearn

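# Select features and target by name, so the code does not depend on
# how many index columns ("Unnamed: ...") the CSV file contains.
X = df[["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]]
y = df["Name"]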

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

In [4]:
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

In [5]:
pred = clf.predict(X_test)
# Accuracy: fraction of test samples that were classified correctly
print(np.mean(pred==y_test))

0.92
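
The accuracy above depends on the random split and will vary between runs. The following sketch shows one way to make the result reproducible by fixing `random_state`. It assumes the copy of the Iris data bundled with scikit-learn (`load_iris`) as a stand-in for `iris.csv`, and the seed value 0 is an arbitrary choice:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Bundled Iris data, used here instead of iris.csv (assumption).
X, y = load_iris(return_X_y=True)

# A fixed random_state gives the same split on every run;
# stratify=y keeps the class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(accuracy)
```

`accuracy_score` computes the same quantity as `np.mean(pred==y_test)`.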


## 3. Task: Train your own model
Next, try to train a model yourself: use the Titanic passenger data (name, age, ticket price, etc.) to predict which passengers survived and which did not.

In [7]:
titanic = pd.read_csv("titanic.csv")
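
As a possible starting point: `DecisionTreeClassifier` expects numeric input, and older scikit-learn versions also reject missing values, so text columns and gaps usually need to be handled first. The sketch below uses a small synthetic frame; the column names (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`) and all values are assumptions for illustration and may not match `titanic.csv`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in rows; column names are assumed, not read from titanic.csv.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 2, 3, 1, 3, 2, 1, 2, 3, 1, 3],
    "Sex":      ["male", "female", "female", "male", "female", "male",
                 "male", "female", "female", "male", "female", "male"],
    "Age":      [22, 38, None, 35, 54, None, 40, 27, 14, 20, 58, 31],
    "Fare":     [7.25, 71.28, 13.0, 8.05, 51.86, 8.46,
                 26.0, 30.0, 30.07, 8.05, 26.55, 7.9],
})

features = toy[["Pclass", "Sex", "Age", "Fare"]].copy()
# Fill missing ages with the median age.
features["Age"] = features["Age"].fillna(features["Age"].median())
# Encode the text column as 0/1.
features["Sex"] = (features["Sex"] == "female").astype(int)
target = toy["Survived"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.33, random_state=0
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = (clf.predict(X_test) == y_test).mean()
print(accuracy)
```

With so few synthetic rows the accuracy itself is meaningless; only the preprocessing steps are meant to carry over.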

## 4. Explain the model
Compute SHAP values of the model and plot them as a beeswarm plot.