## Applying the PPS to the Titanic dataset
- This notebook shows how to apply the Predictive Power Score (PPS) to the Titanic dataset.
- To run it yourself, you need working installations of the packages ppscore, seaborn and pandas, plus a copy of `titanic.csv` in the working directory.

In [None]:
import pandas as pd
import seaborn as sns

import ppscore as pps

In [None]:
def heatmap(df):
    """Plot a square table of PPS values (range 0 to 1) as an annotated heatmap."""
    ax = sns.heatmap(df, vmin=0, vmax=1, cmap="Blues", linewidths=0.5, annot=True)
    ax.set_title('PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

In [None]:
def corr_heatmap(df):
    """Plot a correlation table (range -1 to 1) as an annotated heatmap."""
    ax = sns.heatmap(df, vmin=-1, vmax=1, cmap="BrBG", linewidths=0.5, annot=True)
    ax.set_title('Correlation matrix')
    return ax

In [None]:
df = pd.read_csv("titanic.csv")

## Preparation of the Titanic dataset
- Selecting a subset of columns
- Changing some data types
- Renaming columns to clearer names

In [None]:
# Keep only the columns used in this analysis
df = df[["Survived", "Pclass", "Sex", "Age", "Ticket", "Fare", "Embarked"]]
# Rename columns to clearer names in a single call
df = df.rename(columns={
    "Pclass": "Class",
    "Ticket": "TicketID",
    "Fare": "TicketPrice",
    "Embarked": "Port",
})
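
The preparation list mentions changing some data types, which the cell above does not do. A minimal sketch of what such a step could look like follows, assuming that `Survived` and `Class` are meant to be treated as labels rather than quantities. The frame `toy` and its values are made-up placeholders, not rows from the Titanic data, and which target dtypes ppscore treats as categorical may depend on the installed version.

```python
import pandas as pd

# Tiny made-up frame using the renamed column names; the values are placeholders only
toy = pd.DataFrame({"Survived": [0, 1, 1], "Class": [3, 1, 2]})

# Assumption: both columns are labels, so store them with non-numeric dtypes
toy["Survived"] = toy["Survived"].astype(bool)
toy["Class"] = toy["Class"].astype("category")

print(toy.dtypes)
```

The same two `astype` calls could be applied to `df` if that treatment is wanted for the real data.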

## Single Predictive Power Score
- Answering the question: how well does the column Sex predict the column Survived?

In [None]:
pps.score(df, "Sex", "Survived")
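
For intuition about the number returned above, here is a sketch of the normalization that, to my understanding of the ppscore documentation, maps a model's evaluation metric onto the range 0 to 1: a weighted F1 compared with a naive baseline for a categorical target, and a mean absolute error compared with a naive baseline for a numeric target. The function names `normalize_f1` and `normalize_mae` and all input values are hypothetical and only illustrate the arithmetic. They are not results from the Titanic data.

```python
def normalize_f1(model_f1: float, baseline_f1: float) -> float:
    """Rescale an F1 score against a naive baseline (illustrative helper, not part of ppscore)."""
    if model_f1 <= baseline_f1:
        return 0.0  # no better than the baseline
    return (model_f1 - baseline_f1) / (1.0 - baseline_f1)


def normalize_mae(model_mae: float, baseline_mae: float) -> float:
    """Rescale a mean absolute error against a naive baseline (illustrative helper)."""
    if model_mae >= baseline_mae:
        return 0.0  # no better than the baseline
    return 1.0 - model_mae / baseline_mae


# Made-up inputs, chosen only to show the arithmetic
print(normalize_f1(0.8, 0.5))
print(normalize_mae(0.2, 0.4))
```

Under this reading, a score of 0 means the feature does no better than the naive baseline, and a score of 1 means a perfect prediction.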

## PPS matrix
- Answering the question: which predictive patterns exist between the columns?

In [None]:
matrix = pps.matrix(df)

In [None]:
matrix

In [None]:
heatmap(matrix)
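
Depending on the installed ppscore version, `pps.matrix` may return a long-format table with one row per feature/target pair (columns such as `x`, `y` and `ppscore`) rather than a square table, in which case the heatmap call above needs a pivot first. The sketch below assumes those column names. The helper `to_square` is hypothetical, and the small frame contains made-up placeholder values, not PPS results.

```python
import pandas as pd


def to_square(matrix: pd.DataFrame) -> pd.DataFrame:
    """Return a target-by-feature table; pivot only if the input is in long format."""
    if {"x", "y", "ppscore"}.issubset(matrix.columns):
        # Rows become targets (y) and columns become features (x)
        return matrix.pivot(index="y", columns="x", values="ppscore")
    return matrix  # already square, leave untouched


# Made-up long-format table, used only to demonstrate the reshaping
long_format = pd.DataFrame({
    "x": ["a", "a", "b", "b"],
    "y": ["a", "b", "a", "b"],
    "ppscore": [1.0, 0.2, 0.0, 1.0],
})
square = to_square(long_format)
print(square)
```

With this helper, the plot could be produced with `heatmap(to_square(matrix))`, which keeps features on the x-axis and targets on the y-axis as labelled in `heatmap`.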

## Correlation matrix
- For comparison with the PPS matrix

In [None]:
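# Correlation is only defined for numeric columns, so select those explicitly
corr_heatmap(df.select_dtypes("number").corr())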
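
One reason to set the two matrices side by side is that a linear correlation coefficient can be close to zero even when one column fully determines another. The sketch below uses a made-up frame `toy` (not Titanic data) in which `y` is exactly the square of `x`, and computes their Pearson correlation with pandas.

```python
import pandas as pd

# Made-up data: y is fully determined by x, but the relationship is not linear
toy = pd.DataFrame({"x": [-2, -1, 0, 1, 2]})
toy["y"] = toy["x"] ** 2

# Pearson correlation between x and y
r = toy["x"].corr(toy["y"])
print(r)
```

The coefficient is zero here because the positive and negative halves of `x` cancel out. Relationships of this shape are the kind of pattern a correlation matrix can miss.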