# Showcasing Fuzzy with the Titanic dataset
This notebook contains a Fuzzy decision tree fitted on the Titanic dataset, currently using only the passenger class, sex and age features.

Additional packages necessary to run this notebook:
 - Pandas

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import tree

from INNO.core import fuzzy

# Loading data
Titanic dataset. We're using the columns "Pclass", "Sex" and "Age" as input, and "Survived" as output.
- Pclass is the passenger class. This column contains the classes 1, 2 and 3.
- Sex is the gender listed for the passenger. This column contains the classes "male" and "female".
- Age is the passenger's age in years.
- Survived indicates whether the passenger survived the disaster. It contains the classes 1 (survived) and 0 (did not survive).

In [2]:
data = pd.read_csv(r"https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv", usecols=["Sex", "Pclass", "Survived", "Age"])

In [3]:
data.head()

| | Survived | Pclass | Sex | Age |
|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 |
| 1 | 1 | 1 | female | 38.0 |
| 2 | 1 | 3 | female | 26.0 |
| 3 | 1 | 1 | female | 35.0 |
| 4 | 0 | 3 | male | 35.0 |


In [4]:
data.describe(include="all")

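| | Survived | Pclass | Sex | Age |
|---|---|---|---|---|
| count | 887.0 | 887.0 | 887 | 887.0 |
| unique | | | 2 | |
| top | | | male | |
| freq | | | 573 | |
| mean | 0.385569 | 2.305524 | | 29.471443 |
| std | 0.487004 | 0.836662 | | 14.121908 |
| min | 0.0 | 1.0 | | 0.42 |
| 25% | 0.0 | 2.0 | | 20.25 |
| 50% | 0.0 | 3.0 | | 28.0 |
| 75% | 1.0 | 3.0 | | 38.0 |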


In [5]:
X = data.drop("Survived", axis=1).to_numpy()
y = data["Survived"].to_numpy()

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [7]:
# Encode the Sex column (index 1) as a boolean: True for "female", False for "male"
X_train[:,1] = X_train[:,1] == "female"
X_test[:,1] = X_test[:,1] == "female"

In [8]:
# Cast everything to int: True/False become 1/0 and fractional ages are truncated
X_train = X_train.astype(int)
X_test = X_test.astype(int)
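
The two preprocessing cells above can be tried in isolation on a few rows. The sketch below uses three made-up rows in the notebook's column order (Pclass, Sex, Age); they are illustrative and not taken from the dataset. It mainly shows that the integer cast turns the boolean Sex column into 1/0 and truncates a fractional age such as 0.42 to 0.

```python
import numpy as np

# Three illustrative rows in the notebook's column order: Pclass, Sex, Age.
# These rows are made up for the example, not read from the Titanic CSV.
X = np.array(
    [[3, "male", 22.0], [1, "female", 38.0], [2, "male", 0.42]],
    dtype=object,
)

# Encode Sex as a boolean: True for "female", False for "male".
X[:, 1] = X[:, 1] == "female"

# Casting to int maps True/False to 1/0 and truncates fractional ages.
X_int = X.astype(int)
print(X_int)
```

If the truncation of infant ages matters for a given experiment, rounding or scaling the Age column before the cast would be one way to avoid it.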

# Comparing Performance
We will compare the speed and accuracy of Fuzzy with scikit-learn's CART implementation (`DecisionTreeClassifier`).

#### Fuzzy's speed

In [9]:
%%timeit
predictor = fuzzy.FuzzyClassifier(np.array([2]))
predictor.fit(X_train, y_train, 0.3)
predictor.score(X_test, y_test)

165 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Fuzzy's accuracy

In [10]:
predictor = fuzzy.FuzzyClassifier(np.array([2]))
predictor.fit(X_train, y_train, 0.3)
predictor.score(X_test, y_test)

0.7477477477477478

#### CART's speed

In [11]:
%%timeit
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
clf.score(X_test, y_test)

885 µs ± 24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
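
To put the two timings side by side, and to reproduce a similar measurement outside a notebook, something like the following could be used. This is a rough sketch: `work` is a hypothetical stand-in for "fit the model and score it", and the loop counts are arbitrary. Only the two timing constants come from the outputs above.

```python
import timeit


def work():
    # Hypothetical stand-in for fitting a model and scoring it.
    return sum(i * i for i in range(1000))


# Similar in spirit to %%timeit: several runs, each averaged over many loops.
loops = 100
runs = timeit.repeat(work, repeat=7, number=loops)
per_loop = [t / loops for t in runs]
mean = sum(per_loop) / len(per_loop)
print(f"{mean * 1e6:.1f} µs per loop (mean of 7 runs, {loops} loops each)")

# Ratio of the two mean timings reported in this notebook.
fuzzy_seconds = 165e-3  # 165 ms
cart_seconds = 885e-6   # 885 µs
ratio = fuzzy_seconds / cart_seconds
print(f"Fuzzy took roughly {ratio:.0f}x as long as CART")
```

Both notebook timings cover fitting and scoring together, so the ratio says nothing about which of the two steps dominates.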


#### CART's accuracy

In [12]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
clf.score(X_test, y_test)

0.5765765765765766
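
The two accuracy scores are easier to compare as counts of correctly classified test passengers. The sketch below assumes that `train_test_split` was called with its default test fraction of 0.25 (as in cell 6, which passes no `test_size`) and that scikit-learn rounds the test set size up. The row count comes from the `describe` output above.

```python
import math

n_samples = 887  # row count from data.describe()

# Assumption: default test fraction of 0.25, test set size rounded up.
n_test = math.ceil(0.25 * n_samples)
n_train = n_samples - n_test

# Accuracy scores copied from the outputs above.
fuzzy_accuracy = 0.7477477477477478
cart_accuracy = 0.5765765765765766

# Convert each accuracy back into a number of correct predictions.
fuzzy_correct = round(fuzzy_accuracy * n_test)
cart_correct = round(cart_accuracy * n_test)

print(f"train: {n_train}, test: {n_test}")
print(f"Fuzzy: {fuzzy_correct}/{n_test} correct")
print(f"CART:  {cart_correct}/{n_test} correct")
print(f"difference: {fuzzy_correct - cart_correct} passengers")
```

With a test set of this size, a single split gives a fairly noisy estimate, so repeating the comparison over several values of `random_state` would make it more robust.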

## Test Prediction

Both models predict that female passengers survive in all three passenger classes, for the ages tried here.

In [13]:
clf.predict([[1, 1, 20], [2, 1, 20], [3, 1, 5]])  # 1 == female

array([1, 1, 1], dtype=int64)

In [14]:
predictor.predict(np.array([[1, 1, 20], [2, 1, 20], [3, 1, 5]], dtype=int))  # 1 == female

array([1, 1, 1], dtype=int64)

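Male passengers, by contrast, are predicted not to survive in any passenger class. The Fuzzy model predicts this even for the ages 10 and 3 tried below, so the pattern is not limited to adult men.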

In [15]:
predictor.predict(np.array([[1, 0, 20], [2, 0, 10], [3, 0, 3]], dtype=int))  # 0 == male

array([0, 0, 0], dtype=int64)

In [16]:
clf.predict([[1, 0, 20], [2, 0, 20], [3, 0, 20]])  # 0 == male

array([0, 0, 0], dtype=int64)
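
The hand-picked rows above cover only a few combinations. A more systematic probe could build every combination of class, sex and a handful of ages, then pass the result to both models. The sketch below only constructs that grid; the ages are chosen arbitrarily for illustration, and the commented calls at the end assume the `clf` and `predictor` objects from the cells above.

```python
from itertools import product

pclasses = [1, 2, 3]
sexes = [0, 1]          # 0 == male, 1 == female
ages = [5, 20, 40, 60]  # illustrative ages, chosen arbitrarily

# One row per combination, in the notebook's column order: Pclass, Sex, Age.
grid = [list(row) for row in product(pclasses, sexes, ages)]

print(len(grid))  # 3 classes * 2 sexes * 4 ages
print(grid[:3])

# With the fitted models from the cells above, the grid could be used as:
#   clf.predict(grid)
#   predictor.predict(np.array(grid, dtype=int))
```

Comparing the two prediction arrays row by row would show where the models disagree, which the few examples above cannot reveal.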