# Titanic using the ID3 decision tree classifier
The **ID3** (Iterative Dichotomiser 3) algorithm is one of the oldest [decision tree](https://en.wikipedia.org/wiki/Decision_tree_learning) classifiers, published in 1986 by [Ross Quinlan](https://en.wikipedia.org/wiki/Ross_Quinlan). Here I shall use the `decision-tree-id3` implementation, written by [Daniel Pettersson](https://github.com/svaante).
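
As a rough illustration of the idea behind ID3: at each node the algorithm, as described in Quinlan's paper, splits on the attribute with the highest information gain, i.e. the largest reduction in the entropy of the class labels. The sketch below uses only the standard library; the helper names `entropy` and `information_gain` and the four-row toy table are made up for this example and are not taken from the `decision-tree-id3` package or from the Titanic data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (not real Titanic rows): survival follows Sex exactly.
rows = [
    {"Sex": "female", "Pclass": 1},
    {"Sex": "female", "Pclass": 3},
    {"Sex": "male", "Pclass": 1},
    {"Sex": "male", "Pclass": 3},
]
labels = [1, 1, 0, 0]

# Pick the attribute that ID3 would split on first for this toy table.
best = max(["Sex", "Pclass"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy table splitting on `Sex` separates the labels perfectly, while splitting on `Pclass` leaves each group as mixed as before, so `Sex` is chosen for the first split.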

In [1]:
!pip install decision-tree-id3
import pandas as pd

# Workaround for: ImportError: cannot import name 'six' from 'sklearn.externals'
# (register the standalone `six` package under the module name that id3 imports)
import six
import sys
sys.modules['sklearn.externals.six'] = six

Collecting decision-tree-id3
  Downloading decision-tree-id3-0.1.2.tar.gz (12 kB)
Building wheels for collected packages: decision-tree-id3
  Building wheel for decision-tree-id3 (setup.py) ... done
  Created wheel for decision-tree-id3: filename=decision_tree_id3-0.1.2-py3-none-any.whl size=15961 sha256=d58893a6131cac5a694831933146751d65b816a06d9157317687e552a6888448
  Stored in directory: /root/.cache/pip/wheels/19/24/be/ceeb7146de9186dada6000e36b040c6724548cd7ecbf7c557e
Successfully built decision-tree-id3
Installing collected packages: decision-tree-id3
Successfully installed decision-tree-id3-0.1.2
You should consider upgrading via the '/opt/conda/bin/python3.7 -m pip install --upgrade pip' command.


In [2]:
#===========================================================================
# read in the Titanic data
#===========================================================================
train_data = pd.read_csv('../input/titanic/train.csv')
test_data  = pd.read_csv('../input/titanic/test.csv')

#===========================================================================
# select some features
#===========================================================================
features = ["Pclass", "Sex", "SibSp", "Parch"]

#===========================================================================
# for the features that are categorical we use pd.get_dummies
#===========================================================================
X_train       = pd.get_dummies(train_data[features])
y_train       = train_data["Survived"]
final_X_test  = pd.get_dummies(test_data[features])

#===========================================================================
# perform the classification
#===========================================================================
from id3 import Id3Estimator
classifier = Id3Estimator()
classifier.fit(X_train, y_train)

#===========================================================================
# use the model to predict 'Survived' for the test data
#===========================================================================
predictions = classifier.predict(final_X_test)

#===========================================================================
# write out CSV submission file
#===========================================================================
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 
                       'Survived': predictions})
output.to_csv('submission.csv', index=False)
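
One thing to watch when `pd.get_dummies` is applied to the training and test frames separately, as above: if a category is missing from one of the files, the two frames end up with different columns. This should not matter here as long as both files contain both values of `Sex`, but a defensive sketch is to reindex the test frame onto the training columns. The small frames below are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature frames: the test frame contains no "female" rows.
train = pd.DataFrame({"Pclass": [1, 3, 2], "Sex": ["male", "female", "female"]})
test = pd.DataFrame({"Pclass": [3, 1], "Sex": ["male", "male"]})

X_train = pd.get_dummies(train)
X_test = pd.get_dummies(test)   # has no Sex_female column at this point

# Align the test columns with the training columns; absent dummies become 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(list(X_train.columns))
print(list(X_test.columns))
```

After the `reindex` call both frames carry the same columns in the same order, which is what a fitted classifier expects at prediction time.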


# Links
* [J. R. Quinlan "Induction of decision trees", Machine Learning vol. **1** pp. 81-106 (1986)](https://link.springer.com/content/pdf/10.1007/BF00116251.pdf)
* [decision-tree-id3](https://github.com/svaante/decision-tree-id3) on GitHub
* [ID3 algorithm](https://en.wikipedia.org/wiki/ID3_algorithm) on Wikipedia