# Decision Tree

This example shows how to use [scikit-learn](https://scikit-learn.org/stable/) to train a Decision Tree model on the Titanic dataset. The raw columns are first processed into numeric features to improve the model's accuracy. For a more detailed explanation of what a Decision Tree is, see [Decision Tree](../document/decision_tree.md).

## Imports

```python
import polars as pl
from polars import LazyFrame
```

## Process Data
Apply the same processing to the training and testing data.

```python
train_data: LazyFrame = pl.scan_csv("data/train.csv", has_header=True)
test_data: LazyFrame = pl.scan_csv("data/test.csv", has_header=True)
```

```python
# Map each honorific extracted from `Name` to an integer code; related titles share a code.
title_map: dict[str, int] = {
    "Capt": 1,
    "Col": 1,
    #"Countess": 2,
    "Don": 2,
    "Dona": 2,
    "Dr": 3,
    "Jonkheer": 2,
    "Lady": 2,
    "Major": 1,
    "Master": 4,
    "Miss": 5,
    "Mme": 6,
    "Mr": 7,
    "Mrs": 6,
    "Ms": 5,
    "Rev": 8,
    "Sir": 2,
}
```

### Features

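```python
def extract_features(data: LazyFrame) -> LazyFrame:
    """Build the model features from the raw Titanic columns."""
    return data.select(
        sku=pl.col("Pclass").rank(method="dense"),
        family_size=pl.col("SibSp") + pl.col("Parch") + 1,
        embarked=pl.col("Embarked").rank(method="dense"),
        # Extract the title between the comma and the period, e.g. "Mr" in "Doe, Mr. John".
        title=pl.col("Name").str.extract(r",\s*(\w+)\.\s*").replace_strict(
            title_map,
            default=max(title_map.values()) + 1,  # unmapped titles get their own code
            return_dtype=pl.UInt8,
        ),
    )


# Apply the same processing to the training and testing data.
train_features: LazyFrame = extract_features(train_data)
test_features: LazyFrame = extract_features(test_data)
```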

```python
train_features.collect().sample(5)
```

----
Go back to [index](_index.ipynb).