# Decision Tree
A decision tree is a flowchart-like structure that represents a sequence of decisions and their possible consequences. It is built by applying feature-based rules that split the data into progressively smaller subsets. Each internal node represents a decision on a specific feature, each branch represents one outcome of that decision, and each leaf node represents a final outcome or prediction.
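
As a rough illustration of this structure, a tree can be written as nested nodes and a prediction obtained by walking from the root to a leaf. The sketch below is hand-built and hypothetical: the column names `Sex_female` and `Pclass_3`, the thresholds, and the leaf values are assumptions chosen for illustration, not rules learned from the Titanic data.

```python
# Hypothetical hand-built tree (not learned from data).
# Internal nodes test one feature; leaves hold a prediction.
tree = {
    "feature": "Sex_female",   # assumed one-hot column name
    "threshold": 0.5,
    "left": {"leaf": 0},       # Sex_female <= 0.5: predict 0
    "right": {
        "feature": "Pclass_3",  # assumed one-hot column name
        "threshold": 0.5,
        "left": {"leaf": 1},    # Pclass_3 <= 0.5: predict 1
        "right": {"leaf": 0},   # Pclass_3 > 0.5: predict 0
    },
}


def predict_one(node, row):
    """Walk from the root to a leaf, following one branch per decision."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


print(predict_one(tree, {"Sex_female": 1, "Pclass_3": 0}))  # prints 1
```

Each call visits at most one node per level, so the cost of a prediction grows with the depth of the tree rather than with the number of training rows.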


# Tree Building
● The process begins by selecting the most informative feature from the available dataset as the root node.
● The dataset is then split based on the chosen feature into smaller subsets.
● These steps are applied recursively to each subset, growing the tree until a termination criterion is met (for example, a subset contains only one class or a maximum depth is reached).
![image.png](attachment:image.png)
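
The tree-building steps above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: it assumes numeric features, measures how informative a split is with Gini impurity, and stops when a node is pure, a depth limit is reached, or no split separates the rows. The function names (`gini`, `best_split`, `build`), the `max_depth` default, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the split with
    the lowest weighted impurity, or None if no split separates the rows."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this threshold does not divide the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree of nested dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    # Termination criteria: pure node or depth limit reached
    if gini(labels) == 0.0 or depth >= max_depth:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # no feature separates the remaining rows
        return {"leaf": majority}
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not
rows = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = [0, 0, 1, 1]
tree = build(rows, labels)
print(tree)
```

On the toy data, splitting on feature 0 gives two pure subsets (weighted impurity 0), while splitting on feature 1 leaves both subsets mixed (weighted impurity 0.5), so feature 0 becomes the root and both children are leaves.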

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Read data from the csv file
df = pd.read_csv("F:/Nust/titanic_clean.csv")

# Data cleaning and preprocessing
# Convert categorical variables to one-hot vectors
df_OneHot = pd.get_dummies(df, columns=['Pclass', 'Sex', 'Embarked', 'Title', 'GrpSize', 'FareCat', 'AgeCat'])
df = df_OneHot
# Features: drop the label and the identifier column
X = df.drop(['Survived', 'PassengerId'], axis=1)
# Target/prediction variable
Y = df['Survived']

# Split the data into training (70%) and test (30%) sets
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.3, random_state=100, shuffle=True)

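# Train the model (no random_state is set on the classifier, so ties
# between equally good splits may be broken differently across runs)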
clf_dt = DecisionTreeClassifier(criterion='gini')
clf_dt.fit(xtrain, ytrain)

# Predict class labels and positive-class probabilities for the test set
dt_pred = clf_dt.predict(xtest)
dt_pred_prb = clf_dt.predict_proba(xtest)[:, 1]

# Calculate accuracy on the test set
accuracy_dt = accuracy_score(ytest, dt_pred)
print("Accuracy: {}".format(accuracy_dt))

Accuracy: 0.7835820895522388
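
Accuracy is the fraction of test samples whose predicted label matches the true label. The short sketch below reproduces that arithmetic by hand on made-up labels (the two lists are hypothetical, not taken from the Titanic data). The printed value of about 0.7836 is consistent with, for instance, 210 correct predictions out of 268 test rows, although the source does not show the actual test-set size, so that pairing is an assumption.

```python
# Hypothetical labels, used only to illustrate the calculation
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

# Count the positions where prediction and truth agree
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print("Accuracy: {}".format(accuracy))  # 6 of 8 correct: 0.75

# One ratio that matches the value printed above (assumed counts)
print(210 / 268)
```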
