
# Decision Trees

## From ML Algorithms to GenAI & LLMs by Aman Kharwal

### CART

We will use the breast cancer dataset that ships with scikit-learn to demonstrate CART and the other decision tree algorithms in Python.
The step-by-step process for implementing CART is as follows:

Step 1: Import the necessary libraries and load the dataset

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
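
Before splitting the data, it can help to check what was loaded. The snippet below is a minimal sketch that reloads the same dataset and prints its shape and class balance; the variable name `class_counts` is introduced here for illustration and is not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# Shape of the feature matrix: (n_samples, n_features)
print("Feature matrix shape:", X.shape)

# Class labels are integers that index into target_names
print("Class names:", list(data.target_names))

# Count how many samples fall into each class
class_counts = {
    str(name): int((y == idx).sum())
    for idx, name in enumerate(data.target_names)
}
print("Samples per class:", class_counts)
```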

Step 2: Divide the dataset into training and test sets

In [2]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
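
As a quick sanity check on the split, the following sketch repeats it with the same `test_size` and `random_state` and prints the size and class balance of each subset. If the two class shares differed noticeably, passing `stratify=y` to `train_test_split` would be one option; that variation is a suggestion here, not something the original notebook does.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same split as above: 20% held out, fixed seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", X_train.shape[0])
print("Test samples:", X_test.shape[0])

# Share of class 1 in each subset; a large gap would suggest using stratify=y
print("Class 1 share in train:", round(float(y_train.mean()), 3))
print("Class 1 share in test:", round(float(y_test.mean()), 3))
```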

Step 3: Create an instance of DecisionTreeClassifier and fit the model to the training data

In [3]:
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
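
With no arguments, `DecisionTreeClassifier` uses `criterion="gini"`, so each split is chosen to reduce Gini impurity. The sketch below computes that quantity by hand on a toy label list; the helper names `gini_impurity` and `weighted_gini` and the example labels are hypothetical and only illustrate the formula, not scikit-learn's internal implementation.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split, weighting each child by its size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Toy labels (hypothetical, not taken from the breast cancer data)
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print("Parent impurity:", gini_impurity(parent))
print("Split impurity:", weighted_gini(left, right))
```

A candidate split is attractive when the weighted impurity of the children is lower than the impurity of the parent node.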

Step 4: Make predictions on the test set

In [4]:
y_pred = clf.predict(X_test)

Step 5: Evaluate the model's performance

In [5]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9473684210526315
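
Accuracy alone does not show which class the errors fall in. The sketch below rebuilds the same pipeline and prints a confusion matrix alongside the accuracy. Note that `random_state=0` on the classifier is an assumption added here so repeated runs produce the same tree; the original cell does not set it, so its numbers may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# random_state=0 is an assumption so the fitted tree is repeatable
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy equals the diagonal (correct predictions) divided by the total
print("Accuracy from matrix:", cm.trace() / cm.sum())
```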


You can visualize the decision-making process of your model using the code below. It requires both the `graphviz` Python package and the Graphviz system binaries to be installed:

In [7]:
import graphviz
from sklearn.tree import export_graphviz

dot_data = export_graphviz(clf, out_file=None,
                           feature_names=data.feature_names,
                           class_names=data.target_names,
                           filled=True, rounded=True)

graph = graphviz.Source(dot_data)

# Save and open the decision tree as a PDF or image
graph.render("decision_tree", view=True)




'decision_tree.pdf'
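
If Graphviz is not available, the learned rules can also be printed as plain text with `sklearn.tree.export_text`. This is a minimal sketch: the `max_depth=2` and `random_state=0` settings and the name `small_clf` are assumptions chosen to keep the printout short and repeatable, and the tree is fitted on the full dataset purely for display.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# Shallow tree (assumed settings) so the printed rules stay readable
small_clf = DecisionTreeClassifier(max_depth=2, random_state=0)
small_clf.fit(data.data, data.target)

# export_text returns the tree's rules as an indented plain-text string
rules = export_text(small_clf, feature_names=list(data.feature_names))
print(rules)
```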

### ID3

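scikit-learn's `DecisionTreeClassifier` implements an optimized version of CART and has no separate ID3 mode. Setting the criterion to "entropy" makes it choose splits by information gain, the measure ID3 uses, although the tree still makes binary splits. The model is trained as shown below: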

In [8]:
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9473684210526315
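
With the entropy criterion, a split is scored by how much it reduces entropy, which is the information gain. The sketch below works this out by hand on toy labels; the helper names `entropy` and `information_gain` and the data are hypothetical and only illustrate the formulas.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: sum(p_k * log2(1 / p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum((len(ch) / n) * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Toy labels (hypothetical, not taken from the breast cancer data)
parent = [0, 0, 0, 0, 1, 1, 1, 1]
imperfect_split = [[0, 0, 0, 1], [0, 1, 1, 1]]
perfect_split = [[0, 0, 0, 0], [1, 1, 1, 1]]

print("Parent entropy:", entropy(parent))
print("Gain of imperfect split:", information_gain(parent, imperfect_split))
print("Gain of perfect split:", information_gain(parent, perfect_split))
```

The perfect split removes all uncertainty, so its gain equals the parent entropy, while the imperfect split recovers only part of it.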


Visualize the entropy-based tree:

In [9]:
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=data.feature_names,
                                class_names=data.target_names,
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)

graph.render("decision_tree_id3", view=True)

'decision_tree_id3.pdf'

### C4.5

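scikit-learn does not implement C4.5, which selects splits by gain ratio rather than raw information gain. The closest built-in approximation is again the "entropy" criterion. Note that `splitter='best'` is already the default, so the model below is configured identically to the one in the ID3 section: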

In [10]:
clf = DecisionTreeClassifier(criterion='entropy', splitter='best')
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.9473684210526315
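
C4.5 ranks candidate splits by gain ratio, which divides the information gain by the entropy of the partition sizes (the split information). That measure is not available as a `criterion` in `DecisionTreeClassifier`, so the sketch below computes it by hand on toy labels; all helper names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum((len(ch) / n) * entropy(ch) for ch in children)

def split_information(children):
    """Entropy of the partition sizes; grows as a split has more branches."""
    n = sum(len(ch) for ch in children)
    return sum((len(ch) / n) * math.log2(n / len(ch)) for ch in children if ch)

def gain_ratio(parent, children):
    """Information gain normalized by split information (0 if undefined)."""
    si = split_information(children)
    return information_gain(parent, children) / si if si > 0 else 0.0

# Toy labels (hypothetical): both splits separate the classes perfectly
parent = [0, 0, 0, 0, 1, 1, 1, 1]
two_way = [[0, 0, 0, 0], [1, 1, 1, 1]]
eight_way = [[label] for label in parent]

print("Two-way gain ratio:", gain_ratio(parent, two_way))
print("Eight-way gain ratio:", gain_ratio(parent, eight_way))
```

Both splits have the same information gain, but the eight-way split is penalized for fragmenting the data into many branches, which is the bias gain ratio is meant to correct.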


Visualize the resulting tree:

In [13]:
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=data.feature_names,
                                class_names=data.target_names,
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)

graph.render("decision_tree_c4.5", view=True)

'decision_tree_c4.5.pdf'