# How does the Decision Tree algorithm work?
A decision tree predicts the class of a record by testing its attribute values one node at a time. Starting at the tree's root node, the algorithm compares the value of the root attribute with the corresponding attribute of the record, then follows the matching branch to the next node.

The algorithm repeats this comparison at every subsequent node, following the matching branch each time, until it reaches a leaf node of the tree. The steps below outline how the tree itself is built.



1. Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
2. Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
3. Step-3: Divide S into subsets, one for each possible value of the best attribute.
4. Step-4: Generate the decision tree node that contains the best attribute.
5. Step-5: Recursively build new decision trees from the subsets of the dataset created in Step-3. Continue until the nodes cannot be classified any further; each such final node is called a leaf node.
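
The five steps can be sketched in plain Python. This is only an illustrative, ID3-style sketch that uses information gain as the ASM and creates one branch per attribute value; scikit-learn's implementation, used later in this notebook, works differently (it makes binary splits on numeric thresholds). The function names and the toy data are hypothetical and are not part of the notebook's code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute with the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Steps 1-5: recursively grow a tree; a leaf is a plain class label."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]       # leaf node
    attr = best_attribute(rows, labels, attributes)       # Step 2
    node = {"attribute": attr, "branches": {}}            # Step 4
    remaining = [a for a in attributes if a != attr]
    for value in set(row[attr] for row in rows):          # Step 3
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[attr] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [lab for _, lab in pairs]
        node["branches"][value] = build_tree(sub_rows, sub_labels, remaining)  # Step 5
    return node


def classify(node, record):
    """Walk from the root to a leaf, following the branch that matches the record.

    Raises KeyError if the record holds a value that was never seen in training.
    """
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


# Hypothetical toy data, not taken from tennis.txt.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play"]

toy_tree = build_tree(rows, labels, ["outlook", "windy"])
print(classify(toy_tree, {"outlook": "rain", "windy": "yes"}))
```

The `classify` function mirrors the traversal described at the start of this section: it compares one attribute per node and stops at the first leaf it reaches.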

# Steps of the code :

1. Installing the necessary libraries
2. Importing the needed libraries
3. Generating a decision tree
4. Visualizing the decision tree
5. Vectorizing the categorical data
6. Reading the data from the txt file and running the whole code



# Explanation of every part in the code 🐾 :

## 1_ Block 1 & 2 of code :
The code installs the pydotplus and scikit-learn packages, then imports pandas, numpy, the tree module from scikit-learn, and pydotplus.

1. pydotplus : a Python interface to Graphviz, a graph visualization tool. It is often used together with scikit-learn to render decision trees as diagrams. Writing image or PDF output also requires the Graphviz binaries to be installed on the system.
2. scikit-learn : a widely used Python package for machine learning and data analysis.
3. Pandas : a library for data manipulation and analysis in Python.
4. NumPy : a fundamental library for numerical and array operations in Python.
5. Tree : the tree module of scikit-learn (sklearn.tree) provides decision tree models. You can use it to create, train, and export decision trees for visualization.


In [None]:
!pip install pydotplus
!pip install scikit-learn



In [None]:
import pandas as pd # dataframe
import numpy as np # array
from sklearn import tree
import pydotplus

## 3_ Generate a decision tree :
In summary, this code defines a function that takes a training dataset as input, extracts features and labels from it, creates a decision tree classifier, trains the classifier on the data, and returns the trained classifier. It's a basic implementation of a decision tree classifier using scikit-learn.

In [None]:
def createTree(trainingData):
    # Feature matrix: all rows and every column except the last one,
    # which is assumed to be the label column.
    X_train = trainingData.iloc[:, :-1]

    # Labels: all rows of the last column only.
    labels = trainingData.iloc[:, -1]

    # Configure the classifier to use the "entropy" criterion as the attribute selection measure (ASM).
    # Entropy measures the impurity of a node and is used to decide how to split it.
    trainedTree = tree.DecisionTreeClassifier(criterion="entropy")

    # This is where the actual training of the decision tree model takes place.
    # The model learns to make decisions from the features and labels extracted above.
    trainedTree.fit(X_train, labels)  # Train the model.
    return trainedTree
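
As background for the "entropy" criterion used above, the standard definitions of entropy and information gain are written out below, with the base-2 logarithm assumed as the convention. Here $p_k$ is the fraction of samples in node $S$ that belong to class $k$, and $S_v$ is the subset of $S$ sent down branch $v$ of a candidate split $A$. The worked value uses the label column of the encoded data printed in the last section, which contains 9 ones and 5 zeros.

```latex
\begin{aligned}
H(S) &= -\sum_{k} p_k \log_2 p_k \\
\mathrm{Gain}(S, A) &= H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v) \\
H(S_{\mathrm{root}}) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
\end{aligned}
```

A pure node has entropy 0, so a split with a high gain moves the child nodes closer to containing a single class.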

## 4_ Visualizing the decision tree :
In summary, this code defines a function that takes a trained decision tree classifier, exports it in Graphviz's DOT format, and saves the resulting tree diagram as a PDF file under the given filename.

In [None]:
# The function takes two arguments: trainedTree (the trained decision tree classifier) and
# filename (the name of the PDF file where the tree diagram will be saved).
def showtree2pdf(trainedTree, filename):

    # export_graphviz exports a decision tree in DOT format, which Graphviz can render.
    # out_file=None means the DOT data is not written to a file;
    # it is returned as a string and stored in dot_data instead.
    dot_data = tree.export_graphviz(trainedTree, out_file=None)

    # pydotplus.graph_from_dot_data builds a graph object from the DOT string.
    # This graph represents the decision tree structure.
    graph = pydotplus.graph_from_dot_data(dot_data)

    # Save the tree diagram to the local machine in PDF format.
    graph.write_pdf(filename)
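
When Graphviz is not available, a plain-text view of the tree can serve as a quick check. The sketch below uses `tree.export_text`, which is provided by recent scikit-learn versions; the tiny dataset and the feature names `f0` and `f1` are hypothetical and only serve to make the example self-contained.

```python
from sklearn import tree

# Hypothetical toy data: two integer-encoded features and a binary label.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# export_text returns the tree as an indented string, one line per node.
text_view = tree.export_text(clf, feature_names=["f0", "f1"])
print(text_view)
```

The same idea applies to the tree returned by createTree: passing it to `tree.export_text` prints its splits without producing a PDF.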

## 5_ Vectorizing the categorical data :
In summary, this function takes a Pandas DataFrame containing categorical data, converts each column into integer codes, and returns the modified DataFrame. This transformation is often necessary because many machine learning algorithms require numerical input. Note that the code encodes every column, including the label column, so it assumes that all columns are categorical. It also modifies the DataFrame that is passed in rather than working on a copy.


In [None]:
def data2vectoc(data):
    # Get the names of all columns in the DataFrame.
    names = data.columns

    for i in names:
        # Create a Pandas Categorical object from the values of the current column.
        # Categorical data represents a fixed set of discrete values.
        col = pd.Categorical(data[i])

        # Replace the values of the current column with the integer codes of their categories.
        # The .codes attribute holds one integer code per row.
        data[i] = col.codes
    return data
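
To see which integer stands for which category, it helps to look at a single column in isolation. The sketch below assumes a hypothetical column of string values (not read from tennis.txt) and shows that `pd.Categorical` infers the categories in sorted order, so the codes follow alphabetical order rather than order of appearance.

```python
import pandas as pd

# Hypothetical sample column; the values are illustrative only.
outlook = pd.Series(["Sunny", "Overcast", "Rain", "Sunny"])

col = pd.Categorical(outlook)
print(list(col.categories))   # inferred categories, sorted
print(col.codes.tolist())     # one integer code per row

# Map each integer code back to its original category.
code_to_category = dict(enumerate(col.categories))
print(code_to_category)
```

Keeping a mapping like `code_to_category` for each column makes it possible to read the splits of the trained tree in terms of the original values.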

## 6_ Reading the data from the txt file and running the whole code :


In [None]:
# pd.read_table reads data from a text file whose columns are separated by a tab character ('\t').
# header=None indicates that there is no header row in the file, so the columns are indexed numerically.
data = pd.read_table("tennis.txt", header=None, sep='\t')

# Apply vectorization to the data.
trainingvec = data2vectoc(data)

# Print the data to confirm that the categorical values were converted to numbers
# (data2vectoc modifies data in place, so data and trainingvec hold the same values).
print(data)

# Create the tree.
decisionTree = createTree(trainingvec)

# Save the decision tree diagram as tennis.pdf; the file is written to the current working directory.
showtree2pdf(decisionTree, "tennis.pdf")

    0  1  2  3  4
0   2  1  0  1  0
1   2  1  0  0  0
2   0  1  0  1  1
3   1  2  0  1  1
4   1  0  1  1  1
5   1  0  1  0  0
6   0  0  1  0  1
7   2  2  0  1  0
8   2  0  1  1  1
9   1  2  1  1  1
10  2  2  1  0  1
11  0  2  0  0  1
12  0  1  1  1  1
13  1  2  0  0  0
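
Once the tree is trained, it can be queried with `predict`. The sketch below is self-contained: it copies the encoded rows from the printed output above instead of reading tennis.txt, treats the last value in each row as the label, and fixes `random_state` only to make the run repeatable. The variable names are hypothetical.

```python
from sklearn import tree

# Encoded rows copied from the printed output; the last value in each row is the label.
encoded = [
    [2, 1, 0, 1, 0], [2, 1, 0, 0, 0], [0, 1, 0, 1, 1], [1, 2, 0, 1, 1],
    [1, 0, 1, 1, 1], [1, 0, 1, 0, 0], [0, 0, 1, 0, 1], [2, 2, 0, 1, 0],
    [2, 0, 1, 1, 1], [1, 2, 1, 1, 1], [2, 2, 1, 0, 1], [0, 2, 0, 0, 1],
    [0, 1, 1, 1, 1], [1, 2, 0, 0, 0],
]
X = [row[:-1] for row in encoded]   # features only, label column excluded
y = [row[-1] for row in encoded]    # label column

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Accuracy on the training rows themselves; this says nothing about unseen data.
train_accuracy = clf.score(X, y)

# Predict the label for the first training row.
prediction = clf.predict([X[0]])[0]
print(train_accuracy, prediction)
```

Because the tree is grown without a depth limit and no two rows share the same features with different labels, it fits the training rows exactly; a separate test set would be needed to estimate how well it generalizes.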
