<a href='https://www.darshan.ac.in/'> <img src='https://www.darshan.ac.in/Content/media/DU_Logo.svg' width="250" height="300"/></a>
<pre>
<center><h1><b>Data Mining</b></h1></center>
</pre>



# Implement a Decision Tree (ID3) in Python
ID3 uses information gain to choose the best feature to split on at each node.

It builds the tree recursively until a stopping condition is met.

1) Calculate the entropy of the dataset.<BR>
2) Calculate the information gain for each feature.<BR>
3) Choose the feature with the maximum information gain.<BR>
4) Split the dataset into subsets, one per value of that feature.<BR>
5) Repeat recursively on each subset until one of the following holds:<BR>

All samples in a node have the same label.<BR>
No features are left.<BR>
No data is left.
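
For reference, the two quantities in steps 1 and 2 are commonly written as below. The notation is chosen here for illustration and is not taken from the notebook: $S$ is the set of samples at a node, $p_i$ is the fraction of samples in $S$ with label $i$, and $S_v$ is the subset of $S$ in which feature $A$ takes the value $v$.

$$H(S) = -\sum_{i} p_i \log_2 p_i$$

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)$$

A node whose samples all share one label has $H(S) = 0$, and a feature whose subsets are all pure achieves the largest possible gain, $IG(S, A) = H(S)$.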


## Import pandas and NumPy

In [34]:
import pandas as pd
import numpy as np

## Load the Following Data

In [39]:
data = pd.read_csv("heart.csv", encoding='latin1', on_bad_lines='skip')     

In [41]:
data

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
1,53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
2,70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
3,61,1,0,148,203,0,1,161,0,0.0,2,1,3,0
4,62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1020,59,1,1,140,221,0,1,164,1,0.0,2,0,2,1
1021,60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
1022,47,1,0,110,275,0,0,118,1,1.0,1,1,2,0
1023,50,0,0,110,254,0,0,159,0,0.0,2,0,2,1


## Define a Function to Calculate Entropy

In [44]:
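def entropy(y):
    # Count how often each distinct label occurs in y
    _, counts = np.unique(y, return_counts=True)
    # Convert the counts to class probabilities
    probabilities = counts / counts.sum()
    # Shannon entropy in bits (returned directly to avoid shadowing the function name)
    return -np.sum(probabilities * np.log2(probabilities))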
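
As a quick sanity check, the entropy function above can be tried on a few hand-made label lists. This is a minimal sketch: the toy labels are hypothetical and not taken from `heart.csv`, and the function is repeated so that the cell runs on its own.

```python
import numpy as np

def entropy(y):
    # Same computation as the notebook's entropy function
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / counts.sum()
    return -np.sum(probabilities * np.log2(probabilities))

# Hypothetical toy labels, chosen only to illustrate the extremes
balanced = entropy([0, 0, 1, 1])   # two equally likely classes
pure = entropy([1, 1, 1, 1])       # a single class
skewed = entropy([0, 1, 1, 1])     # somewhere in between

print(balanced, pure, skewed)
```

A balanced two-class list gives one bit of entropy, a pure list gives zero, and a skewed list falls strictly between the two.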
  

## Define a Function to Calculate Information Gain

In [47]:
def information_gain(data, split_attribute, target):
    # Entropy of the target before splitting
    total_entropy = entropy(data[target])

    # Distinct values of the candidate feature and how often each occurs
    elements, value_counts = np.unique(data[split_attribute], return_counts=True)

    # Entropy of each subset, weighted by the subset's share of the rows
    weighted_entropy = 0.0
    for element, count in zip(elements, value_counts):
        subset = data[data[split_attribute] == element]
        weighted_entropy += (count / value_counts.sum()) * entropy(subset[target])

    # The gain is the reduction in entropy achieved by the split
    return total_entropy - weighted_entropy
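
To see how information gain separates a useful feature from a useless one, here is a small self-contained sketch. The four-row DataFrame is hypothetical (it borrows the column names `exang`, `sex`, and `target` but not the real values), and both helper functions are repeated so the cell runs independently.

```python
import numpy as np
import pandas as pd

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / counts.sum()
    return -np.sum(probabilities * np.log2(probabilities))

def information_gain(data, split_attribute, target):
    total_entropy = entropy(data[target])
    elements, value_counts = np.unique(data[split_attribute], return_counts=True)
    weighted_entropy = 0.0
    for element, count in zip(elements, value_counts):
        subset = data[data[split_attribute] == element]
        weighted_entropy += (count / value_counts.sum()) * entropy(subset[target])
    return total_entropy - weighted_entropy

# Hypothetical toy data: `exang` determines the target, `sex` is unrelated to it
toy = pd.DataFrame({
    "exang":  [0, 0, 1, 1],
    "sex":    [0, 1, 0, 1],
    "target": [1, 1, 0, 0],
})

gain_exang = information_gain(toy, "exang", "target")
gain_sex = information_gain(toy, "sex", "target")
print(gain_exang, gain_sex)
```

Splitting on `exang` leaves two pure subsets, so its gain equals the full entropy of the target (one bit). Splitting on `sex` leaves both subsets as mixed as before, so its gain is zero.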
   

## Implement the ID3 Algorithm

In [50]:
def id3(data, features, target):
    # If all labels are the same → return that label
    if len(np.unique(data[target])) == 1:
        return data[target].iloc[0]

    # If no features are left → return the majority label
    if len(features) == 0:
        return data[target].mode()[0]

    # Choose the feature with the highest information gain
    gains = [information_gain(data, feature, target) for feature in features]
    best_feature = features[np.argmax(gains)]

    tree = {best_feature: {}}
    remaining_features = [f for f in features if f != best_feature]

    # Create one branch per value of the best feature. np.unique only yields
    # values present in data, so sub_data is never empty here.
    for value in np.unique(data[best_feature]):
        sub_data = data[data[best_feature] == value].drop(columns=[best_feature])
        tree[best_feature][value] = id3(sub_data, remaining_features, target)

    # Return the tree
    return tree
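
The nested dictionary returned by `id3` can be used for classification by following one branch per feature until a leaf is reached. The sketch below is one possible way to do that. The `predict` helper and its `default` parameter are hypothetical additions, not part of the notebook, and `toy_tree` is a hand-written tree in the same shape that `id3` produces.

```python
def predict(tree, sample, default=None):
    # Walk down the tree until a leaf (any non-dict value) is reached
    while isinstance(tree, dict):
        feature = next(iter(tree))   # each internal node has exactly one key
        branches = tree[feature]
        value = sample[feature]
        if value not in branches:    # value never seen while building the tree
            return default
        tree = branches[value]
    return tree

# Hand-written tree in the {feature: {value: subtree_or_label}} shape
toy_tree = {"exang": {0: 1, 1: {"sex": {0: 1, 1: 0}}}}

print(predict(toy_tree, {"exang": 0, "sex": 1}))   # → 1
print(predict(toy_tree, {"exang": 1, "sex": 1}))   # → 0
print(predict(toy_tree, {"exang": 2, "sex": 0}))   # → None (unseen value)
```

Returning a default for unseen values is a design choice made for this sketch. Another option would be to fall back to the majority label of the training data.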

## Use ID3

In [53]:
id3_tree = id3(data, data.columns[:-1], 'target')

## Print Tree

In [55]:
# print("ID3 Decision Tree: ", id3_tree)
id3_tree

{'chol': {126: 1,
  131: 0,
  141: 1,
  149: {'age': {49: 0, 71: 1}},
  157: 1,
  160: 1,
  164: 0,
  166: 0,
  167: 0,
  168: 1,
  169: 0,
  172: 0,
  174: 0,
  175: 1,
  176: 0,
  177: {'age': {43: 0, 46: 1, 59: 0, 65: 1}},
  178: 1,
  180: 1,
  182: 1,
  183: 1,
  184: 0,
  185: 0,
  186: 1,
  187: 0,
  188: 0,
  192: 1,
  193: {'age': {56: 1, 68: 0}},
  195: 1,
  196: 1,
  197: {'age': {44: 0, 46: 1, 53: 1, 58: 1, 63: 0, 76: 1}},
  198: {'age': {35: 0, 41: 1}},
  199: 1,
  200: 0,
  201: 1,
  203: {'age': {41: 1, 53: 0, 61: 0}},
  204: {'age': {29: 1, 41: 1, 46: 1, 47: 1, 52: 0, 59: 0}},
  205: {'age': {52: 1, 55: 0}},
  206: 0,
  207: {'age': {57: 1, 61: 0}},
  208: 1,
  209: 1,
  210: 1,
  211: 1,
  212: {'age': {52: 0, 59: 1, 64: 0, 66: 0, 67: 0}},
  213: 1,
  214: 1,
  215: 1,
  216: {'age': {53: 1, 58: 0}},
  217: 0,
  218: 0,
  219: {'age': {39: 0, 44: 1, 50: 1}},
  220: 1,
  221: 1,
  222: 1,
  223: {'age': {40: 0, 52: 1, 67: 1}},
  224: 0,
  225: 0,
  226: 1,
  227: 1,
  22