# Decision Tree Implementation

Your task is to implement parts of the Decision Tree Classification algorithm from scratch (i.e., without importing any libraries or packages for decision trees). A decision tree is a supervised learning model, used here for classification. Building and applying one involves the following steps:

1. Split the Dataset: Generate candidate splits for each feature of the dataset. Each split uses a threshold value on one feature to separate the data into two groups.

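2. Calculate the Split Metric: For each split, calculate an evaluation metric such as Gini Impurity to determine the best split. The Gini Impurity of a group is the probability that a randomly chosen element would be labeled incorrectly if it were given a label drawn at random from that group's label distribution. A split is scored by the size-weighted average of its groups' impurities, so lower scores mean purer groups. A Gini Impurity of 0 represents perfect purity (every element has the same class), while higher values indicate more mixed groups (the maximum for two classes is 0.5).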

3. Build the Tree Recursively: Use a recursive function to create nodes, splitting the data until a stopping criterion is met (e.g., maximum depth of tree, minimum number of samples, or no more possible splits).

4. Classify New Data: Use the resulting decision tree to classify a new data point based on the feature values and decision rules at each node.
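To make step 2 concrete, here is a small worked example. For one group, Gini = 1 - sum(p_k^2) over the class proportions p_k; a split's score weights each group's Gini by its share of the samples. The helpers `gini_of_group` and `weighted_gini` are hypothetical and take plain label lists rather than full rows, so this is an illustration of the metric, not a replacement for the skeleton's `calculate_gini`.

```python
from collections import Counter


def gini_of_group(labels):
    """Gini impurity of one group: 1 minus the sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Weight each group's impurity by its share of all samples in the split."""
    total = sum(len(g) for g in groups)
    return sum(gini_of_group(g) * len(g) / total for g in groups if g)


# Hypothetical split: the left group holds labels [0, 0, 1], the right group [1, 1].
left, right = [0, 0, 1], [1, 1]
print(round(gini_of_group(left), 4))           # 1 - (4/9 + 1/9) = 4/9 -> 0.4444
print(gini_of_group(right))                    # pure group -> 0.0
print(round(weighted_gini([left, right]), 4))  # (3/5) * (4/9) = 4/15 -> 0.2667
```

A perfectly mixed two-class group such as `[0, 1]` reaches the two-class maximum of 0.5.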

You will be given train_data, a 2D list of training samples in which each inner list is one sample: the leading elements are float feature values and the last element is the true class label (e.g., 0 or 1). You will also be given test_data, whose rows contain only feature values; classify each of them using the decision tree you build.

train_data = [[2.8, 1.0, 0],
              [1.3, 3.1, 1],
              [3.6, 2.7, 0],
              [2.9, 1.9, 1],
              [1.5, 0.9, 0],
              [3.7, 1.5, 1]]


test_data = [[3.0, 1.0],
             [1.8, 2.5],
             [2.2, 1.7]]
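As an illustration of step 1 on this training data, the snippet below enumerates candidate splits and evaluates one of them. It assumes a common convention that the handout does not fix: every observed value of a feature is tried as a threshold, and rows whose value is strictly below the threshold go to the left group.

```python
# Training rows copied from the data cell: [feature_0, feature_1, label]
train_data = [[2.8, 1.0, 0], [1.3, 3.1, 1], [3.6, 2.7, 0],
              [2.9, 1.9, 1], [1.5, 0.9, 0], [3.7, 1.5, 1]]

n_features = len(train_data[0]) - 1  # the last element of each row is the label

# Assumed convention: each observed feature value is a candidate threshold,
# and rows whose value is strictly below it go to the left group.
candidates = [(index, row[index]) for index in range(n_features) for row in train_data]
print(len(candidates))  # 2 features x 6 rows = 12 candidate splits

# One candidate: feature 0 with threshold 2.9
left = [row for row in train_data if row[0] < 2.9]
right = [row for row in train_data if row[0] >= 2.9]
print([row[-1] for row in left], [row[-1] for row in right])  # [0, 1, 0] [0, 1, 1]
```

Both groups still mix the two classes, so this candidate scores poorly in step 2: its weighted Gini is 4/9, barely below the 0.5 of the unsplit data.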


# skeleton code

# Step 1: Split dataset based on a feature and a threshold
def split_dataset(data, feature_index, threshold):
    # implement this
    pass

# Step 2: Calculate the weighted Gini Impurity of a split
# Note: `classes` is not used below; each group's labels are read from the data itself
def calculate_gini(groups, classes):
    # Total number of samples
    n_samples = sum([len(group) for group in groups])
    
    # Gini calculation for each group
    gini = 0.0
    for group in groups:
        size = len(group)
        if size == 0:
            continue
        score = 0.0
        
        # Using set to find unique class labels in the group
        unique_labels = set(row[-1] for row in group)
        
        # Calculate the proportion for each unique label
        for label in unique_labels:
            count = sum(1 for row in group if row[-1] == label)
            p = count / size
            score += p ** 2
        
        # Calculate weighted Gini for each group
        gini += (1.0 - score) * (size / n_samples)
    
    return gini


# Step 3: Select the best split point
def get_best_split(data):
    # implement this
    pass

# Step 4: Build the tree
def build_tree(data, max_depth, min_size, depth=0):
    # implement this
    pass

# Step 5: Make a prediction
def predict(node, row):
    # implement this
    pass

# Step 6: Decision Tree Classifier
def decision_tree(train_data, test_data, max_depth, min_size):
    # Build the tree
    tree = build_tree(train_data, max_depth, min_size)
    
    # Make predictions
    predictions = []
    for row in test_data:
        prediction = predict(tree, row)
        predictions.append(prediction)
    
    return predictions
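For reference, here is one possible way to fill in the remaining stubs. It is a sketch under assumptions the skeleton does not state: rows with `row[index] < value` go left; every observed feature value is a candidate threshold; internal nodes are dicts with hypothetical keys `index`, `value`, `left` and `right`, while leaves are bare class labels; `min_size` is read as the minimum number of rows a node needs before it may be split; and `to_terminal` is a hypothetical helper returning the majority class. `calculate_gini` is repeated compactly so the block runs on its own.

```python
# Sketch of the stubs. Assumptions: rows with row[index] < value go left,
# observed feature values are the candidate thresholds, internal nodes are
# dicts, and leaves are bare class labels.


def calculate_gini(groups, classes):
    # Compact copy of the skeleton's weighted Gini so this block runs on its own.
    n_samples = sum(len(group) for group in groups)
    gini = 0.0
    for group in groups:
        if not group:
            continue
        labels = [row[-1] for row in group]
        score = sum((labels.count(c) / len(group)) ** 2 for c in set(labels))
        gini += (1.0 - score) * (len(group) / n_samples)
    return gini


def split_dataset(data, feature_index, threshold):
    left = [row for row in data if row[feature_index] < threshold]
    right = [row for row in data if row[feature_index] >= threshold]
    return left, right


def to_terminal(group):
    # Hypothetical helper: a leaf predicts the majority class (ties -> smaller label).
    labels = [row[-1] for row in group]
    return max(sorted(set(labels)), key=labels.count)


def get_best_split(data):
    classes = sorted(set(row[-1] for row in data))
    best = None
    for index in range(len(data[0]) - 1):  # the last column is the label
        for row in data:
            groups = split_dataset(data, index, row[index])
            if not groups[0] or not groups[1]:
                continue  # a split with an empty side separates nothing
            gini = calculate_gini(groups, classes)
            if best is None or gini < best["gini"]:
                best = {"index": index, "value": row[index], "gini": gini, "groups": groups}
    return best  # None when no threshold separates the rows


def build_tree(data, max_depth, min_size, depth=0):
    # Leaf if the node is pure, at the depth limit, or has fewer than min_size rows.
    is_pure = len({row[-1] for row in data}) == 1
    if is_pure or depth >= max_depth or len(data) < min_size:
        return to_terminal(data)
    split = get_best_split(data)
    if split is None:  # e.g. every row has identical feature values
        return to_terminal(data)
    left, right = split["groups"]
    return {
        "index": split["index"],
        "value": split["value"],
        "left": build_tree(left, max_depth, min_size, depth + 1),
        "right": build_tree(right, max_depth, min_size, depth + 1),
    }


def predict(node, row):
    # Follow the decision rules until reaching a leaf (a bare class label).
    while isinstance(node, dict):
        node = node["left"] if row[node["index"]] < node["value"] else node["right"]
    return node


def decision_tree(train_data, test_data, max_depth, min_size):
    tree = build_tree(train_data, max_depth, min_size)
    return [predict(tree, row) for row in test_data]


train_data = [[2.8, 1.0, 0], [1.3, 3.1, 1], [3.6, 2.7, 0],
              [2.9, 1.9, 1], [1.5, 0.9, 0], [3.7, 1.5, 1]]
test_data = [[3.0, 1.0], [1.8, 2.5], [2.2, 1.7]]
print(decision_tree(train_data, test_data, max_depth=3, min_size=1))  # [0, 1, 1]
```

On this data the sketch splits the root on feature 1 at 1.5 (weighted Gini 0.25), whose left side is already pure, and the three test rows are classified as `[0, 1, 1]`. Skipping candidate splits with an empty side keeps `get_best_split` from returning a "split" that would leave one child node with no rows.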
