# Divide Dataset Based on Feature Threshold

Write a Python function to divide a dataset based on whether the value of a specified feature is greater than or equal to a given threshold. The function should return two subsets of the dataset: one with samples that meet the condition and another with samples that do not.

Example:
```python
X = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8],
              [9, 10]])
feature_i = 0
threshold = 5

# output:
# [array([[ 5,  6],
#         [ 7,  8],
#         [ 9, 10]]),
#  array([[1, 2],
#         [3, 4]])]
```

Reasoning: The dataset X is divided based on whether the value in the 0th feature (first column) is greater than or equal to 5. Samples with a first-column value >= 5 go in the first subset, and the rest go in the second subset.

## Understanding Dataset Division Based on Feature Threshold

Dividing a dataset based on a feature threshold is a common operation in machine learning, especially in decision tree algorithms. Each split partitions the samples into two groups, which can then be processed or modeled separately.

In this problem, you will write a function to split a dataset based on whether the value of a specified feature is greater than or equal to a given threshold. You'll need to create two subsets: one for samples that meet the condition and another for samples that do not.
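For numeric 2-D arrays, the same split can also be expressed with a boolean mask instead of a per-row test. The sketch below is one possible variant, not the reference solution; the name `divide_on_feature_masked` is hypothetical, and it assumes `X` is a 2-D array whose selected column supports `>=`.

```python
import numpy as np

def divide_on_feature_masked(X, feature_i, threshold):
    """Hypothetical vectorized variant: split the rows of a 2-D array with a boolean mask."""
    X = np.asarray(X)
    # True for rows whose selected feature meets the condition
    mask = X[:, feature_i] >= threshold
    # Rows that meet the condition first, remaining rows second
    return [X[mask], X[~mask]]

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
meets, rest = divide_on_feature_masked(X, feature_i=0, threshold=5)
print(meets)
print(rest)
```

With a mask, a subset that ends up empty still keeps its column dimension (shape `(0, 2)` here), which can make downstream code simpler.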

Algorithms that rely on data partitioning, such as decision trees and random forests, apply this kind of split repeatedly. Each split acts as a rule of the form "feature value >= threshold", and the model makes predictions according to which side of each rule a sample falls on.
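To illustrate how such a rule might be chosen, the sketch below tries every observed value of one feature as a threshold and scores each split by weighted Gini impurity. This is only an illustration under stated assumptions: the labels `y` are made up, and the helper names `gini` and `best_threshold` are hypothetical rather than part of this problem.

```python
import numpy as np

def gini(y):
    """Gini impurity of a 1-D label array (0.0 for an empty or pure array)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(X, y, feature_i):
    """Try each observed feature value as a threshold; return (threshold, impurity) of the best split."""
    best = (None, float("inf"))
    for t in np.unique(X[:, feature_i]):
        mask = X[:, feature_i] >= t
        # Impurity of both sides, weighted by how many samples land on each side
        score = (mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])) / len(y)
        if score < best[1]:
            best = (t, score)
    return best

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([0, 0, 1, 1, 1])  # made-up labels, for illustration only
threshold, impurity = best_threshold(X, y, feature_i=0)
print(threshold, impurity)
```

For these made-up labels, a threshold of 5 on feature 0 separates the two classes completely, so the search selects it with an impurity of zero.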

In [1]:
import numpy as np

def divide_on_feature(X, feature_i, threshold):
    """Split the rows of X into [rows that meet the condition, remaining rows]."""
    # Define the split function based on the threshold type
    if isinstance(threshold, (int, float, np.integer, np.floating)):
        # For a numeric threshold (including NumPy scalars), check if the feature value is >= the threshold
        def split_func(sample):
            return sample[feature_i] >= threshold
    else:
        # For a non-numeric threshold, check if the feature value is equal to the threshold
        def split_func(sample):
            return sample[feature_i] == threshold

    # Create two subsets based on the split function
    X_1 = np.array([sample for sample in X if split_func(sample)])
    X_2 = np.array([sample for sample in X if not split_func(sample)])

    # Return the two subsets
    return [X_1, X_2]

In [4]:
ans = divide_on_feature(np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]), 0, 5)
print('Test Case 1: Accepted') if np.array_equal(ans[0], np.array([[5, 6], [7, 8], [9, 10]])) and np.array_equal(ans[1], np.array([[1, 2], [3, 4]])) else print('Test Case 1: Rejected')
print('Input:')
print('print(divide_on_feature(np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]), 0, 5))')
print()
print('Output:')
print(ans)
print()
print('Expected:')
print('[array([[ 5,  6], [ 7,  8], [ 9, 10]]), array([[1, 2], [3, 4]])]')
print()
print()

ans = divide_on_feature(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]), 1, 3)
print('Test Case 2: Accepted') if np.array_equal(ans[0], np.array([[3, 3], [4, 4]])) and np.array_equal(ans[1], np.array([[1, 1], [2, 2]])) else print('Test Case 2: Rejected')
print('Input:')
print('print(divide_on_feature(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]), 1, 3))')
print()
print('Output:')
print(ans)
print()
print('Expected:')
print('[array([[3, 3], [4, 4]]), array([[1, 1], [2, 2]])]')

Test Case 1: Accepted
Input:
print(divide_on_feature(np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]), 0, 5))

Output:
[array([[ 5,  6],
       [ 7,  8],
       [ 9, 10]]), array([[1, 2],
       [3, 4]])]

Expected:
[array([[ 5,  6], [ 7,  8], [ 9, 10]]), array([[1, 2], [3, 4]])]


Test Case 2: Accepted
Input:
print(divide_on_feature(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]), 1, 3))

Output:
[array([[3, 3],
       [4, 4]]), array([[1, 1],
       [2, 2]])]

Expected:
[array([[3, 3], [4, 4]]), array([[1, 1], [2, 2]])]
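
Neither test case exercises the non-numeric branch of `divide_on_feature`, where the split uses equality instead of `>=`. The sketch below shows what that kind of split might look like on a small array of strings; `divide_on_category` is a hypothetical stand-in written with a boolean mask, and the data is invented for illustration.

```python
import numpy as np

def divide_on_category(X, feature_i, value):
    """Illustrative stand-in for the equality branch: rows whose feature equals `value` come first."""
    X = np.asarray(X)
    # Element-wise equality against the chosen category
    mask = X[:, feature_i] == value
    return [X[mask], X[~mask]]

X = np.array([["red", "S"], ["blue", "M"], ["red", "L"]])  # invented sample data
matches, others = divide_on_category(X, feature_i=0, value="red")
print(matches)
print(others)
```

Here the two "red" rows form the first subset and the single "blue" row forms the second.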
