# Implement Gini Impurity Calculation for a Set of Classes

Description:

## Task: Implement Gini Impurity Calculation

Your task is to implement a function that calculates the Gini Impurity for a set of classes. Gini impurity is commonly used in decision tree algorithms to measure the impurity or disorder within a node.

Write a function `gini_impurity(y)` that takes in a list of class labels `y` and returns the Gini Impurity rounded to three decimal places.

Example:

```python
Input:
y = [0, 1, 1, 1, 0]
print(gini_impurity(y))
Output:
0.48
Reasoning:
The Gini Impurity is calculated as 1 - (p_0^2 + p_1^2), where p_0 and p_1 are the probabilities of each class. In this case, p_0 = 2/5 and p_1 = 3/5, resulting in a Gini Impurity of 0.48.
```

```
Test Cases:
Test:
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(gini_impurity(y))
Expected Output:
0.5
Test:
y = [0, 0, 0, 0, 0, 1]
print(gini_impurity(y))
Expected Output:
0.278
```

## Understanding Gini Impurity

Gini impurity measures how mixed the class labels in a set of elements are. It is commonly used in decision tree algorithms to choose the split at each tree node. For a set of $n$ elements containing $C$ distinct classes, where class $i$ occurs $n_i$ times and therefore has probability $p_i = \frac{n_i}{n}$, it is calculated as follows:

$$Gini\ Impurity = 1 - \sum_{i=1}^{C} p_i^2$$
 
A Gini impurity of 0 indicates a node where all elements belong to the same class. Impurity is highest when elements are evenly distributed among the classes: the maximum is 0.5 for two classes and, in general, $1 - \frac{1}{C}$ for $C$ classes. A lower impurity therefore means a more homogeneous node, and decision trees prefer the split whose resulting child nodes have the lowest impurity.
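As a rough illustration of how impurity can guide a split, the sketch below compares two candidate splits of the same parent node by the size-weighted average impurity of their children. The helper names `gini` and `weighted_gini` are hypothetical and not part of the task; treating an empty node as pure is also an assumption.

```python
from collections import Counter

def gini(labels):
    """Unrounded Gini impurity of a sequence of hashable labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

parent = [0, 0, 0, 1, 1, 1]

# Candidate 1 separates the classes perfectly; candidate 2 leaves both children mixed.
clean_split = weighted_gini([0, 0, 0], [1, 1, 1])
mixed_split = weighted_gini([0, 1, 0], [1, 0, 1])

print(gini(parent), clean_split, mixed_split)
```

The clean split drives the weighted impurity to 0, while the mixed split only lowers it from 0.5 to about 0.444, so a tree builder that minimizes this quantity would pick the first candidate.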

## Advantages and Limitations

### Advantages:
- Computationally efficient
- Works for binary and multi-class classification

### Limitations:
- Biased toward larger classes
- May cause overfitting in deep decision trees
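To illustrate the multi-class point above, here is a minimal sketch that applies the same formula to string labels with two and three classes. The function name `gini_unrounded` and the example labels are made up for this illustration. For $C$ equally frequent classes the formula gives $1 - \frac{1}{C}$, so the three-class case exceeds 0.5.

```python
from collections import Counter

def gini_unrounded(labels):
    """Gini impurity of a non-empty sequence of hashable labels, without rounding."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

binary = ["cat", "dog"] * 3             # two classes, evenly distributed
three_way = ["cat", "dog", "bird"] * 2  # three classes, evenly distributed
pure = ["cat"] * 4                      # a single class

print(round(gini_unrounded(binary), 3))     # 0.5
print(round(gini_unrounded(three_way), 3))  # 0.667
print(round(gini_unrounded(pure), 3))       # 0.0
```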

## Example Calculation

Suppose we have the set: [0,1,1,1,0]. The probability of each class is calculated as follows:

$$p_0 = \frac{2}{5} = 0.4, \quad p_1 = \frac{3}{5} = 0.6$$
 
The Gini Impurity is then calculated as follows:

$$Gini\ Impurity = 1 - (0.4^2 + 0.6^2) = 1 - (0.16 + 0.36) = 1 - 0.52 = 0.48$$

In [1]:
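from collections import Counter

def gini_impurity(y):
    """
    Calculate Gini Impurity for a list of class labels.

    :param y: List of class labels (any hashable values)
    :return: Gini Impurity rounded to three decimal places
    """
    n = len(y)
    if n == 0:
        # Assumption: an empty list is treated as pure, avoiding division by zero.
        return 0.0
    # Count occurrences per label; this also works when labels are not 0..k-1.
    counts = Counter(y)
    return round(1 - sum((count / n) ** 2 for count in counts.values()), 3)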

In [2]:
y = [0, 0, 0, 0, 1, 1, 1, 1]
output = gini_impurity(y)
print('Test Case 1: Accepted' if output == 0.5 else 'Test Case 1: Failed')
print('Input:')
print('y = [0, 0, 0, 0, 1, 1, 1, 1]\nprint(gini_impurity(y))')
print()
print('Expected Output:')
print('0.5')
print('Actual Output:')
print(output)
print()
print()

y = [0, 0, 0, 0, 0, 1]
output = gini_impurity(y)
print('Test Case 2: Accepted' if output == 0.278 else 'Test Case 2: Failed')
print('Input:')
print('y = [0, 0, 0, 0, 0, 1]\nprint(gini_impurity(y))')
print()
print('Expected Output:')
print('0.278')
print('Actual Output:')
print(output)
print()
print()

y = [0, 1, 1, 1, 0]
output = gini_impurity(y)
print('Test Case 3: Accepted' if output == 0.48 else 'Test Case 3: Failed')
print('Input:')
print('y = [0, 1, 1, 1, 0]\nprint(gini_impurity(y))')
print()
print('Expected Output:')
print('0.48')
print('Actual Output:')
print(output)
print()

Test Case 1: Accepted
Input:
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(gini_impurity(y))

Expected Output:
0.5
Actual Output:
0.5


Test Case 2: Accepted
Input:
y = [0, 0, 0, 0, 0, 1]
print(gini_impurity(y))

Expected Output:
0.278
Actual Output:
0.278


Test Case 3: Accepted
Input:
y = [0, 1, 1, 1, 0]
print(gini_impurity(y))

Expected Output:
0.48
Actual Output:
0.48

