#### Main Objective: Perform McNemar's test on the results of all (a) binary classifiers and (b) multi-class classifiers
1. Binary Classifiers:
    a. Decision Tree
    b. Random Forest
    c. XGBoost
    d. Logistic Regression
    e. Gradient Boosting Classifier
    f. MLPClassifier

2. Multi-Class Classifiers:
    a. Decision Tree
    b. Random Forest
    c. XGBoost
    d. Logistic Regression
    e. Gradient Boosting Classifier
    f. MLPClassifier
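
McNemar's test compares two classifiers evaluated on the same test samples and is driven by the samples on which they disagree. As a minimal sketch of how such a paired 2x2 table can be built from per-sample predictions (the function name `paired_mcnemar_table` and the toy labels are hypothetical and not part of this notebook; only correctness matters, so the same idea applies to multi-class labels):

```python
def paired_mcnemar_table(y_true, pred_a, pred_b):
    """Count paired outcomes for two classifiers on the same samples.

    Layout (rows: model A correct/wrong, columns: model B correct/wrong):
        [[both correct,        A correct, B wrong],
         [A wrong, B correct,  both wrong]]
    """
    table = [[0, 0], [0, 0]]
    for truth, a, b in zip(y_true, pred_a, pred_b):
        row = 0 if a == truth else 1
        col = 0 if b == truth else 1
        table[row][col] += 1
    return table


# Toy data, purely illustrative (not taken from the experiments in this notebook)
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
pred_a = [0, 1, 0, 0, 1, 1, 1, 0]
pred_b = [0, 0, 0, 0, 1, 0, 1, 1]

table = paired_mcnemar_table(y_true, pred_a, pred_b)
print(table)  # [[4, 1], [2, 1]]
```

The off-diagonal cells (here 1 and 2) are the disagreement counts that the test statistic uses. `mlxtend.evaluate` also offers a `mcnemar_table` helper for this purpose when the per-sample predictions are available.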


##### 1. Binary Classifiers:

In [7]:
import numpy as np
from mlxtend.evaluate import mcnemar


# Confusion matrices for each NaN classifier
nan_tb_rf = np.array([[3808, 33], [672, 23]])

nan_tb_dt = np.array([[3281, 560], [543, 152]])

nan_tb_gb = np.array([[3837, 4], [689, 6]])

# tb_mlp = np.array([[1735, 2102],
#                  [286, 367]])

nan_tb_xgb = np.array([[3792, 49], [663, 32]])

nan_tb_lr = np.array([[3821, 20], [687, 8]])

nan_tb_mlp = np.array([[3613, 228], [601, 94]])

# Confusion matrices for each custom weight technique classifier
custom_tb_rf = np.array([[701, 3140], [59, 636]])

custom_tb_dt = np.array([[3273, 568], [526, 169]])

custom_tb_gb = np.array([[3839, 2], [692, 3]])

custom_tb_xgb = np.array([[3814, 27], [664, 31]])

custom_tb_lr = np.array([[3834, 7], [692, 3]])

# Confusion matrices for each balanced classifier
balanced_tb_rf = np.array([[3797, 44], [674, 21]])

balanced_tb_dt = np.array([[3319, 522], [524, 171]])

balanced_tb_gb = np.array([[2348, 1493], [207, 488]])

balanced_tb_xgb = np.array([[2796, 1045], [325, 370]])

balanced_tb_lr = np.array([[2312, 1529], [232, 463]])


# Confusion matrices for each oversample classifier

oversampling_tb_rf = np.array([[3704, 137], [625, 70]])

oversampling_tb_dt = np.array([[3242, 599], [506, 189]])

oversampling_tb_gb = np.array([[1872, 1969], [149, 546]])

oversampling_tb_xgb = np.array([[1750, 2091], [245, 450]])

oversampling_tb_lr = np.array([[2040, 1801], [187, 508]])

oversampling_tb_mlp = np.array([[2267, 1574], [326, 369]])

# Dictionary of confusion matrices, keyed by "<technique>_<classifier>"
conf_matrices = {
    # for each NaN classifier
    "NaN_RF": nan_tb_rf,
    "NaN_DT": nan_tb_dt,
    "NaN_GB": nan_tb_gb,
    "NaN_XGB": nan_tb_xgb,
    "NaN_LR": nan_tb_lr,
    "NaN_MLP": nan_tb_mlp,
    # for each custom weight technique classifier
    "custom_RF": custom_tb_rf,
    "custom_DT": custom_tb_dt,
    "custom_GB": custom_tb_gb,
    "custom_XGB": custom_tb_xgb,
    "custom_LR": custom_tb_lr,
    # for each balanced class weight technique classifier
    "balanced_DT": balanced_tb_dt,
    "balanced_GB": balanced_tb_gb,
    "balanced_LR": balanced_tb_lr,
    "balanced_RF": balanced_tb_rf,
    "balanced_XGB": balanced_tb_xgb,
    # for each oversample classifier
    "oversampling_DT": oversampling_tb_dt,
    "oversampling_GB": oversampling_tb_gb,
    "oversampling_LR": oversampling_tb_lr,
    "oversampling_RF": oversampling_tb_rf,
    "oversampling_XGB": oversampling_tb_xgb,
    "oversampling_MLP": oversampling_tb_mlp,
}


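# Perform McNemar's test on a 2x2 table derived from two confusion matrices and print the results.
# Note: b and c are each classifier's total off-diagonal (misclassified) counts, not the
# per-sample disagreement counts on which McNemar's test is usually defined.
def perform_mcnemar_test(matrix1, matrix2, name1, name2):
    b = matrix1[0, 1] + matrix1[1, 0]  # total errors of classifier 1
    c = matrix2[0, 1] + matrix2[1, 0]  # total errors of classifier 2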
    contingency_table = np.array(
        [[matrix1[0, 0] + matrix2[0, 0], b], [c, matrix1[1, 1] + matrix2[1, 1]]]
    )
    chi2, p = mcnemar(ary=contingency_table, corrected=True)
    print(f"McNemar Test between {name1} and {name2}")
    print("Chi-squared:", chi2)
    print("p-value:", p)
    print("-" * 30)


# Compare all pairs of classifiers
for name1, matrix1 in conf_matrices.items():
    for name2, matrix2 in conf_matrices.items():
        if name1 != name2:
            perform_mcnemar_test(matrix1, matrix2, name1, name2)

## Conclusion

# XGBoost (XGB) and Decision Tree (DT): there is no statistically significant difference between these two classifiers (non-significant p-value), so their performance is similar.
# All other pairs: the differences are statistically significant, suggesting that these classifiers perform differently from each other on the given dataset.

McNemar Test between NaN_RF and NaN_DT
Chi-squared: 87.17311946902655
p-value: 9.942596260370355e-21
------------------------------
McNemar Test between NaN_RF and NaN_GB
Chi-squared: 0.08655221745350501
p-value: 0.7686069279587817
------------------------------
McNemar Test between NaN_RF and NaN_XGB
Chi-squared: 0.025405786873676783
p-value: 0.8733600977337836
------------------------------
McNemar Test between NaN_RF and NaN_LR
Chi-squared: 0.000708215297450425
p-value: 0.97876895097249
------------------------------
McNemar Test between NaN_RF and NaN_MLP
Chi-squared: 9.86245110821382
p-value: 0.0016868621629592758
------------------------------
McNemar Test between NaN_RF and custom_RF
Chi-squared: 1591.969518442623
p-value: 0.0
------------------------------
McNemar Test between NaN_RF and custom_DT
Chi-squared: 83.68204558087827
p-value: 5.811060490224053e-20
------------------------------
McNemar Test between NaN_RF and custom_GB
Chi-squared: 0.07147962830593281
p-value: 0.7891
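
As a sanity check on the first printed result, the statistic can be recomputed by hand. The sketch below assumes that `mcnemar(..., corrected=True)` evaluates the continuity-corrected statistic (|b - c| - 1)^2 / (b + c) against a chi-squared distribution with one degree of freedom, which is consistent with the values printed above; the helper name `mcnemar_corrected` is hypothetical.

```python
import math


def mcnemar_corrected(b, c):
    """Continuity-corrected McNemar statistic and p-value (1 degree of freedom)."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Off-diagonal sums as formed by perform_mcnemar_test for NaN_RF vs NaN_DT
b = 33 + 672   # off-diagonal entries of nan_tb_rf
c = 560 + 543  # off-diagonal entries of nan_tb_dt

chi2, p = mcnemar_corrected(b, c)
print("Chi-squared:", chi2)
print("p-value:", p)
```

With b = 705 and c = 1103 this gives a chi-squared value of about 87.17, matching the first result above. Because b and c here are each model's total error counts, the statistic reflects the gap between the two error totals rather than per-sample disagreements.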

##### 2. Multi-Class Classifiers:

In [9]:
import numpy as np
from mlxtend.evaluate import mcnemar

# Multiclass classifier confusion matrices
confusion_matrices = {
    # Confusion matrices for each NaN classifier
    "NaN_rf": np.array([[65, 94, 39], [57, 109, 98], [22, 82, 120]]),
    "NaN_dt": np.array([[72, 77, 49], [81, 104, 79], [54, 70, 100]]),
    "NaN_gb": np.array([[57, 88, 53], [45, 116, 103], [15, 68, 141]]),
    "NaN_xgb": np.array([[65, 89, 44], [62, 118, 84], [31, 77, 116]]),
    "NaN_lr": np.array([[50, 96, 52], [48, 117, 99], [26, 65, 133]]),
    "NaN_mlp": np.array([[76, 78, 44], [63, 106, 95], [43, 71, 110]]),
    # Confusion matrices for each custom weight technique classifier
    "custom_rf": np.array([[66, 95, 37], [62, 121, 81], [28, 69, 127]]),
    "custom_dt": np.array([[59, 83, 56], [81, 99, 84], [44, 70, 110]]),
    "custom_gb": np.array([[91, 49, 58], [87, 70, 107], [27, 48, 149]]),
    "custom_xgb": np.array([[81, 82, 35], [81, 97, 86], [35, 67, 122]]),
    "custom_lr": np.array([[89, 52, 57], [83, 68, 113], [41, 43, 140]]),
    # Confusion matrices for each oversample classifier
    "oversampling_rf": np.array([[91, 64, 43], [73, 100, 91], [33, 68, 123]]),
    "oversampling_dt": np.array([[80, 70, 48], [88, 98, 78], [44, 79, 101]]),
    "oversampling_gb": np.array([[94, 41, 63], [92, 59, 113], [29, 42, 153]]),
    "oversampling_xgb": np.array([[73, 73, 52], [87, 76, 101], [34, 69, 121]]),
    "oversampling_lr": np.array([[90, 50, 58], [89, 65, 110], [45, 42, 137]]),
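    # Key was a duplicate "oversampling_lr", which silently overwrote the entry above;
    # assumed to be the MLP classifier, following the pattern of the NaN group.
    "oversampling_mlp": np.array([[99, 67, 32], [97, 87, 80], [63, 74, 87]]),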
}


def get_binary_table(matrix1, matrix2, class1, class2):
    # Sum the two confusion matrices, then keep only the rows and columns
    # belonging to the two classes being compared.
    combined = matrix1 + matrix2
    idx = [class1, class2]
    return combined[np.ix_(idx, idx)].astype(int)


def mcnemar_test_for_pair(matrix1, matrix2):
    classes = [0, 1, 2]
    results = {}

    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            tb_b = get_binary_table(matrix1, matrix2, classes[i], classes[j])

            chi2, p = mcnemar(ary=tb_b, corrected=True)
            results[f"Class {classes[i]} vs Class {classes[j]}"] = {
                "chi-squared": chi2,
                "p-value": p,
            }

    return results


classifiers = list(confusion_matrices.keys())
all_results = {}

for i in range(len(classifiers)):
    for j in range(i + 1, len(classifiers)):
        key = f"{classifiers[i]} vs {classifiers[j]}"
        all_results[key] = mcnemar_test_for_pair(
            confusion_matrices[classifiers[i]], confusion_matrices[classifiers[j]]
        )

# Print all results
for pair, results in all_results.items():
    print(f"\nResults for {pair}:")
    for class_pair, result in results.items():
        print(
            f'{class_pair} - chi-squared: {result["chi-squared"]}, p-value: {result["p-value"]}'
        )


Results for NaN_rf vs NaN_dt:
Class 0 vs Class 1 - chi-squared: 3.313915857605178, p-value: 0.0686956172900934
Class 0 vs Class 2 - chi-squared: 0.7378048780487805, p-value: 0.39036496254880615
Class 1 vs Class 2 - chi-squared: 1.750759878419453, p-value: 0.1857812335807854

Results for NaN_rf vs NaN_gb:
Class 0 vs Class 1 - chi-squared: 21.975352112676056, p-value: 2.7617449828327876e-06
Class 0 vs Class 2 - chi-squared: 22.6046511627907, p-value: 1.9900225604473457e-06
Class 1 vs Class 2 - chi-squared: 7.122507122507122, p-value: 0.007612218247778954

Results for NaN_rf vs NaN_xgb:
Class 0 vs Class 1 - chi-squared: 13.142384105960264, p-value: 0.0002886902839849097
Class 0 vs Class 2 - chi-squared: 6.1838235294117645, p-value: 0.012892338994381351
Class 1 vs Class 2 - chi-squared: 1.4193548387096775, p-value: 0.23350962263557928

Results for NaN_rf vs NaN_lr:
Class 0 vs Class 1 - chi-squared: 23.91864406779661, p-value: 1.004937915095615e-06
Class 0 vs Class 2 - chi-squared: 12.6906