## Description:

This dataset contains the signs and symptoms of patients who are newly diagnosed with diabetes or at risk of developing it. The data was collected through direct questionnaires administered to patients at Sylhet Diabetes Hospital in Sylhet, Bangladesh, and the responses were approved by a doctor.

## Purpose:

To predict the likelihood of diabetes at an early stage using data mining techniques.

## Variables:

- `age (Feature, Integer):` Age of the patient
- `gender (Feature, Categorical):` Gender of the patient
- `polyuria (Feature, Binary):` Presence of polyuria (Yes/No)
- `polydipsia (Feature, Binary):` Presence of polydipsia (Yes/No)
- `sudden_weight_loss (Feature, Binary):` Experience of sudden weight loss (Yes/No)
- `weakness (Feature, Binary):` Experience of weakness (Yes/No)
- `polyphagia (Feature, Binary):` Presence of polyphagia (Yes/No)
- `genital_thrush (Feature, Binary):` Presence of genital thrush (Yes/No)
- `visual_blurring (Feature, Binary):` Experience of visual blurring (Yes/No)
- `itching (Feature, Binary):` Experience of itching (Yes/No)
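Most of the symptom columns are Yes/No strings, so they need numeric encoding before the scikit-learn models can use them. The following is a minimal sketch that assumes the column names in the list above. It uses a few hypothetical rows rather than the real file, and `binary_cols` is an illustrative subset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows using the column names listed above;
# the real file may differ in capitalisation or contain more columns.
df = pd.DataFrame(
    {
        "age": [40, 58, 41],
        "gender": ["Male", "Female", "Male"],
        "polyuria": ["No", "Yes", "No"],
        "polydipsia": ["Yes", "No", "No"],
        "itching": ["Yes", "No", "Yes"],
    }
)

# Map each Yes/No column to 1/0; any other value becomes NaN.
binary_cols = ["polyuria", "polydipsia", "itching"]
for col in binary_cols:
    df[col] = df[col].map({"Yes": 1, "No": 0})

# LabelEncoder assigns integer codes in sorted order: Female -> 0, Male -> 1.
le = LabelEncoder()
df["gender"] = le.fit_transform(df["gender"])
```

Using `map` instead of a blanket string comparison means unexpected entries show up as NaN rather than silently becoming 0. They can then be inspected or filled with `SimpleImputer`.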


## Import Statements


In [1]:
# Data Analysis
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

sns.set_theme(context="notebook")

# Model Training
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    accuracy_score,
    r2_score,
    jaccard_score,
)

# Other
from itertools import product
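The "Model Training" imports bring in four classifiers. The loop below is a minimal sketch of one way to fit and compare them. It uses a synthetic dataset from `make_classification` as a stand-in for the encoded symptom table, so its scores say nothing about the real data. The split ratio, `random_state` values, and hyperparameters are illustrative assumptions. `r2_score` is left out because it is a regression metric, not a classification one. Column 1 of `predict_proba` gives an estimated probability of the positive class, which is one way to express the "likelihood" mentioned under Purpose.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, jaccard_score

# Synthetic binary data (labels 0/1) standing in for the real features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative hyperparameters, not tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(random_state=0),
}

# Fit each model and score its predictions on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "jaccard": jaccard_score(y_test, y_pred),  # positive class = 1
    }

# Estimated probability of the positive class for each test row.
risk = models["Logistic Regression"].predict_proba(X_test)[:, 1]
```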

In [2]:
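def plot_confusion_matrix(cm, classes, title, cmap=plt.cm.Blues, normalize=False):
    """
    Plot a confusion matrix produced by sklearn.metrics.confusion_matrix.

    Rows are actual labels and columns are predicted labels. For a binary
    problem with classes ordered [negative, positive] (sklearn's default
    sorted order for 0/1 or "Negative"/"Positive"), the cells are
    [[TN, FP], [FN, TP]].
    """

    if normalize:
        # Divide each row by its total so every row sums to 1
        cm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized Confusion Matrix")
    else:
        print("Unnormalized Confusion Matrix")
    print(cm)

    plt.imshow(cm, interpolation="nearest", cmap=cmap)
    plt.title(title)
    plt.colorbar()

    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes)
    plt.yticks(tick_marks, classes)

    fmt = ".2f" if normalize else "d"
    threshold = cm.max() / 2.0
    # Cell names follow sklearn's layout and only make sense for a 2x2 matrix
    labels = [["TN", "FP"], ["FN", "TP"]] if cm.shape == (2, 2) else None

    for i, j in product(range(cm.shape[0]), range(cm.shape[1])):
        text = format(cm[i, j], fmt)
        if labels is not None:
            text += f"\n{labels[i][j]}"
        plt.text(
            j,
            i,
            text,
            horizontalalignment="center",
            verticalalignment="center",
            color="white" if cm[i, j] > threshold else "black",
        )

    plt.xlabel("Predicted Labels")
    plt.ylabel("Actual Labels")
    plt.tight_layout()
    plt.show()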

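scikit-learn's `confusion_matrix` puts actual labels on the rows and predicted labels on the columns. For binary labels ordered `[0, 1]`, the cells therefore read `[[TN, FP], [FN, TP]]`. The toy labels below are made up purely to show that layout and are not taken from the dataset.

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = diabetic (positive), 0 = non-diabetic (negative).
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# Rows = actual, columns = predicted:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
```

The resulting `cm` can be passed to `plot_confusion_matrix` along with class names such as `["Negative", "Positive"]` and a colormap such as `plt.cm.Blues`.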