SVM and Naïve Bayes 

##### In this analysis, we build and evaluate two classification models, Support Vector Machine (SVM) and Naive Bayes, on the 'drugdataset.csv' dataset provided by John Hughes. The dataset contains patient attributes such as age, sex, blood pressure level, cholesterol level, and sodium-to-potassium ratio.
##### The primary objective is to assess how well these models predict the type of drug prescribed (drugA, drugB, drugC, drugX, drugY) from the given features. We compare the models on accuracy, precision, recall, and F1-score to determine which of the two offers the better predictive performance on this dataset.

In [28]:
#Load Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [29]:
# Load the dataset
drug_data = pd.read_csv("drugdataset.csv")
#Display 5 rows of Data
drug_data.head()


   Age  Sex  BP  Cholesterol  Na_to_K   Drug
0   23    1   2            1   25.355  drugY
1   47    0   1            1   13.093  drugC
2   47    0   1            1   10.114  drugC
3   28    1   0            1    7.798  drugX
4   61    1   1            1   18.043  drugY


In [30]:
#Generate key statistics using: describe()
drug_data.describe().T

             count       mean        std    min      25%      50%    75%     max
Age          200.0     44.315  16.544315   15.0     31.0     45.0   58.0    74.0
Sex          200.0       0.48   0.500854    0.0      0.0      0.0    1.0     1.0
BP           200.0       1.09   0.821752    0.0      0.0      1.0    2.0     2.0
Cholesterol  200.0      0.515   0.501029    0.0      0.0      1.0    1.0     1.0
Na_to_K      200.0  16.084485   7.223956  6.269  10.4455  13.9365  19.38  38.247


In [31]:
#Identify the classes (i.e. drugs), listed in order of first appearance
target_names=drug_data.Drug.unique()
target_names

array(['drugY', 'drugC', 'drugX', 'drugA', 'drugB'], dtype=object)
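
##### One caveat about this ordering: `unique()` returns the labels in order of first appearance, whereas scikit-learn's `confusion_matrix` and `classification_report` arrange classes in sorted label order. The sketch below illustrates the difference using only the five labels printed above; the variable names are illustrative and not part of the original notebook.

```python
# Labels in the order printed by drug_data.Drug.unique() above
appearance_order = ['drugY', 'drugC', 'drugX', 'drugA', 'drugB']

# scikit-learn arranges classes by sorted label value
sorted_order = sorted(appearance_order)

# Compare the two orderings position by position
for position, (seen, ordered) in enumerate(zip(appearance_order, sorted_order)):
    print(f"row {position}: unique() gives {seen!r}, sorted order gives {ordered!r}")
```

##### Any list passed as `target_names` to `classification_report` is matched to the rows by position, so it should follow the sorted order; otherwise the metrics end up attached to the wrong drug name.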

In [32]:
# create x and y variables
X = drug_data.drop('Drug', axis=1).to_numpy() # Drop 'Drug' column and convert remaining data to a NumPy array.
y = drug_data['Drug'].to_numpy() # Convert 'Drug' column to a NumPy array as target labels.

In [33]:
#Create Training and Test Datasets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=100)

# Scale the data: fit the scaler on the training split only, then apply it to the test split
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
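
##### To make the scaling step concrete, here is a minimal pure-Python sketch of the standardization that `StandardScaler` applies to each column: subtract the training mean and divide by the training (population) standard deviation. The helper names and the sample values are hypothetical and are not taken from 'drugdataset.csv'; the sketch also ignores edge cases such as zero-variance columns.

```python
from statistics import fmean, pstdev

def fit_standardizer(column):
    """Return (mean, std) of a training column, using the population std (ddof=0)."""
    return fmean(column), pstdev(column)

def standardize(column, mean, std):
    """Apply z = (x - mean) / std with statistics learned elsewhere."""
    return [(value - mean) / std for value in column]

# Hypothetical values for a single numeric feature (assumption, for illustration only)
train_column = [10.0, 14.0, 18.0, 22.0]
test_column = [12.0, 30.0]

# Statistics come from the training split only ...
mean, std = fit_standardizer(train_column)

# ... and are reused unchanged for the test split, so no test information leaks in
train_scaled = standardize(train_column, mean, std)
test_scaled = standardize(test_column, mean, std)

print("train mean/std:", mean, std)
print("train scaled:", train_scaled)
print("test scaled:", test_scaled)
```

##### The scaled training column has mean 0 and standard deviation 1 by construction, while the scaled test column generally does not, which is expected when the test split is transformed with training statistics.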

In [35]:
#Script for SVM and NB
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix  


for name, model in [('SVM', SVC(kernel='linear', random_state=100)), 
                    ('Naive Bayes', GaussianNB())]:
    model.fit(X_train_scaled, y_train)
    predictions = model.predict(X_test_scaled)

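    # Print Confusion Matrix and Classification Report
    # Note: target_names from unique() is in order of appearance, but scikit-learn
    # orders rows by sorted class label, so passing it would mislabel the rows.
    # Without target_names, the report uses the actual class labels.
    print(f"\nModel: {name}")
    print("Confusion Matrix:\n", confusion_matrix(y_test, predictions))
    print(f"\nModel: {name}")
    print("Classification Report:\n", classification_report(y_test, predictions))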


Model: SVM
Confusion Matrix:
 [[ 5  0  0  0  0]
 [ 0  2  0  0  1]
 [ 0  0  3  0  0]
 [ 0  0  0 11  0]
 [ 0  0  0  1 17]]

Model: SVM
Classification Report:
               precision    recall  f1-score   support

       drugA       1.00      1.00      1.00         5
       drugB       1.00      0.67      0.80         3
       drugC       1.00      1.00      1.00         3
       drugX       0.92      1.00      0.96        11
       drugY       0.94      0.94      0.94        18

    accuracy                           0.95        40
   macro avg       0.97      0.92      0.94        40
weighted avg       0.95      0.95      0.95        40
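
##### The SVM figures above can be checked by hand from the confusion matrix, where rows are true classes and columns are predicted classes: precision is the diagonal entry divided by its column sum, recall is the diagonal entry divided by its row sum, and F1 is their harmonic mean. The sketch below is a minimal illustration in plain Python; `per_class_metrics` is a hypothetical helper, not a scikit-learn function.

```python
# SVM confusion matrix copied from the output above (rows = true, columns = predicted)
svm_cm = [
    [5, 0, 0, 0, 0],
    [0, 2, 0, 0, 1],
    [0, 0, 3, 0, 0],
    [0, 0, 0, 11, 0],
    [0, 0, 0, 1, 17],
]

def per_class_metrics(cm):
    """Return one (precision, recall, f1, support) tuple per class."""
    n = len(cm)
    results = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[row][k] for row in range(n))  # column sum
        actual_k = sum(cm[k])                              # row sum, i.e. support
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1, actual_k))
    return results

metrics = per_class_metrics(svm_cm)

# Accuracy is the share of all samples that sit on the diagonal
accuracy = sum(svm_cm[k][k] for k in range(len(svm_cm))) / sum(map(sum, svm_cm))

for k, (p, r, f1, support) in enumerate(metrics):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
print(f"accuracy={accuracy:.2f}")
```

##### Rounded to two decimals, these values agree with the SVM classification report, row by row.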


Model: Naive Bayes
Confusion Matrix:
 [[ 5  0  0  0  0]
 [ 0  3  0  0  0]
 [ 0  0  3  0  0]
 [ 0  0  0 10  1]
 [ 1  1  3  1 12]]

Model: Naive Bayes
Classification Report:
               precision    recall  f1-score   support

       drugA       0.83      1.00      0.91         5
       drugB       0.75      1.00      0.86         3
       drugC    