STUDENT: Joel S. Mollel

NUMBER: C00313599

ALGORITHM: Support Vector Machines (SVM)

We will use the code in which the SVM works as a classifier.


Given the SVM code, we are required to:

(i) Make sure it runs

(ii) Change some hyperparameters and observe the impact

(iii) Use another dataset, perform other operations, and simulate an app

(i) Running the code

=> The code did not run because it imported make_blobs from sklearn.datasets.samples_generator, a module that was deprecated and later removed from scikit-learn. make_blobs is now imported directly from sklearn.datasets.
The corrected code is shown in the next cell.

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns; sns.set()

from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)

plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plt.show()
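
As a quick check that the corrected import works, the sketch below prints what make_blobs returns for the same arguments. The variable name `counts` is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same arguments as in the cell above
X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

# X holds one row per sample and one column per feature
print(X.shape)

# Number of samples generated for each of the two centers
counts = np.bincount(y)
print(counts)
```

If the import fails on your machine, the installed scikit-learn version is worth checking with `sklearn.__version__`.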


(ii) Changing the hyperparameters; in this case I will focus on the kernel

I will use the following kernel values:

'linear', 'poly', 'rbf', 'sigmoid'

With this data, all kernels produced similar results, which suggests that the two clusters are linearly separable or close to it. Run the code to see the results.

In [None]:
from sklearn.svm import SVC
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)


kernels = ['linear', 'poly', 'rbf', 'sigmoid']  # kernels to compare

for kernel in kernels:
    # Instantiate the model with the current kernel
    model = SVC(kernel=kernel)

    # Fit the model to the data
    model.fit(X, y)

    # Plot the data points; the title names the kernel that was fitted
    # (the decision boundary itself is not drawn in this cell)
    plt.figure()
    plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    plt.title(f'SVM with {kernel} Kernel')
    plt.show()
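
The kernels can also be compared numerically. The sketch below is one possible way to do it, not part of the provided code: it records the training accuracy and the number of support vectors for each kernel, and evaluates decision_function on a grid so that the boundary can be drawn. The helper name `decision_grid` and the `summary` dictionary are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)


def decision_grid(model, X, steps=30):
    """Evaluate model.decision_function on a regular grid covering X."""
    xs = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, steps)
    ys = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, steps)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.c_[xx.ravel(), yy.ravel()]
    zz = model.decision_function(grid).reshape(xx.shape)
    return xx, yy, zz


summary = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    model = SVC(kernel=kernel).fit(X, y)
    xx, yy, zz = decision_grid(model, X)
    summary[kernel] = {
        'train_accuracy': model.score(X, y),
        'n_support_vectors': int(model.n_support_.sum()),
    }
    print(kernel, summary[kernel])
    # To draw the boundary on top of a scatter plot of X:
    # plt.contour(xx, yy, zz, levels=[0], colors='k')
```

The zero level of decision_function is the decision boundary, so the commented plt.contour call would draw it as a single black line.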


(iii) Using another dataset, performing other operations, and simulating an app

I will use the dataset drug200.csv, our drug recommendation dataset. The drug to be prescribed to a patient is predicted from parameters such as age, sex, blood pressure (BP), cholesterol, and the sodium-to-potassium ratio (Na_to_K).


Step 1: Importing the libraries and the dataset

In [None]:
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

df = pd.read_csv('drug200.csv')

Step 2: Encoding the categorical features and the target variable (Drug)

In [None]:

label_encoders = {}
for column in ['Sex', 'BP', 'Cholesterol']:  # features ('Sex', 'BP', 'Cholesterol')
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le


le_target = LabelEncoder()
df['Drug'] = le_target.fit_transform(df['Drug']) # target variable (Drug)
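
LabelEncoder assigns integer codes in sorted (alphabetical) order of the labels, not in any clinical order. The sketch below shows this on a short made-up list of blood pressure values (not rows from drug200.csv); the names `codes`, `mapping`, and `decoded` are introduced only for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample values, used only to show how the codes are assigned
values = ['HIGH', 'LOW', 'NORMAL', 'LOW', 'HIGH']

le = LabelEncoder()
codes = le.fit_transform(values)

# classes_ is sorted, and the position of each label is its integer code
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Checking `classes_` in this way is a simple guard against assuming a mapping such as LOW = 0 that the encoder does not actually produce.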


Step 3: Defining the features and the target variable (label)

In [None]:
X = df[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values 
y = df['Drug'].values  


Step 4: Splitting the data into training (70%) and testing (30%) sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

kernels = ['linear', 'poly', 'rbf', 'sigmoid']  #  different kernels


Step 5: Training the model and testing its accuracy

In [None]:

for kernel in kernels:
    model = SVC(kernel=kernel)  
    model.fit(X_train, y_train) 
    
    # Testing accuracy
    accuracy = model.score(X_test, y_test)
    print(f'Accuracy for {kernel} kernel: {accuracy:.4f}')
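
SVMs are generally sensitive to the scale of the features, and columns such as Age and Na_to_K sit on different ranges. The sketch below is one possible extension, not part of the original experiment: it compares each kernel with and without a StandardScaler step. It runs on synthetic stand-in data from make_classification, so the numbers it prints say nothing about drug200.csv; the factor of 50 and the `scores` dictionary are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT drug200.csv)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)
X[:, 0] *= 50  # stretch one feature so the columns have different scales

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # Model trained on the raw features
    raw = SVC(kernel=kernel).fit(X_train, y_train)
    # Same model, but features are standardised inside a pipeline
    scaled = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scaled.fit(X_train, y_train)
    scores[kernel] = (raw.score(X_test, y_test),
                      scaled.score(X_test, y_test))
    print(f'{kernel}: raw={scores[kernel][0]:.4f}, '
          f'scaled={scores[kernel][1]:.4f}')
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split leaks into training.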

Simple app

Now we prompt for user input, which gives flexibility in testing the model.

The same code is repeated, with additional lines that prompt for user input.

Step 1: Importing the libraries and the dataset

In [None]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('drug200.csv')

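Step 2: Preparing the data: declaring the features and the target

In [None]:
# .copy() gives an independent DataFrame, so the encoding step below
# does not write into a slice of `data`
X = data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].copy()  # Features
y = data['Drug']  # Target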

Step 3: Encoding the categorical columns

In [None]:
# LabelEncoder assigns codes in alphabetical order of the labels
label_encoder_sex = LabelEncoder()
X.loc[:, 'Sex'] = label_encoder_sex.fit_transform(X['Sex'])  # F = 0, M = 1

label_encoder_bp = LabelEncoder()
X.loc[:, 'BP'] = label_encoder_bp.fit_transform(X['BP'])  # HIGH = 0, LOW = 1, NORMAL = 2

label_encoder_cholesterol = LabelEncoder()
X.loc[:, 'Cholesterol'] = label_encoder_cholesterol.fit_transform(X['Cholesterol'])  # HIGH = 0, NORMAL = 1

Step 4: Splitting the dataset into training (70%) and testing (30%) sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 5: Training the model, prompting for user input, encoding the inputs, and predicting the drug

We will use the linear kernel because it performed best in our model test, with an accuracy of 0.9833.

In [None]:
model = SVC(kernel='linear', C=1.0)
model.fit(X_train, y_train)

print("Enter the following details to predict the drug type:")

age = float(input("Age: "))
# Normalise the text inputs to upper case so that 'low' matches 'LOW'
sex = input("Sex (M/F): ").strip().upper()
bp = input("Blood Pressure (LOW, NORMAL, HIGH): ").strip().upper()
cholesterol = input("Cholesterol (NORMAL, HIGH): ").strip().upper()
na_to_k = float(input("Sodium to Potassium Ratio (Na_to_K): "))

# Fall back to a default value when an input is not recognised
if sex not in ['M', 'F']:
    print("Invalid input for Sex! Defaulting to M.")
    sex = 'M'
if bp not in ['LOW', 'NORMAL', 'HIGH']:
    print("Invalid input for Blood Pressure! Defaulting to NORMAL.")
    bp = 'NORMAL'
if cholesterol not in ['NORMAL', 'HIGH']:
    print("Invalid input for Cholesterol! Defaulting to NORMAL.")
    cholesterol = 'NORMAL'

# Encode the inputs with the same encoders that were fitted on the training data
sex = label_encoder_sex.transform([sex])[0]
bp = label_encoder_bp.transform([bp])[0]
cholesterol = label_encoder_cholesterol.transform([cholesterol])[0]

input_data = pd.DataFrame([[age, sex, bp, cholesterol, na_to_k]], columns=['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K'])

predicted_drug = model.predict(input_data)

print(f"Predicted drug: {predicted_drug[0]}")
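
The prompting code can be hard to test because it waits for keyboard input. The sketch below shows one way the same steps could be wrapped in functions that take the values as arguments. The function names `train` and `predict_drug`, the `toy` DataFrame, and the labels 'drug_1' and 'drug_2' are all made up for this example and are not taken from drug200.csv.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

FEATURES = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']
CATEGORICAL = ['Sex', 'BP', 'Cholesterol']


def train(data):
    """Fit one LabelEncoder per categorical column and a linear SVC."""
    X = data[FEATURES].copy()
    encoders = {}
    for col in CATEGORICAL:
        encoders[col] = LabelEncoder()
        X[col] = encoders[col].fit_transform(X[col])
    model = SVC(kernel='linear', C=1.0).fit(X, data['Drug'])
    return model, encoders


def predict_drug(model, encoders, age, sex, bp, cholesterol, na_to_k):
    """Encode one patient record and return the predicted drug label."""
    row = {'Age': float(age), 'Na_to_K': float(na_to_k)}
    raw = {'Sex': sex, 'BP': bp, 'Cholesterol': cholesterol}
    for col, value in raw.items():
        value = str(value).strip().upper()
        if value not in list(encoders[col].classes_):
            raise ValueError(f'Invalid value for {col}: {value!r}')
        row[col] = encoders[col].transform([value])[0]
    return model.predict(pd.DataFrame([row], columns=FEATURES))[0]


# Made-up rows that only mimic the column layout of the dataset
toy = pd.DataFrame({
    'Age': [23, 47, 61, 35, 52, 28],
    'Sex': ['F', 'M', 'F', 'M', 'F', 'M'],
    'BP': ['HIGH', 'LOW', 'NORMAL', 'HIGH', 'LOW', 'NORMAL'],
    'Cholesterol': ['HIGH', 'NORMAL', 'HIGH', 'NORMAL', 'HIGH', 'NORMAL'],
    'Na_to_K': [25.4, 10.1, 9.8, 27.2, 8.5, 30.0],
    'Drug': ['drug_1', 'drug_2', 'drug_2', 'drug_1', 'drug_2', 'drug_1'],
})

model, encoders = train(toy)
result = predict_drug(model, encoders, 40, 'm', 'high', 'normal', 26.0)
print(f'Predicted drug: {result}')
```

Raising an error on an unrecognised value, instead of silently defaulting, is a design choice of this sketch; the prompting version above keeps the defaulting behaviour.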
