You are working on a medical diagnosis problem where you need to classify whether a patient has a particular disease based on health indicators such as age, blood pressure (bp), and cholesterol level. How do you preprocess (clean) the data so that it is suitable for training an SVM on this problem? Calculate the accuracy, and display the classification report and confusion matrix.


In [11]:
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler

dataset = {
    'age': [25, 30, np.nan, 40, 45, 50, 55, 60, 65, 70, 75, 30, 35, 40, 45],
    'bp': [120, 122, np.nan, 130, 135, 140, 142, 145, 148, 150, 152, 123, 128, 133, 138],
    'cholesterol': [180, 190, 200, np.nan, 220, 230, 240, 250, 260, 270, 280, 195, 205, 215, 225],
    'disease_presence': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
}
df = pd.DataFrame(dataset)

# Calculate mean for each feature
mean_age = np.nanmean(df['age'])
mean_bp = np.nanmean(df['bp'])
mean_cholesterol = np.nanmean(df['cholesterol'])

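# Replace missing values with the column mean
# (assign the result back; calling fillna(..., inplace=True) on a selected column is deprecated in recent pandas)
df['age'] = df['age'].fillna(mean_age)
df['bp'] = df['bp'].fillna(mean_bp)
df['cholesterol'] = df['cholesterol'].fillna(mean_cholesterol)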

# Convert DataFrame to arrays for further processing
X = df[['age', 'bp', 'cholesterol']].values
y = df['disease_presence'].values

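# Feature scaling (standardize each feature to zero mean and unit variance;
# note that the scaler is fitted on the full dataset here, before the train/test split)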
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train SVM model
svm_model = SVC()
svm_model.fit(X_train, y_train)
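
The cell above computes the imputation means and the scaling statistics on all 15 rows before splitting, so the test rows influence the preprocessing. A minimal sketch of one way to avoid that, assuming scikit-learn's `SimpleImputer` and `make_pipeline` are acceptable substitutes for the manual steps: split first, then let a pipeline fit the imputer and scaler on the training rows only. The names `raw`, `X_raw`, `y_raw`, `X_fit`, `X_holdout` and `leak_free_model` are illustrative and not part of the original notebook, and `stratify` is an added assumption to keep the class ratio similar in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same toy data as above, with the missing values left in place
raw = pd.DataFrame({
    'age': [25, 30, np.nan, 40, 45, 50, 55, 60, 65, 70, 75, 30, 35, 40, 45],
    'bp': [120, 122, np.nan, 130, 135, 140, 142, 145, 148, 150, 152, 123, 128, 133, 138],
    'cholesterol': [180, 190, 200, np.nan, 220, 230, 240, 250, 260, 270, 280, 195, 205, 215, 225],
    'disease_presence': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
})
X_raw = raw[['age', 'bp', 'cholesterol']]
y_raw = raw['disease_presence']

# Split BEFORE any statistics are computed
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42, stratify=y_raw
)

# Imputer and scaler learn their parameters from the training rows only
leak_free_model = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler(),
    SVC(),
)
leak_free_model.fit(X_fit, y_fit)

# The fitted pipeline applies the training-set means and scales to new rows
pipeline_pred = leak_free_model.predict(X_holdout)
print(pipeline_pred)
```

Because the split and the preprocessing differ from the cell above, the predictions from this sketch need not match the outputs shown in the following cells.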


In [12]:
# Make predictions
y_pred = svm_model.predict(X_test)
y_pred

array([1, 0, 0])

In [13]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Display classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Display confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Accuracy: 1.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2
           1       1.00      1.00      1.00         1

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

Confusion Matrix:
[[2 0]
 [0 1]]
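
The accuracy of 1.0 above rests on a test set of only 3 patients, so it says little about how the model would generalize. A hedged sketch of one common alternative, stratified k-fold cross-validation, is shown below; it assumes 3 folds (chosen here only because the smaller class has 6 samples) and reuses a pipeline so that imputation and scaling are refitted inside each fold. The names `cv_data`, `cv_model` and `cv_scores` are illustrative and not from the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same toy data as above, with the missing values left in place
cv_data = pd.DataFrame({
    'age': [25, 30, np.nan, 40, 45, 50, 55, 60, 65, 70, 75, 30, 35, 40, 45],
    'bp': [120, 122, np.nan, 130, 135, 140, 142, 145, 148, 150, 152, 123, 128, 133, 138],
    'cholesterol': [180, 190, 200, np.nan, 220, 230, 240, 250, 260, 270, 280, 195, 205, 215, 225],
    'disease_presence': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
})
X_cv = cv_data[['age', 'bp', 'cholesterol']]
y_cv = cv_data['disease_presence']

# Preprocessing lives inside the pipeline, so each fold fits it on its own training part
cv_model = make_pipeline(SimpleImputer(strategy='mean'), StandardScaler(), SVC())

# Stratified folds keep the 0/1 ratio roughly equal across folds
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
cv_scores = cross_val_score(cv_model, X_cv, y_cv, cv=folds, scoring='accuracy')

print("Per-fold accuracy:", cv_scores)
print("Mean accuracy:", cv_scores.mean())
```

With only 15 rows the per-fold scores will still be noisy, but every patient is used for testing exactly once, which gives a less optimistic picture than a single 3-row hold-out.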
