**1. Data Loading:** Load the Iris dataset. You can use libraries like pandas or scikit-learn's datasets module.

**2. Data Exploration:** Perform some exploratory data analysis to understand your dataset better. Look at the number of samples, feature names, target names, etc.

**3. Data Visualization:** Visualize the data using libraries like matplotlib or seaborn. You can create scatter plots, histograms, etc.

**4. Data Preprocessing:** Preprocess the data if necessary. This might include scaling the features using StandardScaler or MinMaxScaler.

**5. Train-Test Split:** Split the dataset into a training set and a test set. You can use the train_test_split function from scikit-learn.

**6. Model Creation:** Create an SVM model using scikit-learn's SVC class. You can start with default parameters.

**7. Model Training:** Train the model on the training data using the fit method.

**8. Model Evaluation:** Evaluate the model on the test data. You can look at metrics like accuracy, precision, recall, etc.

**9. Prediction:** Finally, use the trained model to make predictions on new data.

**10. Confusion Matrix:** Show the confusion matrix for the SVM.

**11. MAE, MSE:** Calculate the MAE and MSE for the SVM.
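The steps above can be sketched end to end on the Iris data that ships with scikit-learn. This is a minimal sketch, not a reference solution: the 80/20 split, the `random_state`, the stratified split, and the choice of `StandardScaler` are all assumptions made for illustration. Note also that MAE and MSE treat the class labels 0, 1, 2 as numbers, which is only a rough diagnostic for a classifier.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Steps 1-2: load the data and inspect its shape and names
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape, iris.feature_names, list(iris.target_names))

# Step 5: hold out 20% for testing (assumed ratio), preserving class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Step 4: fit the scaler on the training split only, then apply it to both
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Steps 6-7: create an SVM with default parameters and train it
model = SVC()
model.fit(X_train_s, y_train)

# Steps 8-11: predict on held-out data and compute the metrics
y_pred = model.predict(X_test_s)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("Accuracy:", acc)
print("Confusion matrix:\n", cm)
print("MAE:", mae, "MSE:", mse)
```

Visualization (step 3) is left out of the sketch to keep it free of plotting dependencies.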




In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Load Dataset

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Load the dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/diabetes_prediction_dataset (1).csv")  # Replace this path with the location of your own copy of the dataset


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 9 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   gender               100000 non-null  object 
 1   age                  100000 non-null  float64
 2   hypertension         100000 non-null  int64  
 3   heart_disease        100000 non-null  int64  
 4   smoking_history      100000 non-null  object 
 5   bmi                  100000 non-null  float64
 6   HbA1c_level          100000 non-null  float64
 7   blood_glucose_level  100000 non-null  int64  
 8   diabetes             100000 non-null  int64  
dtypes: float64(3), int64(4), object(2)
memory usage: 6.9+ MB


# Preprocessing

In [None]:
# Encode smoking history
smoking_map = {
    'not current': 0,
    'former': 1,
    'No Info': 2,  # Missing information, kept as its own category
    'current': 1,
    'never': 0,
    'ever': 0
}

df['smoking_history'] = df['smoking_history'].map(smoking_map)

# Encode gender
gender = {
    'Female': 0,
    'Male': 1,
    'Other': 2
}

df['gender'] = df['gender'].map(gender)
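A dictionary-based encoding silently turns any category missing from the dictionary into `NaN`, so it is worth checking for unmapped values right after calling `map`. The sketch below illustrates this on a tiny hypothetical frame; the `"occasional"` category and the `toy` DataFrame are invented for the example and do not come from the dataset.

```python
import pandas as pd

# Hypothetical miniature of the smoking_history column, including one
# category ("occasional") that the mapping does not cover
toy = pd.DataFrame(
    {"smoking_history": ["never", "current", "No Info", "occasional"]})

toy_smoking_map = {"not current": 0, "former": 1, "No Info": 2,
                   "current": 1, "never": 0, "ever": 0}

encoded = toy["smoking_history"].map(toy_smoking_map)

# Series.map yields NaN for keys that are absent from the dictionary
unmapped = toy.loc[encoded.isna(), "smoking_history"].unique().tolist()
print("Unmapped categories:", unmapped)
```

An empty list means every category was covered by the mapping.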

In [None]:
df.isna().sum()

gender                 0
age                    0
hypertension           0
heart_disease          0
smoking_history        0
bmi                    0
HbA1c_level            0
blood_glucose_level    0
diabetes               0
dtype: int64

In [None]:
df.head()

   gender   age  hypertension  heart_disease  smoking_history    bmi  HbA1c_level  blood_glucose_level  diabetes
0       0  80.0             0              1                0  25.19          6.6                  140         0
1       0  54.0             0              0                2  27.32          6.6                   80         0
2       1  28.0             0              0                0  27.32          5.7                  158         0
3       0  36.0             0              0                1  23.45          5.0                  155         0
4       1  76.0             1              1                1  20.14          4.8                  155         0


# Dataset Split

In [None]:
# Splitting into features and target
X = df.drop(['diabetes'], axis=1)
y = df['diabetes']

# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
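The task list mentions scaling with `StandardScaler`, and SVMs are generally sensitive to features on different scales (here, for instance, `age` versus `blood_glucose_level`). A minimal sketch of scaling after the split follows, using synthetic stand-in data; the `X_demo`/`y_demo` arrays and their distributions are assumptions, not values from the dataset. The scaler is fitted on the training split only so that no test-set statistics leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two features on very different scales
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(40, 20, 500),    # roughly age-like values
    rng.normal(140, 40, 500),   # roughly glucose-like values
])
y_demo = rng.integers(0, 2, 500)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Learn mean and standard deviation from the training split only
demo_scaler = StandardScaler()
Xd_train_scaled = demo_scaler.fit_transform(Xd_train)

# Reuse the training statistics for the test split
Xd_test_scaled = demo_scaler.transform(Xd_test)

print(Xd_train_scaled.mean(axis=0).round(3))
print(Xd_train_scaled.std(axis=0).round(3))
```

The scaled arrays would then be passed to `fit` and `predict` in place of the raw ones.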

# Linear Regression

In [None]:
# Train Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Predict
lr_y_pred = lr_model.predict(X_test)

# Support Vector Machine (SVM)

In [None]:
# Train SVM model
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

# Predict
svm_y_pred = svm_classifier.predict(X_test)

# Decision Tree

In [None]:
# Train Decision Tree Classifier
dt_classifier = DecisionTreeClassifier()
dt_classifier.fit(X_train, y_train)

# Predict
dt_y_pred = dt_classifier.predict(X_test)

# Accuracy Score, Confusion Matrix

In [None]:
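# For Linear Regression: .score() returns R^2 (coefficient of determination),
# not classification accuracy, so it is not directly comparable to the two scores below
lr_accuracy = lr_model.score(X_test, y_test)
print("Linear Regression Accuracy:", lr_accuracy)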

# For SVM
svm_accuracy = accuracy_score(y_test, svm_y_pred)
print("SVM Accuracy:", svm_accuracy)

# For Decision Tree Classifier
dt_accuracy = accuracy_score(y_test, dt_y_pred)
print("Decision Tree Accuracy:", dt_accuracy)




Linear Regression Accuracy: 0.344349432304604
SVM Accuracy: 0.9585
Decision Tree Accuracy: 0.95405
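The Linear Regression figure above is an R² value, because `LinearRegression.score` reports the coefficient of determination rather than the share of correct labels. One way to get a comparable accuracy is to threshold the continuous predictions. The sketch below shows the idea on synthetic data; the data, the 0.5 cut-off, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score

# Synthetic binary target driven by the first feature plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=1000) > 1.0).astype(int)

reg = LinearRegression().fit(X_demo, y_demo)
raw_pred = reg.predict(X_demo)          # continuous values, not labels

# .score() is R^2, identical to r2_score on the raw predictions
r2 = reg.score(X_demo, y_demo)
print("R^2:", r2, "r2_score:", r2_score(y_demo, raw_pred))

# Threshold at 0.5 (an assumed cut-off) to obtain 0/1 labels
thresholded = (raw_pred >= 0.5).astype(int)
thresholded_acc = accuracy_score(y_demo, thresholded)
print("Accuracy after thresholding:", thresholded_acc)
```

For a binary target, a classifier such as `LogisticRegression` is usually the more natural linear baseline.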


In [None]:
# Confusion Matrix for Decision Tree
dt_conf_matrix = confusion_matrix(y_test, dt_y_pred)
print("Confusion Matrix for Decision Tree:")
print(dt_conf_matrix)

Confusion Matrix for Decision Tree:
[[17813   479]
 [  440  1268]]


In [None]:
# Confusion matrix for SVM (uses the SVM predictions, not the decision tree's)
cm = confusion_matrix(y_test, svm_y_pred)
print(cm)

In [None]:
# Assuming dt_y_pred contains the predicted labels and y_test contains the true labels
conf_matrix = confusion_matrix(y_test, dt_y_pred)

# Extracting values
TN = conf_matrix[0][0]
FP = conf_matrix[0][1]
FN = conf_matrix[1][0]
TP = conf_matrix[1][1]

print("True Positive (TP):", TP)
print("False Positive (FP):", FP)
print("False Negative (FN):", FN)
print("True Negative (TN):", TN)

True Positive (TP): 1268
False Positive (FP): 479
False Negative (FN): 440
True Negative (TN): 17813
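The four counts above are enough to derive the metrics named in the task list (accuracy, precision, recall). The sketch below works through the arithmetic with the decision tree's counts from the output; the formulas are the standard definitions, and the variable names are chosen for illustration.

```python
# Counts copied from the decision tree output above
TP, FP, FN, TN = 1268, 479, 440, 17813
total = TP + FP + FN + TN

accuracy = (TP + TN) / total          # share of all predictions that are correct
precision = TP / (TP + FP)            # of predicted positives, how many are real
recall = TP / (TP + FN)               # of real positives, how many were found
specificity = TN / (TN + FP)          # of real negatives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:    {accuracy:.5f}")
print(f"Precision:   {precision:.4f}")
print(f"Recall:      {recall:.4f}")
print(f"Specificity: {specificity:.4f}")
print(f"F1 score:    {f1:.4f}")
```

The accuracy computed this way reproduces the 0.95405 reported earlier, while precision and recall show that the positive class is predicted much less reliably than the overall accuracy suggests.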


In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Calculate MAE for Decision Tree Classifier
dt_mae = mean_absolute_error(y_test, dt_y_pred)
print("Mean Absolute Error (MAE) for Decision Tree:", dt_mae)

# Calculate MSE for Decision Tree Classifier
dt_mse = mean_squared_error(y_test, dt_y_pred)
print("Mean Squared Error (MSE) for Decision Tree:", dt_mse)

# Calculate RMSE for Decision Tree Classifier
dt_rmse = np.sqrt(dt_mse)
print("Root Mean Squared Error (RMSE) for Decision Tree:", dt_rmse)


Mean Absolute Error (MAE) for Decision Tree: 0.04595
Mean Squared Error (MSE) for Decision Tree: 0.04595
Root Mean Squared Error (RMSE) for Decision Tree: 0.21435951110226018
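MAE and MSE come out identical above because, with 0/1 labels, every error is -1, 0, or 1, so its absolute value equals its square. Both therefore equal the misclassification rate, 1 minus accuracy. A small sketch with made-up label arrays illustrates this; the arrays are hypothetical.

```python
import numpy as np

# Hypothetical true and predicted binary labels (2 mistakes out of 8)
y_true_demo = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred_demo = np.array([0, 1, 1, 0, 0, 1, 0, 0])

errors = y_true_demo - y_pred_demo      # each entry is -1, 0, or 1
demo_mae = np.mean(np.abs(errors))
demo_mse = np.mean(errors ** 2)
demo_error_rate = np.mean(y_true_demo != y_pred_demo)

print(demo_mae, demo_mse, demo_error_rate)

# Same identity with the decision tree counts: (FP + FN) / total
tree_error_rate = (479 + 440) / 20000
print(tree_error_rate)
```

The last value matches the 0.04595 reported for both MAE and MSE, which is 1 - 0.95405.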


In [None]:
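# Exploratory learning: 5-fold cross-validation
from sklearn.model_selection import cross_val_score

# Assumption: the original cell left `model` undefined; an SVM with the
# same configuration as svm_classifier is used here
model = SVC(kernel='linear')
scores = cross_val_score(model, X, y, cv=5)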

print(f"Cross-validation scores: {scores}")
print(f"Average cross-validation score: {scores.mean()}")
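When scaling is part of the workflow, wrapping the scaler and the model in a pipeline lets cross-validation refit the scaler inside each fold. The sketch below shows this on the Iris data bundled with scikit-learn; the stratified, shuffled 5-fold setup and the `random_state` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_cv, y_cv = load_iris(return_X_y=True)

# The scaler is refitted on the training part of every fold
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# Stratified folds keep the class proportions similar in each fold
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pipe, X_cv, y_cv, cv=folds)

print(f"Cross-validation scores: {cv_scores}")
print(f"Average cross-validation score: {cv_scores.mean()}")
```

The same pattern should carry over to the diabetes data by passing `X` and `y` instead of the Iris arrays, although a linear-kernel `SVC` can be slow on 100,000 rows.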