<img src='banner.png' style='width:900px; height:500px'>

# 🧠 **Diabetes Prediction Using SVM**
> *Machine Learning Model to Predict Whether a Patient Has Diabetes Based on Medical Metrics*

---

## 🔧 Step 1: Import Required Libraries

We start by importing essential libraries for:
- **Data handling** (`pandas`)
- **Modeling** (Support Vector Machine via `SVC`)
- **Preprocessing** (`LabelEncoder`)
- **Data Splitting** (`train_test_split`)
- **Model Evaluation** (`accuracy_score`)

```python
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split as tts
from sklearn.metrics import accuracy_score
```

## 📂 Step 2: Load the Dataset

We begin by loading the diabetes dataset from a CSV file into a Pandas DataFrame for analysis and modeling.

> **File:** `Dataset of Diabetes .csv`

---

*Next, we'll explore and preprocess this data to prepare it for our SVM model.*


```python
df = pd.read_csv('Dataset of Diabetes .csv')
```

## 🔍 Step 3: Check for Missing Values

Before proceeding, it’s crucial to identify any missing or null values in the dataset that might affect model performance.

This step summarizes the count of null values in each feature column.

---

*Handling missing data properly ensures the integrity of our predictions.*


```python
df.isna().sum()  # count the null values in each column
```

```
ID           0
No_Pation    0
Gender       0
AGE          0
Urea         0
Cr           0
HbA1c        0
Chol         0
TG           0
HDL          0
LDL          0
VLDL         0
BMI          0
CLASS        0
dtype: int64
```
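
Every count above is zero, so nothing needs to be filled in for this dataset. If a column did contain gaps, one common option is median imputation. The sketch below uses a made-up toy frame; its values and the choice of `SimpleImputer` with `strategy="median"` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with gaps; NOT taken from the diabetes dataset.
toy = pd.DataFrame({
    "AGE": [50.0, np.nan, 45.0, 61.0],
    "BMI": [24.0, 30.0, np.nan, 27.0],
})

# Replace each missing value with the median of its own column.
imputer = SimpleImputer(strategy="median")
toy_filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns)

# After imputation every column should report zero nulls.
print(toy_filled.isna().sum())
```

In a real workflow the imputer would be fitted on the training split only and then applied to the test split, so that no test information leaks into training.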

## 🧹 Step 4: Data Cleaning and Encoding

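- We drop the **`ID`** column since it’s just an identifier and doesn’t help prediction.
- Convert categorical string features into numerical form:
  - Encode **`CLASS`** (target variable) as integer codes. `LabelEncoder` numbers the distinct labels in sorted order, so the result is 0/1 for non-diabetic/diabetic only if the column holds exactly two distinct labels.
  - Encode **`Gender`** as numeric values to use in the model.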

---

*Encoding categorical variables is essential for machine learning models like SVM that require numerical input.*


```python
df.drop(columns=['ID'], inplace=True)           # remove the unnecessary identifier column
LE = LabelEncoder()
df['CLASS'] = LE.fit_transform(df['CLASS'])     # encode string labels as integers
df['Gender'] = LE.fit_transform(df['Gender'])   # refitting LE here replaces the CLASS mapping it held
```
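
Before relying on the encoded values, it can help to look at the mapping the encoder actually produced. The sketch below uses hypothetical labels (the real `CLASS` column may hold different ones) and assumes stray whitespace is a possible problem, which is why it strips the strings first. It uses its own encoder so that the mapping stays available afterwards.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels for illustration; the real column may differ.
toy_class = pd.Series(["N", "Y", "Y ", "N", "P"])

# Strip stray whitespace so "Y" and "Y " are not treated as two classes.
cleaned = toy_class.str.strip()

encoder = LabelEncoder()
codes = encoder.fit_transform(cleaned)

# classes_ lists the labels in sorted order; a label's index is its code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'N': 0, 'P': 1, 'Y': 2}
```

Keeping one encoder per column also makes it possible to call `inverse_transform` later and turn predictions back into the original labels.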

## 🎯 Step 5: Define Features and Target

- Separate the **independent variables (features)** into `x`
- Separate the **dependent variable (target)**, which is the `CLASS` column, into `y`

This prepares our data for training the machine learning model.

---


```python
x = df.iloc[:, :-1]   # independent variables: every column except the last
y = df.iloc[:, -1]    # dependent variable: the last column (CLASS)
```
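
Positional slicing works because `CLASS` is the last column. An alternative that does not depend on column order is to select by name. The sketch below uses a hypothetical miniature frame, and the names `features` and `target` are ours, not the notebook's.

```python
import pandas as pd

# Hypothetical miniature frame with the target as a named column.
toy = pd.DataFrame({
    "AGE": [50, 26, 33],
    "BMI": [24.0, 23.0, 21.0],
    "CLASS": [0, 1, 1],
})

features = toy.drop(columns=["CLASS"])  # every column except the target
target = toy["CLASS"]                   # the target column only

print(features.columns.tolist(), target.name)
```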

## 🔀 Step 6: Split Data into Training and Testing Sets

- We split the dataset into:
  - **Training set (70%)** — used to train the model
  - **Testing set (30%)** — used to evaluate model performance

- The `random_state=4` ensures reproducibility of results.

---

*Holding out a test set does not prevent overfitting by itself, but it exposes it: the model is scored on data it never saw during training.*


```python
# split the dataset into training and testing data
x_train, x_test, y_train, y_test = tts(x, y, test_size=0.3, random_state=4)
```
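
If the classes are unevenly represented, a plain random split can leave the test set with a different class ratio than the training set. Passing `stratify` to `train_test_split` keeps the ratios aligned. The sketch below uses hypothetical, deliberately imbalanced labels rather than the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 1, 20 of class 0.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 80 + [0] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=4, stratify=y_toy
)

# With stratify, both splits keep roughly the original 80/20 class ratio.
print(y_tr.mean(), y_te.mean())
```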

## 🧠 Step 7: Train and Predict Using Support Vector Machine (SVM)

- Initialize the SVM classifier with default parameters (`SVC()` defaults to an RBF kernel with `C=1.0`).
- Train the model on the **training data** (`x_train`, `y_train`).
- Use the trained model to **predict** the target values for the test data (`x_test`).

---

*Support Vector Machines are powerful classifiers that look for the decision boundary with the widest margin between classes.*


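```python
svm = SVC()                    # create the classifier with default parameters
svm.fit(x_train, y_train)      # train on the training data
y_pred = svm.predict(x_test)   # predict labels for the test data
```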
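
SVMs with an RBF kernel are sensitive to feature scale, and the notebook feeds the raw features to the model. A possible variation is to standardise the features inside a pipeline. The sketch below is not the notebook's method: it runs on synthetic data from `make_classification`, so its score says nothing about the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; NOT the diabetes dataset.
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=4
)

# Standardise each feature, then fit an SVC with its default RBF kernel.
# Inside the pipeline the scaler is fitted on the training data only.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_tr, y_tr)

print(model.score(X_te, y_te))
```

Wrapping both steps in one pipeline means `predict` and `score` apply the same scaling automatically, which avoids scaling the test set with its own statistics by mistake.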

## 📊 Step 8: Evaluate Model Accuracy

- Calculate the **accuracy score** by comparing the predicted labels (`y_pred`) with the true labels (`y_test`).
- Accuracy indicates the proportion of correct predictions made by the model.

---

*An accuracy of 0.84 means the model predicted the correct class for 84% of the samples in the held-out test set.*


```python
acc = accuracy_score(y_test, y_pred)    # calculate accuracy
acc
```

```
0.84
```
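
Accuracy is a single number and hides which classes the errors fall on. A confusion matrix and a per-class report break it down. The sketch below uses hypothetical true and predicted labels, not the notebook's `y_test` and `y_pred`, so the figures are illustrative only.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels, for illustration only.
y_true_toy = [0, 0, 1, 1, 1, 1, 0, 1]
y_pred_toy = [0, 1, 1, 1, 0, 1, 0, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_true_toy, y_pred_toy))
```

Here the matrix is `[[2, 1], [1, 4]]`: one class-0 sample was predicted as class 1, and one class-1 sample was predicted as class 0.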

# 🎯 Conclusion

In this project, we successfully built a **Support Vector Machine (SVM)** model to predict whether a patient has diabetes based on medical features such as age, gender, blood metrics, and BMI.

- The dataset was cleaned and preprocessed, including encoding categorical variables.
- We split the data into training and testing sets to validate the model.
- The trained SVM model achieved an accuracy of approximately **84%** on the 30% held-out test set, using default hyperparameters.
- This approach demonstrates how machine learning can assist healthcare professionals in early diabetes detection, potentially improving patient outcomes.

---

**Next Steps:**
- Tune the SVM hyperparameters for better accuracy.
- Explore other classification models like Random Forest or XGBoost.
- Incorporate feature engineering and handle any class imbalance.
- Deploy the model in a real-world clinical setting.
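
As a starting point for the first item, hyperparameter tuning could be sketched with `GridSearchCV`. The grid values below are illustrative assumptions rather than tuned recommendations, and the data is synthetic, so the printed score does not describe the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; NOT the diabetes dataset.
X_syn, y_syn = make_classification(n_samples=200, n_features=8, random_state=4)

# Scale the features, then classify; step names are used in the grid keys.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Illustrative grid; these values are assumptions, not recommendations.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_syn, y_syn)

print(search.best_params_, round(search.best_score_, 3))
```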

Thank you for reviewing this project! 🚀
