# How the diabetes predictor model was trained

First, we need a dataset to train the model. This project uses the PIMA Indians Diabetes Dataset from Kaggle.

Step 1: Set Up the Project Structure

```text
diabetes-prediction/
│
├── data/
│   └── diabetes.csv          # Dataset
├── model/
│   └── diabetes_model.py     # Script to build and train the model
├── saved_model/
│   ├── model.pkl             # The saved model (after training)
│   └── scaler.pkl            # The saved scaler (after training)
└── requirements.txt          # Python dependencies
```

Step 2: Collect the dataset

Download the PIMA Indians Diabetes Dataset from [Kaggle](https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database) and place the CSV file at `data/diabetes.csv`.

Step 3: Install Required Dependencies

```bash
# requirements.txt
pandas==2.1.0
numpy==1.25.0
scikit-learn==1.3.0
joblib==1.2.0
```

```bash
pip install -r requirements.txt
```

Step 4: Preprocess the Data

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
data_path = 'data/diabetes.csv'
data = pd.read_csv(data_path)

# Select key features and the target variable
X = data[['Glucose', 'BloodPressure', 'BMI', 'Age']]
y = data['Outcome']  # Target (whether or not the person has diabetes)

# Split into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features; fit the scaler on the training set only,
# then reuse the same statistics for the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Data preprocessed successfully.")
```
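
To illustrate what the scaling step does, here is a minimal sketch on synthetic data (the values are made up and only mimic the four feature columns; `demo`, `demo_scaler`, and the other names are hypothetical). It checks that the scaler learns its mean and standard deviation from the training rows only and then applies those same statistics to the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data/diabetes.csv (values are made up for illustration)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Glucose": rng.normal(120, 30, 200),
    "BloodPressure": rng.normal(70, 12, 200),
    "BMI": rng.normal(32, 7, 200),
    "Age": rng.integers(21, 80, 200),
})

demo_train, demo_test = train_test_split(demo, test_size=0.2, random_state=42)

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(demo_train)  # learns mean/std from training rows only
test_scaled = demo_scaler.transform(demo_test)        # reuses the training mean/std

# Training columns end up with mean ~0 and std ~1; test columns are only close to that
print("Train column means:", train_scaled.mean(axis=0).round(6))
print("Train column stds: ", train_scaled.std(axis=0).round(6))
print("Test column means: ", test_scaled.mean(axis=0).round(3))
```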

Step 5: Build and Train the Model

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Initialize and train model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

print("\nClassification Report:")
print(classification_report(y_test, y_pred))
```
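
A single 80/20 split gives only one accuracy estimate, which can vary with the choice of `random_state`. As an optional cross-check, the sketch below runs 5-fold cross-validation with the scaler and classifier wrapped in a pipeline, so the scaler is refit inside each fold. It is not part of the original training script. It uses synthetic data from `make_classification` so that it runs on its own, and the names `X_demo`, `y_demo`, and `cv_pipeline` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 4 features (not the real diabetes dataset)
X_demo, y_demo = make_classification(
    n_samples=768, n_features=4, n_informative=3, n_redundant=1, random_state=42
)

# Scaling happens inside the pipeline, so each fold fits its own scaler
cv_pipeline = make_pipeline(StandardScaler(), LogisticRegression())
cv_scores = cross_val_score(cv_pipeline, X_demo, y_demo, cv=5, scoring="accuracy")

print("Fold accuracies:", cv_scores.round(3))
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```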

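Step 6: Save the Model

```python
import os

import joblib

# Make sure the output directory exists before writing to it
os.makedirs('saved_model', exist_ok=True)

# Save the trained model
joblib.dump(model, 'saved_model/model.pkl')

# Save the scaler
joblib.dump(scaler, 'saved_model/scaler.pkl')

print("Model and scaler saved.")
```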

This will create two files in the `saved_model/` directory:

- `model.pkl`: The trained logistic regression model.
- `scaler.pkl`: The scaler used for preprocessing (needed to transform new inputs the same way before making predictions).
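
Because the model is only valid together with its scaler, one alternative is to store both in a single file. The sketch below is an assumption about how that could look, not what the original script does: it wraps the two steps in a scikit-learn `Pipeline`, saves it with `joblib`, and confirms the reloaded object predicts identically. The data is synthetic, the file name `pipeline.pkl` is hypothetical, and a temporary directory is used so nothing is written to `saved_model/`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 4 features (not the real diabetes dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# One object that scales and then classifies
bundle = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
bundle.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, "pipeline.pkl")  # hypothetical file name
    joblib.dump(bundle, bundle_path)
    restored = joblib.load(bundle_path)

# The restored pipeline should reproduce the original predictions
same_predictions = np.array_equal(bundle.predict(X_demo), restored.predict(X_demo))
print("Round trip preserved predictions:", same_predictions)
```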

Step 7: Test Loading the Model (Optional)

```python
import joblib

# Test loading the saved model
loaded_model = joblib.load("saved_model/model.pkl")
loaded_scaler = joblib.load("saved_model/scaler.pkl")

# Test prediction with loaded model and scaler
# (X_test comes from the preprocessing step above)
test_input = X_test.iloc[0:1, :]  # Take one test example
test_input_scaled = loaded_scaler.transform(test_input)
predicted = loaded_model.predict(test_input_scaled)

print(f"Predicted: {predicted}")
```
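
For inputs that do not come from `X_test`, the feature values have to be passed in the same column order used during training. Below is a minimal sketch of a helper for that; `predict_patient` and `FEATURES` are hypothetical names, and the tiny training set is made up so the example runs without the saved files. In practice you would pass the loaded model and scaler instead.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Column order must match the features used for training
FEATURES = ["Glucose", "BloodPressure", "BMI", "Age"]


def predict_patient(model, scaler, glucose, blood_pressure, bmi, age):
    """Return (predicted label, probability of class 1) for one patient."""
    row = pd.DataFrame([[glucose, blood_pressure, bmi, age]], columns=FEATURES)
    scaled = scaler.transform(row)
    label = int(model.predict(scaled)[0])
    probability = float(model.predict_proba(scaled)[0, 1])
    return label, probability


# Tiny made-up training set (illustration only, not real patient data)
toy = pd.DataFrame(
    [[85, 66, 26.6, 31], [89, 66, 28.1, 21], [95, 70, 24.0, 25], [100, 72, 27.5, 29],
     [168, 74, 38.0, 34], [183, 64, 33.3, 52], [197, 70, 30.5, 53], [160, 80, 36.0, 45]],
    columns=FEATURES,
)
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scaler = StandardScaler().fit(toy)
toy_model = LogisticRegression().fit(toy_scaler.transform(toy), toy_y)

label, probability = predict_patient(
    toy_model, toy_scaler, glucose=150, blood_pressure=72, bmi=33.0, age=40
)
print(f"Predicted label: {label}, probability of diabetes: {probability:.2f}")
```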
