# **ML Basics Workflow**

**🔗 Open in Google Colab:** [Click here to run this notebook in Colab](https://colab.research.google.com/drive/11sKkvemiI3LbNcdpvspmUV1QsKgXRyU7?usp=sharing)

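This notebook demonstrates:
- Loading two toy datasets (Iris for classification, Diabetes for regression)
- Separating features from labels
- Train/test split
- Training a classifier and a regressor
- Model evaluation (accuracy and mean squared error)
- Exporting the fitted models with joblib and reloading them for prediction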

## **1. Load a Toy Dataset**

In [None]:
from sklearn.datasets import load_iris, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error
import pandas as pd
import joblib
import numpy as np

iris = load_iris()
diabetes = load_diabetes()

print("Datasets loaded successfully.")

In [None]:
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target
iris_df

In [None]:
iris_df.info()

In [None]:
iris_df.describe()

In [None]:
iris_df['target'].value_counts()

## **2. EDA Using ProfileReport**

In [None]:
!pip install ydata-profiling

In [None]:
from ydata_profiling import ProfileReport
ProfileReport(iris_df)

## **2.5. Prepare Classification Data**


In [None]:
X_class = iris.data
y_class = iris.target

print("Classification features shape:", X_class.shape)
print("Classification labels shape:", y_class.shape)


## **3. Train/Test Split (Classification)**

In [None]:
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_class, y_class, test_size=0.2, random_state=42
)
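
The split above shuffles rows at random, so the class proportions in the test set can drift from those of the full dataset. A minimal sketch, assuming you want each Iris class equally represented in both splits, passes `stratify=` to `train_test_split`. The variable names (`X_tr`, `y_te`, and so on) are illustrative and not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the per-class proportions of y in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

Iris has 50 samples per class, so a stratified 80/20 split should place 40 of each class in training and 10 in testing.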

## **4. Fit a Simple Classifier (Logistic Regression)**

In [None]:
clf = LogisticRegression(max_iter=300)
clf.fit(Xc_train, yc_train)
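
`max_iter=300` gives the solver more iterations than scikit-learn's default of 100, which helps it converge on unscaled features. An alternative sketch, not what this notebook does, standardizes the features inside a pipeline so the scaler is fitted on the training split only. The names `scaled_clf` and `test_accuracy` are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# The scaler learns mean/std from the training split and reuses them at predict time
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=300))
scaled_clf.fit(X_tr, y_tr)

# Pipeline.score delegates to the classifier's score, which is accuracy here
test_accuracy = scaled_clf.score(X_te, y_te)
print("Test accuracy with scaling:", test_accuracy)
```

Because the pipeline is a single estimator, it can be exported with `joblib.dump` in the same way as a bare classifier, and the scaling travels with it.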

## **4.5. Export Classifier Model**


In [None]:
# Export the classifier
joblib.dump(clf, 'iris_classifier.pkl')
print("Classifier exported to 'iris_classifier.pkl'")
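
A quick way to confirm that an export worked is to reload the file and compare predictions with the in-memory model. The sketch below is self-contained and writes to a temporary directory; the file name `iris_classifier_check.pkl` and the variable names are hypothetical. Note that joblib files are pickle-based, so only load files from sources you trust.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=300).fit(X, y)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_classifier_check.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The reloaded estimator should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Reloaded model matches original:", same_predictions)
```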


## **5. Compute Accuracy & Explanation**

In [None]:
preds = clf.predict(Xc_test)
acc = accuracy_score(yc_test, preds)

print("Accuracy:", acc)
print("Accuracy = fraction of correctly predicted samples (1.0 means every prediction was correct).")
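
`accuracy_score` returns a fraction between 0 and 1; multiply by 100 for a percentage. To make the definition concrete, here is a plain-Python sketch of the same calculation. The helper `accuracy` and the toy labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Toy labels: the third prediction is wrong, the other three are right
toy_true = [0, 1, 2, 2]
toy_pred = [0, 1, 1, 2]
toy_accuracy = accuracy(toy_true, toy_pred)
print(toy_accuracy)  # 0.75
```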

## **6. Regression Example (Diabetes Dataset)**

In [None]:
X_reg = diabetes.data
y_reg = diabetes.target

print("Features shape:", X_reg.shape)
print("Labels shape:", y_reg.shape)
print("Example feature row:", X_reg[0])
print("Example target:", y_reg[0])
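
The example feature row holds small values near zero because, according to the scikit-learn documentation, `load_diabetes` returns features that are mean-centered and scaled so that each column's sum of squares is 1. The target is left in its original units. A short check of that property, with illustrative variable names:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# Per-column mean and sum of squares across all 442 samples
col_means = X.mean(axis=0)
col_sum_sq = (X ** 2).sum(axis=0)

print("Largest |column mean|:", np.abs(col_means).max())
print("Column sums of squares:", np.round(col_sum_sq, 3))
print("Overall feature range:", X.min(), "to", X.max())
```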

## **7. Train/Test Split (Regression)**

In [None]:
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

## **8. Fit a Simple Regressor (Linear Regression)**

In [None]:
reg = LinearRegression()
reg.fit(Xr_train, yr_train)

## **8.5. Export Regressor Model**


In [None]:
# Export the regressor
joblib.dump(reg, 'diabetes_regressor.pkl')
print("Regressor exported to 'diabetes_regressor.pkl'")


## **9. Evaluate Regression Model (MSE)**

In [None]:
reg_preds = reg.predict(Xr_test)
mse = mean_squared_error(yr_test, reg_preds)

print("Mean Squared Error:", mse)
print("MSE = average squared difference between predictions and actual values.")

## **10. Load and Use Exported Classifier for Prediction**


In [None]:
# Load the exported classifier
loaded_clf = joblib.load('iris_classifier.pkl')

# Get user input for iris flower features
print("Enter the features of the iris flower you want to classify:")
print("(Typical ranges: sepal length 4-8, sepal width 2-5, petal length 1-7, petal width 0-3)")
print()

sepal_length = float(input("Enter sepal length (cm): "))
sepal_width = float(input("Enter sepal width (cm): "))
petal_length = float(input("Enter petal length (cm): "))
petal_width = float(input("Enter petal width (cm): "))

# Create a 2D input array: one row (sample) with four feature columns
unseen_iris = [[sepal_length, sepal_width, petal_length, petal_width]]

# Make prediction
prediction = loaded_clf.predict(unseen_iris)
prediction_proba = loaded_clf.predict_proba(unseen_iris)

print("\n" + "="*50)
print("PREDICTION RESULTS:")
print("="*50)
print(f"Input features: {unseen_iris[0]}")
print(f"Predicted class: {prediction[0]} ({iris.target_names[prediction[0]]})")
print("\nPrediction probabilities:")
for name, prob in zip(iris.target_names, prediction_proba[0]):
    print(f"  {name}: {prob:.4f} ({prob*100:.2f}%)")


## **11. Load and Use Exported Regressor for Prediction**


In [None]:
# Load the exported regressor
loaded_reg = joblib.load('diabetes_regressor.pkl')

# Feature order must match the column order the model was trained on
feature_names = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

# Get user input for diabetes patient features
print("Enter the 10 features for the diabetes patient:")
print("(Note: these features are mean-centered and scaled, typically in the range -0.1 to 0.1)")
print("Feature names:", ", ".join(feature_names))
print()

features = []
for i, name in enumerate(feature_names, 1):
    value = float(input(f"Enter {name} (feature {i}/10): "))
    features.append(value)

# Create a 2D input array: one row (sample) with ten feature columns
unseen_diabetes = np.array([features])

# Make prediction
regression_prediction = loaded_reg.predict(unseen_diabetes)

print("\n" + "="*50)
print("PREDICTION RESULTS:")
print("="*50)
print(f"Input features: {unseen_diabetes[0]}")
print(f"Predicted diabetes progression: {regression_prediction[0]:.2f}")
print("\nNote: The prediction represents the disease progression")
print("one year after baseline (higher values indicate more progression).")
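
The MSE printed in section 9 is expressed in squared units of the target, which makes it hard to read directly. Two common companions are RMSE (the square root of MSE, in the target's own units) and R² (the share of target variance the model explains). The plain-Python sketch below shows how the three relate; the helper functions and toy values are made up for illustration and are not part of the notebook.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values: residuals are 1, 0, -1, 0
toy_true = [3.0, 5.0, 7.0, 9.0]
toy_pred = [2.0, 5.0, 8.0, 9.0]

toy_mse = mse(toy_true, toy_pred)   # (1 + 0 + 1 + 0) / 4
toy_rmse = math.sqrt(toy_mse)       # back in the target's units
toy_r2 = r2(toy_true, toy_pred)     # 1 - 2 / 20
print(toy_mse, toy_rmse, toy_r2)
```

In scikit-learn, `sklearn.metrics.r2_score` computes the same R² quantity, and taking `np.sqrt` of the `mean_squared_error` result gives RMSE.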