ML Basics & Optimization - Task 2
This project demonstrates the fundamental machine learning workflow using the Iris dataset. It covers building a baseline model, applying basic optimizations, and evaluating performance improvements, all in a clean, reproducible setup.
Project Structure
ML Basics & Optimization/
├── data/                  # dataset folder (empty for now, using sklearn built-in)
├── outputs/               # model outputs, plots, and reports
│   ├── classification_report.txt
│   ├── confusion_matrix.png
│   └── task2_iris_best_model.joblib
├── src/
│   └── task2_iris.py      # main script for Task 2
├── requirements.txt       # dependencies
└── venv/                  # virtual environment (not uploaded)
Objective
To:
Build a baseline ML model on the Iris dataset.
Apply simple optimization techniques such as feature scaling and hyperparameter tuning.
Compare results and visualize model performance.
Dataset
Name: Iris Dataset (built-in from scikit-learn)
Samples: 150
Features:
sepal length (cm)
sepal width (cm)
petal length (cm)
petal width (cm)
Classes (targets):
0 = Setosa
1 = Versicolor
2 = Virginica
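The dataset facts above can be checked directly from scikit-learn. This is a minimal sketch, assuming scikit-learn is installed from requirements.txt; the variable names are illustrative and not taken from task2_iris.py.

```python
from sklearn.datasets import load_iris

# Load the built-in Iris dataset (no files in data/ are needed).
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # number of samples and features
print(iris.feature_names)  # the four measurements, in cm
print(iris.target_names)   # class names, indexed by label 0, 1, 2
```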
Environment Setup
1. Clone or create the folder
Make sure your folder is named ML Basics & Optimization
2. Create and activate a virtual environment
On Windows (PowerShell):
    python -m venv venv
    .\venv\Scripts\Activate.ps1
On macOS / Linux:
    python3 -m venv venv
    source venv/bin/activate
3. Install dependencies
    pip install -r requirements.txt
If any install fails, upgrade pip first:
    python -m pip install --upgrade pip
How to Run the Project
From inside the project folder (with the virtualenv activated):
python "src/task2_iris.py"
This will:
Train a baseline Logistic Regression model
Train an improved model using StandardScaler + GridSearchCV
Save:
classification_report.txt
confusion_matrix.png
task2_iris_best_model.joblib
All outputs will be saved in the outputs/ folder.
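The steps listed above could look roughly like the sketch below. It is not the contents of task2_iris.py. The 80/20 stratified split, the random seed, the grid of C values, the 5-fold cross-validation, and the raised max_iter are all assumptions made for illustration. Only the model file is written here; the report and plot are left out.

```python
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumed split: 20% held out, stratified by class, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: logistic regression on unscaled features.
# max_iter is raised (an assumption) to avoid convergence warnings.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Improved: scaler and classifier in one pipeline, C tuned by cross-validation.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {"clf__C": [0.01, 0.1, 1, 10, 100]}  # hypothetical grid
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
tuned_acc = search.score(X_test, y_test)

print(f"baseline accuracy: {baseline_acc:.4f}")
print(f"tuned accuracy:    {tuned_acc:.4f}")
print(f"best C:            {search.best_params_['clf__C']}")

# Save the best pipeline (scaler + model) under outputs/, then reload it.
out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)
model_path = out_dir / "task2_iris_best_model.joblib"
joblib.dump(search.best_estimator_, model_path)
reloaded = joblib.load(model_path)
```

Keeping the scaler inside the pipeline means each cross-validation fold fits the scaler on its own training portion only, so no information from the validation fold leaks into the scaling statistics.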
Results Summary
| Model | Description | Accuracy |
|---|---|---|
| Baseline | Logistic Regression (no scaling, default params) | 0.9667 |
| Improved | StandardScaler + GridSearchCV (tuned C) | 1.0000 |

Performance Gain: +0.0333 (accuracy rose from 0.9667 to 1.0 after optimization)
Outputs Explained
| File | Description |
|---|---|
| classification_report.txt | Precision, recall, and F1-score per class |
| confusion_matrix.png | Visualization of true vs predicted classes |
| task2_iris_best_model.joblib | Saved trained pipeline (StandardScaler + model) |

Key Learnings
How to structure a basic ML project cleanly.
How to use scikit-learn's Pipeline and GridSearchCV.
How scaling improves model convergence.
Importance of evaluation metrics beyond raw accuracy.
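To illustrate the last point, the sketch below computes per-class metrics and a confusion matrix for a scaled logistic regression. The split settings and max_iter are assumptions, and the model is a plain untuned pipeline rather than the saved best model. It prints text only and does not reproduce confusion_matrix.png.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Assumed split: 20% held out, stratified by class, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

# Scaled logistic regression (untuned; max_iter is an assumption).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1-score as a text table.
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Off-diagonal entries in the matrix show which classes get confused with each other, which a single accuracy number hides.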