Provides a tutorial for working with machine learning algorithms in Python. Adapted from the following tutorial.
machine-learning
│ 01_check_versions.py
│ 02_check_imports.py
│ 03_analyze_flowers.py
│ environment.yml
│ Makefile
│ README.md
│
├───data
│ iris.csv
│
└───figures
figure-01-box-plot.png
figure-02-histogram.png
figure-03-scatter-matrix.png
figure-04-spot-check.png
Displays version numbers for the packages installed in the Conda environment.
Checks that all packages necessary for the iris flower workflow import correctly.
Machine learning tutorial analyzing iris flowers.
Provides recipe for the Conda environment.
Provides recipes for automating the project.
Contains input data used by the 03_analyze_flowers.py
script.
Contains all figures created by the 03_analyze_flowers.py
script. Note this directory is empty by default. The project tree diagram shows what it looks like once the 03_analyze_flowers.py
script has completed.
Install Miniconda:
- Download the latest Miniconda installer for your operating system
- Follow the Miniconda installation instructions (Windows, Linux, macOS)
Open terminal:
- Open the operating system default terminal
Clone GitLab repository:
(base) > git clone https://github.com/calekochenour/machine-learning-tutorial.git
Create Conda environment:
(base) > cd machine-learning-tutorial
(base) > conda env create -f environment.yml
Activate Conda environment:
(base) > conda activate machine-learning-tutorial
Launching the repository with Binder will open a JupyterLab instance and set up the Conda enviroment automatically. The scripts can then be run direclty within this JupyterLab instance. Note that the Binder is non-persistent; once closed, any file changes will not be saved.
Click the icon below to launch the project with Binder (also listed at the top of the repository). It may take some time to set up and congfigure the environment.
These examples provide syntax to run all scripts, followed by expected outputs. Note that some results for 03_analyze_flowers.py
may differ slightly from what is provided here, due to inherent randomness within the classification algorithms.
Script 1 - check versions:
(machine-learning-tutorial) > python 01_check_versions.py
Python: 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:01:37) [MSC v.1935 64 bit (AMD64)]
matplotlib: 3.8.1
numpy: 1.26.0
pandas: 2.1.3
scikit-learn/sklearn: 1.3.2
scipy: 1.11.3
Script 2 - check imports:
(machine-learning-tutorial) > python 02_check_imports.py
SUCCESS: Imported packages without error.
Script 3 - analyze flowers:
(machine-learning-tutorial) > python 03_analyze_flowers.py
Shape: (150, 5)
First 20 records:
sepal-length sepal-width petal-length petal-width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
6 4.6 3.4 1.4 0.3 Iris-setosa
7 5.0 3.4 1.5 0.2 Iris-setosa
8 4.4 2.9 1.4 0.2 Iris-setosa
9 4.9 3.1 1.5 0.1 Iris-setosa
10 5.4 3.7 1.5 0.2 Iris-setosa
11 4.8 3.4 1.6 0.2 Iris-setosa
12 4.8 3.0 1.4 0.1 Iris-setosa
13 4.3 3.0 1.1 0.1 Iris-setosa
14 5.8 4.0 1.2 0.2 Iris-setosa
15 5.7 4.4 1.5 0.4 Iris-setosa
16 5.4 3.9 1.3 0.4 Iris-setosa
17 5.1 3.5 1.4 0.3 Iris-setosa
18 5.7 3.8 1.7 0.3 Iris-setosa
19 5.1 3.8 1.5 0.3 Iris-setosa
Summary:
sepal-length sepal-width petal-length petal-width
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
Class distribution: class
Iris-setosa 50
Iris-versicolor 50
Iris-virginica 50
dtype: int64
Algorithm evaluation:
LR: 0.942 (0.065)
LDA: 0.975 (0.038)
KNN: 0.958 (0.042)
CART: 0.933 (0.05)
NB: 0.95 (0.055)
SVM: 0.983 (0.033)
Algorithm prediction:
Accuracy score: 0.967
Confusion matrix:
[[11 0 0]
[ 0 12 1]
[ 0 0 6]]
classification report:
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 11
Iris-versicolor 1.00 0.92 0.96 13
Iris-virginica 0.86 1.00 0.92 6
accuracy 0.97 30
macro avg 0.95 0.97 0.96 30
weighted avg 0.97 0.97 0.97 30
To run the scripts in a more automated way, use the Makefile recipes. There is one recipie for each script. In addition, there is a recipe to run all three scripts in succession. Finally, there is a recipe to delete the figures created by the 03_analyze_flowers.py
script. Outputs are omitted from these commands, as they will match the outputs shown in the previous section.
Script 1 - check versions:
(machine-learning-tutorial) > make versions
Script 2 - check imports:
(machine-learning-tutorial) > make imports
Script 3 - analyze flowers:
(machine-learning-tutorial) > make flowers
Run all three scripts in succession:
(machine-learning-tutorial) > make all
or
(machine-learning-tutorial) > make
Delete figures created by script 3 - analyze flowers (Windows):
(machine-learning-tutorial) > make clean-windows
Delete figures created by script 3 - analyze flowers (Linux - for use with Binder):
(machine-learning-tutorial) > make clean-linux