Iris is one of the best-known datasets in ML and a good starting point for beginners who want to get their hands dirty with ML/Data Science. It has few features and observations, and no missing values or outliers to deal with, which makes implementing ML models simpler.
Since the dataset is clean and small, we will use this to our advantage and practice data visualization with matplotlib and seaborn (data visualization libraries), implement the most commonly used feature selection methods in ML/Data Science projects, and apply several classification models to the dataset. This gives us hands-on experience of how and when to use each method, and of which works best for a given dataset.
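As a rough illustration of that workflow, the sketch below chains feature selection and a few classifiers with scikit-learn. It is not taken from `report.ipynb`: the choice of `SelectKBest` with `f_classif`, `k=2`, the three models, and 5-fold cross-validation are all assumptions made for the example, and it loads the copy of Iris bundled with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Load the features (150 x 4) and integer class labels bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Illustrative selection of classifiers; the notebook may use different ones.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # Scale the features, keep the 2 with the highest ANOVA F-score, then classify.
    # Putting every step in one pipeline keeps the selection inside each CV fold.
    pipeline = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=2), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {scores[name]:.3f}")
```

Comparing the mean cross-validated accuracy across models is one simple way to judge which works best on this dataset.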
- Creator: R.A. Fisher
- Donor: Michael Marshall
This project contains 1 file and 2 folders:
- `report.ipynb`: This is the main file where I have performed my work on the project.
- `export/`: Folder containing HTML and PDF versions of the notebook.
- `plots/`: Contains images of all the plots that are displayed in the `report.ipynb` file.
Characteristic | Value |
---|---|
Associated Task | Classification |
Data Set Characteristics | Multivariate |
Attribute Characteristics | Real |
Number of Instances | 150 |
Number of Attributes | 4 |
Missing Values? | No |
Area | Life |
The data set contains 3 classes of 50 instances each, for a total of 150 instances, where each class refers to a type of Iris plant. One class is linearly separable from the other 2; the latter are not linearly separable from each other.
Predicting attribute: Class of Iris plant.
Attribute Information: We have 4 features in this dataset and a target variable, `class`.
- sepal length in cm.
- sepal width in cm.
- petal length in cm.
- petal width in cm.
- Class:
  - Iris Setosa
  - Iris Versicolour
  - Iris Virginica
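The properties listed above can be checked quickly in code. This is a minimal sketch that assumes the copy of Iris bundled with scikit-learn (`load_iris`) rather than whatever file `report.ipynb` reads; the column name `class` is chosen here to match the description above.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled dataset and build a DataFrame with the 4 feature columns.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add the target as a readable categorical column (setosa, versicolor, virginica).
df["class"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.shape)                    # rows x columns (4 features + class)
print(df["class"].value_counts())  # number of instances per class
print(df.isnull().sum().sum())     # total number of missing values
```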
This project was solved with the following versions of libraries installed:
Library / Language | Use | Version |
---|---|---|
Python | Language Used for the project | 3.7.0 |
NumPy | For Scientific Computing | 1.15.2 |
Pandas | For Data Analysis | 0.23.4 |
matplotlib | For Visualization | 3.0.0 |
seaborn | For Visualization | 0.9.0 |
scikit-learn | ML Library for training & testing data | 0.20.0 |
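To compare your environment against the table, a small script along these lines prints the installed versions. It is only a sketch: the package list simply mirrors the table, and newer versions than those listed may behave differently from what the notebook was written against.

```python
import importlib
import sys

# Import names of the packages from the table (scikit-learn is imported as sklearn).
packages = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]

print("Python", sys.version.split()[0])

versions = {}
for name in packages:
    try:
        module = importlib.import_module(name)
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        # The package is not installed in this environment.
        versions[name] = None
    print(name, versions[name] or "not installed")
```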
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included in it.
You will also need Jupyter Notebook installed to run and execute the `report.ipynb` file. You can use JupyterLab instead, which is the next-generation interface for Jupyter notebooks. Instructions to download JupyterLab can be found here.
In a terminal or command window, navigate to the top-level project directory `Iris_Flower` (that contains this README) and run one of the following commands:
ipython notebook report.ipynb
or
jupyter notebook report.ipynb
or, if you have JupyterLab installed,
jupyter lab
This will open Jupyter Notebook or JupyterLab, along with the project file, in your browser.