Chin Hock Yang
This project analyses data on the Iris flower classes found in a garden and provides a program that predicts the class of a new Iris flower. The repository contains two main files, `analysis.ipynb` and `app.py`.
The `analysis.ipynb` Jupyter Notebook assesses the quality of the Iris data and builds supervised machine learning models (Decision Tree, Logistic Regression and K-Nearest Neighbours) that predict the class of a new Iris flower. It also contains code for unsupervised learning: identifying data clusters with K-Means Clustering and measuring the similarity between data points with cosine similarity. The notebook fits a Logistic Regression model and stores it in the `model` folder, from which the application in `app.py` loads it.
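As a rough illustration of the cosine-similarity idea, the sketch below ranks existing rows by their similarity to a new flower. It is not the notebook's code: the helper names `cosine_similarity` and `most_similar` are hypothetical, the sample rows are illustrative, and it assumes raw, unscaled measurements, whereas the notebook may preprocess the data differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def most_similar(query, rows, k=10):
    """Return the indices of the k rows most similar to query, best match first."""
    scored = [(cosine_similarity(query, row), index) for index, row in enumerate(rows)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]


# Illustrative rows: sepal length, sepal width, petal length, petal width.
existing = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]
new_flower = [5.0, 3.4, 1.5, 0.2]
print(most_similar(new_flower, existing, k=2))
```

Cosine similarity compares the direction of two vectors rather than their magnitude, so two flowers with the same proportions but different overall sizes score as highly similar.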
The `app.py` Python script sets up a Plotly Dash dashboard application. At present the application is best viewed in the Google Chrome browser on a laptop or desktop monitor. The application has 3 main sections.
| Section | Description |
|---|---|
| Data Exploration | View the proportion of each Iris class and check the dataset for missing data |
| Comparison across Iris Types | Visualise the distribution of, and differences between, the attributes of the Iris types |
| New Data Prediction | Enter the attribute values of a new Iris flower, then view its predicted Iris type and the 10 existing records most similar to it |
New Data Comparison to Existing Data
This project uses a local virtual environment to manage its dependencies (environment creation and activation guide). To set up the repository, Python (version 3.8 or above) and pip (the package installer) must first be installed on the local machine.
- In the project root directory, run `python -m venv venv` to create a virtual environment folder `venv` at the root level of the repository.
- Activate the environment by running `source venv/bin/activate` (for Mac users) or `venv\Scripts\activate` (for Windows users).
- Run the command `pip install -r requirements.txt` after activating the virtual environment to install all necessary dependencies.
- Download the data `iris.data` from the data source and rename the file as `iris.csv`. This file should contain 5 columns of data and should not contain any header rows.
- Store the `iris.csv` in the `data` folder.
- If the name of the main dataset is changed, the filename within the `pd.read_csv` function in Section 1.2 of `analysis.ipynb` and the `IRIS_CSV_DATA_FILENAME` variable in `app.py` will need to be updated accordingly.
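Before running the notebook, it can help to confirm that the downloaded file matches the expected layout. The sketch below is a hypothetical helper, `validate_iris_csv`, that is not part of the repository. It assumes that the first four columns are numeric measurements and the fifth is the class label, and it uses only the standard library.

```python
import csv
from pathlib import Path

EXPECTED_COLUMNS = 5  # assumed layout: four measurements plus the class label


def validate_iris_csv(path):
    """Return the number of data rows, raising ValueError on a malformed file."""
    row_count = 0
    with Path(path).open(newline="") as handle:
        for line_number, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # tolerate blank lines, e.g. at the end of the file
            if len(row) != EXPECTED_COLUMNS:
                raise ValueError(
                    f"line {line_number}: expected {EXPECTED_COLUMNS} columns, "
                    f"found {len(row)}"
                )
            try:
                for value in row[:4]:
                    float(value)
            except ValueError:
                raise ValueError(
                    f"line {line_number}: non-numeric measurement "
                    "(is this a header row?)"
                ) from None
            row_count += 1
    return row_count
```

Calling `validate_iris_csv("data/iris.csv")` from the project root would return the row count or raise an error that points at the offending line. Because the file has no header row, a pandas load would need `header=None`, for example `pd.read_csv(path, header=None, names=[...])` with column names of your choosing.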
- Run all the cells in the `analysis.ipynb` Jupyter Notebook to generate a `best_model.pkl` file (containing a fitted Logistic Regression model).
- Within the notebook, changes and improvements can be made to the model, and the model can be saved into the `model` folder using the `save_file` function in the notebook.
- If the name of the main predictive model is changed, the filename within the `load_file` function in Section 6.3 of `analysis.ipynb` and the `MODEL_PKL_FILENAME` variable in `app.py` will need to be updated accordingly.
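The notebook's `save_file` and `load_file` functions are not reproduced here. As a minimal sketch of how a fitted model can be written to and read back from a `.pkl` file, the hypothetical helpers `save_pickle` and `load_pickle` below use Python's standard `pickle` module. The notebook's own functions may work differently.

```python
import pickle
from pathlib import Path


def save_pickle(obj, path):
    """Serialise obj to path, creating the parent folder if it is missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Load a pickled object. Only unpickle files from a source you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Under these assumptions, `save_pickle(model, "model/best_model.pkl")` would store a fitted model and `load_pickle("model/best_model.pkl")` would restore it. A pickled scikit-learn model should generally be loaded with the same library version that saved it.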
- Activate the virtual environment in the terminal or command prompt.
- Run `python app.py` to start the dashboard.
- The app should be running on http://localhost:8050/


