
This GitHub repository contains a comprehensive analysis of the popular Iris dataset using various machine learning algorithms, including Logistic Regression, Support Vector Machines (SVM), and Random Forest. Additionally, it explores the impact of different data split ratios (80-10-10 vs. 60-20-20) on model performance.


VaderSame/Iris-Dataset


Red Wine Quality Analysis

This repository contains a machine learning project to classify different types of iris plants based on their features. The dataset used is the famous Iris dataset, often used in pattern recognition and machine learning literature.

Table of Contents

  1. Project Overview
  2. Dataset
  3. Instructions
  4. Requirements
  5. Getting Started
  6. License

Project Overview

This project analyzes the quality of red wine based on various attributes using Python and Jupyter Notebook. It includes data exploration, data preparation, modeling with two regression algorithms, model evaluation, feature importance analysis, and a conclusion summarizing key insights.

Dataset

The dataset used in this project is the "Red Wine Quality" dataset, stored in the winequality-red.csv file. It contains various chemical and sensory attributes of red wines, along with a quality rating that the machine learning models in this project are trained to predict.
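
The notebook itself presumably reads this file with Pandas. As a dependency-free sketch, the function below loads it with Python's csv module. It assumes every column is numeric, that the rating is stored in a column named "quality", and that fields are separated by semicolons, as in the UCI copy of the file. Pass delimiter="," for comma-separated mirrors. The function name load_wine_quality is hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_wine_quality(
    source: TextIO, target: str = "quality", delimiter: str = ";"
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Read a wine-quality CSV into (feature_names, feature_rows, targets).

    Assumption: all columns are numeric and the rating column is `target`.
    """
    reader = csv.reader(source, delimiter=delimiter)  # handles quoted headers
    header = [name.strip() for name in next(reader)]
    target_idx = header.index(target)
    feature_names = [n for i, n in enumerate(header) if i != target_idx]
    rows: List[List[float]] = []
    targets: List[float] = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        values = [float(v) for v in record]
        targets.append(values[target_idx])
        rows.append([v for i, v in enumerate(values) if i != target_idx])
    return feature_names, rows, targets


if __name__ == "__main__":
    # Tiny in-memory sample; with the real file use:
    # with open("winequality-red.csv", newline="") as f: load_wine_quality(f)
    sample = io.StringIO('"fixed acidity";"alcohol";"quality"\n7.4;9.4;5\n')
    print(load_wine_quality(sample))
```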

Instructions

Project Structure

The project is organized into several phases, all contained in a single Jupyter Notebook: "Red Wine Quality.ipynb".

Data Exploration and Preparation

  1. Open the Jupyter Notebook.
  2. Follow the code and documentation to perform data exploration and preparation.
  3. Ensure the dataset is clean and ready for modeling.
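
One possible cleaning and splitting step is sketched below using only the standard library. It drops rows that contain NaN, removes exact duplicate rows (keeping the first occurrence), and makes a reproducible train/test split. The 80/20 ratio, the seed, and the helper names clean_rows and train_test_split_rows are illustrative assumptions, not the notebook's actual settings. In the notebook, sklearn.model_selection.train_test_split does the same splitting job.

```python
import math
import random
from typing import List, Sequence, Tuple


def clean_rows(
    X: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Drop rows containing NaN, then drop exact duplicates (keeping the first)."""
    seen = set()
    X_clean: List[List[float]] = []
    y_clean: List[float] = []
    for row, target in zip(X, y):
        if math.isnan(target) or any(math.isnan(v) for v in row):
            continue  # incomplete record
        key = (tuple(row), target)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        X_clean.append(list(row))
        y_clean.append(target)
    return X_clean, y_clean


def train_test_split_rows(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    test_size: float = 0.2,
    seed: int = 42,
) -> Tuple[List[List[float]], List[List[float]], List[float], List[float]]:
    """Shuffle row indices with a fixed seed and split into train and test sets."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [list(X[i]) for i in train_idx]
    X_test = [list(X[i]) for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Fixing the seed means every rerun of the notebook evaluates the models on the same held-out rows, so metric differences reflect model changes rather than a different split.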

Regression Modeling

  1. Implement two regression algorithms to predict wine quality.
  2. Train and evaluate the models.
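
The README does not name the two algorithms. As an illustrative sketch only, the code below assumes a scikit-learn LinearRegression baseline and a RandomForestRegressor, both fitted on the same training data. It uses synthetic data from make_regression as a stand-in for the wine features. The helper fit_two_regressors is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_two_regressors(X_train, y_train, seed=42):
    """Fit a linear baseline and a random forest on the same training data."""
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models


if __name__ == "__main__":
    # Synthetic stand-in for the wine feature matrix; replace with the real data.
    X, y = make_regression(n_samples=300, n_features=11, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    models = fit_two_regressors(X_train, y_train)
    for name, model in models.items():
        # .score() returns R-squared on the held-out data
        print(name, round(model.score(X_test, y_test), 3))
```

Pairing a linear model with a tree ensemble is a common choice because the two make very different assumptions: if the forest clearly wins, the relationship between the features and quality is probably not linear.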

Model Evaluation

  1. Report three metrics (e.g., RMSE, MAE, R-squared) for each model.
  2. Compare the results to identify the better-performing model.
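
The three example metrics can be computed directly from their definitions, as in the sketch below. RMSE is the square root of the mean squared error, MAE is the mean absolute error, and R-squared is 1 - SS_res / SS_tot. The helper names regression_metrics and best_model are hypothetical. For non-constant targets the values should agree with scikit-learn's mean_squared_error (after a square root), mean_absolute_error, and r2_score.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE and R-squared for one model's predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 = 1 - SS_res / SS_tot; left undefined (NaN) when all targets are equal
        "r2": 1.0 - sse / sst if sst > 0 else float("nan"),
    }


def best_model(results: Dict[str, Dict[str, float]], metric: str = "rmse") -> str:
    """Lowest error wins for RMSE/MAE; highest score wins for R-squared."""
    if metric == "r2":
        return max(results, key=lambda name: results[name][metric])
    return min(results, key=lambda name: results[name][metric])
```

For example, true ratings [3, 5, 7] predicted as [2, 5, 8] give MAE = 2/3, RMSE = sqrt(2/3), and R-squared = 1 - 2/8 = 0.75.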

Feature Importance Analysis

  1. Calculate and visualize feature importances for at least one of the regression models.
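
The README does not say which model or importance method the notebook uses. The sketch below assumes a scikit-learn RandomForestRegressor and reads its impurity-based feature_importances_ attribute. It ranks features by that score and draws a horizontal bar chart with Matplotlib. The helper names and the feature names f0 to f2 are hypothetical. sklearn.inspection.permutation_importance is an alternative that measures the drop in score on held-out data instead.

```python
from typing import List, Sequence, Tuple

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def ranked_importances(model, feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Pair each feature with its impurity-based importance, largest first."""
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(
        ((name, float(score)) for name, score in pairs),
        key=lambda p: p[1],
        reverse=True,
    )


def plot_importances(ranked: List[Tuple[str, float]]) -> None:
    """Horizontal bar chart of the ranked importances (requires Matplotlib)."""
    import matplotlib.pyplot as plt

    # Reverse so the most important feature is drawn at the top.
    names = [name for name, _ in ranked][::-1]
    scores = [score for _, score in ranked][::-1]
    plt.barh(names, scores)
    plt.xlabel("Impurity-based importance")
    plt.tight_layout()
    plt.show()


if __name__ == "__main__":
    # Synthetic stand-in data; in the notebook, use the wine features and names.
    X, y = make_regression(n_samples=200, n_features=3, n_informative=1, random_state=0)
    forest = RandomForestRegressor(random_state=0).fit(X, y)
    plot_importances(ranked_importances(forest, ["f0", "f1", "f2"]))
```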

Conclusion

  1. Summarize key insights obtained from the analysis.
  2. Include one limitation of the analysis.
  3. Comment on future work that could be done to improve the analysis.

Requirements

To run the project, you need the following libraries and tools installed:

  • Python 3.x
  • Jupyter Notebook
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-learn
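
Before opening the notebook, a quick check can confirm that these packages are installed. The sketch below uses importlib.util.find_spec, which locates a module without importing it. It assumes the usual import names: "sklearn" for Scikit-learn and "notebook" for the classic Jupyter Notebook package. The helper name check_requirements is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the libraries listed above (assumed standard names).
REQUIRED_MODULES = ("numpy", "pandas", "matplotlib", "seaborn", "sklearn", "notebook")


def check_requirements(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_requirements()
    missing = [name for name, found in status.items() if not found]
    print("All requirements found." if not missing else "Missing: " + ", ".join(missing))
```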

Getting Started

  1. Clone this repository to your local machine.
  2. Install the required libraries using pip install -r requirements.txt.
  3. Open the Jupyter Notebook "Red Wine Quality.ipynb" to start your analysis.

License

This project is licensed under the MIT License. See the LICENSE file for details.
