Using EDA, IRIS Data Set, one of the oldest data sets created in 1936, has been explored in this repository.

deveshSingh06/Exploratory_Data_Analysis_EDA_on_IRIS_Data_Set

Exploratory-Data-Analysis-EDA-on-IRIS-Data-Set

Using EDA, I have tried to explore the IRIS Data Set, one of the oldest data sets, created in 1936. For information on the IRIS Data Set, read this.

What is Exploratory Data Analysis (EDA)?

  • It is the task of analyzing the given data using tools from statistics, linear algebra, plotting, etc.
  • It is the first thing to do before building a machine learning model.
  • Here, we explore the given data as much as we can, hence the name Exploratory.

Note: While exploring the data, one should also clean it using data cleaning techniques such as deduplication, removing extra spaces, converting text to the proper case, and spell checking. To learn about some data cleaning and pre-processing techniques, check my notebook here.
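A few of the cleaning steps named in the note can be sketched with the Python standard library alone. This is a minimal illustration, not the code from the notebook: the helper names `normalize_text` and `deduplicate` and the rows in `raw_rows` are hypothetical and exist only to exercise the functions.

```python
def normalize_text(value):
    """Trim the ends, collapse repeated whitespace, and convert to title case."""
    return " ".join(value.split()).title()


def deduplicate(rows):
    """Drop exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical messy rows (species label, sepal length), for illustration only.
raw_rows = [
    ("  iris   setosa ", 5.1),
    ("Iris Setosa", 5.1),
    ("IRIS VERSICOLOR", 7.0),
]

# Normalize the text first so that rows differing only in spacing or case
# become identical and can then be removed as duplicates.
cleaned = deduplicate([(normalize_text(name), value) for name, value in raw_rows])
print(cleaned)
```

Normalizing before deduplicating matters here: the first two rows only count as duplicates once their labels have been brought to the same form.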

Balanced Data Set

  • The IRIS Data Set used for this problem is a balanced data set.
  • That is, each species of flower in the data set has an equal number of data points, as given below:
  1. Iris Virginica : 50
  2. Iris Versicolor : 50
  3. Iris Setosa : 50
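The balance check can be sketched in a few lines. This is a minimal example under assumptions: the `labels` list is rebuilt from the counts listed above rather than read from the data file, and the helper names `class_counts` and `is_balanced` are hypothetical.

```python
from collections import Counter


def class_counts(labels):
    """Return a mapping from class label to its number of data points."""
    return dict(Counter(labels))


def is_balanced(labels):
    """Return True when every class present has the same number of data points."""
    counts = Counter(labels)
    return len(counts) > 0 and len(set(counts.values())) == 1


# Labels reconstructed from the counts above (50 per species); with the real
# data set, this list would be the species column instead.
labels = ["Iris Virginica"] * 50 + ["Iris Versicolor"] * 50 + ["Iris Setosa"] * 50

print(class_counts(labels))
print(is_balanced(labels))
```

With a data frame library, the same check is usually a one-line count of the values in the species column.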

Features Used

The following features of the flowers are used for the analysis:

  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width

EDA on the data set has been applied using the following:

  • Univariate Analysis
    • An analysis performed on a single variable (a single feature) is known as a univariate analysis.
    • The following plots fall under the univariate analysis:
      • PDF(s)
      • CDF(s)
      • Box Plots
      • Violin Plots
  • Bivariate Analysis
    • An analysis performed on two variables (two features) is known as a bivariate analysis.
    • The following plots fall under the bivariate analysis:
      • Pair-Plots
      • Scatter Plots
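The quantities behind these plots can be computed without any plotting library. The sketch below is an illustration under assumptions, not the repository's code: it estimates a PDF from a histogram and builds an empirical CDF for one feature (univariate), then computes the Pearson correlation between two features (bivariate), which is the numeric counterpart of what a scatter plot shows. The function names are hypothetical, and the nine hand-entered measurements serve only to exercise the functions; in practice the full feature columns would be passed in.

```python
import math


def histogram_pdf(values, bins):
    """Histogram-based density estimate.

    Returns (densities, bin_width); the densities times the bin width sum to 1.
    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts], width


def ecdf(values):
    """Empirical CDF: the i-th smallest value paired with the fraction i / n."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Small hand-entered sample (petal length and petal width in cm), for
# illustration only; replace with the full columns of the data set.
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]

# Univariate: density estimate and cumulative fractions for one feature.
densities, width = histogram_pdf(petal_length, bins=3)
print(densities, width)
print(ecdf(petal_length))

# Bivariate: linear association between two features.
print(pearson(petal_length, petal_width))
```

Box plots and violin plots summarize the same single-feature distribution through quartiles and a smoothed density, while a pair-plot repeats the scatter plot for every pair of features.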
