# Iris dataset: exploration and visualization
![image info](https://miro.medium.com/max/1400/1*f6KbPXwksAliMIsibFyGJw.png)

Ref: https://towardsdatascience.com/the-iris-dataset-a-little-bit-of-history-and-biology-fb4812f5a7b5

As shown in the figure above, the three flowers are morphologically similar. In this notebook, we will explore the measurements recorded for these flowers and see whether they are enough to tell the flower types apart.


In this notebook you will practice:
* Examining a DataFrame
* Indexing and selecting segments/slices from a DataFrame
* Computing descriptive statistics on DataFrames
* Filtering using boolean operations

## Load data
Upload the CSV file provided in the brief and place it in the same directory as this notebook.

In [None]:
# load pandas library 
import pandas as pd

In [None]:
# Use the read_csv function to read the file; it accepts a file path directly
df = pd.read_csv("iris_csv.csv")


In [None]:
# display the header of the df dataframe using the .head() method
df.head()

In [None]:
# Explore the data using summary statistics, use the .describe() method
df.describe()

In [None]:
# display the names of the columns.
df.columns

In [None]:
# show the 2nd column, sepal_width
df.sepal_width
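
Beyond `.head()` and `.columns`, a DataFrame's shape and column types are often worth checking. The sketch below uses a small hand-written `toy` frame (hypothetical rows, not the real CSV) that only assumes the same column names as this notebook.

```python
import pandas as pd

# Hypothetical toy rows that mirror the notebook's column names (not the real file)
toy = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "flower": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = toy.shape

# names of the numeric columns only (the text column "flower" is excluded)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# select the 2nd column by position instead of by name
second_col = toy.iloc[:, 1]
```

The same three calls can be run on `df` once the CSV has been loaded.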

## Indexing and selection of segments/slices in our DataFrame



In [None]:
# create a sub_df with two columns "sepal_length" and "flower"
sub_df = df[["sepal_length", "flower"]]
sub_df

In [None]:
# take the first 10 rows of the petal_length column in a sub_df2 DataFrame
# (double brackets keep the result a DataFrame; single brackets would return a Series)
sub_df2 = df[["petal_length"]].head(10)
sub_df2
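
Row and column slices can also be taken with `.iloc` (by position) and `.loc` (by label). This is a minimal sketch on a hypothetical `toy` frame with the default integer index; the values are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows (illustrative values, default 0..4 index)
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0],
    "flower": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
               "Iris-versicolor", "Iris-virginica"],
})

# position-based slice: rows at positions 0 and 1 (end is exclusive)
first_two = toy.iloc[:2]

# label-based slice: rows labelled 1 through 3 (end is inclusive), two columns
by_label = toy.loc[1:3, ["petal_length", "flower"]]
```

Note the difference in how the end of the slice is treated: `.iloc[:2]` stops before position 2, while `.loc[1:3]` includes the row labelled 3.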

In [None]:
# get all the instances with "sepal_length" > 5
df[df["sepal_length"] > 5]
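
Boolean filters can be combined with `&` (and), `|` (or) and `~` (not); each condition needs its own parentheses. The sketch below runs on a hypothetical `toy` frame with illustrative values, and the thresholds are arbitrary.

```python
import pandas as pd

# Hypothetical toy rows (illustrative values only)
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0],
    "flower": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
               "Iris-versicolor", "Iris-virginica"],
})

# AND: both conditions must hold
long_sepal_and_petal = toy[(toy["sepal_length"] > 5) & (toy["petal_length"] > 4.6)]

# OR: at least one condition must hold
short_or_virginica = toy[(toy["sepal_length"] < 5) | (toy["flower"] == "Iris-virginica")]

# NOT: invert a mask, here built with isin()
not_setosa = toy[~toy["flower"].isin(["Iris-setosa"])]
```

Python's `and`, `or` and `not` keywords do not work element-wise on Series, which is why the bitwise operators are used here.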

## Descriptive statistics on DataFrames


In [None]:
# calculate the max, min, mean of the sepal_width column
df.sepal_width.describe()

In [None]:
# calculate the sum of the values in column 2
df.sepal_width.sum()

In [None]:
# calculate the maximum value of each instance (row) across the 4 columns "sepal_length", "sepal_width", "petal_length", "petal_width"
df.drop(columns="flower").max(axis=1)

In [None]:
# check that the row-wise maximum always comes from the "sepal_length" column
df.drop("flower", axis = 1).idxmax(axis=1).value_counts()

In [None]:
# Are there any missing values in the dataset? Count them per column.
df.isnull().sum()
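
To see what a missing-value check reports when values really are missing, here is a small sketch on a hypothetical `toy` frame with two gaps inserted on purpose.

```python
import pandas as pd

# Hypothetical toy rows with two deliberately missing measurements
toy = pd.DataFrame({
    "sepal_length": [5.1, None, 6.3],
    "sepal_width": [3.5, 3.2, None],
    "flower": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# number of missing values in each column
missing_per_column = toy.isnull().sum()

# single yes/no answer for the whole frame
has_missing = bool(toy.isnull().any().any())

# keep only the rows without any missing value
complete_rows = toy.dropna()
```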

In [None]:
# display the flower types in the Iris dataset (hint: use the .unique() method)
df.flower.unique()

In [None]:
# how many flowers for each type
df.flower.value_counts()

In [None]:
# compute the average of each column ("sepal_length", "sepal_width", "petal_length", "petal_width") by flower type
df.groupby('flower').mean()
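
Several statistics can be computed per group in one call with `.agg()`. The sketch below assumes the same `flower` and `petal_length` column names and uses hypothetical values.

```python
import pandas as pd

# Hypothetical toy rows (illustrative values only)
toy = pd.DataFrame({
    "flower": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
    "petal_length": [1.4, 1.6, 6.0, 5.0],
})

# one row per flower type, one column per statistic
stats = toy.groupby("flower")["petal_length"].agg(["min", "mean", "max"])
```

Applied to `df`, the same pattern gives a compact per-type summary of any measurement column.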

## Visualization

In [None]:
# load the library matplotlib.pyplot
import matplotlib.pyplot as plt

In [None]:
# display the number of instances in each class by a bar chart
df.flower.value_counts().plot.bar(xlabel="Class", ylabel="Count")

In [None]:
# display a scatter plot between x: "sepal_length", and y: "petal_length", 
# where each flower type has its own color

import seaborn as sns

sns.scatterplot(x='sepal_length', y='petal_length', hue='flower', data=df)
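
A comparable scatter plot can be drawn with matplotlib alone by plotting each group separately. This is a sketch on a hypothetical `toy` frame, not a replacement for the seaborn call; matplotlib picks a different default color for each `scatter` call.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows (illustrative values only)
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0],
    "flower": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
               "Iris-versicolor", "Iris-virginica"],
})

fig, ax = plt.subplots()

# one scatter call per flower type, so each type gets its own color and legend entry
for name, group in toy.groupby("flower"):
    ax.scatter(group["sepal_length"], group["petal_length"], label=name)

ax.set_xlabel("sepal_length")
ax.set_ylabel("petal_length")
ax.legend(title="flower")
```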

Bravo!