# Visualizing the Iris Dataset

---
**Objective:** In this notebook, we will learn how to create plots with Python's Matplotlib library.

## Dataset Description
- **sepal length (cm)**
- **sepal width (cm)**
- **petal length (cm)**
- **petal width (cm)**
- **target:** Setosa (0), Versicolor (1), and Virginica (2)

### Import necessary Dependencies


In [None]:
# basic libraries
import numpy as np
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt

# additional methods from matplotlib
from matplotlib import markers
import matplotlib.patches as patches
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline

### Import Dataset

In [None]:
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

df.head()

Unnamed: 0,species
0,setosa
1,setosa
2,setosa
3,setosa
4,setosa




![alt text](https://www.researchgate.net/profile/Wolfgang-Engelmann/publication/265877256/figure/fig3/AS:669555605114885@1536645904238/Flower-of-a-plant-with-peduncle-a-calyx-consisting-of-sepals-petals-stamen-and-in.png)

[Image source](https://www.researchgate.net/figure/Flower-of-a-plant-with-peduncle-a-calyx-consisting-of-sepals-petals-stamen-and-in_fig3_265877256)

# MATPLOTLIB

### 1. Scatter plot
A scatter plot shows the relationship between two numeric variables by drawing one point per observation. The closer the points fall to a straight line, the stronger the linear correlation. If the points rise from low x and y values to high x and y values, the variables are positively correlated. If the points instead fall from high y values at low x to low y values at high x, the variables are negatively correlated. Correlation alone does not show that one variable influences the other.
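To put a number on what a scatter plot suggests, one option is the Pearson correlation coefficient. The sketch below builds a DataFrame from the iris features and calls pandas' `DataFrame.corr`; the names `features`, `corr`, `r_petal`, and `r_sepal` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn import datasets

# Build a table of the four iris measurements
iris = datasets.load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)

# Pairwise Pearson correlation: +1 is a perfect rising line, -1 a perfect falling line
corr = features.corr(method='pearson')
print(corr.round(2))

# Look up single pairs of columns
r_petal = corr.loc['petal length (cm)', 'petal width (cm)']
r_sepal = corr.loc['sepal length (cm)', 'sepal width (cm)']
print(f"petal length vs petal width: r = {r_petal:.2f}")
print(f"sepal length vs sepal width: r = {r_sepal:.2f}")
```

A coefficient near zero only rules out a linear trend in the pooled data; the scatter plot is still worth inspecting for clusters or curved patterns.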

In [None]:
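# Basic plot: column names are looked up in df; linestyle='none' draws markers only
plt.plot('sepal length (cm)', 'sepal width (cm)', data=df, linestyle='none', marker='o')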
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.title('Relationship between sepal length & width')
plt.show()

There is no clear linear relationship between sepal length (cm) & sepal width (cm).

In [None]:
plt.plot('sepal length (cm)', 'petal length (cm)', data=df, linestyle='none', marker='o')
plt.xlabel('sepal length (cm)')
plt.ylabel('petal length (cm)')
plt.title('Relationship between sepal & petal length')
plt.show()

There is a positive correlation between sepal length (cm) & petal length (cm)

In [None]:
plt.plot('sepal length (cm)', 'petal width (cm)', data=df, linestyle='none', marker='o')
plt.xlabel('sepal length (cm)')
plt.ylabel('petal width (cm)')
plt.title('Relationship between sepal length & petal width')
plt.show()

There is a positive correlation between sepal length (cm) & petal width (cm)

In [None]:
plt.plot('sepal width (cm)', 'petal length (cm)', data=df, linestyle='none', marker='o')
plt.xlabel('sepal width (cm)')
plt.ylabel('petal length (cm)')
plt.title('Relationship between sepal width & petal length')
plt.show()

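With all species pooled together, there is no clear positive relationship between sepal width (cm) & petal length (cm); if anything, the overall trend is weakly negative.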

In [None]:
plt.plot('sepal width (cm)', 'petal width (cm)', data=df, linestyle='none', marker='o')
plt.xlabel('sepal width (cm)')
plt.ylabel('petal width (cm)')
plt.title('Relationship between sepal & petal width')
plt.show()

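With all species pooled together, there is no clear positive relationship between sepal width (cm) & petal width (cm); if anything, the overall trend is weakly negative.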

In [None]:
plt.plot( 'petal length (cm)', 'petal width (cm)', data=df, linestyle='none', marker='o')
plt.xlabel('petal length (cm)')
plt.ylabel('petal width (cm)')
plt.title('Relationship between petal length & width')
plt.show()

There is a strong positive correlation between petal length (cm) & petal width (cm).



### 2. Bubble plot
A bubble chart, also known as a bubble plot, extends the scatter plot to show the relationship between three numeric variables. Each data point is encoded by its horizontal position, its vertical position, and the size of its marker. Too many bubbles make the chart hard to read, so it is generally not recommended for large datasets. When creating a bubble chart, make sure the area of each circle, not its radius, is proportional to the value it represents; scaling the radius instead exaggerates the differences in the data.
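Matplotlib's `plt.scatter` interprets the `s` argument as the marker area in points squared, so passing values scaled linearly keeps area proportional to value. The helper below is a minimal sketch of that scaling; `values_to_marker_area`, `max_area`, and the sample values are hypothetical names and numbers chosen for illustration.

```python
import numpy as np

def values_to_marker_area(values, max_area=300.0):
    """Scale non-negative values so that marker area is proportional to value."""
    values = np.asarray(values, dtype=float)
    # The largest value gets max_area (in points^2); the rest scale linearly
    return max_area * values / values.max()

# Made-up measurements for illustration only
widths = np.array([0.5, 1.0, 2.0])
areas = values_to_marker_area(widths)
print(areas)

# The result can be passed straight to scatter, e.g. plt.scatter(x, y, s=areas).
# Squaring the values first would make the radius, not the area, proportional
# to the value and would visually exaggerate the differences.
```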

In [None]:
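# Use plt.scatter, since a bubble plot is a scatter plot with variable marker sizes.
# The `s` argument maps a third numeric variable to the marker size (area in points^2),
# and the `c` argument sets the marker color.

# Use the species column to set the color and size of the bubbles.
# The color names below are an assumed choice; any valid Matplotlib colors work.
color = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
# Number of rows per species, repeated on every row of that species
size = df.groupby('species')['species'].transform('count')
print(size)

plt.scatter(df['sepal length (cm)'], df['sepal width (cm)'], s=size, c=df.species.map(color), marker='o', linewidth=1, alpha=0.5)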
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.title('Bubble plot showing relation across sepal length, sepal width, & Iris class')
plt.show()

1. Specify the color of the bubbles with the `c` argument.
2. Specify the shape of the bubbles with the `marker` argument.
3. Specify the edge width with the `linewidth` argument.

### 3. Box plot
A boxplot is a standardized visual summary of a distribution based on five key values: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. By default, Matplotlib draws the whiskers out to the most extreme points within 1.5 times the interquartile range (IQR = Q3 − Q1) of the box and marks any points beyond that as outliers. A boxplot therefore shows whether outliers are present and what their values are, and it indicates whether the data is skewed and by how much.
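The quantities a boxplot draws can be computed by hand with NumPy. The sketch below uses a small made-up sample (not taken from the iris data) and the common 1.5 × IQR rule; all variable names are illustrative.

```python
import numpy as np

# Made-up sample for illustration only (not taken from the iris data)
sample = np.array([2.0, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 4.4])

q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Fences: points outside them are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

# Whiskers stop at the most extreme points that are still inside the fences
inside = sample[(sample >= lower_fence) & (sample <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print("Q1, median, Q3:", q1, median, q3)
print("outliers:", outliers)
print("whiskers:", whisker_low, whisker_high)
```

Comparing the two whisker lengths, and the distances from the median to Q1 and to Q3, is a quick way to judge skew from these numbers.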

In [None]:
plt.boxplot(df['sepal length (cm)'])
plt.ylabel('sepal length (cm)')
plt.title('Sepal Length Box plot')

**Sepal length:**
1. Has no outliers.
2. Slightly positively skewed: the median is closer to the minimum than to the maximum.
3. Not symmetric: the whiskers are not equal in length.


In [None]:
plt.boxplot(df['sepal width (cm)'])
plt.ylabel('sepal width (cm)')
plt.title('Sepal width Box plot')

**Sepal width:**
1. Has outliers.
2. Positively skewed: the median is closer to the minimum than to the maximum.
3. Not symmetric: the whiskers are not equal in length.

In [None]:
plt.boxplot(df['petal length (cm)'])
plt.ylabel('petal length (cm)')
plt.title('petal Length Box plot')

**Petal length:**
1. Has no outliers.
2. Negatively skewed: the median is closer to the maximum than to the minimum.
3. Not symmetric: the whiskers are not equal in length, and the distances from the median to Q1 and to Q3 differ.

In [None]:
plt.boxplot(df['petal width (cm)'])
plt.ylabel('petal width (cm)')
plt.title('petal width Box plot')

**Petal width:**
1. Has no outliers.
2. Slightly negatively skewed: the median lies closer to Q3 than to Q1.
3. Not symmetric: the whiskers are not equal in length, and the distances from the median to Q1 and to Q3 differ.

### 4. 1D Histogram plot
A histogram displays the frequency distribution of a numeric variable. The variable's range is divided into consecutive bins, and the height of each bar gives the number of observations that fall within that bin.
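The binning behind a histogram can be inspected without drawing anything. The sketch below uses `np.histogram`, which `plt.hist` relies on for its counts, with 16 equal-width bins on the petal length column; the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets

# Column 2 of the iris feature matrix is petal length (cm)
petal_length = datasets.load_iris().data[:, 2]

# 16 equal-width bins spanning the minimum to the maximum value
counts, edges = np.histogram(petal_length, bins=16)

# Each bar's height is the number of flowers whose value falls in that bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f} cm: {n} flowers")
```

Changing `bins` changes how coarse the picture is: fewer bins smooth over gaps in the data, while more bins show finer structure at the cost of noisier bar heights.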

In [None]:
plt.hist(df['sepal length (cm)'], 16)
plt.title('sepal length histogram')
plt.show()

In [None]:
plt.hist(df['sepal width (cm)'], 16)
plt.title('sepal width histogram')
plt.show()

In [None]:
plt.hist(df['petal length (cm)'], 16)
plt.title('petal length histogram')
plt.show()

In [None]:
plt.hist(df['petal width (cm)'], 16)
plt.title('petal width histogram')
plt.show()

### 5. 3D scatter plot


In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')



ax.view_init(30, 185)
plt.show()
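A 3D scatter plot places each observation by three numeric variables at once. The notebook does not say which three features this cell plots, so the sketch below assumes sepal length, sepal width, and petal length, and colors the points by species code; the `viridis` colormap and the name `points` are also assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older Matplotlib versions
from sklearn import datasets

iris = datasets.load_iris()
# Assumed choice of axes: the first three feature columns
x, y, z = iris.data[:, 0], iris.data[:, 1], iris.data[:, 2]

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# One point per flower, colored by the integer species code (0, 1, 2)
points = ax.scatter(x, y, z, c=iris.target, cmap='viridis', marker='o')

ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_zlabel(iris.feature_names[2])

# Camera position: 30 degrees elevation, 185 degrees azimuth
ax.view_init(30, 185)
plt.show()
```

Changing the two `view_init` angles rotates the camera, which helps when clusters overlap from one viewpoint.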
