![logo](../../img/license_header_logo.png)
> **Copyright &copy; 2021 CertifAI Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). <br>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0**

# 02 - Creating different types of graphs with Pyplot
## Introduction
Matplotlib offers many types of plots, each suited to a particular kind of data. Choosing the right plot is essential, so it helps to understand the options before creating any. The most commonly used plots are:
1. Bar Plots
2. Histograms
3. Pie Plots
4. Area Plots
5. Scatter Plots
6. Time Series Graph

## Notebook Outline
Below is the outline for this tutorial:
1. [Histograms](#Histograms)
2. [Scatter plot](#Scatter)
    1. [Scatter plot with groups / categorical data](#categorical)
3. [Box plot](#Box)
    1. [Multiple box plot](#Multiple) 
4. [Customizing Matplotlib with rcParams](#Customizing)
5. [Summary](#Summary)
6. [Reference](#Reference)



## What will we accomplish?
In this hands-on, we will focus on creating histograms, scatter plots and box plots, and on plotting categorical variables. In addition, we will demonstrate how to set the figure size for all plots using `rcParams`.
    
First, let's import the Pyplot library, together with NumPy, which the later cells use as `np`.

In [None]:
# YOUR CODE HERE
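
One possible way to complete the import cell is sketched below. The aliases `plt` and `np` are the conventional ones and match the names used in the later cells of this notebook.

```python
# Pyplot provides the plotting functions used throughout this notebook
import matplotlib.pyplot as plt
# NumPy is needed by the later cells that generate random data (np.random...)
import numpy as np
```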


## <a name="Histograms">Histograms</a>
Histograms are created with the `plt.hist()` function. Its `bins` parameter accepts either the number of equal-width bins or a sequence of bin edges. If `bins` is omitted, Matplotlib uses the default of 10 bins. For details, see the [`plt.hist()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html).
![image](https://user-images.githubusercontent.com/59526258/113844926-917b6000-97c7-11eb-9403-f0f76f8f9e6e.png)


For example, suppose you have collected the heights of 250 students and plan to use a histogram to visualize the data.
>**Expected Result** :<br>
![image](https://user-images.githubusercontent.com/59526258/114672697-01459980-9d38-11eb-940c-994f2e187d9c.png)

In [None]:
np.random.seed(0)
# Randomly generate heights for 250 students with mean = 170 and standard deviation = 10
x = np.random.normal(170, 10, 250)
# YOUR CODE HERE
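
One possible way to complete this exercise is sketched below. It repeats the data generation so that it runs on its own. The axis labels and title are illustrative assumptions (including the unit, which the text does not state), and the expected result image may be styled differently.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
# Same data as the exercise cell: 250 heights, mean 170, standard deviation 10
x = np.random.normal(170, 10, 250)

# With no `bins` argument, plt.hist() uses the default of 10 equal-width bins.
# It returns the bin counts, the bin edges and the drawn bar patches.
counts, edges, patches = plt.hist(x)

# Labels are assumptions added for readability (the unit is not given in the text)
plt.xlabel("Height")
plt.ylabel("Number of students")
plt.title("Height of 250 students")
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```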


You can set the bin edges explicitly by passing a sequence to the `bins` parameter.
>**Expected Result** :<br>
![image](https://user-images.githubusercontent.com/59526258/114672870-2fc37480-9d38-11eb-880e-372f109e6731.png)

In [None]:
bins = np.arange(120,220,5)
# YOUR CODE HERE
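
A minimal sketch of this exercise follows, regenerating `x` so that it is self-contained. The `edgecolor` argument is an optional styling assumption that makes neighbouring bars easier to tell apart.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
x = np.random.normal(170, 10, 250)

# Bin edges at 120, 125, ..., 215 (np.arange excludes the stop value 220)
bins = np.arange(120, 220, 5)

# Passing a sequence to `bins` makes plt.hist() use those values as bin edges
counts, edges, patches = plt.hist(x, bins=bins, edgecolor="black")
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```

Because 20 edges delimit 19 bins, `counts` has 19 entries here; any value outside the outermost edges would be left out of the histogram.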


##  <a name="Scatter">Scatter plot</a>
A scatter plot can be created with the `plt.scatter()` function. Some of its most useful parameters are:
![image](https://user-images.githubusercontent.com/59526258/113953545-a64d0780-984a-11eb-9222-6cc37649ed0d.png)
For more details on scatter plots, see the [`plt.scatter()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html).

As an example, we use data on residential home sales in Ames, Iowa between 2006 and 2010 for the scatter plot. The data set contains many explanatory variables on the quality and quantity of the homes' physical attributes. Most of the variables describe information a typical home buyer would like to know about a property (square footage, number of bedrooms and bathrooms, size of lot, etc.).

In [None]:
import pandas as pd
house_data = pd.read_csv('https://raw.githubusercontent.com/josephpconley/R/master/openintrostat/OpenIntroLabs/(4)%20lab4/data%20%26%20custom%20code/AmesHousing.csv')
house_data.head()

We will create a scatter plot with the above-ground living area (`Gr Liv Area`) on the x-axis and the sale price (`SalePrice`) on the y-axis to study the correlation between them. Let's set `alpha = 0.5` so that overlapping points remain distinguishable.
>*Notes: alpha -> The alpha blending value, between 0 (transparent) and 1 (opaque).*

>**Expected Result** :<br>
![image](https://user-images.githubusercontent.com/59526258/114673083-69947b00-9d38-11eb-97c1-5b7bf4155d63.png)

In [None]:
x = house_data['Gr Liv Area']
y = house_data['SalePrice']

# YOUR CODE HERE
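
The call itself can be sketched as follows. To keep the sketch runnable without downloading the CSV, `x` and `y` here are synthetic stand-ins (an assumption for illustration only, not the Ames data); in the notebook they would be the `house_data` columns defined in the cell above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for house_data['Gr Liv Area'] and house_data['SalePrice'].
# These values are made up purely so the sketch runs offline.
rng = np.random.default_rng(0)
x = rng.uniform(500, 4000, 100)
y = 100 * x + rng.normal(0, 30000, 100)

# alpha=0.5 makes each marker half transparent, so dense regions look darker
points = plt.scatter(x, y, alpha=0.5)
plt.xlabel("Gr Liv Area")
plt.ylabel("SalePrice")
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```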


As the scatter plot shows, houses with more living area above ground tend to sell for higher prices.

###  <a name="categorical">Scatter plot with groups / categorical data</a>
Categorical data can be visualized with a scatter plot, using a different color for each category, to give a clear picture of how the data is grouped.

>**Expected Result** :<br>
![image](https://user-images.githubusercontent.com/59526258/114673233-8cbf2a80-9d38-11eb-8843-38d99aaed96c.png)


In [None]:
# Create data
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
g2 = (0.4 + 0.3 * np.random.rand(N), 0.5 * np.random.rand(N))
g3 = (0.3 * np.random.rand(N), 0.3 * np.random.rand(N))

data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")
# YOUR CODE HERE
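
One way to approach this exercise is to call `plt.scatter()` once per group and let the legend pick up each group's label. The sketch below repeats the data setup so it runs on its own; the seed, marker size and transparency are assumptions added for reproducibility and readability.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: seeded only to make this sketch reproducible
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
g2 = (0.4 + 0.3 * np.random.rand(N), 0.5 * np.random.rand(N))
g3 = (0.3 * np.random.rand(N), 0.3 * np.random.rand(N))

data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")

# One scatter call per group: each gets its own color and legend label
for (gx, gy), color, group in zip(data, colors, groups):
    plt.scatter(gx, gy, color=color, alpha=0.8, s=30, label=group)

# The legend is built from the `label` passed to each scatter call
legend = plt.legend()
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```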



## <a name="Box">Box plot</a>
>*A Box Plot is also known as Whisker plot is created to display the summary of the set of data values having properties like minimum, first quartile, median, third quartile and maximum. In the box plot, a box is created from the first quartile to the third quartile, a vertical line is also there which goes through the box at the median. Here x-axis denotes the data to be plotted while the y-axis shows the frequency distribution.* - ***GeeksforGeeks***

![image](https://user-images.githubusercontent.com/59526258/113961101-18781900-9858-11eb-9ee4-68bb19efb9f0.png)
For more details on box plots, see the [`plt.boxplot()` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html).

Let's start with a single box plot
>**Expected Result** :<br>
![image](https://user-images.githubusercontent.com/59526258/114673463-cb54e500-9d38-11eb-839d-74926d13e3d0.png)

In [None]:
# Creating dataset
np.random.seed(10)
data = np.random.normal(100, 10, 200)
  
# Creating plot

# YOUR CODE HERE
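
A minimal sketch of this exercise is shown below, with the data generation repeated so that it runs on its own. Capturing the return value is optional; it is done here only to show that `plt.boxplot()` returns a dictionary of the drawn elements.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same dataset as the exercise cell
np.random.seed(10)
data = np.random.normal(100, 10, 200)

# plt.boxplot() draws the box, median line, whiskers, caps and outlier markers.
# The returned dict has keys such as "boxes", "medians", "whiskers" and "fliers".
result = plt.boxplot(data)
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```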


As you can see, the median of the data is around 100, the lower quartile is about 95 and the upper quartile is about 106. The data also contains a few outliers.

>*Notes: You may convert the data into a Pandas Series and use `series.describe()` to print its summary statistics.*

In [None]:
series = pd.Series(data)
series.describe()

### <a name="Multiple">Multiple box plot </a>
Let's try to create multiple box plots in one figure.
>**Expected Result** :<br>
![image](https://user-images.githubusercontent.com/59526258/114673641-fb03ed00-9d38-11eb-9229-24b8e9b5dfbf.png)

In [None]:
# Creating dataset
np.random.seed(10)
  
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]

# YOUR CODE HERE
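
One possible way to complete this exercise is to pass the list of arrays to `plt.boxplot()`, which then draws one box per array. The tick labels below are an illustrative assumption and may differ from the expected result image.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same datasets as the exercise cell
np.random.seed(10)
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]

# A list of arrays produces one box per array, at x positions 1, 2, 3, 4
result = plt.boxplot(data)

# Assumption: name each box after its variable for readability
plt.xticks([1, 2, 3, 4], ["data_1", "data_2", "data_3", "data_4"])
# In a notebook the figure renders when the cell finishes;
# in a script, call plt.show() to display it.
```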


##  <a name="Customizing">Customizing Matplotlib with rcParams</a>
Each time Matplotlib loads, it defines a runtime configuration (rc) containing the default styles for every plot element you create. This configuration can be adjusted at any time using the `plt.rc` convenience routine. For details, please refer to [here](https://jakevdp.github.io/PythonDataScienceHandbook/04.11-settings-and-stylesheets.html).

Let's try to change the default figure size to (10, 8), that is, 10 inches wide by 8 inches tall, using `plt.rcParams["figure.figsize"]`.

In [None]:
# YOUR CODE HERE
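
A minimal sketch of this step follows. The `plt.figure()` call at the end is not required; it is included only to check that a newly created figure picks up the new default size.

```python
import matplotlib.pyplot as plt

# Set the default size (width, height) in inches for every figure created from now on
plt.rcParams["figure.figsize"] = (10, 8)

# A new figure created without an explicit figsize now uses the new default
fig = plt.figure()
print(fig.get_size_inches())
```

The setting stays in effect for the rest of the session; `plt.rcdefaults()` restores Matplotlib's built-in defaults.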


Let's plot the box plot again and observe the difference 

In [None]:
# Creating plot
plt.boxplot(data)
  
# show plot
plt.show()

As you can see, the figure is now drawn at size (10, 8) without an explicit call to `plt.figure()`. Feel free to try this with the other plots as well.

## <a name="Summary">Summary</a>
From this tutorial, you should have learned how to:
1. Create different types of graphs with Pyplot.
2. Create histograms, scatter plots and box plots.

Congratulations, that concludes this lesson. 

##  <a name="Reference">Reference</a>
1. [Headstart to Plotting Graphs using Matplotlib library](https://www.analyticsvidhya.com/blog/2020/10/headstart-to-plotting-graphs-using-matplotlib-library/)
2. [Matplotlib Histograms](https://www.w3schools.com/python/matplotlib_histograms.asp)
3. [Matplotlib Scatter Plot - Tutorial and Examples](https://stackabuse.com/matplotlib-scatterplot-tutorial-and-examples/)