# Notes:
**Two types of plotting layers**

As we discussed in the video lectures, there are two styles of plotting with `matplotlib`: plotting using the Artist layer and plotting using the scripting layer.

**Option 1: Scripting layer (procedural method) - using matplotlib.pyplot as 'plt'**

>You can use `plt`, i.e. `matplotlib.pyplot`, and add more elements by calling different functions procedurally; for example, `plt.title(...)` to add a title or `plt.xlabel(...)` to add a label to the x-axis.
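
A minimal sketch of the scripting-layer style described above. The data values and label strings are hypothetical placeholders, not taken from the course dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data (assumption, for illustration only)
years = [2010, 2011, 2012, 2013]
counts = [120, 150, 170, 160]

plt.figure()                     # start a new current figure
plt.plot(years, counts)          # draws on the current (implicit) axes
plt.title("Arrivals per year")   # title of the current axes
plt.xlabel("Year")               # x-axis label of the current axes
plt.ylabel("Count")              # y-axis label of the current axes
```

In a script, calling `plt.show()` afterwards displays the figure; in a notebook it is usually rendered automatically.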


**Option 2: Artist layer (object-oriented method) - using an `Axes` instance from Matplotlib (preferred)**

>You can use an `Axes` instance of your current plot and store it in a variable (e.g. `ax`). You can add more elements by calling methods with a small change in syntax (adding the prefix `set_` to the previous function names). For example, use `ax.set_title()` instead of `plt.title()` to add a title, or `ax.set_xlabel()` instead of `plt.xlabel()` to add a label to the x-axis.

>This option is often more transparent and flexible for advanced plots (in particular when working with multiple plots).
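
A minimal sketch of the Artist-layer style with two plots side by side, which is where explicit `Axes` objects tend to pay off. The data and the names `ax_left` and `ax_right` are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data (assumption, for illustration only)
years = [2010, 2011, 2012, 2013]
counts_a = [120, 150, 170, 160]
counts_b = [80, 95, 90, 110]

# One figure holding two Axes instances, each addressed explicitly
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

ax_left.plot(years, counts_a)
ax_left.set_title("Country A")    # counterpart of plt.title()
ax_left.set_xlabel("Year")        # counterpart of plt.xlabel()

ax_right.plot(years, counts_b)
ax_right.set_title("Country B")
ax_right.set_xlabel("Year")
```

Because each call names the `Axes` it acts on, there is no ambiguity about which subplot receives the title or label.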





## 1. Area plot
1. An unstacked area plot has a default transparency (alpha value) of 0.5. We can modify this value by passing in the `alpha` parameter.
> If data are unstacked, each column contains observations from one group. There is no grouping column.

2. Area plots are **stacked by default**
> If data are stacked, one column has to be the grouping column and the rest contain the values.
>> To produce a stacked area plot, **each column must contain either all positive or all negative values (any NaN values default to 0)**. To produce an unstacked plot, pass `stacked=False`.

3.  [stacked vs unstacked data link](https://support.minitab.com/en-us/minitab/19/help-and-how-to/manipulate-data-in-worksheets-columns-and-rows/supporting-topics/data-types-and-arrangements/stacked-and-unstacked-data/)
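
The stacked and unstacked variants can be compared with a sketch like the following, assuming pandas is used for plotting as in the rest of these notes. The DataFrame contents and column names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: one column per group, index = year (assumption)
df = pd.DataFrame(
    {"India": [10, 20, 30], "China": [15, 25, 20]},
    index=[2010, 2011, 2012],
)

fig, (ax_stacked, ax_unstacked) = plt.subplots(1, 2, figsize=(10, 4))

# Default behaviour: the areas are stacked on top of each other
df.plot(kind="area", ax=ax_stacked)
ax_stacked.set_title("Stacked (default)")

# stacked=False overlays the areas; alpha overrides the default transparency
df.plot(kind="area", stacked=False, alpha=0.25, ax=ax_unstacked)
ax_unstacked.set_title("Unstacked, alpha=0.25")
```

In the stacked plot the top edge shows the running total of the columns, so its y-axis reaches higher than in the unstacked plot.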

## 2. Histogram
A histogram is a way of representing the _frequency_ distribution of a numeric dataset.
> It partitions the **x-axis** into _bins_, assigns each data point in our dataset to a bin, and then counts the number of data points that have been assigned to each bin.

>The **y-axis** shows the frequency, i.e. the number of data points in each bin. We can change the bin size, and it usually needs tweaking so that the distribution is displayed clearly.
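
The binning and counting steps can be sketched with `np.histogram`, whose bin edges are then reused for the plot so that the bars line up with the tick marks. The sample values and the choice of 4 bins are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample values (assumption, for illustration only)
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Partition the value range into 4 equal-width bins and count points per bin
counts, bin_edges = np.histogram(values, bins=4)
# bin_edges -> [1, 3, 5, 7, 9]; counts -> [3, 5, 1, 1]

fig, ax = plt.subplots()
ax.hist(values, bins=bin_edges, edgecolor="black")  # same bins as above
ax.set_xticks(bin_edges)                            # ticks on the bin edges
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
```

Changing `bins` (for example to 8) regroups the same data and can change how the distribution looks.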

## 3. Bar chart
A bar plot is a way of representing data where the _length_ of the bars represents the magnitude/size of the feature/variable. Bar graphs usually represent numerical and categorical variables grouped in intervals. 

To create a bar plot, we can pass one of two arguments via the `kind` parameter of `plot()`:
-   `kind='bar'` creates a _vertical_ bar plot
-   `kind='barh'` creates a _horizontal_ bar plot

**Vertical bar plot**<br>
In vertical bar graphs, the x-axis is used for labelling, and the length of bars on the y-axis corresponds to the magnitude of the variable being measured. Vertical bar graphs are particularly useful in analyzing time series data. One disadvantage is that they lack space for text labelling at the foot of each bar.

**Horizontal Bar Plot**<br>
Sometimes it is more practical to represent the data horizontally, especially if you need more room for labelling the bars. In horizontal bar graphs, the y-axis is used for labelling, and the length of bars on the x-axis corresponds to the magnitude of the variable being measured. As you will see, there is more room on the y-axis to label categorical variables.
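
A minimal sketch contrasting the two `kind` values on the same data, assuming a pandas `Series` as input. The category names and totals are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical totals per category (assumption, for illustration only)
totals = pd.Series({"India": 30, "China": 25, "Philippines": 20})

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))

# Vertical bars: categories on the x-axis, magnitude on the y-axis
totals.plot(kind="bar", ax=ax_v)
ax_v.set_ylabel("Total")

# Horizontal bars: categories on the y-axis, magnitude on the x-axis
totals.plot(kind="barh", ax=ax_h)
ax_h.set_xlabel("Total")
```

With long category names, the horizontal version usually stays readable without rotating the labels.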

## 4. Pie chart
Pie charts are generally used to show percentage or proportional data and usually the percentage represented by each category is provided next to the corresponding slice of pie. Pie charts are good for displaying data for around 6 categories or fewer.
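
A minimal sketch of a pie chart with the percentage printed on each slice, assuming a pandas `Series` as input. The category names and values are hypothetical; `autopct` is the Matplotlib option that formats the percentage labels.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical totals per category (assumption, for illustration only)
shares = pd.Series({"Asia": 50, "Europe": 30, "Africa": 20})

fig, ax = plt.subplots()
shares.plot(
    kind="pie",
    ax=ax,
    autopct="%1.1f%%",  # print each slice's percentage with one decimal
    startangle=90,      # start the first slice at the top
)
ax.set_ylabel("")       # drop the default y-label added by pandas
ax.set_aspect("equal")  # keep the pie circular
```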

## 5. Box plot
A box plot is a way of statistically representing the _distribution_ of the data through five main dimensions: 

-   **Minimum:** Smallest number in the dataset.
-   **First quartile:** Middle number between the `minimum` and the `median`.
-   **Second quartile (Median):** Middle number of the (sorted) dataset.
-   **Third quartile:** Middle number between `median` and `maximum`.
-   **Maximum:** Highest number in the dataset.

<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Images/boxplot_complete.png" width="440" align="center">
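
The five values listed above can be computed directly with NumPy before drawing the box plot. This is a sketch on hypothetical data; `np.percentile` uses linear interpolation by default, so on other datasets the quartiles may differ slightly from the "middle number" description.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample (assumption, for illustration only)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

minimum = data.min()
q1, median, q3 = np.percentile(data, [25, 50, 75])
maximum = data.max()
# minimum=1, q1=3.0, median=5.0, q3=7.0, maximum=9

fig, ax = plt.subplots()
ax.boxplot(data)  # box spans q1..q3, with a line at the median
ax.set_ylabel("Value")
```

By default Matplotlib draws the whiskers to the most extreme points within 1.5 times the interquartile range of the box and marks anything beyond as an outlier, so the whisker ends coincide with the minimum and maximum only when there are no outliers.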

## 6. Scatter plot
A `scatter plot` (2D) is a useful method of comparing variables against each other. `Scatter` plots look similar to `line plots` in that they both map independent and **dependent** variables on a 2D graph. While the datapoints are connected together by a line in a line plot, they are not connected in a scatter plot. The data in a scatter plot is considered to express a **trend**. With further analysis using tools like regression, we can mathematically calculate this relationship and use it to predict trends outside the dataset.
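
A minimal sketch of fitting a straight line to scatter data and using it to predict a point outside the dataset. The data are hypothetical and perfectly linear to keep the arithmetic easy to follow; `np.polyfit` with `deg=1` is one possible way to do a simple linear regression, not necessarily the one used in the lectures.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: x = years since 2010, y = some yearly total (assumption)
x = np.array([0, 1, 2, 3, 4])
y = np.array([100, 120, 140, 160, 180])

# Least-squares fit of a degree-1 polynomial: y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y)                                  # unconnected data points
ax.plot(x, slope * x + intercept, color="red")    # fitted trend line
ax.set_xlabel("Years since 2010")
ax.set_ylabel("Total")

# Use the fitted line to predict a year outside the dataset (x = 5)
predicted = slope * 5 + intercept
```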

## 7. Bubble plot - normalized

A `bubble plot` is a variation of the `scatter plot` that displays three dimensions of data (x, y, z). The datapoints are replaced with bubbles, and the size of each bubble is determined by the third variable 'z', also known as the weight. In `matplotlib`, we can pass an array or scalar containing the weight of each point to the keyword `s` of `scatter()` (or of pandas `plot()` with `kind='scatter'`).
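
A minimal sketch of a bubble plot with normalized weights. The data are hypothetical, and min-max scaling plus the factor `2000` and offset `10` are assumptions chosen only to get visible bubble sizes; `s` is interpreted by Matplotlib as the marker area in points squared.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data (assumption, for illustration only)
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 15, 30])
z = np.array([50, 200, 125, 350])   # third variable, used as the weight

# Min-max normalization: rescale z to the range [0, 1]
norm_z = (z - z.min()) / (z.max() - z.min())

# Scale to marker areas; the offset keeps the smallest bubble visible
sizes = norm_z * 2000 + 10

fig, ax = plt.subplots()
ax.scatter(x, y, s=sizes, alpha=0.5)  # one size per point
ax.set_xlabel("x")
ax.set_ylabel("y")
```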