# <center> Pandas Data Visualization </center>

- [Data Visualization with Pandas](#section_1)
- [Chart Visualization](#section_2)
- [Table Visualization](#section_3)

<hr>

### Data Visualization with Pandas <a class="anchor" id="section_1"></a>

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization provides an accessible way to see and understand trends, outliers, and patterns in data. 

In this lesson, we will focus on two data visualization methods available in the pandas library:

- `chart` such as histogram, pie chart, and time series

- `table` visualization with HTML styling to highlight specific parts of your DataFrame based on predefined conditions

### Chart Visualization <a class="anchor" id="section_2"></a>

Chart visualization is the graphical representation of data using charts such as histograms, pie charts, box plots, and so on.

The main tool for creating charts is the [`plot()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) method, which can be applied to both pandas Series and DataFrame objects. This method has many built-in parameters. We call it on the DataFrame, specify the x and y variables, and, most importantly, set the `kind` parameter, which selects one of the many available chart types.

Let’s see how we can create some popular data visualization graphs using this method. 
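As a minimal sketch of how these parameters fit together, the example below draws a bar chart from a small DataFrame. The `sales` data and its column names are hypothetical and exist only to illustrate `x`, `y`, and `kind`.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the parameters
sales = pd.DataFrame({
    "quarter": ["Q1", "Q2", "Q3", "Q4"],
    "revenue": [10, 14, 9, 17],
})

# x and y name the columns to plot; kind selects the chart type
ax = sales.plot(x="quarter", y="revenue", kind="bar")
```

Changing `kind` to another supported value, such as `"line"`, reuses the same call with a different chart type.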

### Time series plot

Our first plot is called a `time series` plot. 

A time series plot is a very popular way to show how a value changes over time. We will use it to plot this [airline-passengers](https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv) dataset.

In [1]:
# Import pandas


In [1]:
# Import the dataset as a CSV file using this link: https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv

# Display DataFrame


We can see that this dataset has two columns. The first column, `Month`, holds the date in one-month intervals starting from 1949. The second column, `Passengers`, holds the number of passengers in thousands.

To make a time series plot for this DataFrame, we need to make the `Month` column the index. 
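The loading step might look like the sketch below. To keep it runnable without network access, it reads a few rows from an in-memory buffer laid out like the CSV file; the `sample_csv` name and the choice of rows are assumptions made for illustration.

```python
import io
import pandas as pd

# A few rows in the same two-column layout as the CSV file (illustrative sample)
sample_csv = io.StringIO(
    '"Month","Passengers"\n'
    '"1949-01",112\n'
    '"1949-02",118\n'
    '"1949-03",132\n'
    '"1949-04",129\n'
)

# read_csv accepts a file path, a URL, or a file-like object such as this buffer
df = pd.read_csv(sample_csv)

# Display the first rows of the DataFrame
print(df.head())
```

To load the full dataset, pass the dataset URL to `pd.read_csv` in place of the buffer.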

In [2]:
# Set the Month values as the DataFrame index


Now the `Month` column is the index of the DataFrame. Let’s plot it.
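A minimal sketch of the indexing step, using a small hand-built DataFrame in place of the full dataset. Parsing the month strings as dates is an optional extra assumed here, since it lets pandas label the time axis properly.

```python
import pandas as pd

# Small stand-in for the airline dataset (illustrative sample)
df = pd.DataFrame({
    "Month": ["1949-01", "1949-02", "1949-03"],
    "Passengers": [112, 118, 132],
})

# Optional: convert the month strings to datetime values
df["Month"] = pd.to_datetime(df["Month"])

# Move the Month column into the index
df = df.set_index("Month")
print(df)
```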

In [3]:
# Plot the DataFrame


The method generates a time series plot, also known as a line chart, which is the default chart type of the `plot()` method. 

Notice how we used the `figsize` parameter to change the size of this plot. It takes a tuple of the width and height in inches.
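The plotting step could be sketched as follows. The six-row DataFrame and the `(10, 4)` figure size are assumptions chosen for illustration; any date-indexed DataFrame is plotted the same way.

```python
import pandas as pd

# Small date-indexed stand-in for the airline dataset (illustrative sample)
df = pd.DataFrame(
    {"Passengers": [112, 118, 132, 129, 121, 135]},
    index=pd.date_range("1949-01-01", periods=6, freq="MS", name="Month"),
)

# With no kind argument, plot() draws a line chart.
# figsize is (width, height) in inches.
ax = df.plot(figsize=(10, 4))
```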

This example shows you how easy it is to apply the plot function once you have your dataframe ready and all the data are in the correct shape.

<br>

### Scatter plot

Another common task for data analysts is to explore the relationship between two variables, to see whether they are positively or negatively related. 

The most common type of plot for visualizing the relationship between two variables is the `scatter plot`.

In the example below, we will use the [Iris dataset](https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv). This dataset contains measurements for three species of iris flowers, and we will explore the relationship between some of its variables.

In [8]:
# Import the Iris dataset as a CSV file 


# Display DataFrame


After we display the data, we see that there are five columns. Four of them are numerical: sepal length, sepal width, petal length, and petal width. The last column is the species of the flower. Let's explore the relationship between two of these variables using a scatter plot.

In [4]:
# Apply a scatter plot to examine the relationship between variables


We passed a few parameters to the [`plot()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) method to make the visualization. The first parameter is `kind`, which identifies this figure as a scatter plot. The `x` and `y` parameters name the columns shown on the x-axis and the y-axis. We also set the size of the figure. 

After running the cell, we see a positive relationship between the two variables: as one value increases, the other tends to increase as well. 
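A sketch of such a call is shown below. The choice of `petal_length` and `petal_width` is an assumption, since the lesson does not name the two variables, and the six-row `iris` DataFrame is an illustrative sample using the dataset's column names.

```python
import pandas as pd

# Small stand-in for the Iris dataset (illustrative sample)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# kind selects the scatter plot; x and y name the columns for each axis
ax = iris.plot(kind="scatter", x="petal_length", y="petal_width", figsize=(8, 5))
```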

Data professionals can also apply a color scheme to highlight different dimensions of the data. By using different colors for different values, the plot can convey an additional dimension of the data.

How about we color different groups of data points based on their species value?

In [6]:
# Map species to different color values


# Apply a scatter plot to examine the relationship between variables


In the example above, we assigned three color values, one for each flower species. Then we passed that color mapping as a parameter to our [`plot()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) method.

After running the cell, we see that each point is now drawn in the color assigned to its flower species. 

This flexibility lets data analysts show more information in a single chart and understand their data better.
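One way to build such a mapping is sketched below. The specific colors, the plotted columns, and the six-row `iris` sample are assumptions for illustration; only the species names and column names come from the dataset.

```python
import pandas as pd

# Small stand-in for the Iris dataset (illustrative sample)
iris = pd.DataFrame({
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Map each species to a color (the color choices are arbitrary)
colors = {"setosa": "red", "versicolor": "green", "virginica": "blue"}
point_colors = iris["species"].map(colors)

# Pass one color per row through the c parameter
ax = iris.plot(kind="scatter", x="petal_length", y="petal_width",
               c=point_colors, figsize=(8, 5))
```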

### Table Visualization <a class="anchor" id="section_3"></a>

The other type of visualization we can do with the pandas library is to apply styling to our DataFrame objects. Table visualization works by attaching styling instructions to a DataFrame, which are then rendered as CSS styles. 

In [7]:
# Add bar style to sepal_width variable


In the example above, we applied the horizontal bar style to the variable "sepal_width" while the data is sorted in descending order. We notice that in the top row the bar covers the entire cell, and its size continues to get smaller for the rest of the DataFrame.

In this lesson, we have learned two main techniques available in the pandas library to visualize data: as graphs such as pie charts, bar charts, time series charts, or scatter plots, or as a styling option on the DataFrame object.

To learn more about Pandas, stay tuned!