___

<a href='https://github.com/ai-vithink'> <img src='https://avatars1.githubusercontent.com/u/41588940?s=200&v=4' /></a>
___

# Pandas Built-in Data Visualization

In this lecture we will learn about pandas' built-in capabilities for data visualization! They are built on top of matplotlib, but baked into pandas for easier usage!

Let's take a look!

## Imports

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline

In [None]:
from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a href="javascript:code_toggle_err()">here</a>.''')
# To hide warnings, which won't change the desired outcome.

In [None]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border: 3px  black solid !important;
  color: black !important;
}
</style>
<!-- Adds visible borders (gridlines) to rendered DataFrame tables -->

In [None]:
import warnings
warnings.filterwarnings("ignore")


In [None]:
df1 = pd.read_csv('df1',index_col = 0)

In [None]:
df1.head()
# Index is a time series

In [None]:
df2 = pd.read_csv('df2') # Non time series

In [None]:
df2.head() # Random data with sequential index

In [None]:
# Say we want a histogram of all the values in column A of df1.
# Pandas can do that as follows:
df1['A'].hist() # DataFrame_name['Column_Name'].hist()
# This calls matplotlib under the hood, so matplotlib arguments are applicable.

In [None]:
df1['A'].hist(bins=30)

In [None]:
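# The default style looks plain, so we apply seaborn's style (seaborn is imported above as sns) and plot again.
# Importing seaborn alone does not restyle plots in recent seaborn versions; sns.set() applies the style explicitly.
sns.set()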


In [None]:
df1['A'].hist()

In [None]:
df1['A'].plot(kind='hist', bins=20) # Same histogram through the general .plot() method on the column (a Series)

In [None]:
df1['A'].plot.hist()

# Plot Types

There are several plot types built into pandas, most of them statistical plots by nature:

* df.plot.area     
* df.plot.barh     
* df.plot.density  
* df.plot.hist     
* df.plot.line     
* df.plot.scatter
* df.plot.bar      
* df.plot.box      
* df.plot.hexbin   
* df.plot.kde      
* df.plot.pie

You can also just call df.plot(kind='hist') or replace that kind argument with any of the key terms shown in the list above (e.g. 'box', 'barh', etc.).
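
As a small sketch of that equivalence, the example below draws the same histogram both ways, side by side. The `demo` DataFrame is made-up random data standing in for df1 (an assumption, since df1 comes from a CSV file), and the two-panel layout is only there to keep the plots apart.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; the lecture's df1 is loaded from a file.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 2)), columns=["A", "B"])

# Two axes side by side so each call draws on its own panel.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Accessor form: .plot.hist()
demo["A"].plot.hist(bins=20, ax=left, title="plot.hist()")

# Keyword form: .plot(kind='hist')
demo["A"].plot(kind="hist", bins=20, ax=right, title="plot(kind='hist')")

# Bar heights of both histograms, for comparison.
left_heights = [p.get_height() for p in left.patches]
right_heights = [p.get_height() for p in right.patches]
```

Both panels should show the same bars; which form to use is mostly a matter of taste.
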
___

Let's start going through them!

## Area

In [None]:
df2.plot.area()

In [None]:
# For transparency on area plot use alpha
df2.plot.area(alpha = 0.4)

## Barplots

In [None]:
df2

In [None]:
df2.plot.bar()

* Because df2 is small and has a sequential index, a bar plot works well here: each row becomes one group of bars. If the index were categorical, each category would get its own group of bars along the x-axis.
* We can stack the bars instead of placing them side by side by passing in stacked = True
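
To illustrate the categorical case, here is a minimal sketch with a made-up DataFrame (`sales` and its column names are hypothetical, not part of the lecture data). Each index label becomes one position on the x-axis.

```python
import pandas as pd

# Hypothetical data: one row per category, one column per series.
sales = pd.DataFrame(
    {"online": [3, 5, 2], "in_store": [4, 1, 6]},
    index=["books", "games", "music"],
)

# One stacked bar per category; the index labels become the x tick labels.
ax = sales.plot.bar(stacked=True)
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

Here `labels` holds the category names taken from the index, and the plot contains one rectangle per (category, column) pair.
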

In [None]:
df2.plot.bar(stacked = True)

## Histograms

In [None]:
df1['A'].plot.hist(bins=50)

## Line Plots

In [None]:
df1.plot.line(y='A', figsize=(12,3), lw=1)
# .plot.line() draws a line plot; the x axis defaults to the DataFrame index, so we only specify y.
# (Passing x=df1.index fails in recent pandas versions because x expects a column label.)

## Scatter Plots

In [None]:
df1.plot.scatter(x='A',y='B')
# Creates scatter plot

You can use c to color the points based on another column's value, and cmap to indicate which colormap to use.
For all the colormaps, check out: http://matplotlib.org/users/colormaps.html

In [None]:
df1.plot.scatter(x='A',y='B',c='C')

* Set the colour based on another column using c = 'Column_name'
* Running that with the default colormap gave a black and white (greyscale) plot here; the default can differ between pandas versions.
* The plot now shows three variables at once: A on the x-axis, B on the y-axis, and C as the colour of each point.
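
As a rough check of what c and cmap do, the sketch below uses made-up random data (`demo` is a hypothetical stand-in for df1) and inspects the scatter artist that pandas returns on the axes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: three columns of random values.
rng = np.random.default_rng(3)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])

# Colour each point by its C value using the 'coolwarm' colormap.
ax = demo.plot.scatter(x="A", y="B", c="C", cmap="coolwarm")

# The first collection on the axes is the set of scatter points.
points = ax.collections[0]
cmap_name = points.get_cmap().name   # colormap attached to the points
n_colored = len(points.get_array())  # one colour value per row
```

The colour values attached to the points are the entries of column C, which is why a colorbar labelled with that column appears next to the plot.
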

In [None]:
df1.plot.scatter(x='A',y='B',c='C',cmap='magma')

* If you prefer to encode the third column by point size rather than colour, pass the DataFrame column to s.
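
Marker sizes are areas, so negative values are not meaningful. If the size column can contain zero or negative values (an assumption here, since the contents of df1 are not shown), one option is to rescale it into a positive range first. The sketch below uses made-up data and an arbitrary target range of 10 to 200.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1; C contains negative values.
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])

# Min-max scale C into [10, 200] so every marker size is positive.
c = demo["C"]
sizes = 10 + 190 * (c - c.min()) / (c.max() - c.min())

# Larger markers now correspond to larger C values.
ax = demo.plot.scatter(x="A", y="B", s=sizes, alpha=0.5)
```

The range endpoints are a matter of taste; the point is only that relative sizes still follow C while staying positive.
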

In [None]:
df1.plot.scatter(x='A',y='B',s=df1['C'])

In [None]:
df1.plot.scatter(x='A',y='B',s=df1['C']*10) # points too small, multiply by some factor
# Plot is A vs B and size tells their C value relative to each other.

## BoxPlots

In [None]:
df2.plot.box() # Box plot done for us per column

## Hexagonal Bin Plot

Useful for Bivariate Data, alternative to scatterplot:

In [None]:
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
df.plot.hexbin(x='a',y='b')
# Like a scatter plot, but the points are grouped into hexagonal bins.

In [None]:
# gridsize sets the number of hexagons along the x direction, so a smaller gridsize gives larger hexagons.
# The colour of each hexagon reflects how many points fall inside it.
df.plot.hexbin(x='a',y='b',gridsize=25)

In [None]:
df.plot.hexbin(x='a',y='b',gridsize=25,cmap='coolwarm')

## Kernel Density Estimation plot (KDE)

In [None]:
# Select a column and call plot.kde()
df2['a'].plot.kde() # plot.density() is an alias for plot.kde()

In [None]:
df2['a'].plot.density()

In [None]:
# KDE/density plots can be drawn for the entire DataFrame as well
df2.plot.kde()

That's it! Hopefully you can see why this method of plotting is a lot easier to use than full-on matplotlib: it balances ease of use with control over the figure. Many of the plot calls also accept additional arguments of their underlying matplotlib plt. call.
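
As a small sketch of that last point, the example below passes a matplotlib line argument (lw) through a pandas plot call and then keeps customizing the returned matplotlib Axes. The `demo` DataFrame is made-up random-walk data with a date index, standing in for df1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: a random walk indexed by date.
rng = np.random.default_rng(2)
demo = pd.DataFrame(
    rng.normal(size=(50, 2)).cumsum(axis=0),
    columns=["A", "B"],
    index=pd.date_range("2000-01-01", periods=50),
)

# figsize and title are pandas plot options; lw is forwarded to matplotlib.
ax = demo.plot.line(y="A", figsize=(12, 3), lw=1, title="Column A")

# The return value is a matplotlib Axes, so the usual methods apply.
ax.set_ylabel("value")
ax.axhline(0, color="grey", lw=0.5)  # horizontal reference line at zero
```

Because the call returns a regular Axes object, anything pandas does not expose directly can still be adjusted afterwards with matplotlib.
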

Next we will learn about seaborn, a statistical visualization library designed to work well with pandas DataFrames.

Before that though, we'll have a quick exercise for you!

# Great Job!