# Plotting and Visualization

- Making informative visualizations (sometimes called plots) is an important task in data analysis.
<br>
- It may be a part of the exploratory process—for example, to help identify outliers or needed data transformations, or as a way of generating ideas for models.


In [None]:
import numpy as np
import pandas as pd

## matplotlib basics

In [None]:
import matplotlib.pyplot as plt

In [None]:
import numpy as np
data = np.arange(10)
data

In [None]:
plt.plot(data)

### Figures and Subplots

Plots in matplotlib reside within a `Figure` object. You can create a new figure with `plt.figure()`:

In [None]:
fig = plt.figure()

- this returns an empty figure with no subplots
- to draw on it, add one or more subplots using the figure's `add_subplot()` method

<!--{talk about kwargs and args of the fig object :- shift + tab}-->

In [None]:
ax1 = fig.add_subplot(221)

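`fig.add_subplot(221)` is shorthand for `fig.add_subplot(2, 2, 1)`: a grid of 2 rows and 2 columns, selecting the first of its 4 subplots (subplots are numbered from 1, row by row).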
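To check that the two spellings describe the same subplot, you can compare where each one is placed in its figure. A minimal sketch; the figure and axes names are just illustrative:

```python
import matplotlib.pyplot as plt

# Two separate figures, one per spelling of the same subplot request
fig_a = plt.figure()
fig_b = plt.figure()

ax_short = fig_a.add_subplot(221)     # three-digit shorthand
ax_long = fig_b.add_subplot(2, 2, 1)  # nrows, ncols, index (counted from 1, row by row)

# Both axes occupy the same region of their figure: the top-left cell of a 2 x 2 grid
same_position = ax_short.get_position().bounds == ax_long.get_position().bounds
print(same_position)

plt.close(fig_a)
plt.close(fig_b)
```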

In [None]:
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)

In [None]:
fig

In [None]:
plt.plot(np.random.randn(50).cumsum(), 'k--')

When you issue a plotting command like `plt.plot([1.5, 3.5, -2, 1.6])`, matplotlib draws on the last figure and subplot used (creating one if necessary), thus hiding the figure and subplot creation.
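A small sketch of this behavior, using `plt.gca()` ("get current axes") to see which subplot pyplot considers current; the variable names are illustrative:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax_top = fig.add_subplot(2, 1, 1)
ax_bottom = fig.add_subplot(2, 1, 2)  # the most recently created subplot becomes "current"

# No subplot is named here: pyplot draws on the current (last used) one
plt.plot([1.5, 3.5, -2, 1.6])

is_current = plt.gca() is ax_bottom
print(is_current)                               # True
print(len(ax_top.lines), len(ax_bottom.lines))  # 0 1
plt.close(fig)
```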

Note:- *One nuance of using Jupyter notebooks is that plots are reset after each cell is evaluated, so for more complex plots you must put all of the plotting commands in a single notebook cell.*

In [None]:
fig = plt.figure()

ax1 = fig.add_subplot(221)
plt.plot(np.random.randn(200),np.random.randn(200),'go')

ax2 = fig.add_subplot(222)
plt.plot(np.arange(18),5**(np.arange(18)),'y-o')

ax3 = fig.add_subplot(223)
x = np.arange(0, 10, 0.01)
plt.plot(x, np.sin(10 * x) * np.exp(-x), 'r')

ax4 = fig.add_subplot(224)
plt.plot(np.random.randn(50).cumsum(),'--')


<!--{will talk about colors, markers and line styles here itself}-->

<!--{talk about figsize , the cell below will be created later}-->

In [None]:
fig = plt.figure(figsize=(16,9))

ax1 = fig.add_subplot(221)
plt.plot(np.random.randn(200),np.random.randn(200),'go')

ax2 = fig.add_subplot(222)
plt.plot(np.arange(18),5**(np.arange(18)),'y-o')

ax3 = fig.add_subplot(223)
x = np.arange(0, 10, 0.01)
plt.plot(x, np.sin(10 * x) * np.exp(-x), 'r')

ax4 = fig.add_subplot(224)
plt.plot(np.random.randn(50).cumsum(),'--')

plt.show()

<!--{talk about plotting very large numbers}-->

<!--{introduce markdown:- allows to have latex, html and so on as input:-<br>
for now}<br>-->
Note that the function plotted in the third subplot (bottom left) is:

$$y = \sin(10x)\,e^{-x}$$


<!--{talk about the last plot (abs)}-->

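You can also change a subplot directly by calling methods on the axes object returned by `add_subplot()` (for example `ax1.hist()` or `ax2.scatter()`), instead of going through `plt.plot()`: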

In [None]:
ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

In [None]:
plt.show()

In [None]:
fig = plt.figure(figsize=(16,9))

ax1 = fig.add_subplot(221)
plt.plot(np.random.randn(200),np.random.randn(200),'go')

ax2 = fig.add_subplot(222)
plt.plot(np.arange(16),5**(np.arange(16)),'y-o')

ax3 = fig.add_subplot(223)
x = np.arange(0, 10, 0.01)
plt.plot(x, np.sin(10 * x) * np.exp(-x), 'r')

ax4 = fig.add_subplot(224)
plt.plot(np.random.randn(50).cumsum(),'--')

ax1.clear()
ax2.clear()

ax1.hist(np.random.randn(100), bins=20, alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
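The same kind of figure can be built without `plt.plot()` at all, by keeping the axes objects and calling their methods directly. This minimal sketch (names are illustrative) also counts what each call adds, and what `clear()` removes:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 4))
ax_line = fig.add_subplot(1, 2, 1)
ax_hist = fig.add_subplot(1, 2, 2)

ax_line.plot(np.random.randn(50).cumsum(), '--')        # adds one Line2D object
ax_hist.hist(np.random.randn(100), bins=20, alpha=0.3)  # adds one rectangle (patch) per bin

n_lines_before = len(ax_line.lines)  # 1
n_bars = len(ax_hist.patches)        # 20

ax_line.clear()                      # removes everything drawn on this subplot only
n_lines_after = len(ax_line.lines)   # 0

plt.close(fig)
```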

## The efficient way: `plt.subplots()`

In [None]:
fig, axes = plt.subplots(2, 3)
axes

"axes" will be an array of subplots:- can be accessed as shown in the previous notebooks

In [None]:
axes[0][2]
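`plt.subplots` also accepts options such as `sharex` and `sharey`, which make all subplots use the same axis limits, which is handy when comparing data on the same scale. A sketch; the spacing values passed to `subplots_adjust` are just one possible choice:

```python
import numpy as np
import matplotlib.pyplot as plt

# sharex/sharey make every subplot use the same x and y limits
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
print(axes.shape)  # (2, 2)

# axes is a NumPy array, so it can be indexed with [row, col] or looped over
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)

# Remove the spacing between subplots (given as fractions of the average axes size)
fig.subplots_adjust(wspace=0, hspace=0)

shared_xlim = axes[0, 0].get_xlim() == axes[1, 1].get_xlim()
plt.close(fig)
```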

### Colors, Markers, and Line Styles

In [None]:
plt.figure()
plt.plot(np.random.randn(30).cumsum(),'ko--')
# 'k' is black, 'o' a circle marker and '--' a dashed line

Equivalent code, with the properties spelled out as keyword arguments:<br>
`plt.plot(np.random.randn(30).cumsum(), color='k', linestyle='dashed', marker='o')`
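One way to check the equivalence is to inspect the properties of the two resulting `Line2D` objects. A minimal sketch; `mcolors.to_hex` is used only to put both colors in the same notation:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

data = np.random.randn(30).cumsum()
fig, ax = plt.subplots()

# Format string: color, marker and line style packed into one string
short_line, = ax.plot(data, 'ko--')
# Keyword form: the same three properties spelled out
long_line, = ax.plot(data, color='k', linestyle='dashed', marker='o')

for line in (short_line, long_line):
    # both lines print: #000000 o --
    print(mcolors.to_hex(line.get_color()), line.get_marker(), line.get_linestyle())
plt.close(fig)
```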

Note: if you're working in IPython or a Python script, close your figures once you're done with them and want to start a new plot (after exporting them, if necessary, as discussed below):<br>
`plt.close('all')`

In [None]:
data = np.random.randn(30).cumsum()

fig = plt.figure(figsize=(16,9))

plt.plot(data, 'k--', label='Default')

plt.plot(data, 'k-', drawstyle='steps-post', label='steps-post')

plt.legend(loc='best')

plt.show()

<!--{show steps-pre as well }-->
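The other step styles work the same way. Per the matplotlib documentation for step plots, `'steps-pre'` continues each y value to the left of its point, `'steps-post'` continues it to the right, and `'steps-mid'` places the jumps halfway between points. A sketch comparing them on the same random data:

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(10).cumsum()
fig, ax = plt.subplots(figsize=(10, 5))

# Each step drawstyle decides where the vertical jump happens relative to each point
for style in ['steps-pre', 'steps-mid', 'steps-post']:
    ax.plot(data, drawstyle=style, label=style)
ax.plot(data, 'k--', label='default')  # straight segments between points, for reference
ax.legend(loc='best')

styles = [line.get_drawstyle() for line in ax.lines]
plt.close(fig)
```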

### Ticks, Labels, and Legends

#### Setting the title, axis labels, ticks, and ticklabels

In [None]:
fig = plt.figure(figsize=(16,9))
ax = fig.add_subplot(111)

ax.plot(np.random.randn(1000).cumsum())

In [None]:
fig = plt.figure(figsize=(16,9))
ax = fig.add_subplot(111)

ax.plot(np.random.randn(1000).cumsum())

ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                            rotation=30, fontsize='large')

In [None]:
fig = plt.figure(figsize=(16,9))
ax = fig.add_subplot(111)

ax.plot(np.random.randn(1000).cumsum())

ticks = ax.set_xticks([0, 250, 500, 750, 1000])
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                            rotation=90, fontsize='large')

ax.set_title('This session\'s first titled graph ... :->',fontsize=20)
ax.set_xlabel('Stages')

plt.show()
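Instead of calling `set_title`, `set_xlabel` and so on one at a time, the axes' `set` method accepts several properties as keyword arguments in one call. A minimal sketch; the title and labels are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(np.random.randn(1000).cumsum())

# Axes.set takes several properties at once; each keyword maps to a set_* method
ax.set(title='My first matplotlib plot', xlabel='Stages', ylabel='Cumulative sum')

print(ax.get_title(), '|', ax.get_xlabel(), '|', ax.get_ylabel())
plt.close(fig)
```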

#### Adding legends

In [None]:
fig = plt.figure(figsize=(16,9))
ax = fig.add_subplot(111)
ax.plot(np.random.randn(1000).cumsum(), 'y', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'b--', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'r.', label='three')
ax.legend(loc='best')

<!--{show the legend jumping around for multiple plots}-->
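If a line should appear on the plot but not in the legend, give it no label or a label starting with an underscore, such as `'_nolegend_'`. A sketch based on the cell above:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(np.random.randn(1000).cumsum(), 'y', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'b--', label='two')
# A label starting with an underscore keeps this line out of the legend
ax.plot(np.random.randn(1000).cumsum(), 'r.', label='_nolegend_')
legend = ax.legend(loc='best')

entries = [text.get_text() for text in legend.get_texts()]
print(entries)  # ['one', 'two']
plt.close(fig)
```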

### Saving Plots to File

 - the file type is inferred from the file extension
 - supported types include:
     - pdf
     - png
     - jpeg
     - and more..

In [None]:
plt.plot(np.random.randn(100),np.random.randn(100),"ro")
plt.savefig('saved_fig.png', dpi=400, bbox_inches='tight')

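Note: `savefig` doesn't have to write to a path on disk; it can also write to any file-like object, such as an in-memory `io.BytesIO` buffer.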
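A sketch of saving to an in-memory buffer, which can be useful, for example, when serving images over the web instead of writing files. Since there is no file name to take an extension from, the format is passed explicitly:

```python
import io
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.random.randn(100), np.random.randn(100), 'ro')

buffer = io.BytesIO()
# No file name, so no extension to inspect: pass format= explicitly
fig.savefig(buffer, format='png', dpi=100, bbox_inches='tight')
png_bytes = buffer.getvalue()
plt.close(fig)

print(png_bytes[:8])  # b'\x89PNG\r\n\x1a\n' (the standard PNG file signature)
```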

### matplotlib Configuration

In [None]:
plt.rc('figure', figsize=(10, 10))
# 'rc' settings (as in matplotlibrc files) are matplotlib's global defaults,
# stored in plt.rcParams; this sets the default size of all subsequent figures

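The first argument is the component you want to customize, such as `'figure'`, `'axes'`, `'xtick'`, `'ytick'`, `'grid'` or `'legend'`; it is followed by keyword arguments giving the new parameter values.

You can also collect several options in a dictionary and unpack it with `**`: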

In [None]:
font_options = {'family' : 'monospace',
                'weight' : 'bold',
                'size'   : 10}

plt.rc('font', **font_options)
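Settings made with `plt.rc` stay in effect for the rest of the session. To change them only temporarily, `plt.rc_context` applies them inside a `with` block and restores the previous values afterwards. A minimal sketch; the chosen parameters are arbitrary examples:

```python
import matplotlib.pyplot as plt

default_width = plt.rcParams['lines.linewidth']

# rc_context applies the settings only inside the with-block, then restores the old ones
with plt.rc_context({'lines.linewidth': 4, 'axes.grid': True}):
    inside = plt.rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    line, = ax.plot([0, 1, 2], [0, 1, 4])
    plt.close(fig)

after = plt.rcParams['lines.linewidth']
print(inside, line.get_linewidth(), after == default_width)

# plt.rcdefaults() would reset every rc setting to matplotlib's built-in defaults
```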

## Plotting with pandas and seaborn

### Why seaborn:-
 - matplotlib can be a fairly low-level tool. You assemble a plot from its base components: the data display (i.e., the type of plot: line, bar, box, scatter, contour, etc.), legend, title, tick labels, and other annotations.
 -  Seaborn is a statistical graphics library created by Michael Waskom. Seaborn simplifies creating many common visualization types.


NOTE:-
<br>    Older versions of seaborn changed the default matplotlib color schemes and plot styles as soon as seaborn was imported. Since seaborn 0.8, importing it no longer does this; call `sns.set()` (or `sns.set_theme()` in recent versions) to apply seaborn's styling. Even if you do not use the seaborn API, this is a simple way to improve the readability and aesthetics of general matplotlib plots.


### Line Plots

pandas Series and DataFrame objects have a built-in `plot` method, which produces line plots by default. For a Series, the index is used for the x axis:

In [None]:
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
s.plot()

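For DataFrames, `plot()` treats each column as a separate Series, drawing each as its own line on the same subplot and adding a legend automatically: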

In [None]:
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=['A', 'B', 'C', 'D'],
                  index=np.arange(0, 100, 10))
df.plot()

#### Importance of matplotlib
Additional keyword arguments to plot are passed through to the respective matplotlib plotting function, so you can further customize these plots by learning more about the matplotlib API.
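For example, `style` and `alpha` are forwarded to matplotlib, `ax` chooses the subplot to draw on, and `subplots=True` gives each DataFrame column its own subplot. A sketch reusing data shaped like the cells above:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))
df = pd.DataFrame(np.random.randn(10, 4).cumsum(0),
                  columns=['A', 'B', 'C', 'D'],
                  index=np.arange(0, 100, 10))

fig, ax = plt.subplots()
# style and alpha are passed through to matplotlib; ax selects the subplot to draw on
returned_ax = s.plot(ax=ax, style='k--', alpha=0.5)

# subplots=True draws each DataFrame column in its own subplot
column_axes = df.plot(subplots=True, figsize=(8, 8))

print(returned_ax is ax, len(column_axes))  # True 4
plt.close('all')
```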

### Bar Plots

In [None]:
fig, axes = plt.subplots(2, 1)
data = pd.Series(np.random.rand(16), index=list('abcdefghijklmnop'))
data.plot.bar(ax=axes[0], color='k', alpha=1)
data.plot.barh(ax=axes[1], color='k', alpha=0.7)
plt.show()

`alpha` sets the transparency of the plotted elements, from 0 (fully transparent) to 1 (fully opaque).


In [None]:
df = pd.DataFrame(np.random.rand(6, 4),
                  index=['one', 'two', 'three', 'four', 'five', 'six'],
                  columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
df

In [None]:
df.plot.bar()
plt.show()

In [None]:
df.plot.barh(stacked=True, alpha=0.7)

# <center>Sample</center> 

In [None]:
tips = pd.read_csv('tips.csv')
tips

Aim: make a stacked bar plot showing the percentage of data points for each party size on each day.

In [None]:
tips.head(10)

In [None]:
tips.tail(10)

In [None]:
party_counts = pd.crosstab(tips['day'], tips['size'])
party_counts

Parties of size 1 and 6 are rare, so we ignore them:

In [None]:
party_counts = party_counts.loc[:, 2:5]
party_counts

#### Normalize each row to sum to 1 (plotting fractions)

In [None]:
party_counts.shape

In [None]:
party_counts.sum(1).shape

In [None]:
party_counts.sum(1)

In [None]:
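# pcts: percentages (as fractions of each day's total)
# Dividing a DataFrame by a Series aligns the Series index with the DataFrame's
# columns, so transpose first (days become columns), divide, then transpose back
party_pcts = party_counts.T / party_counts.sum(1)
party_pcts = party_pcts.T
# equivalent: party_counts.div(party_counts.sum(axis=1), axis=0)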

In [None]:
party_pcts.plot.bar(stacked=True, title="Fraction of parties by size on each day")
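To see the normalization at work on something small, here is a sketch on a hypothetical toy table (not the real tips data), comparing the transpose trick above with `DataFrame.div`, which divides each row by its own sum without transposing:

```python
import pandas as pd

# Hypothetical toy data standing in for tips.csv
toy = pd.DataFrame({'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Fri', 'Sat'],
                    'size': [2, 3, 2, 2, 4, 3]})
counts = pd.crosstab(toy['day'], toy['size'])

# Approach used above: transpose so days become columns, divide, transpose back
pcts_transpose = (counts.T / counts.sum(axis=1)).T
# Same result without transposing: divide each row by its own sum
pcts_div = counts.div(counts.sum(axis=1), axis=0)

print(pcts_div)
print(pcts_div.sum(axis=1))  # every row sums to 1 (up to floating-point rounding)
```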

### Now showing an example using seaborn 

In [None]:
import seaborn as sns

tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

tips.head()

In [None]:
sns.barplot(x='tip_pct', y='day', data=tips)
plt.title("Tipping percentage by day with error bars")
plt.show()

In [None]:
sns.set(style = "whitegrid")

In [None]:
sns.barplot(x='tip_pct', y='day', hue='time', data=tips, orient='h')
# hue splits each bar by an additional categorical variable (here, 'time')
plt.title("Tipping percentage by day and time")
plt.show()


### Histograms and Density Plots

A histogram is a kind of bar plot that gives a discretized display of value frequency. The data points are split into discrete, evenly spaced bins, and the number of data points in each bin is plotted.

In [None]:
tips['tip_pct'].plot.hist(bins=50)

In [None]:
tips['tip_pct'].plot.hist(bins=100)
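The binning behind these histograms can be reproduced with NumPy: matplotlib's `hist`, which pandas calls here, computes its counts with `np.histogram`. In it, every bin is half-open except the last, which also includes its right edge. A sketch on a small hand-made array:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Three evenly spaced bins over [1, 4]: [1, 2), [2, 3) and [3, 4] (the last bin is closed)
counts, edges = np.histogram(values, bins=3, range=(1, 4))
print(edges)   # [1. 2. 3. 4.]
print(counts)  # [1 2 7]
```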

A related plot type is a density plot, which is formed by computing an estimate of a continuous probability distribution that might have generated the observed data. 

In [None]:
tips['tip_pct'].plot.density()
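Density plots of this kind are also known as kernel density estimate (KDE) plots: the estimate is an average of simple "kernel" distributions, usually normal ones, centered on each data point. pandas delegates the computation to SciPy's `gaussian_kde`, so SciPy must be installed. The sketch below is a hand-written NumPy version for illustration only, assuming a Gaussian kernel and Scott's rule for the bandwidth (which, as far as we know, is also SciPy's default); the function name is hypothetical:

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth (one-dimensional case)
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One normal "bump" centered on every sample, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the estimate integrates to 1
    return bumps.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
samples = rng.normal(0, 1, size=200)
grid = np.linspace(samples.min() - 5, samples.max() + 5, 2001)
density = gaussian_kde_1d(samples, grid)

area = density.sum() * (grid[1] - grid[0])  # simple Riemann sum, should be close to 1
print(round(area, 3))
```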


### Plotting multiple distributions together

In [None]:
comp1 = np.random.normal(0, 1, size=200)
comp2 = np.random.normal(10, 2, size=200)

values = pd.Series(np.concatenate([comp1, comp2]))
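# np.concatenate joins the arrays along axis 0 by default

# sns.distplot is deprecated; histplot with stat='density' and kde=True draws
# a normalized histogram with a density estimate on top
sns.histplot(values, bins=100, stat='density', kde=True, color='green')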

plt.title(" Normalized histogram of normal mixture with density estimate ")
plt.show()

# Enjoy analysing and visualizing