#### Introduction

In this notebook we will discuss matplotlib.


Matplotlib is an excellent 2D and 3D graphics library for generating scientific figures. Some of the many advantages of this library include:

- Easy to get started
- Support for LaTeX-formatted labels and text
- Great control of every element in a figure, including figure size and DPI.
- High-quality output in many formats, including PNG, PDF, SVG, EPS, and PGF.
- GUI for interactively exploring figures and support for headless generation of figure files (useful for batch jobs).

One of the key features of matplotlib that I would like to emphasize, and that I think makes matplotlib highly suitable for generating figures for scientific publications, is that all aspects of the figure can be controlled programmatically. This is important for reproducibility, and convenient when the figure needs to be regenerated with updated data or a different appearance.

pyplot is a module in the matplotlib library that provides functions for plotting.

pyplot is a collection of command-style functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure, for example creating a plotting area, drawing lines in it, or adding labels.

More information is available on the Matplotlib web page: http://matplotlib.org/


To get started using Matplotlib, import the matplotlib.pyplot module under the name plt:

In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# Display the plots inline, inside this Jupyter notebook.
# Without this command, some setups show each plot in a new window outside the notebook
# (recent Jupyter installations typically use the inline backend by default).
%matplotlib inline

# %matplotlib notebook

#### Line Chart
The line chart is one of the simplest and most widely used data visualization techniques. A line chart displays information as a series of data points or markers connected by straight lines. You can customize the shape, size, color, and other aesthetic elements of the lines and markers for better visual clarity.

We can visualize how one quantity changes with another (for example, over time) using a line chart. To draw a line chart, we can use the plt.plot function.

In [None]:
# Basic code to generate one of the simplest possible graphs
x = [1, 2, 3, 4]
y = [5, 6, 7, 8]

# Supply the two lists to plot(); without a trailing semicolon the notebook
# also prints the list of Line2D objects that plot() returns
plt.plot(x, y)
#plt.show()

Calling the plt.plot function draws the line chart as expected. It also returns a list of the lines drawn, such as [<matplotlib.lines.Line2D at 0x7ff2d3f12c10>], which is shown in the output. We can add a semicolon (;) at the end of the last statement in the cell to suppress this output and display just the graph.

In [None]:
plt.plot(x,y);

Let's enhance the plot step by step to make it more informative.

#### Figure titles and axis labels

A useful graph needs a title, a label on each axis and often a grid. (When working with an Axes object directly, the methods are set_title, set_xlabel and set_ylabel.) With the pyplot interface, plt.title sets the title, plt.xlabel and plt.ylabel set the labels of the X and Y axes, and plt.grid(True) draws a grid:

In [None]:
plt.plot(x,y)
plt.title('simple graph')
plt.ylabel('Y')
plt.xlabel('X')
plt.grid(True)
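The same plot can also be built with matplotlib's object-oriented interface, where we call methods on an Axes object instead of pyplot functions. A minimal sketch (the lists are redefined so the cell runs on its own):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [5, 6, 7, 8]

# Create a Figure with a single Axes, then call methods on the Axes
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('simple graph')  # Axes equivalent of plt.title
ax.set_xlabel('X')            # Axes equivalent of plt.xlabel
ax.set_ylabel('Y')            # Axes equivalent of plt.ylabel
ax.grid(True)
```

The object-oriented style becomes handy once a figure has several axes, because each method call states explicitly which axes it changes.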

#### Figure size, aspect ratio and DPI

Matplotlib allows the aspect ratio, DPI and figure size to be specified when the Figure object is created, using the figsize and dpi keyword arguments. figsize is a tuple of the width and height of the figure in inches, and dpi is the number of dots (pixels) per inch. To create an 800x400 pixel figure at 100 dots per inch, we can do:

In [None]:
fig = plt.figure(figsize=(8,4), dpi=100)

In [None]:
# Change the size of the figure using the method figure()
plt.figure(figsize=(10,5))
plt.plot(x,y);
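Figure size in inches times DPI gives the size in pixels. As a small check, this sketch builds the 8 x 4 inch, 100 DPI figure from above, computes its pixel size and renders it into an in-memory PNG (the io.BytesIO buffer is only there to avoid writing a file):

```python
import io
import matplotlib.pyplot as plt

# An 8 x 4 inch figure at 100 dots per inch
fig = plt.figure(figsize=(8, 4), dpi=100)
plt.plot([1, 2, 3, 4], [5, 6, 7, 8])

# Size in pixels = size in inches * dots per inch
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)  # 800.0 400.0

# Render a PNG at the figure's own DPI into memory and read it back as an array
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=fig.dpi)
buf.seek(0)
image = plt.imread(buf)
print(image.shape)  # (height, width, channels)
```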

#### Saving figures
To save a figure to a file we can use the savefig method in the Figure class:

In [None]:
# How to save the figure
fig = plt.figure()
plt.plot(x,y)
plt.title('simple graph')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.grid(True)
fig.savefig('test.png')
# fig.savefig('some/other/folder/test.png')  # save to a different directory
# plt.savefig('test.png')                     # pyplot equivalent: saves the current figure


In [None]:
!ls -lh test.png

###### What formats are available and which ones should be used for best quality?
Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF and PDF. For scientific papers, I recommend using PDF whenever possible (LaTeX documents compiled with pdflatex can include PDFs using the \includegraphics command). In some cases, PGF can also be a good alternative.



In [None]:
fig.canvas.get_supported_filetypes()
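savefig picks the output format from the file extension. Here is a sketch that writes one figure as PNG, PDF and SVG; the temporary output folder is an assumption made only to keep the example self-contained, so substitute your own path:

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [5, 6, 7, 8])
ax.set_title('simple graph')

# Write the same figure in several formats into a temporary folder
out_dir = tempfile.mkdtemp()
saved = []
for ext in ['png', 'pdf', 'svg']:
    path = os.path.join(out_dir, f'test.{ext}')
    # bbox_inches='tight' trims surplus white space around the plot
    fig.savefig(path, bbox_inches='tight')
    saved.append(path)

print(saved)
```

Trimming the white space with bbox_inches='tight' is often convenient for figures that are included in papers.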

In [None]:
# Change the line width
plt.plot(x,y, linewidth = 6)
plt.title('simple graph')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.grid(True)


In [None]:
# Change the color 
#plt.plot(x,y, color = 'g')
plt.plot(x,y,'g')
plt.title('simple graph')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.grid(True)

#### Plotting Multiple Lines 

In [None]:
# plot two sets of data on the same graph
x1 = [5,8,10,12]
y1 = [12,16,6,10]

x2 = [6,9,11, 13]
y2 = [6,15,7, 15]

# Set the color and line width of each line explicitly (the commented-out lines below rely on the defaults):
plt.plot(x1,y1, color = 'b', linewidth = 4)   # blue
plt.plot(x2,y2, color = 'k', linewidth = 6)   # black

# plt.plot(x1,y1)
# plt.plot(x2,y2)


# plt.plot(x1,y1,linewidth=5)
# plt.plot(x2,y2,linewidth=6)

plt.title('simple graph')

plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.grid(True)

In [None]:
# Several (x, y, format) groups can be passed to a single plot() call
plt.plot(x1,y1,'g',x2,y2,'r', linewidth = 5)
plt.title('Multiple plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)

#### Legends
Next, we want to add a legend that shows what each line represents.

To add a legend, pass the label="label text" keyword argument when lines or other objects are added to the figure, and then call the legend method without arguments.

The advantage of this method is that if curves are added to or removed from the figure, the legend is automatically updated accordingly.

The legend function takes an optional keyword argument loc that specifies where in the figure the legend is drawn. loc accepts location strings such as 'upper right' as well as equivalent numerical codes. See http://matplotlib.org/users/legend_guide.html#legend-location for details. Some of the most common loc values are:

- plt.legend(loc=0)  # 'best': let matplotlib decide the optimal location
- plt.legend(loc=1)  # 'upper right' corner
- plt.legend(loc=2)  # 'upper left' corner
- plt.legend(loc=3)  # 'lower left' corner
- plt.legend(loc=4)  # 'lower right' corner

Many more options are available.

In [None]:
plt.plot(x1,y1,label='Company 1')
plt.plot(x2,y2,label='Company 2')

plt.title('company wise analysis')
plt.ylabel('Revenue')
plt.xlabel('X')
plt.grid()
#plt.legend();
#plt.legend(loc = 'upper right');
plt.legend(loc = 4)  # 4 is the same as loc='lower right'

In [None]:
plt.plot(x1,y1)
plt.plot(x2,y2)

plt.title('company wise analysis')
plt.ylabel('Revenue')
plt.xlabel('X')
plt.grid()
plt.legend(['Company1','Company2'], loc = 'upper right' );
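To check what the legend will show, we can ask the Axes for the handles and labels it has collected. In this sketch, the dotted reference line is a hypothetical addition whose label starts with an underscore, which matplotlib leaves out of the legend:

```python
import matplotlib.pyplot as plt

x1, y1 = [5, 8, 10, 12], [12, 16, 6, 10]
x2, y2 = [6, 9, 11, 13], [6, 15, 7, 15]

fig, ax = plt.subplots()
ax.plot(x1, y1, label='Company 1')
ax.plot(x2, y2, label='Company 2')
# Hypothetical reference line: labels starting with '_' are skipped by the legend
ax.axhline(10, ls=':', color='gray', label='_reference')

leg = ax.legend(loc='lower right')  # same as loc=4

handles, labels = ax.get_legend_handles_labels()
print(labels)  # ['Company 1', 'Company 2']
```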

#### Line Markers
We can also show markers for the data points on each line using the marker argument of plt.plot. Matplotlib provides many different markers, like a circle, cross, square, diamond, etc. You can find the full list of marker types here: https://matplotlib.org/3.1.1/api/markers_api.html .

In [None]:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16],marker =  "x")
#plt.plot([1, 2, 3, 4], [1, 4, 9, 16],"x")
plt.grid(True)


In [None]:
"""
In "ro", "r" stands for red and "o" stands for circles.

"""
plt.plot([1, 2, 3, 4], [1, 4, 9, 16],"ro")
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.grid(True)



#### Styling Lines and Markers
The plt.plot function supports many arguments for styling lines and markers:

- color or c: Set the color of the line (supported colors)
- linestyle or ls: Choose between a solid or dashed line
- linewidth or lw: Set the width of a line
- markersize or ms: Set the size of markers
- markeredgecolor or mec: Set the edge color for markers
- markeredgewidth or mew: Set the edge width for markers
- markerfacecolor or mfc: Set the fill color for markers
- alpha: Opacity of the plot

Check out the documentation for plt.plot to learn more: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot .

In [None]:
"""
Here we plot three graphs in the same area: a dashed line and two series of markers.

The fmt argument is a shorthand for specifying the marker shape, line style and line color.
It can be provided as the third argument to plt.plot:

fmt = '[marker][line][color]'

Other orders, such as [color][line] in "r--", are also accepted.

Each plt.plot() call below draws one graph.

First graph: both x and y values are t; "r" in "r--" means red and "--" means a dashed line.
This graph is a straight line.

Second graph: x takes values from t and y takes values from t**2;
"b" in "bs" means blue and "s" stands for square markers.

Third graph: x takes values from t and y takes values from t**3;
"g" in "g^" means green and "^" stands for triangle markers.
"""
t  = np.arange(0, 4.1, 0.2)
# red dashes, blue squares and green triangles
plt.plot(t, t, "r--", label='x')
plt.plot(t, t**2, "bs", label='x^2')
plt.plot(t, t**3, "g^",label='x^3')
plt.grid(True)
plt.legend();


In [None]:
#plt.plot(t, t, ls='--' )
plt.plot(t,t**2,marker='s', c='b', ls='--', lw=3, mec = 'r', ms=10,alpha=0.8)
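The fmt shorthand and the keyword arguments are two ways of saying the same thing. This sketch draws "bs" once with the format string and once spelled out as color='b', marker='s', linestyle='None' (shifted up by 2 so both are visible), then reads the resulting style back from the two Line2D objects:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

t = np.arange(0, 4.1, 0.2)
fig, ax = plt.subplots()

# Shorthand: 'b' = blue, 's' = square markers, no line style given -> markers only
(line_fmt,) = ax.plot(t, t**2, 'bs')
# The same style spelled out with keyword arguments
(line_kw,) = ax.plot(t, t**2 + 2, color='b', marker='s', linestyle='None')

for line in (line_fmt, line_kw):
    print(to_rgba(line.get_color()), line.get_marker(), line.get_linestyle())
```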


#### Formatting text: LaTeX, fontsize, font family

The figure above is functional, but it does not (yet) satisfy the criteria for a figure used in a publication. First and foremost, we need to have LaTeX formatted text, and second, we need to be able to adjust the font size to appear right in a publication.

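Matplotlib has great support for LaTeX. All we need to do is use dollar signs to enclose LaTeX in any text (legend, title, label, etc.), for example "$y=x^3$". By default this math is rendered by Matplotlib's built-in mathtext engine, so no LaTeX installation is required.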

But here we can run into a slightly subtle problem with LaTeX code and Python text strings. In LaTeX, we frequently use the backslash in commands, for example \alpha to produce the symbol α. But the backslash already has a meaning in Python strings (it starts an escape sequence). To keep Python from mangling our LaTeX code, we need to use "raw" text strings. Raw text strings are prefixed with an 'r', like r"\alpha" or r'\alpha' instead of "\alpha" or '\alpha':

In [None]:
t  = np.arange(0, 4.1, 0.2)
# red dashes, blue squares and green triangles
plt.plot(t, t, "r--", label=r"$y = \alpha$")
plt.plot(t, t**2, "bs", label=r"$y = \alpha^2$")
plt.plot(t, t**3, "g^",label=r"$y = \alpha^3$")
#plt.xlabel(r'$\alpha$', fontsize=18)
plt.xlabel('alpha', fontsize=18)
plt.ylabel(r'$y$', fontsize=18)
plt.title('Multiple Line Graph');
plt.grid(True)
plt.legend(loc =2);
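To see why raw strings matter, compare a normal and a raw string in plain Python. In a normal string "\a" is an escape sequence for the ASCII bell character, so "\alpha" silently loses its backslash; other commands are affected too, for example "\theta" starts with a tab character ("\t"):

```python
normal = "\alpha"   # "\a" becomes the bell character, followed by "lpha"
raw = r"\alpha"     # the backslash is kept, as LaTeX expects

print(len(normal), repr(normal))  # 5 '\x07lpha'
print(len(raw), repr(raw))        # 6 '\\alpha'

# Doubling the backslash in a normal string is another way to get the same text
print(raw == "\\alpha")           # True
```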

We can also change the global font size and font family, which applies to all text elements in a figure (tick labels, axis labels and titles, legends, etc.):

You can also edit default styles directly by modifying the matplotlib.rcParams dictionary. Learn more: https://matplotlib.org/3.2.1/tutorials/introductory/customizing.html#matplotlib-rcparams .

In [None]:
# Update the matplotlib configuration parameters:
#matplotlib.rcParams.update({'font.size': 18, 'font.family': 'serif'})

matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
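Changes made through rcParams stay in effect for the rest of the session. If we only want them for one figure, matplotlib.rc_context applies them temporarily and restores the previous values when the with-block ends. A short sketch; note that settings are read when artists are created, so elements that are only created later at draw time (such as some tick labels) may pick up the restored values:

```python
import matplotlib
import matplotlib.pyplot as plt

before = matplotlib.rcParams['font.size']

# The overrides apply only inside the with-block
with matplotlib.rc_context({'font.size': 18, 'font.family': 'serif'}):
    inside = matplotlib.rcParams['font.size']
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    ax.set_title('Drawn inside rc_context')

after = matplotlib.rcParams['font.size']
print(before, inside, after)
```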

In [None]:
t  = np.arange(0, 4.1, 0.2)
# red dashes, blue squares and green triangles
plt.plot(t, t, "r--", label=r"$y = \alpha$")
plt.plot(t, t**2, "bs", label=r"$y = \alpha^2$")
plt.plot(t, t**3, "g^",label=r"$y = \alpha^3$")
plt.xlabel(r'$\alpha$', fontsize=18)
plt.ylabel(r'$y$', fontsize=18)
plt.title('Multiple Line Graph');
plt.grid(True)
plt.legend(loc =2);

Or, alternatively, we can request that matplotlib use LaTeX itself to render the text elements in the figure. This requires a working LaTeX installation on your system:



In [None]:
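# Ask matplotlib to render all text with LaTeX (needs a LaTeX installation on your system)
matplotlib.rcParams.update({'font.size': 18, 'text.usetex': True})
t  = np.arange(0, 4.1, 0.2)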
# red dashes, blue squares and green triangles
plt.plot(t, t, "r--", label=r"$y = \alpha$")
plt.plot(t, t**2, "bs", label=r"$y = \alpha^2$")
plt.plot(t, t**3, "g^",label=r"$y = \alpha^3$")
plt.xlabel(r'$\alpha$', fontsize=18)
plt.ylabel(r'$y$', fontsize=18)
plt.title('Multiple Line Graph');
plt.grid(True)
plt.legend(loc =2)

In [None]:
# Restore plain settings: turn off usetex and use a sans-serif font again
matplotlib.rcParams.update({'font.size': 12, 'font.family': 'sans-serif', 'text.usetex': False})

In [None]:

y = [1, 4, 9, 16, 25,36,49, 64]
x1 = [1, 16, 30, 42,55, 68, 77,88]
x2 = [1,6,12,18,28, 40, 52, 65]
fig = plt.figure()
plt.plot(x1,y,'ys-') # solid line with yellow colour and square marker
plt.plot(x2,y,'go--') # dash line with green colour and circle marker
plt.legend(labels = ('tv', 'Smartphone'), loc = 'lower right') # legend placed at lower right
plt.title("Advertisement effect on sales")
plt.xlabel('medium')
plt.ylabel('sales')
plt.grid(True)

#### Setting limits
Matplotlib automatically determines the minimum and maximum values of the variables displayed along the x and y axes of a plot. However, it is possible to set the limits explicitly using the xlim() and ylim() functions.

In [None]:
x = np.arange(1,10)
plt.plot(x,np.exp(x),'r')
plt.grid()
plt.title('Exponential function graph')
plt.xlabel('x')
plt.ylabel('exponential of x')

In [None]:
x = np.arange(1,10)
plt.plot(x,np.exp(x),'k')
plt.title('exp(x)')
plt.ylim(0,8000)
plt.xlim(0,9)
#plt.axis([0, 9, 0, 8000])
plt.grid(color='r', ls = '-.', lw = 0.5)
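With an Axes object, the same limits are set with ax.set_xlim and ax.set_ylim, and ax.get_xlim and ax.get_ylim read them back. A minimal sketch of the cell above in that style:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 10)
fig, ax = plt.subplots()
ax.plot(x, np.exp(x), 'k')
ax.set_title('exp(x)')

# Axes equivalents of plt.xlim(0, 9) and plt.ylim(0, 8000)
ax.set_xlim(0, 9)
ax.set_ylim(0, 8000)
ax.grid(color='r', ls='-.', lw=0.5)

# Read the current limits back
print(ax.get_xlim(), ax.get_ylim())
```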

#### Setting Ticks
We can explicitly determine where we want the axis ticks. With an Axes object, set_xticks and set_yticks both take a list of positions where the ticks are placed, and set_xticklabels and set_yticklabels provide a custom text label for each tick location. With the pyplot interface, plt.xticks and plt.yticks do the same job and also accept an optional list of labels as their second argument:

In [None]:
x = np.arange(1,10)
plt.plot(x,np.exp(x),'g')
plt.title('exp(x)')
plt.ylim(0,10000)
plt.xlim(0,10)
x_tix = np.arange(0,11,2)
plt.xticks(x_tix)
y_tix = np.arange(0,11000, 1000)
plt.yticks(y_tix)
plt.grid()
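To replace the tick numbers with our own text, we set the tick positions and then one label per position. A sketch using the Axes methods described above (the 'k' abbreviations are only an illustrative choice; plt.yticks(y_ticks, labels) would do the same with pyplot):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 10)
fig, ax = plt.subplots()
ax.plot(x, np.exp(x), 'g')
ax.set_title('exp(x)')

# Place ticks at chosen positions, then give each one a text label
y_ticks = [0, 2500, 5000, 7500, 10000]
ax.set_yticks(y_ticks)
ax.set_yticklabels(['0', '2.5k', '5k', '7.5k', '10k'])
ax.set_ylim(0, 10000)

# Draw once so the tick label texts are finalised, then read them back
fig.canvas.draw()
print([label.get_text() for label in ax.get_yticklabels()])
```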

There are a number of more advanced methods for controlling major and minor tick placement in matplotlib figures, such as automatic placement according to different policies. See http://matplotlib.org/api/ticker_api.html for details.



#### Twin axes
Sometimes it is useful to have dual x or y axes in a figure; for example, when plotting curves with different units together. Matplotlib supports this with the twinx and twiny functions:

In [None]:
fig, ax1 = plt.subplots()
x = np.linspace(1,10,100)
ax1.plot(x, x**2, lw=2, color="blue")
ax1.set_ylabel(r"area $(m^2)$", fontsize=18, color="blue")
for label in ax1.get_yticklabels():
    label.set_color("blue")
ax1.grid(True) 
ax2 = ax1.twinx()
ax2.plot(x, x**3, lw=2, color="red")
ax2.set_ylabel(r"volume $(m^3)$", fontsize=18, color="red")

for label in ax2.get_yticklabels():
    label.set_color("red")
ax2.grid(True)

#### Text annotation
Annotating text in matplotlib figures can be done using the text function. It supports LaTeX formatting just like axis label texts and titles:

In [None]:
fig, ax = plt.subplots()

ax.plot(x, x**2, x, x)

ax.text(7, 80, r"$y=x^2$", fontsize=20, color="blue")
ax.text(5, 6, r"$y=x$", fontsize=20, color="green");

In [None]:
"""
Here we add an annotation to a plot.

t and s are the input and output values, respectively.

annotate() takes the text that should be added inside the plot; in this case
the annotated text is "local minima".
xy=(1.5, -1) specifies the point the arrow points to,
xytext=(3, -1.5) specifies the location of the text.
arrowprops is a dictionary of arrow properties; its single key-value pair,
facecolor, sets the color of the arrow.

plt.ylim sets the range of the y-axis (the vertical extent of the plot).
"""
t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2*np.pi*t)
plt.plot(t, s, "r")

plt.annotate("local minima", xy = (1.5, -1), xytext=(3, -1.5), arrowprops = dict(facecolor="black"))
#plt.annotate("local maxima", xy = (1, 1), xytext=(3, 1.5), arrowprops = dict(facecolor="black"))
plt.ylim(-2, 2)
plt.grid()
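Rather than hard-coding the arrow position, we can let NumPy find the point to annotate. This sketch uses np.argmin to locate the first minimum of the sampled curve and places the text relative to it (the offsets of +1 and -0.5 are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s, 'r')

# Index of the first (smallest-t) sample with the lowest value
i_min = np.argmin(s)
ann = ax.annotate('local minimum',
                  xy=(t[i_min], s[i_min]),                # point the arrow at
                  xytext=(t[i_min] + 1, s[i_min] - 0.5),  # where the text sits
                  arrowprops=dict(facecolor='black'))
ax.set_ylim(-2, 2)
ax.grid(True)
```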

#### Improving Default Styles using Seaborn
An easy way to make your charts look beautiful is to use some default styles from the Seaborn library. These can be applied globally using the sns.set_style function. You can see a full list of predefined styles here: https://seaborn.pydata.org/generated/seaborn.set_style.html .



In [None]:
import seaborn as sns


In [None]:
#sns.set_style("darkgrid")
sns.set_style("whitegrid")

In [None]:
x = np.arange(1,10)
plt.plot(x,np.exp(x),'g')
plt.title('exp(x)')

#### Multi plot


In [None]:
""" 
We are defining subplots. 
Subplots are used to show two or more plots in a single image. 
"""

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

fig, axes = plt.subplots(nrows=1, ncols=2)

axes[1].plot(t1, np.exp(-t1), "bo")
axes[1].set_xlabel('t1')
axes[1].set_ylabel('exp(-t1)')
axes[1].set_title('exp(-t1)')

axes[0].plot(t2, np.exp(-t2), "r--")
axes[0].set_xlabel('t2')
axes[0].set_ylabel('exp(-t2)')
axes[0].set_title('exp(-t2)')


That was easy, but it isn't so pretty with overlapping figure axes and labels, right?

We can deal with that by using the fig.tight_layout method (or plt.tight_layout() with the pyplot interface), which automatically adjusts the positions of the axes on the figure canvas so that there is no overlapping content:

In [None]:
t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

"""
The three digits in subplot(211) mean: 2 rows, 1 column, and this is plot number 1.
Since we want to stack two graphs, we have to specify the position of each one.

The first plot uses t1 for the x-values and exp(-t1) for the y-values;
"bo" makes the points blue circles.
"""
plt.subplot(211)
plt.plot(t1, np.exp(-t1), "bo")

"""
subplot(212) selects plot number 2 in the same 2-row, 1-column grid.

The x-values of this plot are t2 and the y-values are cos(2*pi*t2);
"r--" gives a red dashed line.
"""
plt.subplot(212)
plt.plot(t2, np.cos(2*np.pi*t2), "r--")
plt.tight_layout()

In [None]:
"""
We are defining subplots. Subplots are used to show two or more plots in a single image.
"""

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

"""
subplot(221) means: a grid of 2 rows and 2 columns, plot number 1 (top left).
Plots are numbered row by row, so 222 is top right, 223 bottom left and 224 bottom right.

The first three plots use t1 for the x-values and exp(-t1) for the y-values;
"bo", "ro" and "go" draw blue, red and green circles.
The last plot uses t2 for the x-values and cos(2*pi*t2) for the y-values;
"r--" gives a red dashed line.
"""
plt.subplot(221)
plt.plot(t1, np.exp(-t1), "bo")

plt.subplot(222)
plt.plot(t1, np.exp(-t1), "ro")

plt.subplot(223)
plt.plot(t1, np.exp(-t1), "go")

plt.subplot(224)
plt.plot(t2, np.cos(2*np.pi*t2), "r--")
plt.tight_layout()
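The same 2 x 2 grid can also be created in one call with plt.subplots, which returns a NumPy array of Axes. A sketch that fills the panels in a loop over axes.flat (sharex=True is an optional extra that gives all panels the same x-axis):

```python
import numpy as np
import matplotlib.pyplot as plt

t1 = np.arange(0.0, 5.0, 0.1)
t2 = np.arange(0.0, 5.0, 0.02)

# One (x, y, fmt, title) tuple per panel, in row-by-row order
panels = [
    (t1, np.exp(-t1), 'bo', 'exp(-t), blue'),
    (t1, np.exp(-t1), 'ro', 'exp(-t), red'),
    (t1, np.exp(-t1), 'go', 'exp(-t), green'),
    (t2, np.cos(2 * np.pi * t2), 'r--', 'cos(2 pi t)'),
]

fig, axes = plt.subplots(2, 2, sharex=True)
# axes.flat walks the 2 x 2 array row by row, like subplot numbers 221..224
for ax, (x, y, fmt, title) in zip(axes.flat, panels):
    ax.plot(x, y, fmt)
    ax.set_title(title)
fig.tight_layout()
```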

#### Logarithmic scale

It is also possible to set a logarithmic scale for one or both axes. This functionality is in fact only one application of a more general transformation system in Matplotlib. The scale of each axis is set separately using the set_xscale and set_yscale methods, which accept one parameter (the value "log" in this case):

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(10,4))
x = np.linspace(0, 5, 10)      

axes[0].plot(x, x**2, x, np.exp(x))
axes[0].set_title("Normal scale")

axes[1].plot(x, x**2, x, np.exp(x))
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic scale (y)");
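Matplotlib also provides shortcut methods that plot and set the scale in one step: semilogy (log y-axis), semilogx (log x-axis) and loglog (both axes). On log-log axes a power law such as y = x**2 appears as a straight line with slope 2. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 10)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# semilogy plots and switches the y-axis to a log scale in one call
axes[0].semilogy(x, np.exp(x))
axes[0].set_title('semilogy')

# loglog needs strictly positive x values, so this grid starts at 1
x_pos = np.linspace(1, 5, 10)
axes[1].loglog(x_pos, x_pos**2)
axes[1].set_title('loglog')
```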


In [None]:
"""
Practice Question

Two plots stacked horizontally. 
"""


In [None]:
fig,ax = plt.subplots(2,2)
#fig, ax = plt.subplots(2,2, constrained_layout=True)  # alternative to tight_layout()

ax[0,0].plot(t1, np.exp(-t1), "bo" )
ax[0,0].set_title('First plot')

ax[0,1].plot(t2, np.cos(2*np.pi*t2), "r--")
ax[0, 1].set_title('Second plot')

ax[1,0].plot(t2, np.cos(2*np.pi*t2), "g--")
ax[1, 0].set_title('Third plot')

ax[1,1].plot(t2, np.cos(2*np.pi*t2), "k--")
ax[1, 1].set_title('Fourth plot')

# Call tight_layout() after the titles exist, so it can make room for them
fig.tight_layout() # Or equivalently,  "plt.tight_layout()"

In [None]:
# Seeding the generator makes the "random" numbers reproducible
np.random.seed(145)
np.random.randn(10)

#### Histogram
A histogram represents the distribution of a variable by dividing the range of values into bins (intervals) and drawing vertical bars that show the number of observations in each bin.

For example, let's visualize the distribution of 100,000 values drawn from a standard normal distribution. We can use the plt.hist function to create a histogram.

In [None]:
np.random.seed(145)
x = np.random.randn(100000)
# the histogram of the data
#plt.hist(x)
plt.hist(x, bins = 20)
plt.xlabel("x")
plt.ylabel("Frequency")
plt.title("Histogram")
x_tix = np.arange(-3,3)
plt.xticks(x_tix)
y_tix = np.arange(0,15000, 1000)
plt.yticks(y_tix)
plt.grid(True)
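plt.hist does more than draw: it returns the bar heights (counts), the bin edges and the drawn patches. This sketch regenerates the same data and compares the counts with np.histogram, which computes them without plotting:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(145)
x = np.random.randn(100000)

# plt.hist draws the histogram and returns its numbers
counts, edges, patches = plt.hist(x, bins=20)
plt.xlabel('x')
plt.ylabel('Frequency')

# 20 bins are delimited by 21 edges, and every sample lands in exactly one bin
print(len(counts), len(edges))  # 20 21
print(counts.sum())             # 100000.0

# NumPy computes the same counts and edges without drawing anything
np_counts, np_edges = np.histogram(x, bins=20)
print(np.array_equal(counts, np_counts))  # True
```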

#### Histograms of a real dataset
Next, we will use the iris dataset, which comes with seaborn, and look at the distribution of sepal width across all the flowers.

In [None]:
# Load data into a Pandas dataframe
import seaborn as sns
df = sns.load_dataset("iris")
df.head()

In [None]:
plt.title("Distribution of Sepal Width")
plt.hist(df.sepal_width);

We can immediately see that the sepal widths lie in the range 2.0 - 4.5, and around 35 values are in the range 2.9 - 3.1, which seems to be the most populous bin.

#### Controlling the size and number of bins
We can control the number of bins or the size of each one using the bins argument.

In [None]:
# Specifying the number of bins
plt.hist(df.sepal_width, bins=5);

In [None]:
# Specifying the boundaries of each bin
plt.hist(df.sepal_width, bins=np.arange(2, 5, 0.25));

#### Multiple Histograms
Similar to line charts, we can draw multiple histograms in a single chart. We can reduce each histogram's opacity so that one histogram's bars don't hide the others'.

Let's draw separate histograms for each species of flowers.

In [None]:
setosa_df = df[df.species == 'setosa']
versicolor_df = df[df.species == 'versicolor']
virginica_df = df[df.species == 'virginica']

In [None]:
plt.hist(setosa_df.sepal_width, alpha=0.4, bins=np.arange(2, 5, 0.25));
plt.hist(versicolor_df.sepal_width, alpha=0.4, bins=np.arange(2, 5, 0.25));

We can also stack multiple histograms on top of one another.

In [None]:
plt.title('Distribution of Sepal Width')

plt.hist([setosa_df.sepal_width, versicolor_df.sepal_width, virginica_df.sepal_width],  
         stacked=True);

plt.legend(['Setosa', 'Versicolor', 'Virginica']);

In [None]:
"""
Practice Question
Create an array of 10000 elements drawn from a uniform distribution 
between 0 and 10. Then plot a histogram with 10 bins and save the figure as hist_uni.png.
"""


#### Bar chart
Bar charts are quite similar to line charts, i.e., they show a sequence of values. We can use the plt.bar function to draw a bar chart.

In [None]:
course = ['C1', 'C2', 'C3', 'C4', 'C5']
students = [23,17,35,29,12]
plt.bar(course,students, color = 'g')
plt.title('Bar Graph')
plt.xlabel('Class')
plt.ylabel('Students')
plt.grid()

In [None]:
course = ['C1', 'C2', 'C3', 'C4', 'C5']
students = [23,17,35,29,12]
plt.bar(course,students, color=['black', 'red', 'green', 'blue', 'cyan'])
plt.title('Bar Graph')
plt.xlabel('Class')
plt.ylabel('Students')
plt.grid()

In [None]:
plt.barh(course,students, color = 'r')
plt.title('Horizontal Bar Graph')
plt.xlabel('Students')
plt.ylabel('Class')
plt.grid()

In [None]:
# If the data being plotted consists of multiple columns
df = pd.DataFrame(np.random.rand(10,4), columns = ['a','b','c','d'])
df

In [None]:
df.plot(kind = 'bar')
#plt.show()

In [None]:
# If you would prefer stacked bars, set the stacked parameter to True

df.plot(kind = 'bar', stacked = True)
plt.show()

In [None]:
x = [5,8,10]
y = [12,16,6]

x2 = [6,9,11]
y2 = [6,15,7]


# plt.bar(x, y, align='center')

# plt.bar(x2, y2, color='g', align='center')

plt.bar(x, y)

plt.bar(x2, y2, color='g')

plt.title('Bar chart')
plt.ylabel('Y axis')
plt.xlabel('X axis');

In [None]:
# Stacked bars: the optional bottom parameter of pyplot.bar() specifies a starting value
# for each bar. Instead of running from zero to a value, a bar runs from bottom to bottom + value.
N = 5
menMeans = (20, 35, 30, 35, 27)
womenMeans = (25, 32, 34, 20, 25)
menStd = (2, 3, 4, 1, 2)
womenStd = (3, 5, 2, 3, 3)
ind = np.arange(N)    # the x locations for the groups
width = 0.35       # the width of the bars: can also be len(x) sequence

p1 = plt.bar(ind, menMeans, width, yerr=menStd)  # yerr draws the standard deviations as error bars
p2 = plt.bar(ind, womenMeans, width, bottom=menMeans, yerr=womenStd)
#p1 = plt.bar(ind, menMeans)
#p2 = plt.bar(ind, womenMeans, bottom=menMeans)
plt.ylabel('Scores')
plt.title('Scores by group and gender')
plt.xticks(ind, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Men', 'Women'));

#### Bar Plots with Averages
Let's look at another sample dataset included with Seaborn, called tips. The dataset contains information about the sex, time of day, total bill, and tip amount for customers visiting a restaurant over a week.

In [None]:
df_tips = sns.load_dataset("tips");

In [None]:
df_tips.head()

We might want to draw a bar chart to visualize how the average bill amount varies across different days of the week. One way to do this would be to compute the day-wise averages and then use plt.bar (try it as an exercise).

However, since this is a very common use case, the Seaborn library provides a barplot function which can automatically compute averages.

In [None]:
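av = df_tips.groupby('day', observed=True)['total_bill'].mean()

In [None]:
# Label the bars with av's own index: df_tips.day.unique() lists the days in order of
# appearance, which differs from the order of the averages in av
plt.bar(av.index.astype(str), av.values);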

In [None]:
sns.barplot(x='day', y='total_bill', data=df_tips);

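The dark lines on each bar are error bars. By default, seaborn's barplot draws a 95% confidence interval around each mean, so a longer line indicates more uncertainty in the average. For instance, the uncertainty in the average total bill seems relatively high on Fridays and low on Saturdays.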

We can also specify a hue argument to compare bar plots side-by-side based on a third feature, e.g., sex.

In [None]:
sns.barplot(x='day', y='total_bill', hue='sex', data=df_tips);

You can make the bars horizontal simply by switching the axes.

#### Scatter Plot

In a scatter plot, the values of two variables are plotted as points on a 2-dimensional grid. You can also use a third variable to determine the size or color of the points. Let's try out an example.

In [None]:
# Simple scatter plot

study_time = np.array([145, 170, 165, 184, 175, 159, 180, 172])
score = np.array([80, 92, 89, 90, 95, 75, 96, 85])
plt.scatter(study_time, score) 
plt.title("Study Time Vs Score")
plt.xlabel("Study Time")
plt.ylabel("Score")
plt.grid()
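As mentioned above, a third variable can control the color or size of the points. In this sketch, practice_tests is a hypothetical third variable invented for illustration; c maps it to a color through the colormap, and s scales the marker area:

```python
import numpy as np
import matplotlib.pyplot as plt

study_time = np.array([145, 170, 165, 184, 175, 159, 180, 172])
score = np.array([80, 92, 89, 90, 95, 75, 96, 85])
# Hypothetical third variable, e.g. number of practice tests taken
practice_tests = np.array([1, 4, 3, 5, 4, 2, 6, 3])

fig, ax = plt.subplots()
# c: color taken from the colormap; s: marker area in points squared
sc = ax.scatter(study_time, score, c=practice_tests, s=40 * practice_tests,
                cmap='viridis', edgecolor='k')
fig.colorbar(sc, ax=ax, label='Practice tests')
ax.set_title('Study Time Vs Score')
ax.set_xlabel('Study Time')
ax.set_ylabel('Score')
ax.grid(True)
```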

In [None]:
x = np.random.randint(low=1, high=11, size=50)
y = x + np.random.randint(1, 10, size=x.size)
# scatter plot details: circle markers, red fill, green edges
plt.scatter(x=x, y=y, marker='o', c='r', edgecolor='green')
plt.title('Scatter: $x$ versus $y$')
plt.xlabel('$x$')
plt.ylabel('$y$')



Let's try to visualize the relationship between sepal length and sepal width. Our first instinct might be to create a line chart using plt.plot.

In [None]:
# Load data into a Pandas dataframe
df = sns.load_dataset("iris")

In [None]:
plt.plot(df.sepal_length, df.sepal_width);

The output is not very informative: plt.plot connects the points in the order in which they appear in the data frame, so we get a tangle of lines rather than a clear picture of how the two properties are related.

We can use a scatter plot to visualize how sepal length & sepal width vary using the scatterplot function from the seaborn module (imported as sns).

In [None]:
sns.scatterplot(x=df.sepal_length, y=df.sepal_width);


#### Adding Hues
Notice how the points in the above plot seem to form distinct clusters with some outliers. We can color the dots using the flower species as a hue. We can also make the points larger using the s argument.

In [None]:
sns.scatterplot(x=df.sepal_length, y=df.sepal_width, hue=df.species);
#sns.scatterplot(x=df.sepal_length, y=df.sepal_width, hue=df.species, s=100);


Adding hues makes the plot more informative. We can immediately tell that Setosa flowers have a smaller sepal length but higher sepal widths. In contrast, the opposite is true for Virginica flowers.

#### Plotting using Pandas Data Frames
Seaborn has built-in support for Pandas data frames. Instead of passing each column as a series, you can provide column names and use the data argument to specify a data frame.

In [None]:
plt.title('Petal Dimensions')
sns.scatterplot(x='petal_length', 
                y='petal_width', 
                hue='species',
                data=df);

#### Pair Plot

In [None]:
df.head()

In [None]:
from pandas.plotting import scatter_matrix
#print(df)
scatter_matrix(df, figsize = (6,6),diagonal = 'kde');
#plt.show()

#### Pair plots with Seaborn
Seaborn also provides a helper function sns.pairplot to automatically plot several different charts for pairs of features within a dataframe.

In [None]:
df.head()

In [None]:
sns.pairplot(df, hue='species');

#### Pie Chart

In [None]:
"""
Example of a pie chart. The categories are given in companies, and the angle of
each category's wedge is proportional to its value in market_share.
Explode pulls wedges "B" and "D" out of the pie.
"""
companies = ["A", "B", "C", "D", "E"]
market_share = [15, 20, 25, 10, 30]
Explode = [0, 0.3, 0, 0.1, 0]
plt.pie(market_share, explode = Explode, labels=companies, shadow = True, startangle=45)
#plt.pie(market_share,labels=companies, shadow = True)
plt.axis('equal'); # equal aspect ratio, so the pie is drawn as a circle

In [None]:
# Explode pulls a wedge out of the pie. In this example we pull out category "A"

companies = ["A", "B", "C", "D", "E"]
market_share = [15, 20, 25, 10, 30]
Explode = [0.3, 0, 0, 0, 0]
plt.pie(market_share, explode = Explode, labels=companies)
plt.axis('equal') # equal aspect ratio, so the pie is drawn as a circle
plt.show()
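Pie charts are often easier to read with the percentage printed on each wedge. The autopct argument takes a format string that is applied to each wedge's percentage of the total; a minimal sketch:

```python
import matplotlib.pyplot as plt

companies = ['A', 'B', 'C', 'D', 'E']
market_share = [15, 20, 25, 10, 30]

fig, ax = plt.subplots()
# '%1.1f%%' prints one decimal place followed by a literal '%'
wedges, texts, autotexts = ax.pie(market_share, labels=companies,
                                  autopct='%1.1f%%', startangle=90)
ax.axis('equal')  # equal aspect ratio, so the pie is drawn as a circle

print([t.get_text() for t in autotexts])
```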

#### Box plot

In [None]:
np.random.seed(10)
x1 = np.random.normal(90, 20, 200)
plt.boxplot(x1);
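The box spans the first to the third quartile, the line inside it marks the median, and by default the whiskers reach the farthest data points within 1.5 times the interquartile range from the box; points beyond that are drawn individually as outliers (fliers). matplotlib.cbook.boxplot_stats computes these numbers without drawing anything, which is handy for checking a box plot:

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

np.random.seed(10)
x1 = np.random.normal(90, 20, 200)

# One dictionary of statistics per data set; we passed a single data set
stats = boxplot_stats(x1)[0]

print('Q1 = %.2f, median = %.2f, Q3 = %.2f' % (stats['q1'], stats['med'], stats['q3']))
print('whiskers: %.2f to %.2f' % (stats['whislo'], stats['whishi']))
print(len(stats['fliers']), 'outliers')
```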

In [None]:
# If the data being plotted consists of multiple columns
df = pd.DataFrame(np.random.randn(10,4))
df.boxplot();
#plt.show()

#### 3D plot

In [None]:
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
z = np.linspace(0, 1, 100)
x = z * np.sin(20 * z)
y = z * np.cos(20 * z)
ax.plot3D(x, y, z, 'gray')
ax.set_title('3D line plot')
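Besides 3D lines, the same kind of 3D axes can draw surfaces. This sketch evaluates an arbitrary example function, z = sin(sqrt(x^2 + y^2)), on a grid built with np.meshgrid and draws it with plot_surface:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib versions)

# Evaluate the example function on a regular 50 x 50 grid
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='viridis')
fig.colorbar(surf, ax=ax, shrink=0.6)
ax.set_title('3D surface plot')
```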

In [None]:
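#
# Interactive backend. Switching backends in a running kernel does not always work;
# if the plots do not become interactive, restart the kernel first
# (e.g. Kernel > Restart) and run this cell before creating any plot.
#
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
# %matplotlib notebook works in the classic Jupyter Notebook;
# in JupyterLab, use %matplotlib widget instead (requires the ipympl package)
%matplotlib notebook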

In [None]:
# Now create a figure; with the interactive backend it can be zoomed and panned
fig, ax = plt.subplots()
t = np.linspace(0, 10, 100)
ax.plot(t, np.cos(t) * np.sin(t))
plt.show()

#### Reference

- https://matplotlib.org/3.1.1/tutorials/introductory/pyplot.html - The pyplot tutorial.
- http://www.matplotlib.org - The project web page for matplotlib.
- https://github.com/matplotlib/matplotlib - The source code for matplotlib.
- http://matplotlib.org/gallery.html - A large gallery showcasing various types of plots matplotlib can create. Highly recommended!
- http://scipy-lectures.github.io/matplotlib/matplotlib.html - Another good matplotlib reference.
- https://seaborn.pydata.org/examples/index.html - The seaborn example gallery.