# Plotting and Visualisation

In [1]:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

Everything in matplotlib is organized in a hierarchy. 
 * At the top of the hierarchy is the matplotlib "state-machine environment", which is provided by the matplotlib.pyplot module.
 * The next level down in the hierarchy is the first level of the object-oriented interface, in which pyplot is used **only for a few functions such as figure creation**, and the user explicitly creates and keeps track of the figure and axes objects. 
  * At this level, the user uses pyplot to create figures, and through those figures, one or more axes objects can be created. These axes objects are then used for most plotting actions.
 * For even more control -- which is essential for things like embedding matplotlib plots in GUI applications -- the pyplot level may be dropped completely, leaving a purely object-oriented approach.
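
The first two levels of the hierarchy can be contrasted with a minimal sketch. The data and titles below are made up for illustration; the calls themselves (`plt.figure`, `plt.plot`, `plt.gca`, `plt.subplots`, `Axes.plot`, `Axes.set_title`) are standard matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 50)  # illustrative data

# 1. State-machine style: pyplot keeps track of the "current" figure and axes.
plt.figure()
plt.plot(x, x**2)
plt.title('pyplot style')
implicit_ax = plt.gca()  # the Axes that pyplot created behind the scenes

# 2. Object-oriented style: pyplot only creates the figure and axes;
#    all plotting goes through the objects we hold on to.
fig, ax = plt.subplots()
ax.plot(x, x**2)
ax.set_title('object-oriented style')
```

Both variants draw the same curve; the difference is only in who keeps track of the figure and axes objects.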

![anatomy.png](attachment:anatomy.png)

### Axes
This is what you think of as 'a plot': the region of the image with the data space. A given __Figure__ can contain many __Axes__, but a given __Axes__ object can only be in one __Figure__. The __Axes__ contains two (or three in the case of 3D) __Axis__ objects, which take care of the data limits (the data limits can also be controlled via the *set_xlim()* and *set_ylim()* __Axes__ methods). Each __Axes__ has a title (set via *set_title()*), an x-label (set via *set_xlabel()*), and a y-label (set via *set_ylabel()*). 
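
As a rough illustration of the methods named above, the sketch below creates one figure with two Axes and configures the left one. The data, labels, and limits are arbitrary placeholders.

```python
import matplotlib.pyplot as plt

# One Figure holding two Axes side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2)

ax_left.plot([0, 1, 2], [0, 1, 4])
ax_left.set_title('left')   # title of this Axes only
ax_left.set_xlabel('x')
ax_left.set_ylabel('y')
ax_left.set_xlim(0, 2)      # data limits along x
ax_left.set_ylim(0, 5)      # data limits along y

# Each Axes owns its own pair of Axis objects.
print(ax_left.xaxis, ax_left.yaxis)
```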

### Axis
These are the number-line-like objects. They take care of setting the graph limits and generating the ticks (the marks on the axis) and ticklabels (strings labeling the ticks). The location of the ticks is determined by a *Locator* object and the ticklabel strings are formatted by a *Formatter*. The combination of the correct *Locator* and *Formatter* gives very fine control over the tick locations and labels.
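
A small sketch of pairing a *Locator* with a *Formatter*, using `MultipleLocator` and `FormatStrFormatter` from `matplotlib.ticker`. The tick spacing of 2 and the `' s'` unit suffix are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator, FormatStrFormatter

fig, ax = plt.subplots()
ax.plot(np.linspace(0, 10, 50), np.linspace(0, 1, 50))

# Locator: place a major tick at every multiple of 2 on the x axis.
ax.xaxis.set_major_locator(MultipleLocator(2))
# Formatter: render each tick value with one decimal and a unit suffix.
ax.xaxis.set_major_formatter(FormatStrFormatter('%.1f s'))
```

The y axis is left untouched here, so it keeps the default locator and formatter.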

### Artist
Everything you can see on the figure is an artist (even the *Figure*, *Axes*, and *Axis* objects). Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another.

Hierarchy:
* Figure
  * Axes: set_title(), set_xlabel(), set_ylabel()
   * Axis: control the data limits
     * Locator object: controls the ticks location
     * Formatter: format the ticklabel strings
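
The hierarchy can be checked interactively. The sketch below (with throwaway data) confirms that the figure, the axes, an axis, and a plotted line are all `Artist` instances, and that the line is tied to the Axes it was drawn on.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1], [0, 1])  # ax.plot returns a list of Line2D artists

# Figure, Axes, Axis, and the line itself are all Artists.
for obj in (fig, ax, ax.xaxis, line):
    print(type(obj).__name__, isinstance(obj, Artist))

# The line knows which Axes it belongs to.
print(line.axes is ax)
```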

## Types of inputs to plotting functions

All plotting functions expect *np.array* or *np.ma.masked_array* as input. Classes that are 'array-like', such as pandas data objects and *np.matrix*, may or may not work as intended. It is best to convert these to *np.array* objects prior to plotting.

In [2]:
# to convert a pandas.DataFrame
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.rand(4, 5), columns = list('abcde'))
a_asndarray = a.values
type(a_asndarray)

numpy.ndarray

In [3]:
# to convert a np.matrix
b = np.matrix([[1, 2], [3, 4]])
b_asarray = np.asarray(b)
type(b_asarray)

numpy.ndarray
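
Masked arrays, the other accepted input type mentioned above, are handy for hiding invalid samples. This is a minimal sketch that assumes a made-up sentinel value of `-999.0` marking missing data; masked points are left out of the drawn line, which leaves a gap.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical measurements where -999.0 marks a missing sample.
y = np.array([1.0, 2.0, -999.0, 4.0, 5.0])
y_masked = np.ma.masked_values(y, -999.0)

fig, ax = plt.subplots()
(line,) = ax.plot(np.arange(5), y_masked)  # the masked point is not drawn
```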

### Figure
The figure keeps track of all the child **Axes** and a smattering of 'special' artists (titles, figure legends, etc.).

The easiest way to create a new figure is with pyplot:

In [9]:
fig = plt.figure()    # an empty figure with no axes
fig.suptitle('No axes on this figure')    # add a title so we know which it is
fig, ax_list = plt.subplots(2, 2)    # a figure with a 2x2 grid of Axes. (this fig is a new one, different from fig from the first command)
# plt.subplots(2, 3, sharex=True, sharey=True) makes the subplots share their x and y axes.
_ = ax_list[0, 0].hist(np.random.randn(100), bins=20, color='k', alpha=0.3)    # ax_list is a 2D array: index it as [row, column]
ax_list[0, 1].scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
# subplots_adjust is described below. It must run in the same cell as the figure creation to take effect.
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.5)    # None (leave unchanged) is not the same as 0

## Adjusting the spacing around subplots
You can change the spacing using the *subplots_adjust* method on *Figure* objects, also available as a top-level function:

In [5]:
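fig, axes = plt.subplots(1, 2)
# left, bottom, right, and top are fractions of the figure size (left < right, bottom < top);
# wspace and hspace are fractions of the average Axes width and height.
fig.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0, hspace=0)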

### Matplotlib, pyplot, and pylab
 * matplotlib is the whole package
 * pyplot is a module in matplotlib
 * pylab is a module that gets installed alongside matplotlib (deprecated; its use is strongly discouraged)
 
pyplot provides the 'state-machine interface'.

In [6]:
# all of these plots are added to the current (most recently active) figure.
x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')
         
plt.xlabel('x label')
plt.ylabel('y label')
         
plt.title('Simple Plot')

plt.legend()
         
plt.show()

The first call to *plt.plot* will automatically create the necessary figure and axes to achieve the desired plot. Subsequent calls to *plt.plot* re-use the current axes, and each adds another line. 

pylab is deprecated and its use is strongly discouraged.

For non-interactive plotting it is suggested to use pyplot to create the figures and then the OO interface for plotting.
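
Following that suggestion, the simple plot from the pyplot cell can be rewritten so that pyplot only creates the figure and axes. This is one possible translation of that example, not the only way to structure it.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)

# pyplot is used once, to create the Figure and its single Axes.
fig, ax = plt.subplots()

# Everything else goes through the Axes object.
ax.plot(x, x, label='linear')
ax.plot(x, x**2, label='quadratic')
ax.plot(x, x**3, label='cubic')

ax.set_xlabel('x label')
ax.set_ylabel('y label')
ax.set_title('Simple Plot')
ax.legend()
```

Note the naming shift: `plt.xlabel` becomes `ax.set_xlabel`, `plt.title` becomes `ax.set_title`, and so on.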

### Coding style

Typically one finds oneself making the same plots with different data sets, which leads to needing to write specialized functions to do the plotting. The recommended function signature is something like:

In [7]:
def my_plotter(ax, data1, data2, param_dict):
    """
    A helper function to make a graph
    
    Parameters
    ----------
    ax : Axes
        The axes to draw to
        
    data1 : array
        The x data
        
    data2 : array
        The y data
        
    param_dict : dict
        Dictionary of kwargs to pass to ax.plot
        
    Returns
    -------
    out : list
        list of artists added
    """
    out = ax.plot(data1, data2, **param_dict)
    return out

# which you would then use as:

data1, data2, data3, data4 = np.random.randn(4, 100)
fig, ax = plt.subplots(1, 2)
my_plotter(ax[0], data1, data2, {'marker': 'x'})
my_plotter(ax[1], data3, data4, {'marker': 'o'})

[<matplotlib.lines.Line2D at 0x11789e5f8>]

### Colors, Markers, and Line Styles

## Backends (optional reading)
### What is a backend?
matplotlib targets many different use cases and output formats. To support all of these use cases, matplotlib can target different outputs, and each of these capabilities is called a backend; the "frontend" is the user-facing code, i.e., the plotting code, whereas the "backend" does all the hard work behind the scenes to make the figure. There are two types of backends:
 * interactive backends: (for use in pygtk, wxpython, tkinter, qt4, or macosx)
  * pyplot functions automatically draw to the screen
  * when using object method calls in addition to pyplot functions, call *draw()* whenever you want to refresh the plot.
 * non-interactive backends: hardcopy backends to make image files
  * use non-interactive ......
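
As a rough sketch of the non-interactive case, the standalone script below selects the `Agg` backend with `matplotlib.use` and writes a PNG file instead of opening a window. It is meant to run as its own script rather than in a notebook where `%matplotlib notebook` is active; the output file name is a placeholder.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend; typically selected before any figure exists
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Placeholder output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), 'backend_demo.png')
fig.savefig(out_path)

print(matplotlib.get_backend())  # name of the backend currently in use
```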

There are four ways to configure the backend. If they conflict with each other, the method mentioned last in the following list will be used: (to be continued here when needed)

## pandas pivot tables and cross-tabulations:
 * Pivot table
 * Cross-tabulation
 
Below, a dataset from the book is loaded as a practice example:

In [8]:
tips = pd.read_csv('pydata-book/examples/tips.csv')
tips.head()

Unnamed: 0,total_bill,tip,smoker,day,time,size
0,16.99,1.01,No,Sun,Dinner,2
1,10.34,1.66,No,Sun,Dinner,3
2,21.01,3.5,No,Sun,Dinner,3
3,23.68,3.31,No,Sun,Dinner,2
4,24.59,3.61,No,Sun,Dinner,4
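
To make the two terms concrete, here is a minimal sketch on a tiny made-up frame that reuses three of the column names shown above (`tip`, `smoker`, `day`). The values are invented for illustration and are not taken from `tips.csv`.

```python
import pandas as pd

# Hypothetical sample, not the real tips data.
sample = pd.DataFrame({
    'tip':    [1.0, 3.0, 2.0, 4.0, 5.0, 3.0],
    'smoker': ['No', 'No', 'Yes', 'No', 'Yes', 'Yes'],
    'day':    ['Sun', 'Sun', 'Sun', 'Sat', 'Sat', 'Sat'],
})

# Pivot table: aggregate a value column (mean tip) by row and column keys.
mean_tip = sample.pivot_table(values='tip', index='day', columns='smoker', aggfunc='mean')

# Cross-tabulation: count how often each (day, smoker) combination occurs.
counts = pd.crosstab(sample['day'], sample['smoker'])

print(mean_tip)
print(counts)
```

A cross-tabulation is effectively a pivot table whose aggregation is a frequency count.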
