# Plotting and Visualisation

In [1]:
%matplotlib notebook
import matplotlib.pyplot as plt

Everything in matplotlib is organized in a hierarchy. 
 * At the top of the hierarchy is the matplotlib "state-machine environment", which is provided by the matplotlib.pyplot module.
 * The next level down in the hierarchy is the first level of the object-oriented interface, in which pyplot is used **only for a few functions such as figure creation**, and the user explicitly creates and keeps track of the figure and axes objects. 
  * At this level, the user uses pyplot to create figures, and through those figures, one or more axes objects can be created. These axes objects are then used for most plotting actions.
 * For even more control -- which is essential for things like embedding matplotlib plots in GUI applications -- the pyplot level may be dropped completely, leaving a purely object-oriented approach.
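The three levels described above can be sketched side by side. This is a minimal illustration, not a recommended layout: the data, labels, and variable names (`pure_fig`, `pure_ax`) are made up, and the Agg backend is an assumption chosen so the code also runs as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

x = np.linspace(0, 2, 50)

# 1. State-machine style: pyplot keeps track of the "current" figure and axes.
plt.figure()
plt.plot(x, x**2, label="quadratic")
plt.title("pyplot state machine")
plt.legend()

# 2. Object-oriented style: pyplot only creates the figure and axes;
#    all plotting goes through the Axes object we hold on to.
fig, ax = plt.subplots()
ax.plot(x, x**2, label="quadratic")
ax.set_title("object-oriented interface")
ax.legend()

# 3. No pyplot at all: build the Figure and its canvas by hand,
#    as one would when embedding a plot in a GUI application.
pure_fig = Figure()
FigureCanvasAgg(pure_fig)  # attaches an Agg canvas to the figure
pure_ax = pure_fig.add_subplot(1, 1, 1)
pure_ax.plot(x, x**2)
```

In the second and third styles the figure and axes are ordinary variables, so nothing depends on which figure pyplot currently considers "active".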

![anatomy.png](attachment:anatomy.png)

### Figure
The figure keeps track of all the child **Axes** and a smattering of 'special' artists (titles, figure legends, etc.).

The easiest way to create a new figure is with pyplot:

In [2]:
fig = plt.figure()    # an empty figure with no axes
fig.suptitle('No axes on this figure')    # add a title so we know which it is
fig, ax_list = plt.subplots(2, 2)    # a figure with a 2x2 grid of Axes. (this fig is a new one, different from fig from the first command)


### Axes
This is what you think of as 'a plot': it is the region of the image with the data space. A given __Figure__ can contain many __Axes__, but a given __Axes__ object can only be in one __Figure__. The __Axes__ contains two (or three in the case of 3D) __Axis__ objects which take care of the data limits (the data limits can also be controlled via the *set_xlim()* and *set_ylim()* __Axes__ methods). Each __Axes__ has a title (set via *set_title()*), an x-label (set via *set_xlabel()*), and a y-label (set via *set_ylabel()*).
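As a small sketch of the Axes methods named above (the data and label text are invented for illustration, and the Agg backend is an assumption so the code runs outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

# Title and axis labels are set on the Axes, not on the Figure
ax.set_title("Squares")
ax.set_xlabel("n")
ax.set_ylabel("n squared")

# Override the automatically chosen data limits
ax.set_xlim(0, 3)
ax.set_ylim(0, 10)

# The Axes knows which single Figure it belongs to
print(ax.figure is fig)
```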

### Axis
These are the number-line-like objects. They take care of setting the graph limits and generating the ticks (the marks on the axis) and ticklabels (strings labeling the ticks). The location of the ticks is determined by a *Locator* object and the ticklabel strings are formatted by a *Formatter*. The combination of the correct *Locator* and *Formatter* gives very fine control over the tick locations and labels.
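One possible Locator and Formatter pairing is sketched below, using `MultipleLocator` and `FormatStrFormatter` from `matplotlib.ticker`. The 0.5 tick spacing and the `"%.1f"` format are arbitrary choices for illustration, and the Agg backend is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 10, 20])
ax.set_xlim(0, 2)

# Locator: place a major tick at every multiple of 0.5 on the x-axis
ax.xaxis.set_major_locator(MultipleLocator(0.5))

# Formatter: render each tick value with exactly one decimal place
ax.xaxis.set_major_formatter(FormatStrFormatter("%.1f"))

# A formatter can be called directly to see the string it produces for a value
print(ax.xaxis.get_major_formatter()(0.5))
```

The same pair of calls exists on `ax.yaxis`, and `set_minor_locator` / `set_minor_formatter` do the equivalent for minor ticks.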

### Artist
Everything you can see on the figure is an artist (even the *Figure*, *Axes*, and *Axis* objects). Most Artists are tied to an Axes; such an Artist cannot be shared by multiple Axes, or moved from one to another.
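A quick way to check this claim is to test a few objects against the `matplotlib.artist.Artist` base class. The plotted line is an arbitrary example, and the Agg backend is an assumption so the code runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1], [0, 1])  # ax.plot returns a list of Line2D artists

# Figure, Axes, Axis and the plotted line all derive from Artist
for obj in (fig, ax, ax.xaxis, line):
    print(type(obj).__name__, isinstance(obj, Artist))

# The line is tied to the one Axes it was drawn on
print(line.axes is ax)
```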

Hierarchy:
* Figure
  * Axes: set_title(), set_xlabel(), set_ylabel()
    * Axis: controls the data limits
      * Locator: controls the tick locations
      * Formatter: formats the ticklabel strings

## Types of inputs to plotting functions

All plotting functions expect *np.array* or *np.ma.masked_array* as input. Classes that are 'array-like', such as pandas data objects and *np.matrix*, may or may not work as intended. It is best to convert these to *np.array* objects prior to plotting.

In [4]:
# to convert a pandas.DataFrame
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.rand(4, 5), columns = list('abcde'))
a_asndarray = a.values
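The same advice applies to *np.matrix*. A minimal sketch of that conversion, with an arbitrary 2x2 matrix as the example value:

```python
import numpy as np

# to convert a np.matrix
b = np.matrix([[1, 2], [3, 4]])
b_asarray = np.asarray(b)  # plain ndarray with the same values and shape

print(type(b_asarray).__name__, b_asarray.shape)
```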

## pandas pivot tables and cross-tabulations
 * Pivot table
 * Cross-tabulation

Below, a dataset from the book is loaded to use as a practice example:

In [9]:
tips = pd.read_csv('pydata-book/examples/tips.csv')
tips.head()

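| Unnamed: 0 | total_bill | tip | smoker | day | time | size |
|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | No | Sun | Dinner | 4 |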
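As a rough sketch of what a pivot table and a cross-tabulation look like on this data, the example below retypes the five rows shown above into a small frame so that it does not depend on the CSV file. The names `sample`, `pivot`, and `counts`, and the choice to group by `size`, are illustrative assumptions rather than anything taken from the book.

```python
import pandas as pd

# The five rows displayed above, entered by hand (illustrative sample only)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

# Pivot table: mean tip and mean total_bill for each party size
pivot = sample.pivot_table(values=["tip", "total_bill"], index="size", aggfunc="mean")
print(pivot)

# Cross-tabulation: number of rows for each (size, smoker) combination
counts = pd.crosstab(sample["size"], sample["smoker"])
print(counts)
```

With only five rows, all from Sunday dinner and all non-smokers, the result is not very informative; on the full `tips` frame the same calls can group by `day`, `time`, or `smoker` instead.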

