In [86]:
import numpy as np
import pandas as pd
import sys
print('Python version: ',sys.version)
print('NumPy version: \t',np.__version__)
print('Pandas version:\t',pd.__version__)

Python version: 	 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)]
NumPy version: 		 1.12.1
Pandas version:		 0.20.1


**This notebook is based on Python 3.5.2, NumPy 1.12.0 and Pandas 0.20.1. If your versions do not match exactly, some problems may occur while running it.**

# <span id = "SETTING UP">SETTING UP</span>

## <span id = "What is NumPy">1.1 What is NumPy</span>

At the core of the NumPy package is the *ndarray* object.

### Differences between NumPy arrays and Python sequences

- NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an *ndarray* will create a new array and delete the original.
- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. **The exception:** one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
- NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python's built-in sequences.
- A growing number of scientific and mathematical Python-based packages use NumPy arrays, so NumPy is becoming more and more important.

The points about **sequence size and speed** are particularly important in scientific computing. As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, `a` and `b`, we could iterate over each element:

```
c = []
for i in range(len(a)):
    c.append(a[i]*b[i])
```

This produces the correct answer, but if `a` and `b` each contain millions of numbers, we will pay the price for the inefficiencies of looping in Python. We could accomplish the same task much more quickly in C by writing (for clarity we neglect variable declarations and initializations, memory allocation, etc.)

```
for (i = 0; i < rows; i++) {
    c[i] = a[i]*b[i];
}
```

This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense of the benefits gained from coding in Python. Furthermore, the coding work required increases with the dimensionality of our data. In the case of a 2-D array, for example, the C code (abridged as before) expands to

```
for (i = 0; i < rows; i++) {
    for (j = 0; j < columns; j++) {
        c[i][j] = a[i][j]*b[i][j];
    }
}
```

NumPy gives us the best of both worlds: element-by-element operations are the "default mode" when an *ndarray* is involved, but the element-by-element operation is speedily executed by pre-compiled C code. In NumPy

```
c = a * b
```

does what the earlier examples do, at **near-C speeds**, but with the code **simplicity** we expect from something based on Python. Indeed, the NumPy idiom is even simpler!
This last example illustrates two of NumPy's features which are the basis of much of its power: **vectorization** and **broadcasting**.

**Vectorization** describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just "behind the scenes" in optimized, pre-compiled C code. Vectorized code has many advantages, among which are
- vectorized code is more **concise** and **easier** to read
- fewer lines of code generally means **fewer bugs**
- the code more closely resembles **standard mathematical notation** (making it easier, typically, to correctly code mathematical constructs)
- vectorization results in more "Pythonic" code. Without vectorization, our code would be littered with inefficient and difficult to read `for` loops
***
**Broadcasting** is the term used to describe the implicit element-by-element behavior of operations; generally speaking, in NumPy all operations, not just arithmetic operations, but logical, bit-wise, functional, etc., behave in this implicit element-by-element fashion, i.e., they broadcast. Moreover, in the example above, `a` and `b` could be multidimensional arrays of the same shape, or a scalar and an array, or even two arrays with different shapes, provided that the smaller array is "expandable" to the shape of the larger in such a way that the resulting broadcast is unambiguous. For detailed "rules" of broadcasting see [numpy.doc.broadcasting](#Broadcasting).
***
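
As a minimal sketch of the two ideas above (the sample values are made up for illustration), the following compares an explicit Python loop with the vectorized expression, and then broadcasts a scalar and a 1-D array against a 2-D array:

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Explicit element-by-element loop over Python lists.
c_loop = [x * y for x, y in zip(a, b)]

# Vectorized: the loop runs inside NumPy's compiled code.
c_vec = np.array(a) * np.array(b)

# Broadcasting: a scalar and a (3,) array are stretched to match a (2, 3) array.
m = np.arange(6).reshape(2, 3)
scaled = m * 10               # scalar with array
shifted = m + np.array(a)     # shapes (2, 3) and (3,) give (2, 3)
```

Both `c_loop` and `c_vec` hold the same products; only the second avoids the interpreted loop.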

NumPy fully supports an **object-oriented approach**, starting, once again, with *ndarray*. For example, *ndarray* is a class, possessing numerous methods and attributes. Many of its methods mirror functions in the outer-most NumPy namespace, giving the programmer complete freedom to code in whichever paradigm she prefers and/or which seems most appropriate to the task at hand.

## <span id = "Installing NumPy">1.2 Installing NumPy</span>

In most use cases the best way to install NumPy on your system is by using a pre-built package for your operating system. Please see http://scipy.org/install.html for links to available options.

For instructions on building from the source package, see [*Building from source*](#BUILDING FROM SOURCE). This information is useful mainly for advanced users.

# <span id="QUICKSTART TUTORIAL">QUICKSTART TUTORIAL</span>

## <span id="Prerequisites">2.1 Prerequisites</span>

Before reading this tutorial you should know a bit of Python. If you would like to refresh your memory, take a look at the [Python tutorial](http://docs.python.org/3/tutorial/ "Link to tutorial").

If you wish to work the examples in this tutorial, you must also have some software installed on your computer. Please see http://scipy.org/install.html for instructions.

## <span id="The Basics">2.2 The Basics</span>

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called *axes*. The number of axes is the *rank*.
For example, the coordinates of a point in 3D space `[1,2,1]` form an array of rank 1, because it has one axis. That axis has a length of 3. In the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
```
[[1.,0.,0.],
 [0.,1.,2.]]
```
NumPy's array class is called `ndarray`. It is also known by the alias `array`. Note that `numpy.array` is **not the same as the Standard Python Library class** `array.array`, which **only handles one-dimensional arrays** and offers less functionality. The more important attributes of an `ndarray` object are:

- **ndarray.ndim**
    the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as *rank*.
- **ndarray.shape**
    the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with *n* rows and *m* columns, `shape` will be `(n,m)`. The length of the `shape` tuple is therefore the rank, or number of dimensions, `ndim`.
- **ndarray.size**
    the total number of elements of the array. This is equal to the product of the elements of `shape`.
- **ndarray.dtype**
    an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally NumPy provides types of its own. `numpy.int32`, `numpy.int16`, and `numpy.float64` are some examples.
- **ndarray.itemsize**
    the size in bytes of each element of the array. For example, an array of elements of type `float64` has `itemsize` 8 (=64/8), while one of type `float32` has `itemsize` 4 (=32/8). It is equivalent to `ndarray.dtype.itemsize`.
- **ndarray.data**
    the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we
    will access the elements in an array using indexing facilities.

### <span id = "2.2.1 An example">2.2.1 An example</span>

In [2]:
a = np.arange(15).reshape(3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [3]:
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.itemsize)
print(a.size)
print(type(a))

(3, 5)
2
int32
4
15
<class 'numpy.ndarray'>


In [4]:
b = np.array([6,7,8])
print(b)
b

[6 7 8]


array([6, 7, 8])

In [5]:
print(type(b))
type(b)

<class 'numpy.ndarray'>


numpy.ndarray

### <span id = "2.2.2 Array Creation">2.2.2 Array Creation</span>

There are several ways to create arrays.
For example, you can create an array from a regular Python *list* or *tuple* using the `array` function. The type of the resulting array is deduced from the type of the elements in the sequences.

In [6]:
# import numpy as np
c = np.array([2,3,4])
c

array([2, 3, 4])

In [7]:
c.dtype

dtype('int32')

In [8]:
d = np.array([2.2,3.3,4.4])
d.dtype

dtype('float64')

***
**Important**  
A frequent error consists in calling `array` with multiple numeric arguments, rather than providing a single list of numbers as an argument.
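
A short sketch of this mistake and its fix. The exact exception type is an assumption that depends on the NumPy version, so both likely candidates are caught here:

```python
import numpy as np

try:
    wrong = np.array(1, 2, 3, 4)      # WRONG: several positional arguments
except (TypeError, ValueError) as err:
    print("rejected:", err)

right = np.array([1, 2, 3, 4])        # RIGHT: a single list argument
```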

***
`array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

In [9]:
b = np.array([(1.5,2,3), (4,5,6)])
b

array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

The type of the array can also be explicitly specified at creation time:

In [10]:
c = np.array( [ [1,2], [3,4] ], dtype=complex)
c

array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an **expensive operation.**  
The function `zeros` creates an array full of zeros, the function `ones` creates an array full of ones, and the function `empty` creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of the created array is **`float64`**.

In [11]:
np.zeros( (3,4) )

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [12]:
np.ones( (2,3,4), dtype=np.int16) #dtype can also be specified

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [13]:
np.empty( (2,3) )               # uninitialized, output may vary

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

To create sequences of numbers, NumPy provides a function analogous to `range` that returns **arrays** instead of **lists**.

In [14]:
np.arange( 10, 30, 5 )

array([10, 15, 20, 25])

In [15]:
np.arange( 0, 2, 0.3 )

array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

When `arange` is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function `linspace`, which receives as an argument the number of elements that we want, instead of the step:

In [16]:
from numpy import pi
np.linspace( 0, 2, 9 )         # 9 numbers from 0 to 2

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])

In [17]:
x = np.linspace( 0, 2*pi, 100)

In [18]:
f = np.sin(x)
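
The sketch below illustrates the point about `arange` and `linspace`. The endpoints `1` and `1.3` are arbitrary values picked for illustration. The `arange` result may contain one element more than a naive count suggests, because rounding error can push the computed length up, whereas `linspace` returns exactly the requested number of points:

```python
import numpy as np

# With a float step, the length follows ceil((stop - start) / step),
# which is sensitive to rounding error in the division.
stepped = np.arange(1, 1.3, 0.1)

# linspace takes the number of points directly, so the length is fixed.
spaced = np.linspace(1, 1.3, 4)

print(len(stepped), len(spaced))
```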

**See also:**  
array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.rand, numpy.random.randn, fromfunction, fromfile

### <span id = "2.2.3 Printing Arrays">2.2.3 Printing Arrays</span>

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
* the last axis is printed from left to right,
* the second-to-last is printed from top to bottom,
* the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.

In [19]:
a = np.arange(6)
print(a , '\n---a---')
b = np.arange(12).reshape(4,3)
print(b , '\n---b---')
c = np.arange(24).reshape(2,3,4)
print(c , '\n---c---')

[0 1 2 3 4 5] 
---a---
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]] 
---b---
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]] 
---c---


See [*below*](#2.3 Shape Manipulation) to get more details on `reshape`.

If an array is too *large* to be printed, NumPy automatically **skips** the *central part* of the array and **only** prints the *corners*:

In [20]:
print(np.arange(10000),'\n- - - - - - -\n',np.arange(10000).reshape(100,100))

[   0    1    2 ..., 9997 9998 9999] 
- - - - - - -
 [[   0    1    2 ...,   97   98   99]
 [ 100  101  102 ...,  197  198  199]
 [ 200  201  202 ...,  297  298  299]
 ..., 
 [9700 9701 9702 ..., 9797 9798 9799]
 [9800 9801 9802 ..., 9897 9898 9899]
 [9900 9901 9902 ..., 9997 9998 9999]]


***
**Important**  
To disable this behaviour and force NumPy to print the entire array, you can change the printing options using **`set_printoptions`** (`sys` is imported at the top of this notebook).  
`>>> np.set_printoptions(threshold=sys.maxsize)`
***

### <span id = "2.2.4 Basic Operations">2.2.4 Basic Operations</span>

Arithmetic operations on arrays apply ***elementwise***. A new array is created and filled with the result.

In [21]:
a = np.array( [20,30,40,50])
b = np.arange(4)
b

array([0, 1, 2, 3])

In [22]:
c = a-b
c

array([20, 29, 38, 47])

In [23]:
c**2

array([ 400,  841, 1444, 2209])

In [24]:
10*np.sin(a)

array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])

In [25]:
a<35

array([ True,  True, False, False], dtype=bool)

Unlike in many matrix languages, the product operator `*` operates elementwise in NumPy arrays. The matrix product can be performed using the `dot` function or method:

In [26]:
A = np.array( [[1,1],
              [0,1]] )
B = np.array( [[2,0],
              [3,4]] )
print(A)
print(B)
print("\nTheir elementwise product is:\n",A*B)
print("\nTheir matrix product is:\n", A.dot(B))
print("\nOr we can write np.dot(A, B):\n", np.dot(A, B))

[[1 1]
 [0 1]]
[[2 0]
 [3 4]]

Their elementwise product is:
 [[2 0]
 [0 4]]

Their matrix product is:
 [[5 4]
 [3 4]]

Or we can write np.dot(A, B):
 [[5 4]
 [3 4]]


Some operations, such as **`+=`** and **`*=`**, act in place to *modify* an existing array **rather than** create a *new* one.

In [27]:
a = np.ones((2,3), dtype = int)
b = np.random.random((2,3))
a *= 3
print('- - a - -\n', a)
b += a
print('- - b - -\n', b)

- - a - -
 [[3 3 3]
 [3 3 3]]
- - b - -
 [[ 3.04976786  3.66149786  3.37968256]
 [ 3.59406947  3.81112475  3.93220162]]


***
**Important**  
When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behaviour known as upcasting). An in-place operation cannot upcast the array it modifies, so adding the float array `b` to the integer array `a` in place fails:
```
>>> a += b
Traceback (most recent call last):
    ...
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
```
See the next cell for more information.
***
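
A minimal sketch of the difference between in-place and out-of-place arithmetic on mixed types. The array contents are arbitrary, and the `astype` call is one possible workaround rather than the only one:

```python
import numpy as np

a = np.ones(3, dtype=np.int32)
b = np.linspace(0, 1, 3)          # float64

try:
    a += b                        # in place: a float64 result cannot go into int32
except TypeError as err:
    print("in-place add failed:", err)

c = a + b                         # new array, upcast to float64
a_float = a.astype(np.float64)    # explicit conversion first ...
a_float += b                      # ... then the in-place add works
```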

In [28]:
a = np.ones(3, dtype=np.int32)
b = np.linspace(0,pi,3)
print('>>> b.dtype.name\n', b.dtype.name)
c = a+b
print('>>> c\n', c)
print('>>> c.dtype.name\n', c.dtype.name)
d = np.exp(c*1j)  # 1j is the imaginary unit
print('>>> d\n', d)
print('>>> d.dtype.name\n', d.dtype.name)

>>> b.dtype.name
 float64
>>> c
 [ 1.          2.57079633  4.14159265]
>>> c.dtype.name
 float64
>>> d
 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>> d.dtype.name
 complex128


Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the `ndarray` class.

In [29]:
a = np.random.random((2,3))
print('>>> a\n', a)
print('>>> a.sum()\n', a.sum())
print('>>> a.min()\n', a.min())
print('>>> a.max()\n', a.max())

>>> a
 [[ 0.85635234  0.48532854  0.35547222]
 [ 0.54724554  0.91314307  0.28809035]]
>>> a.sum()
 3.4456320674
>>> a.min()
 0.288090353891
>>> a.max()
 0.913143074817


By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the **`axis`** parameter you can apply an operation along the specified axis of an array:  
*Here, **axis=0** operates down each **column** and **axis=1** operates along each **row**.*

In [30]:
b = np.arange(12).reshape(3,4)
print('>>> b\n', b)
print('>>> b.sum(axis=0)\n', b.sum(axis=0))
print('>>> b.min(axis=1)\n', b.min(axis=1))
print('>>> b.cumsum(axis=1)\n', b.cumsum(axis=1))

>>> b
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> b.sum(axis=0)
 [12 15 18 21]
>>> b.min(axis=1)
 [0 4 8]
>>> b.cumsum(axis=1)
 [[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]


### <span id = "2.2.5 Universal Functions">2.2.5 Universal Functions</span>

NumPy provides familiar mathematical functions such as *sin*, *cos*, and *exp*. In NumPy, these are called "universal functions" (`ufunc`). Within NumPy, these functions operate *elementwise* on an array, producing an *array* as **output**.

In [31]:
B = np.arange(3)
print(">>> B\n", B)
print(">>> np.exp(B)\n", np.exp(B))
print(">>> np.sqrt(B)\n", np.sqrt(B))

>>> B
 [0 1 2]
>>> np.exp(B)
 [ 1.          2.71828183  7.3890561 ]
>>> np.sqrt(B)
 [ 0.          1.          1.41421356]


In [32]:
C = np.array([2., -1., 4.])
print(">>> np.add(B, C)\n", np.add(B, C))

>>> np.add(B, C)
 [ 2.  0.  6.]


**See also:**  
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, *inv*, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where

### <span id = "2.2.6 Indexing, Slicing and Iterating">2.2.6 Indexing, Slicing and Iterating</span>

**One-dimensional** arrays can be indexed, sliced and iterated over, much like [lists](https://docs.python.org/3/tutorial/introduction.html#lists "Link to lists") and other Python sequences.

In [33]:
a = np.arange(10)**3
print(">>> a\n", a)
print(">>> a[2]\n", a[2])
print(">>> a[2:5]\n", a[2:5])
a[:6:2] = -1000
# equivalent to a[0:6:2] = -1000; from the start up to position 6 (exclusive), set every 2nd element to -1000
print(">>> a\n", a)
print(">>> a[ : :-1]\n", a[ : :-1])
# a reversed
print('- - cube root of elements in a - -')
for i in a:
    print(i**(1/3.))

>>> a
 [  0   1   8  27  64 125 216 343 512 729]
>>> a[2]
 8
>>> a[2:5]
 [ 8 27 64]
>>> a
 [-1000     1 -1000    27 -1000   125   216   343   512   729]
>>> a[ : :-1]
 [  729   512   343   216   125 -1000    27 -1000     1 -1000]
- - cube root of elements in a - -
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0




***
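*Why are there `nan` values here?*  
Raising a negative number to a fractional power has no real-valued result in floating-point arithmetic (the principal cube root of a negative number is complex), so NumPy returns `nan` for those elements. The function `np.cbrt` returns the real cube root instead.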
***
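
As a minimal sketch with made-up values, the following contrasts the fractional power with `np.cbrt`, which computes the real cube root for negative inputs as well:

```python
import numpy as np

values = np.array([-1000.0, 1.0, 27.0])

# A fractional power of a negative float yields nan (NumPy flags it as invalid).
with np.errstate(invalid="ignore"):
    powered = values ** (1 / 3.0)

# np.cbrt returns the real cube root, including for negative inputs.
roots = np.cbrt(values)
```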

***
**Multidimensional** arrays can have one index per axis. These indices are given in a tuple separated by commas:

In [34]:
def f(x,y):
    return 10*x+y
b = np.fromfunction(f,(5,4),dtype=int)
print(">>> b\n", b)

>>> b
 [[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]


In [35]:
print(">>> b[2,3]\n", b[2,3])
print(">>> b[0:5,1]\n", b[0:5,1])
print(">>> b[ : ,1]\n", b[ : ,1])
print(">>> b[1:3, : ]\n", b[1:3, : ])

>>> b[2,3]
 23
>>> b[0:5,1]
 [ 1 11 21 31 41]
>>> b[ : ,1]
 [ 1 11 21 31 41]
>>> b[1:3, : ]
 [[10 11 12 13]
 [20 21 22 23]]


When fewer indices are provided than the number of axes, the missing indices are considered complete slices. Here `b[-1]` is equivalent to `b[-1, :]`.

In [36]:
print(">>> b[-1]\n", b[-1]) 

>>> b[-1]
 [40 41 42 43]


The expression within brackets in `b[i]` is treated as an `i` followed by as many instances of `:` as needed to represent the remaining axes. NumPy also allows you to write this using dots as `b[i,...]`.  
The **dots** (`...`) represent as many colons as needed to produce a complete indexing tuple. For instance, if `x` is a rank 5 array (i.e., it has 5 axes), then
- `x[1,2,...]` is equivalent to `x[1,2,:,:,:]`,
- `x[...,3]` to `x[:,:,:,:,3]` and
- `x[4,...,5,:]` to `x[4,:,:,5,:]`.
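
These equivalences can be checked directly. In the sketch below, the rank-5 array and its shape are hypothetical, and the index values are chosen only so that they fit that shape:

```python
import numpy as np

# A hypothetical rank-5 array; the shape is arbitrary.
x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

same_1 = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_2 = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_3 = np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :])
print(same_1, same_2, same_3)
```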

In [37]:
c = np.array( [[[ 0,  1,  2],
               [ 10, 12, 13]],
              [[100,101,102],
               [110,112,113]]])
print(">>> c.shape\n", c.shape)
print(">>> c[1,...]\n", c[1,...])       # same as c[1,:,:] or c[1]
print(">>> c[...,2]\n", c[...,2])       # same as c[:,:,2]

>>> c.shape
 (2, 2, 3)
>>> c[1,...]
 [[100 101 102]
 [110 112 113]]
>>> c[...,2]
 [[  2  13]
 [102 113]]


***
**Iterating** over multidimensional arrays is done with respect to the first axis:

In [38]:
for row in b:
    print(row)

[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]


However, if one wants to perform an operation on **each element** in the array, one can use the **`flat`** attribute which is an [iterator](https://docs.python.org/2/tutorial/classes.html#iterators "Link to iterator") over all the elements of the array:

In [39]:
for ele in b.flat:
    print(ele)

0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43


**See also:**  
[*Indexing*](#3.4 Indexing), *arrays.indexing* (reference), newaxis, ndenumerate, indices

## <span id = "2.3 Shape Manipulation">2.3 Shape Manipulation</span>

### <span id = "2.3.1 Changing the shape of an array">2.3.1 Changing the shape of an array</span>

An array has a shape given by the number of elements along each axis

In [40]:
a = np.floor(10*np.random.random((3,4)))

In [41]:
print(a)
print(">>> a.shape\n", a.shape)

[[ 2.  4.  7.  5.]
 [ 3.  4.  1.  6.]
 [ 7.  0.  4.  3.]]
>>> a.shape
 (3, 4)


The shape of an array can be changed with various commands:

In [42]:
print(">>> a.ravel()\n", a.ravel())
a.shape = (6, 2)
print('- - After let the shape be (6,2) - -')
print(a)
print('- - After a transpose operation - -')
print(a.T)

>>> a.ravel()
 [ 2.  4.  7.  5.  3.  4.  1.  6.  7.  0.  4.  3.]
- - After let the shape be (6,2) - -
[[ 2.  4.]
 [ 7.  5.]
 [ 3.  4.]
 [ 1.  6.]
 [ 7.  0.]
 [ 4.  3.]]
- - After a transpose operation - -
[[ 2.  7.  3.  1.  7.  4.]
 [ 4.  5.  4.  6.  0.  3.]]


The order of the elements in the array resulting from `ravel()` is **normally "C-style", that is, the rightmost index "changes the fastest"**, so the element after `a[0,0]` is `a[0,1]`. In other words, *we move from the first element in the first row to the second element in the first row (if it exists), not to the first element in the second row.*  
If the array is reshaped to some other shape, again the array is treated as "C-style". NumPy normally creates arrays stored in this order, so `ravel()` will usually not need to copy its argument, but if the array was made by taking slices of another array or created with unusual options, it may need to be copied. The functions `ravel()` and `reshape()` can also be instructed, using an optional argument, to use FORTRAN-style arrays, in which the leftmost index changes the fastest.
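
The optional argument mentioned above is `order`. A minimal sketch on a small, made-up array:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]

c_order = m.ravel()               # default order='C': rightmost index changes fastest
f_order = m.ravel(order="F")      # order='F': leftmost index changes fastest

# reshape accepts the same optional argument.
f_shaped = np.arange(6).reshape((2, 3), order="F")
```

With `order="F"` the elements are read or filled column by column instead of row by row.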
***
The `reshape` function returns its argument with a modified shape, whereas the `ndarray.resize` method modifies the array itself:

In [43]:
a

array([[ 2.,  4.],
       [ 7.,  5.],
       [ 3.,  4.],
       [ 1.,  6.],
       [ 7.,  0.],
       [ 4.,  3.]])

In [44]:
a.reshape((2,6))
a

array([[ 2.,  4.],
       [ 7.,  5.],
       [ 3.,  4.],
       [ 1.,  6.],
       [ 7.,  0.],
       [ 4.,  3.]])

In [45]:
a.resize((2,6))
a

array([[ 2.,  4.,  7.,  5.,  3.,  4.],
       [ 1.,  6.,  7.,  0.,  4.,  3.]])

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

In [46]:
print(a.reshape((2,-1)))

[[ 2.  4.  7.  5.  3.  4.]
 [ 1.  6.  7.  0.  4.  3.]]


**Important**  
The -1 shorthand does not work with `resize`: a call such as `a.resize((2,-1))` raises *ValueError: negative dimensions not allowed*.

**See also:**  
ndarray.shape, reshape, resize, ravel

### <span id = "2.3.2 Stacking together different arrays">2.3.2 Stacking together different arrays</span>

Several arrays can be stacked together along different axes:

In [47]:
a = np.floor(10*np.random.random((2,2)))
a

array([[ 7.,  2.],
       [ 1.,  7.]])

In [48]:
b = np.floor(10*np.random.random((2,2)))
b

array([[ 5.,  9.],
       [ 5.,  1.]])

In [49]:
np.vstack((a,b)) # stack them vertically

array([[ 7.,  2.],
       [ 1.,  7.],
       [ 5.,  9.],
       [ 5.,  1.]])

In [50]:
np.hstack((a,b)) # stack them horizontally

array([[ 7.,  2.,  5.,  9.],
       [ 1.,  7.,  5.,  1.]])

The function `column_stack` stacks 1D arrays as columns into a 2D array. It is equivalent to `hstack` only for 2D arrays:

In [51]:
np.column_stack((a,b)) # with 2D array

array([[ 7.,  2.,  5.,  9.],
       [ 1.,  7.,  5.,  1.]])

In [52]:
a = np.array([4., 2.])
b = np.array([2., 8.])
from numpy import newaxis

In [53]:
a[:,newaxis]

array([[ 4.],
       [ 2.]])
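
For 1D inputs the stacking functions behave differently from one another. A minimal sketch, using hypothetical arrays `u` and `v` chosen so that the three results differ visibly:

```python
import numpy as np

# Hypothetical 1D inputs.
u = np.array([1., 2., 3.])
v = np.array([4., 5., 6.])

cols = np.column_stack((u, v))    # each 1D array becomes a column: shape (3, 2)
flat = np.hstack((u, v))          # 1D arrays are joined end to end: shape (6,)
rows = np.vstack((u, v))          # each 1D array becomes a row: shape (2, 3)
```

Here `cols` is the transpose of `rows`, while `flat` stays one-dimensional.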

For arrays with more than two dimensions, `hstack` stacks along their second axes, `vstack` stacks along their first axes, and `concatenate` allows for an optional argument giving the number of the axis along which the concatenation should happen.
***
**Note**  
In complex cases, `r_` and `c_` are useful for creating arrays by stacking numbers along one axis. They allow the use of range literals (":"). `r_` translates slice objects to concatenation along the first axis, and `c_` translates slice objects to concatenation along the second axis. For more uses see the documentation.

In [54]:
print(np.r_[1:4, 6,8,3:9])
print(np.c_[1:4, 5:8, 12:15])

[1 2 3 6 8 3 4 5 6 7 8]
[[ 1  5 12]
 [ 2  6 13]
 [ 3  7 14]]


When used with arrays as arguments, `r_` and `c_` are similar to `vstack` and `hstack` in their default behavior, but allow for an optional argument giving the number of the axis along which to concatenate.  
**See also:**  
hstack, vstack, column\_stack, concatenate, c\_,r\_

### <span id = "2.3.3 Splitting one array into several smaller ones">2.3.3 Splitting one array into several smaller ones</span>

Using `hsplit`, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:

In [55]:
a = np.floor(10*np.random.random((2,12)))
print('>>> a\n', a)
print('>>> np.hsplit(a,3)\n', np.hsplit(a,3)) # split a into 3 equal parts
print('>>> np.hsplit(a,(3,4))\n', np.hsplit(a,(3,4))) # split a after the third and the fourth column

>>> a
 [[ 9.  1.  4.  3.  9.  6.  2.  2.  7.  5.  5.  6.]
 [ 5.  0.  4.  0.  3.  3.  1.  2.  4.  4.  7.  6.]]
>>> np.hsplit(a,3)
 [array([[ 9.,  1.,  4.,  3.],
       [ 5.,  0.,  4.,  0.]]), array([[ 9.,  6.,  2.,  2.],
       [ 3.,  3.,  1.,  2.]]), array([[ 7.,  5.,  5.,  6.],
       [ 4.,  4.,  7.,  6.]])]
>>> np.hsplit(a,(3,4))
 [array([[ 9.,  1.,  4.],
       [ 5.,  0.,  4.]]), array([[ 3.],
       [ 0.]]), array([[ 9.,  6.,  2.,  2.,  7.,  5.,  5.,  6.],
       [ 3.,  3.,  1.,  2.,  4.,  4.,  7.,  6.]])]


## <span id = "2.4 Copies and Views">2.4 Copies and Views</span>
While operating and manipulating arrays, their data is sometimes *copied into a new array* and sometimes not. There are **3 cases:**
### <span id = "2.4.1 No Copy at All">2.4.1 No Copy at All</span>
- Simple assignments make no copy of array objects or of their data

In [56]:
a = np.arange(12)
b = a
print('>>> b is a\n', b is a)
b.shape = 3,4
print('>>> b.shape = 3,4')
print('>>> a.shape\n', a.shape)

>>> b is a
 True
>>> b.shape = 3,4
>>> a.shape
 (3, 4)


- Function calls make no copy, for Python passes mutable objects as references

In [57]:
def f(x):
    print(id(x))
print(id(a)) #id is a unique identifier of an object

972310692848


In [58]:
f(a) # f prints id(a) itself and returns None; the id matches the one printed above

972310692848
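
Because no copy is made, a function that modifies its argument in place also modifies the caller's array. A minimal sketch, where `zero_first` is a hypothetical helper introduced only for illustration:

```python
import numpy as np

def zero_first(x):
    # x refers to the caller's array, so this assignment is visible outside.
    x[0] = 0

arr = np.arange(1, 5)
zero_first(arr)
print(arr)
```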


### <span id = "2.4.2 View or Shallow Copy">2.4.2 View or Shallow Copy</span>
Different array objects can share the same data. The `view` method creates a new array object that looks at the same data.

In [59]:
c = a.view()
c is a

False

In [60]:
c.base is a #c is a view of the data owned by a

True

In [61]:
c.flags.owndata

False

In [62]:
c.shape = 2,6
a.shape                         # here a's shape doesn't change with c's

(3, 4)

In [63]:
c[0,4] = 1236454465
a                               # here a's value changes with c's

array([[         0,          1,          2,          3],
       [1236454465,          5,          6,          7],
       [         8,          9,         10,         11]])

Slicing an array returns a view of it!

In [64]:
s = a[:,1:3]
s

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

In [65]:
# s is a view of a, so assigning through s[:] also changes a
s[:]=10

In [66]:
a

array([[         0,         10,         10,          3],
       [1236454465,         10,         10,          7],
       [         8,         10,         10,         11]])

***
**Important**  
A view shares only its data (the base) with the original array; its shape is independent.
Thus changing data through the view leads to the same change in the original array.
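
Whether two arrays share data can be checked with `np.shares_memory`. The sketch below uses a made-up array and, for contrast, indexes it with a list of integers, which returns a copy rather than a view:

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

view = base[:, 1:3]               # basic slicing returns a view
picked = base[[0, 2]]             # indexing with a list of integers returns a copy

print(np.shares_memory(base, view))     # the view uses base's buffer
print(np.shares_memory(base, picked))   # the copy does not

view[:] = -1                      # writes through to base, but not to picked
```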

### <span id = "2.4.3 Deep Copy">2.4.3 Deep Copy</span>

The `copy` method makes a complete copy of the array and its data.

In [67]:
d = a.copy()
print('d is a', d is a)
print('d.base is a', d.base is a)
d[0,0]=9999
a

d is a False
d.base is a False


array([[         0,         10,         10,          3],
       [1236454465,         10,         10,          7],
       [         8,         10,         10,         11]])

***
**Important**  
The copied array shares nothing with the original array.

### <span id = "2.4.4 Function and Methods Overview">2.4.4 Function and Methods Overview</span>

Here is a list of some useful NumPy function and method names ordered in categories. See *routines* for the full list.  
**Array Creation**  
```
arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
```
**Conversions**
```
ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat
```
**Manipulations**
```
array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack
```
**Questions**
```
all, any, nonzero, where
```
**Ordering**
```
argmax, argmin, argsort, max, min, ptp, searchsorted, sort
```
**Operations**
```
choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real, sum
```
**Basic Statistics**
```
cov, mean, std, var
```
**Basic Linear Algebra**
```
cross, dot, outer, linalg.svd, vdot
```

## <span id = "2.5 Less Basic">2.5 Less Basic</span>

### <span id = "2.5.1 Broadcasting rules">2.5.1 Broadcasting rules</span>

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape.  
**The first rule of broadcasting** is that if all input arrays do not have the same number of dimensions, a "1" will be repeatedly prepended to the shapes of the smaller arrays until all the arrays have the same number of dimensions.  
**The second rule of broadcasting** ensures that arrays with a size of 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the "broadcast" array.  
After application of the broadcasting rules, the sizes of all arrays must match. More details can be found in [*Broadcasting*](#3.5 Broadcasting).
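
A minimal sketch of the two rules, using small arrays made up for this example: a `(4,)` array is first treated as `(1, 4)`, then stretched along the size-1 axis to match a `(3, 4)` array. Shapes that still disagree after both rules raise a `ValueError`.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # shape (3, 4)
b = np.array([10, 20, 30, 40])    # shape (4,)

# Rule 1: b's shape (4,) is treated as (1, 4).
# Rule 2: the size-1 axis acts as if it had size 3, repeating the same row.
c = a + b
print(c.shape)

# Shapes that do not match after both rules cannot be broadcast.
try:
    a + np.array([1, 2, 3])       # (3, 4) against (1, 3): 4 != 3
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```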

## <span id = "2.6 Fancy indexing and index tricks">2.6 Fancy indexing and index tricks</span>
NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw before, arrays can be indexed by **arrays of integers and arrays of booleans**.

### <span id = "2.6.1 Indexing with Arrays of Indices">2.6.1 Indexing with Arrays of Indices</span>

In [68]:
a = np.arange(12)**2
i = np.array([1,1,3,8,5])
j = np.array([[3,4],[9,7]])
print('a[i]: \n', a[i])
print('a[j]: \n', a[j])

a[i]: 
 [ 1  1  9 64 25]
a[j]: 
 [[ 9 16]
 [81 49]]


When the indexed array `a` is *multidimensional*, a single array of indices refers to the first dimension of `a`. The following example shows this behavior by converting an image of labels into a color image using a palette.

In [69]:
palette = np.array( [ [0,0,0],
                    [255,0,0],
                    [0,255,0],
                    [0,0,255],
                    [255,255,255] ] )
image = np.array( [ [0,1,2,0],
                    [0,3,4,0] ] )
palette[image]

array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])

***
**Important**  
The returned array has the shape of the index array, followed by the shape of whatever dimensions of the original array were not indexed.
***
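
The shape of the result can be checked on the palette example. In this sketch, `colored` and `expected_shape` are names introduced only for illustration: the index array `image` contributes its shape, and the unindexed trailing dimension of `palette` is appended.

```python
import numpy as np

palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0],
                    [0, 0, 255], [255, 255, 255]])   # shape (5, 3)
image = np.array([[0, 1, 2, 0],
                  [0, 3, 4, 0]])                      # shape (2, 4)

colored = palette[image]

# Index-array shape, then the remaining dimensions of the indexed array.
expected_shape = image.shape + palette.shape[1:]
print(colored.shape, expected_shape)
```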
We can also give indexes for more than one dimension. The arrays of indices for each dimension must have the same shape.

In [70]:
a = np.arange(12).reshape(3,4)
i = np.array( [ [0,1],
                [1,2] ] )
j=np.array( [ [2,1],
              [3,3] ] )

In [72]:
print('a[i,j]:\n',a[i,j])
print('a[:,j]:\n',a[:,j])
print('a[i,:]:\n',a[i,:])

a[i,j]:
 [[ 2  5]
 [ 7 11]]
a[:,j]:
 [[[ 2  1]
  [ 3  3]]

 [[ 6  5]
  [ 7  7]]

 [[10  9]
  [11 11]]]
a[i,:]:
 [[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 4  5  6  7]
  [ 8  9 10 11]]]


***
**Important**  
Thus the index arrays given for different dimensions must have the same shape (or be broadcastable to a common shape), and the result takes that shape.  
Another important point is shown below.

In [73]:
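# i and j can be put in a list: element k of i and element k of j together form one (row, column) coordinate
# (this works on the NumPy 1.12 used here; newer NumPy releases require a tuple: a[tuple(l)])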
l = [i,j]
a[l]

array([[ 2,  5],
       [ 7, 11]])

In [74]:
s = np.array( [i,j] )

In [78]:
print('s:\n',s)
print('l:\n',l)
print('THEY ARE NOT THE SAME ARRAY!!!')

s:
 [[[0 1]
  [1 2]]

 [[2 1]
  [3 3]]]
l:
 [array([[0, 1],
       [1, 2]]), array([[2, 1],
       [3, 3]])]
THEY ARE NOT THE SAME ARRAY!!!


In [79]:
a[tuple(s)]

array([[ 2,  5],
       [ 7, 11]])

We cannot use `a[s]` directly, because a single array is interpreted as indexing only the first dimension of `a`. That dimension has size 3, and `s` contains the value 3, so the index is out of range.
***
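
A small sketch of the failure and two equivalent fixes, reusing `a`, `i`, `j` and `s` from the cells above. The `error_message` variable is only there so the example can print the exception text.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])
s = np.array([i, j])              # one array of shape (2, 2, 2)

try:
    a[s]                          # every value in s indexes axis 0 (size 3)
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# Unpacking s along its first axis gives one index array per dimension.
print(a[tuple(s)])
print(a[s[0], s[1]])              # same result, written out explicitly
```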

Another common use of indexing with arrays is finding the maximum value of each time-dependent series (here, each column of `data`):

In [84]:
time = np.linspace(20,145,5)
data = np.sin(np.arange(20).reshape(5,4))
print('time:\n',time)
print('data:\n',data)
ind = data.argmax(axis = 0)
print('ind:\n',ind)
time_max = time[ind]
data_max = data[ind, range(data.shape[1])]
print('time_max:\n',time_max)
print('data_max:\n',data_max)
np.all(data_max == data.max(axis=0))

time:
 [  20.     51.25   82.5   113.75  145.  ]
data:
 [[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]
 [-0.53657292  0.42016704  0.99060736  0.65028784]
 [-0.28790332 -0.96139749 -0.75098725  0.14987721]]
ind:
 [2 0 3 1]
time_max:
 [  82.5    20.    113.75   51.25]
data_max:
 [ 0.98935825  0.84147098  0.99060736  0.6569866 ]


True

`data[ind, range(data.shape[1])]` is equivalent to `[data[ind[0],0], data[ind[1],1], data[ind[2],2], data[ind[3],3]]`.  
If only the maximum values are needed, all of these operations can be replaced by `data.max(axis=0)`; `ind` is still required to look up the matching times.  
You can also use indexing with **arrays as a target to assign** to:

In [89]:
a = np.arange(5)
a[[1,3,4]] = 0
a

array([0, 0, 2, 0, 0])

If the list of indices contains **the same index more than once**, the assignment is performed several times, and **the last value** assigned is the one that remains.

In [90]:
a = np.arange(5)
a[[1,1,1,1,1,1]] = [1,2,3,4,5,6]
a

array([0, 6, 2, 3, 4])

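With Python's `+=` construct the result is less obvious. Even though index 1 appears six times, `a[1]` is not incremented six times: the right-hand side `a[[1,1,1,1,1,1]] + [1,2,3,4,5,6]` is computed once from the original values and then assigned, so only the last value, 1 + 6 = 7, remains.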

In [91]:
a = np.arange(5)
a[[1,1,1,1,1,1]] += [1,2,3,4,5,6]
a

array([0, 7, 2, 3, 4])
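
If the intent is to accumulate once per occurrence of a repeated index, one option is the unbuffered `np.add.at`. This is a sketch of an alternative, not something the notebook itself uses; the arrays `a` and `b` are made up for the comparison.

```python
import numpy as np

# Buffered: the sums are computed once from the original a[1], then assigned.
a = np.arange(5)
a[[1, 1, 1]] += [10, 20, 30]
print(a)

# Unbuffered: each occurrence of index 1 adds its value in turn.
b = np.arange(5)
np.add.at(b, [1, 1, 1], [10, 20, 30])
print(b)
```

Here `a[1]` ends up as 1 + 30, while `b[1]` accumulates to 1 + 10 + 20 + 30.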

### <span id = "2.6.2 Indexing with Boolean Arrays">2.6.2 Indexing with Boolean Arrays</span>

## <span id = "3.4 Indexing">3.4 Indexing</span>