# Chapter 4 NumPy Basics
## The NumPy ndarray
### The basics
An ndarray is a generic multidimensional container for homogeneous data: all elements must be of the same type, and each element is indexed by a tuple of non-negative integers. In NumPy, dimensions are called _axes_.
Every array has the following attributes:
 * __ndarray.ndim__: the number of axes (dimensions) of the array
 * __ndarray.shape__: a tuple indicating the size of each dimension. For a matrix with _n_ rows and _m_ columns, __shape__ will be _(n, m)_. The length of the __shape__ tuple is therefore the number of axes, __ndim__.
 * __ndarray.dtype__: an object describing the _data type_ of the array. NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.int64 are some examples.

In [11]:
import numpy as np
data = np.arange(15).reshape(3, 5)
data

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [12]:
data.ndim

2

In [13]:
data.shape

(3, 5)

In [15]:
data.dtype.name

'int64'

 * __ndarray.size__: the total number of elements of the array. This is equal to the product of the elements of __ndarray.shape__.
 * __ndarray.itemsize__: the size in bytes of each element of the array. It is equivalent to __ndarray.dtype.itemsize__.
 * __ndarray.data__: the buffer containing the actual elements of the array. Normally, we won't need to use this.

In [17]:
data.itemsize

8

In [18]:
type(data)

numpy.ndarray
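As a quick check of how these attributes relate to one another, here is a minimal sketch. The array name `grid` is illustrative and not part of the notes above, and the dtype is set explicitly to `int32` so the byte counts do not depend on the platform's default integer type.

```python
import numpy as np

# `grid` is an illustrative array; int32 is chosen explicitly so that
# itemsize is 4 bytes on every platform.
grid = np.arange(15, dtype=np.int32).reshape(3, 5)

n_elements = int(np.prod(grid.shape))             # 3 * 5
print(grid.ndim == len(grid.shape))               # True
print(grid.size == n_elements)                    # True
print(grid.itemsize == grid.dtype.itemsize)       # True
print(grid.nbytes == grid.size * grid.itemsize)   # True: total bytes in the buffer
```

`nbytes` is not listed above, but it is a standard ndarray attribute and is a convenient way to see `size` and `itemsize` combined.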

### Creating ndarrays
1. The easiest way is to use the __array__ function, which accepts any sequence-like object and produces a new NumPy array containing the passed data. For example, a list or tuple is a good candidate for conversion:

In [4]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [5]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

A frequent error is calling __array__ with multiple numeric arguments rather than providing a single list of numbers as the argument:

In [None]:
# a = np.array(1, 2, 3, 4)    # WRONG: raises an error, the numbers must be passed as one sequence
a = np.array([1, 2, 3, 4])    # RIGHT
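To see what the mistake looks like in practice, the sketch below catches the exception instead of letting it stop the notebook. The exact exception type and message depend on the NumPy version, so both `TypeError` and `ValueError` are caught here as an assumption; the variable names are illustrative.

```python
import numpy as np

error_name = None
try:
    np.array(1, 2, 3, 4)              # each number is passed as a separate argument
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__   # exception type varies between NumPy versions
    print("np.array(1, 2, 3, 4) failed with", error_name)

numbers = np.array([1, 2, 3, 4])      # a single sequence argument works
print(numbers.shape)                  # (4,)
```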

The type of the array can also be explicitly specified at creation time:

In [20]:
c = np.array([[1, 2], [3, 4]], dtype=complex)
c

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

2. There are a number of other functions for creating new arrays. For example:
 * __zeros__ and __ones__ create arrays of 0s and 1s, respectively, with a given length or shape.

In [22]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

 * __empty__ creates an array without initializing its values.

In [33]:
np.empty((2, 4, 3))    # the outermost (third) dimension comes first: 2 blocks of 4 x 3
# Uninitialized, so the output may vary. It is not safe to assume that np.empty returns an array of all zeros;
# it may contain arbitrary garbage values.

array([[[3.10503618e+231, 3.10503618e+231, 9.38724727e-323],
        [0.00000000e+000, 2.31297541e-312, 5.02034658e+175],
        [2.21471671e+160, 2.89830427e-057, 1.79747002e-052],
        [1.68777511e+160, 1.47763641e+248, 1.16096346e-028]],

       [[7.69165785e+218, 1.35617292e+248, 4.10985423e-061],
        [1.08672383e-071, 8.38095896e+165, 4.66450330e-033],
        [4.30422694e-096, 6.32299154e+233, 6.48224638e+170],
        [5.22411352e+257, 5.74020278e+180, 8.37174974e-144]]])

 * __arange__ is analogous to the built-in __range__, but returns an array containing the sequence of numbers.

In [24]:
np.arange(10, 30, 5)

array([10, 15, 20, 25])

In [25]:
np.arange(10, 20, 1.5)    # it accepts float arguments

array([10. , 11.5, 13. , 14.5, 16. , 17.5, 19. ])

When __arange__ is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function __linspace__ that receives as an argument the number of elements that we want, instead of the step:

In [26]:
np.linspace(0, 10, 9)    # 9 numbers from 0 to 10, both ends are inclusive

array([ 0.  ,  1.25,  2.5 ,  3.75,  5.  ,  6.25,  7.5 ,  8.75, 10.  ])
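The difference between the two functions can be sketched as follows. With __linspace__ the number of elements is an explicit argument, while with __arange__ it is derived from the step in floating point. The variable names are illustrative, and the `endpoint=False` option is shown only as one way to mimic the half-open interval of __arange__.

```python
import numpy as np

start, stop = 0.0, 1.0

# linspace: the element count is given directly
with_end = np.linspace(start, stop, 11)                      # includes stop
without_end = np.linspace(start, stop, 10, endpoint=False)   # excludes stop, like arange

print(len(with_end), with_end[-1])
print(len(without_end), without_end[-1])

# arange: the count comes from (stop - start) / step computed in floating point,
# so rounding can decide whether a value close to stop is included.
stepped = np.arange(start, stop, 0.1)
print(len(stepped))
```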

Array Creation Functions

Function | Description
----------- | ---------------
array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype; copies the input data by default
asarray | Convert input to ndarray, but do not copy if the input is already an ndarray
arange | Like the built-in range but returns an ndarray instead of a list
ones | Produce an array of all 1s with the given shape and dtype
ones_like | Take another array and produce a ones array of the same shape and dtype
zeros, zeros_like | Like ones and ones_like but producing arrays of 0s instead
empty, empty_like | Create new arrays by allocating new memory, but do not populate them with any values, unlike ones and zeros
full, full_like | Produce an array of the given shape and dtype with all values set to the indicated 'fill value'
eye, identity | Create a square N x N identity matrix (1s on the diagonal and 0s elsewhere)
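
A few of the functions in the table can be tried out together. This is only a sketch with illustrative names (`template`, `sevens`, and so on); it also shows the difference between __array__, which copies by default, and __asarray__, which returns the input unchanged when it is already an ndarray.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)

ones_copy = np.ones_like(template)     # same shape and dtype as template
sevens = np.full((2, 2), 7.0)          # every element set to the fill value
ident = np.eye(3)                      # 3 x 3 identity matrix

same_object = np.asarray(template)     # already an ndarray: no copy is made
new_object = np.array(template)        # copies the data by default

print(ones_copy.shape, ones_copy.dtype)
print(same_object is template)         # True
print(new_object is template)          # False
```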

### Data Types for ndarrays
__dtype__ is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data. Numerical dtypes are named consistently: a type name, like float or int, followed by a number indicating the number of bits per element:

In [34]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([4, 5, 6], dtype=np.int32)
arr1.dtype

dtype('float64')

In [36]:
arr1.dtype.itemsize

8

In [35]:
arr2.dtype

dtype('int32')

In [37]:
arr2.dtype.itemsize

4

NumPy Data Types

Type | Type Code | Description
----- | ----- | -----
int8, uint8 | i1, u1 | Signed and unsigned 8-bit integer types
int16, uint16 | i2, u2 | Signed and unsigned 16-bit integer types
int32, uint32 | i4, u4 | Signed and unsigned 32-bit integer types
int64, uint64 | i8, u8 | Signed and unsigned 64-bit integer types
float16 | f2 | Half-precision floating point
float32 | f4 or f | Standard single-precision floating point; compatible with C float
float64 | f8 or d | Standard double-precision floating point; compatible with C double and Python float object
float128 | f16 or g | Extended-precision floating point
complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively
bool | ? | Boolean type storing True and False values
object | O | Python object type; a value can be any Python object
string_ | S | Fixed-length ASCII string type (1 byte per character); for example, to create a string dtype with length 10, use 'S10'
unicode_ | U | Fixed-length Unicode type (number of bytes platform specific); same specification semantics as string_ (e.g., 'U10')

You can explicitly convert or cast an array from one dtype to another using ndarray's __astype__ method:

In [46]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int64')

In [48]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

If you cast floating-point numbers to an integer dtype, the decimal part is truncated:

In [49]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [51]:
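arr.astype(np.int64)    # astype returns a new array and leaves arr unchanged, so this result is discarded
arr.dtype               # still float64, because arr itself was never converted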

dtype('float64')
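
To actually see the truncation, the array returned by __astype__ has to be kept. A minimal sketch, with `values` and `as_ints` as illustrative names:

```python
import numpy as np

values = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

as_ints = values.astype(np.int64)    # keep the returned array
print(as_ints)                       # fractional parts are dropped (rounding toward zero)
print(values.dtype, as_ints.dtype)   # the source array is still float64
```

Note that negative values are truncated toward zero as well, so -2.6 becomes -2 rather than -3.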

If you have an array of strings representing numbers, you can use __astype__ to convert them to numeric form:

In [54]:
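# Note: this differs from the book, possibly because of a version difference;
# pandas is arguably a better fit for this kind of conversion.
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)    # np.bytes_ was named np.string_ before NumPy 2.0
numeric_strings.astype(np.float64)    # np.float is no longer available in recent NumPy; the converted copy is discarded here
numeric_strings    # the original array is unchanged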

array([b'1.25', b'-9.6', b'42'], dtype='|S4')
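
The output above still shows byte strings because the converted array was never stored. The sketch below assigns the result to a new name (`parsed`, an illustrative choice) so the numeric values are visible:

```python
import numpy as np

# 'S' requests a fixed-length byte-string dtype, matching the example above
text_values = np.array(['1.25', '-9.6', '42'], dtype='S')

parsed = text_values.astype(np.float64)   # assign the result; astype does not work in place
print(parsed)
print(text_values.dtype, parsed.dtype)
```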

A type code can also be used to refer to a dtype:

In [55]:
empty_uint32 = np.empty(8, dtype='u4')
empty_uint32

array([         0, 1072693248,          0, 1073741824,          0,
       1074266112,          0, 1074790400], dtype=uint32)
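
The correspondence between type codes and the named types in the table can be checked directly. This is a small sketch; `small` is an illustrative name.

```python
import numpy as np

print(np.dtype('i4') == np.int32)     # True: 4-byte signed integer
print(np.dtype('u1') == np.uint8)     # True: 1-byte unsigned integer
print(np.dtype('f8') == np.float64)   # True: 8-byte float
print(np.dtype('S10').itemsize)       # 10: one byte per character

small = np.zeros(3, dtype='f4')
print(small.dtype.name, small.dtype.itemsize * 8)   # name and number of bits
```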

__By default, calling astype creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.__
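
The copying behavior can be verified with a short sketch. The names are illustrative, and the `copy=False` call is included only to show the documented opt-out, where __astype__ may return the input array when no conversion is needed.

```python
import numpy as np

original = np.array([1, 2, 3], dtype=np.int64)

copied = original.astype(np.int64)    # same dtype, but still a new array
copied[0] = 99                        # does not touch original

print(original[0], copied[0])         # 1 99
print(np.shares_memory(original, copied))       # False

# copy=False asks astype to skip the copy when the dtype already matches
maybe_same = original.astype(np.int64, copy=False)
print(np.shares_memory(original, maybe_same))   # True
```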

### Arithmetic with NumPy Arrays
Arrays enable you to express batch operations on data without writing any for loops. This is called _vectorization_. Any arithmetic operation between equal-size arrays is applied element-wise:

In [56]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [57]:
arr * arr

array([[ 1,  4,  9],
       [16, 25, 36]])

In [58]:
arr - arr

array([[0, 0, 0],
       [0, 0, 0]])
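
To make the meaning of "element-wise" concrete, the sketch below compares a vectorized product with the explicit double loop it replaces. The names `grid`, `vectorized`, and `looped` are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

vectorized = grid * grid              # one expression, no Python loop

# The equivalent explicit loops over both axes
looped = np.empty_like(grid)
for i in range(grid.shape[0]):
    for j in range(grid.shape[1]):
        looped[i, j] = grid[i, j] * grid[i, j]

print(np.array_equal(vectorized, looped))   # True
```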

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [59]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [60]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparisons between arrays of the same size yield boolean arrays:

In [61]:
arr2 = np.array([[0, 4, 1], [7, 2, 12]])

In [62]:
arr2

array([[ 0,  4,  1],
       [ 7,  2, 12]])

In [63]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

Operations between differently sized arrays are called __broadcasting__.
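
As a small illustration only (the notes above do not go into the rules), here is one common case of broadcasting: adding a one-dimensional array of length 3 to a 2 x 3 array. The names `matrix` and `row` are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)

total = matrix + row     # row is added to each row of matrix
print(total.shape)       # (2, 3)
print(total)
```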