#### Note
*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

# NumPy Basics

Datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, images, sound clips, numerical measurements, or nearly anything else.
Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data buffers. Compared with Python's built-in lists, NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.
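To make the storage claim concrete, here is a rough sketch comparing the memory used by a Python list and by a NumPy array holding the same 1,000 integers. The names `py_list` and `np_array` are illustrative, and the list figure is only an approximation that assumes CPython's object layout:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so we count both the
# list container and the objects it refers to (approximate, CPython only).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values in one buffer: 8 bytes per int64 element.
array_bytes = np_array.nbytes

print("list: ", list_bytes, "bytes (approximate)")
print("array:", array_bytes, "bytes")
```

The exact list figure varies between Python versions; the point is only that each list element carries per-object overhead that the array buffer avoids.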

In [5]:
import numpy
numpy.__version__

'1.14.0'

In [2]:
import numpy as np

#### NumPy Standard Data Types

NumPy arrays contain values of a single type. The standard NumPy data types are listed in the following table. When constructing an array, a data type can be specified either as a NumPy object (``np.int16``) or as a string (``'int16'``).

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (this alias was removed in NumPy 2.0; use ``float64``)| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (this alias was removed in NumPy 2.0; use ``complex128``)| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 

In [9]:
np.zeros(10, dtype=np.int16)  # data type given as a NumPy object
np.zeros(10, dtype='int16')   # equivalent: data type given as a string

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
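The integer ranges in the table do not need to be memorized, since NumPy can report them. The following is a small sketch using `np.iinfo`; the variable names `from_string` and `from_object` are ours, chosen for illustration:

```python
import numpy as np

# Integer ranges from the table, queried from NumPy itself.
for name in ("int8", "int16", "uint8"):
    info = np.iinfo(np.dtype(name))
    print(name, info.min, info.max)

# The string and the NumPy object name the same data type.
from_string = np.zeros(10, dtype="int16")
from_object = np.zeros(10, dtype=np.int16)
print(from_string.dtype == from_object.dtype)
print(from_string.itemsize, "bytes per element")
```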

### NumPy Arrays

Basic array manipulations are:

- **Attributes** of arrays: Determining the size, shape, memory consumption, and data types of arrays
- **Indexing** of arrays: Getting and setting the value of individual array elements
- **Slicing** of arrays: Getting and setting smaller subarrays within a larger array
- **Reshaping** of arrays: Changing the shape of a given array
- **Joining and splitting** of arrays: Combining multiple arrays into one, and splitting one array into many

#### NumPy Array Attributes
We'll start by defining three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array.
We'll use NumPy's random number generator, which we will **seed** with a set value in order to ensure that the same random arrays are generated each time this code is run:

In [41]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

Each array has attributes **ndim** (the number of dimensions), **shape** (the size of each dimension), and **size** (the total size of the array):

In [42]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


Other useful attributes are **dtype** (the data type of the array), **itemsize** (the size in bytes of each array element), and **nbytes** (the total size in bytes of the array):

In [43]:
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes


**nbytes** is equal to **itemsize** times **size**.
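As a quick check of this relationship, the sketch below builds arrays of the same shape with a few different data types. The dictionary `sizes` is just a convenience for this example. Note that the 480 bytes shown above depend on the default integer type being ``int64``, which is not the case on every platform:

```python
import numpy as np

shape = (3, 4, 5)  # same shape as x3 above: 60 elements
sizes = {}

for dtype in (np.int8, np.int32, np.float64):
    a = np.zeros(shape, dtype=dtype)
    # record (bytes per element, number of elements, total bytes)
    sizes[a.dtype.name] = (a.itemsize, a.size, a.nbytes)
    print(a.dtype, a.itemsize, "x", a.size, "=", a.nbytes, "bytes")
```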

#### Array Indexing: Accessing Single Elements
In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [44]:
print(x1)
print(x1[0])
print(x1[4])
print(x1[-1])  # negative indices count from the end of the array

[5 0 3 3 7 9]
5
7
9


In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:


In [45]:
print(x2)
print(x2[0,0])
print(x2[2,0])
print(x2[2,-1]) 

# Values can also be modified using any of the index notations above:
x2[0, 0] = 12
print(x2)

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
3
1
7
[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


In [46]:
# NOTE: NumPy arrays have a fixed type. If you insert a floating-point value into an integer array, the value is silently truncated.
x1[0] = 3.14159  # this will be truncated!
print(x1)


[3 0 3 3 7 9]
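Truncation here means the fractional part is dropped rather than rounded. A minimal sketch, using throwaway arrays `a` and `b` of our own, shows the effect for positive and negative values and one way to avoid it by converting to a floating-point type first:

```python
import numpy as np

a = np.array([1, 2, 3])   # integer dtype inferred from the values
a[0] = 3.99               # fractional part is dropped, not rounded
a[1] = -3.99              # truncation is toward zero
print(a)

# Converting to a float array first keeps the fractional part.
b = a.astype(float)
b[0] = 3.99
print(b)
```

`astype` returns a new array, so `a` itself keeps its integer type.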


### Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:

> **x[start:stop:step]**

If any of these are unspecified, they default to ``start=0``, ``stop=<size of dimension>``, and ``step=1``. When ``step`` is negative, the defaults for ``start`` and ``stop`` are swapped.
We'll take a look at accessing subarrays in one dimension and in multiple dimensions.

#### One-dimensional subarrays

In [55]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [56]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [57]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [58]:
x[4:7]  # middle sub-array

array([4, 5, 6])

In [59]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [60]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

In [61]:
x[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [62]:
x[5::-2]  # every other element in reverse, starting from index 5

array([5, 3, 1])
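The last two results rely on how missing values are filled in when the step is negative. One way to inspect this is Python's built-in `slice.indices` method, which reports the concrete start, stop, and step for a sequence of a given length. This is a sketch for illustration only; NumPy applies the same convention but does not necessarily call this method internally:

```python
import numpy as np

x = np.arange(10)
n = len(x)

forward = slice(None, None, 1).indices(n)    # x[::1]
backward = slice(None, None, -1).indices(n)  # x[::-1]
from_five = slice(5, None, -2).indices(n)    # x[5::-2]

print(forward)    # (0, 10, 1)
print(backward)   # (9, -1, -1)
print(from_five)  # (5, -1, -2)
```

A stop of -1 in these tuples means "one step before index 0", not the last element.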

#### Multi-dimensional subarrays

In [63]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [64]:
x2[:2, :3]  # two rows, three columns

array([[12,  5,  2],
       [ 7,  6,  8]])

In [65]:
x2[:3, ::2]  # all rows, every other column

array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

In [66]:
x2[::-1, ::-1]  # rows and columns both reversed

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])

In [67]:
x2[:, 0]  # first column of x2

array([12,  7,  1])

In [68]:
x2[0, :]  # first row of x2

array([12,  5,  2,  4])

In [69]:
x2[0]  # equivalent to x2[0, :]

array([12,  5,  2,  4])

#### Subarrays as Views

One important, and extremely useful, thing to know about **array slices** is that they return **views** rather than **copies** of the array data.
This is one area in which NumPy **array slicing** differs from Python **list slicing**: in lists, slices are copies.


In [70]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [71]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[12  5]
 [ 7  6]]


In [72]:
x2_sub[0, 0] = 99  # modify the subarray (a view of x2)
print(x2_sub)

[[99  5]
 [ 7  6]]


In [73]:
print(x2)  # the original array has changed as well

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


#### Creating copies of arrays
Despite the convenience of array views, it is sometimes useful to explicitly copy the data within an array or a subarray. This is most easily done with the **copy()** method:

In [74]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


In [75]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [76]:
print(x2)  # the original array is not affected by changes to the copy

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
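The difference between views and copies can also be checked directly rather than inferred from printed values. The sketch below contrasts a list slice, an array slice, and an explicit copy; `np.shares_memory` reports whether two arrays overlap in memory. All variable names are illustrative:

```python
import numpy as np

py_list = [1, 2, 3, 4]
list_slice = py_list[:2]      # list slicing makes a copy
list_slice[0] = 99            # py_list is unchanged

arr = np.array([1, 2, 3, 4])
arr_view = arr[:2]            # array slicing returns a view
arr_copy = arr[:2].copy()     # explicit copy with its own data
arr_view[0] = 99              # arr changes too
arr_copy[1] = -1              # arr is unaffected

print(py_list)
print(arr)
print(np.shares_memory(arr, arr_view))  # True
print(np.shares_memory(arr, arr_copy))  # False
```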


#### Reshaping arrays

In [77]:
# Put the numbers 1 through 9 in a 3x3 grid.
# For this to work, the size of the initial array must match the size of the reshaped array.
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
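The size requirement can be seen by trying a shape that does not fit. This sketch also uses `-1` to let NumPy infer one dimension, and shows that reshaping a contiguous array such as this one gives a view of the original data. The names `flat` and `inferred` are ours:

```python
import numpy as np

flat = np.arange(1, 10)           # 9 elements

inferred = flat.reshape((3, -1))  # -1 asks NumPy to work out this dimension
print(inferred.shape)

try:
    flat.reshape((2, 5))          # 10 slots cannot hold 9 elements
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("reshape failed:", err)

# For this contiguous array the reshaped result is a view,
# so writing to it also changes the original.
inferred[0, 0] = 100
print(flat[0])
```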


In [78]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

array([[1, 2, 3]])

In [79]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [80]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [81]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
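Comparing shapes makes the effect of these four operations easier to see. In this short sketch, `row` and `col` are illustrative names; it also shows that `np.newaxis` is simply an alias for `None`:

```python
import numpy as np

x = np.array([1, 2, 3])
row = x[np.newaxis, :]   # new axis in front: one row, three columns
col = x[:, np.newaxis]   # new axis at the end: three rows, one column

print(x.shape, row.shape, col.shape)

# newaxis and reshape give the same results here
print(np.array_equal(row, x.reshape((1, 3))))
print(np.array_equal(col, x.reshape((3, 1))))
print(np.newaxis is None)
```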

#### Array Concatenation and Splitting

##### Array Concatenation
Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument:

In [3]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [4]:
# concatenate more than two arrays at once:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


In [5]:
# It can also be used for two-dimensional arrays:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [6]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [7]:
# For arrays of mixed dimensions, np.vstack (vertical stack) and np.hstack (horizontal stack) can be clearer to use
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [8]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
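To see why the stacking functions help with mixed dimensions, the sketch below passes the same one-dimensional and two-dimensional arrays to `np.concatenate`, which refuses them, and then reproduces the `np.vstack` result by adding the missing axis by hand. It also shows `np.dstack`, which stacks along a third axis. Variable names are illustrative:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

try:
    np.concatenate([x, grid])    # 1-D and 2-D inputs do not match
    concat_failed = False
except ValueError as err:
    concat_failed = True
    print("concatenate failed:", err)

stacked = np.vstack([x, grid])   # treats x as a single row
manual = np.concatenate([x[np.newaxis, :], grid])
print(np.array_equal(stacked, manual))

depth = np.dstack([grid, grid])  # stack along a third axis
print(depth.shape)
```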

##### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions **np.split**, **np.hsplit**, and **np.vsplit**.

In [10]:
# The second argument is a list of indices giving the split points;
# N split points lead to N + 1 subarrays.

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [13]:
grid = np.arange(16).reshape((4, 4))
print(grid)
upper, lower = np.vsplit(grid, [2])
print("upper")
print(upper)
print("lower")
print(lower)


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
upper
[[0 1 2 3]
 [4 5 6 7]]
lower
[[ 8  9 10 11]
 [12 13 14 15]]


In [14]:
left, right = np.hsplit(grid, [2])
print("left")
print(left)
print("right")
print(right)

left
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
right
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
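Besides a list of split points, `np.split` also accepts an integer, in which case the array is divided into that many equal sections. The sketch below shows both forms, along with `np.array_split`, which tolerates sections of unequal size. The names `parts`, `halves`, and `uneven` are ours:

```python
import numpy as np

x = np.arange(8)

parts = np.split(x, [3, 5])    # 2 split points give 3 subarrays
halves = np.split(x, 2)        # integer: 2 equal sections

try:
    np.split(x, 3)             # 8 elements do not divide evenly by 3
    uneven_raised = False
except ValueError as err:
    uneven_raised = True
    print("split failed:", err)

uneven = np.array_split(x, 3)  # allows sections of unequal size
print([len(p) for p in parts])
print([len(h) for h in halves])
print([len(u) for u in uneven])
```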
