# ndarray Object Internals
The NumPy ndarray provides a means to interpret a block of homogeneous data as a multidimensional array object. The data type, or dtype, determines how the data is interpreted: as floating point, integer, boolean, or any of the other types that NumPy provides. Because the ndarray is a container for homogeneous data, all of its elements must be of the same type.

Part of what makes ndarray powerful is that every array object is a strided view on a block of data. The strides are the number of bytes to step in each dimension when traversing an array. For example, a C-ordered (100, 200) array of float64 values has strides (1600, 8): moving to the next row skips 1600 bytes (200 elements of 8 bytes each), and moving to the next column skips 8 bytes.
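As a small sketch of how strides behave, the snippet below inspects the `strides` attribute of a freshly allocated array and of two views on it. The (3, 4, 5) shape is chosen only for illustration; the values shown assume the default C (row-major) layout and 8-byte float64 elements.

```python
import numpy as np

# C-ordered float64 array: each element occupies 8 bytes.
arr = np.ones((3, 4, 5), dtype=np.float64)

# Stepping along the last axis moves 8 bytes, along the middle axis
# 5 * 8 = 40 bytes, and along the first axis 4 * 5 * 8 = 160 bytes.
print(arr.strides)                   # (160, 40, 8)

# Transposing reverses the strides without touching the data.
print(arr.T.strides)                 # (8, 40, 160)

# Taking every other element on the last axis doubles that stride.
view = arr[:, :, ::2]
print(view.strides)                  # (160, 40, 16)
print(np.shares_memory(arr, view))   # True: the view reuses the same buffer
```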

# NumPy dtype Hierarchy

In [1]:
import numpy as np 
import matplotlib.pyplot as plt 
import matplotlib.animation as animation
import time
import random 
import math
import sys
import os 


ints = np.ones(10, dtype=np.int32) 
floats = np.ones(10, dtype=np.float32) 
np.issubdtype(ints.dtype, np.integer) 

True

In [2]:
np.issubdtype(floats.dtype, np.floating)   # check against the abstract parent type

True

In [3]:
np.float64.mro()     # method resolution order 

[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]
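One way the hierarchy might be used in practice is to branch on the abstract parent types rather than enumerating every concrete dtype. The helper name `describe` below is hypothetical and not part of NumPy; only `np.issubdtype` and the abstract types it is given are NumPy's own.

```python
import numpy as np

def describe(dtype):
    """Classify a dtype by its place in NumPy's type hierarchy (hypothetical helper)."""
    if np.issubdtype(dtype, np.bool_):
        return "boolean"
    if np.issubdtype(dtype, np.integer):
        return "integer"
    if np.issubdtype(dtype, np.floating):
        return "floating"
    if np.issubdtype(dtype, np.complexfloating):
        return "complex"
    return "other"

print(describe(np.dtype("int16")))     # integer
print(describe(np.float32))            # floating
print(describe(np.dtype("bool")))      # boolean
print(describe(np.complex128))         # complex
```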

# Advanced Array Manipulation 
There are many ways to work with arrays beyond fancy indexing, slicing, and simple reshaping. Although much of the heavy lifting for data analysis applications is handled by higher-level functions in pandas, it is still useful to know the lower-level array operations covered here.

# Reshaping Arrays

In [4]:
arr = np.arange(8) 

In [5]:
arr 

array([0, 1, 2, 3, 4, 5, 6, 7])

In [6]:
arr.reshape(4,2) 

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [7]:
# a multidimensional array can also be reshaped: 
arr.reshape((4,2)).reshape(2,4) 

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [8]:
# One of the passed shape dimensions can be -1, in which case the value used for that dimension will be inferred from the data: 

arr = np.arange(15) 

In [9]:
arr.reshape(5,-1) 

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [10]:
# since an array's shape attribute is a tuple, it can be passed to reshape, too: 
other_arr = np.ones((3,5)) 
other_arr.shape

(3, 5)

In [11]:
arr.reshape(other_arr.shape) 

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [12]:
# The opposite operation, going from a higher-dimensional array back to a one-dimensional array, is typically known as flattening or raveling.
arr = np.arange(15).reshape(5,3)
arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [13]:
arr.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [14]:
# ravel does not produce a copy of the underlying data if it does not have to. The flatten method behaves like ravel except that it always returns a copy of the data:
arr.flatten() 

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
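The view-versus-copy difference between `ravel` and `flatten` can be checked directly with `np.shares_memory`. This is a minimal sketch; whether `ravel` returns a view depends on the memory layout of its input, as the last line illustrates.

```python
import numpy as np

arr = np.arange(15).reshape(5, 3)

raveled = arr.ravel()       # a view here, because arr is C-contiguous
flattened = arr.flatten()   # always a copy

print(np.shares_memory(arr, raveled))     # True
print(np.shares_memory(arr, flattened))   # False

# Writing through the raveled view changes the original array,
# while the flattened copy is unaffected.
raveled[0] = 99
print(arr[0, 0])        # 99
print(flattened[0])     # 0

# For a non-contiguous input such as a transpose, ravel has to copy.
print(np.shares_memory(arr, arr.T.ravel()))   # False
```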

In [15]:
# Reshaping arrays with more than two dimensions can be a bit mind-bending. The key difference between C and Fortran order is the order in which the dimensions are walked: 

# C / row major order: traverse higher dimensions first (e.g., axis 1 before advancing on axis 0) 
# Fortran / column major order: traverse higher dimensions last (e.g., axis 0 before advancing on axis 1) 
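The two traversal orders described above can be compared on a small array. In this sketch the `order` argument of `ravel` and `reshape` selects between them; `'C'` is the default.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# C order walks along axis 1 (within a row) before advancing on axis 0.
c_order = arr.ravel()          # same as arr.ravel('C')
print(c_order)   # [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Fortran order walks along axis 0 (down a column) before advancing on axis 1.
f_order = arr.ravel('F')
print(f_order)   # [ 0  4  8  1  5  9  2  6 10  3  7 11]

# The same argument controls how reshape fills the new shape.
filled_f = np.arange(6).reshape((2, 3), order='F')
print(filled_f)
# [[0 2 4]
#  [1 3 5]]
```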

# Concatenating and Splitting Arrays 
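The relevant functions are numpy.concatenate, numpy.vstack, numpy.hstack, numpy.split, numpy.hsplit, and numpy.vsplit. numpy.concatenate takes a sequence (tuple, list, etc.) of arrays and joins them in order along the given axis: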


In [2]:
import numpy as np 
arr1 = np.arange(16).reshape((2,2,4)) 
arr2 = np.arange(16).reshape((2,2,4))  

np.concatenate([arr1, arr2], axis=0) 

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [3]:
# there are some convenience functions like vstack and hstack for common kinds of concatenation: 
np.vstack((arr1, arr2)) 

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [4]:
np.hstack((arr1, arr2)) 

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [6]:
# split, on the other hand, slices an array into multiple arrays along an axis.
# The value [1, 3] passed to np.split below gives the indices at which to split the array:
arr = np.random.randn(5,2) 

In [7]:
first, second, third = np.split(arr, [1,3]) 

In [8]:
first 

array([[ 0.12845268, -1.151601  ]])

In [9]:
second

array([[-0.79016274,  0.79550426],
       [ 0.9035132 , -0.22952522]])

In [10]:
third 

array([[-2.59653354,  1.94227996],
       [ 0.34789687,  1.39177336]])

In [25]:
# Array concatenation functions
#
# Function      Description
# ------------  ------------------------------------------------------
# concatenate   Join a sequence of arrays along an existing axis
# stack         Join a sequence of arrays along a new axis
# hstack        Stack arrays in sequence horizontally (column wise)
# vstack        Stack arrays in sequence vertically (row wise)
# dstack        Stack arrays in sequence depth wise (along third axis)
# split         Split an array into multiple sub-arrays
# hsplit        Split array along horizontal axis
# vsplit        Split array along vertical axis
# dsplit        Split array along third axis
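A few of the functions in the table can be compared by looking at the shapes they produce. This sketch uses small integer arrays chosen only for illustration; the key contrast is that `concatenate` joins along an existing axis while `stack` creates a new one.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a + 10

# concatenate keeps the number of dimensions; stack adds one.
print(np.concatenate([a, b], axis=0).shape)   # (4, 3)
print(np.stack([a, b], axis=0).shape)         # (2, 2, 3)

# dstack joins along a third axis, promoting 2-D inputs to 3-D first.
print(np.dstack([a, b]).shape)                # (2, 3, 2)

# hsplit with an index list splits columns at that index.
left, right = np.hsplit(a, [1])
print(left.shape, right.shape)                # (2, 1) (2, 2)

# vsplit with an integer splits rows into that many equal parts.
top, bottom = np.vsplit(a, 2)
print(top.shape, bottom.shape)                # (1, 3) (1, 3)
```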

# Stacking helpers: numpy.r_, numpy.c_ 

In [11]:
# There are two special objects in the NumPy namespace, r_ and c_, that make stacking arrays more concise: 
arr = np.arange(6) 

In [12]:
arr1 = arr.reshape((3,2)) 

In [13]:
arr2 = np.random.randn(3,2) 

In [14]:
np.r_[arr1, arr2] 

array([[ 0.        ,  1.        ],
       [ 2.        ,  3.        ],
       [ 4.        ,  5.        ],
       [-0.22806276, -1.11166318],
       [-1.34303997, -0.69600675],
       [-1.68851672,  0.31978364]])

In [15]:
np.c_[np.r_[arr1, arr2], arr] 

array([[ 0.        ,  1.        ,  0.        ],
       [ 2.        ,  3.        ,  1.        ],
       [ 4.        ,  5.        ,  2.        ],
       [-0.22806276, -1.11166318,  3.        ],
       [-1.34303997, -0.69600675,  4.        ],
       [-1.68851672,  0.31978364,  5.        ]])
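In addition to joining existing arrays, `r_` and `c_` accept slice notation and translate it into arrays. A brief sketch:

```python
import numpy as np

# Slices inside r_ are expanded like np.arange and joined end to end;
# scalars can be mixed in.
row = np.r_[1:4, 0, 7:9]
print(row)    # [1 2 3 0 7 8]

# Slices inside c_ become columns of a two-dimensional array.
cols = np.c_[1:6, -10:-5]
print(cols)
# [[  1 -10]
#  [  2  -9]
#  [  3  -8]
#  [  4  -7]
#  [  5  -6]]
```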

# Repeating Elements: Tile and Repeat 

The need to replicate or repeat arrays is less common with NumPy than it is with other popular array programming
languages like MATLAB. The main reason is that broadcasting, which is covered in a later section, often fulfills
this need better.

In [16]:
# The two main tools for repeating or replicating arrays to produce larger arrays are the tile and repeat functions. 

# repeat replicates each element in an array some number of times, producing a larger array 
arr = np.arange(3) 

In [17]:
arr.repeat(3)

array([0, 0, 0, 1, 1, 1, 2, 2, 2])

In [18]:
# By default, if you pass an integer, each element will be repeated that number of times. If you pass an array of integers,
# each element can be repeated a different number of times: 
arr.repeat([2,3,4]) 

array([0, 0, 1, 1, 1, 2, 2, 2, 2])

In [19]:
# Multidimensional arrays can have their elements repeated along a particular axis: 
arr = np.random.randn(2,2) 

In [20]:
arr 

array([[-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004]])

In [21]:
arr.repeat(2, axis=0)

array([[-0.56748745, -0.17620055],
       [-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004],
       [ 0.28779426,  0.35345004]])

In [22]:
# Note that if no axis is passed, the array will be flattened first, which is likely not what you want.
# Similarly you can pass an array of integers when repeating a multidimensional array to repeat a given slice a different number of times:
arr.repeat([2,3], axis=0) 

array([[-0.56748745, -0.17620055],
       [-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004],
       [ 0.28779426,  0.35345004],
       [ 0.28779426,  0.35345004]])

In [23]:
arr.repeat([2,3], axis=1) 

array([[-0.56748745, -0.56748745, -0.17620055, -0.17620055, -0.17620055],
       [ 0.28779426,  0.28779426,  0.35345004,  0.35345004,  0.35345004]])

In [24]:
# tile, on the other hand, is a shortcut for stacking copies of an array along an axis.
# Visually, you can think of it as "laying down tiles":
arr 

array([[-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004]])

In [25]:
np.tile(arr, 2) 

array([[-0.56748745, -0.17620055, -0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004,  0.28779426,  0.35345004]])

In [26]:
# The second argument is the number of tiles. With a scalar, the copies are laid side by side along the last axis
# (each row is extended), rather than stacked on top of one another.
# The second argument to tile can also be a tuple indicating the layout of the "tiling":
arr

array([[-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004]])

In [27]:
np.tile(arr, (2,1))

array([[-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004],
       [-0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004]])

In [28]:
np.tile(arr, (3,2)) 

array([[-0.56748745, -0.17620055, -0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004,  0.28779426,  0.35345004],
       [-0.56748745, -0.17620055, -0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004,  0.28779426,  0.35345004],
       [-0.56748745, -0.17620055, -0.56748745, -0.17620055],
       [ 0.28779426,  0.35345004,  0.28779426,  0.35345004]])
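To make the contrast between the two functions easier to see, this sketch applies `repeat` and `tile` to the same small integer array (chosen only for readability): `repeat` duplicates individual elements in place, while `tile` duplicates the whole array as a block.

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])

# repeat: each element is duplicated next to itself along axis 1.
repeated = np.repeat(arr, 2, axis=1)
print(repeated)
# [[1 1 2 2]
#  [3 3 4 4]]

# tile with a scalar: whole-array copies are placed side by side.
tiled = np.tile(arr, 2)
print(tiled)
# [[1 2 1 2]
#  [3 4 3 4]]

# tile with a tuple: 2 copies down, 1 copy across.
tiled_down = np.tile(arr, (2, 1))
print(tiled_down)
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]
```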

# Fancy Indexing Equivalents: Take and Put 

In [29]:
arr = np.arange(10) * 100 

In [30]:
inds = [7,1,2,6] 

In [31]:
arr[inds] 

array([700, 100, 200, 600])

In [32]:
# There are alternate ndarray methods that are useful in the special case of only making a selection on a single axis: 
arr.take(inds)  

array([700, 100, 200, 600])

In [33]:
arr.put(inds, 42) 

In [34]:
arr

array([  0,  42,  42, 300, 400, 500,  42,  42, 800, 900])

In [35]:
arr.put(inds, [40,41,42,43]) 

In [36]:
arr 

array([  0,  41,  42, 300, 400, 500,  43,  40, 800, 900])

In [37]:
# To use "take" along other axes, you can pass the axis keyword: 
inds = [2,0,2,1] 

In [38]:
arr = np.random.randn(2,4) 

In [39]:
arr 

array([[-0.07590108,  0.19275259, -1.19574056,  0.4705214 ],
       [ 1.2183366 ,  1.27771594,  0.29463128,  1.28430626]])

In [40]:
arr.take(inds, axis=1) 

array([[-1.19574056, -0.07590108, -1.19574056,  0.19275259],
       [ 0.29463128,  1.2183366 ,  0.29463128,  1.27771594]])

In [41]:
# "put" does not accept an axis argument but rather indexes into the flattened (one-dimensional, C order)
# version of the array. Thus, when you need to set elements using an index array on other axes, you will want to use fancy indexing.

# For selecting along an axis, take and fancy indexing return the same result; the cells below time both on a larger array:
arr = np.random.randn(1000, 50)

In [42]:
# Random sample of 500 rows 
inds = np.random.permutation(1000)[:500] 

In [43]:
%timeit arr[inds] 

12.4 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [44]:
%timeit  arr.take(inds, axis=0)  

6.49 µs ± 44.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
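The point about `put` indexing into the flattened array can be seen on a small two-dimensional example. This is a minimal sketch with made-up values; the second half shows the fancy-indexing assignment that sets values along a chosen axis instead.

```python
import numpy as np

# put treats the array as one-dimensional (C order): in a (2, 4) array,
# flat index 1 is position (0, 1) and flat index 5 is position (1, 1).
arr = np.zeros((2, 4), dtype=int)
arr.put([1, 5], [10, 50])
print(arr)
# [[ 0 10  0  0]
#  [ 0 50  0  0]]

# To set whole columns selected by an index list, use fancy indexing.
# Column 2 receives 7 and column 0 receives 9 in every row.
arr2 = np.zeros((2, 4), dtype=int)
arr2[:, [2, 0]] = [7, 9]
print(arr2)
# [[9 0 7 0]
#  [9 0 7 0]]
```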


# Broadcasting 
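Broadcasting describes how arithmetic works between arrays of different shapes. It is a very powerful feature, but one that can be easily misunderstood, even by experienced users. The simplest example of broadcasting occurs when combining a scalar value with an array, where the scalar is applied to every element. The example below goes one step further and demeans each column of an array by subtracting the column means: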

In [45]:
arr = np.random.randn(4,3)  

In [46]:
arr.mean(0)

array([-0.99606653,  0.0205905 , -0.02063102])

In [47]:
demeaned = arr - arr.mean(0) 

In [48]:
demeaned 

array([[ 0.04487455,  0.28330057, -0.9099914 ],
       [-0.06929066,  0.20439684, -0.49619636],
       [-0.04290141, -0.56850213,  0.98554693],
       [ 0.06731752,  0.08080473,  0.42064083]])

In [49]:
# See the Broadcasting figure linked below for an illustration of this kind of operation.
# Demeaning the rows as a broadcast operation requires a bit more care.
# Fortunately, broadcasting potentially lower-dimensional values across any dimension of an array
# (like subtracting the row means from each column of a two-dimensional array) is possible as long as you follow the rules.

[Broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)  
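The rule in question is that two arrays are compatible for broadcasting if, comparing shapes from the trailing dimension backwards, each pair of axis lengths either matches or one of them is 1. A sketch of row demeaning under that rule follows; the seeded generator is used only so the example is repeatable and is not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only for repeatability
arr = rng.standard_normal((4, 3))

row_means = arr.mean(1)             # shape (4,)

# Shapes (4, 3) and (4,) do not line up on the trailing axis (3 vs 4),
# so subtracting the row means directly fails.
try:
    arr - row_means
    failed = False
except ValueError:
    failed = True
print(failed)                       # True

# Reshaping the means to (4, 1) makes the trailing axis length 1,
# which broadcasts across the 3 columns.
demeaned = arr - row_means.reshape((4, 1))
print(np.allclose(demeaned.mean(1), 0))   # True

# keepdims=True produces the (4, 1) shape without an explicit reshape.
demeaned_alt = arr - arr.mean(1, keepdims=True)
print(np.allclose(demeaned, demeaned_alt))  # True
```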

# Setting Array Values by Broadcasting 
The same broadcasting rule that governs arithmetic operations also applies to setting values via array indexing. In the simplest case, a scalar value is broadcast to the entire selection.

In [50]:
arr = np.zeros((4,4)) 

In [51]:
arr[:] = 5

In [52]:
arr 

array([[5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.]])
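Beyond a scalar, a one-dimensional array can be assigned into the columns of a two-dimensional array as long as the shapes are compatible. The values below are arbitrary and serve only as an illustration.

```python
import numpy as np

arr = np.zeros((4, 3))
col = np.array([1.28, -0.42, 0.44, 1.6])

# col has shape (4,); adding a new axis gives (4, 1), which broadcasts
# across the 3 columns so that every column receives the same values.
arr[:] = col[:, np.newaxis]
print(arr)
# [[ 1.28  1.28  1.28]
#  [-0.42 -0.42 -0.42]
#  [ 0.44  0.44  0.44]
#  [ 1.6   1.6   1.6 ]]

# The same works on a slice: a (2, 1) value fills the first two rows.
arr[:2] = [[-1.37], [0.509]]
print(arr[:2])
# [[-1.37  -1.37  -1.37 ]
#  [ 0.509  0.509  0.509]]
```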