# Introduction and 4.1 The NumPy ndarray: A Multidimensional Array Object

---

## Introduction

- Key features of NumPy:
    - ndarray: an efficient multidimensional array.
    - Mathematical functions for fast operations on entire arrays without writing loops.
    - Tools for writing and reading array data.
    - Linear algebra, random number generator, and Fourier transform capabilities.
    - A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
- For data analysis applications, the main areas of functionality are:
    - Vectorized operations for data munging and cleaning, subsetting and filtering, transformation and any other kinds of computations.
    - Common array algorithms like sorting, unique, and set operations.
    - Efficient descriptive statistics and aggregating/summarizing data.
    - Data alignment and relational data manipulations for merging and joining together heterogeneous datasets.
    - Expressing conditional logic as array expressions instead of loops with if-elif-else branches.
    - Group-wise data manipulations (aggregation, transformation, function application).
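- Reasons why NumPy is so important for numerical computations:
    - It stores data internally in a contiguous block of memory, independent of other built-in Python objects.
    - Its algorithms are written in C and operate on that memory without type checking or other overhead, and arrays use much less memory than built-in Python sequences.
    - Complex computations can be performed on entire arrays without Python for loops.
    - NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts.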
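A rough way to see the speed difference is to time the same computation written as a Python list comprehension and as a vectorized NumPy expression. This is a minimal sketch: the array size and repeat count are arbitrary choices, and the measured ratio will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Vectorized: a single expression; the loop runs in compiled code
vec_time = timeit.timeit(lambda: my_arr * 2, number=10)
# Pure Python: an explicit loop over a list
loop_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {vec_time:.4f}s, list comprehension: {loop_time:.4f}s")
```

Both versions produce the same values; only the execution model differs.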

In [1]:
import numpy as np

---

# 4.1 The NumPy ndarray: A Multidimensional Array Object

In [2]:
data = np.random.randn(2, 3)  # 2x3 array of standard normal random numbers

In [3]:
data

array([[-0.91237121,  0.59818067,  0.40473221],
       [ 0.34348958,  1.99815007,  0.34985759]])

An ndarray is a generic multidimensional container for homogeneous data: all of its elements must be of the same type.
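Every array also carries a `shape` (a tuple with the size of each dimension), an `ndim` (the number of dimensions), and a `dtype` (the single element type). A small sketch with arbitrary values:

```python
import numpy as np

data = np.array([[1.5, -0.1, 3.0], [0.0, -3.0, 6.5]])

print(data.shape)  # (2, 3): 2 rows, 3 columns
print(data.ndim)   # 2
print(data.dtype)  # float64

# Mixed ints and floats are upcast so that every element has one type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```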

### Creating ndarrays

In [4]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [5]:
np.empty((2, 3, 2))  # uninitialized memory: the values below are arbitrary leftovers, not zeros

array([[[1.08462507e-311, 3.16202013e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.11260619e-306, 2.17833486e-076]],

       [[1.71964390e+184, 9.95130870e-043],
        [4.85799839e-033, 9.97560467e-047],
        [1.34972671e+161, 3.64070289e+175]]])

In [6]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
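Besides `np.zeros`, `np.empty`, and `np.arange`, NumPy has several other constructors. A few common ones are sketched below; the shapes and fill values are arbitrary examples.

```python
import numpy as np

# Convert a nested Python list into a 2D array (dtype inferred as integer)
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

ones = np.ones((2, 3))         # all 1.0, shape (2, 3)
filled = np.full((2, 2), 7.0)  # every element set to 7.0
eye = np.eye(3)                # 3x3 identity matrix
stepped = np.arange(0, 10, 2)  # start, stop, step like range(): [0 2 4 6 8]
like = np.zeros_like(arr2)     # zeros with the same shape and dtype as arr2
```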

### Data Types for ndarrays

The data type or dtype is a special object containing the information (or metadata, data about data) that the ndarray needs to interpret a chunk of memory as a particular type of data.

In [7]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

In [8]:
arr = np.array([1, 2, 3, 4, 5])

In [9]:
float_arr = arr.astype(np.float64)  # astype always returns a new array (a copy)

In [10]:
arr.dtype  # the default integer dtype is platform-dependent: int32 here, int64 on most Linux/macOS systems

dtype('int32')

In [14]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the same type

In [16]:
numeric_strings.dtype

dtype('S4')
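A typical next step is converting those byte strings to numbers with `astype`. The sketch below also shows that casting floats to integers truncates toward zero rather than rounding; the float values are arbitrary examples.

```python
import numpy as np

numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
# astype parses each byte string and returns a new float64 array
as_float = numeric_strings.astype(float)

floats = np.array([3.7, -1.2, -2.6, 0.5])
# Float -> int drops the fractional part (truncation toward zero)
as_int = floats.astype(np.int32)

# Even a cast to the same dtype produces a separate copy
same = floats.astype(floats.dtype)
```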

### Arithmetic with NumPy Arrays

Vectorization: arrays let you express batch operations on data
without writing any for loops. Arithmetic between arrays of equal size is applied element-wise.

In [17]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [18]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [19]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

In [20]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])
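Arithmetic with a scalar propagates the scalar to every element, and operations between equal-shape arrays pair up elements position by position. A short sketch reusing the same `arr` values as above:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

reciprocal = 1 / arr  # the scalar is applied to every element
roots = arr ** 0.5    # element-wise square root
diff = arr - arr      # element-wise subtraction: all zeros
```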

### Basic Indexing and Slicing

In [21]:
arr = np.arange(10)

In [22]:
arr[5:8]

array([5, 6, 7])

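Array slices are views on the original array, not copies: the data is not duplicated, and any modification made through the view is reflected in the source array.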
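To see the view behavior directly, the sketch below writes through a slice and then takes an explicit `.copy()`, which is the usual way to get an independent array.

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[5:8]

# Assigning a scalar to a slice broadcasts it to every element of the slice
arr[5:8] = 12
# Writing through the view changes the original array as well
arr_slice[1] = 12345

# An explicit copy does not share memory with the original
arr_copy = arr[5:8].copy()
arr_copy[:] = 0

print(arr.tolist())  # [0, 1, 2, 3, 4, 12, 12345, 12, 8, 9]
```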

In [23]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [25]:
arr2d[2]

array([7, 8, 9])

In [26]:
arr2d[0][2]

3

In [27]:
arr2d[0, 2]

3

Think of axis 0 as the rows of the array and axis 1 as the columns.

In [28]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

In [29]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [30]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])
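With a 3D array, each index you supply removes one axis: `arr3d[0]` is a 2x3 array, and `arr3d[1, 0]` is a 1D row. The sketch below also assigns a scalar to a whole sub-block and then restores it from a saved copy (the saved copy matters because `arr3d[0]` by itself is only a view).

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

old_values = arr3d[0].copy()  # save a copy; arr3d[0] alone would be a view
arr3d[0] = 42                 # a scalar fills the whole 2x3 block
was_filled = bool((arr3d[0] == 42).all())
arr3d[0] = old_values         # an array of matching shape can be assigned back

row = arr3d[1, 0]             # block 1 along axis 0, row 0 along axis 1
```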

#### Indexing with slices

In [31]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [32]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [33]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])
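Slices can be mixed with integer indices. An integer drops its axis from the result, while a slice keeps it, so `arr2d[:, :1]` stays 2D but `arr2d[:, 0]` is 1D. Assigning to a slice section writes into the array; the sketch does that on a copy so `arr2d` itself is left unchanged.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

second_row_part = arr2d[1, :2]  # integer + slice: 1D result [4, 5]
first_col_2d = arr2d[:, :1]     # slice only: shape (3, 1)
first_col_1d = arr2d[:, 0]      # integer index drops the axis: shape (3,)

zeroed = arr2d.copy()
zeroed[:2, 1:] = 0              # assign a scalar to a slice section
```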

### Boolean Indexing

In [34]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [35]:
data = np.random.randn(7, 4)

In [36]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [37]:
data

array([[-1.85771575, -0.85349372,  0.02314233,  1.55690159],
       [ 0.03833559, -0.87238976, -0.38278167,  0.92207937],
       [-1.34515549,  0.26818529, -1.189436  ,  1.25465958],
       [-0.27409881,  0.05398741, -0.16874392,  1.49071091],
       [-1.29974273, -1.56243307, -1.80774218,  0.21844052],
       [ 0.10070541,  0.11370125, -1.417667  ,  0.45048904],
       [ 0.07312458, -1.61666271,  1.39146888, -0.86663   ]])

In [38]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [39]:
data[names == 'Bob']

array([[-1.85771575, -0.85349372,  0.02314233,  1.55690159],
       [-0.27409881,  0.05398741, -0.16874392,  1.49071091]])

In [40]:
data.shape

(7, 4)

In [41]:
data[names == 'Bob', 2:]

array([[ 0.02314233,  1.55690159],
       [-0.16874392,  1.49071091]])

In [42]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [43]:
data[~(names == 'Bob')]

array([[ 0.03833559, -0.87238976, -0.38278167,  0.92207937],
       [-1.34515549,  0.26818529, -1.189436  ,  1.25465958],
       [-1.29974273, -1.56243307, -1.80774218,  0.21844052],
       [ 0.10070541,  0.11370125, -1.417667  ,  0.45048904],
       [ 0.07312458, -1.61666271,  1.39146888, -0.86663   ]])

In [44]:
mask = (names == 'Bob') | (names == 'Will')

In [45]:
mask

array([ True, False,  True,  True,  True, False, False])

In [46]:
data[mask]

array([[-1.85771575, -0.85349372,  0.02314233,  1.55690159],
       [-1.34515549,  0.26818529, -1.189436  ,  1.25465958],
       [-0.27409881,  0.05398741, -0.16874392,  1.49071091],
       [-1.29974273, -1.56243307, -1.80774218,  0.21844052]])
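Conditions are combined with the element-wise operators `&` (and), `|` (or), and `~` (not). The Python keywords `and` and `or` do not work here, because they try to reduce each whole array to a single truth value. The parentheses also matter, since `|` binds more tightly than `==`. A minimal sketch with the same `names` array:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

either = (names == 'Bob') | (names == 'Will')  # element-wise OR
neither = ~either                              # element-wise NOT

raised = False
try:
    (names == 'Bob') or (names == 'Will')
except ValueError:
    # bool() of a multi-element array is ambiguous, so `or` cannot be used
    raised = True
```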

Selecting data from an array by boolean indexing always creates a copy of the data,
even if the returned array is unchanged.
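The sketch below contrasts the two directions: editing the result of a boolean selection leaves the source untouched, whereas assigning through a mask (as in the next cells) writes into the source. The small `data` array here is an arbitrary example.

```python
import numpy as np

data = np.arange(1, 13, dtype=float).reshape((3, 4))
mask = np.array([True, False, True])

selected = data[mask]  # new array holding rows 0 and 2
selected[:] = -1       # changes only the copy
row0_after_copy_edit = data[0].tolist()  # still [1.0, 2.0, 3.0, 4.0]

data[mask] = 0         # assignment through a mask modifies data in place
```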

In [47]:
data

array([[-1.85771575, -0.85349372,  0.02314233,  1.55690159],
       [ 0.03833559, -0.87238976, -0.38278167,  0.92207937],
       [-1.34515549,  0.26818529, -1.189436  ,  1.25465958],
       [-0.27409881,  0.05398741, -0.16874392,  1.49071091],
       [-1.29974273, -1.56243307, -1.80774218,  0.21844052],
       [ 0.10070541,  0.11370125, -1.417667  ,  0.45048904],
       [ 0.07312458, -1.61666271,  1.39146888, -0.86663   ]])

In [48]:
data[data < 0] = 0  # assigning through a boolean mask modifies data in place

In [49]:
data

array([[0.        , 0.        , 0.02314233, 1.55690159],
       [0.03833559, 0.        , 0.        , 0.92207937],
       [0.        , 0.26818529, 0.        , 1.25465958],
       [0.        , 0.05398741, 0.        , 1.49071091],
       [0.        , 0.        , 0.        , 0.21844052],
       [0.10070541, 0.11370125, 0.        , 0.45048904],
       [0.07312458, 0.        , 1.39146888, 0.        ]])

In [50]:
data[names != 'Joe'] = 7

In [51]:
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.03833559, 0.        , 0.        , 0.92207937],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.10070541, 0.11370125, 0.        , 0.45048904],
       [0.07312458, 0.        , 1.39146888, 0.        ]])

### Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.

In [52]:
arr = np.empty((8, 4))

In [53]:
for i in range(8):
    arr[i] = i

In [54]:
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [55]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [56]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [59]:
arr = np.arange(32).reshape((8, 4))

In [60]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [61]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

In [62]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new
array.
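Passing two index lists selects paired elements `(1, 0), (5, 3), (7, 1), (2, 2)`, which is why `In [61]` returns a 1D result. To select the full rectangular region instead, the two-step form from `In [62]` works, and so does `np.ix_`, which builds an open mesh from the two lists. The sketch also confirms that the result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

elements = arr[rows, cols]        # paired indices: one element per pair
block = arr[np.ix_(rows, cols)]   # every row crossed with every column: 4x4

block[0, 0] = -100                # fancy indexing copied, so arr is unchanged
```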

### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that returns a view on the underlying data without copying anything.

In [63]:
arr = np.arange(15).reshape((3, 5))

In [64]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [65]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [66]:
arr = np.random.randn(6, 3)

In [67]:
np.dot(arr.T, arr)  # inner matrix product X^T X, shape (3, 3)

array([[ 9.63973425, -3.92935035, -3.19738446],
       [-3.92935035,  3.34360779,  1.86845484],
       [-3.19738446,  1.86845484,  4.40146203]])
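The `@` operator computes the same matrix product as `np.dot` for 2D arrays. This sketch uses a seeded `np.random.default_rng` generator instead of the legacy `np.random.randn` (a choice made here for reproducibility, not something the cells above use), and also checks that `X^T X` is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded generator for reproducibility
arr = rng.standard_normal((6, 3))

gram_dot = np.dot(arr.T, arr)      # (3, 6) times (6, 3) -> (3, 3)
gram_at = arr.T @ arr              # same product with the @ operator
```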

In [68]:
arr = np.arange(16).reshape((2, 2, 4))

In [69]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [70]:
arr.transpose((1, 0, 2))  # new axis order: old axis 1 first, then old axis 0, then old axis 2

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
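One way to read `transpose((1, 0, 2))` is as an index mapping: element `t[i, j, k]` of the result is `arr[j, i, k]` of the original. The explicit loop below is only there to check that mapping, not as a recommended way to work with arrays.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
t = arr.transpose((1, 0, 2))

# Check the index mapping element by element
for i in range(2):
    for j in range(2):
        for k in range(4):
            assert t[i, j, k] == arr[j, i, k]
```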

In [71]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [72]:
arr.swapaxes(1, 2)  # exchange axes 1 and 2

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
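`swapaxes` is a shorthand for a transpose that exchanges exactly two axes, so `arr.swapaxes(1, 2)` matches `arr.transpose((0, 2, 1))`. A quick check:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)         # shape (2, 4, 2)
permuted = arr.transpose((0, 2, 1))  # the same permutation written explicitly
```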

---

# IMPORTANT POINTS

- Basic indexing and slicing return views (indexing down to a single element returns a scalar).
- Boolean masking and fancy indexing return copies.
- Transposing returns a view.
- swapaxes returns a view.
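These rules can be checked with `np.shares_memory`, which reports whether two arrays overlap in memory. A minimal sketch on an arbitrary 3x4 array:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

views = {
    "slice": arr[1:, :2],
    "transpose": arr.T,
    "swapaxes": arr.swapaxes(0, 1),
}
copies = {
    "boolean mask": arr[arr > 5],
    "fancy indexing": arr[[0, 2]],
}

for name, result in {**views, **copies}.items():
    print(f"{name:15} shares memory: {np.shares_memory(arr, result)}")
```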