# NumPy beginner tutorial
## This tutorial closely follows the [NumPy quickstart](https://numpy.org/devdocs/user/quickstart.html) from the official NumPy site.

In [2]:
# imports
import numpy as np
import matplotlib.pyplot as plt
rg = np.random.default_rng(1)  # create instance of default random number generator
%matplotlib inline

### Basics

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called `axes`.

In [2]:
# for example: the array has 2 axes. The first axis has a length of 2, the second axis has a length of 3.
[[1., 0., 0.],
 [0., 1., 0.]]

[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

NumPy’s array class is called `ndarray`. It is also known by the alias `array`. Note that `numpy.array` **is not** the same as the Standard Python Library class `array.array`, which only handles one-dimensional arrays and offers less functionality.
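A small sketch of that difference, using only the standard-library `array` module and NumPy. The variable names `py_arr` and `np_arr` are illustrative and not part of the original tutorial:

```python
import array
import numpy as np

# Standard-library array: one-dimensional, typed by a single type code
py_arr = array.array('d', [1.0, 2.0, 3.0])

# NumPy ndarray: can be multidimensional and supports elementwise arithmetic
np_arr = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(type(py_arr))   # <class 'array.array'>
print(type(np_arr))   # <class 'numpy.ndarray'>
print(np_arr.ndim)    # 2
```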

The more important attributes of an ndarray object are:

***

> `ndarray.ndim`

the number of axes (dimensions) of the array.

> `ndarray.shape`

the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, `shape` will be `(n,m)`. The length of the `shape` tuple is therefore the number of axes, `ndim`.

> `ndarray.size`

the total number of elements of the array. This is equal to the product of the elements of `shape`.

> `ndarray.dtype`

an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. **Additionally**, NumPy provides types of its own: **`numpy.int32`, `numpy.int16`, and `numpy.float64` are some examples**.

> `ndarray.itemsize`

the size in bytes of each element of the array. For example, an array of elements of type `float64` has `itemsize` 8 (=64/8), while one of type `int32` has `itemsize` 4 (=32/8). It is equivalent to `ndarray.dtype.itemsize`.

> `ndarray.data`

the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.
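The relationships between these attributes can be checked directly. The following is a minimal sketch; the array `x` and the helper variable names are made up for illustration:

```python
import numpy as np

x = np.zeros((3, 5), dtype=np.float64)

ndim_matches = x.ndim == len(x.shape)              # number of axes
size_matches = x.size == int(np.prod(x.shape))     # 3 * 5 elements
itemsize_matches = x.itemsize == x.dtype.itemsize  # 8 bytes for float64
total_bytes = x.size * x.itemsize                  # same value as x.nbytes

print(ndim_matches, size_matches, itemsize_matches, total_bytes)
# True True True 120
```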


In [3]:
lst = np.arange(15).reshape(3, 5)
lst

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [4]:
lst.shape

(3, 5)

In [5]:
lst.size

15

In [6]:
lst.dtype

dtype('int64')

In [7]:
lst.itemsize

8

In [8]:
lst.data

<memory at 0x7fb932c29790>

In [9]:
type(lst)

numpy.ndarray

In [10]:
# create numpy array
basic_lst = [1, 2, 3, 4]
own_lst = np.array(basic_lst)

own_lst

array([1, 2, 3, 4])

In [11]:
type(own_lst)

numpy.ndarray

### Array Creation

In [12]:
# create array using np.array function
int_lst = np.array([1, 2, 3])
int_lst

array([1, 2, 3])

In [13]:
int_lst.dtype

dtype('int64')

In [14]:
float_lst = np.array([1., 2., 3.])
float_lst

array([1., 2., 3.])

In [15]:
float_lst.dtype

dtype('float64')

A frequent error is calling `array` with multiple arguments rather than providing a single sequence as an argument.

In [16]:
# WRONG
np.array(1, 2, 3, 4)

TypeError: array() takes from 1 to 2 positional arguments but 4 were given

In [17]:
# RIGHT
np.array([1, 2, 3, 4, 5])

array([1, 2, 3, 4, 5])

`array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

In [18]:
# from a list of tuples
lst_2dim = np.array([(1, 2, 3), (4, 5, 6)])
lst_2dim

array([[1, 2, 3],
       [4, 5, 6]])

In [19]:
# from a list of lists
lst_2dim = np.array([[1, 2, 3], [4, 5, 6]])
lst_2dim

array([[1, 2, 3],
       [4, 5, 6]])

In [20]:
lst_2dim.ndim

2
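The same rule extends to deeper nesting. As a short sketch (the name `lst_3dim` is chosen here to match the notebook's naming and is otherwise arbitrary), three levels of nesting give three axes:

```python
import numpy as np

# three levels of nesting -> three axes
lst_3dim = np.array([[[1, 2], [3, 4]],
                     [[5, 6], [7, 8]]])

print(lst_3dim.ndim)   # 3
print(lst_3dim.shape)  # (2, 2, 2)
```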

In [21]:
# The type of the array can also be explicitly specified at creation time:
complex_lst = np.array([1, 2, 3, 4], dtype=complex)
complex_lst

array([1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j])

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

> `numpy.zeros(shape, dtype=float, order='C', *, like=None)`

Return a new array of given shape and type, filled with zeros.

> `numpy.ones(shape, dtype=None, order='C', *, like=None)`

Return a new array of given shape and type, filled with ones.

> `numpy.empty(shape, dtype=float, order='C', *, like=None)`

Return a new array of given shape and type, without initializing entries **(the initial content is arbitrary and depends on the state of the memory)**.


_By default, the `dtype` of the created array is `float64`, but it can be specified via the keyword argument `dtype`._

In [27]:
np.zeros([2, 3])
# or
# np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [28]:
np.ones([3, 4])
# or
# np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [33]:
np.empty([2, 4])

array([[4.65328105e-310, 0.00000000e+000, 6.93833211e-310,
        6.93838078e-310],
       [6.93838077e-310, 6.93838078e-310, 6.93835596e-310,
        6.93838036e-310]])

In [41]:
# with numpy type
np.empty((2, 4), dtype=np.int8)

array([[-1, -1, -1, -1],
       [-1, -1, -1, -1]], dtype=int8)

In [53]:
# with python type
np.empty([2, 5], dtype=int)

array([[     94183456056560,                   0,                   0,
                          0,                   0],
       [7076336329807914035, 3617566314198085933, 3257002151774073654,
        3618752481072527665,        521388909921]])

To create sequences of numbers, NumPy provides the `arange` function which is analogous to the Python built-in `range`, but returns an array.

> `numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)`

Return evenly spaced values within a given interval.

Values are generated within the half-open interval **[start, stop)** (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list.

**When using a non-integer step, such as 0.1, the number of elements in the result can be inconsistent because of floating-point rounding. It is better to use** `numpy.linspace` **for these cases.**

> `numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)`

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval **[start, stop] when endpoint=True, or [start, stop) when endpoint=False**.

In [56]:
np.arange(1, 16)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [57]:
np.arange(1, 15, 3)

array([ 1,  4,  7, 10, 13])

In [84]:
# RISKY
# a non-integer step is allowed, but floating-point rounding can make
# the number of returned elements hard to predict
np.arange(1, 5, 0.1)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9])

In [85]:
# GOOD
# with endpoint=False
# [start, stop) 

np.linspace(1, 5, 40, endpoint=False)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9])

In [86]:
# with endpoint=True
# [start, stop] 

np.linspace(1, 5, 41)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9, 5. ])
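To see why a floating-point step is fragile, here is a hedged sketch. The interval `1` to `1.3` is chosen for illustration; on typical IEEE-754 platforms `arange` may return four elements here instead of the expected three, because its length is derived from a rounded division:

```python
import numpy as np

# The length of an arange result is computed as ceil((stop - start) / step),
# so floating-point rounding decides whether a value near `stop` is included.
risky = np.arange(1, 1.3, 0.1)
print(len(risky), risky)

# linspace takes the number of samples explicitly, so the length is fixed.
safe = np.linspace(1, 1.3, 3, endpoint=False)
print(len(safe), safe)
```

With `linspace` the element count is an input rather than a by-product of rounding, which is why it is the safer choice for non-integer spacing.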

### Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

* the last axis is printed from left to right,

* the second-to-last is printed from top to bottom,

* the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, bidimensionals as matrices and tridimensionals as lists of matrices.

In [88]:
# 1d array
lst_1d = np.arange(1, 6)
print(lst_1d)

[1 2 3 4 5]


In [92]:
# 2d array
lst_2d = np.arange(12).reshape(4, 3)
print(lst_2d)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


In [94]:
# 3d array
lst_3d = np.arange(24).reshape(2, 3, 4)
print(lst_3d)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:

In [97]:
big_lst = np.arange(10000).reshape(100,100)
big_lst

array([[   0,    1,    2, ...,   97,   98,   99],
       [ 100,  101,  102, ...,  197,  198,  199],
       [ 200,  201,  202, ...,  297,  298,  299],
       ...,
       [9700, 9701, 9702, ..., 9797, 9798, 9799],
       [9800, 9801, 9802, ..., 9897, 9898, 9899],
       [9900, 9901, 9902, ..., 9997, 9998, 9999]])

To disable this behaviour and force NumPy to print the entire array, you can change the printing options using `set_printoptions`.

In [98]:
# np.set_printoptions(threshold=sys.maxsize)  # sys module should be imported
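As an alternative to changing the options globally, the sketch below uses the `np.printoptions` context manager to lift the threshold temporarily. It assumes the default print options are in effect beforehand (summarization above 1000 elements); `big`, `summarized` and `full` are illustrative names:

```python
import sys
import numpy as np

big = np.arange(2000)

# default options: arrays with more than 1000 elements are summarized
summarized = np.array2string(big)

# temporarily lift the threshold; the previous options are restored afterwards
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(big)

print('...' in summarized)  # True
print('...' in full)        # False
```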

### Basic Operations

Arithmetic operators on arrays apply *elementwise*. A new array is created and filled with the result.



In [101]:
a = np.array([10, 20, 30, 40])
b = np.arange(4)

print(a - b)
print(b ** 2)
print(10 * np.sin(a))
print(a < 21)

[10 19 28 37]
[0 1 4 9]
[-5.44021111  9.12945251 -9.88031624  7.4511316 ]
[ True  True False False]


In [108]:
# important part with product
A = np.array([[1, 1],
              [0, 1]])

B = np.array([[2, 0],
             [3, 4]])
print('elementwise')
print(A * B)    # elementwise product
print('\nmatrix')
print(A @ B)    # matrix product
print('\nmatrix')
print(A.dot(B)) # another matrix product



elementwise
[[2 0]
 [0 4]]

matrix
[[5 4]
 [3 4]]

matrix
[[5 4]
 [3 4]]


Some operations, such as `+=` and `*=`, act in place to **modify an existing array** rather than create a new one.

In [125]:
a = np.ones([3, 4], dtype=int)
b = np.empty([3, 4], dtype=np.float16)

In [126]:
print('before:\n', a)
a *= 3
print('\nafter:\n', a)

before:
 [[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]

after:
 [[3 3 3 3]
 [3 3 3 3]
 [3 3 3 3]]


In [127]:
print('before:\n', b)
b += a
print('\nafter:\n', b)

before:
 [[ 9.     9.     9.    10.984]
 [ 9.     9.     9.    10.99 ]
 [ 9.     9.     9.    11.   ]]

after:
 [[12.    12.    12.    13.984]
 [12.    12.    12.    13.99 ]
 [12.    12.    12.    14.   ]]


In [128]:
a += b  # b is not automatically converted to integer type

UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
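The error arises because an in-place operation must store its result in the existing array's dtype. A minimal sketch of the contrast with an ordinary binary operation, which creates a new, upcast array (the arrays here are small stand-ins for the ones above):

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
b = np.full((2, 3), 0.5)      # float64

try:
    a += b                    # in place: result must fit a's integer dtype
except TypeError as err:      # UFuncTypeError is a subclass of TypeError
    print('in-place add failed:', type(err).__name__)

c = a + b                     # new array: upcast to the more general dtype
print(c.dtype)                # float64
print(a.dtype.kind)           # i (a is still an integer array)
```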

In [138]:
a = np.ones(3, dtype=np.int32)
b = np.linspace(0, np.pi, 3)
b.dtype.name

'float64'

In [140]:
c = a + b
c

array([1.        , 2.57079633, 4.14159265])

In [141]:
c.dtype.name

'float64'

In [143]:
d = np.exp(c * 1j)
d

array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])

In [148]:
d.dtype.name

'complex128'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

In [149]:
lst = np.empty([3, 5], dtype=np.float16)

In [152]:
lst

array([[ 7.568e-03, -1.526e+01,  9.050e+01,  0.000e+00,  0.000e+00],
       [ 0.000e+00,  0.000e+00,  0.000e+00,  3.877e-01,  1.311e-01],
       [ 3.159e-06,  0.000e+00,  3.874e-06,  0.000e+00,  0.000e+00]],
      dtype=float16)

In [150]:
lst.min()

-15.26

In [151]:
lst.max()

90.5

In [153]:
lst.sum()

75.75

In [154]:
lst.mean()

5.05

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified `axis` of an array:

In [155]:
b = np.arange(12).reshape(3, 4)
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [160]:
# min of each column
b.min(axis=0) 

array([0, 1, 2, 3])

In [161]:
# min of each row
b.min(axis=1)

array([0, 4, 8])

In [163]:
# sum of each row
b.sum(axis=1)

array([ 6, 22, 38])

In [170]:
b.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45, 55, 66])

In [164]:
b.cumsum(axis=1)  # cumulative sum along each row

array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
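One way to read the `axis` argument of a reduction is that the chosen axis disappears from the result's shape. A short sketch with a 3D array (the array `x` is illustrative):

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# reducing along axis k removes that axis from the result's shape
print(x.sum(axis=0).shape)  # (3, 4)
print(x.sum(axis=1).shape)  # (2, 4)
print(x.sum(axis=2).shape)  # (2, 3)

# without axis, the whole array is reduced to a single number
print(x.sum())              # 276
```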

### Universal Functions

NumPy provides familiar mathematical functions such as `sin`, `cos`, and `exp`. In NumPy, these are called *“universal functions”* (`ufunc`). Within NumPy, these functions operate elementwise on an array, producing an array as output.

In [171]:
B = np.arange(3)
B

array([0, 1, 2])

In [172]:
np.exp(B)

array([1.        , 2.71828183, 7.3890561 ])

In [173]:
np.sin(B)

array([0.        , 0.84147098, 0.90929743])

In [174]:
np.sqrt(B)

array([0.        , 1.        , 1.41421356])

In [175]:
C = np.array([2., -1., 4.])
np.add(B, C)

array([2., 0., 6.])

### Indexing, Slicing and Iterating

**One-dimensional** arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [178]:
a = np.arange(15) ** 3
a

array([   0,    1,    8,   27,   64,  125,  216,  343,  512,  729, 1000,
       1331, 1728, 2197, 2744])

In [179]:
a[2]

8

In [180]:
a[2:5]

array([ 8, 27, 64])

In [183]:
a[:6:2]

array([ 0,  8, 64])

In [186]:
# equivalent to a[0:6:2] = 1000;
# from start to position 6, exclusive, set every 2nd element to 1000
a[:6:2] = 1000 
a

array([1000,    1, 1000,   27, 1000,  125,  216,  343,  512,  729, 1000,
       1331, 1728, 2197, 2744])

In [187]:
a[::-1]

array([2744, 2197, 1728, 1331, 1000,  729,  512,  343,  216,  125, 1000,
         27, 1000,    1, 1000])

In [188]:
for i in a:
    print(i**(1 / 3.))

9.999999999999998
1.0
9.999999999999998
3.0
9.999999999999998
4.999999999999999
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998
9.999999999999998
10.999999999999998
11.999999999999998
12.999999999999998
13.999999999999998
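Iterating element by element works, but the same computation can usually be written as a single array operation. As a sketch, assuming the goal is simply the cube roots, the `np.cbrt` ufunc can replace the loop (a fresh array `cubes` is used here rather than the modified `a`):

```python
import numpy as np

cubes = np.arange(15) ** 3

# one vectorized call instead of a Python-level loop
roots = np.cbrt(cubes)
print(roots)
```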


**Multidimensional arrays** can have one index per axis. These indices are given in a tuple separated by commas:

In [189]:
def f(x, y):
    return 10 * x + y

b = np.fromfunction(f, (5, 4), dtype=int)
b

array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])

In [194]:
b[2, 3]

23

In [193]:
# each row in the second column of b
b[0:5, 1] 

array([ 1, 11, 21, 31, 41])

In [195]:
# equivalent to the previous example
b[:, 1]

array([ 1, 11, 21, 31, 41])

In [197]:
# each column in the second and third row of b
b[1:3, :]

array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices:

In [198]:
b[-1]

array([40, 41, 42, 43])

The expression within brackets in `b[i]` is treated as an `i` followed by as many instances of `:` as needed to represent the remaining axes. NumPy also allows you to write this using dots as `b[i, ...]`.

The **dots** (`...`) represent as many colons as needed to produce a complete indexing tuple. For example, if `x` is an array with 5 axes, then:

* `x[1, 2, ...]` is equivalent to `x[1, 2, :, :, :]`,

* `x[..., 3]` to `x[:, :, :, :, 3]` and

* `x[4, ..., 5, :]` to `x[4, :, :, 5, :]`.
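These equivalences can be verified on a concrete array with 5 axes. The shape `(5, 3, 2, 6, 4)` below is an arbitrary choice, picked only so that every index used is in range:

```python
import numpy as np

x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)

same_leading = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_trailing = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_middle = np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :])

print(same_leading, same_trailing, same_middle)  # True True True
print(x[4, ..., 5, :].shape)                     # (3, 2, 4)
```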



In [199]:
# a 3D array (two stacked 2D arrays)
c = np.array([[[  0,  1,  2],
               [ 10, 12, 13]],
              [[100, 101, 102],
               [110, 112, 113]]])


In [200]:
c.shape

(2, 2, 3)

In [204]:
# same as c[1, :, :] or c[1]
c[1, ...]

array([[100, 101, 102],
       [110, 112, 113]])

In [205]:
# same as c[:, :, 2]
c[..., 2]

array([[  2,  13],
       [102, 113]])

**Iterating** over multidimensional arrays is done with respect to the first axis:



In [206]:
for row in b:
    print(row)

[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]


In [207]:
# However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is an iterator over all the elements of the array:

In [208]:
for element in b.flat:
    print(element)

0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43


### Shape Manipulation

#### Changing the shape of an array

> `numpy.ravel(a, order='C')`

Return a contiguous flattened array.

>> **some parameters:**

>>> **order : {‘C’, ‘F’, ‘A’, ‘K’}, optional**:

The elements of `a` are read using this index order. ‘C’ means to index the elements in row-major, C-style order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to index the elements in column-major, Fortran-style order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of axis indexing. ‘A’ means to read the elements in Fortran-like index order if `a` is Fortran contiguous in memory, C-like order otherwise. ‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.

> `ndarray.T`

The transposed array.

> `ndarray.resize(new_shape, refcheck=True)`

Change shape and size of array in-place.



In [225]:
a = np.array([3, 1, 7, 7, 4, 2, 3, 2, 4, 4, 2, 9], dtype=float).reshape(3, 4)
a

array([[3., 1., 7., 7.],
       [4., 2., 3., 2.],
       [4., 4., 2., 9.]])

In [227]:
a.shape

(3, 4)

In [231]:
# returns the array, flattened
a.ravel()

array([3., 1., 7., 7., 4., 2., 3., 2., 4., 4., 2., 9.])

In [230]:
# same
np.ravel(a)

array([3., 1., 7., 7., 4., 2., 3., 2., 4., 4., 2., 9.])

In [234]:
# fortran like order
np.ravel(a, order='F')

array([3., 4., 4., 1., 2., 4., 7., 3., 2., 7., 2., 9.])

In [243]:
a = a.reshape(6, 2)
a

array([[3., 1.],
       [7., 7.],
       [4., 2.],
       [3., 2.],
       [4., 4.],
       [2., 9.]])

In [244]:
# returns the array, transposed
a = a.T
a

array([[3., 7., 4., 3., 4., 2.],
       [1., 7., 2., 2., 4., 9.]])

In [245]:
a.shape

(2, 6)

In [246]:
a.T.shape

(6, 2)

The `reshape` function returns its argument with a modified shape, whereas the `ndarray.resize` method modifies the array itself:

In [247]:
a

array([[3., 7., 4., 3., 4., 2.],
       [1., 7., 2., 2., 4., 9.]])

In [251]:
# change shape in-place
a.resize(4, 3)

In [249]:
a

array([[3., 4., 4.],
       [1., 2., 4.],
       [7., 3., 2.],
       [7., 2., 9.]])

In [250]:
a.shape

(4, 3)
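The contrast between the two can be sketched on a fresh array. The names `r` and `result` are illustrative; note that `resize` returns `None` because it works in place:

```python
import numpy as np

a = np.arange(12)

r = a.reshape(3, 4)   # returns a new array object; `a` keeps its shape
print(a.shape)        # (12,)
print(r.shape)        # (3, 4)

result = a.resize((2, 6))  # modifies `a` itself and returns None
print(result)         # None
print(a.shape)        # (2, 6)
```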

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated **(ONLY FOR** `reshape`**)**:

In [254]:
a.reshape(-1, 2)

array([[3., 4.],
       [4., 1.],
       [2., 4.],
       [7., 3.],
       [2., 7.],
       [2., 9.]])

#### Stacking together different arrays

> `numpy.vstack(tup)`

Stack arrays in sequence vertically (row wise).

> `numpy.hstack(tup)`

Stack arrays in sequence horizontally (column wise).


> `numpy.column_stack(tup)`

Stack 1-D arrays as columns into a 2-D array.

Take a sequence of 1-D arrays and stack them as columns to make a single 2-D array. 2-D arrays are stacked as-is, just like with hstack. 1-D arrays are turned into 2-D columns first.

In [267]:
# Several arrays can be stacked together along different axes:
a = np.floor(10 * rg.random((2, 2)))
a

array([[5., 0.],
       [7., 5.]])

In [268]:
b = np.floor(10 * rg.random((2, 2)))
b

array([[3., 7.],
       [3., 4.]])

In [269]:
np.vstack((a, b))

array([[5., 0.],
       [7., 5.],
       [3., 7.],
       [3., 4.]])

In [270]:
np.hstack((a, b))

array([[5., 0., 3., 7.],
       [7., 5., 3., 4.]])

In [271]:
# 2D arrays are stacked as-is, just like with hstack
np.column_stack((a, b))

array([[5., 0., 3., 7.],
       [7., 5., 3., 4.]])

In [272]:
a = np.array([4., 2.])
b = np.array([3., 8.])

In [277]:
# returns a 2D array
np.column_stack((a, b))  

array([[4., 3.],
       [2., 8.]])

In [278]:
# the result is different
np.hstack((a, b))        

array([4., 2., 3., 8.])

In [280]:
# view `a` as a 2D column vector
a[:, np.newaxis]  

array([[4.],
       [2.]])

In [283]:
np.column_stack((a[:, np.newaxis], b[:, np.newaxis]))

array([[4., 3.],
       [2., 8.]])

In [286]:
# the result is the same
np.hstack((a[:, np.newaxis], b[:, np.newaxis]))

array([[4., 3.],
       [2., 8.]])

On the other hand, the function `row_stack` is equivalent to `vstack` for any input arrays. In fact, `row_stack` **is an alias for** `vstack` (in NumPy 2.0 and later, `row_stack` is deprecated in favor of `vstack`):

In [296]:
np.column_stack is np.hstack


False

In [297]:
np.row_stack is np.vstack

True

In general, for arrays with more than two dimensions, `hstack` stacks along their second axes, `vstack` stacks along their first axes, and `concatenate` allows for an optional argument giving the number of the axis along which the concatenation should happen.
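A minimal sketch of `concatenate` with an explicit `axis`, using two small 2D arrays with fixed values (chosen here for reproducibility rather than drawn from the random generator):

```python
import numpy as np

a = np.array([[5., 0.],
              [7., 5.]])
b = np.array([[3., 7.],
              [3., 4.]])

# for 2D inputs, axis=0 matches vstack and axis=1 matches hstack
rows = np.concatenate((a, b), axis=0)
cols = np.concatenate((a, b), axis=1)

print(np.array_equal(rows, np.vstack((a, b))))  # True
print(np.array_equal(cols, np.hstack((a, b))))  # True
```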

Note

In complex cases, `r_` and `c_` are useful for creating arrays by stacking numbers along one axis. They allow the use of range literals (`:`).

In [310]:
np.r_[1:4, 0, 4]

array([1, 2, 3, 0, 4])

In [311]:
np.c_[np.array([1,2,3]), np.array([4,5,6])]


array([[1, 4],
       [2, 5],
       [3, 6]])

#### Splitting one array into several smaller ones

> `numpy.hsplit(ary, indices_or_sections)`

Split an array into multiple sub-arrays horizontally (column-wise).

> `numpy.vsplit(ary, indices_or_sections)`

Split an array into multiple sub-arrays vertically (row-wise).

> `numpy.array_split(ary, indices_or_sections, axis=0)`

Split an array into multiple sub-arrays.

**axis : int**, optional

The axis along which to split. The default is 0 (row-wise); use 1 to split column-wise.



Using `hsplit`, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur:


In [313]:
a = np.floor(10 * rg.random((2, 12)))
a

array([[1., 4., 2., 2., 7., 2., 4., 9., 9., 7., 5., 2.],
       [1., 9., 5., 1., 6., 7., 6., 9., 0., 5., 4., 0.]])

In [316]:
# Split `a` into 3
b = np.hsplit(a, 3)
b

[array([[1., 4., 2., 2.],
        [1., 9., 5., 1.]]),
 array([[7., 2., 4., 9.],
        [6., 7., 6., 9.]]),
 array([[9., 7., 5., 2.],
        [0., 5., 4., 0.]])]

In [318]:
type(b)

list

In [319]:
b[1].shape

(2, 4)

In [320]:
b[1].ndim

2

In [330]:
# Split `a` after the third and the fourth column
np.hsplit(a, (3, 4))

[array([[1., 4., 2.],
        [1., 9., 5.]]),
 array([[2.],
        [1.]]),
 array([[7., 2., 4., 9., 9., 7., 5., 2.],
        [6., 7., 6., 9., 0., 5., 4., 0.]])]

In [334]:
np.vsplit(a, 2)

[array([[1., 4., 2., 2., 7., 2., 4., 9., 9., 7., 5., 2.]]),
 array([[1., 9., 5., 1., 6., 7., 6., 9., 0., 5., 4., 0.]])]

In [336]:
np.array_split(a, 3, axis=1) 

[array([[1., 4., 2., 2.],
        [1., 9., 5., 1.]]),
 array([[7., 2., 4., 9.],
        [6., 7., 6., 9.]]),
 array([[9., 7., 5., 2.],
        [0., 5., 4., 0.]])]
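One practical difference that the outputs above do not show: `array_split` accepts a section count that does not divide the axis evenly, whereas `hsplit` raises an error in that case. A small sketch on a 7-element array (`x` and `parts` are illustrative names):

```python
import numpy as np

x = np.arange(7)

# array_split accepts a section count that does not divide the axis evenly
parts = np.array_split(x, 3)
print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4], [5, 6]]

# hsplit insists on equal sections and raises otherwise
try:
    np.hsplit(x, 3)
except ValueError as err:
    print('hsplit failed:', err)
```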

### Copies and Views

When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

#### No Copy at All

Simple assignments make no copy of objects or their data.



In [59]:
a = np.arange(12).reshape(3,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [60]:
# no new object is created
# a and b are two names for the same ndarray object
b = a
b is a

True

In [61]:
# Python passes mutable objects as references, so function calls make no copy.

def f(x):
    print(id(x))
    
print(id(a))
f(a)

140329679561808
140329679561808
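A consequence worth seeing once: because no copy is made, a function that writes into its argument changes the caller's array. A minimal sketch with a hypothetical helper `zero_first`:

```python
import numpy as np

def zero_first(x):
    # x is the caller's array, not a copy: this write is visible outside
    x[0] = 0

a = np.arange(1, 5)
zero_first(a)
print(a)  # [0 2 3 4]
```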


#### View or Shallow Copy

Different array objects can share the same data. The `view` method creates a new array object that looks at the same data.

In [62]:
c = a.view()

In [63]:
c is a

False

In [64]:
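# c is a view, but this prints False here: `a` was itself produced by
# reshape, so it is a view of the arange buffer, and c.base is that
# original array rather than `a`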
c.base is a            

False

In [65]:
c.flags.owndata

False

In [66]:
# a's shape doesn't change
c = c.reshape((2, 6))   
c

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [67]:
a.shape

(3, 4)

In [68]:
# a's data changes
c[0, 4] = 1234    
a

array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

In [69]:
# Slicing an array returns a view of it:

s = a[:, 1:3]
s

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

In [70]:
# s[:] is a view of s. Note the difference between s = 10 and s[:] = 10
s[:] = 10

In [71]:
a

array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
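The difference between `s[:] = 10` and `s = 10` can be sketched side by side on a fresh array: the first writes through the view into the shared data, while the second only rebinds the name `s`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

s = a[:, 1:3]
s[:] = 10       # writes through the view into a's data
print(a[0])     # [ 0 10 10  3]

s = a[:, 1:3]
s = 10          # only rebinds the name s to the integer 10
print(a[1])     # [ 4 10 10  7]
print(type(s))  # <class 'int'>
```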

#### Deep Copy 

The `copy` method makes a complete copy of the array and its data.



In [72]:
# a new array object with new data is created
d = a.copy()  

In [73]:
d is a

False

In [74]:
d.base is a

False

In [75]:
d[0, 0] = 9999
d

array([[9999,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

In [76]:
a

array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

Sometimes `copy` should be called after slicing if the original array is no longer required. For example, suppose `a` is a huge intermediate result and the final result `b` only contains a small fraction of `a`. In that case, a deep copy should be made when constructing `b` with slicing:


In [78]:
a = np.arange(int(1e8))
a.shape

(100000000,)

In [80]:
b = a[:100].copy()
b.shape

(100,)

In [81]:
del a  # the memory of ``a`` can be released.

If `b = a[:100]` is used instead, `a` is referenced by `b` and **will persist in memory even if `del a` is executed**.
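Whether a slice still holds on to the original buffer can be inspected with `base` and `np.shares_memory`. A small-scale sketch (a 1000-element array stands in for the huge one; `view` and `copy` are illustrative names):

```python
import numpy as np

a = np.arange(1000)     # created directly, so `a` owns its buffer

view = a[:10]           # shares memory with `a`
copy = a[:10].copy()    # owns a separate 10-element buffer

print(view.base is a)               # True
print(copy.base is None)            # True
print(np.shares_memory(view, a))    # True
print(np.shares_memory(copy, a))    # False
```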