# Introduction to NumPy

<a href="https://colab.research.google.com/drive/1dhJ-t8VFtOEqYbw2xIXQWuqgWbkj4QQ5?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Imports


In [1]:
import random
import numpy as np
np.__version__

'1.20.2'

In [14]:
# For reproducibility
random.seed(0)
np.random.seed(0)

Why do we seed? See [What exactly is the function of random seed in python](https://www.edureka.co/community/25335/what-exactly-is-the-function-of-random-seed-in-python).

You can use any seed you want! This notebook uses 0; another popular choice is 42. For the story behind it, read [The Story Behind Random.Seed(42) In Machine Learning](https://medium.com/geekculture/the-story-behind-random-seed-42-in-machine-learning-b838c4ac290a).

If no seed is set, the generator is initialized from an unpredictable source (operating-system entropy or, as a fallback, the current time), so the results differ from run to run.
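To see what seeding buys us, the sketch below seeds both generators, draws a few numbers, reseeds with the same value, and draws again. The variable names (`first_py`, `second_np`, and so on) are illustrative only; it is best run in a fresh session, since reseeding changes the values later cells would produce.

```python
import random
import numpy as np

# Seed both generators and draw some numbers.
random.seed(0)
np.random.seed(0)
first_py = [random.random() for _ in range(3)]
first_np = np.random.randint(10, size=6)

# Reseed with the same value and draw again.
random.seed(0)
np.random.seed(0)
second_py = [random.random() for _ in range(3)]
second_np = np.random.randint(10, size=6)

print(first_py == second_py)                 # True: same Python floats
print(np.array_equal(first_np, second_np))   # True: same NumPy integers
```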

## Numpy Arrays

A NumPy array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers. The number of dimensions is the `rank` of the array (available as the `ndim` attribute); the `shape` of an array is a tuple of integers giving the size of the array along each dimension.

In [15]:
arr1 = np.random.randint(10, size=6) # One-dimensional array
print(arr1)
print(type(arr1))

[5 0 3 3 7 9]
<class 'numpy.ndarray'>


In [17]:
arr1_from_list = np.array([5, 0, 3, 3, 7, 9])
print(arr1_from_list)
print(type(arr1_from_list))

[5 0 3 3 7 9]
<class 'numpy.ndarray'>


In [19]:
list_one = random.choices(list(range(10)), k=6)
print(list_one)

[8, 7, 4, 2, 5, 4]


In [20]:
arr1_new_list = np.array(list_one)
print(arr1_new_list)
print(type(arr1_new_list))

[8 7 4 2 5 4]
<class 'numpy.ndarray'>


In [21]:
# We can create even 2 & 3 dimensional arrays
arr2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
arr3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [22]:
print("arr3 ndim: ", arr3.ndim)
print("arr3 shape:", arr3.shape)
print("arr3 size: ", arr3.size)
print(arr3.dtype)

arr3 ndim:  3
arr3 shape: (3, 4, 5)
arr3 size:  60
int64


In [23]:
arr3

array([[[8, 1, 5, 9, 8],
        [9, 4, 3, 0, 3],
        [5, 0, 2, 3, 8],
        [1, 3, 3, 3, 7]],

       [[0, 1, 9, 9, 0],
        [4, 7, 3, 2, 7],
        [2, 0, 0, 4, 5],
        [5, 6, 8, 4, 1]],

       [[4, 9, 8, 1, 1],
        [7, 9, 9, 3, 6],
        [7, 2, 0, 3, 5],
        [9, 4, 4, 6, 4]]])

In [94]:
arr_from_range = np.arange(60)
print(arr_from_range)
print(arr_from_range.shape)
print(type(arr_from_range))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59]
(60,)
<class 'numpy.ndarray'>


In [27]:
# We can reshape an existing array
arr_reshaped = arr_from_range.reshape(3, 4, 5)
print(arr_reshaped)

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]]

 [[40 41 42 43 44]
  [45 46 47 48 49]
  [50 51 52 53 54]
  [55 56 57 58 59]]]


In [35]:
# When we reshape, we must ensure same number of elements remain!
arr_from_range.reshape(3, 4, 4)

ValueError: cannot reshape array of size 60 into shape (3,4,4)
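One way to avoid this kind of mismatch is to let NumPy infer a dimension by passing `-1` for it. A minimal sketch, using a stand-in array named `values` (not part of the notebook above):

```python
import numpy as np

values = np.arange(60)

# -1 asks NumPy to work out that dimension from the total size.
inferred = values.reshape(3, 4, -1)
print(inferred.shape)   # (3, 4, 5)

# A single -1 flattens the array back to one dimension.
flat = inferred.reshape(-1)
print(flat.shape)       # (60,)
```

Only one dimension can be `-1`, and the remaining dimensions must still divide the total number of elements evenly.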

In [95]:
# There are other ways to reshape arrays
# This gives a row vector
arr_new_axis_row = arr_from_range[np.newaxis, :]
print(arr_new_axis_row)
print(arr_new_axis_row.shape)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
  48 49 50 51 52 53 54 55 56 57 58 59]]
(1, 60)


In [96]:
# This gives a column vector
arr_new_axis_col = arr_from_range[:, np.newaxis]
print(arr_new_axis_col)
print(arr_new_axis_col.shape)

[[ 0]
 [ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]
 [12]
 [13]
 [14]
 [15]
 [16]
 [17]
 [18]
 [19]
 [20]
 [21]
 [22]
 [23]
 [24]
 [25]
 [26]
 [27]
 [28]
 [29]
 [30]
 [31]
 [32]
 [33]
 [34]
 [35]
 [36]
 [37]
 [38]
 [39]
 [40]
 [41]
 [42]
 [43]
 [44]
 [45]
 [46]
 [47]
 [48]
 [49]
 [50]
 [51]
 [52]
 [53]
 [54]
 [55]
 [56]
 [57]
 [58]
 [59]]
(60, 1)
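The same row and column vectors can also be produced with `reshape`. The sketch below, on a shorter stand-in array `values`, checks that both routes give the same result:

```python
import numpy as np

values = np.arange(6)

row = values.reshape(1, -1)   # shape (1, 6), a row vector
col = values.reshape(-1, 1)   # shape (6, 1), a column vector

print(row.shape, col.shape)                          # (1, 6) (6, 1)
print(np.array_equal(row, values[np.newaxis, :]))    # True
print(np.array_equal(col, values[:, np.newaxis]))    # True
```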


In [28]:
# Create an array of zeros. Can be any shape!
arr_zeros = np.zeros(60)
print(arr_zeros)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [29]:
# Fill an existing array with a particular value
arr_zeros.fill?

Docstring:
a.fill(value)

Fill the array with a scalar value.

Parameters
----------
value : scalar
    All elements of `a` will be assigned this value.

Examples
--------
>>> a = np.array([1, 2])
>>> a.fill(0)
>>> a
array([0, 0])
>>> a = np.empty(2)
>>> a.fill(1)
>>> a
array([1.,  1.])
Type:      builtin_function_or_method
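The cell above only displays the help text. As a small sketch of `fill` in action (with illustrative names `filled` and `full`), note that it modifies the array in place, whereas `np.full` builds a new, pre-filled array in one step:

```python
import numpy as np

filled = np.zeros(5)
filled.fill(7)        # in place; the array keeps its float dtype
print(filled)         # [7. 7. 7. 7. 7.]

# np.full creates a new array of the given shape, already filled.
full = np.full((2, 3), 7)
print(full.shape)     # (2, 3)
```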


In [31]:
# Create an array of 1s
arr_ones = np.ones((3, 4, 5))
print(arr_ones)

[[[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1.]]]


In [32]:
# We can create a new array with same shape as an existing array
arr_like_arr2 = np.ones_like(arr2)
print(arr_like_arr2)

[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]


In [34]:
# We can also create an array of random floats drawn from the standard normal distribution
arr_randn = np.random.randn(3, 4)
print(arr_randn)

[[-0.3756659  -0.03822364  0.36797447 -0.0447237 ]
 [-0.30237513 -2.2244036   0.72400636  0.35900276]
 [ 1.07612104  0.19214083  0.85292596  0.01835718]]
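NumPy also offers a newer `Generator` interface through `np.random.default_rng`, available in the version used here. The sketch below is an alternative to the `np.random.*` calls above, not what this notebook uses; the names `rng`, `normal_samples` and `int_samples` are illustrative.

```python
import numpy as np

# A Generator carries its own state instead of relying on a global seed.
rng = np.random.default_rng(0)

normal_samples = rng.standard_normal((3, 4))   # like np.random.randn(3, 4)
int_samples = rng.integers(10, size=6)         # integers in [0, 10)

print(normal_samples.shape)   # (3, 4)
print(int_samples)
```

Because the generator is an object, separate parts of a program can each hold their own seeded instance without interfering with one another.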


## Indexing

In [36]:
print(arr1)

[5 0 3 3 7 9]


In [37]:
# Remember that indices begin at 0
arr1[0]

5

In [39]:
# Just like with python lists, we can use negative indices
arr1[-1]

9

In [40]:
arr1[-2]

7

In [41]:
# In multi-dimensional array, we can pass in multiple indices
print(arr2)

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]


In [42]:
arr2[0]

array([3, 5, 2, 4])

In [59]:
# Index the first row, then index the second element of the retrieved row
%time
arr2[0][1]

CPU times: user 1 µs, sys: 4 µs, total: 5 µs
Wall time: 7.15 µs


5

In [60]:
# Index row 0, column 1 in a single operation
%time
arr2[0, 1]

CPU times: user 1e+03 ns, sys: 1 µs, total: 2 µs
Wall time: 3.1 µs


5

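The second form is usually slightly faster: `arr2[0][1]` first builds a temporary row and then indexes into it, while `arr2[0, 1]` does a single lookup. Note that `%time` on a line by itself times an empty statement, so the figures above are not a reliable measurement; put the expression on the same line (`%time arr2[0, 1]`) or use `%timeit`. Across many lookups, the difference adds up. Take advantage of NumPy's capabilities as much as possible!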
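To measure the two indexing styles yourself, one option is the standard-library `timeit` module, which runs each expression many times. This is a rough sketch on a small stand-in array `grid2d`; timings vary by machine, so none are shown here.

```python
import timeit
import numpy as np

grid2d = np.arange(12).reshape(3, 4)

# Chained indexing: builds a temporary row, then indexes into it.
chained = timeit.timeit(lambda: grid2d[0][1], number=100_000)

# Tuple indexing: a single lookup.
single = timeit.timeit(lambda: grid2d[0, 1], number=100_000)

print(f"grid2d[0][1]: {chained:.4f} s for 100,000 lookups")
print(f"grid2d[0, 1]: {single:.4f} s for 100,000 lookups")
print(grid2d[0][1] == grid2d[0, 1])   # True: both return the same element
```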

In [61]:
print(arr2)

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]


In [62]:
# Note that any value we pass in is converted to the array dtype. Beware!
arr2[0, 0] = 12
print(arr2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


In [None]:
# This means we can pass in any values that can be converted easily
# This is convenient :) 
arr2[0, 0] = "56"
print(arr2)

In [63]:
# This can produce unintended results too! The float is silently truncated to 4
arr2[0, 0] = 4.56
print(arr2)
print(arr2.dtype)

In [67]:
# This will fail during the conversion
arr2[0, 0] = "Hello"
print(arr2)

ValueError: invalid literal for int() with base 10: 'Hello'
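If fractional values need to be preserved, one approach is to convert the array to a float dtype first with `astype`. A minimal sketch, using stand-in arrays `ints` and `floats`:

```python
import numpy as np

ints = np.array([3, 5, 2, 4])
ints[0] = 4.56                 # truncated to fit the integer dtype
print(ints)                    # [4 5 2 4]

# astype returns a new array with the requested dtype.
floats = ints.astype(float)
floats[0] = 4.56               # now stored as given
print(floats.dtype, floats[0])
```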

### Subarrays

The general syntax for slicing any array, `x`, is

x[start:stop:step]

If any of these are not specified, they take on defaults:
* start -> 0
* stop -> size of dimension
* step -> 1
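The defaults listed above can be checked directly. The sketch below uses a stand-in array `seq`; note that when the step is negative, the defaults for start and stop are swapped, so the slice begins at the last element.

```python
import numpy as np

seq = np.arange(10)

# Omitting everything is the same as spelling out the defaults.
print(np.array_equal(seq[::], seq[0:seq.shape[0]:1]))   # True

# An explicit start, stop and step.
print(seq[1:8:3])                                        # [1 4 7]

# With a negative step, the slice starts from the last element.
print(np.array_equal(seq[::-1], seq[seq.shape[0] - 1::-1]))   # True
```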

In [69]:
print(arr1)

[5 0 3 3 7 9]


In [70]:
# first three elements
arr1[:3]

array([5, 0, 3])

In [71]:
# elements from index 2
arr1[2:]

array([3, 3, 7, 9])

In [72]:
arr1[2:5]

array([3, 3, 7])

In [74]:
# elements from index 2, use step size of 2
arr1[2::2]

array([3, 7])

In [75]:
# negative step reverses the array retrieved
# all elements, reversed
arr1[::-1]

array([9, 7, 3, 3, 0, 5])

In [76]:
print(arr2)

[[56  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


In [77]:
# retrieve first two rows, first three columns
arr2[:2, :3]

array([[56,  5,  2],
       [ 7,  6,  8]])

In [78]:
# retrieve all rows and first three columns
arr2[:, :3]

array([[56,  5,  2],
       [ 7,  6,  8],
       [ 1,  6,  7]])

In [79]:
# You can even reverse both dimensions
arr2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 56]])

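Sub-arrays obtained by slicing are views of the original array, not copies. This means that when we work with large datasets, we can access and process pieces of these datasets without copying the underlying data. It also means that modifying a sub-array modifies the original array.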

In [81]:
arr2_sub = arr2[:2, :2]
print(arr2_sub)

[[56  5]
 [ 7  6]]


In [82]:
# Fill first row with value 0
arr2_sub[0].fill(0)
print(arr2_sub)

In [84]:
# Our original array is modified also! 
print(arr2)

[[0 0 2 4]
 [7 6 8 8]
 [1 6 7 7]]
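The cells above show that a slice shares its data with the original array. `np.shares_memory` can confirm this, as sketched below with stand-in arrays `base`, `view` and `copied`:

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

view = base[:2, :2]                     # slicing returns a view
print(np.shares_memory(base, view))     # True

view[0, 0] = 99                         # writes through to the original
print(base[0, 0])                       # 99

copied = base[:2, :2].copy()            # an independent copy
print(np.shares_memory(base, copied))   # False
```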


### Creating copies of arrays

In [87]:
print(arr2)

[[0 0 2 4]
 [7 6 8 8]
 [1 6 7 7]]


In [88]:
arr2_sub_new = arr2[:2, :2].copy()
print(arr2_sub_new)

[[0 0]
 [7 6]]


In [89]:
arr2_sub_new[0, -1] = 5
print(arr2_sub_new)

[[0 5]
 [7 6]]


In [90]:
print(arr2)

[[0 0 2 4]
 [7 6 8 8]
 [1 6 7 7]]


## Concatenation & Splitting

In [97]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [99]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [100]:
# concatenate along rows (default)
np.concatenate([grid, grid])


array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [None]:
# concatenate along columns
np.concatenate([grid, grid], axis=1)
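For two-dimensional arrays, `np.hstack` and `np.vstack` are shorthand for concatenating along columns and rows respectively. The sketch below redefines `grid` so that it runs on its own:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

side_by_side = np.concatenate([grid, grid], axis=1)
print(side_by_side.shape)   # (2, 6)

# hstack matches axis=1, vstack matches axis=0 for 2-D inputs.
print(np.array_equal(np.hstack([grid, grid]), side_by_side))   # True
print(np.array_equal(np.vstack([grid, grid]),
                     np.concatenate([grid, grid], axis=0)))    # True
```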

In [101]:
# To split, we pass the indices at which each new array should begin
x = [4, 5, 6, 7, 8, 9, 0, 12, 13, 24, 35, 46]
x1, x2, x3 = np.split(x, [4, 7])

In [102]:
print(x1, x2, x3)

[4 5 6 7] [8 9 0] [12 13 24 35 46]
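`np.split` also accepts an integer, in which case the array is divided into that many equal parts. When the length does not divide evenly, `np.array_split` can be used instead. A short sketch with a stand-in array `values`:

```python
import numpy as np

values = np.array([4, 5, 6, 7, 8, 9, 0, 12, 13, 24, 35, 46])

# An integer asks for that many equal-sized pieces.
thirds = np.split(values, 3)
print([part.tolist() for part in thirds])
# [[4, 5, 6, 7], [8, 9, 0, 12], [13, 24, 35, 46]]

# array_split tolerates uneven division: 12 elements into 5 pieces.
uneven = np.array_split(values, 5)
print([len(part) for part in uneven])   # [3, 3, 2, 2, 2]
```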


## Computations on Arrays

In [110]:
print(arr1)

[5 1 3 3 7 9]


In [109]:
arr1[1] = 1 # to avoid zero division error/warning 

In [111]:
# we can compute reciprocals directly
1 / arr1

array([0.2       , 1.        , 0.33333333, 0.33333333, 0.14285714,
       0.11111111])

In [112]:
print(arr1)

[5 1 3 3 7 9]


In [122]:
# we can even divide an array by a scalar computed from it
# here, every element of k will be divided by the sum of k
k = np.arange(10)
print(k)
print(k.sum())

[0 1 2 3 4 5 6 7 8 9]
45


In [123]:
k / k.sum()

array([0.        , 0.02222222, 0.04444444, 0.06666667, 0.08888889,
       0.11111111, 0.13333333, 0.15555556, 0.17777778, 0.2       ])
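The same idea extends to two dimensions through broadcasting. As a sketch (the `counts` array is made up for illustration), each row can be divided by its own total by keeping the summed axis with `keepdims=True`:

```python
import numpy as np

counts = np.array([[1, 3],
                   [2, 2],
                   [0, 4]], dtype=float)

# keepdims=True gives shape (3, 1), which broadcasts across the columns.
row_totals = counts.sum(axis=1, keepdims=True)
proportions = counts / row_totals

print(row_totals.shape)          # (3, 1)
print(proportions.sum(axis=1))   # each row now sums to 1
```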

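### Applying Ufuncs
Ufuncs (universal functions) operate element-wise on arrays. Unary ufuncs, such as negation, take a single input; binary ufuncs, such as `np.multiply`, take two.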

In [115]:
# we can perform unary negation
-arr1

array([-5, -1, -3, -3, -7, -9])

In [116]:
# or take the square
arr1 ** 2

array([25,  1,  9,  9, 49, 81])

In [117]:
# or even use our array in an equation
-(0.1 * arr1 + 4) ** 4

array([-410.0625, -282.5761, -341.8801, -341.8801, -487.9681, -576.4801])

In [118]:
# there are also some arithmetic operators implemented in Numpy
np.multiply(arr1, 8)

array([40,  8, 24, 24, 56, 72])

In [119]:
# Be sure to check what `floor_divide` means!
np.floor_divide(arr2, arr2.max())

array([[0, 0, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 0, 0]])
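`np.floor_divide` is the ufunc behind the `//` operator: it divides and then rounds down toward negative infinity, which matters for negative values. A small sketch with a stand-in array `a`:

```python
import numpy as np

a = np.array([7, -7, 8, -8])

print(np.floor_divide(a, 2))   # values 3, -4, 4, -4
print(np.array_equal(np.floor_divide(a, 2), a // 2))   # True

# Truncation rounds toward zero instead, so -3.5 becomes -3.
print(np.trunc(a / 2))         # values 3, -3, 4, -4 (as floats)
```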

In [120]:
# trigonometric functions work too
np.cos(arr1)

array([ 0.28366219,  0.54030231, -0.9899925 , -0.9899925 ,  0.75390225,
       -0.91113026])

In [121]:
# as do logarithms and exponents
np.power(3, arr1)

array([  243,     3,    27,    27,  2187, 19683])
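The cell above shows exponentiation; logarithms follow the same element-wise pattern. A brief sketch with a stand-in array `vals`:

```python
import numpy as np

vals = np.array([1.0, 10.0, 100.0])

print(np.log10(vals))   # base-10 logarithm: 0, 1, 2
print(np.log(vals))     # natural logarithm

# exp undoes the natural log, up to floating-point rounding.
print(np.allclose(np.exp(np.log(vals)), vals))   # True
```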