# NumPy

* stands for Numerical Python
* provides Python with an extensive math library capable of performing numerical computations effectively and efficiently.
* fundamental package designed for efficient scientific computing in Python
* implemented largely in C, a lower-level language that runs closer to the hardware
* core of NumPy is N-dimensional array object (ndarrays) with elements of the same data type
 * also used as a container for generic data & allows efficient integration with databases
* comes installed with Anaconda

Import NumPy to use functionality in this notebook

In [1]:
import numpy as np

### Advantages of NumPy

**Power** <br>
NumPy supports more numeric datatypes than built-in Python, e.g. fixed-width integers and floats such as int8 and float32

**Efficiency & Optimization** <br>
When performing operations on large arrays, NumPy can often be several orders of magnitude faster than Python lists

1. NumPy's multidimensional array data structures (ndarrays)
 1. ndarrays are more memory-efficient than Python lists
 2. can represent vectors and matrices, which are used in many machine learning algorithms, & NumPy is optimized for these structures
 
2. NumPy has a large number of optimized built-in mathematical functions 
 1. optimized algorithms for doing complex arithmetic, statistical, and linear algebra operations.
 2. increases readability by reducing code (avoiding the use of complicated loops) 
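
As a rough sketch of the readability point above, the same sum of squares can be written as an explicit Python loop or as a single NumPy expression. The array contents and variable names here are illustrative only.

```python
import numpy as np

values = np.arange(1, 6)  # [1 2 3 4 5]

# explicit Python loop
total_loop = 0
for v in values:
    total_loop += v * v

# vectorized NumPy equivalent: one expression, no loop
total_np = np.sum(values ** 2)

print(total_loop, total_np)  # both are 55
```

The vectorized form is shorter and pushes the loop down into NumPy's compiled code.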
 

In [10]:
'''
Calculate mean of large array using Python vs NumPy to see which is faster
'''
import time

# generates array of 100 million floats between 0 (inclusive) and 1 (exclusive)
x = np.random.random(100000000)

# Case 1
start = time.time()
sum(x) / len(x)
print("Python time to calculate mean:", time.time() - start)

# Case 2
start = time.time()
np.mean(x)
print("NumPy time to calculate mean:", time.time() - start)


Python time to calculate mean: 13.831938743591309

NumPy time to calculate mean: 0.0525362491607666


### N-Dimensional Arrays

* NumPy's ndarrays are optimized for data science & machine learning
* ndarrays are mutable, but they cannot be resized
 * deleting elements creates a new array
* all elements of a NumPy array must have the same datatype
 * if converting a Python list of mixed data types, NumPy will upcast to a common type
* arrays with 1 dimension are called 'Rank 1', 2D arrays are 'Rank 2', etc.
 
***
You can create NP arrays 2 ways
1. Convert a Python list
2. Create from scratch

## Creating NP Arrays from Python List
NumPy converts Python lists to arrays with a default datatype for the elements. Since NumPy only allows elements of a homogeneous type, if a list with mixed datatypes is given, NumPy will upcast the types.

**Convert Python List to Numpy Array**

In [32]:
# one-dimensional(1D) array containing strings
arr_str_1d = np.array(["one", "two", "three", "four", "five"])

# 2D array containing ints
arr_int_2d = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])

# array containing mixed data types of string & ints
arr_mix_str_int = np.array([1, 2, "three"])

# array containing mixed data types of floats & ints
arr_mix_float_int = np.array([1, 2.5, 4])

print("type:", type(arr_str_1d))

type: <class 'numpy.ndarray'>


**Get the datatype of elements in array to see how they're created**

In [38]:
# string arrays are stored as unicode strings of 5 characters
arr_str_1d.dtype

dtype('<U5')

In [34]:
# integer arrays are stored as 64 bit integers
arr_int_2d.dtype

dtype('int64')

In [35]:
# arrays with mixed datatype of strings & ints are converted to the same type 
# & stored as unicode strings of 21 characters due to upcasting
print(arr_mix_str_int)
print(arr_mix_str_int.dtype)

['1' '2' 'three']
<U21


In [36]:
# arrays with mixed datatype of floats & ints are converted to the same type 
# & stored as 64 bit floats due to upcasting
print(arr_mix_float_int)
print(arr_mix_float_int.dtype)

[1.  2.5 4. ]
float64


**Can assign datatype to array if you're unsure about the existing datatypes**

In [39]:
# convert float array to ints
arr_convert_float_int = np.array([1.5, 2.2, 3.7], dtype=np.int64)
print(arr_convert_float_int)
print(arr_convert_float_int.dtype)

[1 2 3]
int64
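
A related sketch, assuming you already have a float array rather than a list: the `astype()` method converts an existing array to another dtype. As with the `dtype=` argument above, converting floats to ints drops the fractional part rather than rounding. The sample values are made up for illustration.

```python
import numpy as np

floats = np.array([1.5, 2.2, 3.7, -1.9])

# astype returns a new array; the fractional part is truncated toward zero
ints = floats.astype(np.int64)
print(ints)          # [ 1  2  3 -1]
print(floats.dtype)  # float64, the original is unchanged

# round first if nearest-integer behaviour is wanted
rounded = np.round(floats).astype(np.int64)
print(rounded)       # [ 2  2  4 -2]
```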


## Creating Ndarrays from Scratch

**Create Numpy Array of 0's with a specified shape**

In [62]:
nd_arr = np.zeros((3, 4))
print(nd_arr)

# default dtype of elements is float64
print("default dtype:", nd_arr.dtype)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
default dtype: float64


**Create ndarray with specified datatype using the dtype arg**

In [58]:
nd_arr_int = np.zeros((3, 4), dtype=int)
print(nd_arr_int)
print("specified dtype:", nd_arr_int.dtype)

[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
specified dtype: int64


**Create ndarray with 1's**

In [60]:
nd_arr_1s = np.ones((3, 4))
print(nd_arr_1s)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


**Create ndarray with specified shape & constant**

In [61]:
nd_arr_s = np.full((3, 4), 5)
print(nd_arr_s)

# dtype is same as constant arg
print("default dtype:", nd_arr_s.dtype)

[[5 5 5 5]
 [5 5 5 5]
 [5 5 5 5]]
default dtype: int64


**Create Identity Matrix**<br>
An identity matrix is a square matrix filled with zeros, except for the main diagonal, which is filled with ones.

In [64]:
# the arg specifies the size
id_matrix = np.eye(5)
id_matrix

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

**Create a Square Matrix with specified diagonals**

In [66]:
# arg takes a list of the values to place on the diagonal
diag_matrix = np.diag([10, 20, 30, 50])
print(diag_matrix)

[[10  0  0  0]
 [ 0 20  0  0]
 [ 0  0 30  0]
 [ 0  0  0 50]]


**Create ndarray with specified range of ints**<br>
Although arange() can accept non-integer steps, the output can be inconsistent due to finite floating point precision. For ranges of floats, use linspace() (described next) instead.

In [69]:
# when only 1 arg is given, it's used as a stop arg, which is exclusive
nd_arg1 = np.arange(10)
print("1 arg is 10:\n", nd_arg1)

# when 2 args given, 1st is start, 2nd is stop
nd_arg2 = np.arange(4, 10)
print("2 args, 4 & 10:\n", nd_arg2)

# when 3 args given, 1st is start, 2nd is stop, 3rd is step
nd_arg3 = np.arange(1, 14, 3)
print("3 args, 1, 14, 3:\n", nd_arg3)

1 arg is 10:
 [0 1 2 3 4 5 6 7 8 9]
2 args, 4 & 10:
 [4 5 6 7 8 9]
3 args, 1, 14, 3:
 [ 1  4  7 10 13]


**Create ndarray with specified range of floats**<br>
* Takes 3 args: start, stop, n
 * requires at least 2 args: start & stop
 * n isn't the step, but the number of values you want
 * if n isn't specified, it defaults to 50
* start & stop are inclusive by default
 * can override this by adding an additional param: endpoint=False

In [78]:
nd_float_n_range = np.linspace(0, 25, 10)
print("float range array, inclusive:\n", nd_float_n_range, "\n")

nd_float_step_exclusive = np.linspace(0, 25, 10, endpoint=False)
print("float range array, exclusive:\n", nd_float_step_exclusive)

float range array, inclusive:
 [ 0.          2.77777778  5.55555556  8.33333333 11.11111111 13.88888889
 16.66666667 19.44444444 22.22222222 25.        ] 

float range array, exclusive:
 [ 0.   2.5  5.   7.5 10.  12.5 15.  17.5 20.  22.5]
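
To illustrate the caveat about non-integer steps, the sketch below builds a similar range two ways. With `linspace()` the number of values is stated explicitly, so it does not depend on floating point rounding; with `arange()` the length is derived from `(stop - start) / step` and can be off by one for some inputs. The values chosen are only an example, and no particular `arange()` length is assumed.

```python
import numpy as np

# linspace: ask for exactly 4 values, endpoint excluded
lin = np.linspace(1.0, 1.4, 4, endpoint=False)
print(len(lin), lin)

# arange: length is computed from the float step, so it may vary with rounding
ara = np.arange(1.0, 1.4, 0.1)
print(len(ara), ara)

# retstep=True also returns the spacing linspace worked out
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)  # 0.25
```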


**Create randomized arrays/matrices**<br>
* often use random in machine learning
 * when initializing weights in neural networks
* functions are contained in numpy's random module 
 * so must use dot op to access 'random' module before using functions

In [84]:
# array of random floats between 0 & 1, where 0 is inclusive & 1 exclusive
rand_arr = np.random.random((3, 3))
print("random float array:\n", rand_arr)

random float array:
 [[0.04215499 0.89094011 0.28572072]
 [0.48770199 0.53009047 0.62885518]
 [0.0392371  0.21078589 0.59738158]]


In [85]:
# array of random ints in a given range, where 
 # 1st arg is the lower bound of range, inclusive
 # 2nd arg is upper bound of range, exclusive
 # 3rd arg is tuple for shape
rand_int_range = np.random.randint(4, 15, (3, 2))
print("random int array w range:\n", rand_int_range)

random int array w range:
 [[ 4 10]
 [11  7]
 [ 8  7]]


**Create random permuted sequence** <br>
Unlike the output of randint(), a permutation contains each value exactly once, so no duplicates are introduced

In [6]:
# returns randomized sequence of 0-9
print(np.random.permutation(10))

# returns randomized sequence of the given list
print(np.random.permutation([1, 4, 9, 12, 15]))


[8 6 5 9 4 2 3 1 7 0]
[12  9 15  1  4]


**Create randomized arrays/matrices that satisfy some statistical properties**<br>
* Can create random arrays with numbers drawn from various probability distributions
* Can get mean, standard deviation, min, max, etc.

In [89]:
# Gaussian
# creates a 1000 x 1000 array of random floats drawn from a normal distribution
 # with a mean of 0 & standard deviation of 0.1
norm_dist_arr = np.random.normal(0, 0.1, size=(1000, 1000))
print(norm_dist_arr)
print("\n")
print("mean:", norm_dist_arr.mean())
print("std:", norm_dist_arr.std())
print("max:", norm_dist_arr.max())
print("min:", norm_dist_arr.min())
print("# positive nums:", (norm_dist_arr > 0).sum())
print("# negative nums:", (norm_dist_arr < 0).sum())

[[ 0.10183654 -0.11481208 -0.06143871 ...  0.0581417   0.06721711
   0.05968294]
 [ 0.05987433 -0.05824493 -0.00831663 ...  0.01480538 -0.06950297
  -0.02836399]
 [-0.00568082  0.08548414 -0.06340922 ...  0.01158762 -0.10754176
  -0.12194862]
 ...
 [ 0.0808518   0.03925191 -0.1470603  ... -0.08791296 -0.01541348
  -0.00590724]
 [ 0.00057217 -0.06272317 -0.0057382  ...  0.12524738  0.07976549
  -0.10397455]
 [-0.05409676  0.02521265  0.0891198  ... -0.16189873 -0.27363629
  -0.14128807]]


mean: -5.608245499639116e-05
std: 0.09999876249515768
max: 0.46492353085442906
min: -0.4721527982718894
# positive nums: 500161
# negative nums: 499839


As we can see, the average of the random numbers in the ndarray is close to zero, the maximum and minimum values are roughly symmetric about zero (the average), and there are about as many positive numbers as negative ones.

# Accessing & Modifying ndarrays

## Indexing 

In [17]:
# setup

# rank 1 indexing & modifying
r1 = np.array([1, 2, 3, 4, 5])
print("rank 1 array:", r1)

#rank 2 indexing & modifying
r2 = np.arange(1, 10).reshape(3, 3)
print("rank 2 array:\n", r2)

rank 1 array: [1 2 3 4 5]
rank 2 array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


**Accessing with Indices** <br>
Rank 1 ndarrays are indexed the same way as Python lists, but higher ranks take one index per dimension

In [13]:
# 1st index selects the row (inner array), 2nd index selects the element within it
elem_2_arr_1 = r2[0, 1]
elem_1_arr_3 = r2[2, 0]
print("1st element in 3rd array:", elem_1_arr_3)

1st element in 3rd array: 7


**Modifying with Indices**

In [94]:
r2[1, 2] = 100
print("modified rank 2 array:\n", r2)

modified rank 2 array:
 [[  1   2   3]
 [  4   5 100]
 [  7   8   9]]


**Boolean Indexing** <br>
When we don't know the indices of the elements we want to select, but want to select all that meet a certain requirement

In [18]:
# grab elements
elems_l = r2[r2 < 5]
print("Elements < 5:", elems_l)

elems_mod = r2[r2 % 2 == 0]
print("Elements that are even:", elems_mod)

# modify elements
r2[(r2 >= 3) & (r2 < 8)] = -1
print("Assign -1 to elements >= 3 & < 8:\n", r2)

Elements < 5: [1 2 3 4]
Elements that are even: [2 4 6 8]
Assign -1 to elements >= 3 & < 8:
 [[ 1  2 -1]
 [-1 -1 -1]
 [-1  8  9]]
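
A small sketch of what boolean indexing does under the hood, using a fresh array so the result does not depend on earlier cells: the comparison itself produces a boolean array (a mask) of the same shape, and indexing with that mask keeps the elements where it is True.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

mask = arr < 5          # boolean array with the same shape as arr
print(mask)
print(mask.dtype)       # bool

print(arr[mask])        # [1 2 3 4], always returned as a rank 1 array

# combine conditions with & (and), | (or), ~ (not); parentheses are required
print(arr[(arr >= 3) & (arr < 8)])  # [3 4 5 6 7]
```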


## Slicing 
* used for retrieving subsets of data (e.g. splitting into training, cross validation, & testing sets)
* slicing is the same as other languages for rank 1 arrays, but different for higher ranking arrays
* slicing doesn't create a copy, but rather a view of original array
 * if you change a value in the subset, it will reflect in the original
 * if you want a copy, you need to use the copy function or method
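
The view-versus-copy behaviour described above can be checked with a short sketch. The array name `grid` is chosen only for this example so that it does not overwrite anything used elsewhere in the notebook.

```python
import numpy as np

grid = np.arange(1, 21).reshape(4, 5)

view = grid[1:, 2:4]          # slice: a view onto grid's data
copy = grid[1:, 2:4].copy()   # independent copy

view[0, 0] = -1               # also changes grid[1, 2]
copy[0, 1] = -99              # grid is not affected

print(grid[1, 2])             # -1
print(grid[1, 3])             # 9
print(np.shares_memory(grid, view))  # True
print(np.shares_memory(grid, copy))  # False
```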

In [4]:
# create matrix
mat = np.arange(1, 21).reshape(4, 5)
print("starting array:\n", mat)

starting array:
 [[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]


**Matrix subset** <br>
the 1st arg specifies rows, the 2nd arg specifies columns

In [127]:
# grab subset of matrix
sub_mat = mat[1:, 2:4]
print(sub_mat)

[[ 8  9]
 [13 14]
 [18 19]]


**Accessing ndarray column/row**<br>
2 different syntaxes depending on if you want the output to be rank 1 or rank 2

In [135]:
# column into rank 1
sub_arr = mat[:, 2]
print("Column to rank 1:\n", sub_arr)

# column into rank 2
sub_arr_r2 = mat[:, 2:3]
print("Column to rank 2:\n", sub_arr_r2)

Column to rank 1:
 [ 3  8 13 18]
Column to rank 2:
 [[ 3]
 [ 8]
 [13]
 [18]]


**Copy slice** <br>
Can be used as a function or a method

In [6]:
# copy function
copy_func_sub_a = np.copy(mat[1:, 2:4])
print(copy_func_sub_a)

# copy method
copy_meth_sub_arr = mat[1:, 2:4].copy()
print(copy_meth_sub_arr)

[[ 8  9]
 [13 14]
 [18 19]]
[[ 8  9]
 [13 14]
 [18 19]]


## Accessing with Functions

**Extract Diagonal Elements** <br>
Can extract elements along diagonal, above, or below it

In [7]:
# get diagonal
diag_a = np.diag(mat)
print(diag_a)

[ 1  7 13 19]


In [8]:
# get elements above diagonal by setting param k=1
above_diag = np.diag(mat, k=1)
print(above_diag)

[ 2  8 14 20]


In [10]:
# get elements below diagonal by setting k to negative #
below_diag = np.diag(mat, k=-2)
print(below_diag)

[11 17]


**Extract unique elements** <br>
Also sorts result

In [13]:
# array with repeated values
dupl_arr = np.array([[1, 3, 4], [3, 8, 10], [3, 9, 3]])
unique_arr = np.unique(dupl_arr)
print(unique_arr)
# original array is unchanged
print(dupl_arr)

[ 1  3  4  8  9 10]
[[ 1  3  4]
 [ 3  8 10]
 [ 3  9  3]]


**Set Operations** <br>
* useful when comparing 2 arrays
* union, intersection, & difference

In [26]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 8, 4])
print('The elements that are both in x and y:', np.intersect1d(x,y))
print('The elements that are in x that are not in y:', np.setdiff1d(x,y))
print('All the elements of x and y:',np.union1d(x,y))

The elements that are both in x and y: [2 4]
The elements that are in x that are not in y: [1 3 5]
All the elements of x and y: [1 2 3 4 5 6 7 8]


## Modifying with Functions

**Change shape (dimensions) of ndarray**<br>
* The new shape must be compatible with the original array, i.e. the product of the new dimensions must equal the number of elements in the original array, or NumPy will throw an error.
* Can use reshape as a function by passing in an array with the args, or as a method on an existing array
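
A minimal sketch of the compatibility rule, using a made-up 20-element array: a shape whose dimensions multiply to 20 works, anything else raises a `ValueError`. Passing `-1` for one dimension asks NumPy to infer it from the total size.

```python
import numpy as np

arr = np.arange(20)

# 4 * 5 == 20, so this works
print(arr.reshape(4, 5).shape)   # (4, 5)

# 3 * 7 != 20, so NumPy raises a ValueError
try:
    arr.reshape(3, 7)
except ValueError as err:
    print("reshape failed:", err)

# -1 asks NumPy to infer that dimension from the total size
print(arr.reshape(2, -1).shape)  # (2, 10)
```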

In [81]:
original_arr = np.zeros(20)
print("original array:", original_arr)

# done by passing in original
shape_2d_arr_arg = np.reshape(original_arr, (4,5))
print("solo shaped array:\n", shape_2d_arr_arg)

# done by using dot op method
shape_2d_arr_dot = np.linspace(0, 50, 10, endpoint=False).reshape(5, 2)
print("dot op shaped array:\n", shape_2d_arr_dot)

original array: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
solo shaped array:
 [[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
dot op shaped array:
 [[ 0.  5.]
 [10. 15.]
 [20. 25.]
 [30. 35.]
 [40. 45.]]


**Sorting ndarrays** <br>
* Can use sort as a function or method, but results are different
 * the sort function sorts out-of-place: it leaves the original array unchanged & returns a sorted copy
 * sort method sorts in-place & changes the original
* For rank 2, use axis param to sort by rows or columns
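 * axis=0 sorts along the rows axis, i.e. the values within each column
 * axis=1 sorts along the columns axis, i.e. the values within each row
 * axis=None flattens the array before sorting & returns a rank 1 array
 * if no axis is specified, the default is -1 (the last axis), which is the same as axis=1 for a rank 2 array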
* Can use unique() function to sort unique elements as well, but won't alter original
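
Because the axis argument is easy to mix up, here is a small sketch on a fixed 3 x 3 array (values chosen only for illustration) showing what each setting does.

```python
import numpy as np

grid = np.array([[3, 1, 2],
                 [9, 7, 8],
                 [6, 4, 5]])

by_axis0 = np.sort(grid, axis=0)  # each column sorted top to bottom
by_axis1 = np.sort(grid, axis=1)  # each row sorted left to right
default = np.sort(grid)           # axis=-1, the last axis

print(by_axis0)
print(by_axis1)
print(np.array_equal(default, by_axis1))  # True for a rank 2 array
print(np.sort(grid, axis=None))           # [1 2 3 4 5 6 7 8 9]
```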

In [7]:
# sorting rank 1 arrays
arr_to_sort = np.array([1, 5, 7, 2, 4, 3])

sorted_func = np.sort(arr_to_sort)
print("new sorted array using sort function:", sorted_func)
print("original array not changed:", arr_to_sort)

arr_to_sort.sort()
print("original array after sort method:", arr_to_sort)


new sorted array using sort function: [1 2 3 4 5 7]
original array not changed: [1 5 7 2 4 3]
original array after sort method: [1 2 3 4 5 7]


In [16]:
# sorting rank 2+ arrays: need to use axis param
r2_to_sort = np.random.randint(1, 11, size=(5, 5))
print("original r2 array:\n", r2_to_sort)

print("sorted by rows:\n", np.sort(r2_to_sort, axis=0))
print("sorted by columns:\n", np.sort(r2_to_sort, axis=1))
print("sorted by default:\n", np.sort(r2_to_sort))
print("sorted by flattening:", np.sort(r2_to_sort, axis=None))
print("sorted by unique():", np.unique(r2_to_sort))

original r2 array:
 [[ 7  5  4  7  7]
 [ 9  6  1  9 10]
 [10  3  2  3  4]
 [ 3  7  8 10  8]
 [ 5  7  8  5  7]]
sorted by rows:
 [[ 3  3  1  3  4]
 [ 5  5  2  5  7]
 [ 7  6  4  7  7]
 [ 9  7  8  9  8]
 [10  7  8 10 10]]
sorted by columns:
 [[ 4  5  7  7  7]
 [ 1  6  9  9 10]
 [ 2  3  3  4 10]
 [ 3  7  8  8 10]
 [ 5  5  7  7  8]]
sorted by default:
 [[ 4  5  7  7  7]
 [ 1  6  9  9 10]
 [ 2  3  3  4 10]
 [ 3  7  8  8 10]
 [ 5  5  7  7  8]]
sorted by flattening: [ 1  2  3  3  3  4  4  5  5  5  6  7  7  7  7  7  7  8  8  8  9  9 10 10
 10]
sorted by unique(): [ 1  2  3  4  5  6  7  8  9 10]


## Deleting
* 1st arg is array
* 2nd arg is an index or list of indices of the elements to be deleted
* 3rd arg is axis
 * axis 0 to select rows, axis 1 to select columns
 * not required for rank 1 arrays
* returns new array without specified elements since ndarrays cannot be resized
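
A short sketch of two points from the list above, with made-up values: the second argument is interpreted as positions rather than values, and the original array is left untouched. Removing by value with a boolean mask is shown as one possible alternative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# the second argument holds indices, not values
smaller = np.delete(arr, [0, 4])

print(smaller)  # [20 30 40]
print(arr)      # [10 20 30 40 50], original unchanged

# to remove by value, a boolean mask is one option
print(arr[arr != 30])  # [10 20 40 50]
```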

In [104]:
# rank 1 delete multiple
del_r1 = np.delete(r1, [0, 4])
print(del_r1)

[2 3 4]


In [105]:
# rank 2 delete row
del_row = np.delete(r2, 0, axis=0)
print(del_row)

[[4 5 6]
 [7 8 9]]


In [98]:
# rank 2 delete column
del_2_col = np.delete(r2, [0,1], axis=1)
print(del_2_col)

[[  3]
 [100]
 [  9]]


## Adding
When adding rows/cols to rank 2, must match the shape
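
As a sketch of what "matching the shape" means, assuming a 3 x 3 array built just for this example: a new column needs one entry per existing row, a row of the wrong length is rejected, and omitting `axis` flattens both inputs.

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)

# a new column must have one entry per existing row: shape (3, 1)
with_col = np.append(grid, [[10], [11], [12]], axis=1)
print(with_col.shape)  # (3, 4)

# a row of the wrong length does not match and raises ValueError
try:
    np.append(grid, [[10, 11]], axis=0)
except ValueError as err:
    print("append failed:", err)

# without axis, both inputs are flattened first
print(np.append(grid, [[10, 11]]).shape)  # (11,)
```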

**Appending**

In [106]:
# r1 append multiple
add_r1 = np.append(r1, [6, 7])
print(add_r1)

[1 2 3 4 5 6 7]


In [108]:
# r2 append row
r2_add_row = np.append(r2, [[10, 11, 12]], axis=0)
print(r2_add_row)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


**Inserting**

In [111]:
# r1 insert in middle
r1_in = np.insert(r1, 2, [53, 89])
print(r1_in)

[ 1  2 53 89  3  4  5]


In [112]:
# r2 insert row
r2_in_row = np.insert(r2, 2, [301, 303, 304], axis=0)
print(r2_in_row)

[[  1   2   3]
 [  4   5   6]
 [301 303 304]
 [  7   8   9]]


**Stacking**<br>
Note that side stacking a rank 1 array onto a rank 2 array requires reshaping it into a column first

In [123]:
x = np.array([1, 2])
y = np.array([[3, 4], [5, 6]])

# vertical stack
top_stack = np.vstack((x, y))
print("vstack:\n", top_stack)

# horizontal stack
side_stack = np.hstack((y, x.reshape(2, 1)))
print("hstack:\n", side_stack)

vstack:
 [[1 2]
 [3 4]
 [5 6]]
hstack:
 [[3 4 1]
 [5 6 2]]


## Element-wise Operations
* when performing element-wise operations, the ndarrays being operated on must have the same shape or be broadcastable
* Broadcasting is the term used to describe how NumPy handles element-wise arithmetic operations with ndarrays of different shapes. 
 * broadcasting is used implicitly when doing arithmetic operations between scalars and ndarrays.
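
A minimal sketch of when shapes are broadcastable: NumPy compares the shapes from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The arrays below are illustrative only.

```python
import numpy as np

row = np.array([1, 2, 3])            # shape (3,)
col = np.array([[10], [20], [30]])   # shape (3, 1)

# (3,) and (3, 1) broadcast to (3, 3)
print((row + col).shape)  # (3, 3)
print(row + col)

# a scalar broadcasts against any shape
print((row * 2).shape)    # (3,)

# shapes (2, 3) and (2,) are not compatible: trailing dimensions 3 and 2 differ
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("broadcast failed:", err)
```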

**Arithmetic Operations between 1D arrays** <br>
Can use functions or arithmetic symbols, but functions usually have options that you can tweak using keywords

In [20]:
# create two rank 1 ndarrays
x = np.array([1,2,3,4])
y = np.array([5.5,6.5,7.5,8.5])

print('x = ', x)
print('y = ', y)
print()

# Add, subtract, multiply, divide using symbols & functions
print('x + y = ', x + y)
print('add(x,y) = ', np.add(x,y))
print('x - y = ', x - y)
print('subtract(x,y) = ', np.subtract(x,y))
print('x * y = ', x * y)
print('multiply(x,y) = ', np.multiply(x,y))
print('x / y = ', x / y)
print('divide(x,y) = ', np.divide(x,y))

x =  [1 2 3 4]
y =  [5.5 6.5 7.5 8.5]

x + y =  [ 6.5  8.5 10.5 12.5]
add(x,y) =  [ 6.5  8.5 10.5 12.5]
x - y =  [-4.5 -4.5 -4.5 -4.5]
subtract(x,y) =  [-4.5 -4.5 -4.5 -4.5]
x * y =  [ 5.5 13.  22.5 34. ]
multiply(x,y) =  [ 5.5 13.  22.5 34. ]
x / y =  [0.18181818 0.30769231 0.4        0.47058824]
divide(x,y) =  [0.18181818 0.30769231 0.4        0.47058824]


**Arithmetic Operations between 2D arrays of the same shape**

In [21]:
# create rank 2 arrays
X = np.array([1,2,3,4]).reshape(2,2)
Y = np.array([5.5,6.5,7.5,8.5]).reshape(2,2)
print('X = \n', X)
print('Y = \n', Y)
print()

# Add, subtract, multiply, divide using symbols & functions
print('X + Y = \n', X + Y)
print('add(X,Y) = \n', np.add(X,Y))
print('X - Y = \n', X - Y)
print('subtract(X,Y) = \n', np.subtract(X,Y))
print('X * Y = \n', X * Y)
print('multiply(X,Y) = \n', np.multiply(X,Y))
print('X / Y = \n', X / Y)
print('divide(X,Y) = \n', np.divide(X,Y))

X = 
 [[1 2]
 [3 4]]
Y = 
 [[5.5 6.5]
 [7.5 8.5]]

X + Y = 
 [[ 6.5  8.5]
 [10.5 12.5]]
add(X,Y) = 
 [[ 6.5  8.5]
 [10.5 12.5]]
X - Y = 
 [[-4.5 -4.5]
 [-4.5 -4.5]]
subtract(X,Y) = 
 [[-4.5 -4.5]
 [-4.5 -4.5]]
X * Y = 
 [[ 5.5 13. ]
 [22.5 34. ]]
multiply(X,Y) = 
 [[ 5.5 13. ]
 [22.5 34. ]]
X / Y = 
 [[0.18181818 0.30769231]
 [0.4        0.47058824]]
divide(X,Y) = 
 [[0.18181818 0.30769231]
 [0.4        0.47058824]]


**Arithmetic Operations between 2D arrays of different shapes** <br>
NumPy is able to add 1 x 3 and 3 x 1 ndarrays to 3 x 3 ndarrays by broadcasting the smaller ndarrays along the big ndarray so that they have compatible shapes. 
In general, NumPy can do this provided that the smaller ndarray, such as the 1 x 3 ndarray in our example, can be expanded to the shape of the larger ndarray in such a way that the resulting broadcast is unambiguous.

In [5]:
# create a rank 1 ndarray to work with
x = np.array([1,2,3])

# create a 3 x 3 ndarray to work with
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])

# create a 3 x 1 ndarray
Z = np.array([1,2,3]).reshape(3,1)

print('x = ', x)
print('Y = \n', Y)
print('Z = \n', Z)

print('x + Y = \n', x + Y)
print('Z + Y = \n',Z + Y)

x =  [1 2 3]
Y = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Z = 
 [[1]
 [2]
 [3]]
x + Y = 
 [[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]
Z + Y = 
 [[ 2  3  4]
 [ 6  7  8]
 [10 11 12]]


**Arithmetic with single numbers** <br>
NumPy works behind the scenes to broadcast the scalar 3 along the ndarray so that they have the same shape. This allows us to add 3 to each element of X (or multiply, subtract, or divide by it) with just one line of code.

In [4]:
# create a 2 x 2 ndarray to work with
X = np.array([[1,2], [3,4]])
print('X = \n', X)
print('3 * X = \n', 3 * X)
print('3 + X = \n', 3 + X)
print('X - 3 = \n', X - 3)
print('X / 3 = \n', X / 3)

X = 
 [[1 2]
 [3 4]]
3 * X = 
 [[ 3  6]
 [ 9 12]]
3 + X = 
 [[4 5]
 [6 7]]
X - 3 = 
 [[-2 -1]
 [ 0  1]]
X / 3 = 
 [[0.33333333 0.66666667]
 [1.         1.33333333]]


**Mathematical Functions** <br>
The list of math functions can be found at: https://numpy.org/devdocs/reference/routines.math.html?highlight=arithmetic#mathematical-functions

In [2]:
# create a rank 1 ndarray to work with
x = np.array([1,2,3,4])
print('x = ', x)

# The exponential function is e^x, where e is a mathematical constant called Euler's number, approximately 2.718281.
# This value has a close mathematical relationship with pi, and the slope of the curve e^x is equal to its value at every point.
# np.exp() calculates e^x for each value of x in your input array.
print('EXP(x) =', np.exp(x))

# gets square root of each element 
print('SQRT(x) =',np.sqrt(x))

# raise all elements to the power of 2
print('POW(x,2) =',np.power(x,2)) 

x =  [1 2 3 4]
EXP(x) = [ 2.71828183  7.3890561  20.08553692 54.59815003]
SQRT(x) = [1.         1.41421356 1.73205081 2.        ]
POW(x,2) = [ 1  4  9 16]


**Statistical Functions** <br>
Most of the statistical operations can be done using either a function or an equivalent method. For example, both numpy.mean function and numpy.ndarray.mean method will return the arithmetic mean of the array elements along the given axis.
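
A quick sketch of the function/method equivalence, using a small example array named `M` for illustration. It also shows one exception: the median is available as the function `np.median` but not as an ndarray method.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# function form and method form give the same result
print(np.mean(M, axis=0))  # [2. 3.]
print(M.mean(axis=0))      # [2. 3.]
print(np.array_equal(np.mean(M, axis=0), M.mean(axis=0)))  # True

# median is one exception: it exists only as a function
print(hasattr(M, "median"))  # False
print(np.median(M))          # 2.5
```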

In [3]:
# create a 2 x 2 ndarray to work with
X = np.array([[1,2], [3,4]])
print('X = \n', X)
print('Average of all elements in X:', X.mean())
print('Average of all elements in the columns of X:', X.mean(axis=0))
print('Average of all elements in the rows of X:', X.mean(axis=1))
print()
print('Sum of all elements in X:', X.sum())
print('Sum of all elements in the columns of X:', X.sum(axis=0))
print('Sum of all elements in the rows of X:', X.sum(axis=1))
print()
print('Standard Deviation of all elements in X:', X.std())
print('Standard Deviation of all elements in the columns of X:', X.std(axis=0))
print('Standard Deviation of all elements in the rows of X:', X.std(axis=1))
print()
print('Median of all elements in X:', np.median(X))
print('Median of all elements in the columns of X:', np.median(X,axis=0))
print('Median of all elements in the rows of X:', np.median(X,axis=1))
print()
print('Maximum value of all elements in X:', X.max())
print('Maximum value of all elements in the columns of X:', X.max(axis=0))
print('Maximum value of all elements in the rows of X:', X.max(axis=1))
print()
print('Minimum value of all elements in X:', X.min())
print('Minimum value of all elements in the columns of X:', X.min(axis=0))
print('Minimum value of all elements in the rows of X:', X.min(axis=1))

X = 
 [[1 2]
 [3 4]]
Average of all elements in X: 2.5
Average of all elements in the columns of X: [2. 3.]
Average of all elements in the rows of X: [1.5 3.5]

Sum of all elements in X: 10
Sum of all elements in the columns of X: [4 6]
Sum of all elements in the rows of X: [3 7]

Standard Deviation of all elements in X: 1.118033988749895
Standard Deviation of all elements in the columns of X: [1. 1.]
Standard Deviation of all elements in the rows of X: [0.5 0.5]

Median of all elements in X: 2.5
Median of all elements in the columns of X: [2. 3.]
Median of all elements in the rows of X: [1.5 3.5]

Maximum value of all elements in X: 4
Maximum value of all elements in the columns of X: [3 4]
Maximum value of all elements in the rows of X: [2 4]

Minimum value of all elements in X: 1
Minimum value of all elements in the columns of X: [1 2]
Minimum value of all elements in the rows of X: [1 3]


# Getting Metadata for ndarrays

**Get dimension lengths of array using shape**

In [45]:
# is rank 2 with 4 rows & 3 columns
arr_int_2d.shape

(4, 3)

**Get dimension rank (number of dimensions)**

In [44]:
arr_int_2d.ndim

2

**Get total num of elements in n-dimensional array**

In [41]:
arr_int_2d.size

12

# Load/Save

**Save array in file for future use**

In [42]:
# saves to a file named file-save-test.npy in the current directory
np.save('file-save-test', arr_str_1d)

**Load saved npy file**

In [43]:
x = np.load('file-save-test.npy')
print(x)

['one' 'two' 'three' 'four' 'five']
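
As a sketch of a full save/load round trip, the example below writes to a temporary directory so no files are left behind. The file name `roundtrip-example` is hypothetical; dtype and shape are stored in the `.npy` file and come back unchanged.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.5, 2.5], [3.5, 4.5]])

# write into a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip-example")  # hypothetical file name
    np.save(path, arr)                  # NumPy appends ".npy" to the name
    loaded = np.load(path + ".npy")

print(np.array_equal(arr, loaded))  # True
print(loaded.dtype, loaded.shape)   # float64 (2, 2)
```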
