# Topic 4: Numerical Computing in Python
## E-DAT540-1 21H Introduction to Data Science
---
Instructor: \
  Antorweep Chakravorty   
  Email: antorweep.chakravorty@uis.no \
  Address: \
    Room Number: KE E-425 \
    Det teknisk- naturvitenskapelige fakultet \
    Institutt for data- og elektroteknologi \
    Universitetet i Stavanger \
    Kjell Arholms gate 41, 4021 Stavanger 
       
---

## Numpy
- **NumPy**, or *Numerical Python*, is one of the most important foundational packages for numerical computing in Python
- *ndarray*, an efficient multidimensional array, provides fast array-oriented arithmetic operations and flexible *broadcasting* capabilities
- Mathematical functions for fast operations on entire arrays of data without having to write loops
- Tools for reading/writing array data to disk and working with memory-mapped files
- Linear algebra, random number generation and Fourier transform capabilities
- A C-API for connecting NumPy with libraries written in C, C++, or FORTRAN
- The NumPy C-API allows data to be easily passed to external libraries written in a low-level language, and allows those libraries to return data to Python as NumPy arrays
- NumPy-based algorithms are generally 10 to 100 times faster than their pure Python counterparts and use significantly less memory

In [25]:
# To work with NumPy we have to import the numpy module. The alias np is the conventional short name used to refer to it
import numpy as np
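
As a small illustration of array-oriented arithmetic and broadcasting, the sketch below doubles a sequence once with a plain Python loop and once with a single NumPy expression, then adds a one dimensional row to every row of a two dimensional array. The variable names are chosen only for this example.

```python
import numpy as np

values = list(range(5))

# Pure Python: an explicit loop (here a list comprehension) over the elements
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized expression applied to the whole array
arr = np.array(values)
doubled_arr = arr * 2

# Broadcasting: the 1-D row is stretched across both rows of the 2-D array
grid = np.array([[0, 0, 0], [10, 10, 10]])
row = np.array([1, 2, 3])
summed = grid + row

print('Loop result:      ', doubled_list)
print('Vectorized result:', doubled_arr)
print('Broadcast sum:\n', summed)
```

Both approaches give the same values; the vectorized form simply moves the loop into NumPy's compiled code.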

## NDArrays
- Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kind of computation
- Common array algorithms like sorting, unique, and set operations
- Efficient descriptive statistics and aggregating/summarizing data
- Data alignment and relational data manipulation for merging and joining heterogeneous datasets
- Expressing conditional logic as array expressions instead of loops 
- Group-wise data manipulations (aggregation, transformation, function application)

### Array Creating Functions
#### array()
- An ndarray can be created using the **np.array()** function
- It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data
- Nested sequences, like a list of equal-length lists, are converted into a multidimensional array

In [34]:
# Creating a single dimensional array from a Python list
aSeq = [1, 2, 3]
anArray = np.array(aSeq)
print('Array created using a list:\n', anArray)

# Creating a multi dimensional array from a nested Python list
aNestedSeq = [[1, 2, 3], [4, 5, 6]]
anArray = np.array(aNestedSeq)
print('Array created using a nested list:\n', anArray)


Array created using a list:
 [1 2 3]
Array created using a nested list:
 [[1 2 3]
 [4 5 6]]


#### arange()
- **np.arange()** is an array-valued version of the built-in Python *range* function
- It generates a sequence of evenly spaced values
- A *start* and a *stop* value may be provided as the first and second arguments to set the bounds of the range. The stop value is never included in the range. If only one argument is provided, it is taken as the stop value and the start value defaults to 0.
- A third argument, *step*, sets the interval between consecutive values and defaults to 1. The step can also be negative to create a descending range, in which case the stop value should be smaller than the start value



In [36]:
# Creating a sequence of 10 values starting from value 0
print('Range 0 to 9:', np.arange(10))

# Creating a sequence of 10 values starting from value 1
print('Range 1 to 10:', np.arange(start=1, stop=11))

# Creating a sequence of 5 values starting from value 0 with an interval of two between consecutive values
print('Range 0 to 8:', np.arange(0, 10, step=2))

# Creating a descending sequence of 10 values from 10 down to 1
print('Range 10 to 1:', np.arange(start=10, stop=0, step=-1))


Range 0 to 9: [0 1 2 3 4 5 6 7 8 9]
Range 1 to 10: [ 1  2  3  4  5  6  7  8  9 10]
Range 0 to 8: [0 2 4 6 8]
Range 10 to 1: [10  9  8  7  6  5  4  3  2  1]
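
The note about descending ranges can be checked with a short sketch: when the start is larger than the stop, a positive (default) step cannot reach the stop value, so the result is empty, while a negative step counts down. The variable names are illustrative only.

```python
import numpy as np

# start > stop with the default step of 1: nothing can be generated
empty_range = np.arange(10, 0)
print('Default step, start > stop:', empty_range)

# A negative step counts down; the stop value 0 is still excluded
descending = np.arange(10, 0, -2)
print('Step of -2:', descending)
```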


#### zeros(), zeros_like()
- **np.zeros()** and **np.zeros_like()** create ndarrays in which every element is zero
- zeros()
  - a scalar value can be passed as the argument to create a single dimensional array of that length
  - a tuple can be passed as the argument to create a multi dimensional array of that shape
- zeros_like()
  - accepts another array as the argument and creates an array of zeros with the same shape and dtype as that array

In [4]:
# Creating a single dimensional array of size 5 containing the value zero for each index
arr1 = np.zeros(5)
print('arr1:', arr1)

# Creating a multi dimensional array of size (2, 5) containing the value zero for each index
arr2 = np.zeros((2,5))
print('arr2:\n', arr2)


arr1: [0. 0. 0. 0. 0.]
arr2:
 [[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [5]:
# Creating a non zero array of size (3,2)
arr3 = np.array([[1,2],[3,4],[5,6]])

# Creating an array of the same size as arr3 with 0 as value for each index
arr4 = np.zeros_like(arr3)
print('arr4:\n', arr4)


arr4:
 [[0 0]
 [0 0]
 [0 0]]


#### ones(), ones_like()
- **np.ones()** and **np.ones_like()** create ndarrays in which every element is one
- ones()
  - a scalar value can be passed as the argument to create a single dimensional array of that length
  - a tuple can be passed as the argument to create a multi dimensional array of that shape
- ones_like()
  - accepts another array as the argument and creates an array of ones with the same shape and dtype as that array

In [6]:
# Creating a single dimensional array of size 5 containing the value one for each index
arr1 = np.ones(5)
print('arr1:', arr1)

# Creating a multi dimensional array of size (2, 5) containing the value one for each index
arr2 = np.ones((2, 5))
print('arr2:\n', arr2)


arr1: [1. 1. 1. 1. 1.]
arr2:
 [[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


In [7]:
# Creating a non one array of size (3,2)
arr3 = np.array([[1, 2], [3, 4], [5, 6]])

# Creating an array of the same size as arr3 with 1 as value for each index
arr4 = np.ones_like(arr3)
print('arr4:\n', arr4)


arr4:
 [[1 1]
 [1 1]
 [1 1]]


#### empty(), empty_like()
- **np.empty()** and **np.empty_like()** create ndarrays without initializing their values, so the contents are arbitrary: they may look like zeros, ones, or garbage values depending on what was previously in memory
- empty()
  - a scalar value can be passed as the argument to create a single dimensional array of that length
  - a tuple can be passed as the argument to create a multi dimensional array of that shape
- empty_like()
  - accepts another array as the argument and creates an uninitialized array with the same shape and dtype as that array

In [8]:
# Creating a single dimensional array of size 5 with uninitialized (arbitrary) values
arr1 = np.empty(5)
print('arr1:', arr1)

# Creating a multi dimensional array of size (2, 5) with uninitialized (arbitrary) values
arr2 = np.empty((2, 5))
print('arr2:\n', arr2)


arr1: [1. 1. 1. 1. 1.]
arr2:
 [[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


In [9]:
# Creating an array of size (3,2)
arr3 = np.array([[1, 2], [3, 4], [5, 6]])

# Creating an array of the same size as arr3 with some value for each index
arr4 = np.empty_like(arr3)
print('arr4:\n', arr4)


arr4:
 [[1 2]
 [3 4]
 [5 6]]
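
Because the contents of an empty array are arbitrary, such an array is normally used as a buffer that is filled before it is read. A minimal sketch, with the fill value 7.0 chosen only for illustration:

```python
import numpy as np

# Allocate without initializing; the printed values of a fresh buffer are not meaningful
buffer = np.empty((2, 3))

# Overwrite every element before using the array
buffer[:] = 7.0
print('buffer after filling:\n', buffer)
```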


#### identity()
- **np.identity()** creates a square two dimensional array with ones on the main diagonal and zeros elsewhere
- It accepts a scalar value *n* and returns an array of shape (n, n)

In [10]:
print('Creating an identity square array of size (3,3):\n', np.identity(3))


Creating an identity square array of size (3,3):
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


#### shape
- The *shape* attribute of an ndarray is a tuple giving the size of the array along each dimension

In [11]:
arr1 = np.array([1,2,3])
arr2 = np.array([[1,2,3],[4,5,6]])

print("Shape of arr1:", arr1.shape)
print("Shape of arr2:", arr2.shape)

Shape of arr1: (3,)
Shape of arr2: (2, 3)


#### reshape()
- **reshape()** is an instance method of an ndarray that returns the same data arranged in a different shape
- The new shape is passed as comma-separated integers (or as a single tuple)
- The product of the values passed must equal the number of elements in the array
- One of the passed dimensions can be *-1*, in which case its value is inferred from the array size and the remaining dimensions

In [40]:
# A single dimensional array of size 10
arr1 = np.arange(10)
print('Shape of arr1:{0} and the content of arr1:{1}'.format(arr1.shape, arr1))

# Reshaping arr1 to a multi dimensional array of size (2,5)
arr2 = arr1.reshape(2,5)
print('Print shape of arr2: {0} and the content of arr2:\n{1}'.format(arr2.shape, arr2))

# A multi dimensional array of size 4,4
arr3 = np.arange(16).reshape(4,4)
print('Shape of arr3:{0} and the content of arr3:\n{1}'.format(arr3.shape, arr3))

# Reshaping arr3 to a multi dimensional array of size (8,2)
arr4 = arr3.reshape(8, 2)
print('Print shape of arr4: {0} and the content of arr4:\n{1}'.format(
    arr4.shape, arr4))

# Reshaping arr3 to size (8,2) again, passing -1 so that the first dimension is inferred
arr5 = arr3.reshape(-1, 2)
print('Print shape of arr5: {0} and the content of arr5:\n{1}'.format(
    arr5.shape, arr5))


Shape of arr1:(10,) and the content of arr1:[0 1 2 3 4 5 6 7 8 9]
Print shape of arr2: (2, 5) and the content of arr2:
[[0 1 2 3 4]
 [5 6 7 8 9]]
Shape of arr3:(4, 4) and the content of arr3:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
Print shape of arr4: (8, 2) and the content of arr4:
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]]
Print shape of arr5: (8, 2) and the content of arr5:
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]
 [12 13]
 [14 15]]
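
The rule that the requested shape must account for every element can be seen by asking for an incompatible shape. A small sketch (the `failed` flag is only there to make the outcome explicit):

```python
import numpy as np

arr = np.arange(10)

# 3 * 4 = 12 does not match the 10 elements in arr, so NumPy raises a ValueError
failed = False
try:
    arr.reshape(3, 4)
except ValueError as err:
    failed = True
    print('reshape(3, 4) failed:', err)

# With -1, the missing dimension is inferred as 10 / 5 = 2
inferred = arr.reshape(-1, 5)
print('Inferred shape:', inferred.shape)
```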


### Data Types
Basic NumPy data types:

|NumPy type|Bits|Range|Description|
|----------|----|-----|-----------|
|int8|8|-128 to 127|a signed integer|
|uint8|8|0 to 255|an unsigned integer|
|int16|16|-32,768 to 32,767|a signed integer|
|uint16|16|0 to 65,535|an unsigned integer|
|int32|32|-2,147,483,648 to 2,147,483,647|a signed integer|
|uint32|32|0 to 4,294,967,295|an unsigned integer|
|int64|64|-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807|a signed integer|
|uint64|64|0 to 18,446,744,073,709,551,615|an unsigned integer|
|float32|32|3.4E +/- 38 (7 digits)|a signed float value|
|float64|64|1.7E +/- 308 (15 digits)|a signed float value|
|string_|-|-|Fixed length ASCII string with 1 byte per character (named bytes_ in NumPy 2.0 and later)|
|unicode_|-|-|Fixed length Unicode string. Number of bytes specific to platform (named str_ in NumPy 2.0 and later)|
|object|-|-|A Python object, the value could be any Python object|

[Read more](https://docs.microsoft.com/en-us/cpp/cpp/data-type-ranges?view=msvc-160)
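
The integer ranges in the table can also be checked from within Python. The sketch below assumes the standard helpers `np.iinfo` and `np.finfo`, which are not otherwise used in these notes; it only prints the limits NumPy reports.

```python
import numpy as np

# Integer types: number of bits, smallest and largest representable value
for int_type in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.bits, info.min, info.max)

# Floating point types: number of bits and largest finite value
for float_type in (np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.max)
```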

### Casting
#### dtype
- The *dtype* object describes the data type of the elements stored in a homogeneous ndarray
- Each ndarray has a *dtype* attribute that describes the data stored in the array
- The **np.array()** function also accepts a *dtype* argument to explicitly specify the type of its content
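
A short sketch of the points above: the dtype is inferred from the data unless it is given explicitly. The exact integer type that is inferred depends on the platform, so only the float results are stated in the comments.

```python
import numpy as np

# dtype inferred from the data: all integers give an integer dtype
int_arr = np.array([1, 2, 3])

# A single float in the sequence makes the whole array float64
float_arr = np.array([1.5, 2, 3])

# dtype requested explicitly through the dtype argument
explicit = np.array([1, 2, 3], dtype=np.float64)

print('int_arr dtype:  ', int_arr.dtype)
print('float_arr dtype:', float_arr.dtype)
print('explicit dtype: ', explicit.dtype)
```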

#### astype
- A NumPy array can be converted, or cast, from one dtype to another using the ndarray's **astype** method
- astype accepts a *dtype* argument that specifies the data type for the converted array
- By default, calling astype creates a new array (a copy of the data), even if the new dtype is the same as the old dtype

In [55]:
# Creating a ndarray
arr = np.array([1,2,3])

# Casting the ndarray to float. Since astype creates a copy, the result is stored in arr1
arr1 = arr.astype(dtype=np.float16)
print('dtype of arr: {0} and of arr1: {1}'.format(arr.dtype, arr1.dtype))


dtype of arr: int64 and of arr1: float16
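
Two details of astype are worth seeing in a small example: casting floats to integers drops the fractional part, and the result is a separate array even when the dtype does not change. This sketch uses `np.shares_memory` (not otherwise used in these notes) to check the second point.

```python
import numpy as np

floats = np.array([1.7, -2.9, 3.2])

# Casting to an integer type truncates toward zero; it does not round
ints = floats.astype(np.int32)
print('floats:', floats)
print('ints:  ', ints)

# Casting to the same dtype still returns a new array with its own memory
same = floats.astype(floats.dtype)
print('Shares memory with the original:', np.shares_memory(floats, same))
```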


## Pseudorandomization
- The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions
- The values are termed pseudorandom numbers because they are generated by an algorithm with deterministic behavior based on the *seed* of the random number generator

### Random generator
- The data generation functions in numpy.random draw from a single global random number generator by default

#### seed()
- The seed function initializes the state of the global random number generator
- Every time the same sequence of random functions is executed after seeding with the same value, it produces the same results
- The seed is typically a non-negative integer (below 2**32)

#### Random State
- To avoid global state, *numpy.random.RandomState* can be used to create a random number generator that is isolated from others
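
A minimal sketch of both approaches, with the seed values 42 and 7 chosen arbitrarily: re-seeding the global generator replays the same numbers, and two RandomState objects created with the same seed produce the same values independently of the global state.

```python
import numpy as np

# Global state: seeding with the same value replays the same numbers
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print('Same values after re-seeding:', np.array_equal(first, second))

# Isolated generators: each RandomState object carries its own state
rng_a = np.random.RandomState(7)
rng_b = np.random.RandomState(7)
draw_a = rng_a.randint(0, 100, 5)
draw_b = rng_b.randint(0, 100, 5)
print('Same values from two generators with the same seed:', np.array_equal(draw_a, draw_b))
```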

### Simple random data

#### rand()
- Creates a random array of the given shape 
- Populates it with random samples from a uniform distribution 
- Filled values lie in the half-open interval [0, 1): 0 is included and 1 is excluded

In [17]:

# Create a single dimensional random array
arr = np.random.rand(2)
print('Single dimensional {0} random array:\n{1}'.format(arr.shape, arr))

# Create a multi dimensional random array of size (4,2)
arr1 = np.random.rand(4,2)
print('Multiple dimensional {0} random array:\n{1}'.format(arr1.shape, arr1))


Single dimensional (2,) random array:
[0.62210877 0.43772774]
Multiple dimensional (4, 2) random array:
[[0.78535858 0.77997581]
 [0.27259261 0.27646426]
 [0.80187218 0.95813935]
 [0.87593263 0.35781727]]


#### randn()
- Creates a random array of the given shape 
- Populates it with random samples from a standard normal distribution 
- The underlying distribution has mean 0 and variance 1; the mean and variance of a finite sample only approximate these values

In [147]:

# Create a single dimensional random array
arr = np.random.randn(2)
print('Single dimensional {0} random array:\n{1}'.format(arr.shape, arr))

# Create a multi dimensional random array of size (4,2)
arr1 = np.random.randn(4, 2)
print('Multiple dimensional {0} random array:\n{1}'.format(arr1.shape, arr1))


Single dimensional (2,) random array:
[-0.17446818 -0.64247527]
Multiple dimensional (4, 2) random array:
[[-0.62502311  1.3258867 ]
 [ 0.5312549   1.27528435]
 [-0.68282564 -0.94818614]
 [ 0.7773618   0.3251135 ]]
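
The mean and variance of the standard normal distribution are properties of the distribution, so a small sample can deviate noticeably while a large one comes close. A sketch using an isolated generator with an arbitrary seed of 0:

```python
import numpy as np

rng = np.random.RandomState(0)

# A handful of draws: sample statistics can be far from 0 and 1
small = rng.randn(5)
print('5 samples:      mean={0:.3f}, variance={1:.3f}'.format(small.mean(), small.var()))

# Many draws: sample statistics approach the distribution's mean 0 and variance 1
large = rng.randn(100000)
print('100000 samples: mean={0:.3f}, variance={1:.3f}'.format(large.mean(), large.var()))
```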


#### randint()
- Creates a random array of the given shape 
- Populates it with random integers between the specified **low** (inclusive) and **high** (exclusive) parameter values
- A third *size* parameter can also be provided to set the shape of the generated array; without it a single scalar integer is returned

In [148]:
# Generating a random integer value between -10 (inclusive) and 10 (exclusive).
val = np.random.randint(low=-10, high=10)
print('A randomly generated scalar integer value: {0}'.format(val))

# Generating a random single dimensional array with values between -10 and 10.
arr1 = np.random.randint(low=-10, high=10, size=2)
print('Single dimensional {0} random array:\n{1}'.format(arr1.shape, arr1))

# Generating a random multi dimensional array of size (4,2) with values between -10 and 10
arr2 = np.random.randint(-10, 10, (4,2))
print('Multiple dimensional {0} random array:\n{1}'.format(arr2.shape, arr2))


A randomly generated scalar integer value: -3
Single dimensional (2,) random array:
[  7 -10]
Multiple dimensional (4, 2) random array:
[[-1  8]
 [-1 -9]
 [ 4 -7]
 [ 2 -1]]


#### random_sample()
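- Creates a random array of the given shape
- Populates it with random samples from a continuous uniform distribution
- Filled values lie in the half-open interval [0, 1)
- **np.random.sample()**, used in the cell below, is an alias of **np.random.random_sample()**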

In [20]:
# A scalar value can be generated by calling the function without arguments
val = np.random.sample()
print('A randomly generated scalar value: {0}'.format(val))

# Generating a random single dimensional array
arr1 = np.random.sample(2)
print('Single dimensional {0} random array:\n{1}'.format(arr1.shape, arr1))

# Generating a random multi dimensional array of size (4,2)
arr2 = np.random.sample((4,2))
print('Multiple dimensional {0} random array:\n{1}'.format(arr2.shape, arr2))


A randomly generated scalar value: 0.5680986526260692
Single dimensional (2,) random array:
[0.86912739 0.43617342]
Multiple dimensional (4, 2) random array:
[[0.80214764 0.14376682]
 [0.70426097 0.70458131]
 [0.21879211 0.92486763]
 [0.44214076 0.90931596]]


#### choice()
- Creates an array containing random samples drawn from another single dimensional array
- A scalar value *n* can be passed instead of the array, in which case the range 0 to n-1 is used as the input array
- A *size* parameter can be provided as the second argument to specify the shape of the generated array. If no size is provided, a single scalar choice is returned

In [176]:
# A scalar value can be generated by not specifying the size
val = np.random.choice(10)
print('A randomly generated scalar value: {0}'.format(val))

# Generating a random single dimensional array using a scalar value rather than an array as input
arr1 = np.random.choice(10, 5)
print('Single dimensional {0} random array using a scalar value as input:\n{1}'.format(arr1.shape, arr1))

# Generating a random multi dimensional array using a scalar value rather than an array as input
arr2 = np.random.choice(10, (4,2))
print('Multiple dimensional {0} random array using a scalar value as input:\n{1}'.format(
    arr2.shape, arr2))

# Using arr1 as input to choose 4 random values from it, arranged in shape (2,2)
arr3 = np.random.choice(arr1, size=(2,2))
print('Multiple dimensional {0} random array using a single dimensional array as input:\n{1}'.format(
    arr3.shape, arr3))



A randomly generated scalar value: 9
Single dimensional (5,) random array using a scalar value as input:
[7 4 5 6 3]
Multiple dimensional (4, 2) random array using a scalar value as input:
[[8 1]
 [7 4]
 [1 9]
 [2 4]]
Multiple dimensional (2, 2) random array using a single dimensional array as input:
[[6 6]
 [4 3]]


#### bytes()
- Returns a bytes object containing random bytes
- A *length* parameter is provided to specify the number of bytes

In [177]:
# Creating a string of random bytes
rStr = np.random.bytes(10)

print('A string containing random bytes of size 10:', rStr)

A string containing random bytes of size 10: b'k\xfe\x93)k\xa6\xfa\r\x1cL'


### Permutations
#### shuffle()
- Accepts an array as input
- Modifies the array in-place by shuffling its contents and returns None
- For multi dimensional arrays, only the first axis is shuffled; the contents of each sub-array stay intact

In [205]:
# Creating a single dimensional array
arr1 = np.arange(10)
print('Un-shuffled arr1:{0}'.format(arr1))
# Shuffling the array
np.random.shuffle(arr1)
print('Shuffled arr1:{0}'.format(arr1))

# Creating a multi dimensional array
arr2 = np.arange(16).reshape(4, 2, 2)
print('Un-shuffled arr2:\n{0}'.format(arr2))
# Shuffling the array
np.random.shuffle(arr2)
print('Shuffled arr2:\n{0}'.format(arr2))


Un-shuffled arr1:[0 1 2 3 4 5 6 7 8 9]
Shuffled arr1:[9 0 7 3 5 8 1 6 2 4]
Un-shuffled arr2:
[[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]

 [[12 13]
  [14 15]]]
Shuffled arr2:
[[[ 8  9]
  [10 11]]

 [[12 13]
  [14 15]]

 [[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]]


#### permutation()
- Accepts a scalar or an array as input
- If a scalar *n* is provided, the range 0 to n-1 is used as the input
- Returns a randomly permuted copy of the input and leaves the original unchanged
- For multi dimensional arrays, only the first axis is permuted

In [24]:
# Creating a permuted single dimensional array using a scalar input
arr1 = np.random.permutation(10)
print('Permuted array of size {0}, created from range 0 - 10: {1}'.format(arr1.shape, arr1))
# Permuting a single dimensional array
arr2 = np.random.permutation(arr1)
print('Permuted arr1: {0}'.format(arr2))
# Creating a multi dimensional array
arr3 = np.arange(10).reshape(5, 2)
print(
    'Array of size {0} containing values:\n{1}'.format(arr3.shape, arr3))
# Permuting a multi dimensional array
arr4 = np.random.permutation(arr3)
print('Permuted arr3: {0}'.format(arr4))


Permuted array of size (10,), created from range 0 - 10: [8 3 5 0 6 4 1 2 9 7]
Permuted arr1: [2 8 4 0 6 1 9 5 3 7]
Array of size (5, 2) containing values:
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
Permuted arr3: [[4 5]
 [6 7]
 [2 3]
 [0 1]
 [8 9]]
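
The practical difference between the two functions is where the result ends up. A small sketch, with illustrative variable names: permutation() returns a new array and leaves its input alone, while shuffle() reorders its input and returns None.

```python
import numpy as np

original = np.arange(5)

# permutation() returns a shuffled copy; the input keeps its order
permuted = np.random.permutation(original)
unchanged = np.array_equal(original, np.arange(5))
print('original after permutation():', original)
print('permuted copy:', permuted)

# shuffle() works in place and returns None
result = np.random.shuffle(original)
print('return value of shuffle():', result)
print('original after shuffle():', original)
```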


## Basic Indexing and Slicing

### Indexing
#### Single Dimensional Arrays
- Indexing and slicing on single dimensional arrays look similar to Python lists on the surface
- As with Python lists, negative indices can be used to access elements from the end of the array

In [25]:
# Creating a single dimensional random integer array of size 10
arr = np.random.randint(0, 100, 10)
print('Content of arr: {0}'.format(arr))
# Accessing the element using positive indexing
print('Element at index 0: {0}'.format(arr[0]))
print('Element at index 1: {0}'.format(arr[1]))
print('Element at index 2: {0}'.format(arr[2]))

# Accessing the elements using negative indexing
print('Element at the last index: {0}'.format(arr[-1]))
print('Element at second last index: {0}'.format(arr[-2]))


Content of arr: [41  1 14  3 30 12 73 19 26 96]
Element at index 0: 41
Element at index 1: 1
Element at index 2: 14
Element at the last index: 96
Element at second last index: 26


#### Multi-Dimensional Arrays
- Accessing elements in multi dimensional arrays follows the same principles as for single dimensional arrays
- However, for multi dimensional arrays, the indices for each axis need to be specified by passing a comma-separated list of indices

In [207]:
# Creating a multi dimensional random integer array of size (3, 4)
arr = np.random.randint(0, 100, 12).reshape(3,4)
print('Content of arr:\n{0}'.format(arr))
print('Number of axes in arr: {0}'.format(len(arr.shape)))

# Accessing the element using positive indexing
print('Element at index 0 for axis 0: {0}'.format(arr[0]))
print('Element at index 1 for axis 0: {0}'.format(arr[1]))
print('Element at index 0 for axis 0 and index 0 for axis 1: {0}'.format(arr[0, 0]))
print(
    'Element at index 0 for axis 0 and index 1 for axis 1: {0}'.format(arr[0, 1]))
# Accessing an element using negative indexing: -1 is the last row, -3 is the third column from the end
print(
    'Element at index -1 for axis 0 and index -3 for axis 1: {0}'.format(arr[-1, -3]))


Content of arr:
[[26  9 30 76]
 [ 5 51 51 26]
 [36  6 88 30]]
Number of axes in arr: 2
Element at index 0 for axis 0: [26  9 30 76]
Element at index 1 for axis 0: [ 5 51 51 26]
Element at index 0 for axis 0 and index 0 for axis 1: 26
Element at index 0 for axis 0 and index 1 for axis 1: 9
Element at index -1 for axis 0 and index -3 for axis 1: 6


### Slicing

- Single dimensional arrays can be sliced in similar fashion to that of normal python lists
- Multi-dimensional arrays can be sliced on each **axis**, generating a view on a sub-array or element
- *Axes* are the dimensions of an ndarray. For example, a 3d array has 3 axes
- There are many forms of indexing to create slices on an array.
  - Each form of indexing applies to an axis of the array
  - Multiple forms of indexing can also be combined by applying a different form on different axes

#### Basic Indexing
- Basic indexing uses numeric indices in *start:stop* form to create a sliced view on an array
- The created slice contains a contiguous run of indices from start up to, but not including, stop
- Basic indexing creates a view of the data, so any assignment to the slice updates the values in the original array

In [210]:
# Creating a single dimensional array with 1 axis
arr1 = np.arange(10)
print('arr1:{0}'.format(arr1))
# Slicing the array to retrieve elements from index 2 to 6 on axis=0 (the stop index 7 is excluded)
print('Slice on arr1:{0}'.format(arr1[2:7]))

# Creating a multi dimensional array with 2 axes
arr2 = np.arange(12).reshape(3,4)
print('arr2:\n{0}'.format(arr2))
# Slicing the array to retrieve elements from index 0 to 2 on axis=0 and 1 to 3 on axis=1
print('Slice on arr2:\n{0}'.format(arr2[0:3, 1:4]))

# Creating a multi dimensional array with 3 axes
arr3 = np.arange(24).reshape(2,3,4)
print('arr3:\n{0}'.format(arr3))
# Slicing the array to retrieve elements at index 1 on axis=0, 1 to 2 on axis=1 and 2 to 3 on axis=2
print('Slice on arr3:\n{0}'.format(arr3[1, 1:3, 2:4]))


arr1:[0 1 2 3 4 5 6 7 8 9]
Slice on arr1:[2 3 4 5 6]
arr2:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Slice on arr2:
[[ 1  2  3]
 [ 5  6  7]
 [ 9 10 11]]
arr3:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Slice on arr3:
[[18 19]
 [22 23]]


In [60]:
# Creating a multi dimensional array with 2 axes
arr2 = np.arange(12).reshape(3, 4)
print('arr2:\n{0}'.format(arr2))
# Slicing the array to retrieve elements from index 0 to 2 on axis=0 and 1 to 3 on axis=1
arr2Slice = arr2[0:3, 1:4] # Storing the slice into a variable
print('arr2Slice:\n{0}'.format(arr2Slice))
# Assigning a scalar value to the complete slice in arr2Slice
arr2Slice[:] = -1
print('arr2Slice after assignment of a scalar value:\n{0}'.format(arr2Slice))
# Checking arr2 after assignment into arr2Slice
print('arr2:\n{0}'.format(arr2))
# Assigning an array of the same shape as arr2Slice to arr2Slice
arr2Slice[:] = np.random.randint(0, 100, arr2Slice.shape)
print('arr2Slice after assignment of a random array:\n{0}'.format(arr2Slice))
# Checking arr2 after assignment into arr2Slice
print('arr2:\n{0}'.format(arr2))
# Assignment can be made directly into the array slice, without storing the reference into a variable
arr2[0:3, 1:4] = 0
print('arr2 after directly assigning a scalar value to its slice:\n{0}'.format(arr2))


arr2:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
arr2Slice:
[[ 1  2  3]
 [ 5  6  7]
 [ 9 10 11]]
arr2Slice after assignment of a scalar value:
[[-1 -1 -1]
 [-1 -1 -1]
 [-1 -1 -1]]
arr2:
[[ 0 -1 -1 -1]
 [ 4 -1 -1 -1]
 [ 8 -1 -1 -1]]
arr2Slice after assignment of a random array:
[[69 81 80]
 [ 8 75 15]
 [20 16 64]]
arr2:
[[ 0 69 81 80]
 [ 4  8 75 15]
 [ 8 20 16 64]]
arr2 after directly assigning a scalar value to its slice:
[[0 0 0 0]
 [4 0 0 0]
 [8 0 0 0]]


##### Negative Indexing
- Negative indexing is an extension of basic indexing in which negative indices, counted from the end of the axis, are used alone or together with positive indices
- Negative indexing also creates a view on the sliced array
- Like basic indexing, updates to a created slice directly modify the original array

In [72]:
# Creating a multi dimensional array with 2 axes
arr2 = np.arange(12).reshape(3, 4)
print('arr2:\n{0}'.format(arr2))
# Slicing the array to retrieve elements from index -3 up to (not including) -1 on axis=0 and on axis=1
arr2Slice = arr2[-3:-1, -3:-1]  # Storing the slice into a variable
print('arr2Slice created using negative indexing:\n{0}'.format(arr2Slice))
# Slicing the array using negative indices (-3 to -1) on axis=0 and positive indices (1 to 3) on axis=1
arr2Slice1 = arr2[-3:-1, 1:3]  # Storing the slice into a variable
print('arr2Slice1 created using negative and positive indexing:\n{0}'.format(arr2Slice1))
print('Slice on arr2 created using negative and positive indexing:\n{0}'.format(
    arr2[0:-1, -3:3]))


arr2:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
arr2Slice created using negative indexing:
[[1 2]
 [5 6]]
arr2Slice1 created using negative and positive indexing:
[[1 2]
 [5 6]]
Slice on arr2 created using negative and positive indexing:
[[1 2]
 [5 6]]


##### Wildcard Indexing
- Wildcard indexing is another form of basic indexing in which the start index, the stop index, or both are omitted around the colon
- Omitting the start index selects from the beginning of the axis, and omitting the stop index selects through the end of the axis
- This form of indexing can be combined with positive or negative indices as well
- Wildcard indexing also creates a view on the sliced array
- Like basic indexing, updates to a created slice directly modify the original array

In [75]:
# Creating a multi dimensional array with 2 axes
arr2 = np.arange(12).reshape(3, 4)
print('arr2:\n{0}'.format(arr2))
# Slicing the array to retrieve elements from all indices on axis=0 and all indices on axis=1
print('arr2Slice created using wildcard indexing:\n{0}'.format(arr2[:, :]))

# Slicing the array to retrieve elements from index -3 to the end on axis=0 and from the start up to index 4 (excluded) on axis=1
print('arr2Slice1 created using wildcard and numeric indexing:\n{0}'.format(
    arr2[-3:, :4]))

arr2:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
arr2Slice created using wildcard indexing:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
arr2Slice1 created using wildcard and numeric indexing:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


#### Boolean Indexing
- Boolean indexing uses an array or sequence of boolean values whose shape matches that of the array (or the length of the axis being indexed)
  - The sequence of boolean values can be defined as a list of boolean values
  - Alternatively, a condition can also be used to generate the sequence of boolean values
- Both single and multi dimensional arrays can be sliced using boolean indexing
- The created slice contains elements from the array for which a *True* is present in the boolean sequence
- A slice created using boolean indexing is a new array which contains a copy of the elements from the original array

In [84]:
# Creating a single dimensional array with 5 values
arr = np.random.randint(0, 100, 5)
print('arr: {0}'.format(arr))
# Let us choose values at index 0, 2, 4 from arr using boolean indexing
# Creating a list of boolean values that contains 5 elements with elements at index 0, 2 and 4 being True
boolIndices = [True, False, True, False, True]
print('Boolean Indices: {0}'.format(boolIndices))
# Slicing arr using boolean indexing
print('Sliced arr: {0}'.format(arr[boolIndices]))
# Creating a list of boolean values using a condition on arr
boolIndices = arr <= 50
print('Boolean Indices created using a condition on arr: {0}'.format(boolIndices))
# Slicing arr using boolean indexing
print('Sliced arr using boolIndices created using a condition: {0}'.format(arr[boolIndices]))
# Slicing arr using the condition directly 
print('Sliced arr using a condition directly for axis=0: {0}'.format(arr[arr > 50]))


arr: [17 39 59 82 37]
Boolean Indices: [True, False, True, False, True]
Sliced arr: [17 59 37]
Boolean Indices created using a condition on arr: [ True  True False False  True]
Sliced arr using boolIndices created using a condition: [17 39 37]
Sliced arr using a condition directly for axis=0: [59 82]


In [211]:
# Creating a multidimensional array
arr = np.random.randint(0,100,12).reshape(3,4)
print('multi dimensional arr:\n{0}'.format(arr))

# We will slice the array using the condition arr > 25 which will generate a boolean array of the same shape as arr
print(
    'Array generated to slice arr using condition arr > 25:\n{0}'.format(arr > 25))

print(arr > 25)

print('Sliced arr using condition arr > 25: {0}'.format(arr[arr > 25]))


multi dimensional arr:
[[82 44 48 26]
 [ 6 66 43 88]
 [30  9 36 83]]
Array generated to slice arr using condition arr > 25:
[[ True  True  True  True]
 [False  True  True  True]
 [ True False  True  True]]
[[ True  True  True  True]
 [False  True  True  True]
 [ True False  True  True]]
Sliced arr using condition arr > 25: [82 44 48 26 66 43 88 30 36 83]


In [134]:
# Creating a single dimensional array with 5 values
arr = np.random.randint(0, 100, 5)
print('arr: {0}'.format(arr))
# Storing a boolean-indexed selection in a variable creates a copy
arrSlice = arr[[True, True, False, False, False]]
print('arrSlice: {0}'.format(arrSlice))
# Assigning to the slice
arrSlice[:] = -1
print('Values in arrSlice after assigning -1 to it: {0}'.format(arrSlice))
print('Values in arr after assigning -1 to arrSlice created using boolean indexing: {0}'.format(arr))

arr: [10 87 51 11 18]
arrSlice: [10 87]
Values in arrSlice after assigning -1 to it: [-1 -1]
Values in arr after assigning -1 to arrSlice created using boolean indexing: [10 87 51 11 18]
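
The cell above shows that a boolean selection stored in a variable is a copy. Assigning directly through a boolean index expression behaves differently: it writes into the original array. A short sketch with fixed values (taken from the output above) so that the result is reproducible:

```python
import numpy as np

arr = np.array([10, 87, 51, 11, 18])

# Reading through a boolean mask gives a copy; changing it leaves arr alone
selected = arr[arr > 50]
selected[:] = -1
print('arr after modifying the copy:', arr)

# Assigning through the boolean mask itself updates arr in place
arr[arr > 50] = 0
print('arr after masked assignment: ', arr)
```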



#### Fancy Indexing
- Fancy indexing describes indexing with integer arrays or sequences on the appropriate axis
- Can be used to re-order the values
- Indices can be repeated to create duplicates
- Both single and multi dimensional arrays can be sliced using fancy indexing
- A slice created using fancy indexing is a new array which contains a copy of the elements from the original array

In [138]:
# Creating a single dimensional array with 5 values
arr = np.random.randint(0, 100, 5)
print('arr: {0}'.format(arr))
# Let us pick the values at indices 0, 4, 2 and 2 (in that order, with index 2 repeated) from arr using fancy indexing
# Slicing arr using fancy indexing
print('Sliced arr: {0}'.format(arr[[0, 4, 2, 2]]))

# Creating a multidimensional array
arr = np.random.randint(0, 100, 12).reshape(3, 4)
print('multi dimensional arr:\n{0}'.format(arr))
# Slicing the array using fancy indexing on axis=1 and selecting all indices from axis=0 using wildcard indexing
print('Sliced arr:\n{0}'.format(arr[:, [1, 1]]))


arr: [97 58 35 26 81]
Sliced arr: [97 81 35 35]
multi dimensional arr:
[[87 14 51 84]
 [17 79 39 48]
 [73 79 94 24]]
Sliced arr:
[[14 14]
 [79 79]
 [79 79]]


#### Combining different forms of indexing techniques
- Multiple forms of indexing techniques can be combined to create slices from multi dimensional arrays
  - a different form of indexing technique can be specified for each axis
- Using Boolean or Fancy indexing along with Basic indexing always creates a copy
- Using just different forms of Basic indexing does not create a copy
- Boolean and Fancy indexing should not be combined without care: the boolean index is converted to integer indices and broadcast against the fancy index, which selects individual elements rather than a rectangular block
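
The copy and view behaviour described above can be checked with a minimal sketch like the following. It assumes `np.shares_memory` is an acceptable way to test whether two arrays use the same underlying data; the variable names are only illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Basic indexing on every axis returns a view on arr
basic = arr[0:2, 1:3]
# Fancy indexing on axis=1 combined with basic indexing on axis=0 returns a copy
mixed = arr[0:2, [1, 2]]
# Boolean indexing on axis=0 combined with basic indexing on axis=1 returns a copy
masked = arr[[True, False, True], :]

print('basic shares memory with arr:', np.shares_memory(arr, basic))
print('mixed shares memory with arr:', np.shares_memory(arr, mixed))
print('masked shares memory with arr:', np.shares_memory(arr, masked))

# Writing through the view changes arr; writing to the copies does not
basic[:] = -1
mixed[:] = -2
print('arr after assigning to basic and mixed:\n{0}'.format(arr))
```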

In [156]:
# Creating a multidimensional array
arr = np.random.randint(0, 100, 12).reshape(3, 4)
print('Combining basic and fancy indexing for axis=0 and axis=1 respectively:\n{0}'.format(arr[0:3, [1,2]]))
print('Combining boolean and basic indexing for axis=0 and axis=1 respectively:\n{0}'.format(
    arr[[True, False, False], :]))


Combining basic and fancy indexing for axis=0 and axis=1 respectively:
[[ 7 22]
 [ 3 27]
 [32 55]]
Combining boolean and basic indexing for axis=0 and axis=1 respectively:
[[ 1  7 22 58]]


### copy
- A copy of an array slice can be created using the **copy** method of any ndarray
- copying of a slice is relevant when working with basic indexing
  - When slicing and assigning with basic indexing, we may need to avoid modifying the original array
  - In these cases a copy of the slice can be created

In [161]:
# Creating a single dimensional array with 10 values
arr = np.random.randint(0, 100, 10)
print('arr: {0}'.format(arr))
# Creating a slice of arr
arrSlice = arr[2:7].copy()
print('arrSlice: {0}'.format(arrSlice))
# Assigning to arrSlice
arrSlice[:] = -1
print('arrSlice after assignment: {0}'.format(arrSlice))
print('arr after assignment to slice arrSlice: {0}'.format(arr))

arr: [77 82 26 27 11 93  8 36 96  0]
arrSlice: [26 27 11 93  8]
arrSlice after assignment: [-1 -1 -1 -1 -1]
arr after assignment to slice arrSlice: [77 82 26 27 11 93  8 36 96  0]


## Transposing Arrays and Swapping Axes
### Transposing
- Transposing is a form of reshaping that returns a view on the underlying data without copying.
- Arrays have the *transpose* method and also the special attribute *T*   
- Transposing is useful while performing matrix computations.
- For example, the inner matrix product $X^T X$ can be computed using np.dot(arr.T, arr)
- For higher dimensional arrays, *transpose* also accepts a tuple of axis numbers to permute the axes
- Transposing an array does not create a copy; changes made through the transposed view are visible in the original array
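
As a small sketch of the points above, the following checks whether a transposed array shares its data with the original and computes an inner matrix product with `np.dot`. The array values and the name `gram` are only illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
arrTransposed = arr.T

# The transposed array is a view on the same underlying data
print('shares memory:', np.shares_memory(arr, arrTransposed))

# Assigning through the transposed view also changes arr[1, 0]
arrTransposed[0, 1] = -1
print('arr after assigning to arrTransposed[0, 1]:\n{0}'.format(arr))

# Inner matrix product: (3, 2) x (2, 3) -> (3, 3)
gram = np.dot(arr.T, arr)
print('shape of np.dot(arr.T, arr):', gram.shape)
```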

In [218]:
# Creating a 2 dimensional array
arr = np.random.randint(0,100, 8).reshape(2,4)
print('arr has the shape {0}:\n{1}'.format(arr.shape, arr))
# arr can be transposed using the instance attribute T. axis 0 becomes axis 1 and axis 1 becomes axis 0
arrTransposed = arr.T
print('arr transposed has the shape {0}:\n{1}'.format(arrTransposed.shape, arrTransposed))


arr has the shape (2, 4):
[[58  5 61 88]
 [29 61 89 46]]
arr transposed has the shape (4, 2):
[[58 29]
 [ 5 61]
 [61 89]
 [88 46]]


In [169]:
# Creating a 3 dimensional array
arr = np.random.randint(0, 100, 6).reshape(1, 2, 3)
print('arr has the shape {0}:\n{1}'.format(arr.shape, arr))
# arr can be transposed using the instance attribute T. axis 0 becomes axis 2, axis 1 remains the same and axis 2 becomes axis 0
arrTransposed = arr.T
print('arr transposed has the shape {0}:\n{1}'.format(
    arrTransposed.shape, arrTransposed))


arr has the shape (1, 2, 3):
[[[29 85 75]
  [26 78 16]]]
arr transposed has the shape (3, 2, 1):
[[[29]
  [26]]

 [[85]
  [78]]

 [[75]
  [16]]]


In [221]:
# Creating a 3 dimensional array
arr = np.random.randint(0, 100, 6).reshape(1, 2, 3)
print('arr has the shape {0}:\n{1}'.format(arr.shape, arr))

# The transpose method can better specify the position of the axis in the transposed array explicitly
arrTransposed = arr.transpose(2,0,1)
# With transpose(2, 0, 1), axis 2 becomes axis 0, axis 0 becomes axis 1 and axis 1 becomes axis 2
print('arr transposed has the shape {0}:\n{1}'.format(
    arrTransposed.shape, arrTransposed))


arr has the shape (1, 2, 3):
[[[56 15 80]
  [16 38  7]]]
arr transposed has the shape (3, 1, 2):
[[[56 16]]

 [[15 38]]

 [[80  7]]]


### Swapping Axes
- **swapaxes** is a method that takes a pair of axis numbers and switches the indicated axes to rearrange the data

In [176]:
# Creating a 3 dimensional array
arr = np.random.randint(0, 100, 6).reshape(1, 2, 3)
print('arr has the shape {0}:\n{1}'.format(arr.shape, arr))

# The swapaxes method accepts two axes as inputs and swaps them
arrTransposed = arr.swapaxes(1, 2)
# With swapaxes(1, 2), axis 1 and axis 2 are exchanged and axis 0 remains the same
print('arr transposed has the shape {0}:\n{1}'.format(
    arrTransposed.shape, arrTransposed))


arr has the shape (1, 2, 3):
[[[15 56 63]
  [74 98 81]]]
arr transposed has the shape (1, 3, 2):
[[[15 74]
  [56 98]
  [63 81]]]


## Broadcasting
- describes how arithmetic works between arrays of different shapes
> Two arrays are compatible for broadcasting if, comparing their shapes from the trailing (inner) dimension backwards, each pair of axis lengths either matches or one of the lengths is 1
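
The rule can be tried out with a short sketch such as the one below. The shapes are chosen only for illustration, and the flag `incompatible` is a hypothetical helper used to record that the last operation fails.

```python
import numpy as np

a = np.ones((3, 4))
row = np.arange(4)                # shape (4,): matches the trailing axis of a
col = np.arange(3).reshape(3, 1)  # shape (3, 1): trailing axis has length 1

print('a + row has the shape:', (a + row).shape)
print('a + col has the shape:', (a + col).shape)

# A shape of (3,) does not match the trailing axis of a (length 4) and is not 1
incompatible = False
try:
    a + np.arange(3)
except ValueError as err:
    incompatible = True
    print('not compatible for broadcasting:', err)
```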

## Arithmetic Operations
- ndarray offers *vectorization*, that is, it allows batch operations on data without writing any for loops
- Arithmetic operations can be performed on 
  - An array and a scalar value
  - Arrays of same size and shape
  - Arrays of different size and shape only if their shapes are compatible for broadcasting (their inner dimension(s) are the same or have length 1)
- Operations with scalars propagate the scalar argument to each element in the array
- Any arithmetic operation between equal size arrays applies the operation element wise
- Operations between different sized arrays are called broadcasting
- Comparison operators can also be applied to arrays and result in boolean arrays
- Arithmetic operators: +, -, /, *, %, **, // 
- Comparison operators:  ==, !=, <, <=, >, >=

In [209]:
# Creating a single dimensional array
arr1 = np.random.randint(0, 100, 6)
print('arr1:{0}'.format(arr1))
# Performing an arithmetic power operation with a scalar value
print('Each element in arr1 raised to the power of 2:{0}'.format(arr1 ** 2))

# Creating a multi dimensional array
arr2 = np.random.randint(0, 100, 6).reshape(3,2)
print('arr2 (shape: {0}):\n{1}'.format(arr2.shape, arr2))
# Performing an arithmetic addition operation with a scalar value
print(
    'Each element in arr2 has the value +1 added to it:\n{0}'.format(arr2 + 1))

# Creating another multi dimensional array with the same shape as arr2
arr3 = np.random.randint(0, 100, 6).reshape(3, 2)
print('arr3 (shape: {0}):\n{1}'.format(arr3.shape, arr3))
# Performing an arithmetic subtraction operation between the two arrays
print(
    'Each element in arr3 is subtracted from each element in arr2:\n{0}'.format(arr2 - arr3))

# Creating a single dimensional array with the same length as the inner dimension (axis=1) of arr2
arr4 = np.random.randint(0, 100, 2)
print('arr4 (shape: {0}):\n{1}'.format(arr4.shape, arr4))
# Performing an arithmetic multiplication operation between the two arrays of different shape but same inner dimension
print(
    'Each element in arr4 is multiplied to each element in arr2:\n{0}'.format(arr2 * arr4))

# Creating a three dimensional array with its inner dimensions (axis=1, axis=2) the same as those of arr2
arr5 = np.random.randint(0, 100, 12).reshape(2,3,2)
print('arr5 (shape: {0}):\n{1}'.format(arr5.shape, arr5))
# Performing an arithmetic division operation between the two arrays of different shape but same inner dimensions
print(
    'Each element in arr5 is divided by each element in arr2:\n{0}'.format(arr5 / arr2))


arr1:[43 18 29 70 72 98]
Each element in arr1 raised to the power of 2:[1849  324  841 4900 5184 9604]
arr2 (shape: (3, 2)):
[[ 6 48]
 [31 59]
 [73 61]]
Each element in arr2 has the value +1 added to it:
[[ 7 49]
 [32 60]
 [74 62]]
arr3 (shape: (3, 2)):
[[ 5 27]
 [16 33]
 [40 91]]
Each element in arr3 is subtracted from each element in arr2:
[[  1  21]
 [ 15  26]
 [ 33 -30]]
arr4 (shape: (2,)):
[42 38]
Each element in arr4 is multiplied to each element in arr2:
[[ 252 1824]
 [1302 2242]
 [3066 2318]]
arr5 (shape: (2, 3, 2)):
[[[76  9]
  [26 29]
  [72 47]]

 [[37 73]
  [84 63]
  [40 68]]]
Each element in arr5 is divided by each element in arr2:
[[[12.66666667  0.1875    ]
  [ 0.83870968  0.49152542]
  [ 0.98630137  0.7704918 ]]

 [[ 6.16666667  1.52083333]
  [ 2.70967742  1.06779661]
  [ 0.54794521  1.1147541 ]]]


In [204]:
# Creating a single dimensional array
arr1 = np.random.randint(0, 100, 6)
print('arr1:{0}'.format(arr1))
# Performing a comparison operation with a scalar value
print('Each element in arr1 is checked to see if it is greater than the value 50: {0}'.format(arr1 > 50))

# Creating a multi dimensional array
arr2 = np.random.randint(0, 100, 6).reshape(3, 2)
print('arr2:\n{0}'.format(arr2))
# Performing two comparison operations with scalar values and combining the resulting boolean arrays with the element wise and operator &
print(
    'Each element in arr2 is checked to see if it is greater than the value 25 and smaller than 75:\n{0}'.format((arr2 > 25) & (arr2 < 75)))


arr1:[42 52 42 33 67 74]
Each element in arr1 is checked to see if it is greater than the value 50: [False  True False False  True  True]
arr2:
[[40 27]
 [52  4]
 [13 29]]
Each element in arr2 is checked to see if it is greater than the value 25 and smaller than 75:
[[ True  True]
 [ True False]
 [False  True]]


## Universal Functions: Fast Element Wise Array Functions
- Performs element wise operations on data in ndarrays
- They act as vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results
- Most ufuncs are simple element wise transformations on an array, like sqrt or exp, and are called *unary* ufuncs

### unary ufunc:
- Performs element wise operation on a single array


|function|description|
|--------|-----------|
|np.abs()|Computes the absolute element wise value for integer, floating point, or complex values|
|np.fabs()|Computes the absolute element wise value for non-complex values; the result is always floating point|
|np.square()|Computes the square of each element in an array|
|np.sqrt()|Computes the square root of each element in an array|
|np.exp()|Computes the exponential $e^x$ of each element in an array|
|np.log()|Computes the natural logarithm (base e) of each element in an array|
|np.log10()|Computes the logarithm base 10 of each element in an array|
|np.log2()|Computes the logarithm base 2 of each element in an array|
|np.log1p()|Computes the natural logarithm of (1 + x) for each element x in an array|
|np.sign()|Computes the sign of each element in an array: 0 - zero, 1 - positive, -1 - negative|
|np.ceil()|Rounds each element in an array up to the smallest integer greater than or equal to it|
|np.floor()|Rounds each element in an array down to the largest integer less than or equal to it|
|np.rint()|Rounds each element in an array to the nearest integer, preserving the dtype|
|np.modf()|Returns the fractional and integer parts of the array as two separate arrays|
|np.isnan()|Returns a boolean array indicating if each value is NaN (Not a Number)|
|np.isfinite()|Returns a boolean array indicating if each value is finite|
|np.isinf()|Returns a boolean array indicating if each value is infinite|
|np.logical_not()|Computes the *not* of each element in a boolean array|
|np.cos(), np.cosh(), np.sin(), np.sinh(), np.tan(), np.tanh()|Regular and hyperbolic trigonometric functions|
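
A few of the unary ufuncs from the table can be sketched as follows. The input values are arbitrary and chosen only to show the rounding and NaN-related functions.

```python
import numpy as np

arr = np.array([-2.5, -0.4, 0.0, 1.6, 3.0])

print('abs:  ', np.abs(arr))
print('sign: ', np.sign(arr))
print('ceil: ', np.ceil(arr))
print('floor:', np.floor(arr))

# modf returns two arrays: the fractional parts and the integer parts
fractional, whole = np.modf(arr)
print('fractional parts:', fractional)
print('integer parts:   ', whole)

# Checks for special floating point values
special = np.array([1.0, np.nan, np.inf])
print('isnan:   ', np.isnan(special))
print('isfinite:', np.isfinite(special))
print('isinf:   ', np.isinf(special))
```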

### binary ufunc:
- Performs element wise operation between two arrays and returns the result
- Both arrays should be of the same shape, or their shapes must be compatible for broadcasting

|function|description|
|--------|-----------|
|np.add()|Adds corresponding elements in the arrays|
|np.subtract()|Subtracts corresponding elements in the arrays|
|np.multiply()|Multiplies corresponding elements in the arrays|
|np.divide()|Divides corresponding elements in the first array by the second array|
|np.floor_divide()|Divides corresponding elements in the first array by the second array and returns the quotient rounded down to the nearest integer (the remainder is dropped)|
|np.maximum()|Maximum of the corresponding elements in the arrays|
|np.fmax()|Maximum of the corresponding elements in the arrays. Ignores NaN|
|np.minimum()|Minimum of the corresponding elements in the arrays|
|np.fmin()|Minimum of the corresponding elements in the arrays. Ignores NaN|
|np.mod()|Divides corresponding elements in the first array by the second array and returns the remainder|
|np.copysign()|Copies the sign of corresponding elements in the second array to the first array|
|np.greater()|Performs boolean greater than check of the corresponding elements in the first array to the second array. Returns a boolean array|
|np.greater_equal()|Performs boolean greater than or equal to check of the corresponding elements in the first array to the second array. Returns a boolean array|
|np.less()|Performs boolean smaller than check of the corresponding elements in the first array to the second array. Returns a boolean array|
|np.less_equal()|Performs boolean smaller than or equal to check of the corresponding elements in the first array to the second array. Returns a boolean array|
|np.equal()|Performs boolean equal check of the corresponding elements in the arrays. Returns a boolean array|
|np.not_equal()|Performs boolean not equal check of the corresponding elements in the arrays. Returns a boolean array|
|np.logical_and()|Performs logical and operation on the corresponding elements in arrays|
|np.logical_or()|Performs logical or operation on the corresponding elements in arrays| 
|np.logical_xor()|Performs logical xor operation on the corresponding elements in arrays| 
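
The difference between some of the binary ufuncs in the table, for example *maximum* and *fmax*, is easiest to see on a small example. The following is a sketch with arbitrary values.

```python
import numpy as np

x = np.array([7.0, np.nan, -3.0])
y = np.array([2.0, 5.0, np.nan])

# maximum propagates NaN, fmax ignores NaN when the other value is a number
print('maximum:', np.maximum(x, y))
print('fmax:   ', np.fmax(x, y))

a = np.array([7, -7, 9])
b = np.array([2, 2, 4])
print('floor_divide:', np.floor_divide(a, b))
print('mod:         ', np.mod(a, b))
# The sign of each element of the second argument is copied onto a
print('copysign:    ', np.copysign(a, [-1, 1, -1]))
print('greater:     ', np.greater(a, b))
```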


### aggregation:
- Each NumPy binary ufunc has special methods for performing certain kinds of vectorized operations

|function|description|
|--------|-----------|
|np.*ufunc*.reduce()|Aggregates values by successive application of the operation|
|np.*ufunc*.accumulate()|Aggregates values by preserving all partial aggregates|
|np.*ufunc*.reduceat()|Local reduce or group by; reduces contiguous slices of data to produce aggregated array|
|np.*ufunc*.outer()|Applies the operation to all pairs of elements in two arrays; the shape of the resulting array is the concatenation of the shapes of the two arrays|
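
The four methods can be illustrated with `np.add` and `np.multiply` as the ufunc. This is a minimal sketch; the input array and the split positions passed to *reduceat* are arbitrary.

```python
import numpy as np

arr = np.arange(1, 7)
print('arr:', arr)

# reduce: 1 + 2 + 3 + 4 + 5 + 6
total = np.add.reduce(arr)
print('np.add.reduce:', total)

# accumulate: keeps every partial sum
running = np.add.accumulate(arr)
print('np.add.accumulate:', running)

# reduceat: sums of arr[0:2], arr[2:4] and arr[4:]
grouped = np.add.reduceat(arr, [0, 2, 4])
print('np.add.reduceat:', grouped)

# outer: multiplies every pair of elements, shapes (3,) and (4,) give (3, 4)
table = np.multiply.outer(np.arange(1, 4), np.arange(1, 5))
print('np.multiply.outer:\n{0}'.format(table))
```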


## Array-Oriented Programming with Arrays
- The practice of replacing explicit loops with array expressions is commonly referred to as vectorization
- NumPy arrays allow many kinds of data processing tasks to be expressed as concise array expressions, rather than using loops
- Vectorized array operations are often one or two (or more) orders of magnitude faster than their pure python equivalents

In [274]:
# Let's say you have two 1-D arrays and you want to evaluate the expression sqrt(x^2 + y^2)
#  for each element pair in the arrays.
arr1 = np.arange(4)  # 1x4
arr2 = np.arange(4, 7, 1)  # 1x3
print('arr1:\n', arr1)
print('arr2:\n', arr2)
result = []
# Traditionally we can use two for loops
for i in arr2:
  for j in arr1:
    result.append(np.sqrt(i ** 2 + j ** 2))
result = np.array(result)
print('shape of result:', result.shape)  # why?
# We can transform this to a 3x4 array
print('3x4 result:\n', result.reshape(3, 4))


arr1:
 [0 1 2 3]
arr2:
 [4 5 6]
shape of result: (12,)
3x4 result:
 [[4.         4.12310563 4.47213595 5.        ]
 [5.         5.09901951 5.38516481 5.83095189]
 [6.         6.08276253 6.32455532 6.70820393]]


### **np.meshgrid()** 
- a function that takes two single dimensional arrays and produces two 2 dimensional arrays corresponding to all pairs of values in the two input arrays

In [275]:
# in our case, we meshgrid arr1 and arr2 to prepare two new arrays of the same shape that can be combined element wise
arr1Grid, arr2Grid = np.meshgrid(arr1, arr2)  # creates two 3x4 arrays
print('xs:\n', arr1Grid)
print('ys:\n', arr2Grid)


xs:
 [[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]
ys:
 [[4 4 4 4]
 [5 5 5 5]
 [6 6 6 6]]


In [276]:
result = np.sqrt(arr1Grid ** 2 + arr2Grid ** 2)
print('result:\n{0}'.format(result))


result:
[[4.         4.12310563 4.47213595 5.        ]
 [5.         5.09901951 5.38516481 5.83095189]
 [6.         6.08276253 6.32455532 6.70820393]]
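
As an alternative sketch, the same result can be obtained without *np.meshgrid* by relying on broadcasting. The use of `np.newaxis` to turn arr2 into a column is an addition here and not part of the cells above.

```python
import numpy as np

arr1 = np.arange(4)        # shape (4,)
arr2 = np.arange(4, 7, 1)  # shape (3,)

# arr2[:, np.newaxis] has the shape (3, 1); broadcasting it against
# arr1 with shape (4,) produces a (3, 4) result
result = np.sqrt(arr1 ** 2 + arr2[:, np.newaxis] ** 2)
print('result:\n{0}'.format(result))

# Comparing with the meshgrid based computation
arr1Grid, arr2Grid = np.meshgrid(arr1, arr2)
print('same as meshgrid result:',
      np.allclose(result, np.sqrt(arr1Grid ** 2 + arr2Grid ** 2)))
```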


## Conditional Logic as Array Operations
### numpy.where
- function is a vectorized version of the ternary expression *x if condition else y*
- The second and third arguments to np.where don't need to be arrays; one or both of them can be scalars

In [244]:
# If we have two single dimensional arrays and we want to choose the maximum element when comparing each element from the two arrays
arr1 = np.random.randint(-100, 100, 9)
arr2 = np.random.randint(-100, 100, 9)
result = np.where(arr1 > arr2, arr1, arr2)
print('arr1:', arr1)
print('arr2:', arr2)
print('result:', result)


arr1: [ -59   -2   23   93  -17   86 -100   73  -49]
arr2: [-70  46  28  25 -53  17   0  68  15]
result: [-59  46  28  93 -17  86   0  73  15]


In [245]:
# Suppose we have a multi dimensional array of randomly generated data and we wanted to replace all positive values with 1 and all negative values with 0.
arr1 = np.random.randn(4, 4)
result = np.where(arr1 > 0, 1, 0)
print('arr1:', arr1)
print('result:', result)


arr1: [[ 0.19481061 -0.63113749 -2.39750123 -0.91196804]
 [-0.1252391   0.38937848  0.33608027  0.57221401]
 [ 0.30478289  0.76375121 -1.05093122 -1.4198291 ]
 [ 0.0600492   0.06814378 -1.53232217  1.07207815]]
result: [[1 0 0 0]
 [0 1 1 1]
 [1 1 0 0]
 [1 1 0 1]]


## Mathematical and Statistical Methods
- Computes statistics about an entire array or the data along an axis
- Aggregations (reductions) such as *sum*, *mean*, and *std* are called using array instance methods or using the top-level NumPy functions
- Functions like *sum* and *mean* take an optional axis argument that computes the statistic over the given axis
- Methods for Boolean Arrays
    - Boolean values are coerced to 1 (True) and 0 (False)
    - Thus, *sum* can be used as a means of counting True values in a boolean array

|function|description|
|--------|-----------|
|np.sum()|Calculates the sum of the elements in the array. If an axis argument is provided, the sum is measured along the axis|
|np.mean()|Calculates the mean of the elements in the array. If an axis argument is provided, the mean is measured along the axis|
|np.std()|Calculates the standard deviation of the elements in the array. If an axis is provided the standard deviation is measured along the axis|
|np.var()|Calculates variance of the elements in the array. If an axis is provided the variance is measured along the axis|
|np.min()|Calculates the minimum value of the elements in the array. If an axis is provided the minimum is measured along the axis|
|np.max()|Calculates the maximum value of the elements in the array. If an axis is provided the maximum is measured along the axis|
|np.argmin()|Calculates the index for the minimum value of the elements in the array. If an axis is provided the index of the minimum value is searched along the axis|
|np.argmax()|Calculates the index for the maximum value of the elements in the array. If an axis is provided the index of the maximum value is searched along the axis|
|np.cumsum()|Calculates the cumulative sum of the elements in the array. If an axis argument is provided, the cumulative sum is measured along the axis|
|np.cumprod()|Calculates the cumulative product of the elements in the array. If an axis argument is provided, the cumulative product is measured along the axis|
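
The effect of the axis argument and the counting of True values can be seen in a small sketch like the one below, using a fixed 2x3 array so that the results are easy to verify by hand.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print('sum of all elements:', arr.sum())           # 21
print('sum along axis=0:', arr.sum(axis=0))        # [5 7 9]
print('mean along axis=1:', arr.mean(axis=1))      # [2. 5.]
print('index of the maximum (flattened):', arr.argmax())  # 5
print('cumulative sum along axis=1:\n{0}'.format(arr.cumsum(axis=1)))

# Boolean values are coerced to 1 and 0, so sum counts the True values
print('number of elements greater than 2:', (arr > 2).sum())  # 4
```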

## Linear Algebra
- *np.linalg* has a standard set of matrix decompositions and related operations such as the inverse and the determinant

|function|description|
|--------|-----------|
|np.diag()|Returns the diagonal or off-diagonal elements of a square matrix as a 1D array, or converts a 1D array into a square matrix with zeros on the off-diagonal|
|np.dot()|Matrix multiplication|
|np.trace()|Computes the sum of the diagonal elements|
|np.linalg.det()|Computes the matrix determinant|
|np.linalg.eig()|Computes the eigenvalues and eigenvectors of a square matrix|
|np.linalg.inv()|Computes the inverse of a square matrix|
|np.linalg.pinv()|Computes the Moore-Penrose pseudo-inverse of a matrix|
|np.linalg.qr()|Computes the QR decomposition|
|np.linalg.svd()|Computes the singular value decomposition (SVD)|
|np.linalg.solve()|Solves the linear system $Ax = b$ for x, where A is a square matrix|
|np.linalg.lstsq()|Computes the least-squares solution of $Ax = b$|
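
A few of these functions can be combined in a short sketch. The 2x2 system below is made up for illustration and has the exact solution x = [2, 3].

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solving the linear system Ax = b
x = np.linalg.solve(A, b)
print('solution x:', x)

# The determinant is 3*2 - 1*1 = 5, so A is invertible
print('determinant of A:', np.linalg.det(A))

# Multiplying A with its inverse gives (approximately) the identity matrix
A_inv = np.linalg.inv(A)
print('A multiplied by its inverse:\n{0}'.format(np.dot(A, A_inv)))

print('trace of A:', np.trace(A))
print('diagonal of A:', np.diag(A))
```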


## Sorting
- NumPy arrays can be sorted *in-place* with the sort **instance** method of the array
- Using the top-level **np.sort** function on an array returns a sorted *copy*
- Multidimensional arrays can also be sorted along an axis

In [260]:
arr = np.random.randn(8)
print('arr:\n', arr)
# Using the instance method. in-place sorting
arr.sort()
print('arr (sorted):\n', arr)


arr:
 [ 1.54509562  1.31644276  0.76174466  0.2280686  -1.14972919 -1.62665528
  0.53810454  0.29891128]
arr (sorted):
 [-1.62665528 -1.14972919  0.2280686   0.29891128  0.53810454  0.76174466
  1.31644276  1.54509562]


In [261]:
arr = np.random.randn(8)
# Using the top-level np.sort method. sorted copy is returned
arr1 = np.sort(arr)
print('unsorted array arr:\n', arr)
print('sorted array arr1:\n', arr1)

unsorted array arr:
 [ 0.49778634  1.5042212   1.17442641 -1.27159298 -0.43087861 -1.01255286
  0.17403696  1.03635486]
sorted array arr1:
 [-1.27159298 -1.01255286 -0.43087861  0.17403696  0.49778634  1.03635486
  1.17442641  1.5042212 ]


In [262]:
arr = np.random.randint(0, 10, 8).reshape(2, 4)
print('arr:\n', arr)
arr.sort(axis=0)
print('sorting array over axis=0:\n', arr)
arr.sort(axis=1)
print('sorting array over axis=1:\n', arr)


arr:
 [[5 9 9 9]
 [4 7 4 9]]
sorting array over axis=0:
 [[4 7 4 9]
 [5 9 9 9]]
sorting array over axis=1:
 [[4 4 7 9]
 [5 9 9 9]]


- Indirect Sorts: argsort and lexsort
- Certain cases require reordering datasets by one or more keys. Rather than sorting the values directly, we can compute the indices that would sort them using the *argsort* array instance method or the top-level *np.lexsort* function
  - an axis can be provided to either of them to sort over a given axis
  - *argsort* returns the integer indices that would sort an array
  - *lexsort* performs an indirect lexicographical sort on multiple key arrays; the last key passed is the primary sort key
  - in *sort* and *argsort*, the argument *kind* can be used to specify the sorting algorithm: 'quicksort', 'mergesort', 'heapsort', or 'stable' (default: kind='quicksort')

In [263]:
# argsort
arr = np.array([5, 0, 1, 3, 2]) * 1000
indexer = arr.argsort()
print('unsorted array:', arr)
print('indexer:', indexer)
print('array sorted using its sorted indices and then performing fancy indexing:', arr[indexer])


unsorted array: [5000    0 1000 3000 2000]
indexer: [1 2 4 3 0]
array sorted using its sorted indices and then performing fancy indexing: [   0 1000 2000 3000 5000]


In [264]:
# lexsort: performing on data identified by first and last names
f_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
l_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Waters'])
sorter = np.lexsort((f_name, l_name))
print('sorter:', sorter)
df = zip(l_name[sorter], f_name[sorter])
print('Sorted Data:\n', list(df))

sorter: [1 2 3 0 4]
Sorted Data:
 [('Arnold', 'Jane'), ('Arnold', 'Steve'), ('Jones', 'Bill'), ('Jones', 'Bob'), ('Waters', 'Barbara')]


## Set Logic
- NumPy has basic set operations for 1-D arrays

|function|description|
|--------|-----------|
|np.unique()|Computes the sorted unique elements in the array|
|np.intersect1d()|Computes the sorted intersection (common elements) between two arrays|
|np.union1d()|Computes the sorted union of two arrays|
|np.in1d()|Computes a boolean array indicating whether each element of the first array is present in the second array|
|np.setdiff1d()|Computes the set difference: the sorted elements of the first array that are not present in the second array|
|np.setxor1d()|Computes the symmetric difference to produce elements that are present in either of the arrays but not both|
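
The set operations can be sketched on two small arrays as follows. For the membership test this sketch uses `np.isin`, which is not listed in the table; it is assumed here as the replacement that newer NumPy versions recommend over *np.in1d*.

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1])
b = np.array([2, 4, 5])

print('unique values in a:', np.unique(a))
print('intersection:', np.intersect1d(a, b))
print('union:', np.union1d(a, b))
print('elements of a not in b:', np.setdiff1d(a, b))
print('elements in exactly one of the arrays:', np.setxor1d(a, b))
# Assumption: np.isin is used instead of np.in1d for the membership test
print('membership of each element of a in b:', np.isin(a, b))
```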

## Array Manipulation
### ravel() and flatten()
- Transforms a n-D array into 1-D
- The array instance method *ravel* can be used to flatten an array. Like reshape, ravel does not produce a copy of the array if the underlying values are contiguous in memory
- If we want to instead always create a copy, we would use the array instance method **flatten**

In [283]:
arr = np.arange(10).reshape(2, 5)

print('arr:\n', arr)
arr1 = arr.ravel()  # make a view
arr1[0:5] = -1
print('arr1:', arr1)
print('arr, after its slice arr1 was changed:\n', arr)

arr = np.arange(10).reshape(2, 5)
arr2 = arr.flatten()  # makes a copy
arr2[0:5] = 0
print('arr2:', arr2)
print('arr, after its copy arr2 was changed:\n', arr)


arr:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
arr1: [-1 -1 -1 -1 -1  5  6  7  8  9]
arr, after its slice arr1 was changed:
 [[-1 -1 -1 -1 -1]
 [ 5  6  7  8  9]]
arr2: [0 0 0 0 0 5 6 7 8 9]
arr, after its copy arr2 was changed:
 [[0 1 2 3 4]
 [5 6 7 8 9]]


## Concatenating and Splitting Arrays
### np.concatenate():
- takes a sequence of arrays and joins them together in order along the input axis

In [284]:
arr1 = np.arange(1, 7, step=1).reshape(2, 3)
arr2 = np.arange(7, 13, step=1).reshape(2, 3)
print('np.concatenate on axis=0:\n', np.concatenate([arr1, arr2], axis=0))
print('\nnp.concatenate on axis=1:\n', np.concatenate([arr1, arr2], axis=1))


np.concatenate on axis=0:
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

np.concatenate on axis=1:
 [[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]


### np.split(): 
- slices an array into multiple arrays along an axis

In [270]:
# Splitting arrays
arr = np.arange(30).reshape(5, 6)
print('arr:\n', arr)
arr1, arr2, arr3 = np.split(arr, [2, 3], axis=1)
print('arr1:\n', arr1)  # arr[:, 0:2]
print('arr2:\n', arr2)  # arr[:, 2:3]
print('arr3:\n', arr3)  # arr[:, 3:]


arr:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]]
arr1:
 [[ 0  1]
 [ 6  7]
 [12 13]
 [18 19]
 [24 25]]
arr2:
 [[ 2]
 [ 8]
 [14]
 [20]
 [26]]
arr3:
 [[ 3  4  5]
 [ 9 10 11]
 [15 16 17]
 [21 22 23]
 [27 28 29]]


## File Input and Output with Arrays
- NumPy is able to save and load data to and from disk either in text or binary format
- *np.save* and *np.load* are two workhorse functions for efficiently saving and loading data on disk
- Arrays are saved by default in an uncompressed raw binary format with file extension *.npy*
- Multiple arrays can be stored in an uncompressed archive *.npz* file using *np.savez* method and passing the arrays as arguments
- *np.savez_compressed* method can be used to store the arrays with compression
- *npz* files are loaded as a dict-like object that loads individual arrays lazily

In [272]:
arr1 = np.arange(100)
arr2 = np.arange(100, 1000)
# the ndarray arr1 will be stored into a file called data.npy
np.save('./data/data', arr1)
# the ndarrays arr1 and arr2 will be stored into an archive file (with compression) called data.npz
np.savez_compressed('./data/data', arr1, arr2)


In [325]:
arr = np.load('./data/data.npy')  # Loading data.npy into variable arr
arrDict = np.load('./data/data.npz')  # Loading data.npz into variable arrDict
print('arr:', arr[:10])
print('keys in arrDict:\n', list(arrDict.keys()))  # Listing the keys of the archive
print('value of array stored in key arr_0 in arrDict:\n',
      list(arrDict['arr_0'][:10]))  # Lazily loading 1st array


arr: [0 1 2 3 4 5 6 7 8 9]
keys in arrDict:
 ['arr_0', 'arr_1']
value of array stored in key arr_0 in arrDict:
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
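
When the arrays are passed to *np.savez* as keyword arguments, the keywords are used as keys instead of arr_0, arr_1 and so on. The sketch below assumes a temporary directory is acceptable for the output; the file name `named_arrays.npz` and the keys `first` and `second` are hypothetical.

```python
import os
import tempfile

import numpy as np

arr1 = np.arange(5)
arr2 = np.arange(5, 10)

with tempfile.TemporaryDirectory() as tmpDir:
    path = os.path.join(tmpDir, 'named_arrays.npz')
    # The keyword names become the keys of the archive
    np.savez(path, first=arr1, second=arr2)
    # The archive is closed again when the with block ends
    with np.load(path) as archive:
        keys = sorted(archive.files)
        loadedSecond = archive['second']  # read from disk only when accessed

print('keys in archive:', keys)
print('array stored under the key second:', loadedSecond)
```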
