#################### 4.1 The NumPy ndarray: A Multidimensional Array Object ###############################

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.
An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type.

1) Creating ndarrays

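To create an array, use the array function. It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.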
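A minimal sketch of the creation functions used in this section (the values are illustrative). It also shows how np.array infers one common dtype from mixed input, which is what keeps the array homogeneous.

```python
import numpy as np

# np.array infers a single dtype for all elements: ints mixed with a float become float64
mixed = np.array([6, 7.5, 8, 0, 1])

# A list of equal-length lists becomes a 2-D array whose shape is inferred from the data
nested = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Other creation functions take the shape (a tuple for more than one dimension)
zeros = np.zeros((2, 3))
ones = np.ones((3, 2))
seq = np.arange(5)  # like Python's built-in range, but returns an ndarray

print(mixed.dtype, nested.ndim, nested.shape)  # float64 2 (2, 4)
print(seq)                                     # [0 1 2 3 4]
```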

2) Data Types for ndarrays

The data type or dtype is a special object containing the information (or metadata, data about data) that the ndarray needs to interpret a chunk of memory as a particular type of data.
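To make the "metadata" idea concrete, here is a small sketch inspecting a dtype and converting between dtypes with astype. The arrays are illustrative; itemsize is the number of bytes each element occupies.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.dtype, arr.dtype.itemsize)        # int32 4  (bytes per element)

# astype always creates a new array, even if the dtype is unchanged
floats = arr.astype(np.float64)
print(floats.dtype, floats.dtype.itemsize)  # float64 8

# Casting floating point to integer truncates the decimal part (toward zero)
truncated = np.array([3.7, -1.2, -2.6, 0.5]).astype(np.int32)
print(truncated.tolist())                   # [3, -1, -2, 0]
```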

3) Arithmetic with NumPy Arrays 

Any arithmetic operation between equal-size arrays applies the operation element-wise. Arrays let you express batch operations on data without writing any for loops; NumPy users call this vectorization. Arithmetic operations with scalars propagate the scalar argument to each element, and operations between differently sized arrays are called broadcasting.
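A short sketch of the three cases described above: element-wise operations between equal-size arrays, scalar propagation, and broadcasting a 1-D row across a 2-D array. The row values are illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Equal-size arrays: the operation is applied element-wise, no Python loop needed
squared = arr * arr

# Scalars are propagated to every element
halves = arr / 2

# Broadcasting: a shape-(3,) row is stretched across both rows of the (2, 3) array
row = np.array([10., 20., 30.])
shifted = arr + row

# Comparisons are element-wise too and yield a boolean array
mask = arr > np.array([[0., 4., 1.], [7., 2., 12.]])
print(squared, halves, shifted, mask, sep="\n")
```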

4) Basic Indexing and Slicing

NumPy array indexing is a way to select a subset of your data or individual elements. One-dimensional arrays behave similarly to Python lists.
An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array. To get a copy of a slice of an ndarray instead of a view, you need to explicitly copy it, for example arr[5:8].copy().
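The view-versus-copy behavior can be checked directly. This sketch uses np.shares_memory, which reports whether two arrays overlap in memory; the values mirror the slicing example in the code cell.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]             # a slice is a view: no data is copied
view[1] = 1234              # the write shows up in arr as well
print(arr[5:8].tolist())    # [5, 1234, 7]

copied = arr[5:8].copy()    # an explicit, independent copy
copied[:] = 0               # does not touch arr
print(arr[5:8].tolist())    # still [5, 1234, 7]

print(np.shares_memory(arr, view), np.shares_memory(arr, copied))  # True False
```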

5) Boolean Indexing 

The boolean array must be of the same length as the array axis it’s indexing. Older NumPy releases tolerated a length mismatch, but current versions raise an IndexError, so check lengths carefully. Boolean arrays can be mixed and matched with slices or integers.
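A sketch of boolean selection, combining a boolean row index with a column slice, and of the length check. To keep the result deterministic, the random data from the code cell is swapped for np.arange; the IndexError behavior assumes a recent NumPy version.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)   # deterministic stand-in for random data

bob_rows = data[names == 'Bob']          # rows 0 and 3
bob_tail = data[names == 'Bob', 2:]      # boolean rows combined with a column slice
not_bob = data[names != 'Bob']           # same as data[~(names == 'Bob')]

# A boolean index whose length does not match the axis is rejected
try:
    data[np.array([True, False])]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True
print(mismatch_raised)
```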

6) Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
When several 1-D integer index arrays are passed, the result is one-dimensional regardless of how many dimensions the array has (in the example below, 2): one element is selected for each tuple of indices.
Fancy indexing, unlike slicing, always copies the data into a new array.
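A sketch of the points above on the same 8x4 array used in the code cell: selecting whole rows, selecting individual elements with paired index arrays (1-D result), a rectangular selection, and confirming that the result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)

rows = arr[[4, 6, 0, 3]]                      # whole rows, in the requested order
elems = arr[[1, 5, 7, 2], [0, 3, 1, 2]]       # elements (1,0), (5,3), (7,1), (2,2) -> 1-D
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]    # rectangular 4x4 selection

rows[0, 0] = -1                               # fancy indexing copied, so arr is untouched
print(elems, block.shape, arr[4, 0])
```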

7) Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute.
Transposing is often needed when doing matrix computations, for example when computing the inner matrix product X^T X using np.dot.
For higher dimensional arrays, transpose accepts a tuple of axis numbers to permute the axes.
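A sketch confirming that .T is a view, computing X^T X, and showing how transpose and swapaxes permute axes; the arrays match the ones used in the code cell.

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)
t = arr.T                          # shape (5, 3), a view on arr's data
print(np.shares_memory(arr, t))    # True

gram = arr.T @ arr                 # inner matrix product X^T X, same as np.dot(arr.T, arr)

arr3 = np.arange(16).reshape(2, 2, 4)
transposed = arr3.transpose(1, 0, 2)   # new[i, j, k] == old[j, i, k]
swapped = arr3.swapaxes(0, 2)          # shape (4, 2, 2)
print(gram.shape, transposed.shape, swapped.shape)
```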

In [3]:
################################# NumPy batch computations ###############################################

import numpy as np

data = np.random.randn(2,3) ### generate random normally distributed data with 2 rows and 3 columns
data

### perform mathematical operations

data = data * 10 
data

data += data
data

data.shape ### tuple giving the size of each dimension
data.dtype ### data type of the array elements

################################# Creating ndarrays ##########################################################

data1 = [6, 7.5, 8, 0, 1]
data1
arr1 = np.array(data1) ### use the array function to create an array
arr1

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]] ### nested sequence: a list of equal-length lists
data2
arr2 = np.array(data2) ### converts the list of lists into a multidimensional array
arr2 ### arr2 has two dimensions, with the shape inferred from the data
arr2.ndim ### number of dimensions of the array
arr2.shape ### shape of the array
arr2.dtype ### data type of the array elements

### To create arrays of a given shape (including higher dimensional ones), pass a tuple for the shape
# np.zeros((2,3)) ### array of 0s
# np.ones((3,2)) ### array of 1s
# np.empty((2,3,2)) ### uninitialized array: its contents are arbitrary "garbage" values

################################# Data Types for ndarrays #######################################################

arr1 = np.array([1,2,3], dtype=np.float64)
arr1.dtype
arr2 = np.array([1,2,3], dtype=np.int32)
arr2.dtype
float_arr = arr2.astype(np.float64) ### Convert an array from one dtype to another using ndarray’s astype method
float_arr

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr = arr.astype(np.int32) ### casting float to int truncates the decimal part
arr

numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_) ### strings representing numbers (np.string_ was removed in NumPy 2.0)
numeric_strings.dtype
arr_num = numeric_strings.astype(np.float64) ### convert them to numeric form using astype
arr_num

################################# Arithmetic with NumPy Arrays #######################################################

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

# arr * arr
# arr - arr
# 1 / arr
# arr ** 2

# arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

# arr > arr2

################################# Basic Indexing and Slicing #######################################################

arr = np.arange(10)
arr
# arr[6]
# arr[2:5]
# arr[5:8] = 12 ### assign a scalar value(12) to a slice
# arr

#### array slices are views on the original array, to explain this
# arr_slice = arr[5:8] ### first create a slice of arr
# arr_slice[1] = 1234  ### now change values in arr_slice
# arr                  ### mutations reflected in the original array arr

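# arr_slice = arr[5:8].copy() ### to get a copy of a slice instead of a view, explicitly copy the array
# arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# arr2d
# arr2d[2] ### in a 2-D array, a single index selects a whole row (a 1-D array)
# arr2d[2][1] ### element at row 2, column 1 via two chained indexes
# arr2d[1,2] ### comma-separated indexes: element at row 1, column 2

# arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
# arr3d ### shape (2, 2, 3)
# arr3d[0] ### a 2x3 array
# arr3d[0,1] ### the 1-D array [4, 5, 6]
# arr3d[0,1,2] ### the scalar 6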

################################# Indexing with slices #######################################################

# arr2d[1:]
# arr2d[1:,1:]
# arr2d[:,:2]

################################# Boolean Indexing #######################################################

# names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe']) ### an array of names with duplicates
# names
# data = np.random.randn(7,4) #### random normally distributed data, 7 rows and 4 columns
# data

### suppose each name in the names array corresponds to a row in the data array
### Now to select all the rows with corresponding name 'Bob'
# names == 'Bob' ### comparing names with the string 'Bob' yields a boolean array
# data[names == 'Bob'] ### the boolean array can be passed when indexing the array
# data[names == 'Joe'] ### to select all the rows with corresponding name 'Joe'
# data[names == 'Bob', 2:] ### rows where names is 'Bob', columns from index 2 onward
# data[names == 'Joe', 2:4] ### rows where names is 'Joe', columns 2 and 3
# data[~(names == 'Bob')] ### every row except 'Bob' (~ negates the condition)
# data[(names == 'Bob') | (names == 'Will')]  ### rows with either 'Bob' or 'Will'; use & and |, not the keywords and/or
# data[data < 0] = 0 ### set all of the negative values in data to 0
# data
# data[names != 'Joe'] = 7 ### set whole rows using a 1-D boolean array
# data

################################# Fancy Indexing #############################################################

arr = np.empty((8,4)) ### uninitialized array of 8 rows and 4 columns
arr

for i in range(8):
    arr[i] = i  ### fill row i with the value i
arr

arr[[4, 6, 0, 3]] ### to select particular rows in a given order, pass a list of row indices
arr[[5, 2, 7, 1, 3, 0, 6]] ### any list or ndarray of integers works, in the desired order (floats are not allowed)
arr[[-3, -5, -7]] ### negative indices select rows from the end

arr = np.arange(32) ### 1-D array of the integers 0 to 31
arr
arr = np.arange(32).reshape(8,4) ### reshape into 8 rows and 4 columns
arr
# arr[[1,3,5,2], [1,3,2,0]] ### multiple index arrays select a 1-D array of the elements
#                           ### (1,1), (3,3), (5,2), (2,0)
# arr[[1,3,5,2]][:,[1,3,2,0]] ### rectangular selection: rows 1, 3, 5, 2, each with columns
#                             ### reordered as 1, 3, 2, 0, giving a 4x4 array

################################## Transposing Arrays and Swapping Axes ########################################
# arr = np.arange(15).reshape(3,5) ### array of 3 rows and 5 columns
# arr
# arr.T ### transpose via the special T attribute
# np.dot(arr.T, arr) ### the inner matrix product using np.dot

### For higher dimensional arrays, transpose accepts a tuple of axis numbers to permute the axes
arr = np.arange(16).reshape(2,2,4)
arr
# arr.transpose(1,0,2) ### axes reordered: the second axis becomes first,
                       ### the first axis becomes second, and the last axis is unchanged
# arr.transpose(2,1,0)

arr.swapaxes(0,2) ### swapaxes takes a pair of axis numbers and switches
                  ### the indicated axes to rearrange the data (it returns a view)

array([[[ 0,  8],
        [ 4, 12]],

       [[ 1,  9],
        [ 5, 13]],

       [[ 2, 10],
        [ 6, 14]],

       [[ 3, 11],
        [ 7, 15]]])

################# 4.2 Universal Functions: Fast Element-Wise Array Functions ##############

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. You can think of ufuncs as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
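A sketch of unary and binary ufuncs, a ufunc with two outputs (modf), and the out= argument that many ufuncs accept for writing results into an existing array. The input values are illustrative.

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)
roots = np.sqrt(arr)                  # unary ufunc, returns a new array

x = np.array([1.0, 5.0, -2.0])
y = np.array([3.0, 2.0, -4.0])
biggest = np.maximum(x, y)            # binary ufunc: element-wise maximum
total = np.add(x, y)                  # same as x + y

np.sqrt(arr, out=arr)                 # write the result into arr itself

frac, whole = np.modf(np.array([2.5, -3.75]))  # fractional and integral parts
print(biggest, total, frac, whole)
```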

In [83]:
######################## 4.2 Universal Functions: Fast Element-Wise Array Functions ###############################

arr = np.arange(10)
arr

### Unary universal functions
# np.sqrt(arr) ### element-wise square root
# np.square(arr) ### element-wise square
# np.exp(arr) ### element-wise exponential, e^x

x = np.random.randn(8)
y = np.random.randn(8)
np.maximum(x,y) ### the element-wise maximum of the elements in x and y
remainder, whole_part = np.modf(x * 6) ### modf returns the fractional and integral parts
                                       ### of a floating-point array
# remainder
whole_part

array([ 0., -1., -8.,  8.,  2.,  7.,  1., -4.])

###################### 4.3 Array-Oriented Programming with Arrays #########################

NumPy arrays let you express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations are often one or two (or more) orders of magnitude faster than their pure Python equivalents.

1) Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y.
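A sketch comparing the element-by-element ternary expression with np.where on the same inputs as the code cell below, plus a case where scalars are broadcast.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure-Python equivalent: the ternary expression applied element by element
looped = [xv if c else yv for xv, c, yv in zip(xarr, cond, yarr)]

# Vectorized version; also works unchanged on multidimensional arrays
vectorized = np.where(cond, xarr, yarr)

# Scalar arguments are broadcast: 2 for positive entries, -2 otherwise
mat = np.array([[0.5, -1.0], [-0.2, 3.0]])
signs = np.where(mat > 0, 2, -2)
```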

2) Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. We can use aggregations like sum, mean, and std (standard deviation) either by calling the array instance method or using the top-level NumPy function.
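A sketch showing that the method and top-level forms agree, and what the axis argument does on a small fixed array (axis=1 gives one value per row, axis=0 one value per column).

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

overall = arr.mean()             # same as np.mean(arr)
row_means = arr.mean(axis=1)     # one value per row
col_sums = arr.sum(axis=0)       # one value per column
running = arr.cumsum(axis=1)     # running total along each row
print(overall, row_means, col_sums, running, sep="\n")
```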

3) Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False). Thus, sum is often used as a means of counting True values in a boolean array.
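A sketch of counting True values with sum and of any/all, including how they treat a non-boolean array (non-zero elements count as True).

```python
import numpy as np

arr = np.arange(-5, 5)              # -5 .. 4
positive_count = (arr > 0).sum()    # True counts as 1: the positives are 1, 2, 3, 4

bools = np.array([False, False, True, False])
print(bools.any(), bools.all())     # True False

# On non-boolean arrays, non-zero elements are treated as True
print(arr.any(), arr.all())         # True False (arr contains a 0)
```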

4) Sorting 

NumPy arrays can be sorted in place with the sort method.
Each one-dimensional section of values in a multidimensional array can be sorted in place along an axis by passing the axis number to sort. The top-level np.sort function returns a sorted copy instead. A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank.
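A sketch of in-place versus copying sorts, sorting along an axis, and the rank-based quantile shortcut. A deterministic 1..100 array stands in for random data so the result is predictable.

```python
import numpy as np

arr = np.array([3.0, -1.0, 2.0, 0.5])
sorted_copy = np.sort(arr)   # top-level np.sort returns a sorted copy
arr.sort()                   # the method sorts in place and returns None

grid = np.array([[3, 1, 2], [9, 7, 8]])
grid.sort(axis=1)            # sorts each row independently

# Quick-and-dirty 25% quantile: sort, then pick the value at that rank
samples = np.sort(np.arange(100, 0, -1))   # 1..100 after sorting
q25 = samples[int(0.25 * len(samples))]    # element at index 25
```

np.quantile interpolates between ranks by default, so its result can differ slightly from this shortcut.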

5) Unique and Other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays. For example, np.unique returns the sorted unique values in an array.
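A sketch of np.unique alongside a few other set functions that exist in NumPy (np.isin, np.intersect1d, np.union1d, np.setdiff1d), with a pure-Python equivalent of np.unique for comparison.

```python
import numpy as np

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
uniq = np.unique(ints)                        # sorted unique values
members = np.isin(ints, [3, 5, 1])            # element-wise membership test
common = np.intersect1d([1, 2, 3], [2, 3, 4])
either = np.union1d([1, 2, 3], [2, 3, 4])
only_left = np.setdiff1d([1, 2, 3], [2, 3, 4])

# Pure-Python equivalent of np.unique
py_uniq = sorted(set(ints.tolist()))
```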

In [21]:
###################### 4.3 Array-Oriented Programming with Arrays ############################

### to evaluate the function sqrt(x^2 + y^2) across a regular grid of values.

# points = np.arange(-5, 5, 1) ### 10 equally spaced points
# points2 = np.arange(-10,10, 2) ### 10 equally spaced points

### The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to
###  all pairs of (x, y) in the two arrays
# x, y = np.meshgrid(points, points2)
# z = np.sqrt(x ** 2 + y ** 2) ### sqrt(x^2 + y^2)
# z ### or np.round(z, 2) for a more readable display

### to create visualizations of this two dimensional array we use matplotlib
# import matplotlib.pyplot as plt
# plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
# plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values") ### raw string so \s is not treated as an escape

##################### 1) Expressing Conditional Logic as Array Operations ####################
# x = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
# y = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
# cond = np.array([True, False, True, True, False])

### To Take value from x whenever cond is True else from y
# result = [(xv if c else yv) for xv, c, yv in zip(x, cond, y)]  ### problems with this approach: 1) it is not
# result ### fast enough for large arrays, 2) it will not work with multidimensional arrays

# result = np.where(cond, x, y) ### a typical use of where is to produce a new array of values based
# result                        ### on another array

### Suppose you had a matrix of randomly generated data and you wanted to replace all 
### positive values with 2 and all negative values with -2
# mat = np.random.randn(5,5)
# mat = np.where(mat > 0, 2, -2)
# mat

### Or to replace all positive values in mat with the constant 2
# mat = np.where(mat > 0, 2, mat)
# mat

############################## 2) Mathematical and Statistical Methods ######################

#### To compute some aggregate statistics on normally distributed random data
# arr = np.random.randn(5, 4)
# arr
# arr.mean() ### mean of the entire array
# arr.std() ### standard deviation of the entire array
# arr.sum() ### sum of the entire array
# arr.mean(axis=1) ### mean across the columns (one value per row)
# arr.std(axis=0) ### standard deviation down the rows (one value per column)

# arr = np.arange(1,8)
# arr
# arr.cumsum() ### cumulative sum: each element is the sum of itself and all previous elements
# arr.cumprod() ### cumulative product: each element is the product of itself and all previous elements

# arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
# arr
# arr.cumsum(axis=0) ### cumulative sum down each column (accumulating over rows)
# arr.cumsum(axis=1) ### cumulative sum along each row (accumulating over columns)
# arr.cumprod(axis=0) ### cumulative product down each column
# arr.cumprod(axis=1) ### cumulative product along each row

########################### 3) Methods for Boolean Arrays ##################################################

arr = np.random.randn(10)
arr = np.arange(-5,5)
# arr > 0
# (arr > 0).sum() ### number of positive elements in the array
# arr.any() ### any tests whether one or more values are True (non-zero counts as True)
# arr.all() ### all checks whether every value is True (False here, since arr contains 0)
# arr

######################################## 4) Sorting #############################################################

arr = np.random.randn(10)
arr.sort() ### Numpy array can be sorted with sort() method
arr

arr = np.random.randn(5,3)
arr.sort(1) ### sort each row (each 1-D section along axis 1) in place
arr         ### by passing the axis number to sort

### a quick-and-dirty way to compute quantiles: sort the array and select the value at a particular rank
arr = np.random.randn(100)
arr.sort()
arr[int(0.05 * len(arr))] ### approximate 5% quantile
arr[int(0.25 * len(arr))] ### approximate 25% quantile

###################################### 5) Unique and Other Set Logic ###########################################

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names) ### np.unique returns the sorted unique values in an array.
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
# np.unique(ints)
np.isin(ints,[3,5,1]) ### np.isin tests membership of the values in one array in another, returning a boolean array (np.in1d is deprecated in NumPy 2.0)

array([ True,  True,  True, False, False,  True,  True, False, False])

############################## 4.4 File Input and Output with Arrays ############################################

NumPy is able to save and load data to and from disk either in text or binary format. np.save and np.load are the two
workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an
uncompressed raw binary format with file extension .npy.
Multiple arrays can be saved into an uncompressed .npz archive with np.savez; if the data compresses well, use numpy.savez_compressed instead.
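A self-contained round-trip sketch. It writes into a temporary directory (an assumption, so the example leaves no files behind) and opens the .npz archive with a with statement so the file handle is closed afterwards.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)
arr1 = np.arange(10, 0, -1)

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends the .npy extension when the name lacks it
    path = os.path.join(tmp, "test_array")
    np.save(path, arr)
    loaded = np.load(path + ".npy")

    # Several arrays in one compressed archive; the keyword names become the keys
    archive = os.path.join(tmp, "arrays.npz")
    np.savez_compressed(archive, a=arr, b=arr1)
    with np.load(archive) as npz:    # dict-like object that loads arrays on access
        keys = sorted(npz.files)
        b = npz["b"]
```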

In [46]:
############################## 4.4 File Input and Output with Arrays ############################################

arr = np.arange(10)
arr1 = np.arange(10,0,-1)
arr1
# np.save('test_array', arr) ### by default arrays are saved in an uncompressed raw binary format with file extension .npy
# np.load('test_array.npy') ### the array on disk can then be loaded with np.load

### multiple arrays can be stored in an uncompressed archive using np.savez, passing the arrays as keyword arguments
np.savez('listof_array', a=arr, b=arr1)
list_arr = np.load('listof_array.npz') ### loading an .npz file returns a dict-like object that loads the individual arrays lazily
list_arr['b']

np.savez_compressed('list_of_arr', a=arr, b=arr1) ### if the data compresses well, use numpy.savez_compressed
list_of_arr = np.load('list_of_arr.npz')
list_of_arr['b']

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

############################################### 4.5 Linear Algebra ############################################

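Linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math, is an important part of any array library. Unlike some languages such as MATLAB, multiplying two two-dimensional arrays with * is an element-wise product, not a matrix product. For matrix multiplication there is a function dot, available both as an array method and as a function in the numpy namespace.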
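A sketch contrasting the matrix product (@ or dot) with element-wise *, using the same x and y as the code cell, plus an inverse check on a small invertible matrix built from x (that matrix is an illustrative choice, not from the original).

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])

prod = x @ y              # matrix product: (2, 3) @ (3, 2) -> (2, 2)
same = np.dot(x, y)       # identical result via the dot function
elementwise = x * x       # * multiplies element by element; it is NOT a matrix product

# x @ x.T is a 2x2 matrix with determinant 54, so it is invertible
gram = x @ x.T
identity = gram @ np.linalg.inv(gram)   # close to np.eye(2), up to rounding error
```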

In [77]:
############################################### 4.5 Linear Algebra ############################################

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x.dot(y) ### dot as an array method: matrix multiplication
np.dot(x,y) ### equivalent top-level function
np.ones(3)
np.dot(x, np.ones(3)) ### matrix-vector product: a 2-D array times a 1-D array gives a 1-D array
x @ np.ones(3) ### the @ operator (Python 3.5+) also performs matrix multiplication

### numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant

from numpy.linalg import inv, qr, det, eig, pinv, svd
x = np.random.randn(5,5)
x
mat = x.T.dot(x) ### computes the product of the transpose X.T with X
# mat
# inv(mat) ### inverse of mat
mat.dot(inv(mat)) ### approximately the identity matrix (up to floating-point error)
q, r = qr(mat) ### QR decomposition of mat
q
r
np.diag(mat) ### return the diagonal (or off-diagonal) elements of a square matrix as a 1D array
np.trace(mat) ### compute the sum of the diagonal elements
det(mat) ### compute the matrix determinant
eig(mat) ### compute the eigenvalues and eigenvectors of a square matrix
pinv(mat) ### compute the Moore-Penrose pseudo-inverse of a matrix
svd(mat) ### compute the singular value decomposition (SVD)


(array([[-0.48659496, -0.61269312, -0.36286292,  0.49913941,  0.08380239],
        [-0.25410727, -0.51961995,  0.08687751, -0.80136512, -0.12526308],
        [ 0.23155428,  0.08763364, -0.7604114 , -0.12075342, -0.58812931],
        [-0.62565152,  0.32416957,  0.31193551,  0.11985234, -0.62594402],
        [-0.50358478,  0.49176893, -0.4304109 , -0.28236116,  0.48947351]]),
 array([19.32875852,  7.22142547,  4.20465043,  0.80467042,  0.07863927]),
 array([[-0.48659496, -0.25410727,  0.23155428, -0.62565152, -0.50358478],
        [-0.61269312, -0.51961995,  0.08763364,  0.32416957,  0.49176893],
        [-0.36286292,  0.08687751, -0.7604114 ,  0.31193551, -0.4304109 ],
        [ 0.49913941, -0.80136512, -0.12075342,  0.11985234, -0.28236116],
        [ 0.08380239, -0.12526308, -0.58812931, -0.62594402,  0.48947351]]))

################################### 4.6 Pseudorandom Number Generation ############################################

The numpy.random module supplements the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.
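The code cell below uses the legacy np.random.seed / RandomState interface. As a sketch of an alternative not covered in the original text, NumPy 1.17+ also offers Generator objects created with np.random.default_rng; each generator carries its own state, so seeding one does not affect any other. The seed 1234 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1234)
samples = rng.standard_normal((4, 4))        # standard normal draws
coin = rng.integers(0, 2, size=10)           # integers in [0, 2), i.e. 0 or 1
scaled = rng.normal(loc=0, scale=0.25, size=5)

# Re-creating a generator with the same seed reproduces the same stream
again = np.random.default_rng(1234).standard_normal((4, 4))

# The legacy global-state interface used in the text still works
np.random.seed(1234)
legacy = np.random.randn(3)
```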

In [109]:
################################### 4.6 Pseudorandom Number Generation ############################################
x = np.random.normal(size=(4,4))### 4 × 4 array of samples from the standard normal distribution using normal
x

### numpy.random is well over an order of magnitude faster for generating very large samples
from random import normalvariate
N = 1000000
# %timeit np.random.normal(size=N) ### 37.4 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# %timeit sample = [normalvariate(0, 1) for _ in range(N)] ### 878 ms ± 61.4 ms per loop (mean ± std. dev. of 7 runs,
#                                                          ### 1 loop each)

np.random.seed(1234) ### change NumPy's global random number generation seed (np.random.seed returns None)
x = np.random.randn(10) ### data generation functions in numpy.random use this global random seed
x 

### To avoid global state use numpy.random.RandomState to create a random number generator isolated from others
rng = np.random.RandomState(1234)
rng.randn(10)


array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])

################################### 4.7 Example: Random Walks ############################################

The simulation of random walks provides an illustrative application of array operations. Consider a simple
random walk starting at 0 with steps of 1 and -1 occurring with equal probability.
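A minimal vectorized sketch of one such walk, including the first crossing time of |position| >= 10. The helper name random_walk is hypothetical, and it assumes a Generator from np.random.default_rng (NumPy 1.17+). argmax returns 0 when nothing is True, so the code checks any() first.

```python
import numpy as np

def random_walk(nsteps, rng):
    """Hypothetical helper: +1/-1 random walk from 0; returns the position after each step."""
    draws = rng.integers(0, 2, size=nsteps)   # 0 or 1 with equal probability
    steps = np.where(draws > 0, 1, -1)
    return steps.cumsum()

rng = np.random.default_rng(12345)   # arbitrary seed for reproducibility
walk = random_walk(1000, rng)

# First index where |position| >= 10; argmax finds the first True in the boolean array
hit = np.abs(walk) >= 10
first_hit = hit.argmax() if hit.any() else None
print(walk.min(), walk.max(), first_hit)
```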

1) Simulating Many Random Walks at Once

To simulate many random walks, say 5,000, all of them can be generated at once with minor modifications to the preceding code: draw a two-dimensional array of steps and compute the cumulative sum across each row.
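A sketch of the many-walks version, assuming a Generator from np.random.default_rng and a hypothetical helper crossing_times. Note the cumsum(axis=1): without the axis the 2-D array is flattened into one long walk.

```python
import numpy as np

def crossing_times(nwalks, nsteps, threshold, rng):
    """Hypothetical helper: simulate many +1/-1 walks, find first crossings of |threshold|."""
    draws = rng.integers(0, 2, size=(nwalks, nsteps))
    walks = np.where(draws > 0, 1, -1).cumsum(axis=1)   # one walk per row
    hits = (np.abs(walks) >= threshold).any(axis=1)     # which walks ever reach the level
    times = (np.abs(walks[hits]) >= threshold).argmax(axis=1)
    return walks, hits, times

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
walks, hits, times = crossing_times(5000, 1000, 30, rng)
print(walks.shape, hits.sum(), times.mean())
```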

In [156]:
################################### 4.7 Example: Random Walks ############################################

##### a single random walk with 1,000 steps using the built-in random module 
import matplotlib.pyplot as plt
import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0,1) else -1
    position += step
    walk.append(position)
    
# plt.plot(walk[:100]) ### To plot of the first 100 values on one of these random walks


### walk is the cumulative sum of the random steps and can be evaluated as an array expression:
### use the np.random module to draw 1,000 coin flips at once, set these to 1 and -1,
### and compute the cumulative sum
nstep = 1000
draw = np.random.randint(0,2, size = nstep)
steps = np.where(draw > 0, 1, -1)
walk = steps.cumsum()

### to extract statistics like the minimum and maximum value along the walk’s trajectory
walk.min()
walk.max()

### first crossing time: the step at which the random walk reaches a particular value, e.g. how long it took
### the walk to get at least 10 steps away from the origin 0 in either direction
# np.abs(walk) >= 10
(np.abs(walk) >= 10).argmax() ### index of the first 10 or -10, using argmax, which returns the first index
                              ### of the maximum value (True) in the boolean array (0 if the walk never gets there)

#############################1) Simulating Many Random Walks at Once ###########################################

### compute the cumulative sum across the rows to compute all 5,000 random walks in one shot
nwalks = 5000
nsteps = 1000
draw = np.random.randint(0, 2, size=(nwalks, nsteps)) ### 0 or 1
draw
steps = np.where(draw > 0, 1, -1)
steps
walk = steps.cumsum(1)
walk

walk.max() ### To compute the maximum values obtained over all of the walks 
walk.min() ### To compute the minimum values obtained over all of the walks 

hits30 = (np.abs(walk) >= 30).any(1) ### boolean array: which walks reach 30 or -30 at some point
hits30.sum() ### number of walks that hit 30 or -30

### to select out the rows of walks that actually cross the absolute 30 level and call argmax across axis 1 to get
### the crossing times
crossing_time = (np.abs(walk[hits30]) >= 30).argmax(1)
crossing_time.mean()


#### Using a normal random number generation function to generate normally distributed steps with a given mean and
###  standard deviation (the draws are used directly as the steps)

steps = np.random.normal(loc=0, scale=0.25, size=(nwalks, nsteps))
steps
walk = steps.cumsum(1) ### cumsum along axis 1 keeps one walk per row; without it the array is flattened to 1-D
walk
walk.max()
walk.min()