<h1>Chapter 4: NumPy Basics: Arrays and Vectorized Computation<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#4.1-The-NumPy-ndarray:-A-Multidimensional-Array-Object" data-toc-modified-id="4.1-The-NumPy-ndarray:-A-Multidimensional-Array-Object-1">4.1 The NumPy ndarray: A Multidimensional Array Object</a></span><ul class="toc-item"><li><span><a href="#Creating-ndarrays" data-toc-modified-id="Creating-ndarrays-1.1">Creating ndarrays</a></span></li><li><span><a href="#Data-Types-for-ndarrays" data-toc-modified-id="Data-Types-for-ndarrays-1.2">Data Types for ndarrays</a></span></li><li><span><a href="#Arithmetic-with-NumPy-Arrays" data-toc-modified-id="Arithmetic-with-NumPy-Arrays-1.3">Arithmetic with NumPy Arrays</a></span></li><li><span><a href="#Basic-Indexing-and-Slicing" data-toc-modified-id="Basic-Indexing-and-Slicing-1.4">Basic Indexing and Slicing</a></span></li><li><span><a href="#Indexing-with-slices" data-toc-modified-id="Indexing-with-slices-1.5">Indexing with slices</a></span></li><li><span><a href="#Boolean-Indexing" data-toc-modified-id="Boolean-Indexing-1.6">Boolean Indexing</a></span></li><li><span><a href="#Fancy-Indexing" data-toc-modified-id="Fancy-Indexing-1.7">Fancy Indexing</a></span></li><li><span><a href="#Transposing-Arrays-and-Swapping-Axes" data-toc-modified-id="Transposing-Arrays-and-Swapping-Axes-1.8">Transposing Arrays and Swapping Axes</a></span></li></ul></li><li><span><a href="#4.2-Pseudorandom-Number-Generation" data-toc-modified-id="4.2-Pseudorandom-Number-Generation-2">4.2 Pseudorandom Number Generation</a></span></li><li><span><a href="#4.3-Universal-Functions:-Fast-Element-Wise-Array-Functions" data-toc-modified-id="4.3-Universal-Functions:-Fast-Element-Wise-Array-Functions-3">4.3 Universal Functions: Fast Element-Wise Array Functions</a></span></li><li><span><a href="#4.4-Array-Oriented-Programming-with-Arrays" data-toc-modified-id="4.4-Array-Oriented-Programming-with-Arrays-4">4.4 Array-Oriented Programming with Arrays</a></span><ul class="toc-item"><li><span><a 
href="#Expressing-Conditional-Logic-as-Array-Operations" data-toc-modified-id="Expressing-Conditional-Logic-as-Array-Operations-4.1">Expressing Conditional Logic as Array Operations</a></span></li><li><span><a href="#Mathematical-and-Statistical-Methods" data-toc-modified-id="Mathematical-and-Statistical-Methods-4.2">Mathematical and Statistical Methods</a></span></li><li><span><a href="#Methods-for-Boolean-Arrays" data-toc-modified-id="Methods-for-Boolean-Arrays-4.3">Methods for Boolean Arrays</a></span></li><li><span><a href="#Sorting" data-toc-modified-id="Sorting-4.4">Sorting</a></span></li><li><span><a href="#Unique-and-Other-Set-Logic" data-toc-modified-id="Unique-and-Other-Set-Logic-4.5">Unique and Other Set Logic</a></span></li></ul></li><li><span><a href="#4.5-File-Input-and-Output-with-Arrays" data-toc-modified-id="4.5-File-Input-and-Output-with-Arrays-5">4.5 File Input and Output with Arrays</a></span></li><li><span><a href="#4.6-Linear-Algebra" data-toc-modified-id="4.6-Linear-Algebra-6">4.6 Linear Algebra</a></span></li><li><span><a href="#4.7-Example:-Random-Walks" data-toc-modified-id="4.7-Example:-Random-Walks-7">4.7 Example: Random Walks</a></span><ul class="toc-item"><li><span><a href="#Simulating-One-Random-Walks-at-Once" data-toc-modified-id="Simulating-One-Random-Walks-at-Once-7.1">Simulating One Random Walks at Once</a></span></li><li><span><a href="#Simulating-Many-Random-Walks-at-Once" data-toc-modified-id="Simulating-Many-Random-Walks-at-Once-7.2">Simulating Many Random Walks at Once</a></span></li></ul></li></ul></div>

In [1]:
# If you use Colab, uncomment the following lines to mount your Google Drive.
# After that, the notebook can read and write files stored in your Drive.

#from google.colab import drive
#drive.mount('/content/drive')


In [2]:
# If you use Colab, change the current directory to the folder that holds your
# notebook and data. For example, my Colab files and data are saved at the following location:

#%cd /content/drive/MyDrive/Colab\ Notebooks

In [3]:
import numpy as np
np.random.seed(12345)
import matplotlib.pyplot as plt
plt.rc("figure", figsize=(10, 6))
np.set_printoptions(precision=4, suppress=True)

In [None]:
import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

In [None]:
# use timeit to measure execution time 
%timeit my_arr2 = my_arr * 2


In [None]:
%timeit my_list2 = [x * 2 for x in my_list]
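
Outside IPython, where the `%timeit` magic is not available, the same comparison can be sketched with the standard-library `timeit` module. This is only a rough sketch: the repeat count of 10 is an arbitrary choice, and the timings depend on the machine.

```python
import timeit
import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# time 10 runs of each approach (number=10 is an arbitrary choice)
t_arr = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array: {t_arr:.4f} s for 10 runs")
print(f"Python list: {t_list:.4f} s for 10 runs")
```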

## 4.1 The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray,
which is a fast, flexible container for large datasets in Python. Arrays enable us to
perform mathematical operations on whole blocks of data using similar syntax to the
equivalent operations between scalar elements.

In [None]:
# import numpy and create an array

import numpy as np
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
data

In [None]:
# multiply all elements of the array by 10

data * 10

In [None]:
# element-wise addition

data + data

In [None]:
# Every array has a shape, a tuple indicating the size of each dimension
# this is an array of two rows and three columns

data.shape

In [None]:
# An ndarray is a generic multidimensional container for homogeneous data; that is, all 
# of the elements must be the same type.
# dtype, an object describing the data type of the array

data.dtype

### Creating ndarrays

Table 4-1 has some important NumPy array creation functions

In [None]:
# The easiest way to create an array is to use the array function. This accepts any
# sequence-like object (including other arrays) and produces a new NumPy array
#containing the passed data

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

In [None]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

In [None]:
# ndim is an attribute that holds the number of dimensions of the array

arr2.ndim

In [None]:
# shape is an attribute that holds the shape of the array

arr2.shape

In [None]:
# check the data type using dtype

arr1.dtype

In [None]:
# check the data type using dtype

arr2.dtype

In [None]:
# numpy.zeros creates arrays of 0's

np.zeros(10)

In [None]:
np.zeros((3, 6))

In [None]:
# numpy.ones creates arrays of 1's
np.ones((3,1))

In [None]:
# numpy.empty creates an array without initializing its values to any particular value

np.empty((2, 3, 2))

In [None]:
# numpy.arange is an array-valued version of the built-in Python range function

np.arange(15)

### Data Types for ndarrays

Table 4-2 lists NumPy data types


In [None]:
# The data type, or dtype, is a special object containing the information (or metadata,
# data about data) the ndarray needs to interpret a chunk of memory as a particular
# type of data.
# Specify the data type when creating an array

arr1 = np.array([1, 2, 3], dtype=np.float64)
arr1.dtype

In [None]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

In [None]:
# We can explicitly convert or cast an array from one data type to another using
# ndarray’s astype method

# In this example, arr is an array of integers. We use the astype method to convert
# it to an array of 64-bit floating-point numbers

arr = np.array([1, 2, 3, 4, 5])
arr.dtype
float_arr = arr.astype(np.float64)
float_arr
float_arr.dtype

In [None]:
# convert an array of 64-bit float to 32-bit integer
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr.dtype
arr.astype(np.int32)
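
The conversion above does not round: casting floating-point numbers to an integer dtype discards the fractional part. The sketch below contrasts that with rounding first; the use of `np.round` is an addition for illustration and is not part of the original example.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = arr.astype(np.int32)           # fractional part is discarded
rounded = np.round(arr).astype(np.int32)   # round to nearest first, then cast

print(truncated)
print(rounded)
```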

In [None]:
# convert an array of strings to 64-bit floating-point numbers
# (np.bytes_ replaces the older alias np.string_, which was removed in NumPy 2.0)
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)
numeric_strings.astype(float)


In [None]:
# apply the data type of calibers to int_array

int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

In [None]:
# an array of unsigned integers

zeros_uint32 = np.zeros(8, dtype="u4")
zeros_uint32

### Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization. Any arithmetic
operations between equal-size arrays apply the operation element-wise

In [None]:
# define an array of size 2 x 3

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr


In [None]:
# element-wise multiplication

arr * arr


In [None]:
# element-wise subtraction
arr - arr

In [None]:
# Arithmetic operations with scalars propagate the scalar argument to each element in the array

1 / arr


In [None]:
arr ** 2

In [None]:
# Comparisons between arrays of the same size yield Boolean arrays

arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2
arr2 > arr

### Basic Indexing and Slicing

In [None]:
# define an array

arr = np.arange(10)
arr


In [None]:
# the element indexed as 5
arr[5]

In [None]:
# elements of arr indexed from 5 to 7
arr[5:8]


In [None]:
# assign 12 to the elements indexed from 5 to 7
arr[5:8] = 12
arr

In [None]:
# An important first distinction from Python’s built-in lists is that
# array slices are views on the original array. This means that the data
# is not copied, and any modifications to the view will be reflected in
#the source array.

# for example, arr_slice is a slice of arr
arr_slice = arr[5:8]
arr_slice

In [None]:
# when we modify the arr_slice, a slice of arr, arr is modified too
arr_slice[1] = 12345
arr

In [None]:
# assign all elements of arr_slice as 64, which modifies the corresponding slice in arr
arr_slice[:] = 64
arr

In [None]:
# If you want a copy of a slice of an ndarray instead of a view, you need to
# explicitly copy the array, for example arr[5:8].copy()

# In this example, we copy a slice of arr to array_slice_nw. After we modify array_slice_nw,
# arr doesn't change

arr = np.arange(10)
array_slice_nw = arr[5:8].copy()
array_slice_nw[:] = 64  # modify the copy in place (plain "= 64" would only rebind the name)
arr
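
One way to check whether two arrays refer to the same data is `np.shares_memory`. The sketch below is an illustration added here, with variable names chosen for the example; it shows that a slice shares memory with its source while an explicit copy does not.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # a slice is a view on arr
copy = arr[5:8].copy()   # an explicit, independent copy

print(np.shares_memory(arr, view))   # the view reuses arr's buffer
print(np.shares_memory(arr, copy))   # the copy owns its own data

view[:] = 64   # writes through to arr
copy[:] = -1   # leaves arr untouched
print(arr)
```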

In [None]:
# define a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

In [None]:
# access the element at row (axis 0) 0, column (axis 1) 2

arr2d[0][2]


In [None]:
# we can also pass a comma-separated list of indices to select individual elements

arr2d[0, 2]

In [None]:
# define a 3D array
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

In [None]:
# In multidimensional arrays, if you omit later indices, the returned object will be a
#lower dimensional ndarray consisting of all the data along the higher dimensions

arr3d[0]

In [None]:
# save a copy of arr3d[0] as old_values
old_values = arr3d[0].copy() 

# modify arr3d[0] to be all 42's
arr3d[0] = 42
arr3d


In [None]:
# restore arr3d[0] by assigning the (original) values saved in old_values
arr3d[0] = old_values
arr3d

In [None]:
# Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0)

arr3d[1, 0]

In [None]:
# This expression is the same as though we had indexed in two steps:
x = arr3d[1]
x
x[0]

### Indexing with slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the
familiar syntax

Refer to Figure 4-2


In [None]:
arr

In [None]:
arr[1:6]

In [None]:
arr2d

In [None]:
arr2d[:2]

In [None]:
# pass multiple slices just like you can pass multiple indexes

arr2d[:2, 1:]

In [None]:
lower_dim_slice = arr2d[1, :2]
lower_dim_slice

In [None]:
# By mixing integer indexes and slices, we get lower dimensional slices.
#Here, while arr2d is two-dimensional, lower_dim_slice is one-dimensional, and its
#shape is a tuple with one axis size:
    
lower_dim_slice.shape

In [None]:
temp=arr2d[:2, 2]
temp.shape

In [None]:
temp=arr2d[:, :1]
temp.shape

In [None]:
arr2d[:2, 1:] = 0
arr2d

### Boolean Indexing

In [None]:
# define a 1D array encompassing names. There are duplicated names
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])

# define a 2D array with the same size on axis-0 (i.e., row)
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2],
                 [-12, -4], [3, 4]])
print(names)
print(data)



In [None]:
# Suppose each name corresponds to a row in the data array and we wanted to
# select all the rows with the corresponding name "Bob".
# comparisons (such as ==) with arrays are also vectorized. Thus, comparing names
#with the string "Bob" yields a Boolean array:

names == "Bob"

In [None]:
# This Boolean array can be passed when indexing the array:

data[names == "Bob"]

In [None]:
# rows where names == "Bob", columns from index 1 onward (the result stays 2D)
data[names == "Bob", 1:]


In [None]:
# rows where names == "Bob", column 1 only (the result is 1D)
data[names == "Bob", 1]

In [None]:
# To select everything but "Bob" you can either use != or negate the condition using ~:

names != "Bob"
~(names == "Bob")
data[~(names == "Bob")]

In [None]:
# The ~ operator can be useful when you want to invert a Boolean array referenced by a variable

cond = names == "Bob"
data[~cond]

In [None]:
# slice of data with names =="Bob" or "Will" 
#The Python keywords and and or do not work with Boolean arrays. Use & (and) and | (or) instead.

mask = (names == "Bob") | (names == "Will")
mask
data[mask]
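
To see why the keywords do not work, the sketch below (an added illustration) tries `or` on two Boolean arrays and catches the resulting error, then builds the same mask with `|`.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])

try:
    # Python's `or` asks for the truth value of a whole array, which is ambiguous
    mask = (names == "Bob") or (names == "Will")
except ValueError as exc:
    print("ValueError:", exc)

# the element-wise operator works as intended
mask = (names == "Bob") | (names == "Will")
print(mask)
```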

In [None]:
# find negative values of data, and change them to 0

data[data < 0] = 0
data

In [None]:
# change all values of data to become 7 except for those pertaining to "Joe" 

data[names != "Joe"] = 7
data

### Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays

In [None]:
arr = np.zeros((8, 4))
for i in range(8):
    arr[i] = i
arr

In [None]:
# To select a subset of the rows in a particular order, you can simply pass a list or
# ndarray of integers specifying the desired order:

arr[[4, 3, 0, 6]]

In [None]:
# Using negative indices selects rows from the end
arr[[-3, -5, -7]]

In [None]:
# create an array of 32 elements, ranging from 0 to 31. Then reshape it according
# to the dimensions specified in the tuple

arr = np.arange(32).reshape((8, 4))
arr


In [None]:
# selecting a subset of the matrix’s rows using arr[[1, 5, 7, 2]].
# Then, select a subset of columns from the subset. Here is one way to get that:

arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

In [None]:
# Keep in mind that fancy indexing, unlike slicing, always copies the data into a new
#array when assigning the result to a new variable. If you assign values with fancy
#indexing, the indexed values will be modified:

arr[[1, 5, 7, 2], [0, 3, 1, 2]]
arr[[1, 5, 7, 2], [0, 3, 1, 2]] = 1000
arr
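
The two forms of fancy indexing above behave differently, which the following sketch tries to make explicit. Passing two index lists pairs them up element by element, and the result is a copy. `np.ix_` is not used in the original example; it is shown here as one possible alternative to the chained indexing for selecting a rectangular block.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# two index arrays are paired element by element: (1, 0), (5, 3), (7, 1), (2, 2)
picked = arr[[1, 5, 7, 2], [0, 3, 1, 2]]
print(picked)

# the result of fancy indexing is a copy, so modifying it leaves arr unchanged
picked[:] = -1
print(arr[1, 0])

# np.ix_ builds indexers for the rectangular block of rows x columns
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(block.shape)
```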

### Transposing Arrays and Swapping Axes

In [None]:
# Transposing is a special form of reshaping that similarly returns a view on the
# underlying data without copying anything. Arrays have the transpose method and
#the special T attribute:

# the following example generates a 3x5 array. Then the T attribute returns its 5x3 transpose
arr = np.arange(15).reshape((3, 5))
arr
arr.T

In [None]:
# numpy.dot performs matrix computations

arr = np.array([[0, 1, 0], [1, 2, -2], [6, 3, 2], [-1, 0, -1], [1, 0, 1]])
arr
np.dot(arr.T, arr)

In [None]:
# The @ infix operator is another way to do matrix multiplication
arr.T @ arr

In [None]:
# ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to
# rearrange the data
# swapaxes similarly returns a view on the data without making a copy
arr
arr.swapaxes(0, 1)
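
As a quick check of the view behaviour described above, the sketch below (added for illustration, using a small 3x5 array) confirms that `swapaxes` shares memory with the original array and that writing through the result changes the original.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
swapped = arr.swapaxes(0, 1)

print(swapped.shape)                     # axes 0 and 1 exchanged
print(np.array_equal(swapped, arr.T))    # for a 2D array this matches the transpose
print(np.shares_memory(arr, swapped))    # no data was copied

swapped[0, 0] = 99   # writing through the view changes arr as well
print(arr[0, 0])
```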

## 4.2 Pseudorandom Number Generation

The numpy.random module supplements the built-in Python random module with
functions for efficiently generating whole arrays of sample values from many kinds of
probability distributions

Refer to Table 4-3

In [None]:
# generate a 4x4 array with all elements sampled from a standard normal distribution

samples = np.random.standard_normal(size=(4, 4))
samples

In [None]:
#Python’s built-in random module, by contrast, samples only one value at a time. As
#you can see from this benchmark, numpy.random is well over an order of magnitude
#faster for generating very large samples

from random import normalvariate
N = 1_000_000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]

In [None]:
%timeit np.random.standard_normal(N)

In [None]:
# These random numbers are not truly random (rather, pseudorandom) but instead
#are generated by a configurable random number generator that determines deterministically
#what values are created. Functions like numpy.random.standard_normal use
#the numpy.random module’s default random number generator, but your code can be
#configured to use an explicit generator

#The seed argument is what determines the initial state of the generator, and the state
#changes each time the rng object is used to generate data

rng = np.random.default_rng(seed=12435)
data = rng.standard_normal((2, 3))
data

In [None]:
type(rng)
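
The role of the seed and the changing state can be illustrated with two generators built from the same seed. This is a small sketch; the names `rng_a` and `rng_b` are chosen for the example.

```python
import numpy as np

rng_a = np.random.default_rng(seed=12345)
rng_b = np.random.default_rng(seed=12345)

first_a = rng_a.standard_normal(3)
first_b = rng_b.standard_normal(3)
print(np.array_equal(first_a, first_b))    # same seed, same values

second_a = rng_a.standard_normal(3)        # the generator state has advanced
print(np.array_equal(first_a, second_a))
```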

## 4.3 Universal Functions: Fast Element-Wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations
on data in ndarrays. You can think of them as fast vectorized wrappers for simple
functions that take one or more scalar values and produce one or more scalar results

Tables 4-4 and 4-5 are a listing of some of NumPy’s ufuncs

In [None]:
arr = np.arange(10)
arr

In [None]:
# Many ufuncs are simple element-wise transformations.
# Unary ufuncs take one array as input, perform an element-wise operation, and return one or more arrays of the same size

np.sqrt(arr)

In [None]:
np.exp(arr)

In [None]:
# binary ufuncs take two arrays as the input and return one array
# In this example, numpy.maximum computed the element-wise maximum of the elements in x and y.

x = rng.standard_normal(8)
y = rng.standard_normal(8)
print(x)
print(y)
np.maximum(x, y)

In [None]:
# generate 7 values from the standard normal distribution, scaled by 5
arr = rng.standard_normal(7) * 5
arr


In [None]:
# a ufunc can return multiple arrays. numpy.modf is one example. 
# it returns the fractional and integral parts of a floating-point array

remainder, whole_part = np.modf(arr)

print(f'remainder is: {remainder}')
print(f'whole part is: {whole_part}')

In [None]:
# Ufuncs accept an optional out argument that allows them to assign their results into
# an existing array rather than create a new one
arr


In [None]:
# np.zeros_like returns an array of zeros with the same shape and type as a given array
out1 = np.zeros_like(arr)
out1

In [None]:
np.add(arr, 1, out=out1)
out1
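
A short sketch of what the `out` argument does, using a small hand-written array instead of random data: the ufunc writes into the array that was passed and also returns that same object.

```python
import numpy as np

arr = np.array([1.5, -2.0, 3.25])
out = np.zeros_like(arr)

result = np.add(arr, 1, out=out)
print(result is out)   # the ufunc returns the array passed as out
print(out)
```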

## 4.4 Array-Oriented Programming with Arrays

Using NumPy arrays enables you to express many kinds of data processing tasks as
concise array expressions that might otherwise require writing loops. This practice
of replacing explicit loops with array expressions is referred to by some people
as vectorization. In general, vectorized array operations will usually be significantly
faster than their pure Python equivalents, with the biggest impact in any kind of
numerical computations.

In [None]:
# generate a grid 
points = np.arange(-5, 5, 0.01)  # 1,000 equally spaced points
xs, ys = np.meshgrid(points, points)  # returns coordinate matrices built from the coordinate vectors
xs
ys

In [None]:
# evaluate the function sqrt(x^2 + y^2) across a regular grid of values.
z = np.sqrt(xs ** 2 + ys ** 2)
z

In [None]:
#Visualize the result as a grey color image

import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray, extent=[-5, 5, -5, 5])
plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")

In [None]:
# Redraw the current figure
plt.draw()

In [None]:
# If you’re working in IPython, you can close all open plot windows by executing

plt.close("all")

### Expressing Conditional Logic as Array Operations

In [None]:
# Suppose we had a Boolean array and two arrays of values

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

In [None]:
#Suppose we wanted to take a value from xarr whenever the corresponding value in
#cond is True, and otherwise take the value from yarr.

result = [(x if c else y)
          for x, y, c in zip(xarr, yarr, cond)]
result

In [None]:
# The numpy.where function is a vectorized version of the ternary expression x if condition else y

result = np.where(cond, xarr, yarr)
result

In [None]:
# The second and third arguments to numpy.where don’t need to be arrays; one or
# both of them can be scalars. A typical use of where in data analysis is to produce a
# new array of values based on another array.

# In this example, arr is a 4x4 2D array with values drawn from the standard normal distribution.
# A new 2D array is created by replacing all positive values with 2 and the rest with -2

arr = rng.standard_normal((4, 4))
arr
arr > 0
np.where(arr > 0, 2, -2)

In [None]:
# only replace the positive values with 2 and rest unchanged
np.where(arr > 0, 2, arr) # set only positive values to 2
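
Because the second and third arguments can themselves be arrays, `numpy.where` calls can be nested when there are more than two outcomes. The three-way labelling below is a hypothetical example on a small hand-written array, not something taken from the original text.

```python
import numpy as np

arr = np.array([[-1.5, 0.2], [0.0, 3.1]])

# three outcomes: 1 for positive, -1 for negative, 0 otherwise
signs = np.where(arr > 0, 1, np.where(arr < 0, -1, 0))
print(signs)
```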

### Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or
about the data along an axis are accessible as methods of the array class. We can
use aggregations (sometimes called reductions) like sum, mean, and std (standard
deviation) either by calling the array instance method or using the top-level NumPy
function. 

Table 4-6. Basic array statistical methods

In [None]:
# generate a 5x4 2D array
arr = rng.standard_normal((5, 4))
arr

In [None]:
# use the mean method to calculate the mean value by calling the array instance
arr.mean()


In [None]:
# When we use the NumPy function, we have to pass the array we want to average 
# as the first argument.
np.mean(arr)


In [None]:
arr.sum()

In [None]:
# calculate mean along axis-1 (e.g., mean values by rows)
arr.mean(axis=1)


In [None]:
# calculate the sum along axis 0 (i.e., sums by column)
arr.sum(axis=0)
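
The effect of the `axis` argument is easiest to see from the shape of the result. The sketch below uses a 5x4 array of consecutive integers (an assumption made so the numbers are easy to follow) in place of random data.

```python
import numpy as np

arr = np.arange(20).reshape((5, 4))

print(arr.mean(axis=0).shape)   # (4,): one value per column
print(arr.mean(axis=1).shape)   # (5,): one value per row
print(arr.sum(axis=0))
```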

In [None]:
# cumsum and cumprod do not aggregate; instead they produce an array of the intermediate results
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# cumsum return the cumulative sum of the elements along a given axis. 
arr.cumsum()

In [None]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr

In [None]:
# cumsum along axis=0 
arr.cumsum(axis=0)


In [None]:
# cumsum along axis=1
arr.cumsum(axis=1)

In [None]:
# The default (None) is to compute the cumsum over the flattened array.
arr.cumsum()
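
`cumprod` is mentioned above but not demonstrated. A minimal sketch on the same 3x3 array, following the pattern used for `cumsum`:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(arr.cumprod(axis=0))   # running product down each column
print(arr.cumprod(axis=1))   # running product across each row
```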

### Methods for Boolean Arrays

In [None]:
arr = rng.standard_normal(100)
(arr > 0).sum() # Number of positive values

In [None]:
(arr <= 0).sum() # Number of non-positive values

In [None]:
bools = np.array([False, False, True, False])



In [None]:
# any tests whether one or more values in an array is True

bools.any()

In [None]:
# all checks if every value is True

bools.all()
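
These methods also work on non-Boolean arrays, where nonzero elements are treated as True. A small sketch with a hand-written integer array:

```python
import numpy as np

values = np.array([0, 0, 3, 0])

print(values.any())   # at least one element is nonzero
print(values.all())   # not every element is nonzero
```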

### Sorting

NumPy arrays can be sorted in place with the sort method

In [None]:
# create a 1D array
arr = rng.standard_normal(6)
arr


In [None]:
# calling the sort method

arr.sort()
arr

In [None]:
# create a 2D array

arr = rng.standard_normal((5, 3))
arr

In [None]:
# sort each one-dimensional section of values in a multidimensional 
# array in place along an axis by passing the axis number to sort

# sort along axis=0
arr.sort(axis=0)
arr


In [None]:
# sort along axis=1
arr.sort(axis=1)
arr

In [None]:
# The top-level function numpy.sort returns a sorted copy of an array (like the Python
# built-in function sorted) instead of modifying the array in place.

arr2 = np.array([5, -10, 7, 1, 0, -3])

sorted_arr2 = np.sort(arr2)
print('sorted_arr2:',sorted_arr2)
print('\narr2:',arr2)

### Unique and Other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays.

Table 4-7. Array set operations

In [None]:
# numpy.unique returns the sorted unique values in an array

names = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])
np.unique(names)


In [None]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

In [None]:
# Contrast numpy.unique with the pure Python alternative:

sorted(set(names))

In [None]:
# numpy.isin tests membership of the values in one array in another, returning a Boolean array
# (older code uses numpy.in1d, which is deprecated in NumPy 2.0 and later)

values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])

In [None]:
arr_1 = [1, 2, 3, 5]
arr_2 = [2, 3, 4, 5, 7]

In [None]:
# numpy.intersect1d(x,y) computes the sorted, common elements in x and y
np.intersect1d(arr_1, arr_2)

In [None]:
# union1d(x, y) computes the sorted union of elements in x and y
np.union1d(arr_1, arr_2) 

In [None]:
# setdiff1d(x, y) computes set difference, elements in x that are not in y
np.setdiff1d(arr_1, arr_2) 

In [None]:
# setxor1d(x, y) computes set symmetric differences;
# elements that are in either of the arrays, but not both

np.setxor1d(arr_1, arr_2) 
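
As a cross-check, the four routines above can be compared with Python's built-in set operators. This sketch is an added illustration; it relies only on the fact that each NumPy routine returns a sorted array.

```python
import numpy as np

arr_1 = [1, 2, 3, 5]
arr_2 = [2, 3, 4, 5, 7]

set_1, set_2 = set(arr_1), set(arr_2)

# sorting the Python set gives the same values as the NumPy routine
print(np.intersect1d(arr_1, arr_2).tolist() == sorted(set_1 & set_2))
print(np.union1d(arr_1, arr_2).tolist() == sorted(set_1 | set_2))
print(np.setdiff1d(arr_1, arr_2).tolist() == sorted(set_1 - set_2))
print(np.setxor1d(arr_1, arr_2).tolist() == sorted(set_1 ^ set_2))
```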

## 4.5 File Input and Output with Arrays

In [None]:
# numpy.save saves array data on disk.
arr = np.arange(10)
np.save("some_array", arr)

In [None]:
# numpy.load loads array data from disk.
np.load("some_array.npy")

In [None]:
# You can save multiple arrays in an uncompressed archive using numpy.savez and
# passing the arrays as keyword arguments:

arr_1 = [1, 2, 4, 5]
arr_2 = [3, 4, 2, 9]
np.savez("array_archive.npz", a=arr_1, b=arr_2)

In [None]:
# When loading an .npz file, we get back a dictionary-like object that loads the
# individual arrays lazily

arch = np.load("array_archive.npz")
arch["b"]

In [None]:
# If our data compresses well, you may wish to use numpy.savez_compressed instead

np.savez_compressed("arrays_compressed.npz", a=arr, b=arr)
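
The sketch below walks through a full save-and-load round trip for a compressed archive. It writes to a temporary directory so nothing is left on disk; the file name `demo_archive.npz` and the array contents are hypothetical.

```python
import os
import tempfile
import numpy as np

a = np.arange(5)
b = np.array([3, 4, 2, 9])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo_archive.npz")   # hypothetical file name
    np.savez_compressed(path, a=a, b=b)

    # use the loaded archive as a context manager so the file is closed
    # before the temporary directory is removed
    with np.load(path) as arch:
        names = sorted(arch.files)     # keyword names used when saving
        loaded_b = arch["b"].copy()

print(names)
print(loaded_b)
```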

In [None]:
# remove files
!rm some_array.npy
!rm array_archive.npz
!rm arrays_compressed.npz

## 4.6 Linear Algebra

Table 4-8. Commonly used numpy.linalg functions

Please refer to the following notebook
https://github.com/SBU-r/CIV355_Spring24/blob/main/Prelim_Notebooks/Linear_Algebra.ipynb

In [None]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y)

In [None]:
np.dot(x, y)

In [None]:
x @ np.ones(3)

In [None]:
from numpy.linalg import inv, qr
X = rng.standard_normal((5, 5))
mat = X.T @ X
inv(mat)
mat @ inv(mat)
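
Because of floating-point rounding, the product of a matrix and its inverse is usually only approximately the identity. One way to check this is `np.allclose`, sketched here on a small hand-written matrix (chosen for the example) instead of the random one above.

```python
import numpy as np
from numpy.linalg import inv

mat = np.array([[4.0, 1.0], [1.0, 3.0]])   # small invertible matrix chosen for the example

product = mat @ inv(mat)

# compare against the identity within floating-point tolerance
print(np.allclose(product, np.eye(2)))
```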

## 4.7 Example: Random Walks

The simulation of random walks provides an illustrative application of utilizing array
operations.

In [None]:
# Example 1 
# Let’s first consider a simple random walk starting at 0 with steps of 
# 1 and –1 occurring with equal probability

import random
position = 0
walk = [position]
nsteps = 1000
for _ in range(nsteps):
    step = 1 if random.randint(0, 1) else -1 # random.randint(a,b) returns a random integer between a and b inclusively
    position += step
    walk.append(position)


In [None]:
plt.figure(figsize=(8,6))
plt.step(np.arange(100),walk[:100])
plt.show()

### Simulating One Random Walks at Once

In [None]:
# Example 2

# You might make the observation that walk is the cumulative sum of the random steps
# and could be evaluated as an array expression. Thus, I use the numpy.random module
# to draw 1,000 coin flips at once, set these to 1 and –1, and compute the cumulative sum

nsteps = 1000
rng = np.random.default_rng(seed=12345)  # fresh random generator
draws = rng.integers(0, 2, size=nsteps)  # rng.integers(low, high) returns random integers from low (inclusive) to high (exclusive)
steps = np.where(draws == 0, 1, -1)  # steps are 1 where draws are 0, and -1 where draws are 1
walk = steps.cumsum()  # the random walk is the cumulative sum of the steps

In [None]:
#From this we can begin to extract statistics like the minimum and maximum value
#along the walk’s trajectory:

walk.min()

In [None]:
walk.max()

In [None]:
# how long it took the random walk to get at least 10 steps away from the origin 0 in either direction.
# np.abs(walk) >= 10 gives us a Boolean array indicating where the walk has reached or exceeded 10.
# argmax further returns the first index of the maximum value in the Boolean array (True is the maximum value):

(np.abs(walk) >= 10).argmax()
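
One caveat with this use of `argmax`: if the walk never reaches the threshold, the Boolean array is all False and `argmax` still returns 0. The sketch below uses a short hand-written walk to show the issue and one possible guard with `any`.

```python
import numpy as np

walk = np.array([1, 2, 1, 0, -1, -2, -1])   # a short, hand-written walk

reached = np.abs(walk) >= 10
print(reached.any())      # the threshold is never reached
print(reached.argmax())   # still returns 0, which is not a real crossing time

# guard the lookup with any() before trusting argmax
first_crossing = reached.argmax() if reached.any() else None
print(first_crossing)
```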

### Simulating Many Random Walks at Once

In [None]:
# Example 3
# generate 5000 random walks, each with 1000 steps (1: forward, -1: backward)

nwalks = 5000
nsteps = 1000
draws = rng.integers(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(axis=1)
walks

In [None]:
# visualize the first 10 walks 
plt.figure()
plt.step(np.arange(1000),walks[:10,].T)
plt.show()

In [None]:
# find the maximum (forward) distance and maximum (backward) distance that the 5000 walks reached
print(f'farthest distance forward: {walks.max()}')
print(f'farthest distance backward: {walks.min()}')

In [None]:
(np.abs(walks) >= 30)

In [None]:
# Among the 5000 walks, how many are 30 steps or farther from the origin?

# (np.abs(walks) >= 30) returns a boolean array indicating values >=30
# (np.abs(walks) >= 30).any(axis=1) returns a boolean array indicating walks that hit 30
hits30 = (np.abs(walks) >= 30).any(axis=1)
hits30
hits30.sum() # Number of walks that hit 30 or -30

In [None]:
# the first time crossing 30 steps
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(axis=1) # for those walks that hit 30, argmax(axis=1) returns the earliest time hitting 30 steps
crossing_times

In [None]:
#  mean crossing time
crossing_times.mean()

In [None]:
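# steps do not have to be coin flips: here they are drawn from a normal
# distribution with mean 0 and standard deviation 0.25
draws = 0.25 * rng.standard_normal((nwalks, nsteps))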
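
The normally distributed draws can be turned into walks in the same way as the coin flips, by taking the cumulative sum along each row. A minimal sketch, assuming the same seed and sizes as the earlier examples:

```python
import numpy as np

rng = np.random.default_rng(seed=12345)
nwalks, nsteps = 5000, 1000

# normally distributed steps with mean 0 and standard deviation 0.25
draws = 0.25 * rng.standard_normal((nwalks, nsteps))
walks = draws.cumsum(axis=1)   # each row is one walk

print(walks.shape)
print(walks.min(), walks.max())
```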