# Introduction to NumPy
-  __NumPy__ (Numerical Python) is pronounced NUM-py or sometimes NUM-pee
-  NumPy provides __arrays__ similar to Python __lists__, but with more efficient storage and better performance,
particularly as the data grows larger.
-  NumPy arrays are the core of data science tools in Python

In [None]:
# Importing numpy in a program
import numpy as np      # alias to np
print("NumPy version is", np.__version__)   # version check

## Understanding Data Types in Python
- "Primitive" data types in Python (integers, floating point values, Booleans) are objects
    - Each object consists of an internal C-language structure which includes a reference count, a type encoding, a size attribute, and the actual value
    - These attributes make Python a dynamically typed (flexible) language, but also reduce performance
- In a Python list, each element is a self-contained object
    - When lists contain elements of the same data type, some internal information is redundant
    - A list stores only references to its elements; the element objects themselves can be scattered throughout memory rather than stored contiguously
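
The per-object overhead described above can be observed directly. The following is a minimal sketch, assuming CPython; the variable names are illustrative, and exact byte counts for Python objects vary by version and platform, so none are stated here.

```python
import sys
import numpy as np

# A Python int is a full object: the value plus bookkeeping
# (reference count, type information, size).
py_int_bytes = sys.getsizeof(12345)

# A NumPy int64 array stores only the raw 8-byte values in one buffer.
arr = np.arange(1000, dtype=np.int64)

print("bytes for one Python int object:", py_int_bytes)
print("bytes per NumPy int64 element:  ", arr.itemsize)
print("bytes for all 1000 array values:", arr.nbytes)
```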

## NumPy N-Dimensional Arrays
- A NumPy n-dimensional array (ndarray)  is a fixed-size, multidimensional container of items of the same type and size.
    - The number of dimensions and items in an ndarray is defined by its shape, which is a tuple of N non-negative integers that specify the sizes of each dimension.
- NumPy arrays store data as a contiguous block for efficiency ("dense" arrays).
    - The standard-library array module provides a similar dense, single-type structure, but it is one-dimensional only; the NumPy ndarray adds multiple dimensions and efficient operations on the data.
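
As a small illustration of the points above, the sketch below builds an ndarray from a standard-library `array.array` and inspects its shape and memory layout. The variable names are illustrative only.

```python
import array
import numpy as np

# The standard-library array module: a dense, typed, one-dimensional sequence.
py_arr = array.array('i', [0, 1, 2, 3, 4, 5])

# An ndarray built from the same values can be given a multidimensional shape.
nd = np.array(py_arr).reshape(2, 3)

print(nd.shape)                  # tuple with one entry per dimension
print(nd.dtype)                  # every item has the same type
print(nd.flags['C_CONTIGUOUS'])  # True when the data is one row-major block
```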

## NumPy Arrays vs. Python Lists

<img src="https://github.com/FSCJ-FacultyDev/SWC-Virtual-2024/blob/main/notebooks.day4/images/SWC22-NumPy.python-list.numpy-array.png?raw=true" width=300 height=300 />

<img src="https://github.com/FSCJ-FacultyDev/SWC-Virtual-2024/blob/main/notebooks.day4/images/SWC22-NumPy.python-list.png?raw=true" width=400 height=400 />

# Creating NumPy Arrays from Lists
- ndarrays can be created from Python lists

In [None]:
# integer array:
npa = np.array([1, 4, 2, 5, 3])
print("A\n", npa)

# upcast integers to float
npa = np.array([3.14, 4, 2, 3])
print("\nB\n", npa)

# specify array element type
npa = np.array([1, 2, 3, 4], dtype='float32')
print("\nC\n", npa)

# multidimensional array using list of lists
npa = np.array([range(i, i + 3) for i in [2, 4, 6]])
print("\nD\n", npa)

# Creating NumPy Arrays from Scratch
- NumPy provides optimized functions to create ndarrays

In [None]:
# create a 10-integer array filled with zeros
npa = np.zeros(10, dtype=int)
print("A", npa)

# create a 3x5 floating point array filled with ones
npa = np.ones((3,5), dtype=float)
print("\nB\n", npa)

# create a 3x5 array filled with 3.14
npa = np.full((3,5), 3.14)
print("\nC\n", npa)

# create an array filled with a linear sequence
# starting at 0, ending before 20 (20 is excluded), stepping by 2
# (similar to built-in range() function)
npa = np.arange(0, 20, 2)
print("\nD", npa)

In [None]:
# create a 3x3 array of uniformly distributed
# random values between 0 and 1
npa = np.random.random((3,3))
print("E\n", npa)

# create a 3x3 array of normally distributed random
# values with mean 0 and standard deviation 1
npa = np.random.normal(0,1,(3,3))
print("\nF\n", npa)
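
A few other creation functions follow the same pattern as the ones above. This is a short sketch rather than a complete list; `np.linspace`, `np.eye`, and `np.empty` are standard NumPy functions, and the variable names are illustrative.

```python
import numpy as np

# five evenly spaced values from 0 to 1, both endpoints included
lin = np.linspace(0, 1, 5)
print(lin)

# 3x3 identity matrix (ones on the diagonal, zeros elsewhere)
ident = np.eye(3)
print(ident)

# uninitialized array of 3 values: contents are whatever was already in memory
raw = np.empty(3)
print(raw.shape, raw.dtype)
```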

## Try It!

Create a Python script that demonstrates how to create a NumPy array from scratch and from a list. Use different methods provided by the NumPy library to achieve this.

- Import the NumPy library.
- Create a NumPy array from scratch using methods like np.zeros, np.ones, or np.arange.
- Create a NumPy array from a given list.
- Print the created arrays to verify their contents.

**Sample Output**

```
Array of zeros (3x3):
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Array of ones (2x4):
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Array with a range of values (0 to 9):
[0 1 2 3 4 5 6 7 8 9]

Array created from a list:
[1 2 3 4 5]
```

# NumPy Standard Data Types
<img src="https://github.com/FSCJ-FacultyDev/SWC-Virtual-2024/blob/main/notebooks.day4/images/SWC22-NumPy.standard-data-types.png?raw=true" width=400 height=400 />

# NumPy Array Operations
- NumPy array operations are critical to many Data Science packages
- Array operations include
    - accessing attributes: determining size, shape, memory usage, and data types
    - indexing: getting/setting individual array element values
    - slicing: getting/setting subarrays
    - reshaping: changing an array's shape
    - joining/splitting: combining multiple arrays into one, and splitting one array into several

# NumPy Array Attributes

- Useful attributes of NumPy arrays:
    - ndim: number of dimensions
    - shape: size of each dimension
    - size: total size of the array (number of elements)
    - dtype: data type
    - itemsize: size in bytes of one array element
    - nbytes: total size of array in bytes (itemsize * size)


# Displaying NumPy Array Attributes
- Let's use a simple NumPy function to demonstrate how to display an array's attributes:
    - The **random.randint** function creates a NumPy array of random integers drawn from a discrete uniform distribution (every value is equally likely); `low` is included in the range and `high` is excluded

In [None]:
# 5 random integers in [1, 10); dtype='l' is the type code for a C long
dist = np.random.randint(low=1, high=10, size=5, dtype='l')
print(dist)
print(dist.ndim)
print(dist.shape)
print(dist.size)
print(dist.dtype)
print(dist.itemsize)
print(dist.nbytes)

# Multi-Dimensional NumPy Array Attributes
- The following examples demonstrate the array attributes of various types of NumPy arrays

In [None]:
np.random.seed(0) # seed random generator
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3,4))
x3 = np.random.randint(10, size=(3,4,5))

print("x1: ")
print(x1)

print("x1 ndim: ", x1.ndim)
print("x1 shape: ", x1.shape)
print("x1 size: ", x1.size)

print("x2: ")
print(x2)

print("x2 ndim: ", x2.ndim)
print("x2 shape: ", x2.shape)
print("x2 size: ", x2.size)

print("x3: ")
print(x3)

print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)

## Try It!

Create a Python script that demonstrates the creation of a NumPy array of random floating point values using the np.random.rand function. Display various attributes of the created array, such as its number of dimensions, shape, size, data type, item size, and total bytes consumed.

- Import the NumPy library.
- Use np.random.rand to create a 4x4 NumPy array of random floating-point numbers.
- Print the array and its attributes: number of dimensions, shape, size, data type, item size, and total size in bytes.

**Sample Output**

```
Array:
 [[0.80080354 0.62027642 0.14443824 0.11212174]
 [0.67681886 0.50058221 0.66218508 0.12738906]
 [0.45404636 0.39964858 0.64327836 0.00413337]
 [0.89345052 0.39355462 0.59888026 0.19925229]]

Array attributes:
Number of dimensions: 2
Shape: (4, 4)
Size: 16
Data type: float64
Item size (bytes): 8
Total size (bytes): 128
```

# Array Indexing: Single Element, Single Dimension
- NumPy array indexing is similar to Python list indexing, both for accessing and setting single elements

In [None]:
x1 = np.random.randint(10, size=6)
print("x1 = ", x1)
print("x1[0] = ", x1[0])
print("x1[4] = ", x1[4])
print("x1[-1] = ", x1[-1])
print("x1[-2] = ", x1[-2])
x1[1] = 3.14159 # will truncate
print("x1[1] = ", x1[1])
print("x1 = ", x1)

# Array Indexing: Single Element, Multi-Dimension

In [None]:
x2 = np.random.randint(10, size=(3,4))
print("x2 = \n", x2)
print()
print("x2[0,0] = ", x2[0,0])
print("x2[2,0] = ", x2[2,0])
print("x2[2,-1] = ", x2[2,-1])
print("x2[2] = ", x2[2]) # access the entire 3rd row
# print("x2[,2] = ", x2[,2]) # cannot access a column using simple indexing

## Try It!

Create a Python script that demonstrates indexing operations on a NumPy array.

- import the NumPy Library
- create a 3x3 NumPy array of random integers between 1 and 10 (np.random.randint(1, 11, size=(3, 3)))
- display the array
- print the element at row 1, column 2 of the array (indexing is zero-based)
- print the entire second row of the array

**Sample Output**

```
Original array:
[[4 7 1]
 [6 9 3]
 [5 2 8]]

Element at row 1, column 2:
3

Entire second row:
[6 9 3]
```

# Array Slicing: One Dimension
- NumPy slicing syntax follows that of the standard Python list
- To access a slice of an array x, use x[start:stop:step]
    - If any of these are unspecified, they default to start=0, stop=(size of dimension), step=1

In [None]:
# slicing
print("x1 = ", x1)
print("x1[:3] = ", x1[:3])    # up to index 3 (excl)
print("x1[3:] = ", x1[3:])    # start at index 3
print("x1[2:5] = ", x1[2:5])  # element index 2 - 4
print("x1[::2] = ", x1[::2])  # every other element
print("x1[::-1] = ", x1[::-1]) # reverse all
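
One detail worth noting: when step is negative, the defaults for start and stop are swapped, so the slice runs from the end of the array toward the beginning. A minimal sketch, using an illustrative array `x` so the results are predictable:

```python
import numpy as np

x = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

print(x[::-1])      # all elements, reversed
print(x[5::-2])     # from index 5 backwards, every other element -> [5 3 1]
print(x[:5:-2])     # from the last element down to (not including) index 5 -> [9 7]
```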

# Array Slicing: Multi-Dimension
- Multiple-dimension slices are separated by commas

In [None]:
# multi-slicing
print("x2 = \n", x2)
print("x2[:2,:3] = \n", x2[:2, :3]) # 2 rows, 3 columns
print("x2[:3,::2] = \n", x2[:3, ::2]) # all rows, every other column
print("x2[::-1,::-1] = \n", x2[::-1, ::-1]) # reverse rows and columns

# Accessing Array Rows and Columns
- Combine indexing and slicing to access an entire row or column

In [None]:
print("x2 =\n", x2)
print("x2[:,0]", x2[:, 0])  # first column
print("x2[0,:]", x2[0, :])  # first row
print("x2[0]", x2[0])     # first row (can omit : for row)

# NumPy Arrays as Views
- In contrast to lists, which slice as *copies*, NumPy arrays slice as *views*
    - Modifying a NumPy array slice modifies the original data in place rather than a copy; avoiding the copy improves performance
    - Use array.copy() when an independent copy is needed, e.g. to back up the original data

In [None]:
print('x2 = ')
print(x2)
x2bak = x2.copy()    # back up x2
x2_sub = x2[:2, :2]  # slice x2 into x2_sub
print('x2_sub (sliced as x2[:2, :2]) = ')
print(x2_sub)
x2_sub[0, 0]= 99     # modify x2_sub
print('x2_sub modified first element:')
print(x2_sub)
print('x2 was also modified:')
print(x2)
x2 = x2bak.copy()    # restore original x2
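
The difference between list slices and array slices can also be checked side by side. The sketch below uses illustrative names and `np.shares_memory` to test whether two arrays use the same underlying buffer.

```python
import numpy as np

# Slicing a list copies the selected elements.
lst = [0, 1, 2, 3, 4]
lst_slice = lst[:3]
lst_slice[0] = 99
print(lst)                               # original list is unchanged

# Slicing an ndarray returns a view onto the same memory.
arr = np.arange(5)
arr_view = arr[:3]
arr_view[0] = 99
print(arr)                               # first element is now 99
print(np.shares_memory(arr, arr_view))   # True: same underlying buffer

# An explicit copy is independent of the original.
arr_copy = arr[:3].copy()
arr_copy[1] = -1
print(arr)                               # not affected by the edit to arr_copy
print(np.shares_memory(arr, arr_copy))   # False
```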

# Reshaping Arrays
- The reshape() function gives a new shape to an array without changing its data.
    - The size of the initial array must match the size of the reshaped array

In [None]:
# reshape 1D to 2D (3x3)
r = np.arange(1, 10)
print("r = ", r)
grid = r.reshape((3, 3))
print("grid = ")
print(grid)
print('ndim:', grid.ndim, ', shape:', grid.shape)

In [None]:
# convert 1D array to 2D (3x1)
print('reshape a 1D array of length 3 as 3x1:')
x = np.array([1, 2, 3])
print('x ndim:', x.ndim, ', shape:', x.shape)

y = x.reshape(3, 1)
print('y ndim:', y.ndim, ', shape:', y.shape)
print(y)
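
Two related conveniences are sketched below, assuming an illustrative 12-element array: passing -1 lets NumPy infer one dimension from the array's size, and `np.newaxis` inside an index adds a dimension of length 1. The last part shows the error raised when the sizes do not match.

```python
import numpy as np

r = np.arange(12)

# -1 asks NumPy to infer that dimension from the array's size
a = r.reshape(3, -1)
print(a.shape)        # (3, 4)

# np.newaxis inside an index adds a dimension of length 1
row = r[np.newaxis, :]
col = r[:, np.newaxis]
print(row.shape)      # (1, 12)
print(col.shape)      # (12, 1)

# sizes must match: 12 elements cannot fill a 5x5 grid
try:
    r.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```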

# NumPy Arrays vs. Matrices
- You will see references to <ins>matrices</ins> in NumPy operations
    - Note the distinction between an <ins>array</ins> and a <ins>matrix</ins>:
        - **array**
          An array is a general n-dimensional container that can hold numbers. It can have any number of dimensions (1D, 2D, 3D, etc.). Arrays are flexible and can be used for a wide range of numerical operations.
        - **matrix**
          A matrix is specifically a 2-dimensional array. Unlike general arrays, matrices (NumPy's `np.matrix` class) always behave as 2D objects, even when performing operations. They have special rules for operators: `*` means matrix multiplication and `**` means matrix power. The NumPy documentation no longer recommends `np.matrix`; regular arrays with the `@` operator are preferred for matrix multiplication.
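
To make the distinction concrete, the sketch below applies both kinds of operation to ordinary 2D arrays. It deliberately uses regular ndarrays with the `@` operator and `np.linalg.matrix_power` rather than the matrix class; the arrays `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

print(a * b)    # element-wise product: [[0 2] [3 0]]
print(a @ b)    # matrix product:       [[2 1] [4 3]]
print(a ** 2)   # element-wise square:  [[1 4] [9 16]]
print(np.linalg.matrix_power(a, 2))   # matrix power a @ a: [[7 10] [15 22]]
```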

# Array Concatenation
Three functions are available for concatenating arrays:
- np.concatenate  
- np.vstack  
- np.hstack

<ins>np.concatenate</ins> takes a tuple or list of arrays as its first argument

In [None]:
x = np.array([1, 2, 3])
print('x:')
print(x)

y = np.array([3, 2, 1])
print('y:')
print(y)

print('concatenated:')
print(np.concatenate([x, y]))
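
np.concatenate also works on multi-dimensional arrays, where the `axis` argument selects the direction of the join. A minimal sketch with an illustrative 2x3 array:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# axis=0 (the default) appends rows
rows = np.concatenate([grid, grid])
print(rows.shape)   # (4, 3)

# axis=1 appends columns
cols = np.concatenate([grid, grid], axis=1)
print(cols.shape)   # (2, 6)
```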

# Array Concatenation with np.vstack
- <ins>np.vstack</ins> will concatenate vertically
- useful for combining arrays with different numbers of dimensions, e.g. a 1D array and a 2D array with the same number of columns

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
print(np.vstack([x, grid]))

# Array Concatenation with np.hstack
- <ins>np.hstack</ins> will concatenate horizontally

In [None]:
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
y = np.array([[99],
              [99]])
# horizontally stack the arrays
print(np.hstack([grid, y]))

# Array Splitting
- The opposite of concatenation is splitting, implemented by <ins>np.split</ins>, <ins>np.hsplit</ins>, and <ins>np.vsplit</ins>.
- Each of these accepts a list of indices giving the split points.
- N split points lead to N + 1 subarrays

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
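
np.vsplit and np.hsplit apply the same idea to the rows and columns of a 2D array. A short sketch, using an illustrative 4x4 grid and a single split point (which yields two subarrays):

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))
print(grid)

# split the rows at index 2: top two rows, bottom two rows
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

# split the columns at index 2: left two columns, right two columns
left, right = np.hsplit(grid, [2])
print(left)
print(right)
```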

# Universal Functions
- Standard algorithmic approaches to array manipulation can be very slow when data sets get large.
    - Python's dynamic typing can cause bottlenecks in calculations.
    - Compiled code predetermines types and results are computed much more efficiently.
- NumPy provides an interface into statically typed, compiled routines known as <ins>vectorized</ins> operations, implemented through <ins>universal functions</ins> ("ufuncs").
    - ufuncs make repeated calculations on array elements much more efficient

# Universal Functions: Example
- define a function to calculate reciprocals (1 / number) and calculate the time difference between a standard algorithm vs. ufunc

In [None]:
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
print('values:', values)
print('reciprocals:', compute_reciprocals(values))

- Now use Python's timeit module to time the reciprocal function on a large data set

In [None]:
import timeit
big_array = np.random.randint(1, 100, size=1000000)
stmt_arg = 'compute_reciprocals(big_array)'
setup_arg = 'from __main__ import compute_reciprocals, big_array'
t = timeit.Timer(stmt=stmt_arg,setup=setup_arg)

print('time to run 5 reciprocal calculations on 1000000 items:')
print(t.timeit(number=5))

- Finally, modify the compute_reciprocals function to use a vectorized calculation

In [None]:
def compute_reciprocals(values):
    # vectorized: a single array division replaces the explicit loop
    # (no need to pre-allocate an output array)
    return 1.0 / values

# run the timed operation again
print('time to run 5 reciprocal calculations on 1000000 items:')
print(t.timeit(number=5))

# uFuncs and Array Arithmetic
- ufuncs make use of Python’s native arithmetic operators.
    - Standard addition, subtraction, multiplication, and division can all be used

In [None]:
x = np.arange(4) # NumPy range of integers from 0 to 3 (inclusive)
print(x)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)

# uFunc Arithmetic Operators
- Arithmetic operations in NumPy are actually wrappers around specific functions built into NumPy
- For example, the + operator is a wrapper for the add function

In [None]:
print(np.add(x, 2))
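
The same correspondence holds for the other arithmetic operators. The sketch below pairs each operator with its ufunc and checks that both give identical results; the `pairs` list is only an illustrative way to organize the comparison.

```python
import numpy as np

x = np.arange(4)

# (label, result via operator, result via the equivalent ufunc)
pairs = [
    ("x + 2",  x + 2,  np.add(x, 2)),
    ("x - 2",  x - 2,  np.subtract(x, 2)),
    ("-x",     -x,     np.negative(x)),
    ("x * 2",  x * 2,  np.multiply(x, 2)),
    ("x / 2",  x / 2,  np.divide(x, 2)),
    ("x // 2", x // 2, np.floor_divide(x, 2)),
    ("x ** 2", x ** 2, np.power(x, 2)),
    ("x % 2",  x % 2,  np.mod(x, 2)),
]

for label, via_operator, via_function in pairs:
    print(label, "=", via_operator, "| same as ufunc:",
          np.array_equal(via_operator, via_function))
```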

# uFuncs And Arrays
- ufuncs can be used for operations on multiple arrays

In [None]:
print(np.arange(5))
print(np.arange(1, 6))
print(np.arange(5) / np.arange(1, 6))
print(np.arange(1, 6).dtype)
print((np.arange(5) / np.arange(1, 6)).dtype)

# uFuncs And Multi-Dimensional Arrays
- ufuncs can be used for operations on multi-dimensional arrays


In [None]:
x = np.arange(9).reshape((3, 3)) # transform 1D array to 2D
print('x:\n', x)
print('2 ** x:\n', 2 ** x)

## Try It!

Create a Python script that demonstrates the use of universal functions (ufuncs) with two multi-dimensional NumPy arrays.

- import the NumPy library
- create and display two 3x3 NumPy integer arrays (e.g., np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
- perform and display the results of element-wise addition and multiplication using ufuncs

**Sample Output**

```
Array 1:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Array 2:
[[9 8 7]
 [6 5 4]
 [3 2 1]]

Addition (Array 1 + Array 2):
[[10 10 10]
 [10 10 10]
 [10 10 10]]

Multiplication (Array 1 * Array 2):
[[ 9 16 21]
 [24 25 24]
 [21 16  9]]
```

# Specifying Output
- For large calculations, rather than creating a temporary array, you can improve performance by writing computation results directly to a predefined memory location.
    - For all ufuncs, you can do this using the out argument of the function

In [None]:
x = np.arange(5)
y = np.empty(5)
print(x)
np.multiply(x, 10, out=y)
print(y)

# Specifying Output: Views
- The **out** argument can also be used with array views.
    - For example, to write the results of a computation to every other element of a specified array:

In [None]:
y = np.zeros(10)
print(x)
np.power(2, x, out=y[::2])
print(y)

- Using the more generic expression y[::2] = 2 ** x would have created a temporary array to hold the results of 2 ** x, followed by a second operation copying those values into the y array.
    - This is fine for a small computation but for very large arrays the memory and performance savings from using the out argument can be significant.
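
Both forms produce the same values; they differ only in whether a temporary array is created along the way. A minimal sketch that checks this, with illustrative names `y_out` and `y_assign`:

```python
import numpy as np

x = np.arange(5)

# write results straight into every other element of y_out
y_out = np.zeros(10)
np.power(2, x, out=y_out[::2])

# build a temporary array for 2 ** x, then copy it into y_assign
y_assign = np.zeros(10)
y_assign[::2] = 2 ** x

print(y_out)
print(np.array_equal(y_out, y_assign))   # True: same result either way
```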

# Aggregates
- Aggregates (e.g. combinations or summaries) can be computed directly from the NumPy objects.
- Reducing an array means repeatedly applying a given operation to the elements until only a single result remains.
    - For example, calling the reduce method of the add and multiply ufuncs returns the sum/product of all elements in the array:

In [None]:
x = np.arange(1, 6)
print(x)
print(np.add.reduce(x))
print(np.multiply.reduce(x))

# Accumulating Aggregation Operations
We can store the intermediate results of the computation using the ufunc's <ins>accumulate</ins> method:

In [None]:
print(x)
print(np.add.accumulate(x))
print(np.multiply.accumulate(x))
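
For these two ufuncs, NumPy also offers dedicated functions that give the same results: `np.sum` and `np.prod` for the reductions, `np.cumsum` and `np.cumprod` for the accumulations. A quick check:

```python
import numpy as np

x = np.arange(1, 6)   # [1 2 3 4 5]

print(np.add.reduce(x), np.sum(x))             # 15 15
print(np.multiply.reduce(x), np.prod(x))       # 120 120
print(np.add.accumulate(x), np.cumsum(x))      # [1 3 6 10 15] twice
print(np.multiply.accumulate(x), np.cumprod(x))  # [1 2 6 24 120] twice
```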

# Aggregates as Summary Statistics
- A first step in analyzing large data sets is the computation of summary statistics.
    - Some common summary statistics include the mean, standard deviation, sum, product, median, minimum and maximum, quantiles, etc.
- NumPy has fast built-in aggregation functions for calculating summary statistics
    - Python built-ins such as sum, min, and max also work on NumPy arrays, but they process the array one element at a time and are much slower

In [None]:
x = np.random.random(1000000)
print(np.sum(x))  # sum
print(np.mean(x)) # average
print(np.std(x))  # standard deviation
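
The speed difference can be measured with the timeit module. This is a rough sketch, not a benchmark: the array size and repeat count are arbitrary choices, and the timings depend on the machine, so no numbers are shown.

```python
import timeit
import numpy as np

x = np.random.random(100_000)

# time 5 runs of each approach on the same data
builtin_time = timeit.timeit(lambda: sum(x), number=5)
numpy_time = timeit.timeit(lambda: np.sum(x), number=5)

print("built-in sum:", builtin_time, "seconds")
print("np.sum:      ", numpy_time, "seconds")

# both compute the same total (up to floating point rounding)
print(np.isclose(sum(x), np.sum(x)))
```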

# Aggregates: Object Method Alternatives
The NumPy summary functions are also provided through the object itself:

In [None]:
print(x.sum())
print(x.mean())
print(x.std())
print(x.min())
print(x.max())

# Multi-Dimensional Aggregates
By default, each NumPy aggregation function returns the aggregate over the entire array, even for multidimensional arrays:

In [None]:
m = np.random.random((3, 4))
print(m)
print(m.sum())
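
To aggregate along one dimension instead of over the whole array, the aggregation functions accept an `axis` argument. The sketch below uses a small illustrative integer array so the results are easy to verify by hand.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
print(m)

print(m.sum())         # whole array: 66
print(m.sum(axis=0))   # collapse the rows, one sum per column: [12 15 18 21]
print(m.sum(axis=1))   # collapse the columns, one sum per row: [ 6 22 38]
```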

## Try It!

Create a 5x6 array of random floating point values and calculate the aggregated (overall) mean of the values.

- Print the generated array.
- Calculate and display the mean.

**Sample Output**

```
Generated Array:
[[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548  0.64589411]
 [0.43758721 0.891773   0.96366276 0.38344152 0.79172504 0.52889492]
 [0.56804456 0.92559664 0.07103606 0.0871293  0.0202184  0.83261985]
 [0.77815675 0.87001215 0.97861834 0.79915856 0.46147936 0.78052918]
 [0.11827443 0.63992102 0.14335329 0.94466892 0.52184832 0.41466194]]

Aggregated Mean:
0.58140455
```

# Comparisons, Masks, and Boolean Logic
- Boolean <ins>masks</ins> can be used to examine and manipulate values within NumPy arrays.
    - Masking is useful to extract, modify, count, or otherwise manipulate values in an array based on some criterion

In [None]:
x = np.array([1, 2, 3, 4, 5])
print("\t\t\t", x)
print("less than 3?\t\t", x < 3)
print("greater than 3?\t\t", x > 3)
print("not equal to 3?\t\t", x != 3)
print("equal to 3?\t\t", x == 3)
print("2x = square?\t\t", (2 * x) == (x ** 2))
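
A Boolean array like the ones printed above can be used as an index to extract or modify the matching values. A minimal sketch, with illustrative names `mask` and `y`:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# extract: keep only the elements where the mask is True
mask = x < 3
print(x[mask])    # [1 2]

# modify: assign to the elements selected by a mask
y = x.copy()      # work on a copy so x is left untouched
y[y > 3] = 0
print(y)          # [1 2 3 0 0]
```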

# Working with Masks
- Counting entries
- How many values are less than 6?

In [None]:
rng = np.random.RandomState(0) # new random number generator with a seed of 0
x = rng.randint(10, size=(3, 4))
print(x, "\n")
print(np.count_nonzero(x < 6))

- "nonzero" as used here in the method **count_nonzero()** does not refer to numerical values in the original array, but instead refers to the numeric values of 1 (True) and 0 (False) in the logical Boolean array resulting from applying the condition x < 6
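
Because True counts as 1 and False as 0, other aggregates work on masks as well. The sketch below rebuilds the same seeded array and shows `np.sum` as an alternative way to count, along with `np.any` and `np.all`; the name `mask` is illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)   # same seed as above
x = rng.randint(10, size=(3, 4))
mask = x < 6

# summing a Boolean array counts its True entries
print(np.count_nonzero(mask), np.sum(mask))

# count matching values in each row
print(np.sum(mask, axis=1))

# is any value greater than 8? are all values less than 10?
print(np.any(x > 8), np.all(x < 10))
```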

## Try It!

Generate a 20x20 array of random integers with values between 0 and 30 and use a mask to count the values that are greater than 5, less than 15, and even (hint: use the modulus operator)

**Sample Output**

```
Generated Array:
[[ 6 19 28 14 10  7 28 20  6 25 18 22 10 10 23 20  3  7 23  2]
 [21 20  1 23 11 29  5  1 27 20  0 11 25 21 28 11 24 16 26 26]
 [30  9 27 27 15 14 29 29 14 29 18 11 22 19 24  2  4 18  6 20]
 [ 8  6 17  3 24 27 13 17 25  8 25 20  1 19 27 14 27  6 11 28]
 [ 7 14  2 13 16  3 17  7  3  1 29  5 21  9  3 21 28 30 17 25]
 [11  1  9 29  3 13 30 15 14  7 13 22 27 24 29  7 20 15 12 17]
 [14 20 23 25 24 27 27 27 12  8 28 14 12  0 24  6  8 23  0 11]
 [ 7 23 30 10 18 16  7  2  2  0 26  4  9  6 25  8 27  6  8  7]
 [11  1  0 15 22 22 29 23  4  2 11  7 21 26  2  0  2  4 14 13]
 [ 2  0  4 25 22 30 13  6 26  8 14 14 25  9 27 12 18 30  6 16]
 [19 28  3 29  4 22  6 12 14 10 28  3 12  6 26 18 21 27  1  9]
 [12 29 24 20  5 27 27 11 11 19 29 29 10 25 22 27 24  6 29  0]
 [ 0 24 26 29 24 19 12  8  2  6  5  7 26  8 29  4  0 18  9 11]
 [23 14 26 21 23  8 19 16 29 16 25 19 11 29  6  1  2 16  4 16]
 [23 16 26 16  1  1 27 21 22  4  0  0 18 29  1 20 11 25  5 22]
 [ 3 22 10 23 26 16 30  5 23  4 19  1  5 21 10 30 15 15  0  8]
 [27 26  5 15 28  2 19 27 26  3 18 25  2 30 18 19  6 19  8  0]
 [ 7  6 17  7  0 10 27 24 24 17 22 30 29  9  2  6 27 15 25 15]
 [24 19 27 16  1  0 15 29 11  4  4 26 22  8  8  2 18 15 15  2]
 [19 23 21 23  0 23 19 10 16  7  3  5  7 19 29  2 15 29 24  2]]

Number of target values:
63
```
