# Introduction to NumPy
-  __NumPy__ (Numerical Python) is pronounced NUM-py or sometimes NUM-pee
-  NumPy provides __arrays__ similar to Python __lists__, but with more efficient storage and better performance, particularly as the data grows larger.
-  NumPy arrays are the core data structure of many data science tools in Python

## Installing and Using NumPy
### Installing NumPy on Windows 10 using pip
- For an "all-user" configuration, run Command Prompt (cmd.exe) as an administrator
- Navigate to the install folder, e.g. assuming Python is installed in C:\Program Files\Python

        C:
        cd "\Program Files\Python"

- Update pip and setuptools first, then install NumPy

        C:\Program Files\Python>python -m pip install --upgrade pip

        C:\Program Files\Python>pip install --upgrade setuptools

        C:\Program Files\Python>pip install numpy

In [2]:
# Importing numpy in a program
import numpy as np      # alias to np
print("NumPy version is", np.__version__)   # version check

NumPy version is 1.22.3


## Understanding Data Types in Python
- "Primitive" data types in Python (integers, floating point values, Booleans) are objects
    - Each object consists of an internal C-language structure which includes a reference count, a type encoding, a size attribute, and the actual value
    - These attributes make Python a dynamically typed (flexible) language, but also reduce performance
- In a Python list, each element is a self-contained object
    - When lists contain elements of the same data type, some internal information is redundant
    - A list stores references to its elements, so the element objects themselves can be scattered throughout memory rather than stored next to each other
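
To make the per-object overhead concrete, here is a minimal sketch that compares the size of one Python integer object with the size of one NumPy array element. The variable names are illustrative, and the value reported by `sys.getsizeof` depends on the Python build, so treat the printed numbers as indicative only.

```python
import sys
import numpy as np

# A Python list holds references to separate int objects.
values = list(range(1000))

# Size in bytes of one Python int object (value plus object header).
int_object_bytes = sys.getsizeof(values[-1])

# A NumPy array stores raw 8-byte integers and keeps the type
# information once, for the whole array.
arr = np.arange(1000, dtype=np.int64)

print("bytes per Python int object:", int_object_bytes)
print("bytes per NumPy int64 element:", arr.itemsize)
print("total bytes of array data:", arr.nbytes)
```

The list additionally stores one reference per element, which is not counted in the per-object figure above.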

## NumPy N-Dimensional Arrays
- A NumPy n-dimensional array (ndarray) is a fixed-size, multidimensional container of items of the same type and size.
    - The number of dimensions and items in an ndarray is defined by its shape, which is a tuple of N non-negative integers that specify the size of each dimension.
- NumPy arrays store data as a contiguous block for efficiency ("dense" arrays).
    - Python's built-in array module provides a similar dense, single-type structure, but the NumPy ndarray provides additional operations.
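
The following sketch inspects the layout of a small array to illustrate the points above: a shape tuple, a single element type, and one contiguous buffer. The array name `grid` is illustrative; `strides` reports how many bytes to step in memory to move one position along each dimension.

```python
import numpy as np

# 12 eight-byte integers arranged as 3 rows of 4 columns.
grid = np.arange(12, dtype=np.int64).reshape(3, 4)

print(grid.shape)                  # (3, 4)
print(grid.dtype)                  # int64
print(grid.flags["C_CONTIGUOUS"])  # True: rows are stored one after another
print(grid.strides)                # (32, 8): 32 bytes per row, 8 per element
```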

## NumPy Arrays vs. Python Lists

<img src="images/SWC22-NumPy.python-list.numpy-array.png" align = "left" width=300 height=300 />

<img src="images/SWC22-NumPy.python-list.png" align = "left" width=400 height=400 />

# Creating NumPy Arrays from Lists
- ndarrays can be created from Python lists

In [3]:
# integer array:
npa = np.array([1, 4, 2, 5, 3])
print(npa)

# upcast integers to float
npa = np.array([3.14, 4, 2, 3])
print(npa)

# specify array element type
npa = np.array([1, 2, 3, 4], dtype='float32')
print(npa)

# multidimensional array using list of lists
npa = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(npa)

[1 4 2 5 3]
[3.14 4.   2.   3.  ]
[1. 2. 3. 4.]
[[2 3 4]
 [4 5 6]
 [6 7 8]]
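
As a small supplement to the cell above, this sketch shows how the element type is chosen when an array is built from a list, and what happens when floats are converted to integers. The variable names are illustrative; the default integer width varies by platform and NumPy version, so the sketch only checks the kind of type ('i' for signed integer).

```python
import numpy as np

ints = np.array([1, 4, 2])
mixed = np.array([3.14, 4, 2])   # one float upcasts every element

print(ints.dtype.kind)           # i (signed integer)
print(mixed.dtype)               # float64

# The common type NumPy picks for an integer/float mix.
print(np.result_type(np.int64, np.float64))   # float64

# Converting floats to integers truncates toward zero.
truncated = np.array([1.9, 2.5, -3.7]).astype(np.int64)
print(truncated)                 # [ 1  2 -3]
```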


# Creating NumPy Arrays from Scratch
- NumPy provides optimized functions to create ndarrays

In [4]:
# create a 10-integer array filled with zeros
npa = np.zeros(10, dtype=int)
print(npa)

# create a 3x5 floating point array filled with ones
npa = np.ones((3,5), dtype=float)
print(npa)

# create a 3x5 array filled with 3.14
npa = np.full((3,5), 3.14)
print(npa)

# create an array filled with a linear sequence
# starting at 0, stepping by 2, and stopping before 20
# (the stop value is excluded, as in the built-in range() function)
npa = np.arange(0, 20, 2)
print(npa)

[0 0 0 0 0 0 0 0 0 0]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[3.14 3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14 3.14]]
[ 0  2  4  6  8 10 12 14 16 18]
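
A few related creation functions are sketched below as a supplement; they are not part of the original cell. `np.linspace`, `np.eye`, and `np.zeros_like` are standard NumPy functions, and the variable names are illustrative.

```python
import numpy as np

# arange excludes the stop value, so the last element is 18, not 20.
steps = np.arange(0, 20, 2)
print(steps[-1])                 # 18

# linspace includes both endpoints and takes a count instead of a step.
evenly = np.linspace(0, 1, 5)
print(evenly)                    # [0.   0.25 0.5  0.75 1.  ]

# 3x3 identity matrix (floating point by default).
identity = np.eye(3)
print(identity)

# Zero-filled integer array with the same shape as an existing array.
template = np.zeros_like(identity, dtype=int)
print(template.shape)            # (3, 3)
```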


In [5]:
# create a 3x3 array of uniformly distributed
# random values between 0 and 1
npa = np.random.random((3,3))
print(npa)

# create a 3x3 array of normally distributed random
# values with mean 0 and standard deviation 1
npa = np.random.normal(0,1,(3,3))
print(npa)

[[0.5537605  0.58109005 0.2298685 ]
 [0.97531526 0.10335587 0.10618689]
 [0.40293844 0.8536779  0.74357141]]
[[-0.37708185 -2.46004435  2.25021156]
 [ 0.86053277 -1.87805351  1.38797626]
 [-1.60617902  0.21097215  1.31536323]]
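
The cell above uses the legacy `np.random` functions. As an alternative sketch, newer NumPy releases (1.17 and later) also offer a `Generator` object created with `np.random.default_rng`; the seed value 42 and the variable names here are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator produces a reproducible stream of values.
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))                          # values in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=(3, 3))  # mean 0, std dev 1
ints = rng.integers(low=1, high=10, size=5)           # 1..9, high is excluded

# A second generator with the same seed repeats the same values.
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform, rng_again.random((3, 3)))
print(same)   # True
```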


# NumPy Standard Data Types
<img src="images/SWC22-NumPy.standard-data-types.png" align = "left" width=400 height=400 />
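
The practical effect of choosing a data type can be sketched as follows: each type has a fixed width, which determines both its value range and its memory use. The variable names are illustrative. Note that integer arithmetic on arrays wraps around silently when a result does not fit.

```python
import numpy as np

# Value range of an 8-bit signed integer.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127

# Memory use scales with the element width.
print(np.zeros(1000, dtype=np.float32).nbytes)        # 4000
print(np.zeros(1000, dtype=np.float64).nbytes)        # 8000

# Results that exceed the range wrap around instead of raising an error.
small = np.array([100, 120, 127], dtype=np.int8)
wrapped = small + np.int8(10)
print(wrapped)                                        # [ 110 -126 -119]
```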

# NumPy Array Operations
- NumPy array operations are critical to many Data Science packages
- Array operations include
    - accessing attributes: determining size, shape, memory usage, and data types
    - indexing: getting/setting individual array element values
    - slicing: getting/setting subarrays
    - reshaping: changing an array's shape
    - joining/splitting: combining multiple arrays into one, splitting one array into multiple
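
Joining and splitting can be sketched with a few standard NumPy functions (`np.concatenate`, `np.vstack`, `np.split`); the array names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Join two 1D arrays end to end.
joined = np.concatenate([a, b])
print(joined)            # [1 2 3 4 5 6]

# Stack the same arrays as rows of a 2D array.
stacked = np.vstack([a, b])
print(stacked.shape)     # (2, 3)

# Split at indices 2 and 4, giving three pieces.
left, middle, right = np.split(joined, [2, 4])
print(left, middle, right)   # [1 2] [3 4] [5 6]
```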

# NumPy Array Attributes

- Useful attributes of NumPy arrays:
    - ndim: number of dimensions
    - shape: size of each dimension
    - size: total size of the array (number of elements)
    - dtype: data type
    - itemsize: size in bytes of one array element
    - nbytes: total size of array in bytes (itemsize * size)


# Displaying NumPy Array Attributes
- Let's use a simple NumPy function to demonstrate how to display an array's attributes:
    - The **random.randint** function creates a NumPy array of random integers drawn from a discrete uniform distribution (every value from low up to, but not including, high is equally likely)

In [6]:
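# dtype='l' is the C long type, whose width is platform dependent
# (32-bit on Windows, 64-bit on most Linux and macOS builds)
dist = np.random.randint(low=1, high=10, size=5, dtype='l')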
print(dist)
print(dist.ndim)
print(dist.shape)
print(dist.size)
print(dist.dtype)
print(dist.itemsize)
print(dist.nbytes)

[7 2 3 2 7]
1
(5,)
5
int32
4
20


# Multidimensional NumPy Array Attributes
- The following examples demonstrate the array attributes of one-, two-, and three-dimensional NumPy arrays

In [7]:
np.random.seed(0) # seed random generator
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3,4))
x3 = np.random.randint(10, size=(3,4,5))

print("x1: ")
print(x1)

print("x1 ndim: ", x1.ndim)
print("x1 shape: ", x1.shape)
print("x1 size: ", x1.size)

print("x2: ")
print(x2)

print("x2 ndim: ", x2.ndim)
print("x2 shape: ", x2.shape)
print("x2 size: ", x2.size)

print("x3: ")
print(x3)

print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)

x1: 
[5 0 3 3 7 9]
x1 ndim:  1
x1 shape:  (6,)
x1 size:  6
x2: 
[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
x2 ndim:  2
x2 shape:  (3, 4)
x2 size:  12
x3: 
[[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]
x3 ndim:  3
x3 shape:  (3, 4, 5)
x3 size:  60


# Array Indexing: Single Element, Single Dimension
- NumPy array indexing is similar to Python list indexing, both for accessing and setting single elements

In [8]:
x1 = np.random.randint(10, size=6)
print("x1 = ", x1)
print("x1[0] = ", x1[0])
print("x1[4] = ", x1[4])
print("x1[-1] = ", x1[-1])
print("x1[-2] = ", x1[-2])
x1[1] = 3.14159 # truncated to 3 because x1 has an integer dtype
print("x1[1] = ", x1[1])
print("x1 = ", x1)

x1 =  [4 3 4 4 8 4]
x1[0] =  4
x1[4] =  8
x1[-1] =  4
x1[-2] =  8
x1[1] =  3
x1 =  [4 3 4 4 8 4]
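
The truncation seen above can be isolated in a short sketch. Unlike a Python list, an integer array keeps its element type on assignment, so a float is cut toward zero rather than stored as is. The array `x` is an illustrative stand-in for `x1`.

```python
import numpy as np

x = np.array([4, 3, 4, 4, 8, 4])

x[1] = 3.99    # stored as 3 (truncated, not rounded)
x[2] = -2.7    # stored as -2 (truncation is toward zero)
print(x)       # [ 4  3 -2  4  8  4]

# As with lists, an index past the end raises IndexError.
try:
    x[6]
except IndexError as err:
    print("IndexError:", err)
```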


# Array Indexing: Single Element, Multi-Dimension

In [9]:
x2 = np.random.randint(10, size=(3,4))
print("x2 = \n", x2)
print()
print("x2[0,0] = ", x2[0,0])
print("x2[2,0] = ", x2[2,0])
print("x2[2,-1] = ", x2[2,-1])


x2 = 
 [[3 7 5 5]
 [0 1 5 9]
 [3 0 5 0]]

x2[0,0] =  3
x2[2,0] =  3
x2[2,-1] =  0


# Array Slicing: One Dimension
- NumPy slicing syntax follows that of the standard Python list
- To access a slice of an array x, use x[start:stop:step]
    - If any of these are unspecified, they default to start=0, stop=(size of dimension), step=1; when step is negative, the defaults for start and stop are swapped

In [10]:
# slicing
print("x1 = ", x1)
print("x1[:3] = ", x1[:3])    # up to index 3 (excl)
print("x1[3:] = ", x1[3:])    # start at index 3
print("x1[2:5] = ", x1[2:5])  # indices 2 through 4
print("x1[::2] = ", x1[::2])  # every other element
print("x1[::-1] = ", x1[::-1]) # reverse all

x1 =  [4 3 4 4 8 4]
x1[:3] =  [4 3 4]
x1[3:] =  [4 8 4]
x1[2:5] =  [4 4 8]
x1[::2] =  [4 4 8]
x1[::-1] =  [4 8 4 4 3 4]
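
Slices with a negative step are easy to misread, so here is a small sketch on an array whose values equal their indices. The array `x` is illustrative.

```python
import numpy as np

x = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

print(x[::-1])      # [9 8 7 6 5 4 3 2 1 0]  whole array reversed
print(x[5::-2])     # [5 3 1]  start at index 5, step backward by 2
print(x[:6:-1])     # [9 8 7]  from the end back to, but excluding, index 6

# The same selection can be written with an explicit slice object.
print(x[slice(2, 8, 3)])   # [2 5]
```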


# Array Slicing: Multi-Dimension
- Multiple-dimension slices are separated by commas

In [11]:
# multi-slicing
print("x2 = \n", x2)
print("x2[:2,:3] = \n", x2[:2, :3]) # first 2 rows, first 3 columns
print("x2[:3,::2] = \n", x2[:3, ::2]) # all rows, every other column
print("x2[::-1,::-1] = \n", x2[::-1, ::-1]) # reverse rows and columns

x2 = 
 [[3 7 5 5]
 [0 1 5 9]
 [3 0 5 0]]
x2[:2,:3] = 
 [[3 7 5]
 [0 1 5]]
x2[:3,::2] = 
 [[3 5]
 [0 5]
 [3 5]]
x2[::-1,::-1] = 
 [[0 5 0 3]
 [9 5 1 0]
 [5 5 7 3]]


# Accessing Array Rows and Columns
- Combine indexing and slicing to access an entire row or column

In [12]:
print("x2 =\n", x2)
print("x2[:,0]", x2[:, 0])  # first column
print("x2[0,:]", x2[0, :])  # first row
print("x2[0]", x2[0])     # first row (can omit : for row)

x2 =
 [[3 7 5 5]
 [0 1 5 9]
 [3 0 5 0]]
x2[:,0] [3 0 3]
x2[0,:] [3 7 5 5]
x2[0] [3 7 5 5]
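
One detail worth sketching: selecting a column with an integer index drops that dimension, while selecting it with a one-element slice keeps it. The array below reuses the values printed for `x2` above; the other names are illustrative.

```python
import numpy as np

x2 = np.array([[3, 7, 5, 5],
               [0, 1, 5, 9],
               [3, 0, 5, 0]])

col_1d = x2[:, 0]     # integer index: result is 1D
col_2d = x2[:, 0:1]   # slice: result stays 2D
last_row = x2[-1]     # negative indices count from the end

print(col_1d.shape)   # (3,)
print(col_2d.shape)   # (3, 1)
print(last_row)       # [3 0 5 0]
```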


# NumPy Arrays as Views
- In contrast to lists, which slice as *copies*, NumPy arrays slice as *views*
    - Modifying a NumPy array slice modifies the original data in place rather than a copy of the data; avoiding the copy improves performance
    - Use array.copy() as necessary, e.g. to back up original data

In [13]:
print('x2 = ')
print(x2)
x2bak = x2.copy()    # back up x2
x2_sub = x2[:2, :2]  # slice x2 into x2_sub
print('x2_sub (sliced as x2[:2, :2]) = ')
print(x2_sub)
x2_sub[0, 0]= 99     # modify x2_sub
print('x2_sub modified first element:')
print(x2_sub)
print('x2 was also modified:')
print(x2)
x2 = x2bak.copy()    # restore original x2

x2 = 
[[3 7 5 5]
 [0 1 5 9]
 [3 0 5 0]]
x2_sub (sliced as x2[:2, :2]) = 
[[3 7]
 [0 1]]
x2_sub modified first element:
[[99  7]
 [ 0  1]]
x2 was also modified:
[[99  7  5  5]
 [ 0  1  5  9]
 [ 3  0  5  0]]
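
The difference between list copies and array views can also be checked directly. In this sketch, `np.shares_memory` and the `.base` attribute are standard NumPy features; the variable names are illustrative.

```python
import numpy as np

# Slicing a list makes a copy: the original is unaffected.
py_list = [3, 7, 5, 5]
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)        # [3, 7, 5, 5]

# Slicing an array makes a view onto the same memory.
arr = np.array([3, 7, 5, 5])
view = arr[:2]
independent = arr[:2].copy()

print(np.shares_memory(arr, view))          # True
print(np.shares_memory(arr, independent))   # False
print(view.base is arr)                     # True

view[0] = 99
print(arr)            # [99  7  5  5]
print(independent)    # [3 7]
```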


# Reshaping Arrays
- The reshape() method returns an array with a new shape without changing its data.
    - The total number of elements in the initial array must match that of the reshaped array

In [16]:
# reshape 1D to 2D (3x3)
r = np.arange(1, 10)
print("r = ", r)
grid = r.reshape((3, 3))
print("grid = ")
print(grid)
print('ndim:', grid.ndim, ', shape:', grid.shape)

r =  [1 2 3 4 5 6 7 8 9]
grid = 
[[1 2 3]
 [4 5 6]
 [7 8 9]]
ndim: 2 , shape: (3, 3)


In [20]:
# convert 1D array to 2D (3x1)
print('reshape a 1D array of 3 elements as 3x1:')
x = np.array([1, 2, 3])
print('x ndim:', x.ndim, ', shape:', x.shape)

y = x.reshape(3, 1)
print('y ndim:', y.ndim, ', shape:', y.shape)
print(y)

reshape a 1D array of 3 elements as 3x1:
x ndim: 1 , shape: (3,)
y ndim: 2 , shape: (3, 1)
[[1]
 [2]
 [3]]
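
A few further reshape behaviors are sketched below as a supplement: letting NumPy infer one dimension with -1, adding an axis with `np.newaxis`, and the error raised when the element counts do not match. The variable names are illustrative.

```python
import numpy as np

r = np.arange(1, 10)          # 9 elements

# -1 asks NumPy to work out that dimension from the total size.
grid = r.reshape(3, -1)
print(grid.shape)             # (3, 3)

# np.newaxis inserts a dimension of length 1.
column = r[:, np.newaxis]
print(column.shape)           # (9, 1)

# For a contiguous array like this one, reshape returns a view.
print(np.shares_memory(r, grid))   # True

# 9 elements cannot fill a 2x5 array.
try:
    r.reshape(2, 5)
except ValueError as err:
    print("ValueError:", err)
```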


# NumPy Arrays vs. Matrices
- You will see references to matrices in NumPy operations
    - Note the distinction between an array and a matrix:
        - **array**
          A homogeneous n-dimensional container (ndarray) of elements that all share one data type
        - **matrix**
          A specialized 2-dimensional subclass of ndarray (np.matrix) that preserves its two-dimensional nature throughout operations. It has certain special operations, such as * (matrix multiplication) and ** (matrix power). The NumPy documentation no longer recommends this class; regular ndarrays with the @ operator are preferred
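
With a regular ndarray, `*` is elementwise and matrix multiplication is written with `@`. The sketch below shows the difference on a small 2x2 array; the names are illustrative, and `np.linalg.matrix_power` is the standard NumPy function for matrix powers of ndarrays.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

elementwise = A * A                       # multiplies matching elements
product = A @ A                           # matrix multiplication
squared = np.linalg.matrix_power(A, 2)    # matrix power, same as A @ A

print(elementwise)   # [[ 1  4]
                     #  [ 9 16]]
print(product)       # [[ 7 10]
                     #  [15 22]]
```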