# Numpy

**Prepared By:**
- Ashish Sharma
- Email: accssharma@gmail.com
- AI Developers, Boise
- AI Saturdays - Week 2

## References:
- [1] https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

## What is Numpy?

- A fundamental, high-performance Python library for scientific computing (written in C and Python).
- Provides support for large, multidimensional array objects.
- Provides fast operations on those arrays,
    - including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and much more.

#### There are several important differences between NumPy arrays and the standard Python sequences (e.g., list):

- NumPy arrays have a fixed size at creation; Python lists can grow dynamically.
    - Changing the size of an ndarray will create a new array and delete the original.

- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. 
    - The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.

- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data.
    - typically, such operations are executed **more efficiently and with less code** than is possible using Python’s built-in sequences.
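
The first two differences can be checked directly. The following is a minimal sketch; the variable names (`original`, `grown`, `mixed`, `objects`) are illustrative and not part of the original notes.

```python
import numpy as np

# Fixed size: np.append does not grow the array in place; it returns a new one.
original = np.array([1, 2, 3])
grown = np.append(original, 4)
print(original.shape, grown.shape)   # (3,) (4,)

# Homogeneous elements: mixing ints and floats upcasts everything to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                   # float64

# The exception: an object array stores references to arbitrary Python objects.
objects = np.array([1, "two", 3.0], dtype=object)
print(objects.dtype)                 # object
```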
    
`NumPy gets much of this efficiency from optimized low-level libraries. One such example is ATLAS: "Automatically Tuned Linear Algebra Software is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and Fortran77."`
    
`Question:` What's the real difference between a NumPy array and a Python list?

`Answer:` **Performance**

- NumPy data structures take up less space.
- Computations are faster.
- NumPy leverages optimized low-level functions for operations like linear algebra.
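
The space claim can be illustrated with a rough comparison. This is only a sketch: the list estimate adds up `sys.getsizeof` for the list and for each element, which is an approximation on CPython, and the names `py_list`, `np_array`, `list_bytes`, and `array_bytes` are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(item) for item in py_list)

# An int64 array stores the raw values contiguously: 8 bytes per element.
array_bytes = np_array.nbytes

print(array_bytes)               # 8000
print(list_bytes > array_bytes)  # True on CPython
```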
    
`A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays - including Tensorflow, Pandas, PyTorch, etc.; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.`


**Just so you know:** Tensors (as used in TensorFlow, PyTorch, etc.) are similar to NumPy's ndarrays, with the addition being that **Tensors can also be used on a GPU to accelerate computing.**

At the core of the NumPy package is the **ndarray** object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations **being performed in compiled code for performance**.

## ndarray object
- an array object that represents arrays in numpy which are:
    - High Performance MULTI-DIMENSIONAL array object
    - HOMOGENEOUS
    - FIXED-SIZE items
    - indexed by a tuple of nonnegative integers
    - the number of dimensions is the rank of the array
    - the shape of the array is a tuple of integers giving the size of the array along each dimension
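
These properties map onto attributes of the array object. A small sketch follows; the array `grid` is an illustrative name, not something defined elsewhere in these notes.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(grid.ndim)      # 2: number of dimensions (the rank)
print(grid.shape)     # (2, 3): size along each dimension
print(grid.dtype)     # int64: one data type for every element
print(grid.itemsize)  # 8: bytes per element, identical for all items
print(grid[1, 2])     # 6: indexed by a tuple of nonnegative integers
```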


List:
- Dynamically RESIZABLE
- NOT necessarily HOMOGENEOUS
- Inefficient computation on large, multidimensional data (for loops, inner-loop overheads, etc.)


## Tutorial
- [Using the contents of the Stanford cs231n tutorial](http://cs231n.github.io/python-numpy-tutorial/#numpy)

Installation:

`source activate your_env`

`conda install numpy`

In [None]:
import numpy as np


## Numpy provides several functions to create arrays

(Recommended ways:)

- **Conversion from other Python structures (e.g., lists, tuples)**
- **Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)**
- Reading arrays from disk, either from standard or custom formats
- Creating arrays from raw bytes through the use of strings or buffers
- **Use of special library functions (e.g., random)**

[Detail here...](https://docs.scipy.org/doc/numpy/user/basics.creation.html#arrays-creation)
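
A few of the routes in the list above that the cells below do not cover, as a hedged sketch. The names (`counts`, `points`, `raw`, `restored`) are illustrative, and an in-memory `io.BytesIO` buffer stands in for a real file on disk.

```python
import io
import numpy as np

# Intrinsic creation functions
counts = np.arange(5)               # array([0, 1, 2, 3, 4])
points = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values from 0 to 1 inclusive

# From raw bytes through a buffer
raw = np.frombuffer(b"\x01\x02\x03", dtype=np.uint8)   # array([1, 2, 3], dtype=uint8)

# Round trip through NumPy's .npy format; BytesIO stands in for a file on disk
buffer = io.BytesIO()
np.save(buffer, counts)
buffer.seek(0)
restored = np.load(buffer)
print(np.array_equal(restored, counts))   # True
```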


In [None]:
aa = [1,2,3]

In [None]:
# Create numpy array from Python sequence like types (list, tuple, etc.)
python_list = [1,2,3]

# Create a rank 1 numpy array
a = np.array(python_list)
a

In [None]:
type(a)

In [None]:
a.shape

In [None]:
b = a.reshape((1,3))
b

In [None]:
b.shape

In [None]:
a[0], a[1], a[2]

In [None]:
a[0]= 5

In [None]:
a

In [None]:
python_2d_list = [[1,2,3],[4,5,6]]
rand2_np_array = np.array(python_2d_list)
rand2_np_array

In [None]:
rand2_np_array[0,1], rand2_np_array[1,1]

In [None]:
# Raises IndexError: row index 2 is out of bounds, since the array has only 2 rows
rand2_np_array[2,1]

In [None]:
a = np.zeros((2,2), dtype=np.int64)   # Create an array of all zeros
a

In [None]:
a.shape

In [None]:
a.size

In [None]:
b_ones = np.ones((1,2))    # Create an array of all ones
b_ones

In [None]:
b_ones.shape

In [None]:
c_constant = np.full((2,2), 7)  # Create a constant array
c_constant

In [None]:
# create identity matrix
d_identity = np.eye(2)
d_identity

In [None]:
# create an array filled with random values
e_random = np.random.random((2,2))
e_random

## Array indexing

Numpy offers several ways to index into arrays.

**Slicing:** 

- Similar to Python lists, numpy arrays can be sliced. 
- Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [None]:
# Create the following rank 2 array with shape (3, 4)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

In [None]:
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]

# indices start at 0; the start of a slice is inclusive, the end is exclusive

b = a[0:2, 1:3] 
b

In [None]:
# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
a[0,1]

In [None]:
b[0,0] = 77
b

In [None]:
a[0,1]
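
If the original array must stay unchanged, one option is to copy the slice explicitly. A minimal sketch, using the illustrative names `base`, `view`, and `independent`:

```python
import numpy as np

base = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = base[0:2, 1:3]                 # slicing returns a view
independent = base[0:2, 1:3].copy()   # .copy() allocates separate storage

print(np.shares_memory(base, view))          # True
print(np.shares_memory(base, independent))   # False

independent[0, 0] = 77
print(base[0, 1])   # 2: the original is untouched
```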

**Integer array indexing**

`When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array.`

In contrast,

`integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:`

In [None]:
# Plain Python lists do not support integer array indexing:
# the second line raises a TypeError
x = [1,2,3,4]
x[[1,2]]

In [None]:
a = np.array([[1,2], [3, 4], [5, 6]])
a

In [None]:
# An example of integer array indexing.
# The returned array will have shape (3,) and contain [1 4 5]
a[[0,1,2], [0,1,0]]

In [None]:
# above array indexing is equivalent to the following individual accesses
a[0,0], a[1,1], a[2,0]

In [None]:
# When using integer array indexing, you can reuse the same
# element from the source array:
a[[0, 0], [1, 1]]  # Prints "[2 2]"

In [None]:
# Equivalent to the previous integer array indexing example
np.array([a[0, 1], a[0, 1]])  # Prints "[2 2]"
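
Unlike slicing, integer array indexing returns a copy rather than a view. The sketch below checks this; the names `source` and `picked` are illustrative.

```python
import numpy as np

source = np.array([[1, 2], [3, 4], [5, 6]])

picked = source[[0, 1, 2], [0, 1, 0]]   # array([1, 4, 5])
picked[0] = 100                          # modifies only the copy

print(source[0, 0])                      # 1
print(np.shares_memory(source, picked))  # False
```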

**One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:**

In [None]:
# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
a

In [None]:
# create an array of column indices,
# one for each of the 4 rows of the array in the cell above
b = np.array([0,2,0,1])
b

In [None]:
# select all the row indices (0,1,2,3)
c = np.arange(4)
c

In [None]:
# using c as the row indices and b as the column index for each row,
# pick one element from each row
a[c, b]

In [None]:
# mutate one element from each row using the indices in b
a[c, b] += 1

In [None]:
a

**Boolean array indexing:**

Boolean array indexing lets you pick out arbitrary elements of an array. 

Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [None]:
a = np.array([[1,2], [3, 4], [5, 6]])
a

In [None]:
# build a boolean mask of the same shape, True where the condition holds
bool_idx = (a > 2)
bool_idx

In [None]:
a[bool_idx]

In [None]:
a[a>2]
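
Masks can also be combined and used for assignment. A short sketch, with `values` and `in_range` as illustrative names:

```python
import numpy as np

values = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); each comparison needs parentheses.
in_range = (values > 2) & (values < 6)
print(values[in_range])    # [3 4 5]

# A boolean mask also works on the left-hand side of an assignment.
values[values % 2 == 0] = 0
print(values)
# [[1 0]
#  [3 0]
#  [5 0]]
```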

## Array Math

In [None]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])


In [None]:
x

In [None]:
y

In [None]:
v

In [None]:
w

In [None]:
x.shape

In [None]:
v.shape

In [None]:
# element wise addition
x+y

In [None]:
# element wise multiplication
x*y

In [None]:
# element wise division
x/y

In [None]:
# element wise square root
np.sqrt(x)

In [None]:
# Inner product
# matrix and vector
x.dot(v)

In [None]:
# matrix and matrix
x.dot(y)

In [None]:
# equivalent to previous operation
np.dot(x, y)
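
The `@` operator is another way to write the same products. The sketch below repeats the values used above under new, illustrative names (`m1`, `m2`, `vec1`, `vec2`) so that it runs on its own.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
vec1 = np.array([9, 10])
vec2 = np.array([11, 12])

print(vec1.dot(vec2))   # 219: inner product of two vectors
print(m1 @ vec1)        # [29 67]: matrix-vector product
print(m1 @ m2)          # matrix-matrix product, same as m1.dot(m2)
# [[19 22]
#  [43 50]]
print(m1.T)             # transpose
# [[1 3]
#  [2 4]]
```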

## Array Computations


In [None]:
x = np.array([[1,2],[3,4]])
x

In [None]:
np.sum(x)  # Compute sum of all elements; prints "10"

In [None]:
np.sum(x, axis=0)  # Compute sum of each column; prints "[4 6]"

In [None]:
np.sum(x, axis=1)  # Compute sum of each row; prints "[3 7]"
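
The `axis` argument works the same way for other reductions. A brief sketch, with `table` and `row_sums` as illustrative names:

```python
import numpy as np

table = np.array([[1, 2], [3, 4]])

print(np.mean(table, axis=0))   # [2. 3.]: mean of each column
print(np.max(table, axis=1))    # [2 4]: maximum of each row

# keepdims=True keeps the reduced axis with length 1
row_sums = np.sum(table, axis=1, keepdims=True)
print(row_sums.shape)           # (2, 1)
```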

## Broadcasting

(Incomplete; continue from the cs231n course.)

In [None]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

# Now y is the following
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print(y)
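
The explicit loop can be replaced by broadcasting, which lets NumPy combine arrays of different but compatible shapes. The sketch below repeats the same data under the illustrative names `x_b`, `v_b`, and `y_b`.

```python
import numpy as np

x_b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v_b = np.array([1, 0, 1])

# Shapes (4, 3) and (3,) are compatible: v_b is treated as if it were
# repeated along the first axis, without actually copying it.
y_b = x_b + v_b
print(y_b)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
```

For large arrays this form is usually preferable to a Python loop, since the repeated addition runs in compiled code.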