## **NumPy Recap**
#### NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis.

Some of the things it provides:

- `ndarray`, a fast and space-efficient multidimensional array for large data
- Vectorized arithmetic operations and sophisticated broadcasting capabilities
- Standard mathematical functions for fast operations on entire arrays of data without having to write loops
- Tools for reading/writing array data to disk and working with memory-mapped files
- Linear algebra, random number generation, and Fourier transform capabilities


## Importing package

In [None]:
import numpy as np

### The NumPy ndarray: A Multidimensional Object

A NumPy array is a grid of values, all of the `same type`, indexed by a tuple of nonnegative integers. Every array has
- a shape: a tuple giving the size of each dimension, and
- a dtype: the data type of its elements.

`np.array()` tries to infer a suitable dtype if one is not given explicitly.

Each array has the attributes
- ``ndim``: the number of dimensions
- ``shape``: the size of each dimension
- ``size``: the total number of elements
- ``dtype``: the data type of each element
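As a quick illustration of these attributes and of dtype inference, here is a minimal sketch. The variable names are illustrative, and because the default integer dtype is platform-dependent (often `int64`, but `int32` on some Windows builds), the first print shows only the dtype *kind*.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype is inferred
floats = np.array([1, 2.5, 3])    # a single float forces a floating-point dtype
zeros3d = np.zeros((2, 3, 4))     # explicit shape: 2 x 3 x 4

print(ints.dtype.kind, floats.dtype)                            # i float64
print(zeros3d.ndim, zeros3d.shape, zeros3d.size, zeros3d.dtype)  # 3 (2, 3, 4) 24 float64
```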

In [None]:
x = np.array([2,4,8,16])

In [None]:
x = np.append(x,10)
print(x)
x

In [None]:
a4 = np.append(x,[11,12])


In [None]:
a4 = np.delete(a4,2) # 2 is an index, not a value: removes the element at position 2

In [None]:
a4.reshape(1,6) # returns a new reshaped array; a4 itself keeps its shape

In [None]:
a4

In [None]:
a4 = np.sort(a4)

In [None]:
print(a4)

In [None]:
a4.reshape(3,2) # the new shape must hold the same number of elements (6)

In [None]:
a4.resize(2,3) # unlike reshape, resize changes a4 in place and returns None

In [None]:
a4
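The cells above mix `reshape` and `resize`, which behave differently. A minimal sketch, using a hypothetical array `v` holding the same values as the sorted `a4`: `reshape` returns a new array object and leaves the original's shape alone, `ndarray.resize` changes the array in place, and the function `np.resize` can even change the total size by repeating the data.

```python
import numpy as np

v = np.array([2, 4, 10, 11, 12, 16])

r = v.reshape(2, 3)       # new array object; v keeps shape (6,)
print(v.shape, r.shape)   # (6,) (2, 3)

w = v.copy()
w.resize(3, 2)            # ndarray.resize modifies w in place and returns None
print(w.shape)            # (3, 2)

bigger = np.resize(v, (2, 4))  # np.resize may change the size, repeating the data
print(bigger)                  # [[ 2  4 10 11]
                               #  [12 16  2  4]]
```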

### Matrix multiplication

In [None]:
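s = np.array([1, 2])
t = np.array([[10, 20, 30, 40], [100, 200, 300, 400]])

# Matrix product: shapes (2,) and (2, 4) give shape (4,)
print(np.matmul(s, t))

# `*` is element-wise, not matrix multiplication. `s * t` would raise a ValueError,
# because the trailing dimensions (2 and 4) cannot be broadcast together.
# Transposing t to shape (4, 2) makes the element-wise product work:
print(t.T * s)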
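For reference, a short sketch comparing the ways of spelling a matrix product; `m` is a made-up 2 × 2 matrix added for illustration. For 1-D and 2-D inputs like these, `np.matmul`, the `@` operator and `np.dot` agree (they differ for arrays with more than two dimensions, which this sketch does not cover).

```python
import numpy as np

s = np.array([1, 2])
t = np.array([[10, 20, 30, 40], [100, 200, 300, 400]])

p1 = np.matmul(s, t)
p2 = s @ t            # the @ operator calls matmul
p3 = np.dot(s, t)
print(p1)             # [210 420 630 840]

# A 2-D example: (2, 2) @ (2, 4) -> (2, 4)
m = np.array([[1, 0], [0, 2]])
print(m @ t)          # [[ 10  20  30  40]
                      #  [200 400 600 800]]
```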

In [None]:
a = np.arange(1,19,2)
a

In [None]:
a = a.reshape([3,3])

In [None]:
a

In [None]:
np.append(a,[14,15,16,14]) # without an axis argument, np.append flattens a to 1-D first

In [None]:
a

In [None]:
np.append(a,[14,15,16,14,16]).reshape(7,2) # 9 + 5 = 14 elements -> 7 x 2

In [None]:
# Define a numpy array of 100 integers
my_arr = np.arange(100)

In [None]:
my_arr


Let's square each number in the sequence

In [None]:
%time my_arr2 = my_arr ** 2

In [None]:
my_arr2

Wall-clock time is the time that a clock on the wall (or a stopwatch in hand) would measure as having elapsed between the start of the process and 'now'.

User CPU time and system CPU time are, respectively, the time the CPU spent executing user code and the time it spent executing kernel code on behalf of the process.
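To see the difference between the two kinds of time, here is a small sketch using Python's standard `time` module rather than the `%time` magic: `time.perf_counter()` measures wall-clock time and `time.process_time()` measures the user + system CPU time of the current process. Sleeping advances the wall clock without using the CPU, so the CPU time should come out noticeably lower. The exact numbers depend on the machine.

```python
import time

wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # user + system CPU time of this process

time.sleep(0.2)                              # waiting: wall time passes, CPU is idle
total = sum(i * i for i in range(200_000))   # busy work: consumes CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall: {wall_elapsed:.3f} s, cpu: {cpu_elapsed:.3f} s")
```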

In [None]:
import sys
print(sys.getsizeof(my_arr)) # bytes used by the array object, including its data buffer

**NumPy-based algorithms are generally 10 to 100 times faster (or more) than their
pure Python counterparts and use significantly less memory.**
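A rough, machine-dependent way to check both claims is sketched below. The memory estimate for the list is approximate (it adds the size of the list's pointer array to the size of each `int` object, although small integers are actually shared), and the timing ratio varies from machine to machine, so treat the output as indicative only.

```python
import sys
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Memory: the list holds pointers to separate int objects; the array holds raw 8-byte values
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
array_bytes = np_arr.nbytes
print(f"list ~{list_bytes / 1e6:.2f} MB, array {array_bytes / 1e6:.2f} MB")

# Speed: square every element, repeated 10 times
t_list = timeit.timeit(lambda: [v * v for v in py_list], number=10)
t_array = timeit.timeit(lambda: np_arr ** 2, number=10)
print(f"list: {t_list:.4f} s, array: {t_array:.4f} s, ratio ~{t_list / t_array:.0f}x")
```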

## The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray,
which is a fast, flexible container for large datasets in Python. Arrays enable us to
perform mathematical operations on whole blocks of data using similar syntax to the
equivalent operations between scalar elements.

### Creating Arrays from Python Lists

In [None]:
# integer array:
np.array([1, 4, 2, 5, 3])

Remember that unlike Python lists, NumPy is constrained to arrays that all contain
the same type. If types do not match, NumPy will upcast if possible (here, integers are
upcast to floating point):

In [None]:
np.array([3.14, 4, 2, 3])

If we want to explicitly set the data type of the resulting array, we can use the dtype
keyword:

In [None]:
np.array([1, 2, 3, 4], dtype='float64')
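A small sketch of upcasting and explicit conversion: mixing booleans, integers and floats in one list yields a float array, `dtype=` fixes the type up front, and `astype` converts an existing array (float-to-integer conversion truncates toward zero rather than rounding).

```python
import numpy as np

mixed = np.array([1, 2.5, True])                  # bool and int are upcast to float64
forced = np.array([1, 2, 3], dtype=np.float32)    # dtype chosen explicitly
truncated = np.array([1.7, -2.9]).astype(np.int64)  # float -> int truncates toward zero

print(mixed, mixed.dtype)   # [1.  2.5 1. ] float64
print(forced.dtype)         # float32
print(truncated)            # [ 1 -2]
```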

Finally, unlike Python lists, NumPy arrays can explicitly be multidimensional; here’s
one way of initializing a multidimensional array using a list of lists:

In [None]:
list(range(1,4))

In [None]:
# nested lists result in multidimensional arrays
array_y = np.array([list(range(i, i + 3)) for i in [1, 3, 5]])
array_y

The inner lists are treated as rows of the resulting two-dimensional array.

### Creating Arrays from Scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using routines
built into NumPy. Here are several examples:

In [None]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 100)
np.random.randint(0, 100, (3, 3))

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

In [None]:
a

In [None]:
np.ones_like(a)

**<center>Array creation functions</center>**
- ``array``: Convert input data (list, tuple, array, or other sequence type) to an ndarray, either by inferring a dtype or using an explicitly specified dtype; copies the input data by default
- ``asarray``: Convert input to an ndarray, but do not copy if the input is already an ndarray
- ``arange``: Like the built-in range, but returns an ndarray instead of a list
- ``ones``: Produce an array of all 1s with the given shape and dtype
- ``ones_like``: Take another array and produce an array of 1s with the same shape and dtype
- ``zeros``, ``zeros_like``: Like ``ones`` and ``ones_like``, but producing arrays of 0s instead
- ``empty``, ``empty_like``: Create new arrays by allocating new memory, but do not populate it with any values as ``ones`` and ``zeros`` do
- ``full``: Produce an array of the given shape and dtype with all values set to the indicated "fill value"
- ``full_like``: Take another array and produce a filled array of the same shape and dtype
- ``eye``, ``identity``: Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere)
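A few of these functions in action, as a minimal sketch with illustrative variable names. The values returned by `np.empty` are whatever happened to be in memory, so only its shape is meaningful here; `np.identity(3)` builds the same matrix as `np.eye(3)`.

```python
import numpy as np

base = np.arange(5)                   # like range(5), but an ndarray: [0 1 2 3 4]

copied = np.array(base)               # np.array copies by default
same = np.asarray(base)               # np.asarray returns the input itself if it is already an ndarray
print(copied is base, same is base)   # False True

filled = np.full_like(base, 7)        # same shape and dtype as base, every value 7
scratch = np.empty((2, 2))            # allocated but NOT initialized: contents are arbitrary
ident = np.identity(3)
print(filled, scratch.shape, np.array_equal(ident, np.eye(3)))
```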


## Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization. Any arithmetic
operation between equal-size arrays applies the operation element-wise:

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

In [None]:
arr * arr

In [None]:
arr - arr

Arithmetic operations with scalars propagate the scalar argument to each element in
the array:

In [None]:
1/arr

In [None]:
arr ** 2

Comparisons between arrays of the same size yield boolean arrays:

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

In [None]:
arr

In [None]:
arr2 > arr
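The boolean array produced by a comparison can be used like any other array. A small sketch re-creating `arr` and `arr2` from above: because `True` counts as 1, `sum()` counts the positions where the comparison holds, and `any()`/`all()` summarize the result.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

mask = arr2 > arr                  # element-wise comparison, same shape as the inputs
print(mask.dtype, mask.shape)      # bool (2, 3)
print(mask.sum())                  # number of True entries: 3
print(mask.any(), mask.all())      # True False
```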

## NumPy Array Attributes

We'll use NumPy’s random number generator, seeded with a fixed value so that
the same random arrays are generated each time this code is run:

In [None]:
np.random.seed(0) # seed for reproducibility

arr1 = np.random.randint(10, size=10) # One-dimensional array
arr2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
arr3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array
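A minimal sketch of why the seed matters: re-seeding with the same value replays the same sequence. Newer NumPy code often uses a `Generator` from `np.random.default_rng` instead of the global seed; it is also reproducible for a fixed seed, but its numbers differ from those produced after `np.random.seed`, so the two approaches are not interchangeable.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(10, size=5)
np.random.seed(0)                         # same seed -> same sequence again
second = np.random.randint(10, size=5)
print(np.array_equal(first, second))      # True

# Generator-based alternative: each generator carries its own state
ga = np.random.default_rng(42).integers(10, size=5)
gb = np.random.default_rng(42).integers(10, size=5)
print(np.array_equal(ga, gb))             # True
```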

In [None]:
print(arr1)

In [None]:
print(arr2)

In [None]:
print(arr3)

In [None]:
print("arr3 ndim: ", arr3.ndim)
print("arr3 shape:", arr3.shape)
print("arr3 size: ", arr3.size)
print("arr3 dtype:", arr3.dtype)

## Basic Indexing and Slicing

### Array Indexing: Accessing Single Elements

NumPy array indexing is a rich topic, as there are many ways you may want to select
a subset of your data or individual elements. One-dimensional arrays are simple; on
the surface they act similarly to Python lists:

In [None]:
arr1

In [None]:
arr1[0]

In [None]:
arr1[4]

To index from the end of the array, you can use negative indices:

In [None]:
arr1

In [None]:
arr1[-1]

In [None]:
arr1[-2]

In a multidimensional array, you access items using a comma-separated tuple of
indices:

In [None]:
arr2

In [None]:
arr2[0, 0]

In [None]:
arr2[2, -1]

In [None]:
arr2[1, -3]

### Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use
them to access subarrays with the slice notation, marked by the colon (:) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of
an array x, use this:

``x[start:stop:step]``

If any of these are unspecified, they default to the values start=0, stop=size of
dimension, step=1.

#### One-dimensional subarrays

In [None]:
arr1

In [None]:
arr1[:]

In [None]:
arr1[:5] # first five elements

In [None]:
arr1

In [None]:
arr1[5:] # elements from index 5 onward

In [None]:
arr1[5:7] # middle subarray

In [None]:
arr1[::2] # every other element

In [None]:
arr1[1::2] # every other element, starting at index 1

A potentially confusing case is when the step value is negative. In this case, the
defaults for start and stop are swapped. This becomes a convenient way to reverse
an array:

In [None]:
arr1

In [None]:
arr1[::-1]

In [None]:
arr1

In [None]:
arr1[2::-1] # from index 2 back to the start, reversed

In [None]:
arr1[2:0:-1] # from index 2 backwards, stopping before index 0
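Because `arr1` is random, a deterministic sketch may make the negative-step rules easier to check; `x` here is simply `np.arange(10)`.

```python
import numpy as np

x = np.arange(10)    # [0 1 2 3 4 5 6 7 8 9]

print(x[::-1])       # negative step: start defaults to the end -> [9 8 7 6 5 4 3 2 1 0]
print(x[5::-2])      # from index 5 backwards, every other element -> [5 3 1]
print(x[2::-1])      # from index 2 back to the start -> [2 1 0]
print(x[2:0:-1])     # stop index 0 is excluded -> [2 1]
```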

#### Multidimensional subarrays
Multidimensional slices work in the same way, with multiple slices separated by commas.
For example:

In [None]:
arr2

In [None]:
arr2[:2, :3] # two rows, three columns

In [None]:
arr2[:3, ::2]  # all rows, every other column

In [None]:
arr2[-3:-1, 1:2] # rows -3 and -2 (i.e. 0 and 1), column 1 only: shape (2, 1)

Subarray dimensions can even be reversed together:

In [None]:
arr2

In [None]:
arr2[::-1, ::-1]

In [None]:
arr2

#### Accessing array rows and columns

One commonly needed routine is accessing single
rows or columns of an array. You can do this by combining indexing and slicing,
using an empty slice marked by a single colon (:):

In [None]:
arr2[:, 0] # first column of arr2

In [None]:
arr2[0, :] # first row of arr2

In multidimensional arrays, if you omit later indices, the returned object will be a
lower-dimensional ndarray consisting of all the data along the higher dimensions.

In the case of row access, the empty slice can be omitted for a more compact syntax:

In [None]:
arr2[2] # equivalent to arr2[2, :]
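The same rule applies in more dimensions. A short sketch with a hypothetical 3-D array `cube`: each index you leave off selects everything along that dimension.

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))   # 2 blocks of 3 rows x 4 columns

block = cube[0]       # same as cube[0, :, :] -> shape (3, 4)
row = cube[0, 1]      # same as cube[0, 1, :] -> shape (4,)
print(block.shape, row.shape)   # (3, 4) (4,)
print(row)                      # [4 5 6 7]
```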

In [None]:
np.random.seed(0) # re-seed and re-create the arrays so the examples below start from a known state

arr1 = np.random.randint(10, size=10) # One-dimensional array
arr2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
arr3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array

#### Subarrays as no-copy views
One important—and extremely useful—thing to know about array slices is that they
return views rather than copies of the array data. This is one area in which NumPy
array slicing differs from Python list slicing: in lists, slices will be copies. Consider our
two-dimensional array from before:

In [None]:
print(arr2)

Let's extract a 2×2 subarray from it:

In [None]:
arr2_sub = arr2[:2, :2]
print(arr2_sub)

If we modify this subarray, we’ll see that the original array is changed too! Observe:

In [None]:
arr2_sub[0, 0] = 100
print(arr2_sub)
print("\n")
print(arr2)

This default behavior is actually quite useful: it means that when we work with large
datasets, we can access and process pieces of these datasets without the need to copy
the underlying data buffer.

#### Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to instead explicitly
copy the data within an array or a subarray. This can be most easily done with the
copy() method:

In [None]:
arr2

In [None]:
arr2_sub_copy = arr2[:2, :2].copy()
print(arr2_sub_copy)

If we now modify this subarray, the original array is not touched:

In [None]:
arr2_sub_copy[0, 0] = 999
print(arr2_sub_copy)
print("\n")
print(arr2)
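The difference from Python lists mentioned above can be checked directly. This sketch uses `np.shares_memory` and the `.base` attribute to confirm which objects share data; the variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4]
list_slice = py_list[:2]   # list slicing copies the elements
list_slice[0] = 100
print(py_list)             # [1, 2, 3, 4]  (unchanged)

arr = np.array([1, 2, 3, 4])
view = arr[:2]             # array slicing returns a view
view[0] = 100
print(arr)                 # [100   2   3   4]  (changed through the view)

arr_copy = arr[:2].copy()  # an explicit copy owns its own data
print(np.shares_memory(arr, view), np.shares_memory(arr, arr_copy))  # True False
print(view.base is arr, arr_copy.base is None)                       # True True
```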

## Reshaping of Arrays

Another useful type of operation is reshaping of arrays. The most flexible way of
doing this is with the reshape() method. 

For example, if you want to put the numbers 1 through 25 in a 5×5 grid, you can do the following:

In [None]:
grid = np.arange(1, 26).reshape((5, 5))
print(grid)

Note that for this to work, the size of the initial array must match the size of the
reshaped array. Where possible, the reshape method will use a no-copy view of the
initial array, but with noncontiguous memory buffers this is not always the case.
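A sketch of when `reshape` can and cannot return a view, checked with `np.shares_memory`: reshaping a contiguous array shares memory, but flattening its transpose (whose memory layout no longer matches row-major order) forces a copy. Variable names are illustrative.

```python
import numpy as np

flat = np.arange(1, 7)                # contiguous 1-D array
small_grid = flat.reshape((2, 3))     # contiguous -> reshape returns a view
print(np.shares_memory(flat, small_grid))        # True

transposed = small_grid.T             # a view whose memory is not in row-major order
flattened = transposed.reshape(6)     # cannot be expressed as a view -> copy
print(np.shares_memory(small_grid, flattened))   # False
print(flattened)                                 # [1 4 2 5 3 6]
```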

In [None]:
try:
    grid2 = np.arange(1, 17).reshape((5, 5))  # 16 elements cannot fill a 5x5 (25-element) grid
except ValueError as err:
    print("ValueError:", err)

Another common reshaping pattern is the conversion of a one-dimensional array
into a two-dimensional row or column matrix.

We can do this with the reshape method, or more easily by making use of the `np.newaxis` keyword within a slice operation:

In [None]:
arr = np.array(list(range(1, 26)))
print(arr)

In [None]:
# row vector via reshape
arr.reshape((1, 25))

In [None]:
# column vector via reshape
arr.reshape((25, 1))
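The `np.newaxis` alternative can be sketched as follows: indexing with `np.newaxis` inserts a new axis of length 1 at that position.

```python
import numpy as np

arr = np.arange(1, 26)

row = arr[np.newaxis, :]     # same result as arr.reshape((1, 25))
col = arr[:, np.newaxis]     # same result as arr.reshape((25, 1))
print(row.shape, col.shape)  # (1, 25) (25, 1)
```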

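Reshaping with an unknown dimension: passing -1 for one dimension tells NumPy to infer its size from the array's total size and the other dimensions.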

In [None]:
arr

In [None]:
arr.reshape((5, -1))

In [None]:
arr = np.array(range(1, 28))
print(arr)

In [None]:
x1 = arr.reshape((3, -1))
print(x1.shape)
print(x1)

In [None]:
x2 = arr.reshape((9, -1))
print(x2.shape)
print(x2)

In [None]:
x3 = arr.reshape((3, 3, -1))
print(x3.shape)
print(x3)

In [None]:
arr2

In [None]:
arr2.reshape((-1, 2))
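A sketch of the `-1` rule: NumPy divides the total size by the product of the known dimensions, so that product must divide the size evenly, and only one dimension may be `-1`.

```python
import numpy as np

arr = np.arange(1, 28)                   # 27 elements

print(arr.reshape((3, -1)).shape)        # 27 / 3 = 9 -> (3, 9)
print(arr.reshape((-1, 3, 3)).shape)     # 27 / 9 = 3 -> (3, 3, 3)

error_message = None
try:
    arr.reshape((2, -1))                 # 27 is not divisible by 2
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)
```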

## Computation on NumPy Arrays: Universal Functions

**Universal Functions: Fast Element-Wise Array Functions**
    
A universal function, or ufunc, is a function that performs element-wise operations
on data in ndarrays. You can think of them as fast vectorized wrappers for simple
functions that take one or more scalar values and produce one or more scalar results.
Many ufuncs are simple element-wise transformations, like sqrt or exp:

In [None]:
arr = np.arange(20)
arr

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)
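To see that a ufunc is equivalent to applying a scalar function to each element, this sketch compares `np.sqrt` with a Python loop over `math.sqrt`. The results match; the difference is that the ufunc runs its loop in compiled code.

```python
import math
import numpy as np

arr = np.arange(20)

vectorized = np.sqrt(arr)                        # one call over the whole array
looped = np.array([math.sqrt(v) for v in arr])   # same values via a Python-level loop
print(np.allclose(vectorized, looped))           # True
```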

These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays
(thus, binary ufuncs) and return a single array as the result:

In [None]:
x = np.random.randn(10)
y = np.random.randn(10)

In [None]:
x

In [None]:
y

In [None]:
x > y

In [None]:
np.maximum(x, y)

In [None]:
np.add(x,y)

In [None]:
x+y

In [None]:
# Convert to plain Python lists: + now concatenates instead of adding
x = list(x)
y = list(y)
x+y

In [None]:
x

In [None]:
y

In [None]:
np.add(x,y) # np.add converts the lists to arrays, so it still adds element-wise

Commonly used ufuncs are listed in the following tables.

**<center>Unary ufuncs</center>**
![Unary Functions](img/numpy_unary_functions.png)

**<center>Binary universal functions</center>**
![Binary Functions 1](img/numpy_binary_ufunctions_1.png)
![Binary Functions 2](img/numpy_binary_ufunctions_2.png)

## Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about
the data along an axis are accessible as methods of the array class. You can use aggregations
(often called reductions) like sum, mean, and std (standard deviation) either by
calling the array instance method or using the top-level NumPy function.

In [None]:
arr = np.random.randn(5, 4)
arr

In [None]:
arr.mean()

In [None]:
np.mean(arr)

In [None]:
arr.sum()

Functions like mean and sum take an optional axis argument that computes the statistic
over the given axis, resulting in an array with one fewer dimension:

In [None]:
arr

In [None]:
arr.mean(axis=1) # across the columns

In [None]:
arr.sum(axis=0) # down the rows

Here, arr.mean(1) means “compute the mean across the columns”, whereas arr.sum(0)
means “compute the sum down the rows.”
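With a small, non-random matrix the direction of each axis is easier to verify. In this sketch, `m` is an illustrative 2 × 3 array; the reduced axis is the one that disappears from the result's shape.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

print(m.sum(axis=0))    # collapse axis 0 (down the rows): [5 7 9], shape (3,)
print(m.mean(axis=1))   # collapse axis 1 (across the columns): [2. 5.], shape (2,)
print(m.sum())          # no axis: reduce over every element -> 21
```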

### Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.

Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
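The three rules can be checked on shapes alone with `np.broadcast_shapes` (available in NumPy 1.20 and later). This sketch applies it to a few shape pairs, one for each rule.

```python
import numpy as np

# (3,) is padded to (1, 3) [rule 1], then stretched to (2, 3) [rule 2]
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)

# (3,) -> (1, 3); then both arrays are stretched [rule 2]
print(np.broadcast_shapes((3, 1), (3,)))    # (3, 3)

# trailing sizes 2 and 3 differ and neither is 1 [rule 3]
error_message = None
try:
    np.broadcast_shapes((3, 2), (3,))
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)
```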

## Vectorized Operations in NumPy and Broadcasting

NumPy arrays are homogeneous: every element has the same data type. Python’s lists and tuples, by contrast, are unrestricted in the type
of data they contain.
Vectorized operations let NumPy apply optimized, pre-compiled functions to whole arrays at once instead of looping over the elements in Python,
which makes them much faster than the equivalent non-vectorized code.
In [None]:
# Any algebraic binary operation performed on NumPy arrays is vectorized
x = np.arange(0,5)
y = np.arange(5,10)
print(x)
print(y)
print(x+y)
# Here the operation was performed element-wise, i.e. vectorized.

# Compare this with the general for-loop approach:
res = []
for i in range(len(x)):
    res.append(x[i] + y[i])
print(np.array(res))
# It's much easier to express the operation as x + y than to write out the loop
# for the element-wise sum explicitly. This is the difference between
# declarative and imperative programming.

# Now compare this to addition of Python lists.
l1 = list(range(5))
l2 = list(range(5, 10))
print('Python Lists!')
print('l1', l1)
print('l2', l2)
print('l1 + l2 : ', l1 + l2)

# Python lists don't operate "algebraically": + is just concatenation.


In [None]:
# Another example 

A = np.array([
        [0, 0, 0],
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]
])

A

In [None]:
A.shape

In [None]:
B = np.array([[1, 2, 3]])
B

In [None]:
B.shape

In [None]:
# Even though the shapes are different, we can still perform a vectorized operation between A and B:
A + B



The above is possible thanks to the following rule:

**The Broadcasting Rule**: Two arrays are compatible for broadcasting if, for each *trailing dimension* (i.e. starting from the end), the axis lengths match or
either of the lengths is 1. Broadcasting is then performed over the missing or length-1 dimensions.

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. 
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
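One way to visualize the stretching is `np.broadcast_to`, which returns a read-only view of an array expanded to a given shape without copying data. A minimal sketch, re-creating `A` and `B` from the previous example.

```python
import numpy as np

A = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]])   # shape (4, 3)
B = np.array([[1, 2, 3]])                                    # shape (1, 3)

B_stretched = np.broadcast_to(B, A.shape)   # B's single row repeated 4 times
print(B_stretched)

print(np.array_equal(A + B, A + B_stretched))   # True
```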

In [None]:
A = np.arange(6).reshape(2,3)
A

In [None]:
B = np.array([5,5])
B

In [None]:
A + B # raises ValueError: trailing dimensions 3 and 2 differ and neither is 1
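If the intent is to add one value of `B` to each *row* of `A` (an assumption about what was intended), one possible fix is to give `B` an explicit second axis, so its shape becomes `(2, 1)` and it broadcasts across the three columns.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.array([5, 5])             # shape (2,)

# B[:, np.newaxis] has shape (2, 1): one value per row, stretched over 3 columns
result = A + B[:, np.newaxis]
print(result)                    # [[ 5  6  7]
                                 #  [ 8  9 10]]
```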

### Broadcasting with Scalars
Vectorized operations between arrays are usually reserved for more "advanced" or "scientific" uses. In your day-to-day work as a data analyst,
you'll more often use broadcasting with scalars.
Broadcasting with scalars is probably the most intuitive of the vectorized operations:

In [None]:
a = np.arange(0, 6)
a

In [None]:
a + 10

As you can see, the + 10 operation was "broadcasted" (or "distributed") among all the elements of the array. This is what we usually do with a
list comprehension in pure Python:

In [None]:
[x + 10 for x in a]

But this "broadcasting" behavior is the default in NumPy. Here are a few other examples:

In [None]:
a - 5

In [None]:
a * 3

In [None]:
# Plain Python list, for comparison
l = list(range(5))
l

In [None]:
l + 10 # raises TypeError: plain Python lists do not broadcast scalars

More examples


In [None]:
macros = np.array([
    [0.8, 2.9, 3.9],
    [52.4, 23.6, 36.5],
    [55.2, 31.7, 23.9],
    [14.4, 11, 4.9],
])
cal_per_nut = np.array([3, 3, 8])

# Loop version: multiply each row by cal_per_nut, one row at a time
result = np.zeros_like(macros, dtype=float)
for i in range(macros.shape[0]):
    result[i] = macros[i] * cal_per_nut

result

In [None]:
macros

In [None]:
cal_per_nut

In [None]:
macros * cal_per_nut

In [None]:
cal_per_nut.shape

In [None]:
result.shape

In [None]:
# Example 1: shapes (2, 3) and (3,) broadcast to (2, 3)
M = np.ones((2, 3))
a = np.arange(3)

In [None]:
M+a

In [None]:
# Example 2: shapes (3, 1) and (3,) broadcast to (3, 3)
a = np.arange(3).reshape((3, 1))
b = np.arange(3)

In [None]:
a

In [None]:
b

In [None]:
a+b

In [None]:
# Example 3: shapes (3, 2) and (3,) are incompatible
M = np.ones((3, 2))
a = np.arange(3)

In [None]:
M+a # raises ValueError: trailing dimensions 2 and 3 differ and neither is 1

In [None]:
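# Example 4: shapes (3, 2, 2) and (3,) are also incompatible
M = np.ones((3,2,2))
a = np.arange(3)
M+a # raises ValueError: trailing dimensions 2 and 3 differ and neither is 1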