# Introduction to NumPy

Adapted by [Nimblebox Inc.](https://www.nimblebox.ai/) from the `Data-X: Introduction to Numpy` tutorial by [Alexander Fred Ojala](https://alex.fo/) and [Ikhlaq Sidhu](https://vcresearch.berkeley.edu/faculty/ikhlaq-sidhu),  [`Python Data Science Handbook`](http://shop.oreilly.com/product/0636920034919.do) by [Jake VanderPlas](https://github.com/jakevdp/PythonDataScienceHandbook) and [`NumPy Documentation`](https://numpy.org/doc/1.17/index.html).

<img style="float:left; margin-left: 50px" src="https://user-images.githubusercontent.com/50221806/86498175-86c40400-bd39-11ea-90de-1315a043fd45.png" alt="Numpy Logo" width="300" height="400">

<img style="float:right; margin-right: 50px" src="https://media-exp1.licdn.com/dms/image/C4E1BAQH3ErUUfLXoHQ/company-background_10000/0?e=2159024400&v=beta&t=9Z2hcX4LqsxlDd2BAAW8xDc-Obfvk_rziT1AkPKBcCc" alt="Nimblebox Logo" width="500" height="600">

## Introduction:  

NumPy stands for **Numerical Python** and it is the fundamental package for scientific computing in Python. It is a package that lets you efficiently store and manipulate numerical arrays. It contains among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities


In this tutorial, we will cover:

* **Basics**: Different ways to create NumPy Arrays and Basics of NumPy
* **Computation**: Computations on NumPy arrays using Universal Functions and other NumPy Routines
* **Aggregations**: Various functions used to aggregate NumPy arrays

### NumPy contains an array object that is "fast"


<img src="https://raw.githubusercontent.com/scetx/datax/master/imgsource/threefundamental.png">


**It stores / consists of**:
* location of a memory block (allocated all at one time)
* a shape (3 x 3 or 1 x 9, etc)
* data type / size of each element

The core feature that NumPy supports is its multi-dimensional array. In NumPy, dimensions are called axes and the number of axes is called the rank.

### NumPy Array Anatomy
<img src= "https://raw.githubusercontent.com/scetx/datax/master/imgsource/anatomyarray.png">


We'll start with the standard NumPy import, under the alias `np`.

In [None]:
import numpy as np

In [None]:
print(np.__version__)

### Basics of NumPy Array

#### 1. Creating a NumPy Array

##### From Python List

We use `np.array` to create a NumPy array object from a Python list.

In [None]:
# Create array from Python list
list1 = [1, 2, 3, 4]
data = np.array(list1)
print(data)

In [None]:
# Find out object type
print(type(list1))
print(type(data))

Because Python is a dynamically typed language, Python lists can contain elements of heterogeneous data types. NumPy arrays, however, are constrained to a single, homogeneous data type. If the input data types are not homogeneous, NumPy will upcast (if possible) to the most logical data type.

In [None]:
# NumPy converts to most logical data type
data1 = np.array([1.2, 2, 3, 4])
print(data1)
print(data1.dtype) # all values will be converted to floats
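
The upcasting rule can be checked directly. The following is a minimal sketch; the input values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Mixing ints and floats upcasts every element to float64
mixed_float = np.array([1, 2.5])
print(mixed_float.dtype)    # float64

# Mixing real and complex values upcasts to complex128
mixed_complex = np.array([1.0, 2 + 3j])
print(mixed_complex.dtype)  # complex128

# np.result_type reports the dtype NumPy would pick for a combination
combined = np.result_type(np.int32, np.float64)
print(combined)             # float64
```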

In [None]:
# If we store a float in an int array, the float is truncated to an int (the fractional part is dropped)
list2 = [1, 2, 3, 4]
data2 = np.array(list2)
data2[0] = 3.14159
print(data2)
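
To make the conversion explicit, here is a small sketch (the name `ints` and the values are illustrative) showing that assigning a float into an integer array drops the fractional part rather than rounding.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])

ints[0] = 3.99    # stored as 3: the fractional part is dropped
ints[1] = -3.99   # stored as -3: truncation is toward zero, not rounding

print(ints)       # [ 3 -3  3  4]
```

If rounding is what you want, round the value yourself (for example with `round()` or `np.rint`) before assigning it.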

In [None]:
# We can manually specify the datatype
data3 = np.array([1, 2, 3], dtype="str")
print(data3)
print(data3.dtype)

In order to perform any mathematical operations on NumPy arrays, all the elements must be of a type that is valid to perform these mathematical operations.

In [None]:
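# The in-place addition below raises a TypeError: 'a + b' is a float array
# and cannot be cast back into the int16 array 'c' under the default casting rule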

a = np.random.normal(0,1,1000)
b = np.arange(1000, dtype=np.int8)
c = np.arange(1000, dtype=np.int16)
c += a + b
print(c)


In [None]:
# error is resolved by just changing the dtype of 'a' manually
a = np.random.normal(0,1,10)
a = a.astype(np.int16)
b = np.arange(10, dtype=np.int16)
c = np.arange(10, dtype=np.int16)
c += a + b
print(c)
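
Converting `a` up front is one way around the error. As a hedged sketch with fixed illustrative inputs, two other options are to build a new array and convert it explicitly, or to keep the in-place update and opt in to unsafe casting through the ufunc's `casting` argument.

```python
import numpy as np

# Fixed inputs so the result is predictable (values are illustrative)
a = np.array([0.5, 1.5, 2.5])
c = np.arange(3, dtype=np.int16)

# In-place ufuncs use the 'same_kind' casting rule by default,
# and float64 -> int16 is not a same-kind cast
allowed = np.can_cast(np.float64, np.int16, casting="same_kind")
print(allowed)  # False

# Option 1: build a new array and convert it explicitly
total = (c + a).astype(np.int16)

# Option 2: keep the in-place update but opt in to unsafe casting
np.add(c, a, out=c, casting="unsafe")

print(total)  # [0 2 4]
print(c)      # [0 2 4]
```

Both options truncate the fractional part, so they are only appropriate when that loss of precision is acceptable.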

Unlike a flat Python list, a NumPy array can be multi-dimensional.

In [None]:
# nested lists result in multi-dimensional arrays
x1 = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(x1)

For more information and other NumPy operations based on Python list, refer to the [NumPy documentation](http://numpy.org/).

##### Using NumPy routines

When dealing with very large arrays, it is more efficient to create them from scratch using routines built into NumPy. Here are several examples:

In [None]:
# Create a length-10 integer array filled with zeros
print(np.zeros(10, dtype=int))

In [None]:
# Create a 3x5 floating-point array filled with ones
print(np.ones((3, 5), dtype=float))

In [None]:
# Create a 3x5 array filled with 3.14
print(np.full((3, 5), 3.14))

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
print(np.arange(0, 20, 2))

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
print(np.random.random((3, 3)))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
print(np.random.normal(0, 1, (3, 3)))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
print(np.random.randint(0, 10, (3, 3)))
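
The random arrays above change on every run. When reproducible values are needed, one option is the seeded `Generator` interface, `np.random.default_rng`, available since NumPy 1.17. This is a sketch only; the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # the seed value is arbitrary

uniform = rng.random((3, 3))            # uniform values in [0, 1)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))  # integers in [0, 10)

# A generator created with the same seed reproduces the same values
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uniform, rng_again.random((3, 3)))
print(same)  # True
```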

In [None]:
# Create a 5x5 identity matrix
print(np.eye(5))

You can always explore the [documentation](http://numpy.org/) for more.

#### 2. Basics of NumPy Array

##### Attributes of NumPy Array

Each NumPy array has the following attributes:

In [None]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Create a 3-D array

print("x3 ndim: ", x3.ndim) # .ndim is the number of dimensions
print("x3 shape:", x3.shape) # .shape is the size of each dimension
print("x3 size: ", x3.size) # .size is the total number of elements in the array
print("dtype:", x3.dtype) # .dtype is the data type of the array
print("itemsize:", x3.itemsize, "bytes") # .itemsize is the size (in bytes) of each array element
print("nbytes:", x3.nbytes, "bytes") # .nbytes is the total size (in bytes) of the array
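
These attributes are related to each other. The sketch below uses an illustrative array `arr` with a fixed dtype to make the relationships explicit.

```python
import numpy as np

arr = np.zeros((3, 4, 5), dtype=np.int64)

# ndim is the length of the shape tuple
print(arr.ndim == len(arr.shape))             # True

# size is the product of the entries of shape
print(arr.size == 3 * 4 * 5)                  # True

# nbytes is size times itemsize (8 bytes per int64 element)
print(arr.nbytes == arr.size * arr.itemsize)  # True
```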

For more information, refer to the [documentation](http://numpy.org/).

##### Accessing elements: Slicing and Indexing

Slicing and indexing of NumPy arrays are quite similar to those of Python lists.

In [None]:
data = np.arange(10) # Create a 1-D array
print("Original Data:\n", data, "\n")

# Indexing
print("Indexing NumPy Array:")
print("  ", data[4]) # element at index 4 (the 5th element, since indexing starts at 0)
print("  ", data[-1], "\n") # last element of the numpy array

# Slicing: To access a slice of an array 'data', we use `data[start:stop:step]`
print("Slicing NumPy Array:")
print("  ", data[:5]) # First 5 elements of the numpy array
print("  ", data[::-1]) # All the elements of the numpy array but in reverse order
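
A few more cases of the `data[start:stop:step]` pattern, as a short sketch on the same kind of 1-D array (`stop` is always exclusive):

```python
import numpy as np

data = np.arange(10)

print(data[2:8])     # [2 3 4 5 6 7]  -> indices 2 up to, but not including, 8
print(data[2:8:2])   # [2 4 6]        -> same range, every 2nd element
print(data[-3:])     # [7 8 9]        -> the last three elements
print(data[7:2:-1])  # [7 6 5 4 3]    -> walking backwards from index 7 to 3
```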


<u><i>Indexing in a multi-dimensional NumPy Array</i></u>: Multi-dimensional indices work in the same way, with multiple indices separated by commas.

In [None]:
# Let's create a 3-D array
x3 = np.random.randint(10, size=(3, 4, 5))
print(x3)

In [None]:
print(x3[1]) # prints the 2nd 4x5 array in the generated 3-D array

In [None]:
print(x3[1,2]) # prints the 3rd row of the x3[1] array

In [None]:
print(x3[1,2,3]) # prints the 4th element of the x3[1,2] array
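
Because the array above is random, the printed values differ on each run. The sketch below uses a deterministic array (the name `cube` is illustrative) to show that comma-separated indices select the same element as chained indexing.

```python
import numpy as np

# Values 0..59 laid out as 3 blocks of 4 rows and 5 columns
cube = np.arange(60).reshape(3, 4, 5)

print(cube[1, 2, 3])     # 33 -> block 1, row 2, column 3 (1*20 + 2*5 + 3)
print(cube[1][2][3])     # 33 -> the same element through chained indexing
print(cube[-1, -1, -1])  # 59 -> negative indices count from the end of each axis
```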

<u><i>Slicing in a multi-dimensional NumPy Array</i></u>: Multi-dimensional slices work in the same way, with multiple slices separated by commas.

In [None]:
# Let's create a 3-D array
x3 = np.random.randint(10, size=(3, 4, 5))
print(x3)

In [None]:
print(x3[:2, :3, :4])  # slices the 3x4x5 array down to a 2x3x4 array
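
Slices and integer indices can also be mixed. In this sketch on a deterministic array (the name `cube` is illustrative), each integer index removes one axis from the result while each slice keeps its axis.

```python
import numpy as np

cube = np.arange(60).reshape(3, 4, 5)

print(cube[:2, :3, :4].shape)  # (2, 3, 4) -> three slices keep all three axes
print(cube[0, :, 0])           # [ 0  5 10 15] -> first column of the first block
print(cube[:, 0, :].shape)     # (3, 5) -> the first row of every block
```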

<u><i>Mask Indexing and Boolean Slicing</i></u>: These techniques are used to filter a dataset and get quick insight into its nature.

In [None]:
# Mask Indexing
numpy_array = np.random.randint(1, 11, size=(10))
print("NumPy Array:\n", numpy_array, "\n")

# Let's create a mask for the 'numpy_array' that selects the elements that are 'greater than 5'
mask = numpy_array > 5
print("Boolean Mask:\n", mask, "\n")

# Now let's just print the elements that follow our condition
print("Interested Array:\n", numpy_array[mask])
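
Conditions can be combined with the element-wise operators `&` (and) and `|` (or). A minimal sketch with fixed, illustrative values; note that each condition needs its own parentheses.

```python
import numpy as np

values = np.array([3, 8, 1, 10, 6, 5, 9, 2])

# Elements strictly between 2 and 9
between = values[(values > 2) & (values < 9)]
print(between)   # [3 8 6 5]

# Elements below 3 or above 8
extremes = values[(values < 3) | (values > 8)]
print(extremes)  # [ 1 10  9  2]

# Count how many elements satisfy a condition
count_gt5 = np.count_nonzero(values > 5)
print(count_gt5)  # 4
```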

For further exploration, refer to the NumPy [documentation](https://numpy.org/doc/1.17/user/basics.indexing.html).

##### Python Lists and NumPy Arrays

A NumPy array holds a pointer to a single block of memory that stores the full array. A Python list, on the other hand, holds pointers to many different objects in memory.

<u><i>Subarray (default returns)</i></u>: Slicing a NumPy array returns a view, whereas slicing a Python list returns a copy of the list.

In [None]:
# Let's create a NumPy Array and slice it
data_numpy = np.random.randint(10, size=(10))
print("Pre-slicing NumPy Array: ", data_numpy)
slicing_numpy = data_numpy[0:3]
print("Slice of NumPy Array: ", slicing_numpy)

# Let's create a Python List and slice it
import random
data_list = random.sample(range(0, 10), 10)
print("\nPre-slicing Python List: ", data_list)
slicing_list = data_list[0:3]
print("Slice of Python List: ", slicing_list)

In [None]:
# Let's change the first element of both array and list
slicing_numpy[0] = -1
print("Slice of NumPy Array: ", slicing_numpy)
slicing_list[0] = -1
print("Slice of Python List: ", slicing_list)

In [None]:
print("Post-slicing NumPy array: ", data_numpy) # has changed
print("Post-slicing Python list: ", data_list) # has not changed

<u><i>Subarray (custom)</i></u>: To get a copy of a NumPy array slice, as Python lists give by default, call `.copy()` on the slice.

In [None]:
# Creating copies of the array instead of views
data_numpy = np.random.randint(10, size=(10))
print("Pre-slicing NumPy Array: ", data_numpy)
slicing_numpy_copy = data_numpy[0:3].copy()
print("Slice of NumPy Array: ", slicing_numpy_copy)

In [None]:
# Let's change the first element of the copied slice and observe
slicing_numpy_copy[0] = -1
print("Modified copy of the slice: ", slicing_numpy_copy)
print("Original NumPy Array: ", data_numpy) # unchanged, because the slice is a copy of data_numpy and not a view
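
Whether a subarray is a view or a copy can also be checked without modifying anything. This is a small sketch using `np.shares_memory` and the `.base` attribute; the variable names are illustrative.

```python
import numpy as np

original = np.arange(10)
view = original[0:3]               # plain slicing gives a view
copied = original[0:3].copy()      # .copy() gives independent data

print(np.shares_memory(original, view))    # True
print(np.shares_memory(original, copied))  # False
print(view.base is original)               # True: the view points back to its source
print(copied.base is None)                 # True: the copy owns its own data
```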

### Computation 

#### 1. Universal Function
A universal function (or ufunc) is a function that operates on an `ndarray` in an element-by-element fashion. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.

In [None]:
# Let's define two NumPy Arrays
x = np.random.randint(1, 11, size=(10))
y = np.random.randint(1, 11, size=(10))
print ("Array 'x' = ", x)
print ("Array 'y' = ", y)

In [None]:
# Let's perform some arithmetic on these arrays
print(x + y)
print(x - y)
print(x * y)
print(x / y)
print(x // y)  # floor division
print(x % y)

Each of these arithmetic operations is simply a convenient wrapper around a specific function built into NumPy. For example, the `+` operator is a wrapper for the `add` function.

In [None]:
print(np.add(x, y))
print(np.subtract(x, y))
print(np.multiply(x, y))
print(np.mod(x, y))
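
To illustrate what "element-by-element" means, the sketch below compares `np.add` with an explicit Python loop over fixed, illustrative inputs. Both produce the same values; the ufunc simply does the looping in compiled code.

```python
import numpy as np

x = np.array([10, 6, 8])
y = np.array([3, 10, 6])

# Explicit element-by-element loop in Python
looped = np.array([xi + yi for xi, yi in zip(x, y)])

# The same computation as a single ufunc call
vectorized = np.add(x, y)

print(vectorized)                          # [13 16 14]
print(np.array_equal(looped, vectorized))  # True
print(isinstance(np.add, np.ufunc))        # True
```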

The following table lists some of the `ufunc` implemented in NumPy:


| Universal Functions     | Operator (if any)  | Description                                                    |
|:-----------------------:|:------------------:|:--------------------------------------------------------------:|
|``np.add``               | ``+``              |Addition (e.g., ``[10  6  8] + [3 10  6] = [13 16 14]``)        |
|``np.subtract``          | ``-``              |Subtraction (e.g., ``[10  6  8] - [3 10  6] = [ 7 -4  2]``)     |
|``np.negative``          | ``-``              |Unary negation (e.g., ``[-10  -6  -8]``)                        |
|``np.multiply``          | ``*``              |Multiplication (e.g., ``[10  6  8] * [3 10  6] = [30 60 48]``)  |
|``np.divide``            | ``/``              |Division (e.g., ``[10  6  8] / [3 10  6] = [3.33 0.6 1.33]``)   |
|``np.floor_divide``      | ``//``             |Floor division (e.g., ``[10  6  8] // [3 10  6] = [3 0 1]``)    |
|``np.mod``               | ``%``              |Modulus/remainder (e.g., ``[10  6  8] % [3 10  6] = [1 6 2]``)  |
|``np.log``               |                    |Natural logarithm, element-wise                                 |
|``np.log2``              |                    |Base-2 logarithm of x                                           |


More information on universal functions (including the full list of available functions) can be found in the NumPy [documentation](https://numpy.org/doc/1.17/reference/ufuncs.html).
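
The ufuncs in the table that have no operator form are called as ordinary functions. A brief sketch with illustrative inputs:

```python
import numpy as np

x = np.array([1, 2, 4, 8])

negated = np.negative(x)   # same result as -x
print(negated)             # [-1 -2 -4 -8]

base2 = np.log2(x)         # base-2 logarithm, element-wise
print(base2)               # [0. 1. 2. 3.]

natural = np.log(np.array([1.0, np.e]))  # natural logarithm, element-wise
print(natural)
```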

#### 2. NumPy Routines

As a scientific computing package, NumPy has several built-in routines/functions to aid mathematical and scientific computing. Some of the routines commonly used in Machine Learning are discussed below.

In [None]:
# NumPy allows us to concatenate or append different NumPy Arrays
a = np.random.randint(1, 11, size=(3, 3, 2))
b = np.random.randint(1, 11, size=(3, 3, 3))
c = np.ones((1, 3, 2), dtype="int32")
d = np.ones((3, 1, 2), dtype="int32")

print("'a':\n", a, "\n")
print("'b':\n", b, "\n")
print("'c':\n", c, "\n")
print("'d':\n", d, "\n")

# Let's concatenate 'a' and 'b' together along axis=2
print("Concatenate:\n", np.concatenate((a, b), axis=2), "\n")

# Let's append 'c' to 'a' vertically
print("Vertical Append:\n", np.vstack((a, c)), "\n") # try appending 'd' to 'a' vertically

# Let's append 'd' to 'a' horizontally
print("Horizontal Append:\n", np.hstack((a, d))) # try appending 'c' to 'a' horizontally
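
The shape rule behind these routines is that every axis except the one being joined must match. The sketch below reuses the same shapes as the cell above, with constant values so only the shapes matter.

```python
import numpy as np

a = np.zeros((3, 3, 2))
b = np.zeros((3, 3, 3))
c = np.ones((1, 3, 2))
d = np.ones((3, 1, 2))

print(np.concatenate((a, b), axis=2).shape)  # (3, 3, 5) -> joined along axis 2
print(np.vstack((a, c)).shape)               # (4, 3, 2) -> joined along axis 0
print(np.hstack((a, d)).shape)               # (3, 4, 2) -> joined along axis 1

# Stacking 'd' onto 'a' vertically fails: axis 1 has size 3 in 'a' but 1 in 'd'
try:
    np.vstack((a, d))
    vstack_failed = False
except ValueError as err:
    vstack_failed = True
    print("vstack((a, d)) failed:", err)
```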

In [None]:
# Let's create a random NumPy Array
numpy_array = np.random.randint(1, 11, size=(9))
print("Original Array Shape: ", numpy_array.shape)
print("Original Array: ", numpy_array, "\n")

# Using np.reshape() routine to reshape an array
numpy_array = numpy_array.reshape(3,3)
print("New Array Shape: ", numpy_array.shape)
print("New Array:\n", numpy_array)
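
When reshaping, one dimension can be given as `-1` and NumPy infers it from the total number of elements. A minimal sketch with an illustrative 12-element array:

```python
import numpy as np

flat = np.arange(12)

print(flat.reshape(3, -1).shape)  # (3, 4) -> 12 / 3 = 4 columns inferred
print(flat.reshape(-1, 6).shape)  # (2, 6) -> 12 / 6 = 2 rows inferred

# The total number of elements must stay the same, so this fails
try:
    flat.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape(5, -1) failed:", err)
```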

In [None]:
# We can also flatten matrices using ravel()
numpy_array = np.random.randint(1, 11, size=(24))
numpy_array = numpy_array.reshape(4,6)
print("Original Array Shape: ", numpy_array.shape)
print("Original Array:\n", numpy_array, "\n")

# Flattening an unflattened array
numpy_array = numpy_array.ravel()
print("Flattened Array Shape: ", numpy_array.shape)
print ("Flattened Array:\n", numpy_array)
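
A related method is `flatten()`. As a sketch of the difference: `ravel()` returns a view when the memory layout allows it (as it does for the contiguous array below), while `flatten()` always returns a copy.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

raveled = matrix.ravel()      # a view here, because the data is contiguous
flattened = matrix.flatten()  # always an independent copy

raveled[0] = 99               # writes through to 'matrix'
flattened[1] = -1             # does not affect 'matrix'

print(matrix[0, 0])  # 99
print(matrix[0, 1])  # 1
```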

In [None]:
# Other useful routines for data analysis using NumPy
numpy_array = np.random.randint(1, 11, size=(3, 4))

print(numpy_array, "\n")
print ("Sum of all Elements:", numpy_array.sum())
print("Smallest Element:", numpy_array.min())
print("Highest Element:", numpy_array.max())
print("Cumulative Sum of Elements:", numpy_array.cumsum())
print ("Column-wise Sum:", numpy_array.sum(axis=0))
print ("Row-wise Sum:",numpy_array.sum(axis=1))
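
Since the array above is random, here is the same kind of aggregation on a deterministic array (the name `grid` is illustrative) so the results can be checked by hand. The `axis` argument names the axis that is collapsed.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid.sum())         # 66
print(grid.sum(axis=0))   # [12 15 18 21] -> one sum per column
print(grid.sum(axis=1))   # [ 6 22 38]    -> one sum per row
print(grid.mean(axis=1))  # [1.5 5.5 9.5] -> one mean per row
print(grid.max(axis=0))   # [ 8  9 10 11] -> one maximum per column
```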

NumPy also supports matrix multiplication and other matrix manipulations.

In [None]:
# Dot product of two arrays
a = np.random.randint(1, 11, size=(3, 3))
b = np.random.randint(1, 11, size=(3, 3))

print("'a':\n", a, "\n")
print("'b':\n", b, "\n")

print("Dot Product of 'a' and 'b' (arrays):\n", np.dot(a, b))

In [None]:
# Matrix product of two "arrays"
a = np.random.randint(1, 11, size=(3, 4))
b = np.random.randint(1, 11, size=(4, 2))

print("'a':\n", a, "\n")
print("'b':\n", b, "\n")

print("Matrix Product of 'a' and 'b' (arrays):\n", np.matmul(a, b))
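
For 2-D arrays, `np.dot`, `np.matmul` and the `@` operator all compute the same matrix product, which is different from the element-wise `*`. A small sketch with fixed, illustrative values:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

product = a @ b        # matrix product, same as np.matmul(a, b)
print(product)         # rows: [19 22] and [43 50]

elementwise = a * b    # element-by-element product, not a matrix product
print(elementwise)     # rows: [ 5 12] and [21 32]

print(np.array_equal(product, np.matmul(a, b)))  # True
print(np.array_equal(product, np.dot(a, b)))     # True
```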

In [None]:
# Taking the transpose of an array (matrix)
a = np.random.randint(1, 11, size=(3, 4))
print("'a':\n", a, "\n")

# You can take transpose in two ways
print("'a' Transpose (using 'array.T'):\n", a.T, "\n")
print("'a' Transpose (using 'np.transpose()'):\n", np.transpose(a), "\n")
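
As a quick check of what the transpose does, the sketch below uses a deterministic array: the shape is reversed, element `[i, j]` moves to `[j, i]`, and the result is a view on the same data rather than a copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
a_t = a.T

print(a_t.shape)                 # (4, 3)
print(a_t[1, 2] == a[2, 1])      # True
print(np.shares_memory(a, a_t))  # True: no data is copied
```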

There are many more routines available in this package. To explore all the NumPy routines, refer to the [documentation](https://numpy.org/doc/1.17/reference/routines.html).

In [None]:
print("Thank you for Joining !")