# Introduction to NumPy

NumPy (Numerical Python) is a Python library for scientific computing built around n-dimensional arrays. The library contains sophisticated functions for high-performance computing and integrates easily with other programming languages such as C/C++ and Fortran. If you have the Anaconda distribution installed on your computer, you should already have NumPy. The convention for importing NumPy is as follows:

```python
import numpy as np
```

A good cheat sheet for NumPy basics can be found [HERE!](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf). NumPy reference documentation can be found [HERE!](https://docs.scipy.org/doc/numpy-1.13.0/reference/). The following sections introduce the NumPy functions you must know.

## Creating NumPy Arrays

With NumPy, you can create n-dimensional arrays called NumPy arrays. We can initialize NumPy arrays by calling the `np.array()` function with lists or tuples as arguments (I personally prefer list notation):


In [1]:
import numpy as np
a = np.array([1,2,3])   # 1d-array

b = np.array([[1,2,3],  # 2d-array
              [4,5,6]])

c = np.array([[[1,2,3],[4,5,6]], # 3d-array
              [[7,8,9], [10,11,12]]])

print("1d-array:\n", a, "\n")
print("2d-array:\n", b, "\n")
print("3d-array:\n", c, "\n")

1d-array:
 [1 2 3] 

2d-array:
 [[1 2 3]
 [4 5 6]] 

3d-array:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]] 
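
As a small supplementary sketch (the variable names are illustrative and not part of the original notebook), the snippet below shows that `np.array()` accepts tuples as well as lists, and how the element type is chosen when it is not given explicitly.

```python
import numpy as np

from_list = np.array([1, 2, 3])      # built from a list
from_tuple = np.array((1, 2, 3))     # built from a tuple: same result

# Mixing integers and floats upcasts every element to a float type
mixed = np.array([1, 2.5, 3])

# The dtype argument sets the element type explicitly
as_float = np.array([1, 2, 3], dtype=np.float32)

print(np.array_equal(from_list, from_tuple))
print(mixed.dtype)
print(as_float.dtype)
```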



## Creating Placeholder Arrays

Sometimes you do not know what the contents of an n-dim array will be, but you will want to create a NumPy array that is able to hold those values in the future. Or maybe you will want to create an n-dim array that will be used for some computation, e.g. an identity matrix or a matrix full of ones. Here are some NumPy functions to create these.  

In [2]:
# To create an array of zeros
zeros = np.zeros((3,3,3))       # args: dimensions
print("Shape:", zeros.shape)
print(zeros)

Shape: (3, 3, 3)
[[[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]


In [3]:
# To create an array of ones
ones = np.ones((3,2))          # args: dimensions
print("Shape:", ones.shape)
print(ones)

Shape: (3, 2)
[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [4]:
# creating an identity matrix
i = np.eye(3,3)
print("Shape:", i.shape)
print(i)

Shape: (3, 3)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [5]:
# create an array filled with the same constant (constant array)
c_arr = np.full((2,3), 7) # args: dimensions, constant
print("Shape:", c_arr.shape)
print(c_arr)

Shape: (2, 3)
[[7 7 7]
 [7 7 7]]
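
The idea of a placeholder array that is filled in later can be sketched as follows. This is a minimal illustration under assumed names (`squares`, `template`), not code from the original notebook: an array of zeros is allocated first and each slot is overwritten once its value is known.

```python
import numpy as np

n = 5
squares = np.zeros(n)            # placeholder: five zeros (float64 by default)

for k in range(n):
    squares[k] = k ** 2          # overwrite each slot once its value is known

# zeros_like copies the shape and dtype of an existing array
template = np.array([[1, 2, 3], [4, 5, 6]])
same_shape = np.zeros_like(template)

print(squares)
print(same_shape.shape, same_shape.dtype)
```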


In [6]:
# create an array of random decimal values from interval [0.0,1.0)
rnd = np.random.random((3,3)) # args: shape
print("Shape:", rnd.shape)
print(rnd)

Shape: (3, 3)
[[0.71482023 0.00813743 0.88184308]
 [0.91979624 0.44551849 0.05127343]
 [0.11567891 0.30076669 0.95039098]]


In [7]:
# create an array of evenly-spaced values (step value!)
by_steps = np.arange(10,100,5) # args: start, stop (exclusive), step
print("Shape:", by_steps.shape)
print(by_steps)

Shape: (18,)
[10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]


In [8]:
# create array of evenly-spaced values (number of samples!)
by_samples = np.linspace(0,10,100) # args: start, stop (inclusive), number of samples
print("Shape:", by_samples.shape)
print(by_samples)

Shape: (100,)
[ 0.          0.1010101   0.2020202   0.3030303   0.4040404   0.50505051
  0.60606061  0.70707071  0.80808081  0.90909091  1.01010101  1.11111111
  1.21212121  1.31313131  1.41414141  1.51515152  1.61616162  1.71717172
  1.81818182  1.91919192  2.02020202  2.12121212  2.22222222  2.32323232
  2.42424242  2.52525253  2.62626263  2.72727273  2.82828283  2.92929293
  3.03030303  3.13131313  3.23232323  3.33333333  3.43434343  3.53535354
  3.63636364  3.73737374  3.83838384  3.93939394  4.04040404  4.14141414
  4.24242424  4.34343434  4.44444444  4.54545455  4.64646465  4.74747475
  4.84848485  4.94949495  5.05050505  5.15151515  5.25252525  5.35353535
  5.45454545  5.55555556  5.65656566  5.75757576  5.85858586  5.95959596
  6.06060606  6.16161616  6.26262626  6.36363636  6.46464646  6.56565657
  6.66666667  6.76767677  6.86868687  6.96969697  7.07070707  7.17171717
  7.27272727  7.37373737  7.47474747  7.57575758  7.67676768  7.77777778
  7.87878788  7.97979798  8.08080808 
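
The difference between the two functions is easy to miss: `np.arange` excludes the stop value, while `np.linspace` includes it by default. A short sketch with small, made-up ranges so the results are easy to read:

```python
import numpy as np

steps = np.arange(0, 10, 2)        # step of 2, stop value 10 is excluded
samples = np.linspace(0, 10, 6)    # 6 samples, stop value 10 is included

print(steps)
print(samples)

# endpoint=False makes linspace exclude the stop value as well
open_ended = np.linspace(0, 10, 5, endpoint=False)
print(open_ended)
```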

Other important NumPy functions available for sampling random data, random permutations, and random distributions can be seen here: [Random sampling (numpy.random)](https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html)
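
As a hedged sketch of those random-sampling tools, the snippet below uses the `np.random.default_rng` generator interface, which assumes NumPy 1.17 or newer (the notebook itself uses the older `np.random.random`). The seed `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)          # 42 is an arbitrary seed

uniform = rng.random((2, 3))             # floats in [0.0, 1.0)
normal = rng.normal(loc=0.0, scale=1.0, size=4)   # samples from a normal distribution
shuffled = rng.permutation(5)            # random ordering of 0..4
dice = rng.integers(1, 7, size=10)       # integers from 1 to 6 (7 is excluded)

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random((2, 3))))
```

Seeding is useful when you want a notebook to print the same "random" values every time it is run.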


## Inspecting NumPy Arrays

Once you have a NumPy array, the next thing you will want to do is inspect it. We are primarily interested in the following properties of n-dim arrays:

* shape
* length (size of the first dimension)
* number of dimensions
* number of elements
* data type of elements


In [9]:
# create an array
arr = np.array([[[1,2],[3,4],[5,6]],[[7,8],[9,10],[11,12]]])
print("Array:\n", arr)
print("\nShape:", arr.shape)                            # shape
print("Length:", len(arr))                              # length
print("Dimensions:", arr.ndim)                          # dimension
print("Size:", arr.size)                                # size
print("Data type of array elems:", arr.dtype)           # data type
# Convert from int64 to int32
arr = arr.astype(np.int32)                              # data type conversion
print("New data type of array elems:", arr.dtype)


Array:
 [[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]

Shape: (2, 3, 2)
Length: 2
Dimensions: 3
Size: 12
Data type of array elems: int64
New data type of array elems: int32
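
A short sketch of why the data type matters, using an illustrative array that is not part of the original notebook: the element type determines how much memory the array uses, and converting between types can change the values.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
print(arr.itemsize)     # bytes per element
print(arr.nbytes)       # total bytes = size * itemsize

small = arr.astype(np.int32)    # astype returns a new array; arr is unchanged
print(small.nbytes)

# Converting floats to integers drops the fractional part
truncated = np.array([1.7, -1.7]).astype(np.int32)
print(truncated)
```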


## Mathematics: working with NumPy Arrays

We now get into the computational aspect of working with n-dimensional arrays. NumPy offers many easy-to-use functions for arithmetic and comparison operations.

### Arithmetic Operations
Let us initialize some NumPy arrays:

In [10]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[10,20,30],[40,50,60],[70,80,90]])

We can do arithmetic operations with NumPy arrays using the same operators we would use with integer or decimal scalars in standard Python. Each operation is applied element-wise.

In [11]:
# addition
print(a + b, "\n")

[[11 22 33]
 [44 55 66]
 [77 88 99]] 



In [12]:
# subtraction
print(b - a, "\n")

[[ 9 18 27]
 [36 45 54]
 [63 72 81]] 



In [13]:
# multiplication
print(a * b, "\n")

[[ 10  40  90]
 [160 250 360]
 [490 640 810]] 



In [14]:
# division
print(b / a, "\n")

[[10. 10. 10.]
 [10. 10. 10.]
 [10. 10. 10.]] 



In [15]:
# exponents
print(a**2, "\n")

[[ 1  4  9]
 [16 25 36]
 [49 64 81]] 
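
The same operators also work between an array and a scalar, or between arrays of compatible shapes (NumPy calls this broadcasting). A minimal sketch, with `row` as an illustrative name:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a + 10)        # the scalar is added to every element
print(a * 2)         # every element is doubled

row = np.array([100, 200, 300])
print(a + row)       # the 1D array is added to each row of a
```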



NumPy also has functions for these operations:
* np.add(a,b)
* np.subtract(b,a)
* np.multiply(a,b)
* np.divide(b,a)

These functions compute the same results as the operators (`+`, `-`, `*`, `/`) but accept extra arguments. You can check the documentation [HERE](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html).
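
Two of those extra arguments are `out` and `where`. The sketch below is illustrative only (the arrays `denom` and `safe` are made up for the example): `out` writes the result into an existing array, and `where` restricts the operation to the positions where a condition holds.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# out=: write the result into an existing array instead of allocating a new one
result = np.zeros_like(a)
np.add(a, b, out=result)
print(result)

# where=: only divide where the condition holds; other slots keep the value in out
denom = np.array([1, 0, 2])
safe = np.full(3, -1.0)
np.divide(np.array([10.0, 20.0, 30.0]), denom, out=safe, where=denom != 0)
print(safe)
```

Here the division by zero in the middle position is skipped, so that slot keeps its initial value of `-1.0`.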

Other mathematical functions of interest:

In [16]:
print("exp\n",np.exp(a), "\n")           # element-wise exponential (e**x)
print("sqrt\n",np.sqrt(a), "\n")         # element-wise square root
print("sine\n",np.sin(a), "\n")          # element-wise sine (np.arcsin(), np.cos(), np.tan(), etc)
print("natlog\n",np.log(a), "\n")        # element-wise natural log

exp
 [[2.71828183e+00 7.38905610e+00 2.00855369e+01]
 [5.45981500e+01 1.48413159e+02 4.03428793e+02]
 [1.09663316e+03 2.98095799e+03 8.10308393e+03]] 

sqrt
 [[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]
 [2.64575131 2.82842712 3.        ]] 

sine
 [[ 0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155 ]
 [ 0.6569866   0.98935825  0.41211849]] 

natlog
 [[0.         0.69314718 1.09861229]
 [1.38629436 1.60943791 1.79175947]
 [1.94591015 2.07944154 2.19722458]] 



In [17]:
# some linear algebra functions: inverse, dot product, and cross product

A = np.array([[4,7],[2,6]])         # 2 by 2 matrix
A_inv = np.linalg.inv(A)            # inverse matrix (A must be square and non-singular)

print("Inverse of A: \n ", A_inv, "\n")

I = np.dot(A, A_inv)                # matrix product. Shape rule: (n x m) . (m x p) -> (n x p)
print(I)


Inverse of A: 
  [[ 0.6 -0.7]
 [-0.2  0.4]] 

[[ 1.00000000e+00  0.00000000e+00]
 [-2.22044605e-16  1.00000000e+00]]
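
The tiny value `-2.22044605e-16` in the output is floating-point rounding error, not a mistake in the inverse. A small sketch of how one might check the result within a tolerance (the name `product` is illustrative):

```python
import numpy as np

A = np.array([[4, 7], [2, 6]])
A_inv = np.linalg.inv(A)

product = np.dot(A, A_inv)
print(np.array_equal(product, np.eye(2)))   # exact comparison can fail on rounding error
print(np.allclose(product, np.eye(2)))      # comparison within a small tolerance

# A non-zero determinant means the matrix has an inverse
print(np.linalg.det(A))
```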


In [18]:
# for the matrix product of 2D or nD arrays, np.matmul() (or the @ operator) is typically preferred
print(np.matmul(A, A_inv))

[[ 1.00000000e+00  0.00000000e+00]
 [-2.22044605e-16  1.00000000e+00]]
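
It is worth contrasting the matrix product with the element-wise `*` operator used earlier, since the two are easy to confuse. A minimal sketch using the `@` operator, which is shorthand for `np.matmul`:

```python
import numpy as np

A = np.array([[4, 7], [2, 6]])

elementwise = A * A          # multiplies matching entries
matrix_prod = A @ A          # matrix product, same as np.matmul(A, A)

print(elementwise)
print(matrix_prod)
```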


In [19]:
# cross product example
x = [1,2,3]
y = [4,5,6]
print(np.cross(x,y))

[-3  6 -3]
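
One way to sanity-check a cross product is to confirm that the result is perpendicular to both inputs, which means its dot product with each of them is zero. A short sketch (the name `z` is illustrative):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.cross(x, y)

print(np.dot(z, x))   # 0: z is perpendicular to x
print(np.dot(z, y))   # 0: z is perpendicular to y
```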


### Comparison Operations

Like in standard Python, we can use the comparison operators (`<`, `<=`, `>`, `>=`, `==`, `!=`) to do element-wise comparison between two n-dimensional arrays of the same shape.

In [20]:
# create arrays
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,8,7],[6,5,4],[3,2,1]])

print("a:\n", a)
print("b:\n", b)

a:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
b:
 [[9 8 7]
 [6 5 4]
 [3 2 1]]


In [21]:
# element-wise comparison
print("a == b: \n", a==b, "\n")
print("a == a.T: \n", a==a.T, "\n")  # a.T is the transpose of a 
print("a < b: \n", a < b, "\n")

a == b: 
 [[False False False]
 [False  True False]
 [False False False]] 

a == a.T: 
 [[ True False False]
 [False  True False]
 [False False  True]] 

a < b: 
 [[ True  True  True]
 [ True False False]
 [False False False]] 



In [22]:
# array-wise comparison
print("Are 'a' and 'b' equal? -> ", np.array_equal(a, b))
print("Are 'a' and 'a' equal? -> ", np.array_equal(a, a))

Are 'a' and 'b' equal? ->  False
Are 'a' and 'a' equal? ->  True
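
An element-wise comparison can also be reduced to a single answer with `.any()` and `.all()`. The sketch below additionally shows `np.allclose`, which is usually a safer choice than `np.array_equal` for floating-point values; the names `mask`, `x`, and `y` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

mask = a == b
print(mask.any())    # True: at least one position matches (the 5 in the middle)
print(mask.all())    # False: not every position matches
print(mask.sum())    # 1: True counts as 1, so this is the number of matches

# For floats, compare within a tolerance rather than exactly
x = np.array([0.1 + 0.2])
y = np.array([0.3])
print(np.array_equal(x, y), np.allclose(x, y))
```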


### Aggregate Functions

An aggregate function groups the values of multiple rows or columns together to form a single "summary" value. These kinds of functions are very useful for exploring datasets. Here are some common aggregate functions and their corresponding NumPy functions:

* sum -> np.sum()
* mean -> np.mean()
* standard deviation -> np.std()
* count (of non-zero elements) -> np.count_nonzero()
* maximum -> np.max()
* minimum -> np.min()


In [23]:
# initialize a 2D array of integers
data = np.array([[1,2,3],[4, 5, 6],[7, 8, 9]])
print(data)
# sum of all values in array
print("\nSum of all: \n", np.sum(data))

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Sum of all: 
 45


Notice that when only the array is passed as an argument, the function aggregates over all the elements in the array. However, you can also specify the dimension along which you want the aggregate function to operate (e.g. return the sum of each column or of each row).

To allow this, each function accepts an "axis" argument that indicates which dimension is collapsed by the aggregation. The axis value is an integer from zero to the number of dimensions of the array minus one.

In NumPy, the axis numbering is as follows for up to three dimensions:

![image.png](img/numpy_axis_def.png)

In [24]:
# sum by columns
print("sum by col: \n", np.sum(data, axis=0))
# sum by rows
print("sum by row: \n", np.sum(data, axis=1))

sum by col: 
 [12 15 18]
sum by row: 
 [ 6 15 24]


The same "axis" argument can be passed to the other aggregate functions listed above.
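
A brief sketch of the axis argument with other aggregates and with a 3D array. The array `cube` and the use of `keepdims` are additions for illustration and do not appear in the original notebook.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.mean(data, axis=0))    # mean of each column
print(np.max(data, axis=1))     # maximum of each row

# keepdims=True keeps the collapsed axis with length 1
print(np.sum(data, axis=1, keepdims=True).shape)

# With a 3D array, the chosen axis is the one that disappears from the shape
cube = np.arange(12).reshape(2, 3, 2)
for axis in range(cube.ndim):
    print(axis, np.sum(cube, axis=axis).shape)
```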

## Subsetting, Slicing, and Indexing

### Subsetting

In [25]:
# NOTE: Remember that indexing begins at zero and ends at n-1
# 1D arrays
a = np.array([1,2,3])
print(a[0])
print(a[1])
print(a[2])

1
2
3


In [26]:
# 2D arrays
b = np.array([[1,2,3],[4,5,6]])
print(b)
print(b[1,1])
print(b[1,2])

[[1 2 3]
 [4 5 6]]
5
6
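
A few more indexing forms, sketched on the same 2D array: negative indices count from the end, and a single index on a 2D array selects a whole row.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b[-1, -1])    # last row, last column
print(b[0])         # a single index on a 2D array returns the whole row
print(b[1][2])      # chained indexing gives the same element as b[1, 2]
```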


### Slicing
Slicing refers to subsetting more than one element from an n-dim array. Slicing NumPy arrays uses the same syntax as slicing standard list objects. It can be a little confusing, though, since we deal with indexes that start at zero and half-open intervals of the form `[start, finish)`. For example, if we ask for the elements in a dimension with the slice `[0:3]`, we are saying we want the elements from position `0` to `2` (3 minus 1).

In [27]:
# 1D array
print(a)
print(a[0:2])

[1 2 3]
[1 2]


In [28]:
# 2D array
print(b, "\n")
print(b[0:2,1:3], "\n")
print(b[:1])

[[1 2 3]
 [4 5 6]] 

[[2 3]
 [5 6]] 

[[1 2 3]]
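
Although the slicing syntax matches that of lists, there is one behavioral difference worth knowing: slicing a list makes a copy, while basic slicing of a NumPy array returns a view that shares memory with the original. A minimal sketch with illustrative names:

```python
import numpy as np

numbers = [1, 2, 3, 4]
list_slice = numbers[0:2]
list_slice[0] = 99
print(numbers)            # the original list is unchanged

arr = np.array([1, 2, 3, 4])
view = arr[0:2]           # basic slicing returns a view of the same data
view[0] = 99
print(arr)                # the original array is modified

independent = arr[0:2].copy()   # .copy() gives an independent array
independent[1] = -1
print(arr)                # unchanged by the edit to the copy

print(arr[::2])           # a third slice value is the step
```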


### Boolean Indexing

You can also select all the elements of an array that satisfy a condition:

In [29]:
# boolean indexing (`a` is re-created so that this cell runs on its own)
a = np.array([1,2,3])
print(a[a<=3])

[1 2 3]
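
A self-contained sketch of boolean indexing on a 2D array, including combined conditions and assignment through a mask. The array `data` and the thresholds are arbitrary choices for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

mask = data > 4                 # boolean array with the same shape as data
print(data[mask])               # selected elements come back as a 1D array

# Combine conditions with & (and) and | (or); the parentheses are required
print(data[(data > 2) & (data % 2 == 0)])

# A mask can also be used to assign to the selected elements
clipped = data.copy()
clipped[clipped > 6] = 6
print(clipped)
```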

# Important Linear Algebra 

## Don't forget to check out this [NumPy cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf) (there is more content on array manipulation there).