# Intro to NumPy
NumPy is the fundamental library for scientific computing in Python. It forms the basis for most of the important data science libraries, such as pandas and scikit-learn.

The main data structure that NumPy provides is the n-dimensional array object, or **`ndarray`**. An ndarray can have any number of dimensions. In data science we typically work with two-dimensional tabular data of rows and columns, so we will begin by creating a two-dimensional array of random values drawn from a normal distribution and doing some basic analysis on it.

In [None]:
import numpy as np
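
As a small illustration of "any number of dimensions", the sketch below builds one-, two- and three-dimensional arrays. The names `vector`, `table` and `cube` are hypothetical and used only for this example.

```python
import numpy as np

# Hypothetical example arrays, named only for illustration
vector = np.array([1, 2, 3])               # 1 dimension
table = np.array([[1, 2, 3], [4, 5, 6]])   # 2 dimensions: rows and columns
cube = np.zeros((2, 3, 4))                 # 3 dimensions

for name, arr in [("vector", vector), ("table", table), ("cube", cube)]:
    print(name, "has", arr.ndim, "dimension(s) and shape", arr.shape)
```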

## Create first array
To get started, we will create an array of numbers drawn from a normal distribution with mean 0 and standard deviation 1.

In [None]:
np.random.seed(123)
a = np.random.randn(30, 7)
a = a.round(2)
a
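
For reference, newer NumPy code often uses the `Generator` interface instead of the global seed. The sketch below is an alternative, not the cell above rewritten: the names `rng`, `b` and `b_again` are hypothetical, and the values it produces differ from those of `np.random.seed(123)` followed by `np.random.randn`.

```python
import numpy as np

# Generator-based alternative to np.random.seed + np.random.randn
rng = np.random.default_rng(123)
b = rng.standard_normal((30, 7)).round(2)

# A second generator created with the same seed reproduces the same values
b_again = np.random.default_rng(123).standard_normal((30, 7)).round(2)

print(b.shape)
print(np.array_equal(b, b_again))
```

Keeping the generator in a local variable avoids relying on global random state, which can matter when several parts of a program draw random numbers.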

### Accessing elements
In native Python, the indexing operator, the brackets **`[]`**, selects items from a container. It is most commonly used with tuples, lists and dictionaries. ndarrays use the same operator for selection.

To select a single element, simply place the index of the row and column inside the brackets separated by a comma.

`array[row_selection, column_selection]`

In [None]:
a[10, 3]

In [None]:
# indexing is 0-based: this is the first row and first column
a[0, 0]
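
To make the `[row, column]` convention concrete, here is a minimal sketch on a small array whose values are easy to locate. The names `small` and `nested` are hypothetical.

```python
import numpy as np

# Small hypothetical array so each value is easy to locate
small = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first = small[0, 0]      # row 0, column 0
middle = small[1, 2]     # row 1, column 2
last = small[-1, -1]     # negative indices count from the end

nested = small.tolist()  # the same data as a list of lists
same = nested[1][2]      # lists need one pair of brackets per level

print(first, middle, last, same)
```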

### Use the colon, `:`, to select all elements of that dimension.

In [None]:
# select all the rows of the column at index 5 (the 6th column)
a[:, 5]
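
The following sketch shows what a bare colon returns on a small example array. The names `small`, `col`, `row` and `col_2d` are hypothetical.

```python
import numpy as np

small = np.arange(12).reshape(3, 4)

col = small[:, 1]        # every row, column index 1 -> 1-D, shape (3,)
row = small[2, :]        # row index 2, every column -> 1-D, shape (4,)
col_2d = small[:, 1:2]   # a slice instead of an integer keeps 2-D, shape (3, 1)

print(col, row, col_2d.shape)
```

Selecting with an integer drops that dimension, while selecting with a slice keeps it, which is worth remembering when a later step expects a two-dimensional array.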

### Use slice notation to select particular subsets of data
Slice notation takes the form `start:stop:step`. The `start` index is included and the `stop` index is excluded.

In [None]:
# Use slice notation to select a block of data
a[5:10, 2:5]

In [None]:
# start:stop:step notation: rows 3, 8, 13 and every second column
a[3:18:5, ::2]
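
To see exactly which positions `start:stop:step` picks, the sketch below slices arrays whose values equal their own indices. All names here (`idx`, `grid`, `block`, and so on) are hypothetical.

```python
import numpy as np

idx = np.arange(20)
picked = idx[3:18:5]         # start at 3, stop before 18, step by 5
evens = np.arange(7)[::2]    # omitted start and stop mean the whole axis

grid = np.arange(20 * 7).reshape(20, 7)
block = grid[3:18:5, ::2]    # rows 3, 8, 13 and columns 0, 2, 4, 6

# Basic slices are views: writing through the slice changes the original
block[0, 0] = -1

print(picked, evens, block.shape, grid[3, 0])
```

Because a slice is a view rather than a copy, call `.copy()` on it first if the original array must stay unchanged.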

## Arithmetic operations on the entire array
Applying an arithmetic operation to an entire array is easy and looks just as it would in mathematical notation. The same operations on Python lists require an explicit loop or list comprehension.

In [None]:
# multiply each element by 5
a * 5

In [None]:
# subtract 3
a - 3
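
The contrast with Python lists can be sketched as follows. The names `values` and `arr` are hypothetical.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

repeated = values * 2                    # list: * means repetition
doubled_list = [v * 2 for v in values]   # list: element-wise needs a loop
doubled_arr = arr * 2                    # array: element-wise multiplication
shifted_arr = arr - 3                    # array: element-wise subtraction
# values - 3 would raise a TypeError, since lists do not define subtraction

print(repeated, doubled_list, doubled_arr, shifted_arr)
```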

## Vectorized Operations
NumPy is fast by Python standards because vectorized operations, which act on a whole array at once, execute in pre-compiled C and Fortran code that is highly optimized for scientific computing instead of looping in the Python interpreter.

In [None]:
# select the first column (every row of column 0)
row = a[:, 0]
some_list = list(row)

In [None]:
[x + 1 for x in some_list]

In [None]:
row + 1

In [None]:
%timeit -n 5 [x + 1 for x in some_list]

In [None]:
%timeit -n 5 row + 1
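
With only 30 elements, fixed call overhead can dominate both timings. The sketch below repeats the comparison on a larger array using the standard-library `timeit` module, so it also runs outside IPython. The array size and the names `big` and `big_list` are assumptions chosen for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

big = np.random.default_rng(0).standard_normal(100_000)
big_list = big.tolist()   # plain Python floats

# Time 20 runs of each approach
loop_time = timeit.timeit(lambda: [x + 1 for x in big_list], number=20)
vec_time = timeit.timeit(lambda: big + 1, number=20)

print(f"list comprehension: {loop_time:.4f} s")
print(f"vectorized add:     {vec_time:.4f} s")
```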

## Array attributes and methods
Much of the functionality of NumPy arrays is accessible through methods, which are called with dot notation. Arrays also have a few useful attributes, which are accessed without parentheses.

In [None]:
# get dimensions
a.shape

In [None]:
# get number of dimensions
a.ndim

In [None]:
# total number of elements
a.size

In [None]:
# Transpose array
a.T
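
These attributes are related to one another, as the sketch below shows on a stand-in array with the same shape as the one used here. The name `m` is hypothetical.

```python
import numpy as np

m = np.zeros((30, 7))   # stand-in with 30 rows and 7 columns

n_dims = m.ndim                       # length of the shape tuple
n_items = m.size                      # product of the entries in shape
transposed_shape = m.T.shape          # transposing reverses the shape

print(m.shape, n_dims, n_items, transposed_shape)
```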

### Array Statistical Methods
A number of common statistical methods are available:

In [None]:
a.max()

In [None]:
a.min()

In [None]:
a.sum()

In [None]:
a.mean()

In [None]:
a.std()

In [None]:
a.var()
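
One detail worth knowing: `std` and `var` divide by `n` by default (the population formulas). Passing `ddof=1` divides by `n - 1`, giving the sample versions. A minimal sketch on hypothetical data:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_std = data.std()             # ddof=0 (default): divide by n
sample_std = data.std(ddof=1)    # ddof=1: divide by n - 1

print(pop_std, sample_std)
```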

### Reshaping methods

In [None]:
# make a single dimension
a.flatten()

In [None]:
# reshape - pass a tuple of new shape
a.reshape((42, 5))
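
A reshape only works when the new shape holds the same number of elements (here 30 × 7 = 42 × 5 = 210). The sketch below shows that constraint, the `-1` shorthand, and the fact that `flatten` returns a copy. The names `m`, `auto` and `flat_copy` are hypothetical.

```python
import numpy as np

m = np.arange(210).reshape(30, 7)

auto = m.reshape(-1, 5)   # -1 lets NumPy infer the row count: 210 / 5 = 42

flat_copy = m.flatten()   # flatten always returns a copy
flat_copy[0] = 999        # so the original array is untouched

try:
    m.reshape(40, 5)      # 40 * 5 != 210
except ValueError as err:
    print("reshape failed:", err)

print(auto.shape, m[0, 0], flat_copy[0])
```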

### Use the `axis` parameter to apply a method in a single direction

In [None]:
# take max of each column
a.max(axis=0)

In [None]:
# take max of each row
a.max(axis=1)

In [None]:
a.sum(axis=0)

![](images/numpy_axis.png)
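
One way to read `axis` is as the dimension that gets collapsed. A small sketch with hypothetical names:

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_sums = small.sum(axis=0)   # collapse the rows -> one value per column
row_sums = small.sum(axis=1)   # collapse the columns -> one value per row
total = small.sum()            # no axis -> a single value for the whole array

print(col_sums, row_sums, total)
```

For the 30 by 7 array used here, `axis=0` therefore returns 7 values and `axis=1` returns 30.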

### NumPy functions on arrays
Not all functionality is available as array methods. NumPy provides more functionality with its functions. These are accessed with **`np.`** followed by the function name.

In [None]:
# absolute value. There is no abs method
np.abs(a)

In [None]:
# take the square root of the absolute value and then round
np.sqrt(np.abs(a)).round(2)

In [None]:
# some functions do the same things as methods
np.sum(a)

In [None]:
# sort defaults to the last axis, so the values within each row are sorted
np.sort(a)
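
The effect of the `axis` argument on `np.sort` can be sketched on a small hypothetical array `x`:

```python
import numpy as np

x = np.array([[3, 1, 2],
              [9, 7, 8],
              [6, 4, 5]])

by_row = np.sort(x)                  # default axis=-1: sort within each row
by_col = np.sort(x, axis=0)          # sort within each column
all_sorted = np.sort(x, axis=None)   # flatten first, then sort

print(by_row)
print(by_col)
print(all_sorted)
```

`np.sort` returns a sorted copy and leaves `x` unchanged, whereas the method form `x.sort()` sorts in place and returns `None`.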

## Comparison operators
The six comparison operators `<`, `>`, `<=`, `>=`, `==` and `!=` are applied element-wise and return a boolean array of the same shape.

In [None]:
a > 0

In [None]:
# find out how many values are greater than 0
np.sum(a > 0)

In [None]:
# find the fraction of values greater than 0 (multiply by 100 for a percentage)
np.mean(a > 0)

In [None]:
# boolean mask marking the values between -2 and 2
(a > -2) & (a < 2)

In [None]:
# this should be about 95%
((a > -2) & (a < 2)).mean()
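
The pieces above can be combined into one small sketch: build a mask, count it, average it, and use it to filter. The names `vals`, `mask` and `inside` are hypothetical.

```python
import numpy as np

vals = np.array([-2.5, -1.0, 0.0, 0.5, 1.5, 3.0])

# Parentheses are required because & binds tighter than < and >
mask = (vals > -2) & (vals < 2)

count = mask.sum()       # True counts as 1, False as 0
fraction = mask.mean()   # count divided by the number of elements
inside = vals[mask]      # boolean indexing keeps only the True positions

print(mask, count, fraction, inside)
```

Use `&` and `|` for element-wise "and" and "or" on arrays. Python's `and` and `or` raise a `ValueError` when applied to arrays with more than one element.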

### Resources
+ [NumPy's own tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
+ [Datacamp NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)