# What is NumPy?

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

NumPy is well suited to vector arithmetic. Compared with regular Python lists, however, a few things behave differently.

First of all, NumPy arrays cannot contain elements with different types. If you try to build such an array, some of the elements' types are changed so that the array ends up homogeneous. This is known as type coercion. For example, when booleans are mixed with numbers, True is converted to 1 and False is converted to 0.

Second, the typical arithmetic operators, such as +, -, * and /, have a different meaning for regular Python lists and NumPy arrays. On arrays they operate element by element; on lists, + concatenates and * repeats, while - and / are not defined.
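
The two differences can be sketched with a few made-up values. The names `mixed`, `py_list`, and `np_arr` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Type coercion: a float, an int, and a bool end up as one common type
mixed = np.array([1.5, 2, True])
print(mixed)        # True is stored as 1.0
print(mixed.dtype)

# Operators: + concatenates lists but adds arrays element by element
py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])
print(py_list + py_list)
print(np_arr + np_arr)
```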

## Create a numpy array


In [1]:
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np
import numpy as np

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

A major advantage of a NumPy array over a Python list is that a calculation can be applied to the whole array at once, without writing a loop, and it runs quickly.



In [2]:
# Create list height_in (heights in inches)
height_in = [74, 74, 72]

# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)

# Print out np_height_in
print(np_height_in)

# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254

# Print np_height_m
print(np_height_m)


[74 74 72]
[1.8796 1.8796 1.8288]


### Create a boolean array

Comparing a NumPy array with a value evaluates the condition for each item and returns a boolean array. That boolean array can then be used to select the matching items.



In [5]:
# BMI values as a numpy array
bmi = np.array([23.11037639, 27.60406069, 28.48080465, 25.62295933,
                20.54255679, 20.54255679, 20.69282047, 20.69282047,
                20.34343189, 20.34343189, 20.69282047])

# Create the boolean array light: True where the BMI is below 21
light = bmi < 21

# Print out light: one True/False value per element
print(light)

# Print out BMIs of all baseball players whose BMI is below 21
print(bmi[light])

[False False False False  True  True  True  True  True  True  True]
[20.54255679 20.54255679 20.69282047 20.69282047 20.34343189 20.34343189
 20.69282047]
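
Because a boolean array is itself an array, it can also be counted and combined with other conditions. The sketch below uses a shortened, rounded set of BMI values; `bmi_sample`, `n_light`, and `mid_range` are illustrative names.

```python
import numpy as np

# Rounded sample values, for illustration only
bmi_sample = np.array([23.11, 27.60, 28.48, 25.62, 20.54, 20.69, 20.34])

light = bmi_sample < 21

# True counts as 1, so summing a boolean array counts the matches
n_light = int(light.sum())
print(n_light)

# Combine element-wise conditions with & (and) or | (or);
# the parentheses around each condition are required
mid_range = (bmi_sample > 21) & (bmi_sample < 26)
print(bmi_sample[mid_range])
```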


### Subsetting NumPy Arrays

You've seen it with your own eyes: Python lists and NumPy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, basic subsetting (using the square bracket notation with an index or a slice) selects the same elements from a list as from an array. To see this for yourself, try the following lines of code in the IPython Shell:

In [9]:
x = ["a", "b", "c"]
x[1]

np_x = np.array(x)
np_x[1]

'b'
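
The same holds for negative indices and slices. This is a small sketch with a hypothetical four-element list; `letters` and `np_letters` are illustrative names.

```python
import numpy as np

letters = ["a", "b", "c", "d"]
np_letters = np.array(letters)

# Negative indices count from the end in both cases
print(letters[-1], np_letters[-1])

# Slices pick the same elements; a list slice is a list, an array slice is an array
print(letters[1:3])
print(np_letters[1:3])
```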

### 2D arrays

You might occasionally hear an array referred to as an “ndarray,” which is shorthand for “N-dimensional array.” An N-dimensional array is simply an array with any number of dimensions.

You might also hear the terms 1-D (one-dimensional) array, 2-D (two-dimensional) array, and so on. The NumPy ndarray class is used to represent both matrices and vectors.

A vector is an array with a single dimension (there’s no difference between row and column vectors), while a matrix refers to an array with two dimensions.

In [14]:
x = np.array([[1, 2, 3], [4, 5, 6]])

print(type(x))

# The shape attribute shows the number of rows and columns in the matrix
print(x.shape)
 

<class 'numpy.ndarray'>
(2, 3)
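
To make the vector/matrix distinction concrete, the sketch below compares the `ndim`, `shape`, and `size` attributes of a 1-D and a 2-D array. The names `vector` and `matrix` are illustrative.

```python
import numpy as np

vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# ndim: number of dimensions
# shape: length along each dimension
# size: total number of elements
print(vector.ndim, vector.shape, vector.size)
print(matrix.ndim, matrix.shape, matrix.size)
```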


#### Subsetting 2D NumPy Arrays

If your 2D NumPy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy.

With regular Python lists, selecting a column or a block of values is a real pain. For 2D NumPy arrays, however, it's pretty intuitive! The indexes before the comma refer to the rows, while those after the comma refer to the columns. The : is for slicing; in x[:, 0], for example, it tells NumPy to include all rows.

Remember, the index starts at 0.


In [11]:
x[0][2]

3

In [12]:
x[0,2]

3

In [17]:
# Select the entire first column of x

np_first_column = x[:,0]

np_first_column

array([1, 4])
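
For comparison, the sketch below selects a column from a nested list and then takes a row and a block from the equivalent 2D array. The names `nested`, `list_first_column`, `second_row`, and `last_two_cols` are illustrative.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
x = np.array(nested)

# With a nested list, a column has to be collected row by row
list_first_column = [row[0] for row in nested]

# With a 2D array, one index expression is enough
second_row = x[1, :]       # row 1, all columns
last_two_cols = x[:, 1:]   # all rows, columns 1 and 2
print(list_first_column)
print(second_row)
print(last_two_cols)
```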

#### 2D Arithmetic

You can combine matrices with single numbers, with vectors, and with other matrices.

Execute the code below in the IPython shell and see if you understand:


In [18]:
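np_mat = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
# Only the value of the last expression is displayed as the cell's output
np_mat * 2                    # multiply every element by 2
np_mat + np.array([10, 10])   # add the vector to every row
np_mat + np_mat               # add two matrices element by element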

array([[ 2,  4],
       [ 6,  8],
       [10, 12]])
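
Since a notebook cell displays only its last expression, the sketch below stores and prints each result separately. It uses the vector [10, 20] instead of [10, 10] so that the row-by-row addition is easier to see; `doubled`, `shifted`, and `summed` are illustrative names.

```python
import numpy as np

np_mat = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

doubled = np_mat * 2                     # scalar applied to every element
shifted = np_mat + np.array([10, 20])    # vector added to every row
summed = np_mat + np_mat                 # element-wise sum of two equal-shaped matrices
print(doubled)
print(shifted)
print(summed)
```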

#### Summary statistics 

It's always a good idea to check both the median and the mean, to get an idea about the overall distribution of the entire dataset.

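x = [1, 4, 8, 10, 12]
np.mean(x)     # arithmetic mean
np.median(x)   # middle value of the sorted data
np.std(x)      # standard deviation

# np.corrcoef needs two sequences of equal length, not two single values.
# y is a hypothetical second variable added here for illustration.
y = [2, 5, 7, 11, 13]
np.corrcoef(x, y)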
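
One reason to check both statistics is that the mean is sensitive to outliers while the median is not. This is a minimal sketch with hypothetical heights in inches, where the last value stands in for a data-entry error.

```python
import numpy as np

# Hypothetical data: the last value is an obvious outlier
heights = np.array([70, 72, 73, 74, 75, 740])

mean_height = np.mean(heights)      # pulled far upward by the outlier
median_height = np.median(heights)  # stays close to the typical value
print(mean_height)
print(median_height)
```

A large gap between the two values is a hint to inspect the data before trusting either one.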