# Numpy Introduction

Numpy is a Python library for working with arrays. It is particularly useful for linear algebra, Fourier transforms, and random number generation. Numpy also has a C API, which means that it is possible to write fast code in C and use it from Python through Numpy.

## 1.0 Installing and importing numpy

The numpy package comes preinstalled with the Anaconda distribution, so you do not need to install it; you only need to import it. (Outside Anaconda, it can be installed with `pip install numpy`.)

In [65]:
import numpy # this is one way to import

In [66]:
numpy.asarray([1,2,3])

array([1, 2, 3])

In [67]:
# but, you can also rename a package (to a shorter name) when importing

import numpy as np # this is a very common way to import numpy

np.asarray([2,3,4])

array([2, 3, 4])

## 2.0 Numpy basics

### 2.1 Lists versus numpy array

In [68]:
# Here are some regular lists...
height = [1.75, 1.34, 1.56, 1.54, 1.48]
weight = [65.4, 56.7, 73.8, 65.4, 49.8]

Numpy is a package commonly used in data science. It makes vector and matrix calculations easy and, more generally, lets you perform operations quickly over large arrays of data.

In [69]:
height_np = np.asarray(height)
weight_np = np.asarray(weight)

In [70]:
bmi = weight_np / height_np ** 2
print(bmi)

[21.35510204 31.57718868 30.32544379 27.57631978 22.73557341]


In [71]:
# notice that we can't calculate bmi this easily with plain lists
# uncommenting the two lines below raises a TypeError (to demonstrate how lists and numpy arrays differ)
#bmi = weight / height ** 2
#print(bmi)
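For comparison, here is a minimal sketch of what the BMI calculation takes with plain lists: an explicit loop (a list comprehension over `zip`) instead of one array expression. The variable names `bmi_list` and `bmi_np` are illustrative only.

```python
import numpy as np

height = [1.75, 1.34, 1.56, 1.54, 1.48]
weight = [65.4, 56.7, 73.8, 65.4, 49.8]

# With lists, pair each weight with its height and compute BMI one element at a time
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# With numpy arrays, the same computation is a single element-wise expression
bmi_np = np.asarray(weight) / np.asarray(height) ** 2

print(np.allclose(bmi_list, bmi_np))  # True
```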

### 2.2 Operations on numpy arrays

In [72]:
np_ra1 = np.array([1,2,3,4])

In [73]:
np_ra1 * 2 # multiply all elements of the array by 2

array([2, 4, 6, 8])

In [74]:
np_ra1 ** 2 # take each element in the array and square it

array([ 1,  4,  9, 16])

In [75]:
np_ra2 = np.array([3,4,5, 8])

In [76]:
np_ra1 * np_ra2 # we can even multiply (or add, subtract, divide) two arrays element-wise

array([ 3,  8, 15, 32])

In [77]:
np_ra1 / np_ra2 

array([0.33333333, 0.5       , 0.6       , 0.5       ])

In [78]:
np_ra1 + np_ra2 

array([ 4,  6,  8, 12])

In [79]:
np_ra1 - np_ra2 

array([-2, -2, -2, -4])
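The element-wise operations above pair up elements position by position, so the arrays need compatible shapes. A scalar is stretched ("broadcast") across every element, but two 1-D arrays of different lengths cannot be combined. A short sketch (the flag `raised` is just for illustration):

```python
import numpy as np

np_ra1 = np.array([1, 2, 3, 4])

# A scalar is broadcast across every element of the array
print(np_ra1 + 10)   # [11 12 13 14]

# Arrays of incompatible lengths (4 vs 3) cannot be combined element-wise
raised = False
try:
    np_ra1 + np.array([1, 2, 3])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```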

### 2.3 Numpy arrays only contain one type

In [80]:
np.array([1.0, "is", True]) # np will turn all of these into a string

array(['1.0', 'is', 'True'], dtype='<U32')

In [81]:
np.array([1.0, 2, 3.9]) # these will all become float

array([1. , 2. , 3.9])

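Notice that Numpy picks a single common data type (dtype) that all of the elements can be converted to: a float type when integers and floats are mixed, and a string type as soon as a string is present.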
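The chosen type can be inspected with the `.dtype` attribute. The sketch below also shows two ways to control the type yourself: the `dtype=` argument of `np.array` and the `.astype()` method (converting floats to integers this way drops the fractional part).

```python
import numpy as np

mixed_num = np.array([1.0, 2, 3.9])      # ints and floats -> float
mixed_str = np.array([1.0, "is", True])  # numbers/booleans mixed with a string -> string

# .dtype reports the single element type numpy chose for the whole array
print(mixed_num.dtype)   # float64
print(mixed_str.dtype)   # a Unicode string dtype (shown as <U32 above)

# Request a dtype explicitly, or convert an existing array
as_float = np.array([1, 2, 3], dtype=float)  # [1. 2. 3.]
truncated = mixed_num.astype(int)            # fractional part dropped: [1 2 3]
print(as_float, truncated)
```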

### 2.4 NumPy Subsetting

In [82]:
bmi

array([21.35510204, 31.57718868, 30.32544379, 27.57631978, 22.73557341])

In [83]:
bmi > 23

array([False,  True,  True,  True, False])

In [84]:
bmi[bmi > 23]

array([31.57718868, 30.32544379, 27.57631978])
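A boolean array like `bmi > 23` is useful beyond subsetting. Because `True` counts as 1, summing it counts the matches; `np.nonzero` returns the positions of the `True` values; and `np.where` picks between two values element by element. The labels `"high"` and `"ok"` are arbitrary examples.

```python
import numpy as np

bmi = np.array([21.35510204, 31.57718868, 30.32544379, 27.57631978, 22.73557341])

mask = bmi > 23
print(mask.sum())                # 3 (number of True values)
print(np.nonzero(mask)[0])       # [1 2 3] (positions of the True values)

# Choose "high" where the mask is True and "ok" where it is False
labels = np.where(mask, "high", "ok")
print(labels)
```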

## 3.0 Multidimensional Numpy arrays

In [85]:
np_2d = np.array([np.asarray(height), np.asarray(weight)])
np_2d

array([[ 1.75,  1.34,  1.56,  1.54,  1.48],
       [65.4 , 56.7 , 73.8 , 65.4 , 49.8 ]])

In [86]:
np_2d.shape  # NOTICE that this isn't a method, rather it's an attribute of the np_2d object

(2, 5)

In [87]:
# selecting first row, 3rd element (this is the same as lists!)
np_2d[0][2]

1.56

In [88]:
# BUT, with numpy arrays we can also do...
np_2d[0,2] # once you get used to this approach, it's more intuitive and easier than the list approach

1.56

In [89]:
np_2d[:,1:3] # select the columns at index 1 and 2 for all rows

array([[ 1.34,  1.56],
       [56.7 , 73.8 ]])

In [90]:
np_2d[1,:] # select all the weights

array([65.4, 56.7, 73.8, 65.4, 49.8])
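Boolean subsetting from Section 2.4 combines with 2-D indexing. In this sketch, a mask built from the height row selects whole columns, i.e. every measurement for the people taller than 1.5 m (the 1.5 threshold is an arbitrary example):

```python
import numpy as np

height = [1.75, 1.34, 1.56, 1.54, 1.48]
weight = [65.4, 56.7, 73.8, 65.4, 49.8]
np_2d = np.array([height, weight])

# Row 0 holds the heights; compare it to build a mask over the columns
tall = np_2d[0, :] > 1.5        # [ True False  True  True False]

# Keep every row, but only the columns where the mask is True
tall_people = np_2d[:, tall]
print(tall_people.shape)        # (2, 3)
```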

### 3.1 Numpy Basic Stats

In [91]:
np_height = np.array(height)
np_weight = np.array(weight)

In [92]:
np_height.mean()

1.534

In [93]:
np_height.std()

0.13260467563400619

In [94]:
# np_height.median() # note that there isn't a median method... 

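Instead, use the Numpy function `np.median()`. Functions such as `np.mean()` and `np.std()` also exist alongside the corresponding array methods: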

In [95]:
np.median(np_height)

1.54

In [96]:
np.mean(np_height)

1.534

In [97]:
np.corrcoef(np_height, np_weight)

array([[1.       , 0.5034277],
       [0.5034277, 1.       ]])

In [98]:
np.std(np_height)

0.13260467563400619

In [99]:
np.var(np_height)

0.017583999999999995

In [100]:
np.sum(np_height)

7.67
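Two parameters are worth knowing for these functions. `np.std` and `np.var` divide by n by default (population statistics, `ddof=0`); passing `ddof=1` divides by n - 1 instead (sample statistics). On a 2-D array, `axis=1` computes one statistic per row. A sketch on the same height and weight data:

```python
import numpy as np

np_height = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
np_weight = np.array([65.4, 56.7, 73.8, 65.4, 49.8])

# Default ddof=0: sum of squared deviations (0.08792) divided by n = 5
pop_var = np.var(np_height)              # 0.017584, as above
# ddof=1: the same sum divided by n - 1 = 4
sample_var = np.var(np_height, ddof=1)   # 0.02198

# axis=1 on a 2-D array: one mean per row -> [mean height, mean weight]
np_2d = np.array([np_height, np_weight])
row_means = np_2d.mean(axis=1)           # [1.534, 62.22]
print(pop_var, sample_var, row_means)
```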

In [101]:
# here, we generate 20 heights and 20 weights by randomly sampling from normal distributions, plus 20 random eye colors
np_height = np.round(np.random.normal(1.75, 0.2, 20), 2)
np_weight = np.round(np.random.normal(60.32, 15, 20), 2)
np_eye_color = np.random.choice(["blue", "green", "brown", "black"], 20)
np_people = np.column_stack((np_height, np_weight, np_eye_color))
np_people

array([['1.94', '57.8', 'green'],
       ['1.66', '41.26', 'green'],
       ['1.58', '75.54', 'brown'],
       ['1.7', '50.07', 'green'],
       ['1.59', '82.32', 'black'],
       ['1.73', '54.82', 'brown'],
       ['1.39', '27.71', 'blue'],
       ['2.09', '65.88', 'black'],
       ['1.51', '50.19', 'black'],
       ['2.15', '39.73', 'green'],
       ['1.53', '70.51', 'brown'],
       ['1.53', '52.53', 'blue'],
       ['1.73', '73.46', 'green'],
       ['1.62', '57.94', 'blue'],
       ['1.84', '61.54', 'black'],
       ['1.4', '62.97', 'brown'],
       ['1.61', '92.94', 'blue'],
       ['1.56', '26.64', 'blue'],
       ['1.73', '61.98', 'blue'],
       ['1.5', '53.97', 'brown']], dtype='<U32')
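Two things about the cell above are worth noting. The `dtype='<U32'` shows that `np.column_stack` turned every value into a string (see Section 2.3), so the heights and weights can no longer be used in arithmetic. Also, the numbers change on every run. A sketch that addresses both, assuming NumPy 1.17 or later for `np.random.default_rng` (the seed `42` is arbitrary):

```python
import numpy as np

# A seeded generator produces the same "random" values on every run
rng = np.random.default_rng(seed=42)

np_height = np.round(rng.normal(1.75, 0.2, 20), 2)
np_weight = np.round(rng.normal(60.32, 15, 20), 2)
np_eye_color = rng.choice(["blue", "green", "brown", "black"], 20)

# Keep the numeric columns together in a float array (shape (20, 2))...
np_measurements = np.column_stack((np_height, np_weight))
# ...so arithmetic still works, e.g. mean height and mean weight
print(np_measurements.mean(axis=0))
```

The eye colors stay in their own string array; the same index refers to the same person in both arrays.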

## 4.0 Comparison operators and logical operations on numpy arrays

### 4.1 Comparison Operators

In [103]:
bmi > 25

array([False,  True,  True,  True, False])

In [104]:
bmi[bmi > 25]

array([31.57718868, 30.32544379, 27.57631978])

In [124]:
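# this will cause an error (uncomment to see it)...
# bmi > 21 and bmi < 25
# `and` needs a single True/False, so numpy raises a ValueError ("truth value ... is ambiguous")
# you will see how we can do this in the next section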

### 4.2 Logical Operations

Python's `and`, `or`, and `not` keywords can't be applied element-wise to numpy arrays, but numpy has a few functions that can be used instead:

- `np.logical_and()`
- `np.logical_or()`
- `np.logical_not()`
- `np.logical_xor()`

In [106]:
np.logical_and(bmi > 21, bmi < 25)

array([ True, False, False, False,  True])

In [107]:
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
print(np.logical_and(a, b))
print(np.logical_or(a, b))
print(np.logical_not(a))
print(np.logical_xor(a, b))

[ True False False False]
[ True  True  True False]
[False False  True  True]
[False  True  True False]
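NumPy also overloads the operators `&` (and), `|` (or), and `~` (not) to work element-wise on boolean arrays, which is a common shorthand for the functions above. The parentheses around each comparison are required, because `&` and `|` bind more tightly than `>` and `<`:

```python
import numpy as np

bmi = np.array([21.35510204, 31.57718868, 30.32544379, 27.57631978, 22.73557341])

in_range = (bmi > 21) & (bmi < 25)   # same as np.logical_and(bmi > 21, bmi < 25)
outside = (bmi < 21) | (bmi > 25)    # same as np.logical_or(...)
not_in_range = ~in_range             # same as np.logical_not(in_range)

print(in_range)                      # [ True False False False  True]
print(np.array_equal(in_range, np.logical_and(bmi > 21, bmi < 25)))  # True
```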


## 5.0 Looping over NumPy arrays

In [108]:
heights = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
weights = np.array([65.4, 56.7, 73.8, 65.4, 49.8])

In [109]:
for height in heights:
    print(height)

1.75
1.34
1.56
1.54
1.48


In [110]:
meas = np.array([heights, weights])
for val in meas:
    print(val)

[1.75 1.34 1.56 1.54 1.48]
[65.4 56.7 73.8 65.4 49.8]


In [111]:
meas = np.array([heights, weights])
for val in np.nditer(meas):
    print(val)

1.75
1.34
1.56
1.54
1.48
65.4
56.7
73.8
65.4
49.8
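`np.nditer` yields only the values. When you also need each element's position, `np.ndenumerate` yields `(index, value)` pairs in the same row-by-row order. A minimal sketch:

```python
import numpy as np

heights = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
weights = np.array([65.4, 56.7, 73.8, 65.4, 49.8])
meas = np.array([heights, weights])

# Each index is a (row, column) tuple for a 2-D array
for (row, col), val in np.ndenumerate(meas):
    print(f"meas[{row}, {col}] = {val}")
```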


## 6.0 List comprehensions with numpy

A list comprehension is often an elegant way to get loop-like behavior over a numpy array (although, for plain arithmetic, the vectorized operations from Section 2.2 are usually faster).

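In the cell below, we create a numpy array called n_ra that contains 10 random numbers drawn from a standard normal distribution (mean 0, standard deviation 1), so values can be negative or greater than 1.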

In [112]:
n_ra = np.array([x for x in np.random.normal(0,1,10)])

n_ra

array([-0.11832915,  1.8379827 ,  0.4847034 ,  1.58030942, -1.21596307,
        0.41139   , -1.03111275, -0.73439928,  0.72803743, -0.54199408])

Often, we do not need this many decimal places. In such cases, we can use the array's `round()` method.

In [113]:
n_ra = np.array([x for x in np.random.normal(0,1,10).round(3)])

n_ra

array([ 0.609,  0.915,  1.509, -1.857,  0.051, -0.743,  0.563, -0.04 ,
       -0.658,  0.455])

In the cell below, we create a new n_ra by taking the values from the original n_ra that are greater than 0, squaring them, and rounding the results to 2 decimal places.

In [114]:

n_ra = np.array([x**2 for x in n_ra if x > 0]).round(2)

n_ra

array([0.37, 0.84, 2.28, 0.  , 0.32, 0.21])
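For comparison, the same filter-square-round step can be written without a comprehension: a boolean mask replaces the `if`, and `**` applies to the whole array at once. This sketch uses the ten values printed two cells above as a fixed input instead of new random draws:

```python
import numpy as np

# Fixed input: the rounded values printed earlier
n_ra = np.array([0.609, 0.915, 1.509, -1.857, 0.051, -0.743, 0.563, -0.04, -0.658, 0.455])

# List-comprehension version, as in the cell above
squared_lc = np.array([x**2 for x in n_ra if x > 0]).round(2)

# Vectorized version: mask first, then square and round the whole array
squared_vec = (n_ra[n_ra > 0] ** 2).round(2)

print(squared_vec)
print(np.allclose(squared_lc, squared_vec))  # True
```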

## 7.0 Arithmetic operations on numpy arrays

Let's create two numpy arrays, one called `height` and one called `width`.

In [115]:
height = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
width = np.array([65.4, 56.7, 73.8, 65.4, 49.8])

We can calculate an array of areas with a single element-wise (vectorized) multiplication.

In [116]:
area = height * width

area


array([114.45 ,  75.978, 115.128, 100.716,  73.704])

In the following cells, we will see that numpy supports all the common arithmetic operators (+, -, *, /, **, etc.) and that they are applied element-wise to arrays.

In [117]:
a = np.array([1,2,3])
b = np.array([4,5,6])

In [118]:
# subtract two numpy arrays
a - b

array([-3, -3, -3])

In [119]:
# multiply two numpy arrays
a * b


array([ 4, 10, 18])

In [120]:
# divide two numpy arrays
a / b

array([0.25, 0.4 , 0.5 ])

In [121]:
# Remainder of two numpy arrays

a % b

array([1, 2, 3])

In [122]:
# Integer division of numpy arrays
a // b

array([0, 0, 0])

In [123]:
# Exponentiation of numpy arrays
b ** a

array([  4,  25, 216])
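Each of these operators is shorthand for a NumPy "universal function" (ufunc) that can also be called directly. Ufuncs provide extras such as `.reduce()`, which applies the operation across an array. A sketch using the same `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.subtract(a, b))      # same as a - b
print(np.multiply(a, b))      # same as a * b
print(np.divide(a, b))        # same as a / b
print(np.remainder(a, b))     # same as a % b
print(np.floor_divide(a, b))  # same as a // b
print(np.power(b, a))         # same as b ** a

# reduce applies the ufunc across the array: 1 + 2 + 3
print(np.add.reduce(a))       # 6, same as a.sum()
```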