# NumPy

NumPy (Numerical Python) has many uses, including:

- efficiently working with many numbers at once
- generating random numbers
- performing many different numerical functions (e.g. calculating sin, cos, tan, mean and median)

### One Dimensional Arrays

One-dimensional (1-D) arrays hold `single variable` datasets: they contain a single value for each property/data point.

Start by importing the NumPy package at the top of your file:
`import numpy as np`

NumPy uses an `array` data structure (the `ndarray`) to organise data items.

- elements can be numbers, booleans or strings, and nesting lists creates multi-dimensional arrays. Every element MUST share the same data type; if you mix types, NumPy upcasts them to a common type (e.g. ints are cast to floats, and numbers mixed with strings become strings).
- transform a regular list into a NumPy array with `np.array()`
- common data types (dtype) include int (8/16/32/64), float (16/32/64, plus an extended-precision float128 on some platforms), bool and string (fixed length)
- `nan` (Not a Number) and `inf` (infinity) are special floating-point values. Use `np.isnan()` and `np.isinf()` to identify them; do not use `==`, because `nan` is not equal to anything, including itself.
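A minimal sketch of the upcasting and `nan`/`inf` rules above, using only standard NumPy calls (the array contents are made up for illustration):

```python
import numpy as np

# Mixing ints and floats: every element is upcast to a common type (float64)
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Mixing numbers and strings: everything becomes a fixed-length Unicode string
as_text = np.array([1, 'two', 3])
print(as_text.dtype.kind)  # U

values = np.array([1.0, np.nan, np.inf, -np.inf])

# nan is not equal to anything (not even itself), so == never finds it
print(values == np.nan)   # [False False False False]
print(np.isnan(values))   # [False  True False False]
print(np.isinf(values))   # [False False  True  True]
```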

```py
import numpy as np

# you can create a NumPy array directly from a Python list
my_array = np.array([1, 2, 3, 4, 5, 6])
my_array
```
```
array([1, 2, 3, 4, 5, 6])
```

```py
# you can also explicitly set the data type with the `dtype` keyword
my_array = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype='float32')
my_array
```
```
array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=float32)
```

```py
# numbers converted to strings; '<U1' means Unicode strings of length 1
my_array = np.array([1, 2, 3, 4, 5, 6], dtype='str')
my_array
```
```
array(['1', '2', '3', '4', '5', '6'], dtype='<U1')
```

```py
# 1 becomes True and 0 becomes False
my_array = np.array([1, 0, 1, 0, 1, 0], dtype='bool')
my_array
```
```
array([ True, False,  True, False,  True, False])
```

```py
# any non-zero value becomes True
my_array = np.array([1, 2, 3, 4, 0, 0], dtype='bool')
my_array
```
```
array([ True,  True,  True,  True, False, False])
```

```py
# convert a NumPy array to a Python list
# (.tolist() also turns NumPy scalars into plain Python values)
my_array = my_array.tolist()
my_array
```
```
[True, True, True, True, False, False]
```

You can **import data** directly into NumPy arrays from **CSV** files using `np.genfromtxt()`. Here it is called with two arguments: the file path and the delimiter used to separate the values.

```py
my_array = np.genfromtxt('data/sample.csv', delimiter=',')
my_array
```
```
array([1., 2., 3., 4., 5., 6., 7., 8., 9., 0.])
```

```py
# genfromtxt parses every field as a float by default, so the header row
# and any non-numeric fields come back as nan
my_array = np.genfromtxt('data/test.csv', delimiter=',')
my_array
```
```
array([[      nan,       nan,       nan, ...,       nan,       nan,
              nan],
       [ 892.    ,    3.    ,       nan, ...,    7.8292,       nan,
              nan],
       [ 893.    ,    3.    ,       nan, ...,    7.    ,       nan,
              nan],
       ...,
       [1307.    ,    3.    ,       nan, ...,    7.25  ,       nan,
              nan],
       [1308.    ,    3.    ,       nan, ...,    8.05  ,       nan,
              nan],
       [1309.    ,    3.    ,       nan, ...,   22.3583,       nan,
              nan]])
```
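To see why text fields turn into `nan`, here is a small sketch that feeds `np.genfromtxt()` an in-memory CSV through `io.StringIO` instead of a real file. The column names and values are hypothetical.

```python
import io
import numpy as np

# Hypothetical CSV text: a header row plus one text column ("name")
csv_text = io.StringIO("id,name,fare\n1,Alice,7.25\n2,Bob,8.05\n")

# With the default float dtype, anything that cannot be parsed as a number
# (the header row and the names) is stored as nan
data = np.genfromtxt(csv_text, delimiter=',')
print(data)
```

If the text columns need to be kept, `genfromtxt` can instead be called with `dtype=None` (and `names=True` to use the header), which infers a type per column and returns a structured array rather than a plain 2-D float array.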

```py
# the first row is nan because it holds the column headers
np.genfromtxt('data/gender_submission.csv', delimiter=',')
```
```
array([[      nan,       nan],
       [8.920e+02, 0.000e+00],
       [8.930e+02, 1.000e+00],
       [8.940e+02, 0.000e+00],
       [8.950e+02, 0.000e+00],
       [8.960e+02, 1.000e+00],
       [8.970e+02, 0.000e+00],
       [8.980e+02, 1.000e+00],
       [8.990e+02, 0.000e+00],
       [9.000e+02, 1.000e+00],
       [9.010e+02, 0.000e+00],
       [9.020e+02, 0.000e+00],
       [9.030e+02, 0.000e+00],
       [9.040e+02, 1.000e+00],
       [9.050e+02, 0.000e+00],
       [9.060e+02, 1.000e+00],
       [9.070e+02, 1.000e+00],
       [9.080e+02, 0.000e+00],
       [9.090e+02, 0.000e+00],
       [9.100e+02, 1.000e+00],
       [9.110e+02, 1.000e+00],
       [9.120e+02, 0.000e+00],
       [9.130e+02, 0.000e+00],
       [9.140e+02, 1.000e+00],
       [9.150e+02, 0.000e+00],
       [9.160e+02, 1.000e+00],
       [9.170e+02, 0.000e+00],
       [9.180e+02, 1.000e+00],
       [9.190e+02, 0.000e+00],
       [9.200e+02, 0.000e+00],
       [9.210e+02, 0.000e+00],
       [9.220e+02, 0.000e+00],
       ... (output truncated)
```
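The `nan` header row can be skipped with the `skip_header` parameter of `np.genfromtxt()`. A minimal sketch, using a hypothetical in-memory stand-in for a two-column file like `data/gender_submission.csv`:

```python
import io
import numpy as np

# Hypothetical stand-in for a two-column CSV file
csv_text = io.StringIO("PassengerId,Survived\n892,0\n893,1\n894,0\n")

# skip_header=1 drops the column-name row, which would otherwise parse as nan
data = np.genfromtxt(csv_text, delimiter=',', skip_header=1)
print(data.shape)  # (3, 2)

# A column can then be selected and cast, e.g. the 0/1 column as integers
survived = data[:, 1].astype(int)
print(survived)  # [0 1 0]
```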

### Performing operations on NumPy arrays

NumPy arrays are generally faster and more memory-efficient than Python lists for numerical work. They support 'element-wise operations': an operation such as addition or multiplication is applied to every element in the array directly, without writing a loop.

Note: these operations rely on every element sharing the SAME numeric type; a function such as `np.sqrt()`, for example, raises an error on an array of strings.
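The difference from plain lists is easiest to see side by side. In this small sketch (values chosen arbitrarily), the same operators mean something different for a list and for an array:

```python
import numpy as np

scores = [1, 2, 3]

# With a Python list, + concatenates and * repeats the list
print(scores + scores)   # [1, 2, 3, 1, 2, 3]
print(scores * 2)        # [1, 2, 3, 1, 2, 3]

# Element-wise maths on a list needs an explicit loop (here a comprehension)
print([x * 2 for x in scores])  # [2, 4, 6]

# A NumPy array applies the operator to every element directly
np_scores = np.array(scores)
print(np_scores + np_scores)  # [2 4 6]
print(np_scores * 2)          # [2 4 6]
```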

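```py
# add 3 to every element (in place)
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
a += 3
a
```
```
array([ 4,  5,  6,  7,  8,  9, 10, 11])
```

```py
# subtract 2 from every element
a -= 2
a
```
```
array([2, 3, 4, 5, 6, 7, 8, 9])
```

```py
# square every element
a **= 2
a
```
```
array([ 4,  9, 16, 25, 36, 49, 64, 81])
```

```py
# NumPy functions such as np.sqrt() also work element-wise
np.sqrt(a)
```
```
array([2., 3., 4., 5., 6., 7., 8., 9.])
```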
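The other numerical functions mentioned at the start (mean, median, sin and so on) follow the same pattern. A short sketch using the same values as `a` above:

```python
import numpy as np

a = np.array([4, 9, 16, 25, 36, 49, 64, 81])

print(np.mean(a))     # 35.5
print(np.median(a))   # 30.5 (average of the two middle values, 25 and 36)
print(np.sum(a), a.max(), a.min())

# Trigonometric functions expect angles in radians
angles = np.array([0, np.pi / 2, np.pi])
print(np.sin(angles))   # approximately [0, 1, 0]
```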

**Adding/Subtracting NumPy Arrays**

Two (or more) NumPy arrays can be added, subtracted, multiplied or divided. They must have the same number of elements arranged in the same **shape** (NumPy's broadcasting rules relax this in some cases, e.g. combining an array with a single number). Elements in the same positions are combined with each other: NumPy performs all calculations element-wise (i.e. element by element).

```py
test_1 = np.array([92, 94, 88, 91, 87])
test_2 = np.array([79, 100, 86, 93, 91])
test_3 = np.array([87, 85, 72, 90, 92])
test_1 + test_2 + test_3
```
```
array([258, 279, 246, 274, 270])
```
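Assuming each position holds the same student's score on each of the three tests (the original doesn't say what the numbers represent), an element-wise sum followed by a division gives one average per position:

```python
import numpy as np

test_1 = np.array([92, 94, 88, 91, 87])
test_2 = np.array([79, 100, 86, 93, 91])
test_3 = np.array([87, 85, 72, 90, 92])

# Element-wise sum, then divide every total by 3
averages = (test_1 + test_2 + test_3) / 3
print(np.round(averages, 2))
```

The totals are 258, 279, 246, 274 and 270, so the averages are 86, 93, 82, about 91.33, and 90.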

```py
# follows the usual rules of precedence: * is evaluated before -
test_1 - test_2 * test_3
```
```
array([-6781, -8406, -6104, -8279, -8285])
```

```py
test_1 / test_2 * test_3 + test_1
```
```
array([193.3164557 , 173.9       , 161.6744186 , 179.06451613,
       174.95604396])
```

```py
# the arrays must have the same shape (number of values)
test_4 = np.array([10, 20])
test_5 = np.array([1, 2, 3, 4, 5])
test_4 + test_5
```
```
ValueError: operands could not be broadcast together with shapes (2,) (5,)
```
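The error message mentions *broadcasting*: NumPy's rule for combining arrays of different shapes. Comparing shapes from the last dimension backwards, each pair must be equal or one of them must be 1. Shapes `(2,)` and `(5,)` fail that test. This sketch shows two combinations that pass. The `(2, 1)` column array is an illustrative choice, not from the original.

```python
import numpy as np

test_5 = np.array([1, 2, 3, 4, 5])

# A single number is stretched to match any shape
print(test_5 + 10)        # [11 12 13 14 15]

# Shapes (2, 1) and (5,) are compatible: the result has shape (2, 5)
column = np.array([[10], [20]])
grid = column + test_5
print(grid.shape)         # (2, 5)
print(grid)
# [[11 12 13 14 15]
#  [21 22 23 24 25]]

# Checking .shape first helps explain errors like the one above
print(np.array([10, 20]).shape, test_5.shape)  # (2,) (5,)
```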

### Selecting Elements

You can select elements from a NumPy array in the same way as from a Python list: single elements with positive or negative integer indices, and slices (ranges) to select multiple elements.

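```py
a
```
```
array([ 4,  9, 16, 25, 36, 49, 64, 81])
```

```py
# third element from the end
a[-3]
```
```
49
```

```py
# start at the last element (-1), stop before index -4,
# and step backwards (the 3rd argument, -1)
a[-1:-4:-1]
```
```
array([81, 64, 49])
```

```py
# reading a slice leaves the array itself unchanged
a
```
```
array([ 4,  9, 16, 25, 36, 49, 64, 81])
```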
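One difference from Python lists is worth knowing: a NumPy slice is a *view* onto the original data, not a copy. Reading a slice leaves the array alone, but writing through it does not. The sketch below (with a hypothetical second array `b`) shows step slicing, a write through a view, and `.copy()` as the way to get an independent array.

```python
import numpy as np

a = np.array([4, 9, 16, 25, 36, 49, 64, 81])

print(a[::2])    # every 2nd element: [ 4 16 36 64]
print(a[::-1])   # the whole array reversed

# A slice shares memory with the original array
first_three = a[:3]
first_three[0] = 100
print(a[0])      # 100: the original array changed too

# .copy() makes an independent array
b = np.array([4, 9, 16])
safe = b[:2].copy()
safe[0] = 100
print(b[0])      # 4: b is unchanged
```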

### Logical Operations

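You can perform element-wise comparisons and logical operations on NumPy arrays. Each comparison returns a boolean array, which can be used as a 'mask' to select only the elements where it is `True`.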

```py
# determine which elements are > 12
a > 12
```
```
array([False, False,  True,  True,  True,  True,  True,  True])
```

```py
# only return the elements > 32
a[a > 32]
```
```
array([36, 49, 64, 81])
```

```py
# combine conditions with & (and) or | (or); wrap each comparison in
# parentheses, because & and | bind more tightly than > and <
a[(a > 15) & (a < 45)]
```
```
array([16, 25, 36])
```
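A few more mask operations, sketched on the same values as `a` above: `|` for "or", `~` to negate a mask, and summing a mask to count matches (each `True` counts as 1).

```python
import numpy as np

a = np.array([4, 9, 16, 25, 36, 49, 64, 81])

# | is element-wise "or": values below 10 or above 60
print(a[(a < 10) | (a > 60)])   # [ 4  9 64 81]

# ~ negates a boolean mask: everything NOT greater than 12
print(a[~(a > 12)])             # [4 9]

# Summing a mask counts the matching elements
print(np.sum(a > 12))           # 6

# Python's `and` / `or` keywords do not work element-wise on arrays:
# (a > 15) and (a < 45) raises a ValueError about an ambiguous truth value
```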

```py
import numpy as np

heights = [123.34, 143.34, 156.34, 178.56, 165.21, 189.43, 153.54]
ages = [23, 43, 34, 54, 46, 65, 25]

np_heights = np.array(heights)
np_ages = np.array(ages)

# keep only the heights whose matching age is > 45 (where the mask is True)
np_results = np_heights[np_ages > 45]
print(np_results)
```
```
[178.56 165.21 189.43]
```

```py
other_heights = np_heights[np_ages <= 45]
print(other_heights)
```
```
[123.34 143.34 156.34 153.54]
```
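Masks built from one array can be combined before applying them to another array of the same length. In this sketch, the 30 to 50 age range is an arbitrary illustrative choice:

```python
import numpy as np

heights = np.array([123.34, 143.34, 156.34, 178.56, 165.21, 189.43, 153.54])
ages = np.array([23, 43, 34, 54, 46, 65, 25])

# Combine two conditions on ages, then select the matching heights
middle_aged = (ages >= 30) & (ages <= 50)
print(heights[middle_aged])        # [143.34 156.34 165.21]

# Aggregate the selected values, e.g. the mean height of that group
print(heights[middle_aged].mean())
```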


### Two Dimensional Arrays

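NumPy supports 2-D arrays, created from nested lists. The nested lists must all be the SAME size, and every value shares one type: a single string or float value converts all values to strings or floats, and Boolean `True`/`False` are converted to `1`/`0` when mixed with numbers. 2-D arrays are often used to represent a set of samples, e.g. one row per sample and one column per measurement.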
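A small sketch of these 2-D rules with made-up values: inspecting `.shape` and `.ndim`, and seeing how mixed values are converted. The ragged-row case at the end raises a `ValueError` in recent NumPy versions (1.24 and later); older versions only issued a warning and built an array of Python objects.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)   # (2, 3): 2 rows, 3 columns
print(grid.ndim)    # 2

# One float anywhere upcasts the whole array to float
with_float = np.array([[1, 2], [3, 4.5]])
print(with_float.dtype)   # float64

# Booleans mixed with ints become 1/0
mixed_bools = np.array([[True, 2], [False, 3]])
print(mixed_bools)
# [[1 2]
#  [0 3]]

# Rows of different lengths cannot form a regular 2-D array
try:
    np.array([[1, 2, 3], [4, 5]])
except ValueError as err:
    print("ValueError:", err)
```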

```py
test_6 = np.array([[1, 2, 3, 4], [5, 3, 7, 8], [2, 3, 0, 5], [4, 5, 6, 7]])
test_7 = np.array([[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]])
test_8 = np.array([[4, 5, 6, 7], [6, 7, 8, 9], [1, 0, 0, 1], [6, 5, 4, 3]])
```

You can carry out operations on 2-D arrays, e.g. operations on individual elements, or adding, subtracting, multiplying or dividing two or more arrays together. The arrays must have the same shape; if their `dtype`s differ, NumPy upcasts the result (e.g. int combined with float gives float).

```py
test_6 * 2
```
```
array([[ 2,  4,  6,  8],
       [10,  6, 14, 16],
       [ 4,  6,  0, 10],
       [ 8, 10, 12, 14]])
```

```py
test_6 + test_7 * test_8
```
```
array([[ 5,  2,  9,  4],
       [11,  3, 15,  8],
       [ 3,  3,  0,  5],
       [10,  5, 10,  7]])
```
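Combining arrays of different dtypes works as long as the shapes match; the result is upcast. A minimal sketch, using a hypothetical all-`0.5` float array built with `np.full()`:

```python
import numpy as np

test_6 = np.array([[1, 2, 3, 4], [5, 3, 7, 8], [2, 3, 0, 5], [4, 5, 6, 7]])
halves = np.full((4, 4), 0.5)   # float array with the same shape

result = test_6 + halves
print(result.dtype)   # float64: int + float is upcast to float
print(result[0])      # [1.5 2.5 3.5 4.5]

# True division of integers with / also produces floats
print((test_6 / 2).dtype)   # float64
```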

#### **Selecting elements from a 2-D array**

Specify both the `row` and the `column` index, as `a[row, column]` (both start from zero).

NOTE:

In a 2-D array, axis 0 runs down the rows and axis 1 runs across the columns. When a function aggregates *along* an axis, that axis is collapsed: `axis=0` combines the values that share a column index (one result per column), while `axis=1` combines the values in the same row (one result per row). This can feel flipped!

We can also think of **axis=0 == columns** and **axis=1 == rows**.

```py
# col:         0  1  2  3  4
# row 0      [[1, 3, 4, 7, 0],
# row 1       [3, 5, 7, 9, 2],
# row 2       [2, 4, 6, 8, 1]]
```
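The axis rule is easiest to check with `np.sum()` on the array from the diagram: `axis=0` gives one total per column, and `axis=1` gives one total per row.

```python
import numpy as np

a = np.array([[1, 3, 4, 7, 0], [3, 5, 7, 9, 2], [2, 4, 6, 8, 1]])

# axis=0 collapses the rows: one total per column
print(np.sum(a, axis=0))   # [ 6 12 17 24  3]

# axis=1 collapses the columns: one total per row
print(np.sum(a, axis=1))   # [15 26 21]

print(a.shape)             # (3, 5): 3 rows (axis 0), 5 columns (axis 1)
```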

```py
a = np.array([[1, 3, 4, 7, 0], [3, 5, 7, 9, 2], [2, 4, 6, 8, 1]])
a[2, 1]  # [row, column]
```
```
4
```

To select an entire column, use `:` as the **row** index

```py
a[:, 1]
```
```
array([3, 5, 4])
```

```py
a[:, 0]
```
```
array([1, 3, 2])
```

To select an entire row, insert `:` as the **column** index.

```py
a[1, :]
```
```
array([3, 5, 7, 9, 2])
```

```py
a[0, :]
```
```
array([1, 3, 4, 7, 0])
```

To select a range of elements **from** a row, combine a row index with a column slice.

```py
# row 1 (the 2nd row), columns 1 to 3 (the slice end, 4, is excluded)
a[1, 1:4]
```
```
array([5, 7, 9])
```
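Slices can be used on both axes at once, negative indices work per axis, and boolean masks work on 2-D arrays too. A short sketch on the same array `a`:

```python
import numpy as np

a = np.array([[1, 3, 4, 7, 0], [3, 5, 7, 9, 2], [2, 4, 6, 8, 1]])

# Rows 0-1 and columns 1-2 (slice ends are exclusive)
print(a[0:2, 1:3])
# [[3 4]
#  [5 7]]

# The last column
print(a[:, -1])   # [0 2 1]

# A boolean mask on a 2-D array returns the matching values as a 1-D array,
# in row-by-row order
print(a[a > 5])   # [7 7 9 6 8]
```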