# NumPy Introduction - How to create arrays
NumPy is a great tool for managing datasets in the form of multidimensional arrays (a 2D array is often called a matrix). It's widely used in machine learning for handling data, as we will see in later tutorials. There are several built-in methods for creating arrays.

In [2]:
import numpy as np #commonly imported as np for simplicity

## np.array
Converts a given list (or nested list) into an array.

In [3]:
my_list = [1,2,3]
np.array(my_list) #converts a list to an array

array([1, 2, 3])

In [4]:
matrix1 = [[1,2,3],[4,5,6],[7,8,9]]
np.array(matrix1) #each nested list is seen as a row by numpy

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In the example below, the rows of matrix2 have different lengths, so it is not dimensionally consistent. Compare its output with the previous one.

In [5]:
matrix2 = [[1,2,3],[4,5,6,51,90,76],[7,8,9,10]]
np.array(matrix2, dtype=object) #rows differ in length; recent NumPy versions raise a ValueError without dtype=object

array([list([1, 2, 3]), list([4, 5, 6, 51, 90, 76]), list([7, 8, 9, 10])],
      dtype=object)

A 1D array of list objects is created for matrix2, whereas a 2D array is created for matrix1. NumPy cannot create a 2D array from matrix2 because its rows have different lengths. Important to remember!
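To make the difference concrete, the following minimal sketch compares the shapes of the two results. The names `regular` and `ragged` are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np

matrix1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix2 = [[1, 2, 3], [4, 5, 6, 51, 90, 76], [7, 8, 9, 10]]

# Rows of equal length give a true 2D array
regular = np.array(matrix1)

# Rows of unequal length can only be stored as a 1D array of Python lists
ragged = np.array(matrix2, dtype=object)

print(regular.shape)    # (3, 3)
print(ragged.shape)     # (3,)
print(type(ragged[0]))  # <class 'list'>
```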

## zeros and ones
Creates an array of a given shape filled with only zeros or only ones.

In [6]:
np.zeros((3,3)) 
#outputs a 3x3 matrix of zeros
#note the double parentheses: the shape (3,3) is passed to zeros as a single tuple

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [7]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
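Both functions return floats by default, as the trailing dots in the outputs show. As a small sketch, the optional `dtype` argument changes that, and a ones array can be scaled to fill an array with any constant. The variable names are illustrative only.

```python
import numpy as np

# Ask for integers instead of the default floats
int_zeros = np.zeros((2, 3), dtype=int)

# Multiply an array of ones to fill it with another constant
sevens = np.ones((2, 3)) * 7

print(int_zeros)
print(sevens)
```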

## Random 
NumPy also has lots of ways to create random number arrays:

### rand
Creates an array of the given shape and populates it with random samples from a uniform distribution over ``[0, 1)``. [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.rand.html)]

In [8]:
np.random.rand(3,3)

array([[0.40222907, 0.57075049, 0.89319265],
       [0.56631686, 0.68505469, 0.9121734 ],
       [0.61690493, 0.85088508, 0.41710925]])

### randn

Returns a sample (or samples) from the "standard normal" distribution [μ = 0, σ = 1]. Unlike **rand**, which is uniform, values closer to zero are more likely to appear. [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.randn.html)]

In [9]:
np.random.randn(3,3)

array([[-0.62405079, -1.2908205 ,  1.65111582],
       [-0.46078754,  0.61371018, -0.30913943],
       [-0.76940946, -1.01345867,  0.71724625]])

### randint
Returns random integers from `low` (inclusive) to `high` (exclusive).  [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.randint.html)]

In [10]:
np.random.randint(1,100)

33
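The call above returns a single integer. As a brief sketch, passing the `size` argument returns an array of random integers instead; the shape `(2, 5)` is an arbitrary choice for illustration.

```python
import numpy as np

# Ten random integers in [1, 100), arranged as 2 rows by 5 columns
samples = np.random.randint(1, 100, size=(2, 5))

print(samples.shape)  # (2, 5)
print(samples.min() >= 1 and samples.max() < 100)  # True
```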

### seed
Sets the random state so that the same "random" results can be reproduced. [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.seed.html)]

In [11]:
np.random.seed(42)
np.random.rand(4)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848])

In [12]:
np.random.seed(42)
np.random.rand(4) #the same output is produced twice even with randomness

array([0.37454012, 0.95071431, 0.73199394, 0.59865848])
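Newer NumPy releases (1.17 and later) also offer a generator-based interface, `np.random.default_rng`, which this tutorial does not use. The sketch below shows the same reproducibility idea with it, assuming such a NumPy version is installed. Its numbers will differ from the `np.random.seed` output above because it uses a different underlying algorithm.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random(4)
second = rng_b.random(4)

print(np.array_equal(first, second))  # True
```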

## Reshape
Returns an array containing the same data with a new shape. [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.reshape.html)]

In [13]:
arr = np.random.randint(0,100,25)
arr

array([82, 86, 74, 74, 87, 99, 23,  2, 21, 52,  1, 87, 29, 37,  1, 63, 59,
       20, 32, 75, 57, 21, 88, 48, 90])

In [14]:
arr = arr.reshape(5,5) #reshapes arr from a 1D array of 25 elements to 5by5
arr

array([[82, 86, 74, 74, 87],
       [99, 23,  2, 21, 52],
       [ 1, 87, 29, 37,  1],
       [63, 59, 20, 32, 75],
       [57, 21, 88, 48, 90]])
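Reshaping only works when the new shape holds exactly the same number of elements. The sketch below, using a hypothetical 12-element array, shows that a `-1` lets NumPy infer one dimension and that a mismatched shape raises a `ValueError`.

```python
import numpy as np

flat = np.arange(12)

# -1 tells NumPy to work out this dimension: 12 elements / 3 rows = 4 columns
grid = flat.reshape(3, -1)
print(grid.shape)  # (3, 4)

# 12 elements cannot fill a 5by5 array, so this fails
error_message = None
try:
    flat.reshape(5, 5)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```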

### max, min, argmax, argmin

These methods find the max or min value of an array. argmax and argmin return the index of that value instead; for a multidimensional array, the index refers to the flattened array.

In [15]:
arr.max() #max value in arr

99

In [16]:
arr.argmax() #index of the max value in the flattened arr

5

In [17]:
arr.min()

1

In [18]:
arr.argmin()

10
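Because `arr` is 5by5, the indices 5 and 10 above count positions in the flattened array. One way to turn such a flat index back into a row and column is `np.unravel_index`, sketched here with the same values as `arr`, typed in by hand since the original was random.

```python
import numpy as np

arr = np.array([[82, 86, 74, 74, 87],
                [99, 23,  2, 21, 52],
                [ 1, 87, 29, 37,  1],
                [63, 59, 20, 32, 75],
                [57, 21, 88, 48, 90]])

flat_index = arr.argmax()  # 5, counting row by row from 0

# Convert the flat index into (row, column) for this shape
row, col = np.unravel_index(flat_index, arr.shape)

print(row, col)       # 1 0
print(arr[row, col])  # 99
```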

## Shape
Returns the dimensions of an array as a tuple. [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.shape.html)]

In [19]:
arr.shape

(5, 5)

## dtype
You can also grab the data type of the elements in the array: [[reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.dtype.html)]

In [20]:
arr.dtype

dtype('int32')

In [21]:
arr2 = np.array([1.2, 3.4, 5.6])
arr2.dtype

dtype('float64')
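The default integer dtype can differ between platforms, so `int32` above may appear as `int64` elsewhere. As a short sketch, `astype` returns a copy converted to a dtype of your choice; `as_int` is an illustrative name.

```python
import numpy as np

arr2 = np.array([1.2, 3.4, 5.6])

# astype returns a new array; float-to-int conversion drops the fractional part
as_int = arr2.astype(np.int64)

print(as_int)      # [1 3 5]
print(arr2.dtype)  # float64 (the original is unchanged)
```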

# NumPy Indexing and Selection

In this lecture we will discuss how to select elements or groups of elements from an array.

In [23]:
#Creating sample array
arr = np.arange(0,11)

In [24]:
#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [25]:
#Get a value at an index
arr[8]

8

In [26]:
#Get values in a range
arr[1:5]

array([1, 2, 3, 4])
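One detail worth knowing about ranges like this: a slice is a view of the original array, not a copy. The sketch below shows that writing to a slice changes the source array, and that `.copy()` avoids this. The names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.arange(0, 11)

# A slice shares memory with arr, so assigning into it changes arr too
view = arr[1:5]
view[:] = 99
print(arr)  # [ 0 99 99 99 99  5  6  7  8  9 10]

# An explicit copy is independent of the original
independent = arr[1:5].copy()
independent[:] = 0
print(arr[1:5])  # [99 99 99 99]
```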

## Indexing a 2D array (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**.

In [28]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))

#Show
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [29]:
#Indexing row
arr_2d[1]

array([20, 25, 30])

In [30]:
# Format is arr_2d[row][col] or arr_2d[row,col]

# Getting individual element value
arr_2d[1][0]

20
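Both formats return the same element. The comma form also accepts ranges on each axis, which the following sketch uses to pick out a sub-matrix; `top_right` is an illustrative name.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# Bracket-by-bracket and comma indexing give the same value
print(arr_2d[1][0], arr_2d[1, 0])  # 20 20

# Rows 0 and 1, columns 1 and 2
top_right = arr_2d[:2, 1:]
print(top_right)
# [[10 15]
#  [25 30]]
```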

# NumPy Operations

## Arithmetic

You can easily perform *array with array* arithmetic, or *scalar with array* arithmetic. Let's see some examples:

In [31]:
arr = np.arange(0,10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [32]:
arr + arr #operations are done on values with corresponding indices

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [33]:
arr * arr

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [34]:
arr - arr

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [35]:
# This raises a warning for the 0/0 division, but not an error!
# NumPy just fills that spot with nan
arr/arr

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [36]:
# Also a warning (but not an error), this time relating to infinity
# 1/0 has no finite value, so NumPy fills that spot with inf
1/arr

array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
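If these warnings are expected and you would rather not see them, one option is the `np.errstate` context manager. This is a minimal sketch; whether silencing the warnings is appropriate depends on your use case.

```python
import numpy as np

arr = np.arange(0, 10)

# Inside this block, division-by-zero and invalid-value warnings are silenced
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr     # 0/0 is an "invalid" operation and gives nan
    reciprocal = 1 / arr  # 1/0 is a "divide" by zero and gives inf

print(np.isnan(ratio[0]))       # True
print(np.isinf(reciprocal[0]))  # True
```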

In [37]:
arr**3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729], dtype=int32)

## Universal Array Functions

NumPy comes with many [universal array functions](http://docs.scipy.org/doc/numpy/reference/ufuncs.html), or <em>ufuncs</em>, which are essentially just mathematical operations that are applied element by element across the array.<br>Let's show some common ones:

In [38]:
# Taking Square Roots
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [39]:
# Calculating exponential (e^)
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [40]:
# Trigonometric Functions like sine
np.sin(arr)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [41]:
np.log(arr)

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
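To illustrate what "universal" means here, the sketch below checks that a ufunc gives the same values as applying the matching scalar function from Python's `math` module to each element in a loop. The names `vectorized` and `looped` are illustrative.

```python
import math
import numpy as np

arr = np.arange(0, 10)

# One call handles the whole array
vectorized = np.sqrt(arr)

# Equivalent element-by-element loop using the scalar function
looped = np.array([math.sqrt(x) for x in arr])

print(np.allclose(vectorized, looped))  # True
```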

## Summary Statistics on Arrays

NumPy also offers common summary statistics like <em>sum</em>, <em>mean</em> and <em>max</em>. You would call these as methods on an array.

In [43]:
arr = np.arange(0,10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [44]:
arr.sum()

45

In [45]:
arr.mean()

4.5

In [46]:
arr.max() #useful for inspecting images before processing

9

## Axis Logic
When working with 2-dimensional arrays (matrices) we have to consider rows and columns. This becomes very important when we get to the section on pandas. In array terms, axis 0 (zero) is the vertical axis, running down the rows, and axis 1 is the horizontal axis, running across the columns. These values (0,1) correspond to the positions of the row count and column count in <tt>arr.shape</tt>.

Let's see how this affects our summary statistic calculations from above.

In [48]:
arr_2d = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr_2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [53]:
arr_2d.sum(axis=0) #adds along the vertical axis: one sum per column

array([15, 18, 21, 24])

In [54]:
arr_2d.sum(axis=1) #adds along the horizontal axis: one sum per row

array([10, 26, 42])
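The same `axis` argument works with the other summary statistics. As a short sketch on the same array, with illustrative variable names:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# axis=0 collapses the rows, leaving one value per column
col_means = arr_2d.mean(axis=0)
print(col_means)  # [5. 6. 7. 8.]

# axis=1 collapses the columns, leaving one value per row
row_max = arr_2d.max(axis=1)
print(row_max)  # [ 4  8 12]
```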

## Source
www.pieriandata.com