# Numpy arrays
Numpy (Numerical Python) arrays are a very useful numerical data structure. They behave much like lists, but with some benefits that we will see.

> The most important thing to remember is that a numpy `array` *stores only __one type__ of data*.

eg:
- `array([1, 2, 3])` keeps its elements as they are, since all of them are of the same dtype (int)


- `array([0, 'Hi', False])` does not keep its mix of types: Numpy raises no error, but converts every element to one common dtype (here, strings)
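The single-dtype rule can be checked directly. The snippet below is a minimal sketch (the variable names are illustrative and not part of the original notebook) showing which dtype Numpy settles on when the input mixes types.

```python
import numpy as np

# All integers: Numpy keeps an integer dtype.
ints = np.array([1, 2, 3])

# Integers mixed with a float: everything is upcast to float.
mixed_numbers = np.array([1, 2.5, 3])

# Numbers, strings and booleans mixed: everything becomes a string.
mixed_types = np.array([0, 'Hi', False])

print(ints.dtype.kind)         # 'i' (signed integer; the exact width depends on the platform)
print(mixed_numbers.dtype)     # float64
print(mixed_types.dtype.kind)  # 'U' (unicode string)
print(mixed_types)             # ['0' 'Hi' 'False']
```

So mixed input is silently converted rather than rejected, which is usually not what we want for numerical work.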

### Import Numpy

In [1]:
import numpy as np

### Creating a numpy array
We pass in a list (or a tuple, or any other sequence or nested sequence). The optional kwarg `dtype` (among others) lets us specify the datatype (eg: `numpy.int32`, `numpy.float64` etc).

In [2]:
a = np.array([2, 4, 4, 1, 1, 3, 9])
a

array([2, 4, 4, 1, 1, 3, 9])

### Indexing
__Syntax__: `arrayName[index]`

In [3]:
print('First element:', a[0])
print('Last element:', a[-1])

First element: 2
Last element: 9


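But we can already do the same things with lists. Why do we need to use arrays?

Main benefits of Numpy arrays:
- Operations on whole arrays are faster, because they run in compiled code instead of a Python loop
- They take far less memory, because the elements are stored as raw values of a single dtype
- They are more convenient, because arithmetic applies to every element at once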
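To make the memory and convenience points concrete, here is a rough sketch comparing a list with an array of the same values. The names `py_list` and `np_array` are illustrative, and the list's memory figure is only an estimate built with `sys.getsizeof`. Speed is not measured here; the standard `timeit` module can be used for that.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Convenience: one expression instead of a loop or comprehension.
doubled_list = [x * 2 for x in py_list]
doubled_array = np_array * 2

# Memory: the array stores raw 8-byte integers, while the list stores
# pointers to separate Python int objects (estimated here with getsizeof).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes

print(array_bytes)               # 800000
print(list_bytes > array_bytes)  # True
```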

### Slicing
__Syntax__: `arrayName[start:stop:step]`

In [4]:
# A subset of the array
print(a[2:5])
# Slicing with step 2
print(a[::2])
# Slicing with negative step (reverse order)
print(a[::-1])

[4 1 1]
[2 4 1 9]
[9 3 1 1 4 4 2]


### Array attributes
- `size` shows the __number of elements__ in the array (same as `len(a)` only for a 1D array; in general `len(a)` returns the length of the first dimension).


- `ndim` shows the number of __dimensions__. eg: 2 for 2D, 3 for 3D


- `shape` shows the shape of the array. eg: __(rows, columns)__ for 2d


- `nbytes` shows the total size of the array in bytes.


- `dtype` shows the __data type__ of the array.


- `itemsize` shows the size in bytes of a single element (all elements of the array are the same size, as they are of the same data type)
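These attributes are related to each other. As a small sketch (the dtype is fixed to `np.int32` here only so that the byte counts are the same on every platform), `nbytes` is simply `size * itemsize`, and `len()` differs from `size` as soon as the array has more than one dimension.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(a.size)      # 6  (all elements)
print(len(a))      # 2  (length of the first dimension only)
print(a.itemsize)  # 4  (bytes per int32 element)
print(a.nbytes)    # 24 (size * itemsize)
```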

In [5]:
# Let's make a 2D array
a = np.array([[1, 2, 3],[4, 5, 6]])

print('Number of elements:', a.size)
print('Dimensions:', a.ndim)
print('Shape:', a.shape)
print('Total size of array:', a.nbytes)
a

Number of elements: 6
Dimensions: 2
Shape: (2, 3)
Total size of array: 24


array([[1, 2, 3],
       [4, 5, 6]])

In [6]:
# Let's make a 3D array
a = np.array([[[1, 2],[3, 4]],[[5, 6],[7, 8]]])

print('Number of elements:', a.size)
print('Dimensions:', a.ndim)
print('Shape:', a.shape)
a

Number of elements: 8
Dimensions: 3
Shape: (2, 2, 2)


array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

As we can see, we can increase the dimensions of our array by nesting more lists inside lists. Let's recap:

- 1D -> `[1, 2, 3, 4, 5, 6, 7, 8]`


- 2D -> `[[1, 2, 3, 4], [5, 6, 7, 8]]`


- 3D -> `[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]`

...We can keep going!

In [7]:
a.dtype, a.itemsize

(dtype('int32'), 4)

### Setting data type
We set the datatype by passing a numpy datatype to the `dtype` kwarg, eg: `int16`, `int32`, `float64`, `bool`, `double` etc. Numpy offers many more datatypes than regular Python; they correspond to the types available in C.

For more info on Numpy data types visit: [Numpy Devdocs](https://www.numpy.org/devdocs/user/basics.types.html)
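The chosen dtype determines how many bytes each element takes. The sketch below also uses the `astype` method, which is not covered in the text above, to convert an existing array to another dtype; note that converting floats to integers drops the fractional part.

```python
import numpy as np

# int16 uses 2 bytes per element.
small = np.array([1, 2, 3], dtype=np.int16)
print(small.itemsize)  # 2

# astype returns a converted copy; float -> int truncates toward zero.
floats = np.array([1.7, -1.7, 2.0])
as_ints = floats.astype(np.int32)
print(as_ints)         # [ 1 -1  2]
```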

In [8]:
a = np.array([1, 5, 3, 9, 7, 5, 4, 2, 2, 1], dtype=np.float64)
a

array([1., 5., 3., 9., 7., 5., 4., 2., 2., 1.])

### Generating arrays
We can generate an array of `zeros` or `ones` by passing in the shape of the array.

Single row

In [9]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

3 rows, 4 columns

In [10]:
np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [11]:
np.ones((3, 4), dtype=np.int32)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

__Note__: these array generators also have an optional `dtype` kwarg.

In [12]:
np.arange(1, 6) # Default dtype here is int32 (platform-dependent; often int64)

array([1, 2, 3, 4, 5])

In [13]:
np.arange(1, 6, dtype=np.int64)

array([1, 2, 3, 4, 5], dtype=int64)

The `arange` or __array range__ function works like the `range` function. The main differences are:
- `range` generates a range object, which can be cast to a list, iterated over etc., and accepts only integers.


- `arange` generates an array from the start point up to, but not including, the stop point (start <= x < stop), and also accepts float arguments.
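A short side-by-side sketch of the two functions (the name `quarters` is just illustrative). The float step is something `range` cannot do.

```python
import numpy as np

print(list(range(1, 6)))  # [1, 2, 3, 4, 5]
print(np.arange(1, 6))    # [1 2 3 4 5]

# arange accepts a float step; the stop value is still excluded.
quarters = np.arange(0, 1, 0.25)
print(quarters)           # [0.   0.25 0.5  0.75]
```

With float steps that cannot be represented exactly (such as 0.1), the number of elements can be off by one because of rounding, so `linspace` is often the safer choice there.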

In [14]:
# 50 numbers between 1 and 5
a = np.linspace(1, 5, 50)
a

array([1.        , 1.08163265, 1.16326531, 1.24489796, 1.32653061,
       1.40816327, 1.48979592, 1.57142857, 1.65306122, 1.73469388,
       1.81632653, 1.89795918, 1.97959184, 2.06122449, 2.14285714,
       2.2244898 , 2.30612245, 2.3877551 , 2.46938776, 2.55102041,
       2.63265306, 2.71428571, 2.79591837, 2.87755102, 2.95918367,
       3.04081633, 3.12244898, 3.20408163, 3.28571429, 3.36734694,
       3.44897959, 3.53061224, 3.6122449 , 3.69387755, 3.7755102 ,
       3.85714286, 3.93877551, 4.02040816, 4.10204082, 4.18367347,
       4.26530612, 4.34693878, 4.42857143, 4.51020408, 4.59183673,
       4.67346939, 4.75510204, 4.83673469, 4.91836735, 5.        ])

In [15]:
a[1] - a[0]

0.08163265306122458

In [16]:
a[2] - a[1]

0.08163265306122436

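The `linspace` function generates an array that is __linearly spaced__, meaning that the difference between consecutive elements is the same (up to floating point rounding, as the two results above show).


args: `start`, `stop`, `num` (both `start` and `stop` are included by default)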
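The spacing follows from the arguments: with both end points included, it is `(stop - start) / (num - 1)`. The sketch below checks this using the optional `retstep` and `endpoint` kwargs of `linspace`, which are not discussed above.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values.
values, step = np.linspace(1, 5, 50, retstep=True)
expected_step = (5 - 1) / (50 - 1)

print(np.isclose(step, expected_step))     # True
print(np.allclose(np.diff(values), step))  # True

# endpoint=False leaves out the stop value, like arange does.
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)                           # [0.   0.25 0.5  0.75]
```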

### Reshaping arrays

In [17]:
a.shape

(50,)

Now the array is a 1D line of 50 values. Let's make it into a 5 row by 10 column array.

__Note__: `reshape()` returns a new array object and doesn't change the shape of the original array. To keep the new shape, we assign the return value back to the variable.
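One detail worth knowing: although the original keeps its shape, the array returned by `reshape()` usually shares its data with the original (it is a view), so writing to one is visible in the other. A minimal sketch with illustrative names:

```python
import numpy as np

original = np.arange(6)
reshaped = original.reshape((2, 3))

print(original.shape)                        # (6,)  shape of the original is untouched
print(reshaped.shape)                        # (2, 3)
print(np.shares_memory(original, reshaped))  # True

# Because the data is shared, this write shows up in both arrays.
reshaped[0, 0] = 99
print(original[0])                           # 99
```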

In [18]:
a = a.reshape((5, 10))
a

array([[1.        , 1.08163265, 1.16326531, 1.24489796, 1.32653061,
        1.40816327, 1.48979592, 1.57142857, 1.65306122, 1.73469388],
       [1.81632653, 1.89795918, 1.97959184, 2.06122449, 2.14285714,
        2.2244898 , 2.30612245, 2.3877551 , 2.46938776, 2.55102041],
       [2.63265306, 2.71428571, 2.79591837, 2.87755102, 2.95918367,
        3.04081633, 3.12244898, 3.20408163, 3.28571429, 3.36734694],
       [3.44897959, 3.53061224, 3.6122449 , 3.69387755, 3.7755102 ,
        3.85714286, 3.93877551, 4.02040816, 4.10204082, 4.18367347],
       [4.26530612, 4.34693878, 4.42857143, 4.51020408, 4.59183673,
        4.67346939, 4.75510204, 4.83673469, 4.91836735, 5.        ]])

In [19]:
a.shape

(5, 10)

Let's try a different shape. How about 6 rows x 8 cols?

In [20]:
# This raises a ValueError, so it is wrapped in a try/except block
try:
    a.reshape((6, 8))
except ValueError:
    print('Cannot reshape array of size 50 into shape (6,8)')

Cannot reshape array of size 50 into shape (6,8)


### The product of the components of an array shape must equal the array size.
Here, the array size = 50.
- Initial shape: 50 * 1 = 50 __VALID!__


- We reshaped: 5 * 10 = 50   __VALID!__


- Then we tried: 6 * 8 = 48  __INVALID!__
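The rule can be checked with `np.prod`, and Numpy can also work out one dimension for us: passing `-1` for a single dimension asks `reshape()` to infer it from the array size. A small sketch (the name `inferred` is illustrative):

```python
import numpy as np

a = np.linspace(1, 5, 50)

print(np.prod((5, 10)) == a.size)  # True
print(np.prod((6, 8)) == a.size)   # False

# -1 means "whatever makes the product equal to the array size".
inferred = a.reshape((5, -1))
print(inferred.shape)              # (5, 10)
```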

Now let's flatten our array into a line again.

In [21]:
a = a.ravel()
a

array([1.        , 1.08163265, 1.16326531, 1.24489796, 1.32653061,
       1.40816327, 1.48979592, 1.57142857, 1.65306122, 1.73469388,
       1.81632653, 1.89795918, 1.97959184, 2.06122449, 2.14285714,
       2.2244898 , 2.30612245, 2.3877551 , 2.46938776, 2.55102041,
       2.63265306, 2.71428571, 2.79591837, 2.87755102, 2.95918367,
       3.04081633, 3.12244898, 3.20408163, 3.28571429, 3.36734694,
       3.44897959, 3.53061224, 3.6122449 , 3.69387755, 3.7755102 ,
       3.85714286, 3.93877551, 4.02040816, 4.10204082, 4.18367347,
       4.26530612, 4.34693878, 4.42857143, 4.51020408, 4.59183673,
       4.67346939, 4.75510204, 4.83673469, 4.91836735, 5.        ])

Like before, `ravel()` returns a new array object and doesn't change the shape of the original one, so we assign the return value back to the variable.
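Numpy also has `flatten()`, which produces the same 1D result. The difference, sketched below with illustrative names, is that `ravel()` returns a view of the original data whenever it can, while `flatten()` always makes a copy.

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))

raveled = grid.ravel()      # view of the same data (when possible)
flattened = grid.flatten()  # always an independent copy

print(np.shares_memory(grid, raveled))    # True
print(np.shares_memory(grid, flattened))  # False
```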

### Numpy Math operations
These functions are available as both independent numpy functions and also as array object methods.

`min`, `max`, `sum`

In [22]:
a = np.array([[1, 2], [3, 4], [5, 6]])
print('Shape:', a.shape)
a

Shape: (3, 2)


array([[1, 2],
       [3, 4],
       [5, 6]])

In [23]:
a.min(), a.max()

(1, 6)

In [24]:
np.min(a), np.max(a)

(1, 6)

Sum of all elements in the array

The code below is the same as `np.sum(a)`

In [25]:
a.sum()

21

Sum elements in each column

In [26]:
a.sum(axis=0)

array([ 9, 12])

Sum elements in each row

In [27]:
a.sum(axis=1)

array([ 3,  7, 11])

The diagram below shows how these sum operations across axes work. It's important to remember, as this will come in handy later when using Pandas.
![](numsumaxis.png)
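Another way to remember it: the axis we pass is the one that disappears from the shape. A short sketch on the same (3, 2) array, with a manual column sum for comparison (the variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

col_sums = a.sum(axis=0)  # collapses the 3 rows    -> shape (2,)
row_sums = a.sum(axis=1)  # collapses the 2 columns -> shape (3,)

# The same column sums written out by hand.
manual_col_sums = [a[0, j] + a[1, j] + a[2, j] for j in range(a.shape[1])]

print(col_sums, col_sums.shape)  # [ 9 12] (2,)
print(row_sums, row_sums.shape)  # [ 3  7 11] (3,)
```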

### Numpy math functions
`np.sqrt()` returns a new array with the square root of each element.

In [28]:
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [29]:
np.sqrt(a)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ],
       [2.23606798, 2.44948974]])

In [30]:
np.mean(a)

3.5

In [31]:
np.median(a)

3.5

Unfortunately, Numpy doesn't have an `np.mode()` function.
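One possible workaround, sketched here with made-up data, is to count the unique values with `np.unique` and pick the most frequent one. If several values tie, this version returns the smallest of them. SciPy also provides `scipy.stats.mode` if that library is available.

```python
import numpy as np

data = np.array([2, 4, 4, 1, 1, 3, 9, 4])

# values is sorted; counts[i] is how often values[i] occurs.
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]

print(mode)  # 4
```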

`np.std()` gives the standard deviation (a measure of spread) of the array elements.

In [32]:
np.std(a)

1.707825127659933
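By default `np.std()` computes the population standard deviation, that is, the square root of the mean squared deviation from the mean. The sketch below reproduces it by hand and shows the optional `ddof` kwarg, which is not used above: `ddof=1` gives the sample standard deviation instead.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Population standard deviation written out step by step.
manual_std = np.sqrt(np.mean((a - a.mean()) ** 2))
print(np.isclose(manual_std, np.std(a)))  # True

# ddof=1 divides by (n - 1) instead of n, so the result is larger.
sample_std = np.std(a, ddof=1)
print(sample_std > np.std(a))             # True
```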

## Array operations
Array addition, subtraction, multiplication, division can be done very simply.

In [33]:
a = np.array([[1, 2], [3, 4]], dtype=np.float64)
b = np.array([[2, 1], [4, 3]], dtype=np.float64)

In [34]:
a

array([[1., 2.],
       [3., 4.]])

In [35]:
b

array([[2., 1.],
       [4., 3.]])

### Operations on arrays lead to element wise operation

Operations: `+`, `-`, `*`, `/`, `//`, `%`, `**`, unary `-`

##### 1. Vector-scalar operations
eg: `a + 2` means add 2 to each element of the array

In [36]:
# Vector addition
a + 2

array([[3., 4.],
       [5., 6.]])

In [37]:
# Vector subtraction
a - 2

array([[-1.,  0.],
       [ 1.,  2.]])

In [38]:
# Vector multiplication
a * 2

array([[2., 4.],
       [6., 8.]])

In [39]:
# Vector division
a / 2

array([[0.5, 1. ],
       [1.5, 2. ]])

In [40]:
# Vector floor division
a // 2

array([[0., 1.],
       [1., 2.]])

In [41]:
# Vector modulus
a % 2

array([[1., 0.],
       [1., 0.]])

In [42]:
# Vector exponentiation
a**2

array([[ 1.,  4.],
       [ 9., 16.]])

In [43]:
# Vector negation
-a

array([[-1., -2.],
       [-3., -4.]])

In [44]:
# Vector absolute value
print(abs(a))
print(np.abs(a))

[[1. 2.]
 [3. 4.]]
[[1. 2.]
 [3. 4.]]


For arrays, `np.abs(a)` gives the same result as Python's built-in `abs(a)`.

##### 2. Vector-vector operations
eg: `a + b` means add each element of `a` to the element of `b` at the same index. (Note: the array shapes must be the same, or compatible under Numpy's broadcasting rules.)
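Shapes do not always have to match exactly. Under Numpy's broadcasting rules, a smaller array can be stretched across a larger one when their trailing dimensions are compatible. A minimal sketch (the array `row` is illustrative and not part of the cells below):

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])  # shape (2, 2)
row = np.array([10., 20.])          # shape (2,)

# row is added to every row of a.
print(a + row)
# [[11. 22.]
#  [13. 24.]]

# Shapes (2, 2) and (3,) are not compatible, so this raises a ValueError.
try:
    a + np.array([1., 2., 3.])
except ValueError as err:
    print('Shapes do not broadcast:', err)
```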

In [45]:
a + b

array([[3., 3.],
       [7., 7.]])

In [46]:
b - a

array([[ 1., -1.],
       [ 1., -1.]])

In [47]:
a * b

array([[ 2.,  2.],
       [12., 12.]])

In [48]:
a / b

array([[0.5       , 2.        ],
       [0.75      , 1.33333333]])

__Matrix dot product__

The code below is the same as `np.dot(a, b)`

In [49]:
a.dot(b)

array([[10.,  7.],
       [22., 15.]])

The code below is the same as `np.dot(b, a)`

In [50]:
b.dot(a)

array([[ 5.,  8.],
       [13., 20.]])
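For 2D arrays the same matrix product can also be written with the `@` operator. The sketch below confirms that it matches `dot`, that the order of the operands matters, and that it is not the same as the element-wise `*`.

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[2., 1.], [4., 3.]])

print(a @ b)
# [[10.  7.]
#  [22. 15.]]

print(np.array_equal(a @ b, a.dot(b)))  # True:  @ matches dot for 2D arrays
print(np.array_equal(a @ b, b @ a))     # False: matrix product is not commutative
print(np.array_equal(a * b, a @ b))     # False: * is element-wise
```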

__Element wise max or min between two arrays__:

When comparing two arrays, we can find the `maximum` or `minimum` value at each index.

In [51]:
# Making arrays 1D for clear comparison
a = a.flatten()
b = b.flatten()

print("First array:", a)
print("Second array:", b)

First array: [1. 2. 3. 4.]
Second array: [2. 1. 4. 3.]


In [52]:
np.maximum(a, b)

array([2., 2., 4., 4.])

In [53]:
np.minimum(a, b)

array([1., 1., 3., 3.])

__Note__: these two are not the same as `np.max` and `np.min`.

- `maximum` and `minimum` compare two arrays.

- `max` and `min` find the highest and lowest value of one array.
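A side-by-side sketch of the difference, using the same values as the two 1D arrays above. The last line is an extra illustration: because `maximum` works element-wise, its second argument can also be a single number, which sets a lower bound for every element.

```python
import numpy as np

a = np.array([1., 2., 3., 4.])
b = np.array([2., 1., 4., 3.])

print(np.max(a))           # 4.0  (one value from one array)
print(np.maximum(a, b))    # [2. 2. 4. 4.]  (element-wise across two arrays)
print(np.maximum(a, 2.5))  # [2.5 2.5 3.  4. ]  (every element is at least 2.5)
```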