# Data Science Boot Camp

## Introduction to NumPy

* NumPy, short for Numerical Python, is the fundamental package required for high performance scientific computing and data analysis.<br>
<br>
* It contains:
    * a powerful N-dimensional array object
    * sophisticated (broadcasting) functions
    * tools for integrating C/C++ and Fortran code
    * useful linear algebra, Fourier transform, and random number capabilities


* NumPy by itself does not provide very much high-level data analytical functionality.<br>
<br>
* Having an understanding of NumPy arrays and array-oriented computing will help you use tools like pandas much more effectively.<br>
<br>
* Efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science. <br>

![Arrays vs Lists](images/whyNumpy.png)

### Simple Benchmark Test

#### Python List vs Numpy Array

Let's generate a list of the integers 0 to 999.

In [1]:
pythonList = list(range(1000))  # list() keeps this a real list on Python 3 as well

In [2]:
pythonList[:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Now let's generate a NumPy array with the same elements.<br>
First we import the numpy library under the alias np and check its version.<br>

In [3]:
import numpy as np
print(np.__version__)
numpyArray = np.arange(1000)

1.14.0


In [4]:
numpyArray[:10]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]:
%%timeit -n 100
for i in pythonList:
    i * 2

100 loops, best of 3: 155 µs per loop


In [6]:
%timeit -n 100 numpyArray * 2

The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.15 µs per loop
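
* The %timeit magics above only work inside IPython or Jupyter. As a rough sketch, the same comparison can be run from a plain Python script with the standard timeit module. The variable names here are illustrative, and the absolute timings depend on your machine.

```python
import timeit

import numpy as np

python_list = list(range(1000))
numpy_array = np.arange(1000)

# Time 1000 repetitions of doubling every element, once per container type.
list_seconds = timeit.timeit(lambda: [i * 2 for i in python_list], number=1000)
array_seconds = timeit.timeit(lambda: numpy_array * 2, number=1000)

print("list :", list_seconds, "seconds")
print("array:", array_seconds, "seconds")
```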


#### Why?

* Because of Python’s dynamic typing, each item in the list must contain its own type info, reference count, and other information—that is, each item is a complete Python object.<br>


In [7]:
import sys
sys.getsizeof(pythonList[0])

24

* In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store the data in a fixed-type array such as a NumPy array.<br>


In [8]:
numpyArray.itemsize

8

* A NumPy array in its simplest form is a Python object built around a C array. That is, it has a pointer to a contiguous data buffer of values.<br>
<br>
* A Python list, on the other hand, has a pointer to a contiguous buffer of pointers, each of which points to a Python object which in turn has references to its data.


![Arrays vs Lists](images/Arrays_vs_Lists.png)
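
* The difference in memory layout can be estimated with a small sketch. It assumes CPython, where sys.getsizeof reports the size of one object only, so the list total is the pointer buffer plus every integer object. The exact list figure varies by Python version.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# List: the buffer of pointers plus one full Python object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Array: one contiguous buffer of 8-byte integers.
array_bytes = np_arr.nbytes

print("list  bytes (approx.):", list_bytes)
print("array bytes          :", array_bytes)
```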

## The NumPy ndarray

* One of the key features of NumPy is its N-dimensional array, or ndarray, which is a __fast__, __flexible__ container for large data sets in Python.<br>

* An ndarray is a multidimensional container for __homogeneous__ data; that is, all of the elements must be the same type.<br>

* Every ndarray has a shape attribute which is a tuple indicating the size of each dimension.<br>
<br>
* You can also check the ndim attribute for the array's number of dimensions.<br>

In [9]:
numpyArray.ndim

1

In [10]:
numpyArray.shape

(1000,)

* "numpyArray", which we created and used in the "Python List vs Numpy Array" section, is a 1-dimensional array with 1000 elements.<br>

* Every ndarray also has an attribute called dtype, which is an object describing the data type of the array.

In [11]:
numpyArray.dtype

dtype('int64')

* Other useful attributes are:<br>
    * itemsize, which gives the size (in bytes) of each array element.<br>
    <br>
    * nbytes, which gives the total size (in bytes) of the array.<br>
    <br>
    * size, which gives the number of elements in the array.<br>

In [12]:
numpyArray.itemsize

8

In [13]:
numpyArray.nbytes

8000

In [14]:
numpyArray.size

1000

### Creating ndarrays

* First, we can use np.array to create arrays from Python lists.<br>

In [15]:
L = [1, 4, 2, 1, 3, 2, 4, 0, 1, 0]
arr = np.array(L)

In [16]:
arr

array([1, 4, 2, 1, 3, 2, 4, 0, 1, 0])

* To explicitly set the data type of the resulting array, we can use the dtype keyword.<br>

In [17]:
arr1 = np.array(L, dtype='float32')

In [18]:
arr1.dtype

dtype('float32')

* Unless explicitly specified, np.array tries to infer a good data type for the array that it creates.<br>

In [19]:
arr2 = np.array([3.14, 5, 2.71])

In [20]:
arr2.dtype

dtype('float64')

* You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method.<br>

In [21]:
arr2 = arr2.astype("int32")

In [22]:
arr2.dtype

dtype('int32')

In [23]:
arr2 = arr2.astype(arr1.dtype)

In [24]:
arr2.dtype

dtype('float32')
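
* One detail worth seeing in code: astype returns a new array, and casting floats to integers drops the fractional part rather than rounding. A minimal sketch, with illustrative variable names:

```python
import numpy as np

floats = np.array([3.14, 5, 2.71, -1.7])

# astype always returns a new array; the fractional part is discarded.
ints = floats.astype("int32")

print(floats)  # unchanged
print(ints)    # truncated toward zero
```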

* The available data types are listed on the next slide.<br>

<table>
<thead>
<tr>
<th style="text-align:left;">Data type</th>
<th style="text-align:left;">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">int8, uint8</td>
<td style="text-align:left;">Signed and unsigned 8-bit (1 byte) integer types</td>
</tr>
<tr>
<td style="text-align:left;">int16, uint16</td>
<td style="text-align:left;">Signed and unsigned 16-bit integer types</td>
</tr>
<tr>
<td style="text-align:left;">int32, uint32</td>
<td style="text-align:left;">Signed and unsigned 32-bit integer types</td>
</tr>
<tr>
<td style="text-align:left;">int64, uint64</td>
<td style="text-align:left;">Signed and unsigned 64-bit integer types</td>
</tr>
<tr>
<td style="text-align:left;">float16</td>
<td style="text-align:left;">Half-precision floating point</td>
</tr>
<tr>
<td style="text-align:left;">float32</td>
<td style="text-align:left;">Standard single-precision (32-bit) floating point. Compatible with C float</td>
</tr>
<tr>
<td style="text-align:left;">float64</td>
<td style="text-align:left;">Standard double-precision floating point. Compatible with C double and Python float object</td>
</tr>
<tr>
<td style="text-align:left;">float128</td>
<td style="text-align:left;">Extended-precision floating point</td>
</tr>
<tr>
<td style="text-align:left;">complex64, complex128, complex256</td>
<td style="text-align:left;">Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively</td>
</tr>
<tr>
<td style="text-align:left;">bool</td>
<td style="text-align:left;">Boolean type storing True and False values</td>
</tr>
<tr>
<td style="text-align:left;">object</td>
<td style="text-align:left;">Python object types</td>
</tr>
<tr>
<td style="text-align:left;">string_</td>
<td style="text-align:left;">Fixed-length string type (1 byte per character). For example, to create a string dtype with length 16, use 'S16'</td>
</tr>
<tr>
<td style="text-align:left;">unicode_</td>
<td style="text-align:left;">Fixed-length unicode type (number of bytes platform specific). Same specification semantics as string_ (e.g. 'U10')</td>
</tr>
</tbody>
</table>
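
* The ranges and sizes behind the table can be inspected directly. This sketch uses np.iinfo for integer limits and the itemsize attribute of a dtype. The particular dtypes chosen are only examples.

```python
import numpy as np

# Smallest and largest representable value for a few integer types.
int8_info = np.iinfo(np.int8)
uint8_info = np.iinfo(np.uint8)
print("int8 :", int8_info.min, int8_info.max)
print("uint8:", uint8_info.min, uint8_info.max)

# Bytes per element for some dtypes from the table.
float32_size = np.dtype(np.float32).itemsize
bytes16_size = np.dtype("S16").itemsize
print("float32 itemsize:", float32_size)
print("S16 itemsize    :", bytes16_size)
```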

* Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array.<br>

In [25]:
arr3 = np.array([[1,2,3], [4,5,6]])

In [26]:
arr3

array([[1, 2, 3],
       [4, 5, 6]])

In [27]:
arr3.ndim

2

* In addition to np.array, there are a number of other functions for creating new arrays. Here are some examples:<br>

* Create a 1-dimensional, length 10 integer array filled with __zeros__.<br>

In [28]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

* Create a 2 x 3 floating-point array filled with __ones__.<br>

In [29]:
np.ones((2, 3), dtype=float)

array([[1., 1., 1.],
       [1., 1., 1.]])

* Create a 3 x 3 array without initializing its values to any particular value.<br>

In [30]:
np.empty([3, 3])

array([[-1.72723371e-077, -1.72723371e-077,  2.21586276e-314],
       [ 2.21595874e-314,  2.21597073e-314,  2.21594108e-314],
       [ 2.20740871e-314,  0.00000000e+000,  2.20373365e-314]])

* It’s not safe to assume that np.empty will return an array of all zeros.<br>
<br>
* In many cases, as previously shown, it will return uninitialized garbage values.<br>

* Create a 3 x 5 array filled with the value 2.<br>

In [31]:
np.full((3, 5), 2)

array([[2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2]])

* Create an array filled with a linear sequence, starting at 0, stopping before 10, stepping by 2.<br>
* NumPy's arange is similar to Python's built-in range function: the stop value is excluded.<br>

In [32]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

* Create a floating-point array of five values evenly spaced between 0 and 2.<br>

In [33]:
np.linspace(0, 2, 5)

array([0. , 0.5, 1. , 1.5, 2. ])
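
* The two functions treat the end of the interval differently, which this short sketch makes explicit: arange excludes the stop value, while linspace includes it unless endpoint=False is passed.

```python
import numpy as np

stepped = np.arange(0, 10, 2)                       # stop value 10 is excluded
spaced = np.linspace(0, 10, 6)                      # endpoint 10 is included
half_open = np.linspace(0, 10, 5, endpoint=False)   # endpoint excluded on request

print(stepped)
print(spaced)
print(half_open)
```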

* Create a 3 x 3 array of random values drawn from a continuous uniform distribution between 0 and 1.<br>

In [34]:
np.random.random((3, 3))

array([[0.95713288, 0.0995074 , 0.73652056],
       [0.08343602, 0.70776813, 0.97946538],
       [0.49493737, 0.84347528, 0.18599182]])

* Create a 3 x 3 array of random values drawn from a normal distribution with mean 0 and standard deviation 1.<br>

In [35]:
np.random.normal(0, 1, (3, 3))

array([[-1.87174981,  1.62520264,  1.2355561 ],
       [ 0.12827165,  0.38276749, -0.09771709],
       [-0.27984436,  0.3569242 ,  0.29422659]])

* Create a 4 x 4 array of random integers from 0 up to, but not including, 10.<br>

In [36]:
np.random.randint(0, 10, (4, 4))

array([[0, 6, 6, 5],
       [3, 7, 7, 1],
       [3, 2, 7, 2],
       [8, 7, 6, 5]])
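
* Random output changes on every run. If reproducible results are needed, one option is a seeded generator. This sketch uses np.random.RandomState with an arbitrary seed of 42. The seed value and variable names are assumptions for illustration.

```python
import numpy as np

# Two generators with the same seed produce the same sequence of draws.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

ints_a = rng_a.randint(0, 10, (4, 4))
ints_b = rng_b.randint(0, 10, (4, 4))
uniform = rng_a.random_sample((3, 3))

print(ints_a)
print(np.array_equal(ints_a, ints_b))
```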

* Create a 4 x 4 identity matrix with floating-point values. (1’s on the diagonal and 0’s elsewhere).<br>

In [37]:
np.eye(4,4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [38]:
randArr = np.random.randint(0, 10, (4, 4))
print(randArr)
print(np.dot(randArr, np.eye(4,4)))

[[7 7 8 9]
 [9 0 4 4]
 [2 7 0 1]
 [7 3 1 2]]
[[7. 7. 8. 9.]
 [9. 0. 4. 4.]
 [2. 7. 0. 1.]
 [7. 3. 1. 2.]]


* The matrix product (np.dot, not the element-wise \* operator) of any matrix and the appropriately sized identity matrix is always the original matrix.<br>
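
* It is easy to confuse the element-wise \* operator with the matrix product. A small sketch on a fixed 4 x 4 matrix shows the difference: \* keeps only the diagonal, while np.dot returns the original values.

```python
import numpy as np

A = np.array([[7, 7, 8, 9],
              [9, 0, 4, 4],
              [2, 7, 0, 1],
              [7, 3, 1, 2]])
I = np.eye(4)

elementwise = A * I            # multiplies matching positions only
matrix_product = np.dot(A, I)  # true matrix product

print(elementwise)
print(matrix_product)
```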

### Basic Indexing

* In a one-dimensional array, you can access the "i"th value (counting from zero) by specifying the desired index in square brackets, just as with Python lists.<br>

In [39]:
arr4 = np.array([4, 1, 2, 4, 6, 2, 7, 0, 3, 9])
print(arr4)

[4 1 2 4 6 2 7 0 3 9]


* For example, to access the element with value 3:<br>

In [40]:
print(arr4[8])
print(arr4[-2])

3
3


* In a multidimensional array, you access items using a comma-separated tuple of indices.

In [41]:
arr5 = np.array([[3, 2],[8, 8],[7, 6],[8, 2],[5, 9]])
print(arr5)

[[3 2]
 [8 8]
 [7 6]
 [8 2]
 [5 9]]


* For example, to access the element with value 7:<br>

In [42]:
print(arr5[2,0])

7


* You can also modify values using any of the above index notation.

In [43]:
print(arr5)

[[3 2]
 [8 8]
 [7 6]
 [8 2]
 [5 9]]


* For example, let's change element with value 3 to 1

In [44]:
arr5[0,0] = 1 

In [45]:
print(arr5)

[[1 2]
 [8 8]
 [7 6]
 [8 2]
 [5 9]]


* Don't forget that, unlike Python lists, NumPy arrays have a fixed type. This means that if you attempt to insert a non-numeric string into an integer array, you will get a ValueError.

In [46]:
arr5[0,0] = "string"

ValueError: invalid literal for long() with base 10: 'string'
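
* The fixed type has a quieter consequence as well: a float assigned into an integer array is truncated without any error. A minimal sketch, with illustrative names:

```python
import numpy as np

int_arr = np.array([1, 2, 3])

# No error is raised; the fractional part is silently dropped.
int_arr[0] = 3.99

# Converting to a float array first preserves the value.
float_arr = int_arr.astype(float)
float_arr[1] = 2.5

print(int_arr)
print(float_arr)
```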

### Basic Slicing

* Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character.<br>

* The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this <br>
    x[start:stop:step]<br>


* Let's create an array filled with a linear sequence, starting at 0 and ending at 9.

In [47]:
arr6 = np.arange(10)
print(arr6)

[0 1 2 3 4 5 6 7 8 9]


* Let's take the first 3 elements.<br>

In [48]:
arr6[:3]

array([0, 1, 2])

* Let's take the odd-valued elements, starting at index 1 and excluding 9.<br>

In [49]:
arr6[1:-1:2]

array([1, 3, 5, 7])

* A convenient way to reverse an array is to give a negative value for the step.<br>

In [50]:
arr6[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

* Multidimensional slices work in the same way, with multiple slices separated by commas.<br>

* First, let's create 3 x 3 array:<br>

In [51]:
arr7 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [52]:
arr7

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

* Let's take the last 2 rows and the first column.<br>

In [53]:
arr7[1:, :1]

array([[4],
       [7]])

* Let's take the first 2 rows and all columns.<br>

In [54]:
arr7[:2]

array([[1, 2, 3],
       [4, 5, 6]])

* One important thing to know about array slices is that they return views rather than copies of the array data. <br>
* This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. <br>

* Let's take the first 2 rows and all columns and assign the result to "arr8".<br>

In [55]:
arr8 = arr7[:2]

In [56]:
arr8

array([[1, 2, 3],
       [4, 5, 6]])

* Then, change value at index (0,0) to 11<br>

In [57]:
arr8[0, 0] = 11

In [58]:
arr7

array([[11,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])

* This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without copying the underlying data buffer.<br>

* But if you want to create a new array from a sliced subarray, you can use the copy method.<br>

In [59]:
arr9 = arr8[:2].copy()

In [60]:
arr9

array([[11,  2,  3],
       [ 4,  5,  6]])

In [61]:
arr9[0,0] = 1

In [62]:
arr8

array([[11,  2,  3],
       [ 4,  5,  6]])

* You can check whether an array owns its data with the array's flags attribute (see OWNDATA).<br>

In [63]:
arr9.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
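
* The view versus copy distinction can also be checked programmatically. This sketch compares the owndata flag, the base attribute, and np.shares_memory for a slice and for its copy. The variable names are illustrative.

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
view = original[:2]            # slicing returns a view
copy = original[:2].copy()     # copy() allocates a new buffer

print(view.flags.owndata, view.base is original)
print(copy.flags.owndata, copy.base is None)
print(np.shares_memory(original, view), np.shares_memory(original, copy))
```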

* Another useful type of operation is reshaping of arrays.<br>
<br>
* For example, if you want to put the numbers 1 through 16 in a 4 x 4 grid, you can do the following:<br>


In [64]:
arr10 = np.arange(1,17).reshape(4,4)

In [65]:
arr10

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [66]:
arr10.reshape(16)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])
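
* Two reshape details are easy to miss, sketched here under the assumption of a contiguous input array: a dimension given as -1 is inferred from the total size, and the reshaped result is a view, so writes show up in the source array.

```python
import numpy as np

flat = np.arange(1, 17)

# -1 asks NumPy to infer that dimension from the total number of elements.
grid = flat.reshape(4, -1)
flattened = grid.reshape(-1)

# For a contiguous array, reshape returns a view onto the same buffer.
grid[0, 0] = 100

print(grid.shape, flattened.shape)
print(flat[0])
```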

### Fancy Indexing

* Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.<br>
<br>
* First, let's put the numbers 0 through 15 in a 4 x 4 grid.

In [67]:
arr11 = np.arange(0,16).reshape(4,4)

In [68]:
arr11

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

* To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order.<br>

In [69]:
arr11[[3,0,2,1]]

array([[12, 13, 14, 15],
       [ 0,  1,  2,  3],
       [ 8,  9, 10, 11],
       [ 4,  5,  6,  7]])

* You can also use negative values.<br>

In [70]:
arr11[[-1,0,-2,1]]

array([[12, 13, 14, 15],
       [ 0,  1,  2,  3],
       [ 8,  9, 10, 11],
       [ 4,  5,  6,  7]])

* Passing multiple index arrays does something slightly different; it selects a 1D array of elements corresponding to each tuple of indices.<br>

* For example, to get an array with the values 1, 7, 4:<br>
<br>
* The corresponding indices are (0,1), (1,3) and (1,0).

In [71]:
arr11[[0,1,1], [1,3,0]]

array([1, 7, 4])
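
* Paired index arrays select individual elements. If the goal is instead the rectangular block formed by some rows and some columns, np.ix_ is one way to build the indices. A brief sketch of both, with example row and column choices:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

# Paired indices: picks elements (0,1), (1,3) and (1,0).
paired = arr[[0, 1, 1], [1, 3, 0]]

# np.ix_: picks every combination of rows 0-1 with columns 1 and 3.
block = arr[np.ix_([0, 1], [1, 3])]

print(paired)
print(block)
```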

* We can also combine fancy indexing with slicing.<br>

* For example, to get an array with the values 5, 9, 13:<br>
<br>
* The corresponding indices are (1,1), (2,1) and (3,1).

In [72]:
arr11[1:, [1]]

array([[ 5],
       [ 9],
       [13]])

* Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.

In [73]:
arr12 = arr11[0:2, [0,3,2,1]]

In [74]:
arr12[1] = 0

In [75]:
arr12

array([[0, 3, 2, 1],
       [0, 0, 0, 0]])

In [76]:
arr11

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
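
* Reading with a fancy index gives a copy, but assigning through a fancy index still writes into the original array. The sketch below shows both cases on a fresh 4 x 4 grid. The names are illustrative.

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

# Reading: the result is a copy, so changing it leaves arr alone.
picked = arr[[0, 2]]
picked[:] = -1
original_untouched = np.array_equal(arr, np.arange(16).reshape(4, 4))

# Writing: assignment through the fancy index modifies arr in place.
arr[[0, 2]] = 0

print(original_untouched)
print(arr)
```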

### Array Concatenation and Splitting

* Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the routines np.concatenate, np.vstack, and np.hstack.<br>

In [77]:
arr13 = np.array([0, 1])
arr14 = np.array([2, 3])
arr15 = np.array([4, 5])
np.concatenate([arr13, arr14, arr15])

array([0, 1, 2, 3, 4, 5])

* For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions.<br>

In [78]:
arr16 = np.array([0,1,2])
arr17 = np.array([3,4,5])

In [79]:
np.vstack([arr16, arr17])

array([[0, 1, 2],
       [3, 4, 5]])

In [80]:
np.hstack([arr16, arr17])

array([0, 1, 2, 3, 4, 5])
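
* For two-dimensional inputs, np.concatenate takes an axis argument that plays the role of vstack (axis=0) or hstack (axis=1). A small sketch with two example 2 x 2 arrays:

```python
import numpy as np

top = np.array([[0, 1], [2, 3]])
bottom = np.array([[4, 5], [6, 7]])

stacked_rows = np.concatenate([top, bottom], axis=0)   # shape (4, 2)
stacked_cols = np.concatenate([top, bottom], axis=1)   # shape (2, 4)

print(stacked_rows)
print(stacked_cols)
```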

* The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit.<br>

In [81]:
arr18 = np.hstack([arr16, arr17])
print(arr18)

[0 1 2 3 4 5]


In [82]:
np.split(arr18, [2,4])

[array([0, 1]), array([2, 3]), array([4, 5])]

* Notice that n split points lead to n + 1 subarrays.<br>

* The functions np.hsplit and np.vsplit work in the same way, splitting along columns and rows respectively.<br>

In [83]:
arr19 = np.arange(16).reshape(4, 4)

In [84]:
arr19

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [85]:
arr20, arr21 = np.vsplit(arr19, 2)

In [86]:
arr20

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [87]:
arr21

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])
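
* For completeness, here is a sketch of the column-wise counterpart: np.hsplit with a section count, and np.split with explicit split points along axis 1. The split points are example values.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# Two equal halves, split between columns.
left, right = np.hsplit(grid, 2)

# Split points 1 and 3 along axis 1 give three pieces.
first, middle, last = np.split(grid, [1, 3], axis=1)

print(left)
print(first.shape, middle.shape, last.shape)
```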

### Universal Functions

* A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays.<br>

* Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary ufuncs, which operate on two arrays.<br>

* The available unary and binary universal functions are listed on the next slides.

#### Unary Universal Functions

<table>
<thead>
<tr>
<th style="text-align:left;">Function</th>
<th style="text-align:left;">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">abs, fabs</td>
<td style="text-align:left;">Compute the absolute value element-wise for integer, floating point, or complex values. Use fabs as a faster alternative for non-complex-valued data</td>
</tr>
<tr>
<td style="text-align:left;">sqrt</td>
<td style="text-align:left;">Compute the square root of each element. Equivalent to arr \*\* 0.5</td>
</tr>
<tr>
<td style="text-align:left;">square</td>
<td style="text-align:left;">Compute the square of each element. Equivalent to arr \*\* 2</td>
</tr>
<tr>
<td style="text-align:left;">exp</td>
<td style="text-align:left;">Compute the exponent e^x of each element</td>
</tr>
<tr>
<td style="text-align:left;">log, log10, log2, log1p</td>
<td style="text-align:left;">Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively</td>
</tr>
<tr>
<td style="text-align:left;">sign</td>
<td style="text-align:left;">Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative)</td>
</tr>
<tr>
<td style="text-align:left;">ceil</td>
<td style="text-align:left;">Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element</td>
</tr>
<tr>
<td style="text-align:left;">floor</td>
<td style="text-align:left;">Compute the floor of each element, i.e. the largest integer less than or equal to each element</td>
</tr>
<tr>
<td style="text-align:left;">rint</td>
<td style="text-align:left;">Round elements to the nearest integer, preserving the dtype</td>
</tr>
<tr>
<td style="text-align:left;">modf</td>
<td style="text-align:left;">Return fractional and integral parts of array as separate arrays</td>
</tr>
<tr>
<td style="text-align:left;">isnan</td>
<td style="text-align:left;">Return boolean array indicating whether each value is NaN (Not a Number)</td>
</tr>
<tr>
<td style="text-align:left;">isfinite, isinf</td>
<td style="text-align:left;">Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively</td>
</tr>
<tr>
<td style="text-align:left;">cos, cosh, sin, sinh, tan, tanh</td>
<td style="text-align:left;">Regular and hyperbolic trigonometric functions</td>
</tr>
<tr>
<td style="text-align:left;">arccos, arccosh, arcsin, arcsinh, arctan, arctanh</td>
<td style="text-align:left;">Inverse trigonometric functions</td>
</tr>
<tr>
<td style="text-align:left;">logical_not</td>
<td style="text-align:left;">Compute truth value of not x element-wise. Equivalent to ~arr for boolean arrays</td>
</tr>
</tbody>
</table>


#### Binary Universal Functions

<table>
<thead>
<tr>
<th style="text-align:center;">Operator</th>
<th style="text-align:left;">Function</th>
<th style="text-align:left;">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center;">+</td>
<td style="text-align:left;">add</td>
<td style="text-align:left;">Add corresponding elements in arrays</td>
</tr>
<tr>
<td style="text-align:center;">-</td>
<td style="text-align:left;">subtract</td>
<td style="text-align:left;">Subtract elements in second array from first array</td>
</tr>
<tr>
<td style="text-align:center;">\*</td>
<td style="text-align:left;">multiply</td>
<td style="text-align:left;">Multiply array elements</td>
</tr>
<tr>
<td style="text-align:center;">/</td>
<td style="text-align:left;">divide</td>
<td style="text-align:left;">Divide</td>
</tr>
<tr>
<td style="text-align:center;">//</td>
<td style="text-align:left;">floor_divide</td>
<td style="text-align:left;">floor divide (truncating the remainder)</td>
</tr>
<tr>
<td style="text-align:center;">\*\*</td>
<td style="text-align:left;">power</td>
<td style="text-align:left;">Raise elements in first array to powers indicated in second array</td>
</tr>
<tr>
<td style="text-align:center;"></td>
<td style="text-align:left;">maximum, fmax</td>
<td style="text-align:left;">Element-wise maximum. fmax ignores NaN</td>
</tr>
<tr>
<td style="text-align:center;"></td>
<td style="text-align:left;">minimum, fmin</td>
<td style="text-align:left;">Element-wise minimum. fmin ignores NaN</td>
</tr>
<tr>
<td style="text-align:center;">%</td>
<td style="text-align:left;">mod</td>
<td style="text-align:left;">Element-wise modulus (remainder of division)</td>
</tr>
<tr>
<td style="text-align:center;"></td>
<td style="text-align:left;">copysign</td>
<td style="text-align:left;">Copy sign of values in second argument to values in first argument</td>
</tr>
<tr>
<td style="text-align:center;">></td>
<td style="text-align:left;">greater</td>
<td style="text-align:left;">Perform element-wise comparison</td>
</tr>
<tr>
<td style="text-align:center;">>=</td>
<td style="text-align:left;">greater_equal</td>
<td style="text-align:left;">Perform element-wise comparison</td>
</tr>
<tr>
<td style="text-align:center;"><</td>
<td style="text-align:left;">less</td>
<td style="text-align:left;">Perform element-wise comparison</td>
</tr>
<tr>
<td style="text-align:center;"><=</td>
<td style="text-align:left;">less_equal</td>
<td style="text-align:left;">Perform element-wise comparison</td>
</tr>
<tr>
<td style="text-align:center;">==</td>
<td style="text-align:left;">equal</td>
<td style="text-align:left;">Perform element-wise comparison</td>
</tr>
<tr>
<td style="text-align:center;">!=</td>
<td style="text-align:left;">not_equal</td>
<td style="text-align:left;">Perform element-wise comparison</td>
</tr>
<tr>
<td style="text-align:center;">&</td>
<td style="text-align:left;">logical_and</td>
<td style="text-align:left;">Compute element-wise truth value of logical AND. The & operator (bitwise_and) gives the same result for boolean arrays</td>
</tr>
<tr>
<td style="text-align:center;">|</td>
<td style="text-align:left;">logical_or</td>
<td style="text-align:left;">Compute element-wise truth value of logical OR. The | operator (bitwise_or) gives the same result for boolean arrays</td>
</tr>
<tr>
<td style="text-align:center;">^</td>
<td style="text-align:left;">logical_xor</td>
<td style="text-align:left;">Compute element-wise truth value of logical XOR. The ^ operator (bitwise_xor) gives the same result for boolean arrays</td>
</tr>
</tbody>
</table>

* Let's see some of the ufuncs in action.<br>

In [88]:
arr22 = np.arange(-8,8).reshape(4, 4)
print(arr22)

[[-8 -7 -6 -5]
 [-4 -3 -2 -1]
 [ 0  1  2  3]
 [ 4  5  6  7]]


In [89]:
arr22 + 5

array([[-3, -2, -1,  0],
       [ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [90]:
np.abs(arr22)

array([[8, 7, 6, 5],
       [4, 3, 2, 1],
       [0, 1, 2, 3],
       [4, 5, 6, 7]])

In [91]:
np.square(arr22)

array([[64, 49, 36, 25],
       [16,  9,  4,  1],
       [ 0,  1,  4,  9],
       [16, 25, 36, 49]])

In [92]:
arr22 >= 0

array([[False, False, False, False],
       [False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [93]:
np.greater_equal(arr22, 0)

array([[False, False, False, False],
       [False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [94]:
np.maximum(arr22, 0)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 1, 2, 3],
       [4, 5, 6, 7]])
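
* Operators and their named ufuncs are interchangeable, and comparisons can be combined into a single mask. A short sketch on the same -8 to 7 grid. Note the parentheses, which are needed because & binds more tightly than the comparison operators.

```python
import numpy as np

arr = np.arange(-8, 8).reshape(4, 4)

# The + operator and np.add compute the same thing.
same_sum = np.array_equal(arr + 5, np.add(arr, 5))

# Combine two comparisons element-wise.
in_range = (arr > 0) & (arr < 5)
same_mask = np.array_equal(in_range, np.logical_and(arr > 0, arr < 5))

print(same_sum, same_mask)
print(in_range.sum(), "values lie strictly between 0 and 5")
```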

### Aggregation Functions

* NumPy provides many aggregation functions.<br>
<br>
* Additionally, most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values.<br>
<br>
* The available aggregation functions are listed on the next slide.

### Aggregation Functions

<table>
<thead>
<tr>
<th style="text-align:left;">Function</th>
<th style="text-align:left;">NaN-safe Version</th>
<th style="text-align:left;">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">sum</td>
<td style="text-align:left;">nansum</td>
<td style="text-align:left;">Compute sum of elements</td>
</tr>
<tr>
<td style="text-align:left;">prod</td>
<td style="text-align:left;">nanprod</td>
<td style="text-align:left;">Compute product of elements</td>
</tr>
<tr>
<td style="text-align:left;">mean</td>
<td style="text-align:left;">nanmean</td>
<td style="text-align:left;">Compute mean of elements</td>
</tr>
<tr>
<td style="text-align:left;">std</td>
<td style="text-align:left;">nanstd</td>
<td style="text-align:left;">Compute standard deviation</td>
</tr>
<tr>
<td style="text-align:left;">var</td>
<td style="text-align:left;">nanvar</td>
<td style="text-align:left;">Compute variance</td>
</tr>
<tr>
<td style="text-align:left;">min</td>
<td style="text-align:left;">nanmin</td>
<td style="text-align:left;">Find minimum value</td>
</tr>
<tr>
<td style="text-align:left;">max</td>
<td style="text-align:left;">nanmax</td>
<td style="text-align:left;">Find maximum value</td>
</tr>
<tr>
<td style="text-align:left;">argmin</td>
<td style="text-align:left;">nanargmin</td>
<td style="text-align:left;">Find index of minimum value</td>
</tr>
<tr>
<td style="text-align:left;">argmax</td>
<td style="text-align:left;">nanargmax</td>
<td style="text-align:left;">Find index of maximum value</td>
</tr>
<tr>
<td style="text-align:left;">median</td>
<td style="text-align:left;">nanmedian</td>
<td style="text-align:left;">Compute median of elements</td>
</tr>
<tr>
<td style="text-align:left;">percentile</td>
<td style="text-align:left;">nanpercentile</td>
<td style="text-align:left;">Compute rank-based statistics of elements</td>
</tr>
<tr>
<td style="text-align:left;">any</td>
<td style="text-align:left;"></td>
<td style="text-align:left;">Evaluate whether any elements are true</td>
</tr>
<tr>
<td style="text-align:left;">all</td>
<td style="text-align:left;"></td>
<td style="text-align:left;">Evaluate whether all elements are true</td>
</tr>
</tbody>
</table>

* Let's see some of the aggregation functions in action.<br>

In [95]:
arr22

array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

In [96]:
np.sum(arr22)

-8

In [97]:
np.mean(arr22)

-0.5

In [98]:
np.std(arr22)

4.6097722286464435

In [99]:
np.var(arr22)

21.25

In [100]:
np.max(arr22)

7
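
* The calls above aggregate over the whole array. As a sketch of two common variations: the axis argument aggregates per column or per row, and the NaN-safe versions skip missing values. The small array with a NaN is made up for illustration.

```python
import numpy as np

arr = np.arange(-8, 8).reshape(4, 4)

column_sums = arr.sum(axis=0)   # collapse the rows: one sum per column
row_sums = arr.sum(axis=1)      # collapse the columns: one sum per row

with_gap = np.array([1.0, np.nan, 3.0])
plain_total = np.sum(with_gap)     # NaN propagates into the result
safe_total = np.nansum(with_gap)   # NaN is ignored

print(column_sums, row_sums)
print(plain_total, safe_total)
```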

### Broadcasting

* The term broadcasting describes how NumPy treats arrays of different shapes when performing arithmetic operations.<br>

* For example, we can just as easily combine an array with a scalar (think of it as a zero-dimensional array), here by multiplication.<br>

In [101]:
arr23 = np.arange(16).reshape(4,4)

In [102]:
arr23 * 2

array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22],
       [24, 26, 28, 30]])

* We can think of this as an operation that stretches or duplicates the value 2 into an array like the one below.<br>

In [103]:
np.full((4, 4), 2)

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

* Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.<br>
* Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.<br>
* Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.<br>

In [104]:
arr24 = np.arange(4)
arr25 = np.ones(5)

In [106]:
arr24 + arr25

ValueError: operands could not be broadcast together with shapes (4,) (5,) 

In [107]:
arr24.reshape(4,1) + arr25

array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [108]:
arr26 = np.ones([3,4])

In [109]:
arr26

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [110]:
arr24

array([0, 1, 2, 3])

In [111]:
arr24 + arr26

array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])
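
* A typical use of these rules is centering data. In this sketch, assuming a made-up 3 x 4 float array, column means of shape (4,) broadcast directly, while row means need keepdims=True so that their shape (3, 1) lines up with the rows.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(3, 4)

# (3, 4) - (4,): the means are padded to (1, 4) and stretched over the rows.
col_centered = X - X.mean(axis=0)

# (3, 4) - (3, 1): keepdims keeps the trailing axis so it can be stretched.
row_centered = X - X.mean(axis=1, keepdims=True)

print(col_centered)
print(row_centered)
```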

### Boolean Arrays as Masks

* In the Universal Functions section, we looked at comparison operators.<br>
* Comparison operators return a boolean array.<br>
* We can use that boolean array as a mask to select values from the array.<br>

In [112]:
arr27 = arr22.copy()

In [113]:
arr27

array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

In [114]:
mask = arr27 >= 0

In [115]:
mask

array([[False, False, False, False],
       [False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [116]:
arr27[mask]

array([0, 1, 2, 3, 4, 5, 6, 7])

* The array returned by masking owns its own data; it is a copy, not a view.<br>

In [117]:
arr28 = arr27[mask]

In [118]:
arr28

array([0, 1, 2, 3, 4, 5, 6, 7])

In [119]:
arr28[0] = 10

In [120]:
arr27

array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])
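
* Masks are also useful for counting and for in-place assignment. A minimal sketch that counts the negative entries and sets them to zero in a copy, leaving the source array as it was:

```python
import numpy as np

arr = np.arange(-8, 8).reshape(4, 4)
mask = arr < 0

# True counts as 1, so summing a mask counts the matching elements.
n_negative = mask.sum()

# Assigning through a mask writes into the indexed array itself.
clipped = arr.copy()
clipped[mask] = 0

print(n_negative)
print(clipped)
```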