## MG-GY 8401: Programming for Business Intelligence and Analytics
## Lecture 3

We will take a look at some useful features of the `numpy` package, including

1. Accessing  
1. Manipulating
1. Calculating

Note that Python has several versions. We will be using **Python 3.7**.

---
### Object Oriented Programming

Three main concepts:
- Objects
- Classes
- Inheritance

A class specifies the attributes and methods of the object. Each object is an instance of the class. 

```python
class NameOfTheClass(parentclass, ...):

    def __init__(self, value_1, value_2, ...):  # class constructor
        self.attribute_1 = value_1
        self.attribute_2 = value_2
        ...

    def method_1(self, value, ...):
        self.attribute_1 = value
```    
A class can inherit attributes and methods from other classes. A class that inherits from another class is called a child class. A class that is inherited from is called a parent class.

Green text in the code cells marks reserved keywords.


In [1]:
class SayHello(object):
    def __init__(self, input_name):
        self.attribute = input_name
        
    def method(self):
        print(f"hello {self.attribute}")

instance_of_SayHello = SayHello('Nica')
instance_of_SayHello.method()

hello Nica


In [1]:
class bicycle(object):
    def __init__(self, bike_type = None, n_gears = 1, handlebar = 'Drop'):
        print("...building the object...")
        self.bicycle_type = bike_type
        self.number_of_gears = n_gears
        self.handlebar_type = handlebar
        
    def get_handlebar_options(self, k=4):
        handle_options = ['Drop','Cruiser','Flat','Bullhorn']
        print(handle_options[:k])

Note that `self` is not a reserved keyword but a strong naming convention for the first parameter of a method. It refers to the instance on which the method is called, which is how methods read and set the attributes of that instance.
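
As a minimal sketch of what `self` does (the class `Counter` is hypothetical and not part of the lecture code): calling a method on an instance passes that instance as the first argument, so the two calls below are equivalent.

```python
class Counter:
    """Hypothetical class used only to illustrate self."""
    def __init__(self, start=0):
        self.count = start          # attribute stored on this particular instance

    def increment(self):
        self.count += 1
        return self.count

c = Counter()
first = c.increment()               # Python passes c as self
second = Counter.increment(c)       # the same call written explicitly
print(first, second)
```

Both calls update the same object, so `c.count` reflects two increments.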

In [2]:
my_bike = bicycle() # instantiating 

...building the object...


Having defined the class, we can construct instances of it.

In [3]:
my_bike.bicycle_type = 'Mountain' # accessing an instance's variables
my_bike.number_of_gears = 3     

We can access and modify attributes.

In [4]:
my_bike.get_handlebar_options(2) # accessing an instance's method

['Drop', 'Cruiser']


We can use methods.

In [5]:
your_bike = bicycle(bike_type='Road', handlebar='Bullhorn') #instantiating

print(your_bike)

...building the object...
<__main__.bicycle object at 0x000001898279D388>


We can define a child class.

In [6]:
class mountain_bike(bicycle): 
    def __init__(self, n_gears = 10, handlebar = 'Bullhorn'):
        super().__init__('Mountain', n_gears, handlebar)
        
    def get_handlebar_options(self, k=3):
        handle_options = ['Drop','Flat','Bullhorn']
        print(handle_options[:k])

In [7]:
my_mountain_bike = mountain_bike()
my_mountain_bike.get_handlebar_options()

...building the object...
['Drop', 'Flat', 'Bullhorn']


#### Operator Overloading

Classes can customize how built-in Python operators and functions behave on their instances. They do so by defining special methods whose names begin and end with double underscores.

- `__repr__` is called when the object is printed or converted to a string
- `__add__` for the + operator, X + Y
- `__lt__`, `__gt__` for comparisons, X < Y, X > Y
- Many more...
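
A short sketch of the arithmetic and comparison methods from the list above. The class `Money` is hypothetical and chosen only for illustration.

```python
class Money:
    """Hypothetical class used only to illustrate operator overloading."""
    def __init__(self, amount):
        self.amount = amount

    def __repr__(self):
        return f"Money({self.amount})"

    def __add__(self, other):       # called for self + other
        return Money(self.amount + other.amount)

    def __lt__(self, other):        # called for self < other
        return self.amount < other.amount

a, b = Money(5), Money(7)
total = a + b
print(total)                        # printed through __repr__
print(a < b, a > b)                 # a > b falls back to b.__lt__(a)
```

Only `__lt__` is defined here; Python evaluates `a > b` by trying the reflected comparison `b < a`.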



In [8]:
class bicycle(object):
    def __init__(self, bike_type = None, n_gears = 1, handlebar = 'Drop'):
        print("...building the object...")
        self.bicycle_type = bike_type
        self.number_of_gears = n_gears
        self.handlebar_type = handlebar
        
    def get_handlebar_options(self, k=4):
        handle_options = ['Drop','Cruiser','Flat','Bullhorn']
        print(handle_options[:k])

    def __repr__(self):      
        return ""

Note that `print` uses the `__str__` method if it is defined and otherwise falls back to `__repr__`, so here the `__repr__` method determines the output of the `print` function.
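
To make the relationship between `print`, `__str__` and `__repr__` concrete, here is a small sketch. The classes `Point` and `LabeledPoint` are hypothetical and not part of the lecture code.

```python
class Point:
    """Hypothetical class, not part of the lecture code."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):             # used by repr() and by the notebook display
        return f"Point({self.x}, {self.y})"

class LabeledPoint(Point):
    def __str__(self):              # preferred by print() and str() when present
        return f"({self.x}, {self.y})"

p = Point(1, 2)
q = LabeledPoint(1, 2)
print(p)                            # no __str__, so print falls back to __repr__
print(q)                            # __str__ takes precedence
print(repr(q))                      # repr() always uses __repr__
```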

In [9]:
class mountain_bike(bicycle): 
    def __init__(self, n_gears = 10, handlebar='Bullhorn'):
        super().__init__('Mountain', n_gears, handlebar)
        
    def get_handlebar_options(self, k=3):
        handle_options = ['Drop','Flat','Bullhorn']
        print(handle_options[:k])
        
    def __repr__(self):      
        return 'Type: ' + self.bicycle_type + '\n Gears: ' + str(self.number_of_gears) + '\n Handlebar: ' + self.handlebar_type   

In [10]:
my_mountain_bike = mountain_bike()
print(my_mountain_bike)

...building the object...
Type: Mountain
 Gears: 10
 Handlebar: Bullhorn


Notice that the `__repr__` method in the superclass does not include information about the suspension, so the child class defined next needs to extend (rather than replace) the `__repr__` method.

In [11]:
class downhill_mountain_bike(mountain_bike): 
    def __init__(self, n_gears = 10, handlebar='Bullhorn', suspension = None):
        super().__init__(n_gears, handlebar)
        self.suspension_type = suspension
        
    def __repr__(self):
        return super().__repr__() +'\n'+'Suspension: ' + self.suspension_type

In [12]:
my_downhill_bike = downhill_mountain_bike(suspension = 'front_and_rear')
print(my_downhill_bike)

...building the object...
Type: Mountain
 Gears: 10
 Handlebar: Bullhorn
Suspension: front_and_rear


#### Class Methods

In [13]:
class ExampleClass(object):
    def __init__(self):
        pass
    
    def example_instance_method():
        return "Hello World"

example_object = ExampleClass()

# example_object.example_instance_method()
ExampleClass.example_instance_method()

'Hello World'
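
The cell above works only because the function is looked up on the class itself. Calling it on `example_object` would raise a `TypeError`, since Python would pass the instance as an argument that the function does not accept. A sketch of the more explicit alternatives, using Python's built-in `@classmethod` and `@staticmethod` decorators (the class `Greeter` is hypothetical):

```python
class Greeter:
    """Hypothetical class used only to compare the three kinds of methods."""
    greeting = "Hello"              # class attribute shared by all instances

    def instance_method(self):      # receives the instance
        return f"{self.greeting} from an instance"

    @classmethod
    def class_method(cls):          # receives the class, not an instance
        return f"{cls.greeting} from {cls.__name__}"

    @staticmethod
    def static_method():            # receives neither
        return "Hello World"

g = Greeter()
print(g.instance_method())
print(Greeter.class_method(), "|", g.class_method())
print(Greeter.static_method(), "|", g.static_method())
```

Class methods and static methods can be called either on the class or on an instance.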

---
### Arrays



We need to import `numpy`, which can be done in several ways:
```python
import numpy
import numpy as np
from numpy import *
```

We tend to use the alias `np`.

In [14]:
import numpy as np

Arrays are similar to lists. However, we note some differences and conventions:
- Every element in an array must be of the same type, typically a numeric type like float or int
- Arrays make operations with large amounts of numeric data very fast and are generally much more efficient than lists
- Each array dimension is called an axis
- Axes are numbered starting from 0
- Elements are accessed using [] (similar to lists)
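
A brief sketch of the first and last points, with made-up values: mixing integers and floats yields a single common element type, and `+` behaves differently on lists and arrays.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])       # ints are upcast so all elements share one type
print(mixed.dtype)

values = [1, 2, 3]
arr = np.array(values)
print(values + values)              # lists concatenate
print(arr + arr)                    # arrays add element by element
print(arr[0], arr[-1])              # indexing with [] works as for lists
```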

In [15]:
print("1D arrays\n")
# creating 1D arrays from a list
a1d = np.array([1,3,5,7,9,10])

# creating 1D arrays from built-in functions
b1d = np.zeros((8))
c1d = np.ones((10))
d1d = np.arange(10)
e1d = np.linspace(1,2,5)

print('a1d=',a1d)
print('b1d=',b1d)
print('c1d=',c1d)
print('d1d=',d1d)
print('e1d=',e1d)

print("\n 2D arrays\n")
# creating 2D arrays from lists
a2d = np.array([[1,3,5,7,9,11],
                  [2,4,6,8,10,12],
                  [0,1,2,3,4,5]])

# creating 2D arrays from built-in functions
b2d = np.zeros((8,3))
c2d = np.ones((10,5))

print('a2d=\n',a2d)
print('b2d=\n',b2d)
print('c2d=\n',c2d)

1D arrays

a1d= [ 1  3  5  7  9 10]
b1d= [0. 0. 0. 0. 0. 0. 0. 0.]
c1d= [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
d1d= [0 1 2 3 4 5 6 7 8 9]
e1d= [1.   1.25 1.5  1.75 2.  ]

 2D arrays

a2d=
 [[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0  1  2  3  4  5]]
b2d=
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
c2d=
 [[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


Numpy arrays are objects called _ndarrays_, with the following attributes:
- ndarray.ndim - the number of axes (dimensions) of the array
- ndarray.shape - a tuple of integers indicating the size of the array in each dimension
- ndarray.size - the total number of elements of the array
- ndarray.dtype - an object describing the type of the elements in the array
- ndarray.itemsize - the size in bytes of each element of the array

In [16]:
print(" ndim")
print(a1d.ndim,b1d.ndim,c1d.ndim)
print(a2d.ndim,b2d.ndim,c2d.ndim)
print("\n shape")
print(a1d.shape,b1d.shape,c1d.shape)
print(a2d.shape,b2d.shape,c2d.shape)
print("\n size")
print(a1d.size,b1d.size,c1d.size)
print(a2d.size,b2d.size,c2d.size)
print("\n dtype")
print(a1d.dtype,b1d.dtype,c1d.dtype)
print(a2d.dtype,b2d.dtype,c2d.dtype)
print("\n itemsize")
print(a1d.itemsize,b1d.itemsize,c1d.itemsize)
print(a2d.itemsize,b2d.itemsize,c2d.itemsize)

 ndim
1 1 1
2 2 2

 shape
(6,) (8,) (10,)
(3, 6) (8, 3) (10, 5)

 size
6 8 10
18 24 50

 dtype
int32 float64 float64
int32 float64 float64

 itemsize
4 8 8
4 8 8


#### Accessing array elements 
- Indices range from 0 to $k-1$, where $k$ is the number of entries along the axis.

In [17]:
print("1D Array \n", a1d)
print("Access Entries \n",a1d[0],a1d[3],a1d[-1])
print("2D Array \n", a2d)
print("Access Rows \n", a2d[1])
print("Access Entries \n", a2d[0,1],a2d[1,2],a2d[-1,-1])

1D Array 
 [ 1  3  5  7  9 10]
Access Entries 
 1 7 10
2D Array 
 [[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0  1  2  3  4  5]]
Access Rows 
 [ 2  4  6  8 10 12]
Access Entries 
 3 6 5


- Iterating over an array is done with respect to the first axis
- The `.flatten()` method returns a 1D copy of the array, which allows us to traverse all elements

In [18]:
for r in a2d:
    print(r)
    
for i in a2d.flatten():
    print(i)

[ 1  3  5  7  9 11]
[ 2  4  6  8 10 12]
[0 1 2 3 4 5]
1
3
5
7
9
11
2
4
6
8
10
12
0
1
2
3
4
5


####  Slicing
- Array slicing works in the same way as sequence slicing, but in multiple dimensions
- Like lists of lists, omitting an index is considered a complete slice
- A slice is a view of the original array (similar to a reference), that is, data is shared, not copied

In [19]:
print(a2d)
print('fixing a row and traversing columns (equivalent to a2d[1])\n',
      a2d[1,:])

print('fixing a column and traversing rows\n',a2d[:,2])

print('traversing an array block\n',a2d[1:,2:5])

print('traversing a subset of rows \n',a2d[[0,2]])

print('traversing a subset of columns \n',a2d[:,[0,2,5]])

print('traversing a subset of array elements \n',a2d[[0,1,2],[0,2,5]])

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0  1  2  3  4  5]]
fixing a row and traversing columns (equivalent to a2d[1])
 [ 2  4  6  8 10 12]
fixing a column and traversing rows
 [5 6 2]
traversing an array block
 [[ 6  8 10]
 [ 2  3  4]]
traversing a subset of rows 
 [[ 1  3  5  7  9 11]
 [ 0  1  2  3  4  5]]
traversing a subset of columns 
 [[ 1  5 11]
 [ 2  6 12]
 [ 0  2  5]]
traversing a subset of array elements 
 [1 6 5]


#### Copies and Views
- A _view_ is created by slicing an array
- A _view_ is like a reference to part of an array
- Changing elements of the _view_ will change the original array
- If necessary, you can explicitly make a copy
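
One way to check whether two arrays share data is `np.shares_memory`. The sketch below uses a made-up array `a`. It also shows that indexing with a list of integers, unlike basic slicing, returns a copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, :2]                     # basic slicing gives a view
picked = a[[0, 2]]                  # indexing with a list of rows gives a copy
explicit = a[:, :2].copy()          # explicit copy

print(np.shares_memory(a, view))
print(np.shares_memory(a, picked))
print(np.shares_memory(a, explicit))

view[0, 0] = 99                     # visible through a
picked[0, 1] = -5                   # does not touch a
print(a[0])
```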

In [20]:
b2d = a2d[:,:3]
print(a2d)
print(b2d)
print('\n')

b2d[2,1] = -1
print(a2d)
print(b2d)
print('\n')

c2d = a2d[:,:3].copy()
c2d[0,0] = -1
print(a2d)
print(c2d)

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0  1  2  3  4  5]]
[[1 3 5]
 [2 4 6]
 [0 1 2]]


[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]]
[[ 1  3  5]
 [ 2  4  6]
 [ 0 -1  2]]


[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]]
[[-1  3  5]
 [ 2  4  6]
 [ 0 -1  2]]


#### Reshaping

We can transpose to flip rows and columns.

In [21]:
print(a2d)
print(a2d.shape)
print("\n")

a2d_transposed = a2d.T
print(a2d_transposed)
print(a2d_transposed.shape)

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]]
(3, 6)


[[ 1  2  0]
 [ 3  4 -1]
 [ 5  6  2]
 [ 7  8  3]
 [ 9 10  4]
 [11 12  5]]
(6, 3)


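We can reshape to change the shape of the array. The next cell also shows that multiplying by an array of zeros with the same shape gives the same result as multiplying by the scalar `0.0`.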

In [22]:
print(a2d, "\n")
print(a2d * np.zeros(a2d.shape), "\n")

print(a2d * 0.0, "\n")

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]] 

[[ 0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.]
 [ 0. -0.  0.  0.  0.  0.]] 

[[ 0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.]
 [ 0. -0.  0.  0.  0.  0.]] 
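
The reshape operation mentioned in this section can be sketched as follows, using a made-up array. The number of elements must stay the same, and a dimension given as `-1` is inferred by numpy.

```python
import numpy as np

a = np.arange(12)                   # 1D array with 12 elements
b = a.reshape(3, 4)                 # 3 rows, 4 columns
c = a.reshape(2, -1)                # -1 lets numpy infer the missing dimension

print(b.shape, c.shape)
print(b)
```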



#### Filtering

We can use an array of True and False values (a Boolean mask) to select elements. The result is a 1D array of the selected entries.

In [23]:
print(a2d,"\n")
print(a2d > 6,"\n")
print(a2d[a2d > 6],"\n")

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]] 

[[False False False  True  True  True]
 [False False False  True  True  True]
 [False False False False False False]] 

[ 7  9 11  8 10 12] 



Note that we have not used conditional statements.
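
A few further filtering patterns, sketched on a made-up array: combining conditions, selecting whole rows with a 1D mask, and replacing values with `np.where`.

```python
import numpy as np

a = np.array([[1, 3, 5, 7],
              [2, 4, 6, 8]])

mask = (a > 2) & (a % 2 == 0)       # combine conditions with & and |, using parentheses
print(a[mask])                      # selected elements as a 1D array

row_mask = a.sum(axis=1) > 16       # one Boolean per row
print(a[row_mask])                  # selects whole rows

clipped = np.where(a > 6, 6, a)     # replace values above 6 by 6, keep the rest
print(clipped)
```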

#### Broadcasting

Broadcasting allows us to apply operations to arrays with different shapes.

In [24]:
left_array = np.array([0,0,0,1,1,1,2,2,2,3,3,3]).reshape(4,3)
right_array = np.array([1,2,3])
right_array_repeated = np.array([[1,2,3],[1,2,3],[1,2,3],[1,2,3]])

print(left_array, "\n")
print(right_array, "\n")
print(right_array_repeated, "\n")


print(left_array + right_array, "\n")
print(left_array + right_array_repeated, "\n")

[[0 0 0]
 [1 1 1]
 [2 2 2]
 [3 3 3]] 

[1 2 3] 

[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]] 

[[1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]] 

[[1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]] 



Remember that two dimensions are compatible for broadcasting in two situations: when they are equal, or when one of them is 1.

In [25]:
left_array = np.array([0,3,6,9]).reshape(4,1)
right_array = np.array([3,4,5])
right_array_repeated = np.array([[3,4,5],[3,4,5],[3,4,5],[3,4,5]])

print(left_array, "\n")
print(right_array, "\n")
print(right_array_repeated, "\n")


print(left_array - right_array, "\n")
print(left_array - right_array_repeated, "\n")

[[0]
 [3]
 [6]
 [9]] 

[3 4 5] 

[[3 4 5]
 [3 4 5]
 [3 4 5]
 [3 4 5]] 

[[-3 -4 -5]
 [ 0 -1 -2]
 [ 3  2  1]
 [ 6  5  4]] 

[[-3 -4 -5]
 [ 0 -1 -2]
 [ 3  2  1]
 [ 6  5  4]] 
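
Numpy compares shapes starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below checks a few shape pairs. The helper `broadcast_shape` is hypothetical, written here on top of `np.broadcast`.

```python
import numpy as np

def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or None if they are incompatible."""
    try:
        return np.broadcast(a, b).shape
    except ValueError:
        return None

print(broadcast_shape(np.zeros((4, 3)), np.zeros(3)))   # (4, 3) with (3,)
print(broadcast_shape(np.zeros((4, 1)), np.zeros(3)))   # (4, 1) with (3,)
print(broadcast_shape(np.zeros((4, 3)), np.zeros(4)))   # last dimensions 3 and 4 differ
```

The first two pairs correspond to the two broadcasting cells above. The third pair cannot be broadcast.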



#### Operations

We can apply operations like addition entry by entry. 

In [26]:
print(a2d,"\n")
print(a2d + a2d,"\n")

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]] 

[[ 2  6 10 14 18 22]
 [ 4  8 12 16 20 24]
 [ 0 -2  4  6  8 10]] 



In [27]:
print(a2d,"\n")
print(np.sin(a2d),"\n")

[[ 1  3  5  7  9 11]
 [ 2  4  6  8 10 12]
 [ 0 -1  2  3  4  5]] 

[[ 0.84147098  0.14112001 -0.95892427  0.6569866   0.41211849 -0.99999021]
 [ 0.90929743 -0.7568025  -0.2794155   0.98935825 -0.54402111 -0.53657292]
 [ 0.         -0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]] 



#### Vectorization 

Vectorization allows us to apply bulk operations without loops.

In [28]:
def adder_loop(x,y):
    output = np.zeros(x.shape[0])
    
    for i in range(x.shape[0]):
        output[i] = x[i] + y[i]
    
    return output

adder_loop(np.arange(8), np.arange(8))

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14.])

In [29]:
def adder(x, y):
    return x + y

adder_vectorized = np.vectorize(adder, otypes=[np.float64])

adder_vectorized(np.arange(8), np.arange(8))

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14.])

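We have an improvement in performance. In the timings below, `np.vectorize` (about 1.7 ms) is roughly 2.5 times faster than the explicit loop (about 4.4 ms), while the built-in `np.add` (about 11 µs) is more than a hundred times faster than either.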

In [30]:
%%timeit 

arr = np.arange(10000)
adder_vectorized(arr, arr)

1.72 ms ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [31]:
%%timeit 

arr = np.arange(10000)
adder_loop(arr, arr)

4.42 ms ± 405 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [32]:
%%timeit 

arr = np.arange(10000)
np.add(arr, arr)

11.4 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
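
The gap between the timings has a simple explanation: `np.vectorize` is a convenience wrapper that still calls the Python function once per element, whereas `np.add` (or `+` on arrays) runs in compiled code. A sketch that checks the three approaches agree and times two of them with the standard `timeit` module. Timings depend on the machine, so none are quoted.

```python
import timeit
import numpy as np

def adder(x, y):
    return x + y

arr = np.arange(10000)

looped = np.array([adder(a, b) for a, b in zip(arr, arr)])  # Python-level loop
wrapped = np.vectorize(adder)(arr, arr)                     # still calls adder per element
native = arr + arr                                          # compiled elementwise addition

print(np.array_equal(looped, native), np.array_equal(wrapped, native))

# Time 20 repetitions of each approach; results vary by machine.
for label, stmt in [("np.vectorize", lambda: np.vectorize(adder)(arr, arr)),
                    ("arr + arr", lambda: arr + arr)]:
    print(label, timeit.timeit(stmt, number=20))
```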


#### Example

Suppose you're trying to meet your friend in either Central Park (denoted C) or Riverside Park (denoted R). 

Right now, you know that the chances of your friend being at $x$ are

| &nbsp; | Chance |
| --- | --- | 
| **x = C** | 0.4 |
| **x = R** | 0.6 | 

We want to determine the chance of your friend being at $y$ in 1 hour

| &nbsp; | Chance |
| --- | --- | 
| **y = C** | ? |
| **y = R** | ? | 

a. Use ```numpy``` to create a $2 \times 1$ array called ```v``` with the chances of being in Central Park and Riverside Park (stored below as a 1D array of length 2). Try testing the dimensions.

In [33]:
v = np.array([0.4,0.6])
v.shape

(2,)

b. Since your friend likes to wander around at random, you need to make a table about the likelihood of locations   

| &nbsp; | x = C | x = R | &nbsp; |
| --- | --- | --- | --- |
| **y = C** | 0.3 | 0.2 | 0.5 |
| **y = R** | 0.1 | 0.4 | 0.5 |
| &nbsp; | 0.4 | 0.6 | &nbsp; |

Use ```numpy``` to create a $2 \times 2$ matrix ```M``` with the chances of walking between parks. Try testing the dimensions.

In [34]:
M = np.array([
    [0.3,0.2],
    [0.1,0.4]])

M.shape

(2, 2)

Use the ```numpy``` sum method to sum the entries over columns and rows. 

In [35]:
print('Sum over Columns')
print(np.sum(M,axis=1))

print('Sum over Rows')
print(np.sum(M,axis=0))

Sum over Columns
[0.5 0.5]
Sum over Rows
[0.4 0.6]


In [36]:
print('P(y=C) + P(y=R) = ')
print(np.sum(M,axis=1).sum())

print('P(x=C) + P(x=R) = ')
print(np.sum(M,axis=0).sum())

P(y=C) + P(y=R) = 
1.0
P(x=C) + P(x=R) = 
1.0


Why should this be the case?

c. Using the definition of conditional probability, adjust the numbers in the tables to show the probability of $y$ given $x$

| &nbsp; | x = C | x = R |
| --- | --- | --- |
| **y = C** | 0.3/0.4 | 0.2/0.6 |
| **y = R** | 0.1/0.4 | 0.4/0.6 |


Adjust the probabilities in `M` to form `N` containing these four numbers 

In [37]:
N = np.multiply(M, np.array([[1/0.4, 1/0.6], [1/0.4, 1/0.6] ]))
print(N)

[[0.75       0.33333333]
 [0.25       0.66666667]]
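
As an alternative sketch, the same matrix can be obtained with broadcasting by dividing each column of `M` by its column total. The name `p_x` is introduced here for illustration.

```python
import numpy as np

M = np.array([[0.3, 0.2],
              [0.1, 0.4]])          # joint probabilities P(y and x)

p_x = M.sum(axis=0)                 # column totals: P(x = C), P(x = R)
N = M / p_x                         # each column is divided by its own total

print(N)
print(N.sum(axis=0))                # each column of conditional probabilities sums to 1
```

This avoids typing the totals 0.4 and 0.6 by hand.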


d. Note that $$P(y = C) = P(y = C \text{ and } x = C) + P(y = C \text{ and } x = R)$$ This can be rewritten as $$P(y = C) = P(y = C | x = C) P(x =C) + P(y = C | x = R) P(x = R)$$ Compute these numbers using the values in `N` and `v`.

In [38]:
print(np.dot(N,v))

[0.5 0.5]


Therefore we have the probabilities for the location in 1 hour

In [39]:
print('Probability of y = C is')
print(np.dot(N,v)[0])

print('Probability of y = R is')
print(np.dot(N,v)[1])

Probability of y = C is
0.5
Probability of y = R is
0.5
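
The formula from part d can also be written out term by term. The sketch below re-enters `N` and `v` so that it is self-contained. The names `p_y_C` and `p_y_R` are introduced here for illustration.

```python
import numpy as np

N = np.array([[0.75, 1/3],
              [0.25, 2/3]])         # P(y | x), with columns indexed by x
v = np.array([0.4, 0.6])            # P(x = C), P(x = R)

# P(y=C | x=C) P(x=C) + P(y=C | x=R) P(x=R), and likewise for y = R
p_y_C = N[0, 0] * v[0] + N[0, 1] * v[1]
p_y_R = N[1, 0] * v[0] + N[1, 1] * v[1]

print(p_y_C, p_y_R)
print(N @ v)                        # the @ operator computes the same matrix-vector product
```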


In [None]:
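import numpy as np

def rook_attack():
    # Board of rook positions; the third row was left empty in the original
    # cell, so only the two complete rows are kept here.
    rook_array = np.array([[1,0,0,0],[0,1,0,0]])
    # True if at least one column contains a rook.
    return np.any(rook_array.sum(axis = 0))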
