<p style="text-align: center; font-size: 192%"> Computational Finance </p>
<img src="img/ABSlogo.svg" alt="LOGO" style="display:block; margin-left: auto; margin-right: auto; width: 90%;">
<p style="text-align: center; font-size: 150%"> Week 2: Dealing with Data </p>
<p style="text-align: center; font-size: 75%"> <a href="#copyrightslide">Copyright</a> </p>

# Outline

* Broadcasting
* Matrix multiplication
* Debugging
    * Methods
    * In Jupyter

## Broadcasting

* Broadcasting is a useful NumPy feature, but it can be tricky to understand.
* "The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations".
* "Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes"
* See [NumPy documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html) for more details


* NumPy compares the shapes of two arrays dimension-wise. 
* It starts with the trailing (i.e. rightmost) dimensions, and then works its way left. 
* Two dimensions are compatible if
  * they are equal, or
  * one of them is 1 (or not present).
* *Tip*: write down the dimensions and draw the arrays.
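The compatibility rule above can be sketched in a few lines of plain Python. The helper name `broadcast_shape` is hypothetical (it is not part of NumPy); it only mimics the rule for two shapes so that you can check your pen-and-paper answer.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing (rightmost) dimension to the left.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # a missing dimension counts as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            # The dimension of length 1 is stretched to match the other one.
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(reversed(result))

print(broadcast_shape((2, 4), (4,)))    # (2, 4)
print(broadcast_shape((4, 2), (4, 1)))  # (4, 2)
```

Calling `broadcast_shape((4, 2), (1, 4))` raises a `ValueError`, because the trailing dimensions 2 and 4 differ and neither is 1.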

In [9]:
import numpy as np

* The idea of broadcasting in a picture:
<img src="img/broadcasting.png" alt="Broadcasting" style="display:block; margin-left: auto; margin-right: auto; width: 80%;">

What is the shape?

In [10]:
a = np.arange(8).reshape(2,4)  # (2,4)
b = np.arange(4)               # (4,)
a

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [11]:
b

array([0, 1, 2, 3])

In [12]:
a + b

array([[ 0,  2,  4,  6],
       [ 4,  6,  8, 10]])

What is the shape?

In [13]:
a = np.arange(8).reshape(4,2)  # (4,2)
b = np.arange(4).reshape(1,4)  # (1,4)
a,b;

In [14]:
a + b

ValueError: operands could not be broadcast together with shapes (4,2) (1,4) 

What is the shape?

In [15]:
a = np.arange(8).reshape(4,2)  # (4,2)
b = np.arange(4).reshape(4,1)  # (4,1)
a, b;

In [None]:
a + b

Which cell does NOT throw an error?

In [16]:
a = np.arange(8).reshape(4,2)   # (4,2)
b = np.arange(4)                # (4,)

In [21]:
a

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [20]:
b

array([0, 1, 2, 3])

In [17]:
a + b

ValueError: operands could not be broadcast together with shapes (4,2) (4,) 

In [18]:
a + b[:, np.newaxis]

array([[ 0,  1],
       [ 3,  4],
       [ 6,  7],
       [ 9, 10]])

In [19]:
a + b[np.newaxis, :]

ValueError: operands could not be broadcast together with shapes (4,2) (1,4) 

Dimensions of `b`

In [22]:
b   # (4,)

array([0, 1, 2, 3])

In [23]:
b[:, np.newaxis]  # (4,1)   

array([[0],
       [1],
       [2],
       [3]])

In [24]:
b[np.newaxis, :]  # (1,4)

array([[0, 1, 2, 3]])

In [25]:
np.newaxis == None

True

In [26]:
b[None, :]  # (1,4)

array([[0, 1, 2, 3]])
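A typical use of the extra axis is subtracting a per-row statistic from a 2-D array. The sketch below uses made-up numbers; the names `returns` and `row_means` are illustrative only.

```python
import numpy as np

# Hypothetical data: 3 assets (rows) observed over 4 periods (columns).
returns = np.array([[1.0, 2.0, 3.0, 4.0],
                    [0.0, 0.0, 2.0, 2.0],
                    [5.0, 5.0, 5.0, 5.0]])      # (3,4)

row_means = returns.mean(axis=1)                # (3,)
# returns - row_means would fail: (3,4) and (3,) do not match on the trailing dimension.
demeaned = returns - row_means[:, np.newaxis]   # (3,4) - (3,1) -> (3,4)

# keepdims=True keeps the reduced axis with length 1, so no newaxis is needed.
demeaned_alt = returns - returns.mean(axis=1, keepdims=True)

print(demeaned.shape)                           # (3, 4)
print(np.allclose(demeaned, demeaned_alt))      # True
```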

## Matrix multiplication

* Note that NumPy reserves `*` for element-by-element multiplication.
* We cannot use `*` for [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication#Definition).
* Instead, use the NumPy function `dot` or the `@` operator (available since Python 3.5):


In [27]:
A = np.arange(6).reshape(2,3) #(2,3)
b = np.arange(3)              #(3,)
A, b

(array([[0, 1, 2],
        [3, 4, 5]]),
 array([0, 1, 2]))

In [28]:
A*b
# Element-wise: each entry is multiplied only by the entry in the same position
# First row: 0*0, 1*1, 2*2

array([[ 0,  1,  4],
       [ 0,  4, 10]])

In [30]:
A@b

array([ 5, 14])

In [31]:
A@b[:,np.newaxis]

array([[ 5],
       [14]])
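For comparison, a short sketch showing that the function form `np.dot` and the method form `A.dot` give the same result as `@` for this matrix-vector product:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)  # (2,3)
b = np.arange(3)                # (3,)

via_operator = A @ b            # matrix-vector product, shape (2,)
via_dot = np.dot(A, b)          # same result with the function form
via_method = A.dot(b)           # same result with the method form

print(via_operator)                          # [ 5 14]
print(np.array_equal(via_operator, via_dot)) # True
```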

The module `numpy.linalg` has a standard set of matrix decompositions and functions for computing quantities such as the inverse (`inv`), determinant (`det`), and eigenvalues and eigenvectors (`eig`). The trace is available as `np.trace` (and, from NumPy 2.0, also as `np.linalg.trace`):

In [32]:
A = np.arange(4).reshape(2,2) #(2,2)
A


array([[0, 1],
       [2, 3]])

In [33]:
np.linalg.inv(A)

array([[-1.5,  0.5],
       [ 1. ,  0. ]])
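A minimal sketch of the other functions mentioned, applied to the same matrix. The checks use `np.allclose` and `np.isclose` because the results are floating-point numbers and may differ from the exact values by rounding.

```python
import numpy as np

A = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]

A_inv = np.linalg.inv(A)
# A times its inverse should give the identity matrix (up to rounding).
print(np.allclose(A @ A_inv, np.eye(2)))   # True

trace = np.trace(A)              # sum of the diagonal: 0 + 3
det = np.linalg.det(A)           # 0*3 - 1*2 = -2, returned as a float
print(trace, np.isclose(det, -2.0))

eigvals, eigvecs = np.linalg.eig(A)
# Each column of eigvecs is an eigenvector v satisfying A v = lambda v.
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))     # True
```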

Finally, to transpose a matrix/vector we can use:

In [34]:
A.T

array([[0, 2],
       [1, 3]])
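One caveat worth checking yourself: `.T` leaves a 1-D array unchanged, since there is only one axis to reverse. A small sketch, adding an axis first to obtain a column or row vector:

```python
import numpy as np

b = np.arange(3)             # (3,)
print(b.T.shape)             # (3,)   unchanged for a 1-D array

col = b[:, np.newaxis]       # (3,1) column vector
print(col.T.shape)           # (1, 3) row vector
```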

## Debugging

* 'Definition': find and fix mistakes (a.k.a. bugs) in code
* Bad news
    * Inevitable and frustrating
    * Hard to predict how long it takes to solve
* Good news
    * Tools and methods available
    * Gain experience and learn from them

### Debugging methods

*Credit: mostly based on [Christoph Deil's talk at PyConDE](https://github.com/cdeil/pyconde2019-debugging).*



#### Read the code
* Can you spot (obvious) mistakes/typos?
* Need to know syntax and structure, e.g. how to define a function.
* Check documentation of Jupyter / Python / package
* Use `help()` or `Shift`+`Tab`

#### Read traceback
* Errors are your friend. Mistakes that raise no error are harder to spot and may be overlooked.
* The error message holds useful information:
    * *what* type of error was raised.
    * *where* it was raised: at which line of code.
* The bug is usually in your code, not in Python packages such as numpy or pandas.

In [35]:
a = np.arange(8).reshape(4,2)   # (4,2)
b = np.arange(4)                # (4,)
a+b

ValueError: operands could not be broadcast together with shapes (4,2) (4,) 
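To make the *what* explicit, the same error can be caught and inspected in code. This is only a demonstration sketch; in a notebook you would normally just read the traceback.

```python
import numpy as np

a = np.arange(8).reshape(4, 2)   # (4,2)
b = np.arange(4)                 # (4,)

try:
    a + b
except ValueError as err:
    error_type = type(err).__name__   # *what* type of error
    message = str(err)                # the explanation given by NumPy
    print(error_type)
    print(message)
    print(a.shape, b.shape)           # the shapes named in the message
```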

#### Print intermediate variables
* Seems easy, but ...
* Clutters code and output
* Need to choose what to print and where
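A small sketch of printing intermediate shapes, assuming Python 3.8 or later for the `=` specifier in f-strings. The variable name `mask` is illustrative.

```python
import numpy as np

a = np.arange(11)
b = np.arange(5, 15)
mask = a > 5

# The "=" specifier prints both the expression and its value.
print(f"{a.shape=}, {b.shape=}, {mask.shape=}")
# a.shape=(11,), b.shape=(10,), mask.shape=(11,)
```

The printed shapes show at a glance that `mask` has 11 entries while `b` has only 10, so `b[mask]` cannot work.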



#### Debugger
* Execute code line by line
* Breakpoints
* Visual debugger 
    * Available in IDEs such as [PyCharm](https://www.jetbrains.com/pycharm/) and [Visual Studio Code](https://code.visualstudio.com/)
* Command line debugger

#### Rubber duck debugging
* Explain your problem to a *rubber duck*.
* Often, you then realize what the problem is.
* Doesn't have to be a duck per se:

In [36]:
from IPython.display import HTML
# Youtube: GoogleDevelopers - #HowICode Rubber Duck Debugging
HTML('<iframe width="900" height="506.25" src="https://www.youtube.com/embed/fdaqudiSo5c" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')



### Debugging in Jupyter
* Visual debugger under development, see [GitHub](https://github.com/jupyterlab/debugger).
* Command line debugger: `pdb`, the Python debugger ([docs](https://docs.python.org/3/library/pdb.html), [debugger commands](https://docs.python.org/3/library/pdb.html#debugger-commands)).

* Post mortem analysis using `%debug` ([docs](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-debug)).
    * Step into function *after* the error occurred.

In [37]:
a = np.arange(11)
b = np.arange(5,15)
foo = a > 5
b[foo]  #feeding a boolean index for slicing

IndexError: boolean index did not match indexed array along dimension 0; dimension is 10 but corresponding boolean dimension is 11

In [38]:
%debug

> <ipython-input-37-346e457f5a3c>(4)<module>()
      1 a = np.arange(11)
      2 b = np.arange(5,15)
      3 foo = a > 5
----> 4 b[foo]  #feeding a boolean index for slicing

--KeyboardInterrupt--


* Breakpoints stop/pause the code so you can inspect the state
    * Set breakpoints using `set_trace()`
    * Alternative: `breakpoint()` from Python 3.7

In [3]:
import numpy as np
from IPython.core.debugger import set_trace   # import set_trace()
a = np.arange(11)
b = np.arange(5,15)
set_trace()  # breakpoint
foo2 = a > 5
b[foo2]  #feeding a boolean index for slicing

--Return--
None
> <ipython-input-3-af8d2d3d84ed>(5)<module>()
      3 a = np.arange(11)
      4 b = np.arange(5,15)
----> 5 set_trace()  # breakpoint
      6 foo2 = a > 5
      7 b[foo2]  #feeding a boolean index for slicing



In [None]:
mysum = 0
for i in range(5):
    set_trace()
    mysum += i
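The loop above pauses on every iteration. A possible variation is to pause only when a condition holds, using the built-in `breakpoint()`. The function name `running_sum` and the `debug` flag are hypothetical, chosen for this sketch.

```python
def running_sum(n, debug=False):
    """Sum the integers 0..n-1, optionally pausing once inside the loop."""
    total = 0
    for i in range(n):
        if debug and i == 3:
            breakpoint()   # built-in since Python 3.7; opens pdb by default
        total += i
    return total

print(running_sum(5))      # 10
```

Calling `running_sum(5, debug=True)` would stop at `i == 3`, where you can inspect `i` and `total` and continue with `c`.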

### So... what should we do?

* First design, then write code
* Write simple clean code
* Add comments and docstrings
* Test your code at intermediate steps (functions, loops)
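As an illustration of these points, a sketch of a small function with a docstring and intermediate checks. The function `portfolio_return` and the numbers are made up for this example.

```python
import numpy as np

def portfolio_return(weights, returns):
    """Return the weighted average return per period.

    weights : array of shape (n_assets,)
    returns : array of shape (n_assets, n_periods)
    """
    # Intermediate checks: fail early with a clear message.
    assert weights.ndim == 1, "weights must be 1-D"
    assert returns.shape[0] == weights.shape[0], "one weight per asset is required"
    return weights @ returns   # (n_assets,) @ (n_assets, n_periods) -> (n_periods,)

weights = np.array([0.5, 0.5])
returns = np.array([[0.02, 0.00],
                    [0.04, 0.02]])
result = portfolio_return(weights, returns)
print(result.shape)            # (2,)
```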

#### When you run into a bug
* Read code and error message
* Check for common mistakes
    * Typos (may be hard to spot!)
    * Syntax errors
    * Wrong shape
    * Wrong type
    * Overwriting variables
    * Not assigning output
    * Loop variable
    * ...


* Inspect variables at intermediate steps (functions, loops)
* Pen and paper
* Clean slate
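Two of the common mistakes listed above, "not assigning output" and "wrong type", can be reproduced in a few lines. A minimal sketch:

```python
import numpy as np

a = np.arange(6)

# Not assigning output: reshape returns a new array and leaves `a` unchanged.
a.reshape(2, 3)
print(a.shape)        # (6,)
a2 = a.reshape(2, 3)
print(a2.shape)       # (2, 3)

# Wrong type: assigning a float into an integer array truncates it without an error.
c = np.arange(3)
c[0] = 2.7
print(c[0])           # 2
```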


* More tips: [Debugging for beginners](https://docs.microsoft.com/en-us/visualstudio/debugger/debugging-absolute-beginners?view=vs-2019) by Microsoft. It is aimed at Visual Studio users, but contains general tips on the steps to take when debugging.