## Matrix Multiplication

Given a matrix $X$ with $n$ rows and $m$ columns,

and a matrix $Y$ with $m$ rows and $p$ columns,

the matrix $Z$, with $n$ rows and $p$ columns, is the product of $X$ and $Y$

if

$\forall{i} \in 1\dots n, \forall{j} \in 1\dots p$

$Z_{ij} = \displaystyle\sum_{k=1}^{m} X_{ik}Y_{kj}$
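
As a worked instance of this sum, take the two matrices used in the code cells of this section, so that $n = 2$, $m = 3$ and $p = 2$:

$X = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad Y = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$

Each entry of the product pairs one row of $X$ with one column of $Y$:

$Z_{11} = 1 \cdot 1 + 2 \cdot 3 + 3 \cdot 5 = 22$

$Z_{12} = 1 \cdot 2 + 2 \cdot 4 + 3 \cdot 6 = 28$

$Z_{21} = 4 \cdot 1 + 5 \cdot 3 + 6 \cdot 5 = 49$

$Z_{22} = 4 \cdot 2 + 5 \cdot 4 + 6 \cdot 6 = 64$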

In [1]:
left = [[1, 2, 3], [4, 5, 6]] # 2 rows and 3 columns
right = [[1, 2], [3, 4], [5, 6]] # 3 rows and 2 columns

In [169]:
def get_row(m, i):
    return m[i]

def get_col(m, i):
    return [row[i] for row in m]

def dot_vector(l, r):
    # Sum of the element-wise products of two equal-length vectors
    return sum(a * b for a, b in zip(l, r))

def zero_m(h, w):
    # Build an h x w matrix of zeros; each row is a separate list
    return [[0] * w for _ in range(h)]


def shape_m(m):
    return (len(m), len(m[0]))  # (rows, cols)

def dot_matrix(l, r):
    rows_l, cols_l = shape_m(l)
    rows_r, cols_r = shape_m(r)
    
    if cols_l != rows_r:
        raise ValueError('columns on left must equal rows on right')
        
    zeroes = zero_m(rows_l, cols_r)
    
    for row in range(rows_l):
        for col in range(cols_r): 
            zeroes[row][col] = dot_vector(get_row(l, row), get_col(r, col)) # Z_ij
            
    return zeroes
    
    

In [170]:
dot_matrix(left, right)

[[22, 28], [49, 64]]
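
The helper functions above can also be collapsed into a single expression. The following is a minimal sketch rather than a replacement for `dot_matrix`: it assumes both arguments are non-empty lists of equal-length rows, and the name `dot_matrix_compact` is introduced here for illustration only. It relies on `zip(*r)` yielding the columns of `r`.

```python
def dot_matrix_compact(l, r):
    # Hypothetical helper, named for this illustration only.
    # Same dimension check as dot_matrix: columns of l must match rows of r.
    if len(l[0]) != len(r):
        raise ValueError('columns on left must equal rows on right')
    # zip(*r) iterates over the columns of r.
    cols = list(zip(*r))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in l]

left = [[1, 2, 3], [4, 5, 6]]
right = [[1, 2], [3, 4], [5, 6]]
result = dot_matrix_compact(left, right)
print(result)  # [[22, 28], [49, 64]]
```

The explicit helpers in `dot_matrix` are easier to follow step by step; the compact form mainly shows that a matrix product is a dot product of every row with every column.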

## `numpy`

The `numpy` library provides a more efficient implementation.

In [171]:
import numpy as np

In [172]:
p = np.array(left)
q = np.array(right)

In [173]:
p.dot(q)

array([[22, 28],
       [49, 64]])
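
For two-dimensional arrays, the `@` operator and `np.matmul` compute the same product as `dot`. A short sketch, repeating the arrays from the cells above so that it runs on its own:

```python
import numpy as np

p = np.array([[1, 2, 3], [4, 5, 6]])
q = np.array([[1, 2], [3, 4], [5, 6]])

via_operator = p @ q          # matrix-multiplication operator
via_matmul = np.matmul(p, q)  # function form of the same operation

print(via_operator.tolist())                   # [[22, 28], [49, 64]]
print(np.array_equal(via_operator, p.dot(q)))  # True
```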

A useful concept in Jupyter Notebook is the *magic command*, which is similar to a macro.  Line magics are prefixed with a single percent sign (`%`) and cell magics with two (`%%`).

The `timeit` magic command runs the code in the rest of the line repeatedly and reports the mean run time per loop together with its standard deviation.

In [174]:
%timeit dot_matrix(left, right)

11.2 µs ± 654 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [175]:
%timeit p.dot(q)

1.14 µs ± 40.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


This shows that, for these small matrices, the pure-Python implementation of matrix multiplication is roughly ten times slower (11.2 µs versus 1.14 µs per loop) than the `numpy` implementation, which is written in C.
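
Outside a notebook the `%timeit` magic is not available. A rough equivalent can be sketched with the standard-library `timeit` module. The loop and repeat counts below are arbitrary choices, `matmul` is a stand-in for `dot_matrix`, and the measured times will differ from machine to machine.

```python
import statistics
import timeit

left = [[1, 2, 3], [4, 5, 6]]
right = [[1, 2], [3, 4], [5, 6]]

def matmul(l, r):
    # Stand-in for dot_matrix so that this block runs on its own.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*r)] for row in l]

loops = 10000  # arbitrary; %timeit picks this number automatically
# Each entry is the total time in seconds for `loops` calls.
totals = timeit.repeat(lambda: matmul(left, right), number=loops, repeat=7)
per_loop = [t / loops for t in totals]

mean_us = statistics.mean(per_loop) * 1e6
std_us = statistics.stdev(per_loop) * 1e6
print(f'{mean_us:.2f} us +/- {std_us:.2f} us per loop (7 runs, {loops} loops each)')
```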

In [None]:
data = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

In [None]:
def sum_vector(*args):
    length = len(args[0])
    for v in args:
        if len(v) != length:
            raise ValueError('length of all vectors must be the same')
    return [sum(t) for t in zip(*args)]

sum_vector(*data)

In [None]:
np.array(data[0]) + np.array(data[1]) + np.array(data[2])

## `pandas`

The `pandas` library is for working with tabular data and for statistical analysis.

In [None]:
import pandas as pd

Data can be loaded from a CSV file into a data frame, the fundamental data structure in `pandas`.  It is similar to the data frame in R.

In [None]:
df = pd.read_csv('dow_jones_index/dow_jones_index.csv')

In [None]:
df.head()

Take a subset of the columns and copy them so that later assignments do not trigger the `SettingWithCopyWarning` from `pandas`.

In [None]:
v = df[df.columns[1:8]].copy()

In [None]:
v.head()

The values in the columns with a dollar sign are currently of type `str`.  However, we need to work with them as `float`.  This is easy in `pandas`: two lines of code convert 3000 values.  The `lambda` is an anonymous function that slices off the first character, thus omitting the dollar sign, and converts the remainder to a `float`.  This function is applied to every value in a column, and the result is assigned back to that column in the data frame.

In [None]:
for column in v.columns[2:6]:
    v[column] = v[column].apply(lambda x: float(x[1:]))
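
To see what the `lambda` does to a single value, it can be applied to plain strings. The sample strings below are made up for illustration and are not taken from the data set. The sketch also assumes that every value starts with exactly one `$` and contains no thousands separators.

```python
# Same conversion as in the loop above, bound to a name for illustration.
strip_dollar = lambda x: float(x[1:])

samples = ['$15.82', '$16.71']  # hypothetical values
converted = [strip_dollar(s) for s in samples]
print(converted)  # [15.82, 16.71]
```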

In [None]:
v.head()

A data frame can be filtered with boolean expressions on one or more columns.

In [None]:
listing = v[v['stock'] == 'HD']

In [None]:
listing.head()
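
Conditions on several columns can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses. The sketch below uses a small made-up data frame rather than the Dow Jones data, so the values and the threshold are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data, not taken from dow_jones_index.csv.
sample = pd.DataFrame({
    'stock': ['HD', 'HD', 'AA', 'AA'],
    'close': [35.0, 37.5, 16.0, 15.5],
    'volume': [100, 200, 300, 400],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (sample['stock'] == 'HD') & (sample['close'] > 36)
filtered = sample[mask]
print(filtered)
```

The parentheses are required because `&` binds more tightly than `==` and `>` in Python.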

## `matplotlib`

The `matplotlib` library is for visualization of data.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
y = listing.close
x = list(range(len(y)))

plt.xticks(np.arange(len(x)), listing.date, rotation='vertical')

y2 = listing.volume
plt.bar(x, y2, color='#cccccc')
ax2 = plt.twinx()
ax2.plot(x, y, color='k')

## Dashboard

The `ipywidgets` library creates interactive JavaScript controls, which can be used to manipulate data through a friendly interface instead of code.

In [None]:
import ipywidgets as widgets
from IPython.display import display

In [None]:
stock_dict = {}

for stock in v.stock.unique():
    stock_dict[stock] = v[v.stock == stock]

date_slider = widgets.SelectionRangeSlider(options=list(stock_dict['AA'].date), 
    index=(0, 24), 
    description='Dates', 
    layout=widgets.Layout(width='500px'))
show_vol2 = widgets.Checkbox(description='Show Volume')
dd_select3 = widgets.Dropdown(options=sorted(list(stock_dict.keys())))
button = widgets.Button(description='Show Graph')

def button_click(b):
    stock_row = stock_dict[dd_select3.value]
    start_index = date_slider.index[0]
    end_index = date_slider.index[1]
    stock_row = stock_row[start_index:(end_index+1)]
    
    y = stock_row.close
    x = list(range(len(y)))
    
    plt.xticks(np.arange(len(x)), stock_row.date, rotation='vertical')
    
    if not show_vol2.value:
        plt.plot(x, y, color='k')
    else:
        y2 = stock_row.volume
        plt.bar(x, y2, color='#cccccc')
        ax2 = plt.twinx()
        ax2.plot(x, y, color='k')

button.on_click(button_click)

display(date_slider)
display(show_vol2)
display(dd_select3)
display(button)
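
In `button_click`, both ends of `date_slider.index` are treated as inclusive, which is why the slice uses `end_index+1`. The effect can be sketched on a plain list. The date labels and the index pair below are hypothetical stand-ins for the slider's options and its current selection.

```python
dates = ['1/7/2011', '1/14/2011', '1/21/2011', '1/28/2011']  # hypothetical labels
start_index, end_index = (1, 2)  # stands in for date_slider.index

# Python slices exclude the stop position, so add 1 to keep the end label.
selected = dates[start_index:(end_index + 1)]
print(selected)  # ['1/14/2011', '1/21/2011']
```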