<h1>01 Numpy</h1>
$\newcommand{\Set}[1]{\{#1\}}$ 
$\newcommand{\Tuple}[1]{\langle#1\rangle}$ 
$\newcommand{\v}[1]{\pmb{#1}}$ 
$\newcommand{\cv}[1]{\begin{bmatrix}#1\end{bmatrix}}$ 
$\newcommand{\rv}[1]{[#1]}$ 
$\DeclareMathOperator{\argmax}{arg\,max}$ 
$\DeclareMathOperator{\argmin}{arg\,min}$ 
$\DeclareMathOperator{\dist}{dist}$
$\DeclareMathOperator{\abs}{abs}$

<h2>Preliminaries</h2>
<p>
    One of my first code cells always looks like this:
</p>

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

<p>
    The first two lines load IPython's <code>autoreload</code> extension and tell it to reload modules before
    executing each cell. So if I have my own module and change it in an editor, I can rerun code that uses it
    without reloading the module by hand: that happens automatically.
</p>
<p>
    The third line says that when we draw graphs, they will appear in the notebook itself, not in a separate
    window.
</p>
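<p>
    For comparison, here is a minimal sketch of what <code>autoreload</code> saves you from doing by hand, using the
    standard library's <code>importlib.reload</code>. The module <code>json</code> is only a stand-in (an assumption
    for illustration) for one of your own modules; reloading re-executes the module's code so that edits made
    since the last import take effect.
</p>

```python
import importlib
import json  # stand-in for a module you are editing yourself

# Without %autoreload, you would have to reload the module manually after
# every edit, before re-running the cells that use it.
json = importlib.reload(json)
print(json.dumps({"reloaded": True}))  # prints {"reloaded": true}
```

<p>
    One caveat of manual reloading: names bound earlier with <code>from module import name</code> keep pointing at
    the old objects until you import them again.
</p>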

<p>
    My next cell usually contains these three imports:
</p>

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<p>
    My third code cell also contains <code>import</code> statements, ones that are specific to this notebook. 
    Here's an example:
</p>

In [3]:
from math import sqrt

<p>
    So now we can compute square roots:
</p>

In [4]:
sqrt(81)

9.0
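<p>
    <code>math.sqrt</code> works on a single number. The numpy version, <code>np.sqrt</code>, applies to every
    element of an array at once, which is why the exercises below use it. A small sketch with made-up values:
</p>

```python
from math import sqrt
import numpy as np

values = np.array([81, 16, 2])  # made-up example values

# math.sqrt takes one Python number at a time...
print(sqrt(81))  # 9.0

# ...while np.sqrt is applied elementwise to the whole array.
roots = np.sqrt(values)
print(roots)

# Handing a multi-element array to math.sqrt fails with a TypeError.
try:
    sqrt(values)
    failed = False
except TypeError:
    failed = True
```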

<h2>Numpy</h2>
<p>
    Numpy is short for Numerical Python. Its core data structure, <code>ndarray</code>, is a fast and
    space-efficient multidimensional array that supports vectorized arithmetic, among other things. Pandas is
    built on top of numpy and is somewhat higher-level; scikit-learn uses numpy ndarrays as its main data
    structure; and matplotlib works with numpy arrays too.
</p>
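<p>
    A short sketch of what "fast, space-efficient and vectorized" looks like in practice. The array <code>m</code>
    is made-up example data; the <code>dtype</code> is fixed to <code>int64</code> so that the byte count is
    predictable across platforms. The last lines show that a pandas <code>Series</code> can hand back its values
    as an ndarray.
</p>

```python
import numpy as np
import pandas as pd

# An ndarray stores elements of a single dtype in one contiguous block of memory.
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)  # made-up example data
print(m.shape, m.dtype, m.nbytes)  # (2, 3) int64 48

# Vectorized arithmetic: each operator acts on every element, with no Python loop.
doubled = m * 2
shifted = m + 10

# A pandas Series can return its values as a numpy ndarray.
col = pd.Series([1.5, 2.5, 3.5]).to_numpy()
print(type(col))  # <class 'numpy.ndarray'>
```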

<h2>Exercises</h2>
<ol>
    <li>
        Let
        $$\v{u} = \cv{2\\-7\\1}\,\,\,
          \v{v} = \cv{-3\\0\\4}$$
        and
        $$\v{A} =  \begin{bmatrix}
                      1 &  2 & 0 \\
                      3 & -1 & 4
                  \end{bmatrix}\,\,\,
           \v{B} = \begin{bmatrix}
                       2 & -1 \\
                       1 &  0 \\
                      -3 & 4
                \end{bmatrix}$$
        Use numpy to compute:
        <ol>
            <li>$\v{u} + \v{v}$</li>
            <li>$-3\v{u}$</li>
            <li>$\v{u}\v{v}$ (Strictly, we should write $\v{u}^T\v{v}$. Why? But it is common to write it without the transpose.)</li>
            <li>$\v{u}\v{u}$</li>
            <li>$\sqrt{\v{u}\v{u}}$</li>
            <li>$\v{u} * \v{v}$</li>
            <li>$\v{A} + \v{A}$</li>
            <li>$\v{A} + \v{u}$</li>
            <li>$10\v{A}$</li>
            <li>$\v{A}\v{v}$</li>
            <li>$\v{A}\v{B}$</li>
            <li>$\v{A}^T$</li>
            <li>$\v{A}\v{A}^T$</li>
            <li>$\v{A}^T\v{A}$</li>
            <li>the smallest element in $\v{u}$</li>
            <li>the index of the smallest element in $\v{u}$</li>
            <li>the mean of the values in $\v{u}$</li>
        </ol>
    </li>
    <li>Play with the <code>cumsum</code> method on 1-dimensional numpy arrays. Then define a Python function
        that does the same thing for regular Python lists. Then compare how long they take to run on an 
        array/list that contains all the integers from 1 to 1000 inclusive.
    </li>
</ol>

In [5]:
# Question 1:
u = np.array([2, -7, 1])
v = np.array([-3, 0, 4])

A = np.array([[1, 2, 0], [3, -1, 4]])
B = np.array([[2, -1], [1, 0], [-3, 4]])

u.shape, v.shape, A.shape, B.shape

((3,), (3,), (2, 3), (3, 2))

In [6]:
# A. u + v
u + v
# np.add(u, v)

array([-1, -7,  5])

In [7]:
# B. -3u
-3*u
# np.multiply(-3, u)

array([-6, 21, -3])

In [8]:
# C. uv
np.dot(u, v)
# u.dot(v)

-2
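<p>
    On the "Why?" in the exercise: if $\v{u}$ and $\v{v}$ are column vectors (3&times;1), then $\v{u}\v{v}$ would be
    a (3&times;1)(3&times;1) product, which is undefined, whereas $\v{u}^T\v{v}$ is (1&times;3)(3&times;1), giving a
    1&times;1 result. Numpy's 1-D arrays have no row or column orientation, so <code>np.dot(u, v)</code> simply
    computes the inner product. A sketch that makes the column-vector convention explicit (the names
    <code>u_col</code>, <code>v_col</code> are just for illustration):
</p>

```python
import numpy as np

u = np.array([2, -7, 1])
v = np.array([-3, 0, 4])

# The @ operator is matrix multiplication; on two 1-D arrays it is the inner product.
inner = u @ v  # same as np.dot(u, v)

# Turn the 1-D arrays, shape (3,), into explicit column vectors, shape (3, 1).
u_col = u.reshape(-1, 1)
v_col = v.reshape(-1, 1)

# (1x3) @ (3x1) -> (1x1): this is u^T v written out in full.
inner_2d = u_col.T @ v_col

# (3x1) @ (3x1) has mismatched inner dimensions, so numpy raises ValueError.
try:
    u_col @ v_col
    shapes_ok = True
except ValueError:
    shapes_ok = False
```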

In [9]:
# D. uu
np.dot(u, u)
# u.dot(u)

54

In [10]:
# E. sqrt(uu)
np.sqrt(np.dot(u, u))

7.3484692283495345
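<p>
    $\sqrt{\v{u}\v{u}}$ is the Euclidean (L2) length of $\v{u}$, and numpy provides it directly as
    <code>np.linalg.norm</code>. A quick check that the two agree:
</p>

```python
import numpy as np

u = np.array([2, -7, 1])

length = np.linalg.norm(u)        # Euclidean length of u
by_hand = np.sqrt(np.dot(u, u))   # the computation from the cell above
print(np.isclose(length, by_hand))  # True
```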

In [11]:
# F. u*v
u * v

array([-6,  0,  4])
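<p>
    Note that <code>*</code> is the elementwise product, not the dot product. The two are related, though: summing
    the elementwise products gives back $\v{u}\v{v}$.
</p>

```python
import numpy as np

u = np.array([2, -7, 1])
v = np.array([-3, 0, 4])

elementwise = u * v        # array([-6,  0,  4])
total = elementwise.sum()  # -2, the same value as np.dot(u, v)
```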

In [12]:
# G. A + A 
A + A
# np.add(A, A)

array([[ 2,  4,  0],
       [ 6, -2,  8]])

In [13]:
# H. A + u
A + u
# np.add(A, u)

array([[ 3, -5,  1],
       [ 5, -8,  5]])
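<p>
    $\v{A} + \v{u}$ is not defined in ordinary linear algebra, but numpy accepts it because of broadcasting: shapes
    are compared starting from the last dimension, and since <code>u</code> has shape (3,) and <code>A</code> has
    shape (2, 3), <code>u</code> is added to every row of <code>A</code>. A sketch with a made-up length-2 vector
    <code>w</code> shows the failing case and the fix:
</p>

```python
import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])  # shape (2, 3)
u = np.array([2, -7, 1])               # shape (3,)

# (2, 3) + (3,): trailing dimensions match, so u is added to each row.
row_wise = A + u

# A length-2 vector does not match the trailing dimension 3, so this fails.
w = np.array([100, 200])  # hypothetical example vector
try:
    A + w
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# Reshaping w to a column, shape (2, 1), adds w[i] to every entry of row i.
col_wise = A + w.reshape(-1, 1)
```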

In [14]:
# I. 10A
10 * A
# np.multiply(10, A)

array([[ 10,  20,   0],
       [ 30, -10,  40]])

In [15]:
# J. Av
np.dot(A, v)
# A.dot(v)

array([-3,  7])
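<p>
    One way to read $\v{A}\v{v}$: it weights each column of $\v{A}$ by the matching entry of $\v{v}$ and adds the
    results. A quick check of that reading:
</p>

```python
import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])
v = np.array([-3, 0, 4])

# Linear combination of A's columns, weighted by the entries of v.
combo = v[0] * A[:, 0] + v[1] * A[:, 1] + v[2] * A[:, 2]
print(np.array_equal(combo, A @ v))  # True
```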

In [16]:
# K. AB
np.dot(A, B)
# A.dot(B)

array([[ 4, -1],
       [-7, 13]])
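<p>
    A matrix product needs matching inner dimensions, and the result takes the outer ones: (2&times;3)(3&times;2)
    gives 2&times;2. Order matters, so $\v{B}\v{A}$ is a different matrix, here even a different shape:
</p>

```python
import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])     # shape (2, 3)
B = np.array([[2, -1], [1, 0], [-3, 4]])  # shape (3, 2)

AB = A @ B  # (2x3)(3x2) -> (2x2)
BA = B @ A  # (3x2)(2x3) -> (3x3)
print(AB.shape, BA.shape)  # (2, 2) (3, 3)
```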

In [17]:
# L. A^T
A.transpose()

array([[ 1,  3],
       [ 2, -1],
       [ 0,  4]])
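<p>
    <code>A.T</code> is the usual shorthand for <code>A.transpose()</code>. Transposing swaps the axes; on a 1-D
    array there is no second axis to swap, so <code>.T</code> leaves it unchanged, which is another reason numpy
    does not need the $\v{u}^T$ in $\v{u}^T\v{v}$.
</p>

```python
import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])
u = np.array([2, -7, 1])

print(np.array_equal(A.T, A.transpose()))  # True
print(A.T.shape)  # (3, 2)
print(u.T.shape)  # (3,): a 1-D array is unchanged by .T
```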

In [18]:
# M. AA^T
np.dot(A, A.transpose())
# A.dot(A.transpose())

array([[ 5,  1],
       [ 1, 26]])

In [19]:
# N. A^TA
np.dot(A.transpose(), A)
# A.transpose().dot(A)

array([[10, -1, 12],
       [-1,  5, -4],
       [12, -4, 16]])
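<p>
    Both $\v{A}\v{A}^T$ and $\v{A}^T\v{A}$ are symmetric, and their traces are equal: each is the sum of the squares
    of $\v{A}$'s entries. A check on this $\v{A}$:
</p>

```python
import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])

small = A @ A.T  # (2, 2)
big = A.T @ A    # (3, 3)

# Each product equals its own transpose.
print(np.array_equal(small, small.T), np.array_equal(big, big.T))  # True True

# Matching traces: both equal the sum of squared entries of A.
print(np.trace(small), np.trace(big), (A ** 2).sum())  # 31 31 31
```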

In [20]:
# O. smallest element in u
np.min(u)

-7

In [21]:
# P. index of smallest element in u
np.argmin(u)

1
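<p>
    <code>np.argmin</code> returns a position, so indexing with it recovers the minimum itself. The same reductions
    are also available as array methods, and when the minimum occurs more than once, <code>argmin</code> reports
    the first occurrence:
</p>

```python
import numpy as np

u = np.array([2, -7, 1])

i = np.argmin(u)
print(u[i] == np.min(u))  # True

# Method forms of the same reductions.
print(u.min(), u.argmin(), u.max(), u.argmax())  # -7 1 2 0

# Ties: the index of the first minimum is returned.
print(np.argmin(np.array([3, 1, 1])))  # 1
```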

In [22]:
# Q. average of elements in u
np.average(u)

-1.3333333333333333
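<p>
    Without weights, <code>np.average</code> is the same as <code>np.mean</code>; the difference is that
    <code>np.average</code> also accepts a <code>weights</code> argument. The weights below are made up for
    illustration:
</p>

```python
import numpy as np

u = np.array([2, -7, 1])

print(np.mean(u), u.mean(), np.average(u))  # all three give the plain mean

# Weighted average: (2*1 + (-7)*0 + 1*1) / (1 + 0 + 1) = 1.5
weighted = np.average(u, weights=[1, 0, 1])
```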

In [23]:
# Question 2
np.cumsum(u)

array([ 2, -5, -4])
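<p>
    While playing with <code>cumsum</code>, it is worth trying it on a 2-D array too. With no <code>axis</code>, it
    flattens the array row by row first; <code>axis=0</code> accumulates down each column and <code>axis=1</code>
    across each row:
</p>

```python
import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])

flat = np.cumsum(A)            # array([1, 3, 3, 6, 5, 9])
down = np.cumsum(A, axis=0)    # [[1, 2, 0], [4, 1, 4]]
across = np.cumsum(A, axis=1)  # [[1, 3, 3], [3, 2, 6]]
```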

In [24]:
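def list_cumsum(input_list: list) -> list:
    """Return the running totals of input_list, like np.cumsum but for a plain Python list."""
    result = []
    current_total = 0
    for element in input_list:
        current_total += element
        result.append(current_total)
    return result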

In [25]:
list_u = u.tolist()
list_cumsum(list_u)

[2, -5, -4]
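<p>
    For plain lists, the standard library already offers the same running-total behaviour through
    <code>itertools.accumulate</code>, which adds by default:
</p>

```python
from itertools import accumulate

list_u = [2, -7, 1]
running = list(accumulate(list_u))  # [2, -5, -4]
```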

In [26]:
# compare timings
import time
list_input = list(range(1, 1001))  # the integers 1 to 1000 inclusive
nparray_input = np.array(list_input)

np_start = time.perf_counter()  # perf_counter is a high-resolution clock intended for timing
np.cumsum(nparray_input)
print("--- Numpy cumsum: %s seconds ---" % (time.perf_counter() - np_start))

--- Numpy cumsum: 9.679794311523438e-05 seconds ---


In [27]:
# compare timings
list_start = time.perf_counter()  # same high-resolution clock as the numpy timing
list_cumsum(list_input)
print("--- List cumsum: %s seconds ---" % (time.perf_counter() - list_start))

--- List cumsum: 0.00014090538024902344 seconds ---
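<p>
    Timing a single call, as above, measures very short durations and is easily thrown off by noise. A sketch of a
    steadier comparison using the standard library's <code>timeit</code>: each version runs many times, and the
    best of several repeats is kept. The run and repeat counts are arbitrary choices, and the numbers you get will
    depend on your machine.
</p>

```python
import timeit
import numpy as np

def list_cumsum(input_list):
    # Same running-total function as defined earlier in the notebook.
    result = []
    current_total = 0
    for element in input_list:
        current_total += element
        result.append(current_total)
    return result

list_input = list(range(1, 1001))  # the integers 1 to 1000 inclusive
nparray_input = np.array(list_input)

runs = 1000  # calls per measurement (arbitrary)
np_best = min(timeit.repeat(lambda: np.cumsum(nparray_input), number=runs, repeat=5)) / runs
list_best = min(timeit.repeat(lambda: list_cumsum(list_input), number=runs, repeat=5)) / runs

print(f"numpy cumsum: {np_best:.2e} s per call")
print(f"list cumsum:  {list_best:.2e} s per call")
```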
