# COMP 3105 Introduction to Machine Learning NumPy Tutorial


## To Start: Virtual Environments

### Why

A Python program or package can depend on many other packages. Managing these dependencies is non-trivial: for example, program A may require version 1.0 of package C while program B requires version 2.0 of the same package. A single installation of C cannot satisfy both A and B at once. This is where virtual environments come in handy.

By setting up a virtual environment using [conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) or [venv](https://docs.python.org/3/library/venv.html), you create a fresh, empty environment to experiment in, without risking the actual Python installation on your computer.

### How

Remotely, Google Colab hosts cloud machines with many Python packages preinstalled so that you can run Jupyter notebooks like this one without setting up a virtual environment.

Locally on your own computer, you can run the following to create a conda environment

```
conda create --name COMP3105 -c conda-forge --file requirements.txt
```

This creates a virtual environment named `COMP3105` and installs the Python packages listed in `requirements.txt`. The file can be found in the assignment, and you are free to modify it as needed.

Now you can activate the env by

```
conda activate COMP3105
```

Then your prompt will become

```
(COMP3105) $
```

and you can check which Python interpreter is now in use

```
(COMP3105) $ which python
```
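
The same check can be made from inside Python itself, which is convenient in a notebook. The following is a minimal sketch using only the standard-library `sys` module; the printed paths depend on your machine and on the active environment.

```python
import sys

# Path of the Python interpreter that is running this code
print(sys.executable)

# Root directory of the environment the interpreter belongs to
print(sys.prefix)
```

If the environment is active, both paths would be expected to point inside the `COMP3105` environment directory.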

You can even install more packages by

```
(COMP3105) $ conda install PACKAGE
OR
(COMP3105) $ pip install PACKAGE
```

You can run your code using

```
(COMP3105) $ python PATH_TO_PYTHON_FILE
```

Finally, when you are done, you can exit the env by

```
(COMP3105) $ conda deactivate
```

and the prompt will change back.


## Python

Basics


In [None]:
a = 3
b = 2
print("Printing")
print(a)
print(f"Value of a/b is {a/b}")  # f-string in Python 3
print(f"Value of a//b is {a//b}")

Printing
3
Value of a/b is 1.5
Value of a//b is 1


Data types


In [2]:
a = 4
print(type(a))

b = 3.1415625
print(type(b))

c = "this is a string"
print(type(c))

d = (1, 2, 3)
print(type(d))

e = [1, 2, 3]
print(type(e))

f = True
print(type(f))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'tuple'>
<class 'list'>
<class 'bool'>


Peculiar floating-point behavior


In [3]:
a = 0.3
b = 0.1 + 0.1 + 0.1

print(a)
print(b)
print(a==b)

0.3
0.30000000000000004
False
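
Because of this rounding, floats are usually compared within a tolerance rather than with `==`. The sketch below uses `math.isclose` from the standard library; the absolute tolerance `1e-9` in the last line is an arbitrary value chosen for illustration.

```python
import math

a = 0.3
b = 0.1 + 0.1 + 0.1

print(a == b)              # False: exact comparison
print(math.isclose(a, b))  # True: equal within a relative tolerance
print(abs(a - b) < 1e-9)   # True: manual check with an absolute tolerance
```

NumPy offers the analogous `np.isclose` and `np.allclose` for arrays.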


Some data types are mutable


In [4]:
a = [1, 2, 3]
print(a)
a[0] = -1
print(a)

[1, 2, 3]
[-1, 2, 3]


Some are not mutable


In [5]:
a = (1, 2, 3)
# a[0] = -1  # error

In [6]:
a = "this is a string"
# a[0] = "g"  # error

Python variables are references to objects (here `b` holds three references to the same list `a`)


In [7]:
a = [1, 2, 3]
b = [a, a, a]  # list of lists
print(f"b: {b}")

a[0] = 100
print(f"b: {b}")

b: [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
b: [[100, 2, 3], [100, 2, 3], [100, 2, 3]]
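
If independent rows are wanted instead, each entry has to be a copy. The sketch below contrasts shared references with copies; the names `b_shared`, `b_copied`, `nested`, and `deep` are made up for this illustration.

```python
import copy

a = [1, 2, 3]
b_shared = [a, a, a]                    # three references to the same list
b_copied = [list(a) for _ in range(3)]  # three independent copies

a[0] = 100
print(b_shared)  # [[100, 2, 3], [100, 2, 3], [100, 2, 3]]
print(b_copied)  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]

nested = [[1, 2], [3, 4]]
deep = copy.deepcopy(nested)  # also copies the inner lists
nested[0][0] = -1
print(deep)      # [[1, 2], [3, 4]]
```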


Python list comprehension


In [8]:
a = [1, 2, 3]
b = [1 if x >= 2 else 0 for x in a]  # create a new list
print(b)

c = ["4", "5", "6"]
print(c)
d = [int(x) for x in c]  # apply a function to each element
print(d)

[0, 1, 1]
['4', '5', '6']
[4, 5, 6]


Function


In [9]:
def sum_and_product(a, b=1):  # set default value
    return a + b, a * b  # return as a tuple

print(sum_and_product(10, 5))
print(sum_and_product(10))

(15, 50)
(11, 10)
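
Since the function returns a tuple, the two values can be unpacked into separate names at the call site. A short sketch, repeating the function definition so the block runs on its own:

```python
def sum_and_product(a, b=1):
    return a + b, a * b

s, p = sum_and_product(10, 5)  # unpack the returned tuple
print(s)  # 15
print(p)  # 50

s, p = sum_and_product(b=2, a=10)  # keyword arguments can be given in any order
print(s, p)  # 12 20
```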


## NumPy

NumPy is a Python library for efficiently manipulating and computing with arrays of numbers


In [10]:
import numpy as np

### 1D Arrays

Create and modify an array


In [11]:
a = np.array([1, 2, 3])

print(a)

a[0] = -1
print(a)

[1 2 3]
[-1  2  3]


Difference between a Python list and a NumPy array


In [12]:
list1 = [1, 2, 3]
list2 = [4, 5, 6]

print(list1 + list2)  # concatenation of two lists

a = np.array([1, 2, 3])
b = np.array((4, 5, 6))

print(a + b)  # sum of two vectors

[1, 2, 3, 4, 5, 6]
[5 7 9]


Array products


In [13]:
print(list1 * 3)  # repeat lists

print(a * b)  # elementwise product
print(np.dot(a, b))  # inner/dot product

[1, 2, 3, 1, 2, 3, 1, 2, 3]
[ 4 10 18]
32
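
For 1D arrays, the inner product can be written in a few equivalent ways. A small sketch, redefining `a` and `b` so it runs on its own:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.dot(a, b))   # 32
print(a @ b)          # 32, the @ operator on two 1D arrays is the inner product
print(np.sum(a * b))  # 32, sum of the elementwise product
```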


### 2D Arrays (Matrices)


In [14]:
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# A nicer look for us:
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print(X)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


Check the dimensions/sizes/shape of an array


In [15]:
print(X.shape)
print(type(X.shape))

(4, 3)
<class 'tuple'>


Access rows and columns


In [16]:
print(f"First row of X: {X[0]}")

print(f"Second column of X: {X[:, 1]}")

print("Up until second row/column of X:")
print(X[:2, :2])

First row of X: [1 2 3]
Second column of X: [ 2  5  8 11]
Up until second row/column of X:
[[1 2]
 [4 5]]


Modifying rows and columns


In [17]:
X[0] = [-1, -2, -3]
print(X, '\n')

X[:, 1] = [0, 0, 0, 0]
print(X, '\n')

X[3, 2] = 100
print(X)

[[-1 -2 -3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]] 

[[-1  0 -3]
 [ 4  0  6]
 [ 7  0  9]
 [10  0 12]] 

[[ -1   0  -3]
 [  4   0   6]
 [  7   0   9]
 [ 10   0 100]]


Matrix transpose


In [18]:
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print(X.T)

[[ 1  4  7 10]
 [ 2  5  8 11]
 [ 3  6  9 12]]


Some utility functions


In [19]:
X = np.zeros([2, 3])
print(X, '\n')
X = np.ones([3, 2])
print(X, '\n')
# np.random.seed(0)
X = np.random.randn(2, 2)
print(X)

[[0. 0. 0.]
 [0. 0. 0.]] 

[[1. 1.]
 [1. 1.]
 [1. 1.]] 

[[ 0.89250242 -0.61879329]
 [ 0.10299791  0.26746972]]
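
The random matrix above changes on every run. Fixing the seed, as in the commented-out `np.random.seed(0)` line, makes the draws repeatable. The sketch below checks this, and also shows NumPy's `np.random.default_rng` generator interface as an alternative; the seed value `0` is arbitrary.

```python
import numpy as np

np.random.seed(0)
first = np.random.randn(2, 2)
np.random.seed(0)
second = np.random.randn(2, 2)
print(np.array_equal(first, second))  # True: same seed, same numbers

# Generator interface: the random state lives in the rng object
rng = np.random.default_rng(0)
sample = rng.standard_normal((2, 2))
print(sample.shape)  # (2, 2)
```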


Indexing: https://numpy.org/doc/stable/user/basics.indexing.html


In [20]:
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

X[X <= 6] = 0
X[X > 6] = 1
print(X)

[[0 0 0]
 [0 0 0]
 [1 1 1]
 [1 1 1]]
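
With in-place boolean assignment, the order of the two statements matters, because the second mask is computed on the already modified array. The sketch below shows the pitfall and an alternative using `np.where`, which builds a new array in one step (the names `Y` and `Z` are made up for this illustration):

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

# One-step thresholding: 1 where X > 6, else 0; X itself is not modified
Y = np.where(X > 6, 1, 0)
print(Y)

# Pitfall: reversing the order of the two assignments
Z = X.copy()
Z[Z > 6] = 1
Z[Z <= 6] = 0  # the ones just written are also <= 6, so everything becomes 0
print(Z)
```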


Broadcast rules https://numpy.org/doc/stable/user/basics.broadcasting.html


In [21]:
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print(X - 1)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


In [22]:
X_row_mean = np.mean(X, axis=0, keepdims=True)  # axis=0 averages over the rows (one mean per column). Note keepdims=True here (what if keepdims=False?)
print(X_row_mean)
print(f"X shape: {X.shape}")
print(f"X_row_mean shape: {X_row_mean.shape}")
print(X - X_row_mean)

[[5.5 6.5 7.5]]
X shape: (4, 3)
X_row_mean shape: (1, 3)
[[-4.5 -4.5 -4.5]
 [-1.5 -1.5 -1.5]
 [ 1.5  1.5  1.5]
 [ 4.5  4.5  4.5]]


In [23]:
X_col_mean = np.mean(X, axis=1, keepdims=True)  # axis=1 averages over the columns (one mean per row)
print(f"X shape: {X.shape}")
print(f"X_col_mean shape: {X_col_mean.shape}")
print(X - X_col_mean)

X shape: (4, 3)
X_col_mean shape: (4, 1)
[[-1.  0.  1.]
 [-1.  0.  1.]
 [-1.  0.  1.]
 [-1.  0.  1.]]
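
As for the "what if `keepdims=False`?" question: broadcasting aligns shapes starting from the last axis, so dropping the reduced axis is harmless for `axis=0` but breaks the subtraction for `axis=1`. A minimal sketch (the names `mean_axis0`, `mean_axis1`, and `centered` are made up for this illustration):

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

mean_axis0 = np.mean(X, axis=0)  # shape (3,), one mean per column
print((X - mean_axis0).shape)    # (4, 3): a (3,) array lines up with the last axis

mean_axis1 = np.mean(X, axis=1)  # shape (4,), one mean per row
broadcast_failed = False
try:
    X - mean_axis1               # (4, 3) against (4,): trailing sizes 3 and 4 differ
except ValueError as err:
    broadcast_failed = True
    print(f"Broadcast failed: {err}")

# Restoring the dropped axis makes the shapes compatible again
centered = X - mean_axis1[:, np.newaxis]  # (4, 3) against (4, 1)
print(centered)
```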


Matrix multiplication


In [24]:
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])

w = np.array([[1],
              [1]])  # a column vector of all ones

print("X:")
print(X)
print("w:")
print(w)
print("X times w:")
print(X @ w)  # same as np.matmul(X,w)

X:
[[1 2]
 [3 4]
 [5 6]]
w:
[[1]
 [1]]
X times w:
[[ 3]
 [ 7]
 [11]]
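
The shape of the result depends on whether `w` is a column vector or a flat 1D array. A small sketch comparing the two (the names `w_col` and `w_flat` are made up for this illustration):

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])    # shape (3, 2)

w_col = np.ones((2, 1))  # column vector, shape (2, 1)
w_flat = np.ones(2)      # 1D array, shape (2,)

print((X @ w_col).shape)   # (3, 1): result is a column vector
print((X @ w_flat).shape)  # (3,): result is a 1D array
print(X @ w_flat)          # [ 3.  7. 11.]
```

Mixing the two shapes in later arithmetic can trigger unintended broadcasting, so it helps to check `.shape` when results look wrong.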


Matrix inverse


In [25]:
X = np.array([[1, 1, 1],
              [2, 3, 4],
              [0, 0, 1]])

X_inv = np.linalg.inv(X)
print("Inverse of X:")
print(X_inv)

print("X times the inverse of X:")
print(X @ X_inv)  # should give identity (also check np.identity or np.eye)

Inverse of X:
[[ 3. -1.  1.]
 [-2.  1. -2.]
 [ 0.  0.  1.]]
X times the inverse of X:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
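
In floating point, `X @ X_inv` may differ from the identity by tiny rounding errors, so a tolerance-based check such as `np.allclose` is safer than an exact comparison. The sketch below also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly; the right-hand side `b` is a made-up example.

```python
import numpy as np

X = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 4.0],
              [0.0, 0.0, 1.0]])

X_inv = np.linalg.inv(X)
print(np.allclose(X @ X_inv, np.eye(3)))  # True

# Solve X z = b for z
b = np.array([6.0, 20.0, 3.0])
z_solve = np.linalg.solve(X, b)
z_inv = X_inv @ b
print(np.allclose(z_solve, z_inv))        # True: both approaches agree
print(np.allclose(z_solve, [1, 2, 3]))    # True
```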


Concatenate two matrices


In [26]:
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[-1, -1],
              [-1, -1]])

C = np.concatenate([A, B], axis=0)
print(C, '\n')

D = np.concatenate([A, B], axis=1)
print(D)

[[ 1  2]
 [ 3  4]
 [-1 -1]
 [-1 -1]] 

[[ 1  2 -1 -1]
 [ 3  4 -1 -1]]
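
NumPy also provides `np.vstack` and `np.hstack` as shorthands for these two cases. A short sketch confirming that they match `np.concatenate` for 2D arrays:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[-1, -1],
              [-1, -1]])

C = np.vstack([A, B])  # stack vertically, like axis=0
D = np.hstack([A, B])  # stack horizontally, like axis=1

print(C.shape)  # (4, 2)
print(D.shape)  # (2, 4)
print(np.array_equal(C, np.concatenate([A, B], axis=0)))  # True
print(np.array_equal(D, np.concatenate([A, B], axis=1)))  # True
```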


## Linear Programming

In A1, you will use CVXOPT to solve linear programs:
https://cvxopt.org/userguide/coneprog.html#cvxopt.solvers.lp

Let's consider the following problem.

**Question**:

---

Suppose we are making dishes.

|    Dishes    | Cost  |
| :----------: | :---: |
| Potato Salad | 12.75 |
|  Meat Stew   | 7.25  |

- Let $x_1$ be the number of potato salad dishes (first unknown)
- Let $x_2$ be the number of meat stew dishes (second unknown)

- Requirements:
  1. Total cost within a budget of 1200
  2. At least 60 veg dishes and 40 non-veg dishes

- Objective:
  - Maximize the number of dishes

**Solution**:

---

- Forming constraints
  - $12.75 x_1 + 7.25 x_2 \le 1200$
  - $x_1 \ge 60 ⟺ -x_1 \le -60$
  - $x_2 \ge 40 ⟺ -x_2 \le -40$
- Goal: maximize $x_1 + x_2$ ⟺ minimize $-x_1 - x_2$
- Let $\mathbf{x} = [x_1, x_2]^\top$ be the unknown vector

Specifications of `cvxopt.solvers.lp`

$$
\begin{split}
\min_{\mathbf{x}}\quad& \mathbf{c}^\top \mathbf{x}
\\
\text{s.t.}\quad& G \mathbf{x} + \mathbf{s} = \mathbf{h}
\\
& A \mathbf{x} = \mathbf{b}
\\
& \mathbf{s} \succeq \mathbf{0}
\end{split}
$$

We can get rid of $\mathbf{s}$:

$$
\mathbf{s} = \mathbf{h} - G \mathbf{x} \succeq \mathbf{0}
⟺
G \mathbf{x} \preceq \mathbf{h}
$$

We need to determine $\mathbf{c}, G, \mathbf{h}$ (there is no equality constraint, so $A, \mathbf{b}$ are not used)


In [27]:
import numpy as np
from cvxopt import matrix, solvers

c = np.array([-1.0, -1.0])

Then the objective function (goal) is
$\mathbf{c}^\top\mathbf{x} = (-1) x_1 + (-1) x_2$


In [28]:
G = np.array([[12.75, 7.25],
              [-1, 0],
              [0, -1]])
h = np.array([1200.0, -60.0, -40.0])

Then $G\mathbf{x}=
\begin{bmatrix}
12.75 & 7.25 \\
-1 & 0 \\
0 & -1
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
=
\begin{bmatrix}
12.75x_1+7.25x_2 \\
-x_1 \\
-x_2
\end{bmatrix}
\preceq
\begin{bmatrix}
1200 \\
-60 \\
-40
\end{bmatrix}
=\mathbf{h}
$


In [29]:
# Need to convert to cvxopt.matrix type
c = matrix(c)
G = matrix(G)
h = matrix(h)

# Finally, call the solver
sol = solvers.lp(c, G, h)
print(np.array(sol['x']))

     pcost       dcost       gap    pres   dres   k/t
 0: -1.1342e+02 -1.5754e+03  2e+01  0e+00  1e+01  1e+00
 1: -1.1469e+02 -4.2327e+02  4e+00  2e-16  3e+00  3e-01
 2: -1.1981e+02 -1.7443e+02  9e-01  3e-16  5e-01  2e-01
 3: -1.1999e+02 -1.2096e+02  2e-02  3e-16  8e-03  2e-03
 4: -1.2000e+02 -1.2001e+02  2e-04  1e-16  8e-05  2e-05
 5: -1.2000e+02 -1.2000e+02  2e-06  1e-16  8e-07  2e-07
 6: -1.2000e+02 -1.2000e+02  2e-08  1e-16  8e-09  2e-09
Optimal solution found.
[[60.        ]
 [59.99999999]]
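
It is good practice to sanity-check a solver's answer against the original constraints. The sketch below does this with NumPy alone, assuming the solution rounds to $x_1 = 60$, $x_2 = 60$ as printed above; the tolerance `1e-6` is an arbitrary choice for illustration.

```python
import numpy as np

c = np.array([-1.0, -1.0])
G = np.array([[12.75, 7.25],
              [-1.0, 0.0],
              [0.0, -1.0]])
h = np.array([1200.0, -60.0, -40.0])

# Assumption: the solver output above, rounded to whole dishes
x = np.array([60.0, 60.0])

slack = h - G @ x              # the vector s in G x + s = h; must be nonnegative
print(slack)                   # [ 0.  0. 20.]
print(np.all(slack >= -1e-6))  # True: all constraints are satisfied
print(-(c @ x))                # 120.0, the total number of dishes
```

A zero slack means the corresponding constraint is tight: here the budget is fully spent and the potato salad count sits exactly at its minimum of 60.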
