# About these practicals / tutorials

- These practical sessions are in the form of notebooks to follow and fill in.
- If you cannot run the notebooks locally (e.g., because of missing libraries), you can use <https://jupyterlite.readthedocs.io/en/latest/_static/lab/index.html>, which runs the notebook with a Python interpreter running *in your browser* thanks to pyodide. The first cell can take a long time to run, so be patient.

In [2]:
%%html
<style>
/* *********************************************************** */
/* styling the notebook, you can ignore it if it does not work */
/* *********************************************************** */
h3 { color: #60a5fa !important; text-decoration: underline; font-variant-caps: small-caps;}
.jp-OutputArea-output { border-left: 10px solid grey; margin-left: 20px; }
</style>

# Let's Start: Numpy Basics

- Numpy is a Python library that is the foundation of most Python scientific libraries.
- Numpy makes it easy to efficiently manipulate vectors (1d arrays), matrices (2d arrays), and generally "n-dimensional arrays" (nd arrays).
- Deep learning frameworks actually re-implement the numpy API:
    - their arrays can be used (mostly) in the same way as numpy arrays
    - they often allow manipulating arrays that are either in CPU or in GPU memory
    - they often call a *tensor* what numpy calls an *ndarray*

It is thus essential to understand numpy well, in order not to lose a lot of time when working with scientific libraries and deep learning frameworks.

By a well established convention, the numpy library is imported and renamed as `np`.

In [1]:
import numpy as np

### Creating simple Numpy arrays

Numpy allows for easy creation of arrays, with many different "constructors".

The first way to construct an array is to use an existing Python list.

In [2]:
# build an array from a 1d Python list
a = np.array([10, 20, 30, 40, 50])
print(a)

[10 20 30 40 50]


In [3]:
SEP = "\n======================\n"

# build a 2d array from a "rectangular" Python list
b = np.array([[10, 20, 30], [40, 50, 60]])
print(b, SEP)

# build a 3d array from a "rectangular" Python list
c = np.array([ [[10, 20, 30], [40, 50, 60]], [[11, 21, 31], [42, 52, 62]] ])
print(c, SEP)


[[10 20 30]
 [40 50 60]] 

[[[10 20 30]
  [40 50 60]]

 [[11 21 31]
  [42 52 62]]] 



One can also create 1d arrays with regularly spaced values (like `range()` in plain Python).

In [4]:
# np.arange() is like range() but for numpy arrays, it also allows non-integer values
r1 = np.arange(10)
r2 = np.arange(10, 20)
r3 = np.arange(10, 20, 2)
r4 = np.arange(10, 20, 1.5)
print(r1, r2, r3, r4, sep=SEP, end=SEP)

# np.linspace() takes the start and end (both included) and the number of values we want
r5 = np.linspace(10, 20, 6)
print(r5, SEP)

[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[10 12 14 16 18]
[10.  11.5 13.  14.5 16.  17.5 19. ]
[10. 12. 14. 16. 18. 20.] 
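
As a small sketch of how the two constructors relate (the variable names below are illustrative and not part of the notebook): `np.linspace()` can report the spacing it used through its `retstep=True` option, and its `endpoint=False` option makes it exclude the end value, as `np.arange()` always does.

```python
import numpy as np

# linspace includes both ends; with 6 values the step is (20 - 10) / (6 - 1) = 2
values, step = np.linspace(10, 20, 6, retstep=True)
print(values, step)

# arange excludes the end value, so 20 is not part of the result
ar = np.arange(10, 20, 2)
print(ar)

# endpoint=False makes linspace exclude the end too, matching arange here
same = np.linspace(10, 20, 5, endpoint=False)
print(same)
```

When the step is not an integer, `np.linspace()` is usually the safer choice because the number of values is stated explicitly.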



### The notion of *shape*

- Each numpy array has a number of dimensions (also called axes): a vector is 1d, a matrix is 2d, etc.
- Along each dimension, a numpy array has a number of values, e.g. the number of rows and columns for a 2d array.
- The tuple containing these numbers of values is called the **shape** of a numpy array.
- The shape of an array `a` can be accessed with either `np.shape(a)` or `a.shape`.
- A new array with the same values as an existing array but *viewed* with a different shape can be created with the `reshape()` function.

Let's look at the shapes of our arrays.

In [5]:
for arr in [a, b, c, r1, r2, r5]:
    print(arr)
    print("............ is of shape", arr.shape, SEP)

[10 20 30 40 50]
............ is of shape (5,) 

[[10 20 30]
 [40 50 60]]
............ is of shape (2, 3) 

[[[10 20 30]
  [40 50 60]]

 [[11 21 31]
  [42 52 62]]]
............ is of shape (2, 2, 3) 

[0 1 2 3 4 5 6 7 8 9]
............ is of shape (10,) 

[10 11 12 13 14 15 16 17 18 19]
............ is of shape (10,) 

[10. 12. 14. 16. 18. 20.]
............ is of shape (6,) 



A few important points:
- Shapes are Python tuples, which are immutable sequences (like a list, but it cannot be modified).
- We see that Python tuples are denoted with parentheses.
- We see that the special case of a Python tuple with a single value is denoted with a comma at the end, like `(5,)` (this is to distinguish it from `(5)`, which in Python syntax is just the number 5).
- We see that the first dimension/axis indexes the rows (for matrices); more generally, it corresponds to the outermost `[` if we picture the array as nested Python lists.
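
The points above can be checked directly in a short sketch (the names `not_a_tuple`, `one_tuple` and `m` are illustrative only):

```python
import numpy as np

not_a_tuple = (5)    # parentheses only group: this is the int 5
one_tuple = (5,)     # the trailing comma makes a 1-element tuple
print(type(not_a_tuple), type(one_tuple))

m = np.array([[10, 20, 30], [40, 50, 60]])
print(m.shape, m.ndim, len(m.shape))  # ndim is the number of axes

# tuples are immutable: item assignment raises a TypeError
try:
    m.shape[0] = 3
except TypeError as err:
    print("cannot modify a tuple:", err)
```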

### Creating Numpy arrays with a given shape, filled with stuff

We can create numpy arrays also with a variety of functions that accept a shape as a parameter.

For instance, we can create arrays filled with constant values.

In [6]:
s1 = np.zeros(7)
s2 = np.zeros((2, 3))
s3 = np.zeros((2, 3, 4))
s4 = np.ones((3, 2, 4))
s5 = np.full((2, 1, 5), 42)
print(s1, s2, s3, s4, s5, sep=SEP, end=SEP)

[0. 0. 0. 0. 0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]]
[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
[[[42 42 42 42 42]]

 [[42 42 42 42 42]]]


Similarly, there are many functions to create new arrays filled with random values, all in the `np.random` module.

In [8]:
s10 = np.random.random((2, 5)) # uniform on [0, 1)
s11 = np.random.uniform(37.5, 42, (3, 5)) # uniform on [37.5, 42)
s12 = np.random.normal(42, 0.1, (3, 5)) # normal distribution with mean 42 and standard deviation 0.1
s13 = np.random.randint(90, 100, (2, 15)) # like randrange() in Python: 100 is excluded

print(s10, s11, s12, s13, sep=SEP, end=SEP)

[[0.56209507 0.84610776 0.18417332 0.6755896  0.81117938]
 [0.58027297 0.805316   0.26191057 0.55291197 0.50135424]]
[[40.00686869 40.17137068 41.2804123  40.51913113 40.06334022]
 [37.95686152 41.83756207 38.57518293 37.52937007 40.45796269]
 [38.587786   37.57078228 40.32527227 41.78573059 38.86730405]]
[[42.01373317 41.93263248 42.21486495 42.24039728 42.02377156]
 [42.1673872  41.92000386 41.8922798  42.04252957 42.10344198]
 [41.86411132 42.0487679  42.09422487 42.1026139  42.25476147]]
[[98 99 92 91 99 92 90 92 91 93 90 98 96 93 95]
 [98 99 91 97 94 96 96 96 96 99 99 98 91 95 95]]
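
The values above change at every run. If reproducible draws are needed, one option (not used in this notebook, shown here only as a sketch) is numpy's `np.random.default_rng()` generator with a fixed seed; the seed value `0` and the variable names are arbitrary choices.

```python
import numpy as np

rng_a = np.random.default_rng(0)  # the seed 0 is an arbitrary choice
rng_b = np.random.default_rng(0)  # same seed, hence the same sequence of draws

draw_a = rng_a.normal(42, 0.1, (3, 5))
draw_b = rng_b.normal(42, 0.1, (3, 5))
print(np.array_equal(draw_a, draw_b))

# integers() plays the role of randint(): the upper bound is excluded
ints = rng_a.integers(90, 100, (2, 15))
print(ints.min() >= 90, ints.max() < 100)
```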


As hinted above, we can *change the shape* of an array.

In [9]:
s20 = np.arange(0, 47, 2)
print(s20, "\n...... has a shape of", s20.shape, "for a total of", s20.size, "elements.", SEP)

s21 = s20.reshape((3, 8))
# error = s20.reshape((3, 10))
s22 = s20.reshape((4, 6))
s23 = s20.reshape((2, 3, 4))
print(s20, s21, s22, s23, sep=SEP, end=SEP)

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46] 
...... has a shape of (24,) for a total of 24 elements. 

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46]
[[ 0  2  4  6  8 10 12 14]
 [16 18 20 22 24 26 28 30]
 [32 34 36 38 40 42 44 46]]
[[ 0  2  4  6  8 10]
 [12 14 16 18 20 22]
 [24 26 28 30 32 34]
 [36 38 40 42 44 46]]
[[[ 0  2  4  6]
  [ 8 10 12 14]
  [16 18 20 22]]

 [[24 26 28 30]
  [32 34 36 38]
  [40 42 44 46]]]


We can note that
- the last dimension/axis (e.g. the columns in 2d) is filled first, then the one before last, etc.
- we would get an error if we gave a shape that does not imply the same number of elements as the original array (e.g. the commented-out `(3, 10)` above)
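
As a minimal sketch of these two behaviours (the names `base` and `grid` are illustrative), the snippet below triggers the error from the commented-out line and checks that, for a contiguous array like this one, `reshape()` returns a *view* that shares its memory with the original.

```python
import numpy as np

base = np.arange(0, 47, 2)          # 24 elements, as in the cell above
try:
    base.reshape((3, 10))           # 3 * 10 = 30 != 24
except ValueError as err:
    print("reshape failed:", err)

# for a contiguous array like this one, reshape returns a view
grid = base.reshape((4, 6))
print(np.shares_memory(base, grid))
grid[0, 0] = -1                     # writing through the view...
print(base[0])                      # ...also changes the original array
```

If an independent array is wanted, calling `.copy()` on the reshaped result avoids this sharing.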

Reshape also allows for up to one *wildcard* expressed as `-1`
- as reshape knows the total size of the original array, it can infer one of the values in the new shape (by dividing the number of elements by the product of the specified shape values),
- for instance, if we have an array with 24 values and reshape it into a 2d array with 3 rows, then the number of columns can be inferred to be 8 (24 / 3).


In [10]:
# with a "-1" shortcut
s24 = s20.reshape((3, -1))
# error = s20.reshape((-1, -1))
s25 = s20.reshape((-1, 6))
s26 = s20.reshape((2, -1, 4))
# error = s20.reshape((2, -1, -1))
print(s20, s21, s22, s23, s24, s25, s26, sep=SEP, end=SEP)

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46]
[[ 0  2  4  6  8 10 12 14]
 [16 18 20 22 24 26 28 30]
 [32 34 36 38 40 42 44 46]]
[[ 0  2  4  6  8 10]
 [12 14 16 18 20 22]
 [24 26 28 30 32 34]
 [36 38 40 42 44 46]]
[[[ 0  2  4  6]
  [ 8 10 12 14]
  [16 18 20 22]]

 [[24 26 28 30]
  [32 34 36 38]
  [40 42 44 46]]]
[[ 0  2  4  6  8 10 12 14]
 [16 18 20 22 24 26 28 30]
 [32 34 36 38 40 42 44 46]]
[[ 0  2  4  6  8 10]
 [12 14 16 18 20 22]
 [24 26 28 30 32 34]
 [36 38 40 42 44 46]]
[[[ 0  2  4  6]
  [ 8 10 12 14]
  [16 18 20 22]]

 [[24 26 28 30]
  [32 34 36 38]
  [40 42 44 46]]]


### Numpy operations are element-wise

Python's arithmetic operators work on numpy arrays (or between an array and a scalar), and are applied element-wise.

Here are a few examples, in 1d.

In [11]:
e1 = np.arange(10)
e2 = np.arange(1000, 2000, 100)
e3 = e1 * 10
e4 = 1 + e1
e5 = e1 + e2
e6 = e1*e1
e7 = e1**3
e8 = 2**e1
e9 = e1**2 / e1
print(e1, e2, e3, e4, e5, e6, e7, e8, e9, sep=SEP, end=SEP)

[0 1 2 3 4 5 6 7 8 9]
[1000 1100 1200 1300 1400 1500 1600 1700 1800 1900]
[ 0 10 20 30 40 50 60 70 80 90]
[ 1  2  3  4  5  6  7  8  9 10]
[1000 1101 1202 1303 1404 1505 1606 1707 1808 1909]
[ 0  1  4  9 16 25 36 49 64 81]
[  0   1   8  27  64 125 216 343 512 729]
[  1   2   4   8  16  32  64 128 256 512]
[nan  1.  2.  3.  4.  5.  6.  7.  8.  9.]


(The `RuntimeWarning` that numpy emits for the line `e9 = e1**2 / e1` is expected: dividing 0 by 0 does not raise an exception, it produces a `nan` value.)
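
When such a warning is expected, it can be silenced locally. The sketch below uses numpy's `np.errstate()` context manager (the names `v` and `ratio` are illustrative); the computed values are the same, only the warning disappears.

```python
import numpy as np

v = np.arange(10)
# errstate temporarily silences the warnings; the result is unchanged
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = v**2 / v
print(ratio)
print(np.isnan(ratio))  # marks the position where 0 / 0 was computed
```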

The same works with multi-dimensional arrays, (for now) with the constraint that they have the same shape.

In [12]:
e10 = np.random.randint(90, 100, (2, 5))
e11 = np.arange(1, 11).reshape((2, 5))
e12 = e10 + 6000
e13 = 10000*e10 + e11
print(e10, e11, e12, e13, sep=SEP, end=SEP)

[[94 92 93 95 95]
 [99 91 91 97 96]]
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[[6094 6092 6093 6095 6095]
 [6099 6091 6091 6097 6096]]
[[940001 920002 930003 950004 950005]
 [990006 910007 910008 970009 960010]]


### Element-wise functions

Most `math` functions exist in the `numpy` module and can work both on scalars and on arrays of any size, where they are applied element-wise.
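
A small sketch of the difference with the standard `math` module (the array `angles` is an illustrative example): the numpy version accepts both a scalar and an array, while `math.sin` only accepts a single number.

```python
import math
import numpy as np

angles = np.array([0.0, math.pi / 2, math.pi])

print(np.sin(math.pi / 2))  # on a scalar, like math.sin
print(np.sin(angles))       # on an array, applied to every element

# math.sin only accepts a single number
try:
    math.sin(angles)
except TypeError as err:
    print("math.sin failed:", err)
```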

In [13]:
e20 = np.arange(180, 360)
e21 = np.radians(e20)
e22 = np.sin(e21)
e23 = np.exp(e22)
print(e20, e21, e22, e23, sep=SEP, end=SEP)


[180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197
 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233
 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251
 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287
 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305
 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323
 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341
 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359]
[3.14159265 3.15904595 3.17649924 3.19395253 3.21140582 3.22885912
 3.24631241 3.2637657  3.28121899 3.29867229 3.31612558 3.33357887
 3.35103216 3.36848546 3.38593875 3.40339204 3.42084533 3.43829863
 3.45575192 3.47320521 3.4906585  3.5081118  3.52556509 3.54301838
 

### Aggregation functions

Going beyond the simple Python `sum()` function, numpy provides aggregation functions:
- some examples are `np.sum()`, `np.mean()`, `np.median()`, ...
- each function works on numpy arrays (and also Python lists)
- each function can specify which dimension/axis to aggregate over, e.g. to compute the mean of each column in a 2d array (that is averaging along the row axis).
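
Before moving to 3d, here is a minimal 2d sketch of the `axis` parameter (the array `grades` and its values are illustrative): the axis that is named is the one that disappears from the result.

```python
import numpy as np

grades = np.array([[10, 20, 30],
                   [40, 50, 60]])   # 2 rows, 3 columns

col_means = np.mean(grades, axis=0)  # collapses the rows: one mean per column
row_means = np.mean(grades, axis=1)  # collapses the columns: one mean per row
print(col_means, row_means)
print(np.sum([1, 2, 3]))             # Python lists are accepted too
```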

Here are examples, where we use a 3d array.

In [14]:
u1 = np.random.normal(1.0, 0.01, (3, 5, 10)) # draw 150 values close to 1, with standard deviation 0.01
print("We have an array of shape", u1.shape, "for a total number of", u1.size, "elements")
print("From the number of integers in the shape, we can say that the array has", len(u1.shape), "axes/dimensions.")
for i in range(len(u1.shape)):
    print("... along axis", i, "there are", u1.shape[i], "elements")
print(SEP)
# simple examples to aggregate over the whole array
u2 = np.sum(u1)
u3 = np.mean(u1)
u4 = np.std(u1)
print(u2, u3, u4, sep=SEP, end=SEP)

We have an array of shape (3, 5, 10) for a total number of 150 elements
From the number of integers in the shape, we can say that the array has 3 axes/dimensions.
... along axis 0 there are 3 elements
... along axis 1 there are 5 elements
... along axis 2 there are 10 elements


149.90062845658542
0.9993375230439028
0.010612364550924073


In [15]:
# selecting which axis to aggregate over
u10 = np.sum(u1, axis=2) # summing **along** axis 2
u11 = np.mean(u1, axis=2)
u12 = np.std(u1, axis=2)
print(u10, u11, u12, sep=SEP, end=SEP)

u13 = np.sum(u1, axis=0)
u14 = np.sum(u1, axis=1)
u15 = np.sum(u1, axis=2)
# keeping the aggregated dimension (but then it has size 1)
u16 = np.sum(u1, axis=0, keepdims=True)
u17 = np.sum(u1, axis=1, keepdims=True)
u18 = np.sum(u1, axis=2, keepdims=True)
# aggregating along several dimensions
u19 = np.sum(u1, axis=(0,1))
u20 = np.sum(u1, axis=(1,2))
u21 = np.sum(u1, axis=(0,2))
u22 = np.sum(u1, axis=(0,2), keepdims=True)
for u in [u13, u14, u15, u16, u17, u18, u19, u20, u21, u22]:
    print(u.shape, end=SEP)

[[10.00498928  9.99927809  9.9809229  10.00722205  9.97661369]
 [10.0304272   9.97748666 10.01136918 10.01481299  9.96182899]
 [ 9.97497988  9.97649808 10.00422585  9.97675459 10.00321903]]
[[1.00049893 0.99992781 0.99809229 1.00072221 0.99766137]
 [1.00304272 0.99774867 1.00113692 1.0014813  0.9961829 ]
 [0.99749799 0.99764981 1.00042258 0.99767546 1.0003219 ]]
[[0.00987529 0.01085135 0.01156042 0.01260572 0.01439775]
 [0.01062965 0.01042457 0.0066556  0.00927654 0.00565406]
 [0.01244751 0.01082161 0.00788013 0.01119151 0.00881419]]
(5, 10)
(3, 10)
(3, 5)
(1, 5, 10)
(3, 1, 10)
(3, 5, 1)
(10,)
(3,)
(5,)
(1, 5, 1)


### Accessing elements and slices (plain Python lists)

Python supports list indexing (like most programming languages) and also allows "slicing" to extract a sub-part of an existing list (as a new list, not a view).

In [16]:
l = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

# plain access
print(l[3])
# slicing
l2 = l[3:6]
print(l2)
# slicing with a step value
l3 = l[3:6:2]
print(l3)

30
[30, 40, 50]
[30, 50]


Slicing accepts two or three values (`start:end` or `start:end:step`), which mirror the parameters of `range()`. Here are equivalent versions without slicing.

In [17]:
l2bis = [ l[i] for i in range(3, 6) ]
l3bis = [ l[i] for i in range(3, 6, 2) ]
# almost equivalent to
l3ter = []
for i in range(3, 6, 2):
    l3ter.append(l[i])
print(l2bis, l3bis, l3ter, sep=SEP, end=SEP)

[30, 40, 50]
[30, 50]
[30, 50]


There are also default values for beginning and end, and negative steps are allowed.

In [18]:
print(l[:3])
print(l[5:])
print(l[:5:2])
print(l[5::2])
print(l[::3])
print(l[:]) # not so useful for now
print(l[::-1]) # negative step
print(l[::-3])

[0, 10, 20]
[50, 60, 70, 80, 90]
[0, 20, 40]
[50, 70, 90]
[0, 30, 60, 90]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
[90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
[90, 60, 30, 0]


Negative indices also work, counting from the end of the list.

In [19]:
print(l[-1])
print(l[len(l)-1]) # equivalent

print(l[-3])
print(l[len(l)-3]) # equivalent

# also with slices
print(l[1:-1])
print(l[1:len(l)-1]) # equivalent

print(l[:-3])

90
90
70
70
[10, 20, 30, 40, 50, 60, 70, 80]
[10, 20, 30, 40, 50, 60, 70, 80]
[0, 10, 20, 30, 40, 50, 60]


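Indexing and slicing also work for assignments (on the left of an equals sign). For a slice with a step, the right-hand side must provide exactly as many elements as the slice selects (numpy relaxes this restriction by allowing a single value to be repeated).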

In [20]:
print(l, end=SEP)
#error l[3:6:2] = 42
l[3:6:2] = [4, 2]
print(l, end=SEP)

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
[0, 10, 20, 4, 40, 2, 60, 70, 80, 90]


### Accessing elements and slices in numpy arrays

Numpy arrays with 1 dimension can be indexed and sliced with the same syntax as lists. As in operations between an array and a number, numpy additionally allows a single value to be automatically repeated in an assignment, like below.

In [21]:
t1 = np.arange(10)*10 # same as the list l above 
print(t1)

[ 0 10 20 30 40 50 60 70 80 90]


In [22]:
# replacing several values by 4
t1[3:6:2] = 4
print(t1)

[ 0 10 20  4 40  4 60 70 80 90]
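
One difference with lists is worth keeping in mind: slicing a Python list builds a new list, whereas slicing a numpy array returns a view on the same memory. The sketch below illustrates this (all names are illustrative).

```python
import numpy as np

values = [0, 10, 20, 30, 40]
arr = np.array(values)

list_part = values[1:3]   # a new list, independent of `values`
arr_part = arr[1:3]       # a view on the memory of `arr`

list_part[0] = -1
arr_part[0] = -1
print(values)  # unchanged
print(arr)     # element 1 was modified through the view

safe_part = arr[1:3].copy()  # an explicit copy is independent again
safe_part[1] = -2
print(arr)
```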


Multi-dimensional numpy arrays can be accessed by passing `,`-separated indices and/or 
slices. For instance in 2d, we pass 2 indices and/or slices.

In [23]:
t2 = np.zeros((10, 8))
print(t2, end=SEP)

t2[0, :] = 42
print(t2, end=SEP)

t2[3, :] = np.random.randint(10, 100, 8)
print(t2, end=SEP)

t2[:, 0] = t2[:, -1]
print(t2, end=SEP)

t2[1:-1, 5:] = 77
print(t2, end=SEP)

[[0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0.]]
[[42. 42. 42. 42. 42. 42. 42. 42.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]]
[[42. 42. 42. 42. 42. 42. 42. 42.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [73. 79. 72. 55. 65. 34. 11. 42.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]]
[[42. 42. 42. 42. 42. 42. 4

### A Few Examples

Computing the Euclidean distance between two (random) vectors in $\mathbb{R}^{1000}$.

In [24]:
x1 = np.random.normal(0, 1, 1000)
x2 = np.random.normal(0, 1, 1000)
dx1x2 = np.sum((x1 - x2)**2)**0.5
print(dx1x2)

46.78079091417377
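
As a sanity check on the formula, the sketch below applies it to two small hand-picked points (`p` and `q`, chosen for illustration) and compares it with numpy's `np.linalg.norm()`, which computes the Euclidean norm of a vector by default.

```python
import numpy as np

p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])

manual = np.sum((p - q)**2)**0.5   # same formula as in the cell above
builtin = np.linalg.norm(p - q)    # Euclidean norm of the difference
print(manual, builtin)
```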


Computing the $\ell_2$ norm of each of 128 weight vectors in $\mathbb{R}^{1000}$, and the sum of these norms.

In [25]:
W = np.random.normal(0, 1, (128, 1000))
Wl2 = np.sum(W**2, axis=1)**0.5 # l2 norm of each row (one per weight vector)
print(Wl2.shape)
Wl2Reg = np.sum(Wl2) # sum of the 128 norms
print(Wl2Reg)

(128,)
4031.1532985590575
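
The same per-row computation can be sketched on a tiny matrix whose norms are easy to verify by hand (the matrix `M` is illustrative); `np.linalg.norm()` with `axis=1` gives one Euclidean norm per row.

```python
import numpy as np

M = np.array([[3.0, 4.0],
              [6.0, 8.0],
              [0.0, 1.0]])            # 3 small "weight vectors"

manual = np.sum(M**2, axis=1)**0.5    # one norm per row
builtin = np.linalg.norm(M, axis=1)   # same, via numpy's norm function
print(manual, builtin, np.sum(builtin))
```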
