## Lecture III: NumPy

October 15, 2024

# 5 min warmup - 5 activity points!

* What datatype is '5'?

* How is python indexed?

* What is a keyword in python?

* What is a keyword in a function?

* What is lambda in python?

In [None]:
import keyword
print(keyword.kwlist)

In [None]:
# will be useful today
# helper for inspecting objects - similar to R's str()
print(list('abcd'))
dir(list('abcd')) 

### Warm up 2nd round:

In [3]:
# lambda function

In [None]:
x = [1, 2, 3, 4, 5]
doubled = list(map(lambda x: x * 2, x)) # map applies the function to each element of the list
print(doubled)

In [None]:
x = [1, 2, 3, 4, 5, 6, 7, 8]
even_numbers = list(filter(lambda x: x % 2 == 0, x)) # filter keeps the elements for which the function returns True
print(even_numbers)

In [None]:
pairs = [(1, 3), (2, 2), (3, 1)]
sorted_pairs = sorted(pairs, key=lambda x: x[1]) # sort by the second element of the tuple
print(sorted_pairs) 

# Moving forward from Python's primitive data types
* Q: can you name the primitives?


In [12]:
# Integers
# Float
# Strings
# Boolean

# Numpy <a name="numpy"></a>

* Num(erical) Py(thon)
* NumPy is at the base of Python's scientific stack of tools 
* Python already has *high-level number objects* (integers, floating point) and *containers* (lists, dictionaries)
* np arrays contain *only one type* - unlike general lists
* **Memory-efficient container that provides fast numerical operations.**
* **ndarray** = block of memory + indexing scheme + data type descriptor
    * raw data 
    * how to locate an element
    * how to interpret an element
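
The three ingredients listed above can be inspected directly on any array. The snippet below is a minimal sketch; the small `int32` array is an assumption chosen only to make the byte counts easy to read.

```python
import numpy as np

# A 2x3 array of 4-byte integers (dtype chosen for illustration only)
a = np.arange(6, dtype=np.int32).reshape(2, 3)

print(a.dtype)     # data type descriptor: how to interpret an element
print(a.itemsize)  # bytes per element: 4
print(a.strides)   # indexing scheme: bytes to step per axis, (12, 4)
print(a.nbytes)    # size of the raw data block: 24 bytes
```

Here `strides` says that moving one row forward skips 12 bytes (3 elements of 4 bytes), while moving one column forward skips 4 bytes.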

<img src="03_pics/ndarray.png" width="600">

Key Features of an `ndarray`:

- **Homogeneous:** All elements must have the same data type.
- **Multidimensional:** It can have any number of dimensions (1D, 2D, 3D, etc.).
- **Fixed size:** The size of the array is determined at creation and cannot change, though you can reshape it.
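
A short sketch of what these features mean in practice; the variable names are illustrative assumptions, not part of the lecture code.

```python
import numpy as np

# Homogeneous: mixed inputs are converted to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)          # float64

text = np.array([1, "a"])
print(text.dtype.kind)      # 'U' - everything became a unicode string

# Fixed size: np.append does not grow the array in place, it returns a new one
ints = np.array([1, 2, 3])
grown = np.append(ints, 4)
print(ints.shape, grown.shape)  # (3,) (4,)

# Reshaping changes the shape but keeps the number of elements
column = ints.reshape((3, 1))
print(column.shape)         # (3, 1)
```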


In [None]:
# it is not natively in the python distribution, do you have it installed?

!pip freeze | grep numpy # bit of bash magic

In [None]:
# executing shell commands from the jupyter notebook
!ls -lha
#!pip install numpy

In [15]:
# np is the conventional alias (aliases are used when package names are too long or coders are rightly lazy)
import numpy as np
# very common usage

**Create an ndarray:** You can create an `ndarray` from:

- A Python list or nested lists (for multi-dimensional arrays).
- Other methods like `np.zeros()`, `np.ones()`, `np.random.rand()`, etc.
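
A minimal sketch of the creation routes listed above; the shapes are arbitrary choices for illustration.

```python
import numpy as np

from_nested = np.array([[1, 2], [3, 4]])  # nested lists -> 2D array
zeros = np.zeros((2, 3))                  # 2x3 array of 0.0
ones = np.ones(4, dtype=int)              # 1D array of integer ones
rand = np.random.rand(2, 2)               # uniform samples from [0, 1)

print(from_nested.shape)  # (2, 2)
print(zeros.shape)        # (2, 3)
print(ones)               # [1 1 1 1]
print(rand.shape)         # (2, 2)
```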

In [None]:
# Simple array
List = [2.0, 3.0, 16.9, 17.2, 1.0]
List2 = np.array(List)
print("Elements in List2 variable are : ", List2)
print("Type of our List2 variable is : ", type(List2))

In [None]:
# Common mistake: np.array expects a single sequence, not separate arguments,
# so this call raises an error
a = np.array(1, 2, 3, 4)

In [None]:
# Simple array
a = np.array([0, 1, 2, 3, 4])
a

In [None]:
dir(a)

In [24]:
# a.ndim

# len(a)

In [None]:
print(a) # prints almost like list

In [None]:
a.shape # 

An array → the dimensions (axes) we index along

<img src="03_pics/np-axis.png">

Different way to look at an array

<img src="03_pics/array_construct.png">

## How can you define a matrix?

### array of arrays?

In [22]:
#multi dimensional objects
# array of array is a matrix
a = np.array([
    [1,3], [2,3]
])

In [None]:
a.shape #2 rows, 2 columns

In [None]:
a

In [28]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
print("Array:\n", arr)
print("Shape:", arr.shape)    # (2, 3)
print("Data type:", arr.dtype)  # int64 on most platforms
print("Number of dimensions:", arr.ndim)  # 2
print("Size:", arr.size)    # 6

Construct array like a civilized person. (Martin Hronec's way! :)

In [None]:
np.arange(6) #array analogue of list(range(6))

In [None]:
nda = np.arange(8).reshape((2,2,2))
nda

In [None]:
nda.shape #corresponds to the shape from initialization

In [None]:
np.arange(10).reshape((2,6)) #is this ok?
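
One way to answer the question above: a reshape only works when the new shape holds exactly as many elements as the array. The sketch below catches the error instead of crashing the cell; using `-1` to let NumPy infer one dimension is shown as an optional convenience.

```python
import numpy as np

try:
    np.arange(10).reshape((2, 6))   # 10 elements cannot fill 2 * 6 = 12 slots
except ValueError as err:
    print("reshape failed:", err)

ok = np.arange(10).reshape((2, 5))         # 2 * 5 == 10, so this works
inferred = np.arange(10).reshape((2, -1))  # -1 lets NumPy infer the 5
print(ok.shape, inferred.shape)            # (2, 5) (2, 5)
```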


NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions” (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output. (from docs) The list below names other commonly used array functions and methods; not all of them are ufuncs.
```
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, invert, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where
```
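
A small sketch of what "universal function" means in practice. The input values are an arbitrary choice; `np.ufunc` is the type NumPy uses for these functions.

```python
import numpy as np

x = np.array([0.0, np.pi / 2, np.pi])

# A ufunc is applied to every element and returns an array of the same shape
y = np.sin(x)
print(y.shape)                               # (3,)
print(np.allclose(y, [0.0, 1.0, 0.0]))       # True (up to floating point error)

# sin and add are ufuncs; mean is a regular function that reduces an array
print(isinstance(np.sin, np.ufunc))          # True
print(isinstance(np.add, np.ufunc))          # True
print(isinstance(np.mean, np.ufunc))         # False
```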

In [None]:
a = np.array([1,2,3,4,5])
np.add(a, 1) # element-wise addition: the scalar 1 is added to every element

In [None]:
np.all(a == a) # all elements are equal to themselves

In [None]:
# evenly spaced

# chain operations on a single object
a = np.arange(10).reshape((2,5))

print(a)
print(a.mean())

In [None]:
a.mean(axis=0) #why axis 0? 
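
A possible way to think about the `axis` argument: it names the axis that gets collapsed. The sketch below reuses the same 2x5 array; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))
# [[0 1 2 3 4]
#  [5 6 7 8 9]]

col_means = a.mean(axis=0)  # collapse the rows -> one value per column
row_means = a.mean(axis=1)  # collapse the columns -> one value per row

print(col_means)  # [2.5 3.5 4.5 5.5 6.5]
print(row_means)  # [2. 7.]
```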

In [None]:
a.std(axis=1)

In [None]:
a.argmax() #index of the largest element in the flattened array (not the element itself)

In [None]:
a.cumprod(axis=1) #why is it useful?

In [None]:
a.T.dot(a) #what is going on here?

In [None]:
#bit of extra linear algebra

sq = np.arange(4).reshape((1,-1))
print(sq.shape)
sq

In [None]:
sq.T.dot(sq)
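
Both `.T.dot(...)` cells are ordinary matrix products, so the shapes tell most of the story. The sketch below checks them; the `@` operator and `np.outer` are used here as equivalent spellings, not as something the cells above rely on.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))
gram = a.T.dot(a)             # (5, 2) times (2, 5) -> (5, 5)
print(gram.shape)             # (5, 5)
print(gram[0, 0])             # 0*0 + 5*5 = 25

sq = np.arange(4).reshape((1, -1))   # row vector, shape (1, 4)
outer = sq.T @ sq                    # (4, 1) times (1, 4) -> (4, 4)
print(outer.shape)                   # (4, 4)
print(np.array_equal(outer, np.outer(np.arange(4), np.arange(4))))  # True
```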

In [None]:
dir(a)

In [None]:
# generate sequences

# number of points from an interval
start = 0
end = 1
n_points = 100
a = np.linspace(start, end, n_points) #R: seq()
a

### Why is it useful?

In [43]:
### generating data with numpy is easy

In [None]:
dir(np.random)

In [None]:
# the seed resets the global random generator; re-run this cell to reproduce the same numbers
np.random.seed(1234)

# random (normal)
r = np.random.randn(4)
r

In [None]:
np.random.randn(4) #re-running this yields a different result -> the generator has moved on since the seed was set
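
A minimal sketch of how seeding behaves: the seed fixes the starting point of the sequence, and every draw moves the generator forward. The last lines use `np.random.default_rng`, the newer generator interface, purely as an optional alternative.

```python
import numpy as np

np.random.seed(1234)
first = np.random.randn(4)
second = np.random.randn(4)   # generator has advanced, so new numbers

np.random.seed(1234)          # reset to the same starting point
again = np.random.randn(4)

print(np.array_equal(first, again))   # True
print(np.array_equal(first, second))  # False

# Alternative: an explicit generator object instead of global state
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
print(np.array_equal(rng_a.standard_normal(4), rng_b.standard_normal(4)))  # True
```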

In [None]:
np.random.standard_t(4, 4) #t-distribution

In [None]:
np.random.uniform(0, 1, 4) #uniform distribution

# A crucial skill

### Indexing and Slicing
* In 2D, the first dimension corresponds to rows, the second to columns.
* In the multidimensional case, `a[0]` selects index 0 along the first axis and returns all elements along the remaining (unspecified) axes
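
The second point is easiest to see on a 3D array. This is a small sketch; the shape `(2, 3, 4)` is an arbitrary choice.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

print(a[0].shape)     # (3, 4) - first axis fixed, the other two returned in full
print(a[0, 1].shape)  # (4,)   - first two axes fixed
print(a[0, 1, 2])     # 6      - a single element

# Leaving out trailing indices is the same as writing ':' for them
print(np.array_equal(a[0], a[0, :, :]))  # True
```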

In [None]:
# create toy diagonal matrix
a = np.diag([1,2,3,4])
a

In [57]:
# print(a[2]) 

# print(a[2,:]) #slicing - equivalent to first

# print(a[2][2]) #access single element matrix
# print(a[2,2])

# print(a[:,1])

# print(a[:,-2:])

In [None]:
a[0:3:2] # by default going by axis 0

In [None]:
a[:,0:3:2] # now going along axis 1 - notice the leading :,

In [None]:
# select from a start to an end with a certain step (start could be 0 instead of omitted)
# advanced tricks

a[:3:2,:3:2] #step n is every n-th observation

In [None]:
s = np.arange(100)
print(s)
# step can also be negative
# start:end:step
print(s[:80:-3])

The sliced array `s[:80:-3]` results in `[99, 96, 93, 90, 87, 84, 81]`. 
This array is generated by starting from the end of s (since the start index is omitted and step is negative, it defaults to the last item, which is 99), and includes every third element in reverse order until just before it reaches the index 80.
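
The explanation above can be checked directly, together with two more negative-step slices added here for comparison.

```python
import numpy as np

s = np.arange(100)

print(s[:80:-3])    # [99 96 93 90 87 84 81] - stops before index 80
print(s[::-1][:3])  # [99 98 97] - full reversal, then the first three
print(s[10:0:-4])   # [10  6  2] - from index 10 down, stops before index 0
```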

**Copies vs. views**
* slicing creates a **view** on the original array (just a way of accessing array data)
    * the original array is not copied in memory
* when modifying the view, the original array is modified as well! (SURPRISE, SURPRISE)
    * this saves memory and time
* In CS terms, this is the difference between a **shallow copy** and a **deep copy**
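
A minimal sketch of the points above using an actual slice. The names `view`, `fancy` and `copied` are illustrative; the index-list case is included as a contrast, since indexing with a list returns a copy.

```python
import numpy as np

a = np.arange(10)

view = a[2:5]          # slicing -> view on the same memory
view[0] = 99
print(a[2])                           # 99 - the original changed too
print(np.may_share_memory(a, view))   # True
print(view.base is a)                 # True - the view points back to a

fancy = a[[2, 3, 4]]   # indexing with a list -> independent copy
print(np.may_share_memory(a, fancy))  # False

copied = a[2:5].copy() # explicit copy of a slice
copied[0] = -1
print(a[2])                           # still 99
```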

In [None]:
a = np.arange(10)
print(a)
b = a  # no copy is made: b is just another name for the same array
b[2] = 22

# #print(a.data, b.data)

print(a)
print(b) #a and b are the same??

#print(np.may_share_memory(a, b)) #

In [None]:
a = np.arange(10)
c = a.copy()  # force a copy -> create new memory
c[0] = 12
print(c)
print(a)

print(np.may_share_memory(a, c))
#print(a.data, c.data)

### Typical mistake in pandas slicing dataframes -> stay tuned for next lecture!

SettingWithCopyWarning:
 
A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead

**Speed of basic numpy operations**
   * much faster than in pure python

In [None]:
# if unsure about an algo:
# run it on a small sample and get a time estimate before letting it run for days...
a = np.arange(10000)
%timeit -n 100 a + 1  

#caching results is good!

In [None]:
l = range(10000)
%timeit -n 100 [i+1 for i in l] 

In [None]:
# remember the difference between %time and %timeit
# timeit runs a number of loops
# time times just one evaluation of the cell

%time res = [i+1 for i in a]

In [None]:
%timeit res =  a + 1 
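
The `%timeit` and `%time` magics only exist inside IPython/Jupyter. As a rough sketch, the same comparison can be run in a plain script with the standard-library `timeit` module; the exact timings depend on the machine, so no numbers are claimed here.

```python
import timeit

import numpy as np

a = np.arange(10_000)
l = list(range(10_000))

# Total time for 100 repetitions of each variant
t_numpy = timeit.timeit(lambda: a + 1, number=100)
t_list = timeit.timeit(lambda: [i + 1 for i in l], number=100)

print(f"numpy: {t_numpy:.5f} s, list comprehension: {t_list:.5f} s")

# Both approaches compute the same values
same = (a + 1).tolist() == [i + 1 for i in l]
print(same)  # True
```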

**Changing shape of an array**
* flattening
* reshaping (inverse of flattening)
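
A small sketch of both operations. It also shows `ravel`, which flattens like `flatten` but returns a view when it can, an addition made here for comparison.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat = a.flatten()            # always a new copy
rav = a.ravel()               # a view when the memory layout allows it
back = flat.reshape(a.shape)  # reshaping undoes the flattening

print(flat)                          # [1 2 3 4 5 6]
print(np.array_equal(back, a))       # True
print(np.may_share_memory(a, flat))  # False
print(np.may_share_memory(a, rav))   # True
```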

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
# flattening
print(a.flatten())


In [None]:
print(a.shape)
a.T

In [None]:
a.T.flatten() #same result as a.flatten(order='F') or a.ravel(order='F'): Fortran-style, column-wise order

In [None]:
a.T.flatten(order='F')

In [None]:
np.exp(a) # a bit like R - vectorized operations (applied element-wise)

**Pictures? Just pixels.**

In [80]:
# we will get to the matplotlib and pyplot in the last part of the lecture
import matplotlib.pyplot as plt
# another ipython magic
%matplotlib inline 

In [None]:
# for more M.C. Escher's pictures: https://www.mcescher.com/
import matplotlib.pyplot as plt

img = plt.imread("03_pics/mc_escher_print gallery.png")
plt.imshow(img, interpolation="nearest", aspect="auto")

In [None]:
type(img)

In [None]:
img

In [None]:
# image shape as (H, W, D), depth: https://www.wikiwand.com/en/Color_depth
img.shape

In [None]:
# just an array!
img

In [92]:
self_centered = img[200:,200:500]

In [None]:
plt.imshow(self_centered)

In [None]:
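lx, ly, ld = img.shape  # height, width, number of channels
X, Y = np.ogrid[0:lx, 0:ly]  # open grids of row and column indices
# boolean mask: True for pixels outside a circle centered in the image
mask = (X - lx / 2) ** 2 + (Y - ly / 2) ** 2 > lx * ly / 4
img[mask] = 0  # boolean indexing: black out the masked pixels
img[range(300), range(300)] = 255  # index lists: draw a diagonal line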

plt.figure(figsize=(3, 3))
plt.axes([0, 0, 1, 1])
plt.imshow(img, cmap=plt.cm.gray)
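
The masking trick is easier to follow on a tiny array. The sketch below repeats the same steps on a made-up 4x4 "image" of ones; the size, center and radius are assumptions chosen so the result can be checked by hand.

```python
import numpy as np

small = np.ones((4, 4))
X, Y = np.ogrid[0:4, 0:4]        # X has shape (4, 1), Y has shape (1, 4)

# True where the squared distance from the center (1.5, 1.5) exceeds 3
mask = (X - 1.5) ** 2 + (Y - 1.5) ** 2 > 3
print(mask.sum())                # 4 - only the four corners are outside

small[mask] = 0                  # boolean indexing sets the corners to 0
print(small.sum())               # 12.0

small[range(4), range(4)] = 5    # index lists pick (0,0), (1,1), (2,2), (3,3)
print(small[0, 0], small[1, 1])  # 5.0 5.0
```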

In [None]:
# Element-wise operations
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Addition
sum_arrays = array1 + array2

# Multiplication
product_arrays = array1 * array2

sum_arrays, product_arrays
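
A short sketch contrasting element-wise arithmetic with the dot product, and showing how a scalar or a 1D array is stretched to match a larger array. The `matrix` variable is an illustrative addition.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

print(array1 * array2)   # [ 4 10 18] - element-wise product
print(array1 @ array2)   # 32         - dot product: 4 + 10 + 18
print(array1 + 10)       # [11 12 13] - the scalar is applied to every element

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix + array1)   # array1 is added to each row
# [[2 4 6]
#  [5 7 9]]
```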

# Other topics

In [None]:

# Joining arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
joined_array = np.concatenate((array1, array2)) #concat is a general concept

# Splitting arrays
split_arrays = np.split(joined_array, 2)
#from docs:
# If indices_or_sections is an integer, N, the array will be divided into N equal arrays along axis. If such a split is not possible, an error is raised
# If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in
    # ary[:2]
    # ary[2:3]
    # ary[3:]

joined_array, split_arrays
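
The two cases quoted from the docs can be sketched as follows, including the error raised when an equal split is not possible.

```python
import numpy as np

joined = np.concatenate((np.array([1, 2, 3]), np.array([4, 5, 6])))

halves = np.split(joined, 2)      # integer: two equal parts
print(halves)                     # [array([1, 2, 3]), array([4, 5, 6])]

parts = np.split(joined, [2, 3])  # indices: joined[:2], joined[2:3], joined[3:]
print(parts)                      # [array([1, 2]), array([3]), array([4, 5, 6])]

try:
    np.split(joined, 4)           # 6 elements cannot be split into 4 equal parts
except ValueError as err:
    print("split failed:", err)
```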

In [None]:
#other operations
print(array1.sum())
#2d array ->
print(array1.argmax())
print(array1.mean())