# **Introductory Jupyter notebook**
Hi there! Welcome to this short introductory Jupyter notebook. Its purpose is to quickly show how the practicals in this course will work, while also introducing you to some Numpy features. In this course, I often ask you to write functions that take specific inputs and produce specific outputs. Some of you may be used to this function-based style of programming, while others are more familiar with scripting or coding as you go. The Introduction to Python course doesn't spend a huge amount of time on writing functions (as far as I know), so this refresher might be useful. In the previous edition of the course, some students thought learning ML stuff _and_ adapting to this function-based style was a bit much, so this introductory notebook is here to help you **_if you need it_**.

[Here](https://defkey.com/jupyter-notebook-shortcuts) are some Jupyter notebook keyboard shortcuts. 

In [None]:
#run this cell to set things up
import numpy as np
import seaborn as sns

## Loading in some data
Below, load `SampleData1.csv` into the variable `my_awesome_data` using the `np.loadtxt()` function ([documentation](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html)). Make sure to set `encoding='utf-8-sig'` (and, if the values in the file turn out to be separated by commas, `delimiter=','`, since `np.loadtxt()` splits on whitespace by default). Print the data, and compute the average and standard deviation of all the values. Use your favourite search engine to find the Numpy functions you need (:

**If you have never worked with Numpy at all:** please read [this](https://numpy.org/doc/stable/user/absolute_beginners.html). 
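
As a rough illustration of how `np.loadtxt()` reads delimited text, the sketch below parses a small in-memory table instead of a file. The values and the name `fake_csv` are made up for this example and are not the contents of `SampleData1.csv`. It also assumes comma-separated values, which is why `delimiter=","` is passed.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file on disk (NOT the real SampleData1.csv).
fake_csv = io.StringIO("1,2,3\n4,5,6\n")

# loadtxt splits on whitespace by default, so comma-separated data needs delimiter=",".
demo_data = np.loadtxt(fake_csv, delimiter=",")

print(demo_data)        # a 2 by 3 array of floats
print(demo_data.shape)  # (2, 3)
```

With a real file you would pass the file name as the first argument instead of the `StringIO` object.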

In [None]:
#your answer here


## Working along the axes
Okay great. You get only one value out for the average and one for the standard deviation. That's because, by default, these functions compute over the flattened array. Maybe you want the average per row or per column instead. And what does flattening mean? Let's investigate.  
Below:
* print `my_awesome_data.shape`
* print `my_awesome_data.flatten()` and `my_awesome_data.flatten().shape`
* print `my_awesome_data.flatten()[np.newaxis, :].shape`
* print `my_awesome_data.flatten()[:, np.newaxis]` and try to figure out what is going on by a) swapping the positions of `:` and `np.newaxis` and b) printing the shape of the resulting array.
* Calculate the average and standard deviation over the rows of `my_awesome_data` by looking at the `axis` argument of the Numpy functions used to calculate them.
* Finally, use `np.reshape()` to turn the flattened array back into the correct 3 by 3 matrix of numbers. Compare what happens when you use `(3, 3)`, `(3, -1)` and `(-1, 3)` for the shape, and explain why this happens. Documentation is [here](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html).



In [None]:
#your answer here



## Summing that up

What you should see is that `flatten()` turns a set of numbers into a 1D array with shape `(n_numbers,)`. However, if we want a table, i.e. a 2D array, we can make one by adding another axis to the data. Then, for instance, the 9-number array has shape `(1, 9)`: 1 row of nine numbers. Now you could append another row below it by using `np.concatenate((my_awesome_data.flatten()[np.newaxis, :], my_awesome_data.flatten()[np.newaxis, :]))`. You don't have to do much with this information, except to know that Numpy uses axes to determine what 'shape' the numbers are in, and that we sometimes want to change that shape. Things can break when the arrays you want to combine into a bigger table are not all 2D, and then you need to know what might be going on, namely: mismatched shapes.
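
The shapes described above can be checked on a small made-up array. The sketch below uses a hypothetical 2 by 3 array called `toy` rather than `my_awesome_data`, so the numbers differ from the exercise.

```python
import numpy as np

toy = np.array([[1, 2, 3],
                [4, 5, 6]])          # hypothetical data, shape (2, 3)

flat = toy.flatten()                 # 1D array, shape (6,)
as_row = flat[np.newaxis, :]         # 2D array with one row, shape (1, 6)
as_col = flat[:, np.newaxis]         # 2D array with one column, shape (6, 1)

# Stacking two (1, 6) rows on top of each other gives shape (2, 6).
two_rows = np.concatenate((as_row, as_row))

print(flat.shape, as_row.shape, as_col.shape, two_rows.shape)

# Mixing a 1D and a 2D array fails because the number of dimensions differs.
try:
    np.concatenate((flat, as_row))
except ValueError as err:
    print("Mismatched shapes:", err)
```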

## Working with functions

The bread and butter of this course will be me asking you to write a specific function, and you thanklessly slaving away to satisfy these inane desires. Such is life. Let's practice this!

Remember, a function in Python is defined using the `def` keyword, along with its arguments (and optionally their default values). Below, I give an example definition of a highly useful function. Make sure you understand what is going on.

In [None]:
def myFunc(some_number, some_other_number, a_string_for_good_measure="String theory is not about me *cries in 17 dimensions*", printit = True):
    if printit: print("Testing 1, 2, 3:")
    if not isinstance(some_number, (int, float)) or not isinstance(some_other_number, (int,float)):
        print("Error, numbers should be numbers, man.")
        return
    the_sum = some_number + some_other_number
    if printit: print(f'Sum of numbers: {the_sum}')
    if printit: print(a_string_for_good_measure.split(" "))
    return the_sum

a_sum = myFunc(12, 3)
print("\n")
a_mistake = myFunc("whoopsie", 4)
a_sum_two_electric_boogaloo = myFunc(15, 3, printit=False)
print(a_sum)
print(a_mistake)
print(a_sum_two_electric_boogaloo)


## Making functions

Say I want you to make a function `giveMeMyMatrixMan()` that takes in two arguments, `shape` and `method`, in that order. `shape` is a tuple with 2 values (e.g. `(5, 6)`); `method` can be either `'zeroes'`, `'ones'`, or `'normal'`. The function should return a matrix/table with dimensions `shape`, filled with either all 0s, all 1s, or values drawn from a standard normal distribution, depending on the `method` argument. If the `method` argument is something other than these three possibilities, the function should say so and return nothing (specifically: `None`). If the `method` argument is not given, it should default to the normally distributed values.

How do you tackle this? Well, the easiest way is to start by just defining variables and doing the operations necessary, and only then turning it all into a function. For instance, you could define `shape = (3,4)` and `method = 'normal'`, and then write the desired behaviour using an if-statement for method and the correct Numpy function. Then you can expand this, and finally put everything into a function. As an illustration for zeroes below:

In [None]:
shape = (3,4)
method = 'zeroes'
if not isinstance(shape, tuple):
    print("Wrong argument type! shape should be a tuple with 2 values")
if len(shape) != 2:
    print("Wrong # of dimensions! Only 2D array supported. Change the shape tuple!")
if not isinstance(method, str) or method not in ["zeroes", "ones", "normal"]:
    print("Wrong method value, should be str and one of [zeroes, ones, normal]")
if method == "zeroes":
    result = np.zeros(shape)
print(result)
print("\n")

#if you would only want the function to do this one thing:

def giveMeMyZeroMatrixMan(shape, method='zeroes'):
    if not isinstance(shape, tuple):
        print("Wrong argument type! shape should be a tuple with 2 values")
        return
    if len(shape) != 2:
        print("Wrong # of dimensions! Only 2D array supported. Change the shape tuple!")
        return
    if not isinstance(method, str) or method not in ["zeroes", "ones", "normal"]:
        print("Wrong method value, should be str and one of [zeroes, ones, normal]")
        return
    if method == "zeroes":
        result = np.zeros(shape)
    else:
        result = None
    return result

print(giveMeMyZeroMatrixMan((2,8)))
print("\n")
print(giveMeMyZeroMatrixMan((2,8), method = 'normal'))
print("\n")
print(giveMeMyZeroMatrixMan((2,8), method = 'DastardlyDeviouslyWrongMethodString'))

## Truly up to you now!

Implement the function that can also give you a matrix of values drawn from a normal distribution and a matrix of ones. Test it by running `print(giveMeMyMatrixMan((6,5)))`.

**Note:** I went a bit far in checking the arguments. More of the function consists of that than of actual function calls. During the course you won't have to be so strict, but do keep in mind the maxim 'Fail early'. Check that things are as they should be and if not make sure to fail. It will protect you from arcane errors down the line!
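
To make 'fail early' a bit more concrete, here is a minimal sketch on an unrelated, hypothetical function `column_means` (it is not part of the exercise). It follows the same print-and-return-`None` convention used above. Raising an exception such as `ValueError` would be a common alternative.

```python
import numpy as np

def column_means(table):
    """Return the mean of each column of a 2D array, or None if the input is invalid."""
    # Check the input first, so problems surface here and not deep inside Numpy.
    if not isinstance(table, np.ndarray):
        print("Wrong argument type! table should be a numpy.ndarray")
        return None
    if table.ndim != 2:
        print("Wrong # of dimensions! Only 2D arrays supported.")
        return None
    return table.mean(axis=0)

print(column_means(np.array([[1.0, 2.0], [3.0, 4.0]])))  # [2. 3.]
print(column_means([1, 2, 3]))                            # prints a message, then None
```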

In [None]:
# your answer here

## Sum of squares

Say you have a linear regression where you are predicting life satisfaction (which is, obviously, a measurable continuous variable from 0-100) based on the number of guitars you have stored away in your parents' attic (a continuous value if ever there was one, I have 3.79 stowed away myself). Write a function that takes in a 1D array of predicted values and a 1D array of true values, and calculates the sum of squared errors (SSE) for the predictions (that is, the sum of the squared differences between the predicted values and the true values). If you want to check whether something is an array, use `isinstance()` with `np.ndarray`. If you want to check the dimensionality of an array, you can use its `.ndim` property. So:
* Make `mySSEFunction(predicted, target)` which returns a tuple of two numbers, the SSE between predicted and target, and the mean-squared error (MSE): `(SSE, MSE)`.
* Test it with the provided arrays. The result should be (111, 22.2).
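
Written out, with $\hat{y}_i$ the predicted values, $y_i$ the true values, and $n$ the number of values (symbols chosen here purely for illustration), the two quantities are:

$$\mathrm{SSE} = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{n}$$

For the provided test arrays the differences are $-5, 3, 4, 6, -5$, so $\mathrm{SSE} = 25 + 9 + 16 + 36 + 25 = 111$ and $\mathrm{MSE} = 111 / 5 = 22.2$.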

In [None]:
testPred   = np.array([25, 50, 54, 80, 92])
testTarget = np.array([30, 47, 50, 74, 97])

# Your function here


#uncomment when done
#print(mySSEFunction(testPred, testTarget))

## Broadcasting

Let's look at a nice feature of Numpy: broadcasting. If you have a 3 by 4 matrix and a 3 by 1 column, you might expect that Numpy doesn't know how to multiply these element by element. But it does, because it automatically _broadcasts_: it assumes that what you want is to multiply each of the 4 columns of the 3 by 4 matrix, element-wise, by the same 3 by 1 column. See below and read up on it [here](https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm).
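
Because random numbers make the effect hard to see, here is a small sketch with made-up integer values (the names `mat` and `col` are hypothetical). It also shows one case where broadcasting does not apply: a plain 1D array of length 3 against a 3 by 4 matrix.

```python
import numpy as np

mat = np.arange(12).reshape(3, 4)        # rows: [0..3], [4..7], [8..11]
col = np.array([[1], [10], [100]])       # shape (3, 1)

# The single column is stretched across all 4 columns of mat.
product = mat * col
print(product)

# The same result written out explicitly, by repeating the column 4 times.
explicit = mat * np.tile(col, (1, 4))
print(np.array_equal(product, explicit))  # True

# A plain 1D array of length 3 does NOT broadcast against shape (3, 4).
try:
    mat * np.array([1, 10, 100])
except ValueError as err:
    print("Cannot broadcast:", err)
```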

In [None]:
mat_norm1 = np.random.normal(size = (3,4))
mat_norm2 = np.random.normal(size = (3,1))
print(mat_norm1.shape)
print(mat_norm1)
print(mat_norm2.shape)
print(mat_norm2)

print(mat_norm1 * mat_norm2)

## Find the coordinates of the maximum and minimum value in a matrix

This is a more difficult final task to see how you get on with more minimal instructions. In principle practical exercises during the course should be explained in more detail, so just try this and see how you get on. If you don't manage or need to spend a long time: just skip it! You already have some idea of what Numpy can do and that's fine as a basis. Remember to use search engines to your advantage during the course (:

Write a function `getMaxMinCoords` that returns the coordinates of the maximum and minimum element in a `numpy.ndarray` with 2 or more dimensions, as a tuple of tuples. That is, for a 2D array you will get something like `((coord1_max, coord2_max), (coord1_min, coord2_min))`. Here `coord1_max` would be the row index of the maximum in the 2D array, and `coord2_max` its column index. If there's no unique maximum or minimum element, _the function should fail, return `None`, and tell you so_. Be sure to search through the Numpy documentation and Stack Overflow for this one!
Let's not make it too arduous a search: check out `np.argmax()` and the examples on its documentation page (: Good luck!
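
As a small nudge, the sketch below shows what `np.argmax()` returns by default on a made-up 2D array (the name `demo` is hypothetical): a position in the flattened array, which `np.unravel_index()` can convert back to coordinates. It deliberately leaves out the minimum, the uniqueness check, and the input validation, which remain up to you.

```python
import numpy as np

demo = np.array([[3, 9, 1],
                 [7, 2, 8]])   # hypothetical data

flat_index = np.argmax(demo)                        # index into the flattened array
coords = np.unravel_index(flat_index, demo.shape)   # convert back to (row, column)

print(flat_index)  # 1
print(coords)      # row 0, column 1
```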

In [None]:
test_mat_one = np.random.normal(loc = 50, scale = 5, size = (5,5))
test_mat_two = np.zeros_like(test_mat_one)
test_mat_three = np.random.normal(loc = 50, scale = 5, size = (3, 3, 3)) #this is a cube of values.
test_1D_array = np.array([42, 42, 42])


# Your function here



# Tests (uncomment once your function is done)

# print(test_mat_one)
# print(getMaxMinCoords(test_mat_one))
# print("\n")
# print(test_mat_two)
# print(getMaxMinCoords(test_mat_two))
# print("\n")
# print(test_mat_three)
# print(getMaxMinCoords(test_mat_three))


## Final words

This was a very basic (can you feel those hydroxide ions deep inside?) introduction to the type of practicals you'll be doing during the course: using Numpy and writing functions yourself. Of course, we will be applying that to ML problems instead of random Numpy functionality. To get a bit of an overview of ML before we get started, I highly recommend scanning [this](https://www.nature.com/articles/s41580-021-00407-0). If that doesn't work, try [this](https://rdcu.be/c6jAm). If that doesn't work you have been blessed by whatever deity you'd prefer and can skip this completely. You can skip this completely anyway since it's an optional thing to read before the course starts, but that'll be our secret. 