# Chapter 1 - Writing Python scripts

## Hello World
To begin, we will need a text editor. Any text editor will work for our purposes, but an integrated development environment (IDE) is the most convenient choice. A good recommendation for both Windows PCs and Macs is VS Code, though any other editor you prefer is totally fine. The Hub's JupyterLab is also suitable.
For the time being, we will work within the AIHub Python environment, so we will assume you are working in JupyterLab.
*In case you are not familiar with using the JupyterLab server, or in case you need a refresher, see the mini-tutorial at `extra/jupyterlab.ipynb`*

We can now write and execute a simple introductory script.
1. Open a new file.
2. Write the following line of code into the file: `print('Hello World!')`
3. Save the file as `hello_world.py`.
4. Open a terminal window in your IDE (or a terminal window on your Mac/Linux machine, or a PowerShell window on your Windows PC).
5. Type in `python hello_world.py` and press Enter. The words `Hello World!` should be printed on the following line.

Congratulations! You have just written and run the simplest of Python scripts!

When we run a Python script from the command line, each line of the script is executed in order, from top to bottom.
Let's illustrate this with a slightly more complex example:
1. Re-open the `hello_world.py` script.
2. Replace its contents with the following piece of code:
```
def print_hello():
    print('Hello World!')
```
3. Save the file and run it again.

Nothing happens! Why is this?

The answer is quite simple: we defined a function that prints `Hello World!`, but we never called it! If we add `print_hello()` to the end of our script and re-run it, we will see the same message as before printed on a new line.
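
Putting the pieces together, the complete `hello_world.py` might look like the sketch below. The comments are added here for illustration; the point is that the function is defined first and its body only runs when the function is called on the last line.

```python
# hello_world.py: a minimal sketch of the complete script

def print_hello():
    # Defining the function does not run this line yet
    print('Hello World!')


# The function body is executed only when the function is called
print_hello()
```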

Finally, anything that we can do in a Jupyter notebook we can also do in a script, except that the order of the statements now matters, and the run is no longer interactive (by default).
Let's illustrate this once more:

1. Re-open the `hello_world.py` script.
2. Move the `print_hello()` call to the first line of code.
3. Re-run the script.

You will now get the following error: `NameError: name 'print_hello' is not defined`.
This is because we tried to call a function before it was defined! Moving the `print_hello()` call back to the end of the file will fix this.
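
The failure can also be observed without crashing the script. The sketch below is an illustration only: the `try`/`except` wrapper and the `error_message` variable are assumptions added for this example and are not part of the exercise. It calls `print_hello()` too early, catches the resulting `NameError`, and then calls the function again once its definition has been executed.

```python
# Calling the function before its definition has been executed fails
try:
    print_hello()
    error_message = None
except NameError as err:
    error_message = str(err)
    print('Too early:', error_message)


def print_hello():
    print('Hello World!')


# After the definition has been executed, the call works
print_hello()
```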

As for the interactivity: it is possible to add varying degrees of interactivity to the code. As a rule of thumb (for reasons that we will see later on), interactivity is a bad idea for code that we intend to run on high-performance machines.

## The KNN tutorial
For this part, we will solve the [KNN exercise](https://github.com/ai-hub-weizmann/ex-intro-tutorials/blob/main/ex_home_KNN.ipynb) again,
so make sure you have already solved it in the "normal" way and received your feedback.
Note that even though some parts of the solution to the KNN exercise will be given here, this exercise relies heavily on **your** solution.

Now, let's move on to implement the KNN tutorial in script format.
For this, let's open a new python file and call it `iris_knn.py`.

Generally, best practice is to write all your imports at the beginning of the file.
You usually won't know all your imports when you begin writing your script. That's perfectly OK! Everything can always be edited later.

### Let's begin!
So, let's begin by importing `load_iris` by writing:
`from sklearn.datasets import load_iris`.

Now, rewrite the answer to Question 1 of the KNN exercise:
1.  Write a function `load_iris_data(random_state, test_size)` that loads the data with a random state and test size as input parameters.
2.  Print the shape of the four output matrices (format the printed string to be meaningful)
3.  Print the features and targets in the database
4.  Print a statistical description of the data

**Note:** There is no need to import **anything** else to solve this part!

Now run this code and verify that your results are the same as when you solved the KNN tutorial.

### Default values for arguments
When we define a function, we have to declare the variables that it will accept (the arguments) in the declaration statement. Sometimes an argument has a commonly used value, but we still want the flexibility to change it when needed. In that case we would like to give the argument a default value, so that the user does not have to input it every time they call the function.
Luckily for us, Python has two types of arguments: positional (mandatory) arguments and keyword (optional) arguments. (Actually, this is a lie: there are also arbitrary positional and arbitrary keyword arguments, but this is too much for now.)
The difference is that optional arguments have default values, so we can leave them out when we call the function. When we do pass them, we can refer to them by name, so the exact position where we put them doesn't matter.
For example, if we define `def func(a, b, c=10, d=15)`, the arguments `a` and `b` are mandatory and are determined by their position, while `c` and `d` can be omitted, passed by position, or passed by name.
To illustrate this, the following function calls are equivalent:

`func(2, 4)`, `func(2, b=4, c=10)`, `func(2, 4, d=15, c=10)`

However, the following two function calls will have different results:

`func(2, 4, d=15)`, `func(4, 2, c=10)`

This is because in the first case `a` will be assigned a value of `2` and `b` a value of `4`, while in the second case it will be the opposite.
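
The calls above can be checked with a short sketch. Only the signature of `func` comes from the text; the body, which simply returns its arguments as a tuple, is an assumption made here so that the calls can be compared.

```python
def func(a, b, c=10, d=15):
    # Hypothetical body: return the values so that calls can be compared
    return (a, b, c, d)


# These three calls pass the same values, so the results are equal
print(func(2, 4) == func(2, b=4, c=10) == func(2, 4, d=15, c=10))  # True

# Swapping the positional values swaps `a` and `b`
print(func(2, 4, d=15))  # (2, 4, 10, 15)
print(func(4, 2, c=10))  # (4, 2, 10, 15)
```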

Now let's look at some common errors:
* Calling `func(2)` will give the error `TypeError: func() missing 1 required positional argument: 'b'`, because one mandatory argument is missing.
* Calling `func(c=8, 2, 4)` will give the error `SyntaxError: positional argument follows keyword argument`, because the positional arguments are in the wrong position: they come after a keyword argument.
* Calling `func(a=2, 4, c=9)` will give the error `SyntaxError: positional argument follows keyword argument`, because positional arguments must always come before keyword arguments, and here we treat the argument `a` as a keyword argument.
* Calling `func(4, a=2)` will give the error `TypeError: func() got multiple values for argument 'a'`, because the 1<sup>st</sup> argument is assigned to `a` and then we try to assign `a` again in the 2<sup>nd</sup> argument as well.
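
These errors can be reproduced safely with the sketch below. The body of `func`, the `messages` list, and the use of the built-in `compile()` are choices made for this illustration. A `SyntaxError` is raised before any code runs, so the two invalid calls are kept in strings and compiled on demand instead of being written directly in the script.

```python
def func(a, b, c=10, d=15):
    # Hypothetical body, only the signature comes from the text
    return (a, b, c, d)


messages = []

# TypeErrors are raised at call time, so they can be caught directly
for call in (lambda: func(2), lambda: func(4, a=2)):
    try:
        call()
    except TypeError as err:
        messages.append(str(err))

# SyntaxErrors are raised before the code runs, so the offending calls
# are kept in strings and passed to compile() to trigger them
for source in ('func(c=8, 2, 4)', 'func(a=2, 4, c=9)'):
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError as err:
        messages.append(err.msg)

for message in messages:
    print(message)
```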



### The `if __name__ == '__main__':` statement as a "`main`" function
As you might already know, everything is an object in python.
Objects are better explained in other tutorials, but for our purposes it is enough to know that objects are a special type of variable that can contain both data (called "attributes") and functions (called "methods").
When we say that "everything" in python is an object we mean that ***absolutely everything*** is an object. That means that the scripts that we are writing are also objects, and as such they have methods (for example the functions we write inside the script) and attributes (for example the variables we declare inside the script).
Some special attributes and methods of Python objects start and end with double underscores; "double underscore" is abbreviated to "dunder".

One example of a dunder attribute is `__name__`. Its value depends on how we run our code: if we run it as a script, e.g. by calling `python iris_knn.py`, it gets the value `'__main__'`, but if we use the code as a module by importing it inside another script, e.g. `import iris_knn`, it gets the name of the module (here `'iris_knn'`) as its value.
This allows us to create code that will only be executed if and when we run our code as a script.
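
A quick way to see both cases is sketched below. The standard-library `json` module is used here only as a convenient example of an imported module; any other module would behave the same way.

```python
import json

# An imported module reports its own module name
print(json.__name__)  # json

# The file itself reports '__main__' when it is run directly as a script
print(__name__)
```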

In our example it's not too important where we print the information about our data. However, if you later plan to reuse some of the functions we write here in a different project, you won't want the printing parts to be executed every time you import your code. To avoid this, put all the printing parts inside the `if __name__ == '__main__':` statement; they will then be executed only when the code is run as a script.

Let's try this now:
1. Add `if __name__ == '__main__':` to the end of your code.
2. Move all the variables, function calls, description, and printing statements inside the `if __name__ == '__main__':` statement.
3. Run the code and verify that nothing has changed.
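
As a rough sketch of the resulting layout, consider the small stand-alone file below. The file name `my_module.py` and the `square` function are hypothetical and unrelated to the KNN solution; they only show where reusable code and script-only code go.

```python
# my_module.py: a hypothetical file name used only for this sketch

def square(x):
    # A reusable function that other scripts may want to import
    return x * x


if __name__ == '__main__':
    # Runs for `python my_module.py`, but not for `import my_module`
    print('4 squared is', square(4))
```

With this layout, another script can do `from my_module import square` without triggering the `print` call.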

### Back to the KNN
When we write our code, it is good practice to write short functions that do a single task each.
In this tutorial "short" means about 5 lines, but in your own code a "short" function can also be 100+ lines long.
As long as the logic is divided such that each function has a single task, this is OK.
There are many reasons for this, so let's list a few of them:
* Single-task functions are much easier to read and document.
* You might think now that two tasks will always come together, but later on you may need them separately.
* Whenever there is an error or unexpected behavior of the code, single-task functions are much easier to debug.
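
To make the idea concrete, here is a small sketch that is deliberately unrelated to the KNN solution. The function names `mean` and `format_report` and the sample `scores` are hypothetical. One function only computes a value and the other only formats it, so each can be tested, reused, or replaced on its own.

```python
def mean(values):
    # Single task: compute the average of a sequence of numbers
    return sum(values) / len(values)


def format_report(name, value):
    # Single task: build the text that will be displayed
    return f'{name}: {value:.2f}'


scores = [0.9, 0.8, 1.0]
print(format_report('mean accuracy', mean(scores)))  # mean accuracy: 0.90
```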

So, taking these points into consideration, let's continue with the KNN by defining some more functions:
1. Define the `calc_distance(x0,X)` function as in Question 2
2. Define the `knn(k, x0, X, y)` function as in Question 3
3. Define the `calc_accuracy(k, X_train, X_test, y_train, y_test)` function as in Question 4
4. Define the `calc_acc_vec(k_vec, X_train, X_test, y_train, y_test)` function as in Question 5.
Make sure that this function only calculates the accuracy vector, and does not plot anything!
5. Define a `calc_multiple_acc_vec` function that receives at most the same positional arguments as `calc_acc_vec`, with two added keyword arguments, `n_iterations` and `max_k`, that allow the user to choose how many iterations to run and the maximal k to try, with default values of 20 and 25 respectively.
This function should be similar to the "main" function of Question 6, so it should load the dataset without a predefined random state.
Feel free to add any necessary parameters.
6. Finally, let's try making a new function for plotting purposes.
Define a single function `plot_acc(k_vec, accuracy)`. This function should plot a line graph of the accuracy vs. k if a single run of `calc_accuracy` was performed, and an errorbar plot if multiple runs of `calc_accuracy` were performed.
You are free to choose the formatting of the graphs, but **do not** add any input arguments.
Make sure your graphs are nicely formatted.

Excellent!
Now just don't forget to test all use cases of your functions under the `if __name__ == '__main__':` statement, save the file, and verify it with your tutor before continuing.