# `DSML_WS_01` - Setup & Tools

In this tutorial we will introduce the main tools we will be working with throughout the rest of the course. Everything shown here will become the basis on top of which we will build more sophisticated (and hopefully fun) data science and machine learning tasks.

We will go through the following:

- `Git` and `GitHub` as central course repository
- `Anaconda` installation
- `Jupyter Notebook` introduction
- `Python` introduction

---

## `Git` and `GitHub`

[Git](https://git-scm.com) is an open-source distributed version-control system that has gained increasing popularity in recent years. It allows large teams of software engineers or data scientists to collaboratively work on projects, while keeping track of changes.

In this course we will use a git repository hosted by [GitHub](https://github.com/about) to **share code**, **small data snippets**, links to **lecture material** and **course information** with you.

For the purpose of this course you will not require a deep understanding of Git. All you need is access to our Chair's public GitHub repository from where you can download code and data. This repo can be accessed here:
- Navigate to [`https://github.com/IS3UniCologne/DSML_2023`](https://github.com/IS3UniCologne/DSML_2023)

You should see a screen similar to this one (shown here in dark mode):

![GitHub home](DSML_WS_01_GitHub_Screen.png)

You can view and follow this course via this `GitHub` repository. You can also download all material onto your local machines for when you need to work with code.

**Note**: If you have some experience with using `git` you may also wish to clone the repository on your local machine for ease of use. **We will give you a quick introduction to git in a future session once you have familiarized yourself a little with the tools presented today!**

### <font color='green'>If not already done, please download the course folder from `GitHub` now</font>

---

## `Anaconda`

We highly recommend that you install the [Anaconda Python Distribution](http://docs.continuum.io/anaconda/install/). It will make your life much easier. You can download and install Anaconda on Windows, OSX and Linux.

There are two options:

- **Option 1** (probably the easiest): Install the full **Anaconda distribution**, which comes with a lot of pre-installed packages (incl. Jupyter Notebook) and a package manager graphical user interface (GUI). It is available for download [here](https://www.anaconda.com/distribution/).
- **Option 2** (for slightly more advanced users and if you are short on disk space): Install the much lighter **Miniconda distribution** which essentially is just the Conda package manager and Python. From there you can simply install Python packages as required via the Terminal (Mac, Linux) or the Anaconda Command Prompt (Windows). It is available for download [here](https://docs.conda.io/en/latest/miniconda.html).

### <font color='green'>If not already done, please download and install `Anaconda` or `Miniconda` now and follow the below steps </font>

After installing, open up a terminal:

* If you are on a **Windows** computer, use the "Anaconda Prompt" from the Start menu. 
* On a **Mac**, start up the "Terminal". 
* In **Linux**, use any of the terminals available.

If you have installed the full Anaconda Distribution, to ensure all your packages are up to date, run the following two commands consecutively:

```
conda update -n base conda
conda update jupyter numpy scipy matplotlib pandas

```

If you have installed the Miniconda Distribution, which comes without any pre-installed packages, run the following commands consecutively:

```
conda update -n base conda
conda install jupyter
conda install jupyterlab
conda install numpy scipy matplotlib pandas

```

You can easily check which packages and libraries are installed by executing the following command in your respective terminal:

```
conda list

```

### A note on environments in Python

A virtual environment is a tool you can use to isolate projects and manage dependencies. Virtual environments allow you to install and use packages only for a particular project. You can easily share virtual environments, ensuring that all members in a project team are working with the same tools and versions.

#### Activating and deactivating an environment

Your default ("global") environment is called `base`. Once you open a terminal, you should see `(base)` at the very beginning of the first line, indicating that you are currently using the default environment. You can deactivate the current environment by typing the following command:
```
conda deactivate
```
You should see that the `(base)` at the beginning disappears. Similarly, you can activate the base environment again using:
```
conda activate base
```
Remember, you can only use the packages you installed when the respective environment is activated.

#### Creating an environment

By completing the steps above, you installed the packages (jupyter, numpy, scipy etc.) in your base environment. However, you might want to create a designated environment for DSML. Let's create an environment called `DSML_env` by typing the following command in your terminal:
```
conda create --name DSML_env
```
You might be asked to confirm the location where the new environment will be stored (`y/n?`). Type `y` and confirm with enter.

We can activate our new environment as learned before:
```
conda activate DSML_env
```
You can check whether you are using the new environment by confirming that `(DSML_env)` is shown at the beginning of the line in your Terminal.

After activating the environment, you can install packages in it using the commands as seen above.

You can get a list of all environments that you have available by typing:
```
conda env list
```
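Once an environment is active, you can also confirm from inside Python which interpreter is actually running. The following is a minimal sketch using only the standard library. It assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable; if it has not, a placeholder text (chosen here purely for illustration) is printed instead.

```python
import os
import sys

# Path of the Python interpreter that is executing this code.
# Inside an activated conda environment this typically points into that environment's folder.
interpreter_path = sys.executable
print("Interpreter:", interpreter_path)

# Version of the running interpreter as a (major, minor, micro) tuple
python_version = tuple(sys.version_info[:3])
print("Python version:", python_version)

# Name of the active conda environment (assumption: conda sets CONDA_DEFAULT_ENV on activation)
env_name = os.environ.get("CONDA_DEFAULT_ENV", "<no conda environment detected>")
print("Conda environment:", env_name)
```

If the printed environment is not the one you expected, deactivate and activate again as described above before starting Jupyter.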


---

## `Jupyter Notebook`

The main computational tool we will be using during this course is the [Jupyter notebook](http://jupyter.org/). Notebooks are a convenient way to thread text, code and the output it produces in a simple file that you can then share, edit and modify. You can think of notebooks as the __Word document of Data Scientists__.

Jupyter is a browser-based coding environment, used extensively for prototyping and interactive development in data science applications. Jupyter Notebook is an evolution of an older project called the IPython Notebook (this is the origin of the notebook file extension ".ipynb").

The central units within a Jupyter Notebook are "cells". These cells can either contain code or Markdown (a simple formatting language, which can also include things like LaTeX equations). The dropdown menu at the top of the screen indicates the type of the current cell.

In the following we explain the basic concepts of how to use Jupyter.

### <font color='green'> Follow the below instructions to start up the notebook we are currently working in </font>

### Start a notebook

In order to begin a notebook session, you need to do it from what is called the *command line*, a terminal window that allows you to interact with your computer through written commands. This is how you can start up a terminal:

* If you are on a **Windows** computer, you can start the "Anaconda Prompt" from the Start menu. 
* On a **Mac**, fire up the Terminal utility. 
* In **Linux**, use any of the terminals available.

Then activate the environment you want to work with, for example `DSML_env`:

```
> conda activate DSML_env
```

**NOTE**: you might have to omit `conda` if you are on Windows and simply type `activate DSML_env`.

Then launch `Jupyter` by typing on the same terminal:

```
> jupyter notebook
```

This should bring up a browser window with a homepage that looks more or less like this (although probably with a different list of files):

![Jupyter home](DSML_WS_01_JupyterNotebook_Start_Screen.png)

Alternatively you can also use `jupyter lab`, a more advanced and (I think) better interface for working with jupyter notebooks. To do so, type the following command in the Terminal/Anaconda Prompt:

```
> jupyter lab
```

(you may have to install by running `conda install jupyterlab`)


Navigate to the folder where you have placed the `DSML_WS_01_Setup&Tools.ipynb` file for this tutorial and click on it. This will open the notebook on a different tab. You are now on the interactive version of the notebook!

![Jupyter home](DSML_WS_01_JupyterNotebook_Target_Folder.png)

When you are finished with the session, you can save the notebook with `File -> Save and Checkpoint` or by clicking the save button. Everything you do on the notebook (text, code and output) is saved into an `.ipynb` file that you can open later, share, commit to a remote git repo, etc.

### Cells

The main building blocks of notebooks are cells. These are chunks of content which can be cut, pasted, and moved around in a notebook.

Code cells can be executed by pressing the <i class="fa fa-step-forward"></i> button at the top of the notebook, or 
by simultaneously pressing `shift + enter` (execute and move to the next cell) or `ctrl + enter` (execute and stay on that cell).  All Python code is executed in a single running Python environment, called the "Kernel" in Jupyter Notebook.  Variables are shared across all cells, and the code is executed in the order in which the cells are run (not necessarily sequential in the notebook), which can get your notebook into rather confusing states if you don't always execute cells in order.
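To see why execution order matters, consider the short sketch below. It mimics three cells that share one kernel by running their code one after another; the variable name `counter` is purely illustrative.

```python
# "Cell 1": define a variable
counter = 0

# "Cell 2": modify the variable; running this cell twice changes the state twice
counter = counter + 1
counter = counter + 1  # second execution of the same cell

# "Cell 3": the result depends on how often "Cell 2" was run, not on where it sits in the notebook
print(counter)
```

If a notebook ends up in a confusing state like this, restarting the kernel and running all cells from top to bottom usually restores a clean, reproducible state.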

Cells can be of two types:

* **Text**, (or markdown) like the one where this is written.
* **Code**, like the following one below:

**Tip**: you can easily switch between text (markdown) and code via shortcuts. If you press `<escape>` and then `y` you switch to **Code**. If you press `<escape>` and then `m` you switch to **Text** (i.e. markdown).

In [1]:
# This is a code cell. If we want to add a comment in a code cell, we can do this after a preceding "#"
# Let's type some code by assigning a variable and printing it

var = 5+7
print(var)

12


__This is a text cell__

Assigning a variable and printing it does not work here:

var = 5+7\
print(var)

It's just text here.

### Shortcuts

You can create a new cell by clicking `Insert` -> `Cell Above`/`Below` in the top menu. By default, this will be a code cell, but you can change that on the `Cell` -> `Cell Type` menu. Choose `Markdown` for a text cell. Once a new cell is created, you can edit it by clicking on it, which will create the cursor bar inside for you to start typing.

**Tip**: Alternatively, cells can also be created with shortcuts. If you press `<escape>` and then `b` (`a`), a new cell will be created below (above). 

As alluded to above, shortcuts can make your life a lot easier. There is a whole bunch of shortcuts you can explore by pressing `<escape>` and `h` (press `<escape>` again to leave the help).

### Code and its output

A particularly useful feature of notebooks is that you can save, in the same place, the code you use to generate any output (tables, figures, etc.). As an example, the cell below contains a snippet of Python that returns a printed statement. This statement is then printed below and recorded in the notebook as output:

In [2]:
print("Hello, world!!!")

Hello, world!!!


Note also how the notebook has automatic syntax highlighting support for Python. This makes the code much more readable and understandable. More on Python as a coding language below.

### Markdown

Text cells in a notebook use the [GitHub Flavored Markdown](https://help.github.com/articles/github-flavored-markdown/) markup language. This means you can write plain text with some rules and the notebook renders a more visually appealing version of it. Let's see some examples:

* __BOLD__:

`This is **bold**.`

Is rendered:

This is **bold**.

* **ITALIC**:

`This is *italic*.`

Is rendered:

This is *italic*.

* **LISTS**:

You can create unnumbered lists:

```
* Item 1
* Item 2
* ...
```

Which will produce:

* Item 1
* Item 2
* ...

Or you can create numbered lists:

```
1. First element
1. Second element
1. ...
```

And get:

1. First element
1. Second element
1. ...

Note that you don't have to write the actual number of the element, just using `1.` always produces a numbered list.

You can also nest lists:

```
* First unnumbered element, which can be split into:
    1. One numbered element
    2. Another numbered element
* Second element.
* ...
```

* First unnumbered element, which can be split into:
    1. One numbered element
    2. Another numbered element
* Second element.
* ...

This creates many opportunities to combine things nicely.

* **LINKS**

`You can easily create hyperlinks, for example to [WikiPedia](https://www.wikipedia.org/).`

You can easily create hyperlinks, for example to [WikiPedia](https://www.wikipedia.org/).

* **HEADINGS**: including `#` before a line causes it to render a heading.

`# This is Header 1`

Turns into:

# This is Header 1

`## This is Header 2`

Turns into:

## This is Header 2

`### This is Header 3`

Turns into:

### This is Header 3

And so on...

You can find a more detailed introduction in the following links:

>* https://help.github.com/articles/markdown-basics/

>* https://help.github.com/articles/github-flavored-markdown/

If you use headings consistently, the Table of Contents feature provides a convenient way to navigate your notebook.

### Rich content in a notebook

Notebooks can also include rich content from the web. For that, we need to import the `IPython.display` module. This module allows you to embed, for example:
- Youtube videos
- HTML code
- interactive maps
- sound content
- etc.

We will not cover these aspects in great detail but leave this up to you to explore in your own time, if you are interested. A thorough exploration of these features is available at the following [link](https://notebook.community/CestDiego/emacs-ipython-notebook/tests/notebook/nbformat3/Display%20System).


---

## `Python`

The main bulk of the course relies on the [Python](https://www.python.org/) programming language. Python is a [high-level](https://en.wikipedia.org/wiki/High-level_programming_language) programming language widely used today. To give a couple of examples of its relevance, it underlies [most of the Dropbox](https://www.quora.com/How-does-dropbox-use-python-What-features-are-implemented-in-it-any-tangentially-related-material?share=1) systems, and is also heavily [used](https://www.python.org/about/success/usa/) to control satellites at NASA.

This course uses Python because it has emerged as one of the main and most solid options for Data Science, together with other free alternatives such as R. Python is widely used for data processing and analysis both in academia and in industry. There is a vibrant and growing scientific community, working at both universities and companies, that supports and enhances its capabilities for data analysis by providing new and refining existing extensions (a.k.a. libraries, see below). All of this means that, whether you are thinking of continuing in Higher Education or trying to find a job in industry, Python will be an important asset that employers will significantly value.

Python is also an interpreted language, which means that the code is run on-the-fly without a separate compilation step. This is in contrast to compiled programming languages (such as C), whose source code first needs to be converted into machine code (i.e. compiled) before it can be run. With Python, one does not need to worry about compilation and can just write code, evaluate it, fix it, re-evaluate it, etc. in a quick cycle, making it a very productive tool. The rest of this tutorial covers some of the basic elements of the language, from conventions like how to comment your code, to the basic data structures available.

We recommend the following resources for further introductory reading:
- [Standard Python Tutorial](https://docs.python.org/3/tutorial/) 
- [Python Data Science Handbook (free ebook)](https://github.com/jakevdp/PythonDataScienceHandbook)

The most common "built-in" data types you will interact with when doing data science work are lists and dictionaries (there are of course additional types like Numpy Arrays, Pandas Dataframes, and others, but these are provided by external libraries). It's good to have a brief understanding of how to use these data structures effectively.

### Python data structures

The standard Python you can access without importing any additional libraries contains a few core data structures that are very handy to know. Most data analysis is done on top of other structures specifically designed for the purpose (numpy arrays and pandas dataframes, mostly; see the following sessions for more details), but some understanding of these core Python structures is very useful. In this context, we will look at three: `values` (integers, floats and strings), `lists`, and `dictionaries`.

An abundance of other data structures, which can be specific to external libraries are available (such as Numpy Arrays, for example). We will cover some of these in later tutorials.

#### Values 

These are the most basic elements to organize data and information in Python. You can think of them as numbers (integers or floats) or words (strings). Typically, these are the elements that will be stored in lists and dictionaries.

An `integer` is a whole number:

In [3]:
# assign a value to a variable i and return type of i
i = 7
type(i)

int

In [4]:
# print i
print(i)

7


A `float` is a number that allows for decimals:

In [5]:
# assign a value to a variable f and return type of f
f = 5.5
type(f)

float

In [7]:
# print f
print(f)

5.5


Note that a float does not need a fractional part. A whole number written with a decimal point is still stored as a float:

In [8]:
# assign the value 5. to a variable fw and return type of fw
fw = 5.
type(fw)

float

In [9]:
# print fw
print(fw)

5.0
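Integers and floats can be converted into each other and mixed in calculations. The sketch below reuses the values of `i` and `f` from above; the other variable names are illustrative.

```python
i = 7
f = 5.5

# Convert an integer to a float
as_float = float(i)

# Convert a float to an integer: the fractional part is cut off (truncated), not rounded
as_int = int(f)

# Mixing an integer and a float in a calculation produces a float
mixed = i + f

print(as_float, as_int, mixed, type(mixed))
```

Because `int()` truncates, use the built-in `round()` function instead if you want the nearest whole number.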


The standard Python language includes some data types (e.g. lists, tuples, dictionaries, etc.) and allows many basic operations (e.g. sum, product, etc.). For example, right out of the box, and without any further action needed, you can use Python as a calculator:

In [10]:
# calculate 5 + 5
5+5

10

In [14]:
# calculate 2.0 / 3
2. / 3

0.6666666666666666

In [16]:
# perform a more complex calculation
(3 + 5) * 2. / 3

5.333333333333333

The modulo operator (%) returns the remainder of a division.

In [18]:
# calculate the remainder of 4 divided by 2
4%2 

0
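Besides `+`, `*`, `/` and `%`, two further built-in operators are often useful: floor division (`//`) and exponentiation (`**`). A small sketch with illustrative values:

```python
a = 7
b = 2

true_division = a / b     # regular division always returns a float
floor_division = a // b   # floor division drops the fractional part
remainder = a % b         # modulo returns what is left over
power = a ** b            # a raised to the power of b

print(true_division, floor_division, remainder, power)

# Floor division and modulo fit together: quotient * divisor + remainder gives back a
print(floor_division * b + remainder == a)
```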

A `string` is a sequence of characters (e.g. a word or a sentence), which can be delimited by single or double quotation marks (quotes have to match):

In [19]:
# assign a string to variable A and another string to variable B
A = "data science"
B = 'and machine learning'

In [21]:
# print A
print(A)

data science


In [20]:
# print A,B
print(A,B)

data science and machine learning


In [22]:
# print the type of A
type(A)

str

Some mathematical operators can be applied to strings, but they behave differently than for numbers. Only addition `+` and multiplication `*` can be used for strings. Note that the `+` operator concatenates two strings without a space in between, whereas using the print function and a comma in between several strings does add a space (see above).

In [23]:
# print A,B
print(A,B)

data science and machine learning


In [24]:
# print A+B
print(A+B)

data scienceand machine learning


In [27]:
# print A * 3
print(A*3)

data sciencedata sciencedata science


In [28]:
# add a space in the operation above
print((A + " ")*3)

data science data science data science 
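When you combine strings with numbers, or want explicit control over separators, a few more tools come in handy. The following sketch reuses `A` and `B` from above; the variable `year` and its value are only an illustration.

```python
A = "data science"
B = "and machine learning"

# join() glues a list of strings together with the chosen separator
joined = " ".join([A, B])

# A number must be converted with str() before it can be concatenated;
# writing A + year directly would raise a TypeError
year = 2023
labelled = A + " " + str(year)

# f-strings insert values directly into a string
formatted = f"{A} {B} ({year})"

print(joined)
print(labelled)
print(formatted)
```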


* **Lists**: a list is an ordered sequence of values that can be of mixed types. They are represented between square brackets (`[]`) and, although not very efficient in memory terms, are very flexible and useful to "put things together".

For example, the following list of integers:

In [29]:
l = [1, 2, 3, 4, 5, 6, 7]
print(l)

[1, 2, 3, 4, 5, 6, 7]


In [30]:
type(l)

list

Or the following mixed one:

In [31]:
m = ['a', 'b', 5, 'c', 6, 7.6]
m

['a', 'b', 5, 'c', 6, 7.6]

Lists can be queried and sliced. For example, you can check the length of a list and retrieve single elements or slices by position (counting starts at 0):

In [32]:
# print the length of list m using len(m)
len(m)

6

In [33]:
# retrieve the first element of m using m[0]
m[0]

'a'

In [35]:
# retrieve the last element of m using m[-1]
m[-1]

7.6

In [36]:
# retrieve the first 2 elements of m using m[0:2]
m[0:2]

['a', 'b']
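Slices follow the pattern `start:stop:step`, where the element at `stop` is not included and any of the three parts can be left out. A few more variations on the list `m`, as a quick sketch (the variable names are illustrative):

```python
m = ['a', 'b', 5, 'c', 6, 7.6]

from_third = m[2:]      # everything from position 2 to the end
last_two = m[-2:]       # the last two elements
middle = m[1:4]         # positions 1, 2 and 3 (position 4 is excluded)
every_second = m[::2]   # every second element, starting at position 0

print(from_third)
print(last_two)
print(middle)
print(every_second)
```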

Lists can be added:

In [37]:
# add lists l and m
l + m

[1, 2, 3, 4, 5, 6, 7, 'a', 'b', 5, 'c', 6, 7.6]

Note that this does not change the original lists:

In [39]:
# return l
l

[1, 2, 3, 4, 5, 6, 7]

Use the `extend()` method to change `l` in place so that it includes the elements of `m`:

In [40]:
# use l.extend(m)
l.extend(m)

In [41]:
# return l
l

[1, 2, 3, 4, 5, 6, 7, 'a', 'b', 5, 'c', 6, 7.6]
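A common source of confusion is the difference between `+`, `extend()` and `append()`. The sketch below contrasts the three on small illustrative lists (the names are not part of the tutorial's own variables):

```python
extra = ['a', 'b']

# "+" builds a new list and leaves the original untouched
nums = [1, 2, 3]
combined = nums + extra

# extend() adds each element of the other list to the existing list
nums_extended = [1, 2, 3]
nums_extended.extend(extra)

# append() adds its argument as ONE single element, here a nested list
nums_appended = [1, 2, 3]
nums_appended.append(extra)

print(nums)
print(combined)
print(nums_extended)
print(nums_appended)
```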

You can also change a specific element of a list:

In [42]:
# remember our list m
m

['a', 'b', 5, 'c', 6, 7.6]

In [43]:
# print the second value of m
m[1]

'b'

In [44]:
# assign a new value to m[1]
m[1] = 4

In [45]:
# return m
m

['a', 4, 5, 'c', 6, 7.6]

* **Dictionaries**: dictionaries are collections of "keys" and "values". A key, which can be of any immutable (hashable) type such as a string or a number, is the element associated with a "value", which can be of any kind. Dictionaries are used when you need fast and easy lookup by key rather than by position (since Python 3.7 they also preserve the order in which keys were inserted). They are expressed in curly brackets, with keys and values being linked through colons.

For example, we can think of a dictionary to store a series of names and the ages of the people they represent:

In [46]:
ages = {'Ana':24, 'John': 20, 'Li': 27, 'Ivan': 40, 'Tali':33}
ages

{'Ana': 24, 'John': 20, 'Li': 27, 'Ivan': 40, 'Tali': 33}

In [47]:
# return the type of ages
type(ages)

dict

In [48]:
# return the keys of ages by using the .keys() method
ages.keys()

dict_keys(['Ana', 'John', 'Li', 'Ivan', 'Tali'])

Dictionaries can then be queried and values retrieved easily by using their keys. For example, if we quickly want to know John's age:

In [50]:
# return John's age by using ages['John']
ages['John']

20

Similarly to lists, you can modify and assign new values:

In [51]:
# let's add Karsten to our dictionary, who is 29 years old
ages['Karsten'] = 29
ages

{'Ana': 24, 'John': 20, 'Li': 27, 'Ivan': 40, 'Tali': 33, 'Karsten': 29}

In [54]:
# Li turned 28 so let's change her age
ages['Li'] = 28
ages

{'Ana': 24, 'John': 20, 'Li': 28, 'Ivan': 40, 'Tali': 33, 'Karsten': 29}
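Looking up a key that does not exist with square brackets raises a `KeyError`. The sketch below shows a few safer and commonly used patterns on a shortened version of the `ages` dictionary; the name `'Maria'` is a made-up example of a missing key.

```python
ages = {'Ana': 24, 'John': 20, 'Li': 28}

# Check whether a key exists
has_john = 'John' in ages

# get() returns None (or a default you provide) instead of raising an error
maria_age = ages.get('Maria')
maria_default = ages.get('Maria', 'unknown')

# Loop over key-value pairs
for name, age in ages.items():
    print(name, 'is', age, 'years old')

# pop() removes a key and returns its value
removed = ages.pop('Ana')
print(removed, ages)
```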

You can create entirely empty dictionaries using curly brackets and populate them later on:

In [55]:
# create an empty dictionary called my_dict (avoid the name dict, which would shadow the built-in type)
my_dict = {}

In [56]:
# add a key, value pair to my_dict
my_dict["key1"] = 99

In [57]:
# return my_dict
my_dict

{'key1': 99}

### Help

A very handy feature of Python is the ability to access on-the-spot help for its different functions. This means that you can check what a function is supposed to do, or how to access it, right inside your Python session. Of course, this also works handsomely inside a notebook. There are a couple of ways to access the help. 



In [67]:
help(type)

Help on class type in module builtins:

class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict, **kwds) -> a new type
 |  
 |  Methods defined here:
 |  
 |  __call__(self, /, *args, **kwargs)
 |      Call self as a function.
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __dir__(self, /)
 |      Specialized __dir__ implementation for types.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __instancecheck__(self, instance, /)
 |      Check if an object is an instance.
 |  
 |  __or__(self, value, /)
 |      Return self|value.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __ror__(self, value, /)
 |      Return value|self.
 |  
 |  __setattr__(self, name, value, /)
 |      Implement setattr(self, name, value).
 |  
 |  __sizeof__(self, /)
 |      Return mem

Another extremely useful way to call up the help is the key combination `shift+tab`. When pressed while your cursor is inside a function or method call, it produces a pop-up help menu.
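The text that `help()` displays comes from the documentation string stored with each object, and you can also inspect it programmatically. A minimal sketch using only built-ins (the variable names are illustrative):

```python
# The documentation of a function is stored in its __doc__ attribute
doc_text = len.__doc__
print(doc_text)

# dir() lists the attributes and methods available on an object;
# here we keep only the "public" list methods (those not starting with an underscore)
list_methods = [name for name in dir(list) if not name.startswith('_')]
print(list_methods)
```

Inside a notebook you can additionally type a name followed by a question mark (for example `len?`) to display the same help; this is a feature of IPython rather than of plain Python.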

### Control flow (a.k.a. `for` loops and `if` statements)

Although this is not a comprehensive introduction to computer programming or general purpose Python (check the references for that, in particular Allen Downey's [book](http://www.greenteapress.com/thinkpython/thinkpython.html)), it is important to be aware of two building blocks of almost any computer program: `for` loops and `if` statements. 


They can also come in very handy in cases where you need some extra functionality out of standard methods. Without further ado, let us have a look at the two single most relevant tools of computer programming.

* `for` loops

The general structure of a `for` loop is:

```
for element in sequence:
    do something with element
```

This allows you to repeat a particular action or task over a sequence. As an example, you can print your name ten times without having to type it yourself every single time:

In [68]:
# let's define a list of values called lst from 0 to 9

lst = [0,1,2,3,4,5,6,7,8,9]

In [69]:
# use lst to print your name ten times
for i in lst:
    print(i,'yourname')

0 yourname
1 yourname
2 yourname
3 yourname
4 yourname
5 yourname
6 yourname
7 yourname
8 yourname
9 yourname


Note that you do not have to create a list for the sequence you want to loop over. Alternatively, you can use the built-in `range()` function.

In [70]:
for i in range(10):
    print(i,'myname')

0 myname
1 myname
2 myname
3 myname
4 myname
5 myname
6 myname
7 myname
8 myname
9 myname


Note a couple of features in the loop:

1. You loop *over* a sequence, in this particular case the sequence of ten numbers defined in `lst` or created by `range(10)`.
1. In every step, that is, for every element of the sequence, you repeat an action. Here we are printing the element followed by the same text, `myname`.
1. Each of the elements you loop over can be accessed inside the loop, as we did above by printing `i`. This can be irrelevant or extremely useful, depending on the context. For example, see a case where you use the value of the sequence in each step:

In [71]:
for i in range(12):
    print("I am at step ", i)

I am at step  0
I am at step  1
I am at step  2
I am at step  3
I am at step  4
I am at step  5
I am at step  6
I am at step  7
I am at step  8
I am at step  9
I am at step  10
I am at step  11


One more note: by convention, we are calling the element of the sequence `i` (for iterator), but this could be named anything. In fact, in many cases, more meaningful names make code much more readable. For example, you could write a loop like the one above as:

In [72]:
for step in range(10):
    print("I am at step ", step)

I am at step  0
I am at step  1
I am at step  2
I am at step  3
I am at step  4
I am at step  5
I am at step  6
I am at step  7
I am at step  8
I am at step  9
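Loops are not limited to numbers: you can loop over any list, and you can carry a result from one step to the next. The following sketch uses the built-in `enumerate()` function and a running total; the list `names` and the variable `total` are illustrative.

```python
names = ['Ana', 'John', 'Li']

# enumerate() yields (position, element) pairs, so no separate counter is needed
for position, name in enumerate(names):
    print(position, name)

# Accumulate a result across iterations: add up the numbers 0 to 9
total = 0
for number in range(10):
    total = total + number

print(total)
```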


* `if` statements

We have just seen how `for` loops allow you to repeat an action over a sequence. `if` statements, in turn, allow you to restrict such actions to only those cases that meet one or more conditions you specify in the statement.

The general structure of `if` statements is:

```
if conditional 1:
    statement 1
    
elif conditional 2:
    statement 2 
    
else : 
    statement 3
```


For example, if you think of the loops written above, you might want to only print those numbers that are even, skipping those that are odd:

In [74]:
for i in range(10):
    if i%2 == 0:        # remember: a number is even if it is divisible by 2
        print(i)

0
2
4
6
8


A full `if` statement also allows for an action to be taken if the original condition is not satisfied. This is called an `if`-`else` statement. For example, you can think of a loop that prints whether each number in a sequence is odd or even:

In [75]:
lst

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [76]:
for i in lst:
    # Check if i is odd
    if i%2 != 0:
        print(i, ' is odd')
    # If not odd, then do the following
    else:
        print(i, ' is even')

0  is even
1  is odd
2  is even
3  is odd
4  is even
5  is odd
6  is even
7  is odd
8  is even
9  is odd
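The general structure shown earlier also contains an `elif` branch, which lets you test a second condition only if the first one was not met. A small sketch with three possible outcomes (the list `labels` is illustrative and only collects the results):

```python
labels = []

for i in range(-1, 2):          # loops over -1, 0 and 1
    if i < 0:
        label = 'negative'
    elif i == 0:                # only checked if the first condition is False
        label = 'zero'
    else:                       # reached if none of the conditions above is True
        label = 'positive'
    labels.append(label)
    print(i, 'is', label)
```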


### Functions

The last part of this quick overview relates to functions. (Strictly speaking, a "method" is a function that belongs to an object; in this tutorial we use both terms interchangeably.) The motivation is that, so far, the Python code we have written has to be copied and pasted entirely if you want to run it again somewhere else. However, as we will see in more detail later in the course, one of the main reasons why you want to use Python for data analysis, instead of a point-and-click graphical interface like SPSS, for instance, is that you can easily reuse code and re-run analyses. Methods help us accomplish this by encapsulating snippets of code that perform a particular task and making them available to be called.

We have already *used* methods here. When we call `range()`, we are using one of them. Now, we will see how to *create* a method of our own that performs the specific task we want it to do. For example, let us create a very simple method to reproduce the first loop we created above:

In [79]:
# define the method
def run_simple_loop():
    
    for i in range(10):
        print(i)
    
    return None

Already with this simple method, there is a bunch of interesting things going on:

* First, note how we define a bit of code as a method, as opposed to plain Python: we use `def` followed by the name of our function (we have chosen `run_simple_loop`, but anything is possible).
* Second, we append `()` after the name, and finish the line with a colon (`:`). This is necessary and will allow us to specify requirements for the function (see below).
* Third, realize that everything inside a function needs to be indented. This is a core property of Python and, although some people find it odd, it enhances readability greatly.
* Fourth, the piece of code to do the task we want, printing the sequence of numbers, is inside the function in the same way it was outside, only properly indented.
* Fifth, we finish the method with a line starting with `return`. In this case, we follow it with `None`, but this will change as methods become more sophisticated. Essentially, this is the part of the method where you specify which elements you want it to return and save for later use.

Once we have paid attention to these elements, we can see how the method can be *called* and hence the code inside it executed:

In [80]:
# call our defined method
run_simple_loop()

0
1
2
3
4
5
6
7
8
9


This is the simplest possible method you can write: it does not require any inputs, and executing it simply produces an output (the printout), which is not saved anywhere. The rest of this section relaxes these two aspects to allow us to build more complex, but also more useful, methods.

First, you can specify "arguments" to be passed that modify the behaviour of the method. The main aspect to pay attention to in this context is that the arguments need to be variables, not particular values. Let us see a modified example of our method:

In [82]:
# define method with argument
def run_simple_loopX(x):
    for i in range(x):
        print(i)
    return None

We have replaced the fixed length of the sequence (10) by a variable named `x` that allows us to specify *any value we want* when we call the method:

In [85]:
# call method with argument
run_simple_loopX(3)

0
1
2


Our function does not save (or more accurately return) anything:

In [89]:
# let's try to save what our function outputs to a variable called b
b = run_simple_loopX(3)

0
1
2


In [90]:
# let's print b
print(b)

None


We can modify this using the last line of a method. For example, let us assume we want to return a sequence as long as the series of numbers we print on the screen. The method should be:

In [91]:
# define method
def run_simple_loopX(x):
    for i in range(x):
        print(i)
    return range(x)

Note the main difference: instead of returning `None`, we are telling Python to return a sequence, which has the same length as the one used to specify the loop. There is an alternative, more efficient way of writing this method: assigning the sequence to a new object and using it when necessary later on. The results are exactly the same, but fewer computations are performed and, more critically, we minimize the chances of making mistakes.

In [98]:
# define method
def run_simple_loopX(x):
    seq = range(x)
    for i in seq:
        print(i)
    return seq

Either of these two new versions of the method returns an output:

In [99]:
a = run_simple_loopX(3)

0
1
2


In [100]:
# return a
a

range(0, 3)
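The displayed value `range(0, 3)` is a sequence object rather than a list of numbers. As a minimal sketch of how such a returned value could be used afterwards (the variable name `seq` is only illustrative):

```python
# a range object like the one returned above
seq = range(3)

print(list(seq))   # convert to a list to see the values
print(len(seq))    # number of elements in the sequence
print(seq[1])      # individual elements can be accessed by position
```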

The advantage of methods, as opposed to straight code, is that they force us to think in a modular way, helping us identify exactly what needs to be done, in what order, and what is required. Encapsulating these atomic bits of functionality in methods allows us to write things once and flexibly use them everywhere, saving us time (and headaches) in the long run.
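As a small sketch of this "write once, use everywhere" idea, the two hypothetical functions below (not part of the course material) show one small function being reused inside another:

```python
# hypothetical building block: does one small thing
def square(x):
    return x * x

# hypothetical second function that reuses the building block
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total = total + square(i)
    return total

print(sum_of_squares(4))   # 0 + 1 + 4 + 9 = 14
```

If the definition of `square` ever needs to change, it only has to be changed in one place.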

A final note on functions. It is important that, whenever you create a function, you include some documentation about what it expects, what it does, and what it returns. Although there are many ways of doing this, the typical convention for these so-called __docstrings__ (i.e., documentation strings) is as follows:

In [101]:
import numpy as np  # provides np.arange used below

def run_simple_loopXout(x):
    
    """
    Print out the values of a sequence of certain length
    ...
    
    Arguments
    ---------
    x     : int
            Length of the sequence to be printed out
    
    Returns
    -------
    seq   : np.array
            Sequence of values printed out
    """
    
    seq = np.arange(x)
    for i in seq:
        print(i)

    
    return seq

Documentation, like any string, is highlighted in red in a notebook. Let us have a look at the structure and components of a well-made docstring:

* It is enclosed in triple quotes (`"""`).
* It begins with a short description of what the method does. The shorter and more concise, the better.
* There is a section called "Arguments" that lists the elements that the function expects.
* Each argument is then listed, followed by its type. In this case it is an object `x` that, as we are told, needs to be an integer.
* The arguments are followed by another section that specifies what the function returns, and of what type the output is.

Documenting functions in this way helps you remember what a function does and also forces you to write clearer code. A bonus is that, if you include documentation in this way, it can be checked with the standard `help` or `?` systems reviewed above:

In [103]:
help(run_simple_loopXout)

Help on function run_simple_loopXout in module __main__:

run_simple_loopXout(x)
    Print out the values of a sequence of certain length
    ...
    
    Arguments
    ---------
    x     : int
            Length of the sequence to be printed out
    
    Returns
    -------
    seq   : np.array
            Sequence of values printed out
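Besides `help`, the docstring is also stored on the function object itself. The sketch below uses a hypothetical function `count_up` (not part of the course material) to show two ways of reading it programmatically: the `__doc__` attribute and `inspect.getdoc` from the standard library, which additionally cleans up the indentation.

```python
import inspect

# hypothetical documented function, following the same docstring layout
def count_up(x):
    """
    Return the integers from 0 up to (but excluding) x
    ...

    Arguments
    ---------
    x     : int
            Length of the sequence

    Returns
    -------
    seq   : list
            Values 0, 1, ..., x - 1
    """
    return list(range(x))

# raw docstring, exactly as written (including indentation)
print(count_up.__doc__)

# cleaned docstring; here we only show its first line
print(inspect.getdoc(count_up).splitlines()[0])
```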



### Exercise to work on your own

Write a properly documented Python function that can perform the task of a simple calculator with the following behaviour:

1. The user shall pass the desired mathematical operation (plus, minus, divide, multiply) and two numbers

1. The result shall be calculated and printed

1. If the input for the mathematical operation is not covered by the list from above, print "Please correct your input"

**You should use the following elements:** 

- If/elif/else statement
- Functions
- Mathematical expressions

In [1]:
def simple_calc(first_num, ops, second_num):
    
    
    """
    Perform a mathematical operation between two numbers
    ...
    
    Arguments
    ---------
    first_num     : int/float
                    first number in calculation
                    
    ops           : str
                    mathematical operation; ops=["plus", "minus", "multiply", "divide"]
    
    second_num    : int/float
                    second number in calculation
    
    Returns
    -------
    None
            The operation and its result are printed, not returned
    """
    #### Your Code below
    
    # check numerical input
    if not isinstance(first_num, (int, float)):
        print("Please provide numerical input for first_num")
        return None

    if not isinstance(second_num, (int, float)):
        print("Please provide numerical input for second_num")
        return None

    # check if selected operation is in ["plus","minus","multiply","divide"]
    if ops not in ["plus", "minus", "multiply", "divide"]:
        print("Please provide valid operation! Choose from [plus,minus,multiply,divide]")
        return None

    # perform calculations
    if ops == "plus":
        result = first_num + second_num
    elif ops == "minus":
        result = first_num - second_num
    elif ops == "multiply":
        result = first_num * second_num
    elif ops == "divide":
        result = first_num / second_num

    # print the calculation; the function itself returns None
    print(first_num, ops, second_num, "=", result)
    return None
    

In [5]:
simple_calc(5,"divide",2)

5 divide 2 = 2.5


In [6]:
simple_calc("five","divide",2)

Please provide numerical input for first_num


In [7]:
simple_calc(5,"div",2)

Please provide valid operation! Choose from [plus,minus,multiply,divide]
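Once the if/elif/else version works, it can be instructive to compare it with a lookup-based variant. The sketch below is an alternative, not the expected solution to the exercise: the names `OPERATIONS` and `simple_calc_lookup` are hypothetical, the functions come from Python's standard `operator` module, and the result is returned as a string instead of being printed inside the function.

```python
import operator

# hypothetical lookup table: operation name -> function from the standard library
OPERATIONS = {
    "plus": operator.add,
    "minus": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def simple_calc_lookup(first_num, ops, second_num):
    """Return the calculation as a string, or a hint if the operation is unknown."""
    if ops not in OPERATIONS:
        return "Please correct your input"
    result = OPERATIONS[ops](first_num, second_num)
    return f"{first_num} {ops} {second_num} = {result}"

print(simple_calc_lookup(5, "divide", 2))   # 5 divide 2 = 2.5
print(simple_calc_lookup(5, "div", 2))      # Please correct your input
```

Returning the string leaves it to the caller to decide whether to print it, store it, or process it further. Note that this sketch, like the solution above, does not guard against division by zero.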


---