The computer labs in this workshop run on Google Colab. This first lab introduces Google Colab itself. It has been adapted from [Lab 01 of IBM3202 Molecular Modeling and Simulation](https://github.com/pb3lab/ibm3202/blob/master/tutorials/lab01_intro.ipynb) from the Institute for Biological and Medical Engineering at Pontificia Universidad Catolica de Chile.

# Part I. Introduction to Google Colab

## What is Google Colab? 🤔

Google Colaboratory, or "Colab" for short, allows you to write and execute Python code in your browser, with 
- Zero configuration required
- Free access to graphics processing units (GPUs)
- Easy sharing

The only requirement is a Google account. Calculations are run remotely on a Google-operated virtual machine.

You can watch the [Introduction to Colab](https://www.youtube.com/watch?v=inN8seMm7UI) video recommended by Google Colab to learn more, or just get started below!

Note that at your left you have three icons. The first one looks like this:

><img src="https://upload.wikimedia.org/wikipedia/commons/b/bb/Summary_icon.svg" width="100">

and corresponds to the Table of Contents of this tutorial.

The last folder icon corresponds to the File Explorer of the virtual machine hosted in Google Cloud that is assigned to your session. 

## The central concept of Google Colab Notebook: Cells

A notebook is a list of cells. Cells contain either explanatory text or executable code and its output.

* Click a cell to select it. 
* Double-click a cell to edit it. 
* Use **Shift+Enter** to execute it.


**Adding and moving cells**

You can add new cells by using the **+ CODE** and **+ TEXT** buttons that show when you hover between cells. These buttons are also in the toolbar above the notebook, and they can be used to add a cell below the currently selected cell.

You can move a cell by selecting it and clicking **Cell Up** or **Cell Down** in the top toolbar. 

Consecutive cells can be selected by "lasso selection", i.e. by dragging from outside one cell and through the group.  Non-adjacent cells can be selected concurrently by clicking one and then holding down **Ctrl** while clicking another.  Similarly, using **Shift** instead of Ctrl will select all cells between two non-adjacent selections.

Try moving this cell around. Also try selecting a few cells.

## Text cells


Colaboratory has two types of cells: text and code. The text cells are formatted using a simple markup language called **markdown**, based on [the original](https://daringfireball.net/projects/markdown/syntax) markdown project. 
This is a **text cell**. You can **double-click** to edit this cell. Text cells
use markdown syntax. To learn more, see the [markdown
guide](/notebooks/markdown_guide.ipynb) recommended by Google Colab.

### Markdown ⌨️




To see the markdown source, double-click a text cell. The editor then shows the markdown source (left) next to the rendered version (right). Above the markdown source there is a toolbar to assist editing.



You can also use tags to format your text. The following are examples of markdown text formats. Each word/phrase is shown in the desired format, and the tags around it are those required to achieve each specific format.

**Text Formats:**

\**italics*\* or \__italics__

**\*\*bold\*\***

\~\~~~strikethrough~~\~\~

\``monospace`\`

**Indentations:**

No indent
>\>One level of indentation
>>\>\>Two levels of indentation

**An ordered list:**
1. 1\. One
1. 1\. Two
1. 1\. Three

**An unordered list:**
* \* One
* \* Two
* \* Three



**If you are interested in learning more about markdown in Google Colab, you can read [this article](https://towardsdatascience.com/cheat-sheet-for-google-colab-63853778c093), which includes a cheat sheet.**

### Math 🧮 & Equations ✏️



You can also add math to text cells using [$\LaTeX$](http://www.latex-project.org/)
to be rendered by [MathJax](https://www.mathjax.org). Just place the statement
within a pair of **`$`** signs. For example `$\sqrt{3x-1}+(1+x)^2$` becomes
$\sqrt{3x-1}+(1+x)^2.$

Also, if you double the **`$`** tags in your $\LaTeX$ equations, you can set the contents off on its own centered line. For example, `$$y = 0.1 x$$` renders the following equation: $$y = 0.1 x$$

### Tables 📍



Tables:
```
First column name | Second column name
--- | ---
Row 1, Col 1 | Row 1, Col 2
Row 2, Col 1 | Row 2, Col 2
```

becomes:

>First column name | Second column name
>--- | ---
>Row 1, Col 1 | Row 1, Col 2
>Row 2, Col 1 | Row 2, Col 2

Horizontal rule done with three dashes (\-\-\-):

---


### Gifs 😱

YES! you can add animated gifs

<img src='https://media.giphy.com/media/3o72F8t9TDi2xVnxOE/giphy.gif'/>


## Code cells


Below is a **code cell**. To execute the contents of a code cell, you first must connect to a hosted runtime by clicking on the **Connect** button located in the toolbar menu.

 <img src='https://media.giphy.com/media/lRLBURv0hpcHqiraBI/giphy.gif'/> 

Once the toolbar button changes to **'Connected'**, click in the code cell below to select it and execute the contents in the following ways:




* Click the **Play icon** in the left gutter of the cell;
* Type **Cmd/Ctrl+Enter** to run the cell in place;
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or
* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.

There are additional options for running some or all cells in the **Runtime** menu.


In [None]:
W = 'Tryptophan'
C = 'Cysteine'
W,C

# Part II. Introduction to the BASH Shell 🏃🏃‍♀️🏃‍♂️

### IPython and Shell Commands


*This part of this tutorial contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

When working interactively with the standard Python interpreter, one of the frustrations is the need to switch between multiple windows to access Python tools and system command-line tools. IPython bridges this gap, and gives you a syntax for executing shell commands directly from within the IPython terminal.

The magic happens with the exclamation point: anything appearing after ``!`` on a line will be executed not by the Python kernel, but by the system command-line.

The following assumes you're on a Unix-like system, like this Google Colab cloud instance.
Some of the examples that follow will fail on Windows, which uses a different type of shell by default.

If you're unfamiliar with shell commands, I'd suggest reviewing the [Shell Tutorial](http://swcarpentry.github.io/shell-novice/) put together by the always excellent Software Carpentry Foundation.

### Quick Introduction to the Shell



A full intro to using the shell / terminal / command-line is well beyond the scope of this lab, but for the uninitiated we will offer a quick introduction here.

>The shell is a way to interact textually with your computer.
Ever since the mid 1980s, when Microsoft and Apple introduced the first versions of their now ubiquitous graphical operating systems, most computer users have interacted with their operating system through familiar clicking of menus and drag-and-drop movements.
But operating systems existed long before these graphical user interfaces, and were primarily controlled through sequences of text input: at the prompt, the user would type a command, and the computer would do what the user told it to.
Those early prompt systems are the precursors of the shells and terminals that most active data scientists still use today.

Someone unfamiliar with the shell might ask why you would bother with this, when many results can be accomplished by simply clicking on icons and menus.
A shell user might reply with another question: why hunt icons and click menus when you can accomplish things much more easily by typing?
While it might sound like a typical tech preference impasse, when moving beyond basic tasks it quickly becomes clear that the shell offers much more control of advanced tasks, though admittedly the learning curve can intimidate the average computer user.

As an example, the following are the most common bash commands and a short description:

```bash
cd: change directory
ls: list
mv: move
cp: copy
mkdir: make new directory
history: terminal history (i.e. history of the commands you have executed)
help: command list & help
echo: outputs to terminal
expr: evaluates an expression and outputs to terminal
wc: word count
cat: “concatenate”, streams all input to the terminal
sed: “stream editor”, edits input (most importantly substitute) and outputs to the terminal
vim: “visual”, displays input and allows editing
grep: “Global regular expression print”, searches a given expression and outputs to terminal 
awk: pattern scanning & processing language
```

Notice that all of this is just a compact way to do familiar operations (navigating a directory structure, creating a directory, moving a file, etc.) by typing commands rather than clicking icons and menus.
Note that with just a few commands (``pwd``, ``ls``, ``cd``, ``mkdir``, and ``cp``) you can do many of the most common file operations.
It's when you go beyond these basics that the shell approach becomes really powerful.
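
For comparison, the same everyday file operations can also be scripted with Python's standard library. The sketch below is only an illustration: the scratch directory and the file names (`note.txt`, `copy.txt`) are made up for the example, and each line is annotated with the shell command it roughly corresponds to.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing in /content is touched.
work = Path(tempfile.mkdtemp())

(work / "folder" / "folder2").mkdir(parents=True)            # mkdir -p folder/folder2
shutil.move(str(work / "folder"), str(work / "newfolder"))   # mv folder newfolder

note = work / "newfolder" / "note.txt"       # hypothetical file name
note.write_text("hello from Python\n")       # create the file and write one line
shutil.copy(note, work / "copy.txt")         # cp newfolder/note.txt copy.txt

listing = sorted(os.listdir(work))           # ls
contents = (work / "copy.txt").read_text()   # cat copy.txt
print(listing)
print(contents)
```

For one-off tasks the shell versions are shorter; the Python versions become attractive once the operations are part of a larger analysis script.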

### Shell Commands in IPython



Any command that works at the bash command-line can be used in IPython by prefixing it with the ``!`` character.
For example, the ``ls``, ``pwd``, and ``echo`` commands can be run as follows:

```ipython
In [1]: !ls
sample_data

In [2]: !pwd
/content

In [3]: !echo "printing from the shell"
printing from the shell
```

Try these commands in the next code cells!

In [None]:
# --> List all files and directories with ls


In [None]:
# --> Print the working directory of Google Colab cloud linux instance with pwd


In [None]:
# --> Print a Hello world using echo


### Passing Values to and from the Shell


In [None]:
directory = !pwd
print(directory)

Shell commands can not only be called from IPython, but can also be made to interact with the IPython namespace.
For example, you can save the output of any shell command to a Python list using the assignment operator:

```ipython
In [4]: contents = !ls

In [5]: print(contents)
['sample_data']

In [6]: directory = !pwd

In [7]: print(directory)
['/content']
```

Try them out yourself!

In [None]:
# --> Try the commands indicated above


Note that these results are not returned as lists, but as a special shell return type defined in IPython:

```ipython
In [8]: type(directory)
IPython.utils.text.SList
```

This looks and acts a lot like a Python list, but has additional functionality, such as
the ``grep`` and ``fields`` methods and the ``s``, ``n``, and ``p`` properties that allow you to search, filter, and display the results in convenient ways.
For more information on these, you can use IPython's built-in help features.

Communication in the other direction–passing Python variables into the shell–is possible using the ``{varname}`` syntax:

```ipython
In [9]: message = "hello from Python"

In [10]: !echo {message}
hello from Python
```

The curly brackets contain the variable name, which is replaced by the variable's contents in the shell command.
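
The `!` and `{varname}` syntax only exists inside IPython and notebooks. As a rough plain-Python counterpart, both directions of communication can be sketched with the standard `subprocess` module. This is not how IPython implements `!` internally, and it assumes a Unix-like system where the `echo` program is available.

```python
import subprocess

message = "hello from Python"

# Python -> shell: pass the variable as an argument (the role of {message}).
result = subprocess.run(
    ["echo", message], capture_output=True, text=True, check=True
)

# Shell -> Python: split the captured output into a list of lines,
# similar in spirit to what an assignment such as `lines = !echo ...` gives.
lines = result.stdout.splitlines()
print(lines)   # ['hello from Python']
```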

In [None]:
# --> Try assigning a string to a variable and then print it using echo


### Shell-Related Magic Commands

If you play with IPython's shell commands for a while, you might notice that you cannot use ``!cd`` to navigate the filesystem:

```ipython
In [11]: !pwd
/content/

In [12]: !cd ..

In [13]: !pwd
/content/
```



In [None]:
# --> Try for yourself!


The reason is that shell commands in the notebook are executed in a temporary subshell.
If you'd like to change the working directory in a more enduring way, you can use the ``%cd`` magic command:

```ipython
In [14]: %cd ..
/
```

By default you can even drop the ``%`` sign and simply type ``cd ..``. This is known as an ``automagic`` function, and this behavior can be toggled with the ``%automagic`` magic function.

Besides ``%cd``, other available shell-like magic functions are ``%cat``, ``%cp``, ``%env``, ``%ls``, ``%man``, ``%mkdir``, ``%more``, ``%mv``, ``%pwd``, ``%rm``, and ``%rmdir``, any of which can be used without the ``%`` sign if ``automagic`` is on.
This makes it so that you can almost treat the IPython prompt as if it's a normal shell.

This access to the shell from within the same terminal window as your Python session means that there is a lot less switching back and forth between interpreter and shell as you write your Python code.
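
The difference between a temporary subshell and a lasting directory change can also be reproduced in plain Python. The sketch below is an analogy rather than IPython's actual implementation: `subprocess.run(..., shell=True)` stands in for `!cd`, and `os.chdir` stands in for `%cd`. It assumes a Unix-like system.

```python
import os
import subprocess
import tempfile

start = os.getcwd()

# Like `!cd`: the directory change happens in a child shell that exits right away.
subprocess.run("cd /", shell=True, check=True)
after_child_cd = os.getcwd()
print(after_child_cd == start)   # True: the notebook process did not move

# Like `%cd`: the working directory of the Python process itself changes.
target = tempfile.mkdtemp()
os.chdir(target)
inside_target = os.path.samefile(os.getcwd(), target)
print(inside_target)             # True

os.chdir(start)   # go back so later cells are unaffected
```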

In [None]:
# --> Try it here using the !cd to access and exit the 'sample_data' folder 


In [None]:
# --> Try using the %cd 


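💡 Hint: Put `%%bash` on the first line of a code cell to run the whole cell as a bash script, instead of prefixing every line with `!`. Because all lines of the cell share one shell, a `cd` carries over to the following lines of that cell, but not to later cells. Try it below!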


In [None]:
%%bash
pwd
cd sample_data
pwd
cd ..

### Let's practice!

The terminal enters the **/content** directory by default in Google Colab, which contains all the main folders that you will use.

The `ls` command will list all the folders and files within the current folder.

Sadly, the output of code cells in Google Colab is not color-coded to distinguish files from folders, as is common in Linux terminals. You will therefore have to inspect your directories in the File Explorer built into Google Colab, available in the menu on your left.

In [None]:
# --> List the files using ls


Now, we can start making new directories by typing `mkdir NAME`, where NAME is the name for our new folder.

To change directory to a particular folder, type `%cd` followed by the folder name.

In [None]:
# --> Make two directories named folder and folder/folder2


Files and folders can be renamed by using the  `mv` command as follows: `mv file1 file2`.

This moves (renames) the first file or folder to the second name. In the cell below, rename `folder` to `newfolder` using `mv`.

In [None]:
# --> Rename folder to newfolder and list


In order to create and edit text files in Google Colab you can use the included text editor. Use `touch` to create an empty file.

In [None]:
# --> Use touch to generate an empty file in newfolder


Refresh the files in the file explorer on your left and double-click the generated text file. Add whatever you want inside and remember to press Ctrl+S or Cmd+S to save the changes.

Use `cat` to show what is inside the text file you just edited.

In [None]:
# --> Use cat here


This information and exercises should be sufficient for you to start using the bash shell commands more often. As with any other piece of software, it requires practice, but we are sure you will get used to it by the end of this course.
Now here are some tasks for you to complete for more practice!
1. Make a copy of your file using `cp`, which takes its arguments in the same order as `mv`
2. Move your copy to the HOME directory
3. Output the contents of your file to the terminal using `cat`
4. Use the `paste` command instead of the `cat` and describe what happens.

In [None]:
# --> Your turn!

# Part III. Integration with Google Drive

A Google Colab virtual machine only lasts for a short time. If you want to keep your data, you need to download files or save them to Google Drive.

First, we will mount Google Drive so that it is accessible from the virtual machine.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Now try the file explorer. You should be able to see the contents of your Google Drive.

Next, we will **clone** or update the workshop labs into Google Drive.

In [None]:
GitHub_dir = '/content/drive/MyDrive/GitHub'
import os
if not os.path.isdir(GitHub_dir):
  !mkdir -p {GitHub_dir}

os.chdir(GitHub_dir)
if not os.path.isdir(os.path.join(GitHub_dir,'modelingworkshop')):
  !git clone https://github.com/CCBatIIT/modelingworkshop
else:
  os.chdir(os.path.join(GitHub_dir,'modelingworkshop'))
  !git pull origin main

# Appendix A 🤪 Introduction to Python 🐍 

### Before you start

The following are excerpts from many different resources that we included in this tutorial so that you can familiarize yourself with the use of **Python** for many different analyses. Some of the tools described here will be used in the following tutorials.

**What is Python?** Executive Summary
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.



In [None]:
# In order to "print" in python we use the print() function
print("Hello world!")

In [None]:
#Install packages with PIP package manager
!pip install mdanalysis

### Variables and data types


In [None]:
a=1
print(a)

Note that we did not need to introduce the variable `a` in any way. No type was given for the variable. Python automatically detected that the type of `a` must be `int` (an integer). We can query the type of a variable with the builtin function `type`:

In [None]:
type(a)

Note also that the type of a variable is not fixed:

In [None]:
a="some text"
type(a)

In Python the type of a variable is not attached to the name of the variable, as it is in C for instance, but to the actual value. This is called dynamic typing.

![typing.svg](https://github.com/csmastersUH/data_analysis_with_python_2020/blob/master/typing.svg?raw=1)

### Expressions
An *expression* is a piece of Python code that results in a value. It consists of values combined together with *operators*. Values can be literals, such as `1`, `1.2`, `"text"`, or variables. Operators include arithmetic operators, comparison operators, function calls, indexing, and attribute references, among others. Below are a few examples of expressions:

```
1+2
7/(2+0.1)
a
cos(0)
mylist[1]
c > 0 and c !=1
(1,2,3)
a<5
obj.attr
(-1)**2 == 1
```

<div class="alert alert-warning">Note that in Python the operator `//` performs integer division and operator `/` performs float division. The `**` operator denotes exponentiation. These operators might therefore behave differently than in many other common languages.</div>

As another example the following expression computes the kinetic energy of a non-rotating object:
`0.5 * mass * velocity**2`
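
As a quick illustration of the operators mentioned above, the following sketch evaluates a few expressions. The values of `mass` and `velocity` are made up for the example.

```python
print(7 / 2)     # 3.5   float division
print(7 // 2)    # 3     integer division, rounds down
print(-7 // 2)   # -4    rounds towards negative infinity, not towards zero
print(2 ** 10)   # 1024  exponentiation

mass = 2.0       # kg, made-up value
velocity = 3.0   # m/s, made-up value
kinetic_energy = 0.5 * mass * velocity**2
print(kinetic_energy)   # 9.0
```

Note that `**` binds more tightly than `*`, so only `velocity` is squared.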

### Modules 📦

#### • Using modules

Let’s say that we need to use the cosine function.
This function, and many other mathematical functions are
located in the `math` module.
To tell Python that we want to access the features offered by
this module, we can give the statement `import math`.
Now the module is loaded into memory.
We can now call the function like this:
```python
import math
math.cos(0)   # evaluates to 1.0
```

Note that we need to include the module name where the `cos`
function is found.
This is because other modules may have a function (or other
attribute of a module) with the same name.
This usage of different namespace for each module prevents
name clashes. For example, functions `gzip.open`, `os.open` are not to be confused
with the builtin `open` function.

In [None]:
# --> Use the math module to calculate the cosine of 0!


#### • Breaking the namespace

If the cosine is needed a lot, then it might be tedious to
always specify the namespace, especially if the name of the
namespace/module is long.
For these cases there is another way of importing modules.
Bring a name to the current scope with
`from math import cos` statement.
Now we can use it without the namespace specifier: `cos(1)`.

Several names can be imported to the current scope with
`from math import name1, name2, ...`
Or even all names of the module with `from math import *`
The last form is sensible only in few cases, normally it just
confuses things since the user may have no idea what names
will be imported.
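
The import forms described above can be compared side by side. This short sketch uses only standard-library modules.

```python
import math                   # access through the namespace: math.cos
from math import cos, pi      # bring selected names into the current scope
import gzip

print(math.cos(0))            # 1.0
print(cos(pi))                # -1.0
print(cos is math.cos)        # True: both names refer to the same function
print(gzip.open is open)      # False: same name, different namespaces, different functions
```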

#### • Module lookup

When we try to import a module `mod` with the import
statement, the lookup proceeds in the following order:

* Check if it is a builtin module
* Check if the file `mod.py` is found in any of the folders in
the list `sys.path`. The first item in this list is the current
folder

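When Python is started, the `sys.path` list is initialised with
the contents of the `PYTHONPATH` environment variable, followed by installation-dependent default folders (for example, where the standard library and installed packages live).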
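
The lookup can be inspected from Python itself. The sketch below is a small exploration aid, not a full description of the import machinery; the folders printed depend on your installation.

```python
import importlib.util
import sys

# Builtin modules are compiled into the interpreter and are found first.
print("sys" in sys.builtin_module_names)   # True

# The first few folders that are searched for ordinary modules.
print(sys.path[:3])

# Locate a module without importing it; origin tells where it was found.
spec = importlib.util.find_spec("json")
print(spec.origin)
```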

#### • Module hierarchy

The standard library contains hundreds of modules.
Hence, it is hard to comprehend what the library includes.
The modules therefore need to be organised somehow.
In Python the modules can be organised into hierarchies using
*packages*.
A package is a module that can contain other packages and
modules.
For example, the `numpy` package contains subpackages such as
`fft`, `lib`, `linalg`, `ma`,
`random`, and `testing`.
In NumPy 1.x releases, the package `numpy.linalg` in turn contains the modules `linalg`
and `lapack_lite`; the exact internal layout differs between NumPy versions.

#### • Importing from packages

The statement `import numpy` imports the top-level package `numpy`
and its subpackages. 

* `import numpy.linalg` imports the subpackage only, and
* `import numpy.linalg.linalg` imports the module only

If we want to skip the long namespace specification, we can
use the form

```python
from numpy.linalg import linalg
```

or

```python
from numpy.linalg import linalg as lin
```

if we want to use a different name for the module. The following command imports the function `det` (computes the determinant of a matrix) from the module linalg, which is contained in a subpackage linalg, which belongs to package numpy:
```python
from numpy.linalg.linalg import det
```

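Had we only imported the top-level package `numpy`, we would have to refer to the `det` function with the full name `numpy.linalg.linalg.det`. In practice the function is also exposed under the shorter public name `numpy.linalg.det`, which is the form to prefer because the inner `linalg` module is an implementation detail whose location has changed between NumPy releases.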

Here's a recap of the module hierarchy:

```
numpy    package
  .
linalg   subpackage
  .
linalg   module
  .
 det     function
```
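
The same import patterns can be tried on a package from the standard library. The sketch below uses `xml.etree.ElementTree` instead of NumPy, purely so that it does not depend on which NumPy version is installed; the XML snippet is made up for the example.

```python
import xml.etree.ElementTree                  # full dotted path: package.subpackage.module
from xml.etree import ElementTree as ET       # the module under a shorter name
from xml.etree.ElementTree import fromstring  # a single function from the module

root = fromstring("<lab><part>Colab</part><part>Shell</part></lab>")
print([part.text for part in root])           # ['Colab', 'Shell']
print(fromstring is ET.fromstring)            # True
print(ET is xml.etree.ElementTree)            # True
```

All three import statements refer to the same module object; they differ only in which name ends up in the current scope.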

#### • Correspondence between folder and module hierarchies

The packages are represented by folders in the filesystem.
The folder should contain a file named `__init__.py` that
makes up the package body. This handles the initialisation of
the package.
The folder may contain also further folders
(subpackages) or Python files (normal modules).

```
a/
    __init__.py
    b.py
    c/
        __init__.py
        d.py
        e.py
```
![package.svg](https://github.com/csmastersUH/data_analysis_with_python_2020/blob/master/package.svg?raw=1)
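
To see the correspondence in action, the following sketch builds a small package on disk and imports from it. The package name `demo_pkg` (playing the role of `a/` above), the scratch folder, and the file contents are all hypothetical and chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())            # scratch folder that will hold the package
package = root / "demo_pkg"                # hypothetical name, plays the role of a/
(package / "c").mkdir(parents=True)

(package / "__init__.py").write_text("")   # marks demo_pkg as a package
(package / "b.py").write_text("def hello():\n    return 'hello from demo_pkg.b'\n")
(package / "c" / "__init__.py").write_text("")
(package / "c" / "d.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))              # make the scratch folder importable
importlib.invalidate_caches()

from demo_pkg.b import hello               # module b inside the package
import demo_pkg.c.d                        # module d inside subpackage c

print(hello())
print(demo_pkg.c.d.VALUE)
```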

#### • Contents of a module

Suppose we have a module named `mod.py`.
All the assignments, class definitions with the `class` statement,
and function definitions with `def` statement will create new
attributes to this module.
Let’s import this module from another Python file using the
`import mod` statement.
After the import we can access the attributes of the module
object using the normal dot notation: `mod.f()`,
`mod.myclass()`, `mod.a`, etc.
Note that Python doesn’t really have global variables that are
visible to all modules. All variables belong to some module
namespace.

One can query the attributes of an object using the `dir` function. With no
parameters, it shows the attributes of the current module. Try executing `dir()` in
an IPython shell or in a Jupyter notebook! After that, define the following attributes, and try running `dir()`
again:

```python
a=5
def f(i):
    return i + 1
```

The above definitions created a *data attribute* called `a` and a *function attribute* called `f`.
We will talk more about attributes next week when we will talk about objects.
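
If you prefer not to clutter the notebook namespace, the same experiment can be run on a module object created by hand. This is a sketch using `types.ModuleType` and `exec`, which is not how modules are normally created; it only serves to show that assignments and `def` statements become attributes of the module.

```python
import types

mod = types.ModuleType("mod")   # an empty module object, created by hand
source = "a = 5\ndef f(i):\n    return i + 1\n"
exec(source, mod.__dict__)      # run the definitions inside the module's namespace

# Hide the double-underscore attributes that every module carries.
public_names = [name for name in dir(mod) if not name.startswith("__")]
print(public_names)      # ['a', 'f']
print(mod.a, mod.f(1))   # 5 2
```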

Just like other objects, the module object contains its
attributes in the dictionary `modulename.__dict__`.
Usually a module contains at least the attributes `__name__` and
`__file__`. Other common attributes are `__version__`,
`__author__` and `__doc__`, which contains the docstring of the
module.
If the first statement of a file is a string, this is taken as the
docstring for that module. Note that the docstring of the module really must be the first non-empty non-comment line.
For a module loaded from a file, the attribute `__file__` is the path of that file.

The module attribute `__name__` has the value `"__main__"` if we are in the main program;
otherwise some other module has imported us and `__name__`
equals the name of the module (here `mod`).

In Python it is possible to put statements on the top-level of our module `mod` so that they don't belong to any function. For instance like this:

```python
for _ in range(3):
    print("Hello")
```

But if somebody imports our module with `import mod`, then all the statements at the top level will be executed. This may be surprising to the user who imported the module. Users usually want to say explicitly when code from an imported module should be executed.

It is better style to put these statements inside some function. If they don't fit in any other function, then you can use, for example, the function named `main`, like this:

```python
def main():
    for _ in range(3):
        print("Hello")

if __name__ == "__main__":    # We call main only when this module is not being imported, but directly executed
    main()                    # for example with 'python3 mod.py'
```

You probably have seen this mechanism used in the exercise stubs.
Note that in Python the `main` has no special meaning, it is just our convention to use it here.
Now if somebody imports `mod`, the `for` loop won't be automatically executed. If we want, we can call it explicitly with `mod.main()`. 
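
The effect of the guard can be checked by writing such a module to disk and importing it. In the sketch below the module name `guard_demo`, the scratch folder, and the `calls` list used to record what happened are all hypothetical helpers for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''

scratch = Path(tempfile.mkdtemp())
(scratch / "guard_demo.py").write_text(module_source)   # hypothetical module name
sys.path.insert(0, str(scratch))
importlib.invalidate_caches()

import guard_demo

print(guard_demo.__name__)            # guard_demo, not __main__
calls_after_import = list(guard_demo.calls)
print(calls_after_import)             # []: main() did not run on import

guard_demo.main()                     # explicit call
print(guard_demo.calls)               # ['main ran']
```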


## Interactive ways of representing your data 📊 📈

The visualization of our results is as important as the analysis itself. There are several Python libraries for data visualization, but we will focus on one of the most widely used: *matplotlib*. [Matplotlib](https://matplotlib.org/stable/index.html) is highly customizable and can write many file formats, such as *png, tiff, jpeg, etc*. Additionally, this library is already installed in Colab, so we only need to import the module into the current document. Let's see some examples!



First we need to import the `matplotlib.pyplot` module and the complete `matplotlib` package, each under a shorthand name.

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl

We'll create a simple example to illustrate how to use the `matplotlib.pyplot`. Let's see what happens with our Python programming skills if we spend time doing the exercises of the Appendix in IBM3202 tutorials.

In [None]:
# Data
time = [0, 10, 20, 300]
python_level = [0, 1, 2, 30]

plt.plot(time, python_level)
plt.xlabel('Time (hr)')
plt.ylabel('Python programming level')

We can change the style of the plots easily with the `plt.style.use` function (you can check the available styles [here](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html)). Additionally, the plots can be saved to a file with the function `savefig`. The *format* will be inferred from the filename extension, and the resolution can be changed with the *dpi* argument. The width and height of the figure are set on the figure itself, for example with the `figsize` argument of `plt.figure` or `plt.subplots`.




In [None]:
plt.style.use('dark_background')

plt.plot(time, python_level)
plt.xlabel('Time (hr)')
plt.ylabel('Python programming level')

plt.savefig('First_plot.png', dpi = 100)

As you can see, we have added all the elements of the plot using only functions included in the `matplotlib.pyplot` module. The `plot` function allows us to visualize the relationship between variables with lines and markers, but many other plot types are available (histograms, bar plots, box plots, etc.). Plots are highly customizable: we can change the color, line width, transparency, line style, and more by adding a few parameters to the function call. If you are passionate about data visualization, we recommend reviewing the Matplotlib documentation (you can find it [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)) to get the most out of this tool.

In [None]:
fig, axs = plt.subplots(2, 2, sharex=True)
axs[0, 0].plot(time, python_level, linestyle = '--', color = 'darkred')
axs[0, 0].set_title('Plot 1')
axs[0, 1].plot(time, python_level, linestyle = '-', color = 'mediumaquamarine', linewidth = 2, marker = "*", markersize = 16)
axs[0, 1].set_title('Plot 2')
axs[1, 0].plot(time, python_level, linestyle = ':', color = 'darkgreen', marker = "d", markersize = 12, alpha = 0.5)
axs[1, 0].set_title('Plot 3')
axs[1, 1].plot(time, python_level, linestyle = '', color = 'olivedrab', marker = r'$\phi$', markersize = 15)  # raw string keeps the backslash intact
axs[1, 1].set_title('Plot 4')


Matplotlib is an excellent library for static data visualization. However, Google Colab is an interactive platform with great potential, so let's take advantage of that. The **ipywidgets** library contains a set of useful tools to enhance the user experience. For example:

In [None]:
import ipywidgets as widgets

def say_my_name(name):
    print(f'My name is {name}')
     
widgets.interact(say_my_name, name=["Walter", "White", "Heisenberg", "Mr. White"]);

In [None]:
slider = widgets.IntSlider(20, min=0, max=100)
slider

Even better, we can use some data visualization libraries with interactive options. The syntax is highly similar to what we have seen in Matplotlib ;) 

In [None]:
# load an example dataset
from vega_datasets import data
cars = data.cars()

# plot the dataset, referencing dataframe column names
import altair as alt
alt.Chart(cars).mark_bar().encode(
  x=alt.X('Miles_per_Gallon', bin=True),
  y='count()',
)

In [None]:
# load an example dataset
from vega_datasets import data
cars = data.cars()

# plot the dataset, referencing dataframe column names
import altair as alt
alt.Chart(cars).mark_point().encode(
  x='Horsepower',
  y='Miles_per_Gallon',
  color='Origin'
).interactive()

In [None]:
import altair as alt
import ipywidgets as widgets
from vega_datasets import data

source = data.stocks()

stock_picker = widgets.SelectMultiple(
    options=source.symbol.unique(),
    value=list(source.symbol.unique()),
    description='Symbols')

# The value of symbols will come from the stock_picker.
@widgets.interact(symbols=stock_picker)
def render(symbols):
  selected = source[source.symbol.isin(list(symbols))]

  return alt.Chart(selected).mark_line().encode(
      x='date',
      y='price',
      color='symbol',
      strokeDash='symbol',
  )

# Appendix B 🤪 Python Deep Dive


#### Interleave

Write a function `interleave` that takes an arbitrary number of lists as parameters. You may assume that all the lists have equal length. The function should return one list containing all the elements from the input lists interleaved.
Test your function from the `main` function of the program.

Example:
`interleave([1,2,3], [20,30,40], ['a', 'b', 'c'])`
should return
`[1, 20, 'a', 2, 30, 'b', 3, 40, 'c']`.
Use the `zip` function to implement `interleave`. Remember the `extend` method of list objects.
<hr/>

### Functions
A function is defined with the `def` statement. Let's do a doubling function.

In [None]:
def double(x):
    "This function multiplies its argument by two."
    return x*2
print(double(4), double(1.2), double("abc")) # It even happens to work for strings!

The double function takes only one parameter. Notice the *docstring* on the second line. It documents the purpose and usage of the function. Let's try to access it.

In [None]:
print("The docstring is:", double.__doc__)
help(double)   # Another way to access the docstring

Most of Python's builtin functions, classes, and modules should contain a docstring.

In [None]:
help(print)

Here's another example function:

In [None]:
def sum_of_squares(a, b):
    "Computes the sum of arguments squared"
    return a**2 + b**2
print(sum_of_squares(3, 4))

<div class="alert alert-warning">Note the terminology: in the function definition the names a and b are called <strong>parameters</strong> of the function; in the function call, however, 3 and 4 are called <strong>arguments</strong> to the function.
</div>

It would be nice if the number of arguments could be arbitrary, not just two. We could pass a list to the function as a parameter.

In [None]:
def sum_of_squares(lst):
    "Computes the sum of squares of elements in the list given as parameter"
    s=0
    for x in lst:
        s += x**2
    return s
print(sum_of_squares([-2]))
print(sum_of_squares([-2,4,5]))

This works perfectly! There is however some extra typing with the brackets around the lists. Let's see if we can do better:

In [None]:
def sum_of_squares(*t):
    "Computes the sum of squares of arbitrary number of arguments"
    s=0
    for x in t:
        s += x**2
    return s
print(sum_of_squares(-2))
print(sum_of_squares(-2,4,5))

The strange looking argument notation (the star) is called *argument packing*. It packs all the given positional arguments into a tuple `t`. We will encounter tuples again later, but it suffices now to say that tuples are *immutable* lists. With the `for` loop we can iterate through all the elements in the tuple.

Conversely, there is also syntax for *argument unpacking*. Confusingly, it uses exactly the same notation as argument packing (the star), but the two are distinguished by where they are used. Packing happens in the parameter list of the function definition, and unpacking happens where the function is called:

In [None]:
lst=[1,5,8]
print("With list unpacked as arguments to the functions:", sum_of_squares(*lst))
# print(sum_of_squares(lst))    # Does not work correctly

The second call, which is commented out above, would fail with a `TypeError`, because the function would try to raise the whole list to the second power. Inside the function body we would have `t=([1,5,8],)`, that is, a tuple with one element, a list.
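To see the difference between the two calls without stopping the notebook, the following sketch repeats the definition of `sum_of_squares` from above and catches the error raised by the call without unpacking. The variable names `unpacked` and `failed` are only illustrative, and the exact wording of the error message may differ between Python versions.

```python
def sum_of_squares(*t):
    "Computes the sum of squares of arbitrary number of arguments"
    s = 0
    for x in t:
        s += x**2
    return s

lst = [1, 5, 8]
unpacked = sum_of_squares(*lst)   # same as sum_of_squares(1, 5, 8)
print(unpacked)

try:
    sum_of_squares(lst)           # inside the function t is ([1, 5, 8],)
    failed = False
except TypeError as e:
    failed = True
    print("TypeError:", e)
```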

In addition to positional arguments we have seen so far, a function call can also have *named arguments*. An example will explain this concept best:

In [None]:
def named(a, b, c):
    print("First:", a, "Second:", b, "Third:", c)
named(5, c=7, b=8)

Note that the named arguments didn't need to be in the same order as in the function definition.
The named arguments must come after the positional arguments. For example, the following function call is illegal: `named(a=5, 7, 8)`.

One can also specify an optional parameter by giving the parameter a default value. The parameters that have default values must come after those parameters that don't. We saw that the parameters of the `print` function were of form `print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)`. There were four parameters with default values. If some default values don't suit us, we can give them in the function call using the name of the parameter:

In [None]:
print(1, 2, 3, end=' |', sep=' -*- ')
print("first", "second", "third", end=' |', sep=' -*- ')

We did not need to specify all the parameters with default values, only those we wanted to change.

Let's go through another example of using parameters with default values:

In [None]:
def length(*t, degree=2):
    """Computes the length of the vector given as parameter. By default, it computes
    the Euclidean distance (degree==2)"""
    s=0
    for x in t:
        s += abs(x)**degree
    return s**(1/degree)
print(length(-4,3))
print(length(-4,3, degree=3))

With the default `degree=2` this is the Euclidean norm (the length) of the vector. For a general degree $p$ it is called the [p-norm](https://en.wikipedia.org/wiki/P-norm).

We saw that it was possible to use packing and unpacking of arguments with the * notation, when one wants to specify arbitrary number of *positional arguments*. This is also possible for arbitrary number of named arguments with the `**` notation. We will talk about this more in the data structures section.
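A minimal sketch of the `**` notation might look as follows. The function `describe` and the dictionary `settings` are hypothetical names chosen for illustration; `named` is a variant of the earlier example that returns a string instead of printing.

```python
def describe(**kwargs):
    # kwargs is a dictionary that maps the given names to the given values
    return ", ".join(f"{name}={value}" for name, value in kwargs.items())

packed = describe(a=1, b=2)        # packing: named arguments -> dictionary
print(packed)

def named(a, b, c):
    return f"First: {a} Second: {b} Third: {c}"

settings = {"b": 8, "c": 7}
unpacked = named(5, **settings)    # unpacking: same as named(5, b=8, c=7)
print(unpacked)
```

As with the single star, packing happens in the function definition and unpacking happens in the function call.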

####  • Visibility of variables
Function definition creates a new *namespace* (also called local scope). Variables created inside this scope are not available from outside the function definition. Also, the function parameters are only visible inside the function definition. Variables that are not defined inside any function are called `global variables`.

Global variables are also readable in local scopes, but an assignment creates a new local variable without rebinding the global variable. If we are inside a function, a local variable hides a global variable of the same name:

In [None]:
i=2           # global variable
def f():
    i=3       # this creates a new variable, it does not rebind the global i
    print(i)  # This will print 3    
f()
print(i)      # This will print 2

If you really need to rebind a global variable from a function, use the `global` statement. Example:

In [None]:
i=2
def f():
    global i
    i=5       # rebind the global i variable
    print(i)  # This will print 5
f()
print(i)      # This will print 5

Unlike languages like C or C++, Python allows defining a function inside another function. This *nested* function will have nested scope:

In [None]:
def f():            # outer function
    b=2
    def g():        # inner function
        #nonlocal b # Without this nonlocal statement,
        b=3         # this will create a new local variable
        print(b)
    g()
    print(b)
f()

First try running the above cell and see the result. Then uncomment the `nonlocal` statement and run the cell again. The `global` and `nonlocal` statements are similar. The first will force a variable to refer to a global variable, and the second will force a variable to refer to the variable in the nearest outer scope (but not the global scope).

### Data structures
The main data structures in Python are strings, lists, tuples, dictionaries, and sets. We saw some examples of lists when we discussed `for` loops, and we briefly saw tuples when we introduced argument packing and unpacking. Let's get into more details now.

#### Sequences
A *list* contains an arbitrary number of elements (even zero) that are stored in sequential order. The elements are separated by commas and written between brackets. The elements don't need to be of the same type. An example of a list with four values:

In [None]:
[2, 100, "hello", 1.0]

A *tuple* is a fixed-length, immutable, ordered container. The elements of a tuple are separated by commas and written between parentheses. Examples of tuples:

In [None]:
(3,)               # a singleton
(1,3)              # a pair
(1, "hello", 1.0); # a triple

<div class="alert alert-warning">Note the difference between `(3)` and `(3,)`. Because the parentheses can also be used to group expressions, the first one defines an integer, but the second one defines a tuple with a single element.</div>

As we can see, both lists and tuples can contain values of different type.

Lists, tuples, and strings are called *sequences* in Python, and they have several commonalities:

* their length can be queried with the `len` function
* the `min` and `max` functions find the minimum and maximum element of a sequence, and `sum` adds together all the elements of a sequence of numbers
* Sequences can be concatenated with the `+` operator, and repeated with the `*` operator: `"hi"*3=="hihihi"`
* Since sequences are ordered, we can refer to the elements of a sequence by integers using the *indexing* notation: `"abcd"[2] == "c"`
* Note that the indexing begins from 0
* Negative integers start indexing from the end: -1 refers to the last element, -2 refers to the second last, and so on
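
The points above can be tried out with a short sketch like the following. The variable names are arbitrary and chosen only for this illustration.

```python
seq_list = [3, 1, 2]
seq_tuple = (3, 1, 2)
seq_str = "abcd"

# len works on every sequence type
print(len(seq_list), len(seq_tuple), len(seq_str))

# min, max and sum on a sequence of numbers
print(min(seq_tuple), max(seq_tuple), sum(seq_tuple))

# concatenation and repetition create new sequences
longer = seq_list + [4]
repeated = seq_tuple * 2
print(longer, repeated)

# indexing starts from 0; negative indices count from the end
print(seq_str[0], seq_str[2], seq_str[-1], seq_str[-2])
```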

Above we saw that we can access a single element of a sequence using *indexing*. If we want a subsequence of a sequence, we can use the *slicing* syntax. A slice consists of elements of the original sequence, and it is itself a sequence as well. A simple slice is a range of elements:

In [None]:
s="abcdefg"
s[1:4]

Note that Python ranges exclude the last index. The generic form of a slice is
`sequence[first:last:step]`. If any of the three parameters are left out, they are set to default values as follows: `first=0`, `last=len(sequence)`, `step=1`. So, for instance, `"abcde"[1:]=="bcde"`. The step parameter selects elements that are step distance apart from each other. For example:

In [None]:
print([0,1,2,3,4,5,6,7,8,9][::3])
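
The default values of the slice parameters can be checked with a small sketch such as this one, which uses an arbitrary example string.

```python
s = "abcde"

tail = s[1:]          # last defaults to len(s)
head = s[:3]          # first defaults to 0
every_other = s[::2]  # first and last default, step is 2
whole = s[:]          # all defaults: a copy of the whole sequence

print(tail, head, every_other, whole)
```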

#### • Modifying lists
We can assign values to elements of a list by indexing or by slicing. An example:

In [None]:
L=[11,13,22,32]
L[2]=10          # Changes the third element
print(L)

Or we can assign a list to a slice:

In [None]:
L[1:3]=[4]
print(L)

We can also modify a list by using *mutating methods* of the `list` class, namely the methods `append`, `extend`, `insert`, `remove`, `pop`, `reverse`, and `sort`. Try Python's help functionality to find more about these methods: e.g. `help(list.extend)` or `help(list)`.
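
As a rough illustration of these mutating methods, the following sketch applies each of them in turn to a small list; the comments show the state of the list after each step.

```python
L = [3, 1, 2]
L.append(5)        # add one element to the end         -> [3, 1, 2, 5]
L.extend([4, 1])   # add all elements of another list   -> [3, 1, 2, 5, 4, 1]
L.insert(0, 9)     # insert 9 at index 0                -> [9, 3, 1, 2, 5, 4, 1]
L.remove(1)        # remove the first occurrence of 1   -> [9, 3, 2, 5, 4, 1]
last = L.pop()     # remove and return the last element -> [9, 3, 2, 5, 4]
L.reverse()        # reverse the list in place          -> [4, 5, 2, 3, 9]
L.sort()           # sort the list in place             -> [2, 3, 4, 5, 9]
print(L, last)
```

Note that these methods change the list itself, so their results are not assigned to a new variable here (apart from `pop`, which also returns the removed element).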

<div class="alert alert-warning">Note that we cannot perform these modifications on tuples or strings since they are *immutable*</div>

#### • Generating numerical sequences
Trivial lists can be tedious to write: `[0,1,2,3,4,5,6]`. The function `range` creates numeric ranges automatically. The above sequence can be generated with the function call `range(7)`. Note again that the end value is not included in the sequence. An example of using the `range` function:

In [None]:
L=range(3)
for i in L:
    print(i)
# Note that L is not a list!
print(L)

So `L` is not a list, but it is a sequence. We can for instance access its last element with `L[-1]`. If really needed, it can be converted to a list with the `list` constructor:

In [None]:
L=range(10)
print(list(L))

<div class="alert alert-warning">Note that using a range consumes less memory than the corresponding list. This is because in a list all the elements are stored in the memory, whereas the range generates the requested elements only when needed. For example, when the for loop asks for the next element from the range at each iteration, only a single element from the range exists in memory at the same time. This makes a big difference when using large ranges, like range(1000000).</div>
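
One way to get a feeling for the difference is to compare the sizes reported by `sys.getsizeof`. This is only a rough sketch: the reported numbers depend on the Python version and platform, and for the list the function counts only the list object itself, not the integer objects it refers to.

```python
import sys

n = 1_000_000
r = range(n)
as_list = list(r)

range_size = sys.getsizeof(r)        # small, independent of n
list_size = sys.getsizeof(as_list)   # grows with the number of elements
print("range:", range_size, "bytes")
print("list: ", list_size, "bytes")
```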

The `range` function works in similar fashion as slices. So, for instance the step of the sequence can be given:

In [None]:
print(list(range(0, 7, 2)))

#### •  Sorting sequences

In Python there are two ways to sort sequences. The `sort` *method* modifies the original list, whereas the `sorted` *function* returns a new sorted list and leaves the original intact. A couple of examples will demonstrate this:

In [None]:
L=[5,3,7,1]
L.sort()      # here we call the sort method of the object L
print(L)
L2=[6,1,7,3,6]
print(sorted(L2))
print(L2)

The parameter `reverse=True` can be given (both to `sort` and `sorted`) to get descending order of elements:

In [None]:
L=[5,3,7,1]
print(sorted(L, reverse=True))

#### • Zipping sequences

The `zip` function combines two (or more) sequences into one sequence. If, for example, two sequences are zipped together, the resulting sequence contains pairs. In general, if `n` sequences are zipped together, the resulting sequence contains `n`-tuples. An example of this:

In [None]:
L1=[1,2,3]
L2=["first", "second", "third"]
print(zip(L1, L2))               # Note that zip, like range, does not return a list
print(list(zip(L1, L2)))         # Convert to a list

Here's another example of using the `zip` function.

In [None]:
days="Monday Tuesday Wednesday Thursday Friday Saturday Sunday".split()
weathers="rainy rainy sunny cloudy rainy sunny sunny".split()
temperatures=[10,12,12,9,9,11,11]
for day, weather, temperature in zip(days,weathers,temperatures):
    print(f"On {day} it was {weather} and the temperature was {temperature} degrees celsius.")

# Or equivalently:
#for t in zip(days,weathers,temperatures):
#    print("On {} it was {} and the temperature was {} degrees celsius.".format(*t))

If the sequences are not of equal length, then the resulting sequence will be as long as the shortest input sequence is.
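
A small sketch of this behaviour, using two made-up lists of different lengths:

```python
letters = ["a", "b", "c", "d"]
numbers = [1, 2]

# zip stops as soon as the shortest input is exhausted
pairs = list(zip(letters, numbers))
print(pairs)
```

The elements `"c"` and `"d"` are silently left out of the result.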

#### • Enumerating sequences

In some other programming languages one iterates through the elements using their indices (0,1, ...) in the sequence. In Python we normally don't need to think about indices when iterating, because the `for` loop allows simpler iteration through the elements. But sometimes you really need to know the index of the current element in the sequence. In this case one uses Python's `enumerate` function. In the next example we would like to find the second occurrence of integer 5 in a list.

In [None]:
L=[1,2,98,5,-1,2,0,5,10]
counter = 0
for i, x in enumerate(L):
    if x == 5:
        counter += 1
        if counter == 2:
            break
print(i)

The `enumerate(L)` function call can be thought to be equivalent to `zip(range(len(L)), L)`.
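
This equivalence can be checked with a short sketch that converts both results to lists and compares them; the list is the same one used in the example above.

```python
L = [1, 2, 98, 5, -1, 2, 0, 5, 10]

via_enumerate = list(enumerate(L))
via_zip = list(zip(range(len(L)), L))

print(via_enumerate[:4])          # the first four (index, element) pairs
print(via_enumerate == via_zip)
```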

#### • Dictionaries
A *dictionary* is a dynamic container that maps *keys* to *values*. Instead of using integers to access the elements of the container, the dictionary uses the keys to access the stored values. (Since Python 3.7 a dictionary remembers the insertion order of its keys, but the elements are still accessed by key, not by position.) The dictionary can be created by listing the comma separated key-value pairs in braces. Keys and values are separated by a colon. A tuple (key,value) is called an *item* of the dictionary.

Let's demonstrate the dictionary creation and usage:

In [None]:
d={"key1":"value1", "key2":"value2"}
print(d["key1"])
print(d["key2"])

Keys can have different types even in the same container. So the following code is legal:
`d={1:"a", "z":1}`. The only restriction is that the keys must be *hashable*. That is, there has to be a mapping from keys to integers. Lists are *not* hashable, but tuples are, as long as all their elements are hashable!
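
The following sketch illustrates the restriction by trying both a tuple and a list as a key. The flag `list_key_ok` is just a helper name for this example, and the exact error message may vary between Python versions.

```python
d = {}
d[(1, 2)] = "a tuple key works"

try:
    d[[1, 2]] = "a list key does not"
    list_key_ok = True
except TypeError as e:
    list_key_ok = False
    print("TypeError:", e)

print(d)
```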

There are alternative syntaxes for dictionary creation:

In [None]:
dict([("key1", "value1"), ("key2", "value2"), ("key3", "value3")]) # list of items
dict(key1="value1", key2="value2", key3="value3");

If a key is not found in a dictionary, the indexing `d[key]` results in an error (*exception* `KeyError`). But an assignment with a non-existing key causes the key to be added in the dictionary associated with the corresponding value:

In [None]:
d={}
d[2]="value"
print(d)

In [None]:
# d[1]   # This would cause an error

A dictionary object provides several non-mutating methods:
```
d.copy()
d.items()
d.keys()
d.values()
d.get(k[,x])
```

Some methods mutate the dictionary:
```
d.clear()
d.update(d1)
d.setdefault(k[,x])
d.pop(k[,x])
d.popitem()
```

Try out some of these in the below cell. You can find more info with `help(dict)` or `help(dict.keys)`.

In [None]:
d=dict(a=1, b=2, c=3, d=4, e=5)
d.values()
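
As one possible way to explore the methods listed above, the following sketch exercises a few of them on a small dictionary. The keys and values are arbitrary.

```python
d = dict(a=1, b=2, c=3)

# Non-mutating methods
print(list(d.keys()), list(d.values()), list(d.items()))
missing = d.get("q")          # get returns None instead of raising KeyError
fallback = d.get("q", 0)      # or the given default value

# Mutating methods
d.setdefault("z", 26)         # adds the key because it was missing
removed = d.pop("a")          # removes the key and returns its value
d.update({"b": 20, "y": 25})  # overwrites b and adds y
print(d, removed)
```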

#### • Sets
A set is a dynamic, unordered container. It works a bit like a dictionary, but only the keys are stored, and each key can be stored only once. The set requires that the keys to be stored are hashable. Below are a few ways of creating a set:

In [None]:
s={1,1,1}
print(s)
s=set([1,2,2,'a'])
print(s)
s=set()  # empty set
print(s)
s.add(7) # add one element
print(s)

A more useful example:

In [None]:
s="mississippi"
print(f"There are {len(set(s))} distinct characters in {s}")

The `set` provides the following non-mutating methods:

In [None]:
s=set()
s1=set()
s.copy()
s.issubset(s1)
s.issuperset(s1)
s.union(s1)
s.intersection(s1)
s.difference(s1)
s.symmetric_difference(s1);

The last four operations can be tedious to write when building a more complicated expression. The alternative is to use the corresponding operator forms: `|`, `&`, `-`, and `^`. An example of these:

In [None]:
s=set([1,2,7])
t=set([2,8,9])
print("Union:", s|t)
print("Intersection:", s&t)
print("Difference:", s-t)
print("Symmetric difference", s^t)

There are also the following mutating methods:
```
s.add(x)
s.clear()
s.discard(x)
s.pop()
s.remove(x)
```

And the set operators `|`, `&`, `-`, and `^` have the corresponding mutating, augmented assignment forms: `|=`, `&=`, `-=`, and `^=`.
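
A brief sketch of the augmented assignment forms follows. The name `alias` is introduced here only to show that the original set object is modified in place rather than replaced by a new one.

```python
s = {1, 2, 7}
t = {2, 8, 9}
alias = s          # a second name for the same set object

s |= t             # in-place union                -> {1, 2, 7, 8, 9}
s &= {1, 2, 3}     # in-place intersection         -> {1, 2}
s -= {1}           # in-place difference           -> {2}
s ^= {2, 5}        # in-place symmetric difference -> {5}

print(s, alias is s)
```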

### Compact way of creating data structures
We can now easily create complicated data structures using `for` loops:

In [None]:
L=[]
for i in range(10):
    L.append(i**2)
print(L)

Because this kind of pattern is often used, Python offers a short-hand for this. A *list comprehension* is an expression that allows creating complicated lists on one line. The notation is familiar from mathematics:

$\{a^3 : a \in \{1,2, \ldots, 10\}\}$

The same written in Python as a list comprehension:

In [None]:
L=[ a**3 for a in range(1,11)]
print(L)

The generic form of a list comprehension is:
`[ expression for element in iterable lc-clauses ]`.
Let's break this syntax into pieces. The iterable can be any sequence (or something more general). The lc-clauses consist of zero or more of the following clauses:

* for elem in iterable
* if expression

A more complicated example. How would you describe these numbers?

In [None]:
L=[ 100*a + 10*b +c for a in range(0,10)
                    for b in range(0,10)
                    for c in range(0,10) 
                    if a <= b <= c]
print(L)
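
To see how the clauses of a comprehension are read, it may help to write the same computation with ordinary nested `for` loops. In this sketch the clauses appear in the same order from the outermost loop to the innermost `if`; the name `L_loops` is chosen only for this comparison.

```python
L = [100*a + 10*b + c for a in range(0, 10)
                      for b in range(0, 10)
                      for c in range(0, 10)
                      if a <= b <= c]

# The equivalent nested loops
L_loops = []
for a in range(0, 10):
    for b in range(0, 10):
        for c in range(0, 10):
            if a <= b <= c:
                L_loops.append(100*a + 10*b + c)

print(L == L_loops)
```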

#### • Miscellaneous stuff

To find out whether a container includes an element, the `in` operator can be used. The operator returns a truth value. Some examples of the usage:

In [None]:
print(1 in [1,2])
d=dict(a=1, b=3)
print("b" in d)
s=set()
print(1 in s)
print("x" in "text")

As a special case, for strings the `in` operator can be used to check whether a string is part of another string:

In [None]:
print("issi" in "mississippi")
print("issp" in "mississippi")

Elements of a container can be unpacked into variables:

In [None]:
first, second = [4,5]
a,b,c = "bye"
print(c)
d=dict(a=1, b=3)
key1, key2 = d
print(key1, key2)

In membership testing and unpacking only the keys of a dictionary are used, unless either values or items (like below) are explicitly asked.

In [None]:
for key, value in d.items():
    print(f"For key '{key}' value {value} was stored")

To remove the binding of a variable, use the `del` statement. For example:

In [None]:
s="hello"
del s
# print(s)    # This would cause an error

To delete an item from a container, the `del` statement can again be applied:

In [None]:
L=[13,23,40,100]
del L[1]
print(L)

In similar fashion `del` can be used to delete a slice. Later we will see that `del` can delete attributes from an object.
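
For instance, deleting a slice could look like this (a minimal sketch with an arbitrary list):

```python
L = [13, 23, 40, 100, 7]
del L[1:3]     # removes the elements at indices 1 and 2
print(L)
```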

If the result of a list comprehension needs to be iterated through only once, it is more memory efficient to use a *generator expression* instead. The only thing that changes syntactically is that the surrounding brackets are replaced by parentheses:

In [None]:
G = ( 100*a + 10*b + c for a in range(0,10)
                       for b in range(0,10)
                       for c in range(0,10) 
                       if a <= b <= c )
print(sum(G))   # This iterates through all the elements from the generator
print(sum(G))   # It doesn't restart from the beginning, so all elements are already consumed

<div class="alert alert-warning">Note above that one can only iterate through the generator once.</div>

Similarly, a *dictionary comprehension* creates a dictionary:

In [None]:
d={ k : k**2 for k in range(10)}
print(d)

And a *set comprehension* creates a set:

In [None]:
s={ i*j for i in range(10) for j in range(10)}
print(s)

#### • Creating strings
A string is a sequence of characters commonly used to store input or output data in a program. The characters of a string are specified either between single (`'`) or double (`"`) quotes. This optionality is useful if, for example, a string needs to contain a quotation mark:
`"I don't want to go!"`. You can also achieve this by *escaping* the quotation mark with the backslash: `'I don\'t want to go'`.

The string can also contain other escape sequences like `\n` for a newline and `\t` for a tab character. See [literals](https://docs.python.org/3/reference/lexical_analysis.html#literals) for a list of all escape sequences.

In [None]:
print("One\tTwo\nThree\tFour")

A string containing newlines can be easily given within triple double or triple single quotes:

In [None]:
s="""A string
spanning over
several lines"""
s

Although we can concatenate strings using the `+` operator, for efficiency reasons one should use the `join` method to concatenate a larger number of strings:

In [None]:
a="first"
b="second"
print(a+b)
print(" ".join([a, b, b, a]))   # Here we introduce the join method


Sometimes printing by concatenation from pieces can be clumsy:

In [None]:
print(str(1) + " plus " + str(3) + " is equal to " + str(4))
# slightly better
print(1, "plus", 3, "is equal to", 4)

The multiple concatenation and quotation characters break the flow of thought. *String interpolation* offers somewhat easier syntax.

There are multiple ways to do string interpolation:

* Python format strings (the `%` operator)
* the `format` method
* f-strings

Examples of these can be seen below:

In [None]:
print("%i plus %i is equal to %i" % (1, 3, 4))     # Format syntax

print("{} plus {} is equal to {}".format(1, 3, 4)) # Format method

print(f"{1} plus {3} is equal to {4}")             # f-string

The `i` format specifier in the format syntax corresponds to integers and the specifier `f` corresponds to floats. When using f-strings or the `format` method, integers use `d` instead. With f-strings and the `format` method the specifiers can usually be omitted, and they are generally used only when specific formatting is required. For example, the f-string `f"{4:3d}"` formats the number 4 left-padded with spaces to a width of 3 characters.
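
The padding described above can be made visible by surrounding the formatted number with brackets. The following sketch does this with all three interpolation styles; the variable names are chosen only for this example.

```python
padded_percent = "%3i" % 4            # format syntax
padded_format = "{:3d}".format(4)     # format method
padded_fstring = f"{4:3d}"            # f-string

# The brackets show where the padded field begins and ends
print(f"[{padded_percent}] [{padded_format}] [{padded_fstring}]")

# Without a specifier no padding is applied
print(f"[{4}]")
```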

It is often useful to specify the number of decimals when printing floats:

In [None]:
print("%.1f %.2f %.3f" % (1.6, 1.7, 1.8))               # Old style
print("{:.1f} {:.2f} {:.3f}".format(1.6, 1.7, 1.8))     # newer style
print(f"{1.6:.1f} {1.7:.2f} {1.8:.3f}")                 # f-string

The specifier `s` is used for strings. An example:

In [None]:
print("%s concatenated with %s produces %s" % ("water", "melon", "water"+"melon"))
print("{0} concatenated with {1} produces {0}{1}".format("water", "melon"))
print(f"{'water'} concatenated with {'melon'} produces {'water' + 'melon'}")

Look [here](https://pyformat.info/#number) for more details about format specifiers, and for comparison between the old and new style of string interpolation.

Different ways of string interpolation have different strengths and weaknesses. Generally choosing which to use is a matter of personal preference. On this course examples and model solutions will predominantly use f-strings and the `format` method.