Fill this form to be invited to the shared folder!

https://forms.gle/askYJH8mkKNTJH4j8


# Why Google Colab (Colaboratory)

**Deep Learning** has recently achieved outstanding performance in several domains, ranging from speech recognition to visual object detection and recognition.
Deep *convolutional neural networks* have proven effective in processing images, video, speech and audio, whereas *recurrent neural networks* are particularly suited for sequential data such as text and speech. 

Deep Neural Networks, with *many* layers of information processing between input and output, are able to represent data with multiple levels of abstraction. Training such deep architectures in reasonable time was made possible in the last decade by the advent of fast graphics processing units (**GPUs**).





**Google Colab (Colaboratory)** is a free environment that runs in the cloud and stores its files in Google Drive. It lets you develop Python applications using **notebooks** and execute them on a **free GPU** (an Nvidia Tesla K80 at the time of writing). The environment requires no setup and is particularly well suited to developing deep learning applications, since dedicated libraries such as **TensorFlow** and **Keras** are already available on the Colab virtual machine.

 


Google Colab is based on **Project Jupyter**, an organization created to support *interactive data science and scientific computing* through the development of open-source software.
The term "**Notebook**" may refer both to the notebook document and to the web-based environment used to create it. Jupyter Notebook can connect to an **IPython kernel** to allow interactive programming in Python. It uses an internal library to convert the document into HTML, which allows visualization and editing in the browser.

The document you are reading, with extension .ipynb, is a notebook: it consists of an ordered **list of cells**, which can contain code, text, mathematics, output, and plots.

The **key features** of a Jupyter Notebook are the following:

- it consists of *executable Python code* enriched with *text-editing capabilities*;
- it allows interactive development: small pieces of code (cells) can be executed independently.





**To sum up:**

- Google Colab(oratory) is an environment for writing Jupyter Notebooks and executing Python code;
- Execution runs entirely in the cloud, on preconfigured virtual machines that rely on free GPUs made available by Google;
- Notebooks are stored in Google Drive, which allows you to share, comment, and collaborate on the same document with multiple people.


## Two types of Cells
A notebook is a list of cells. There are two types of cell:
- Code cells:  contain executable code.
- Markdown cells: contain explanatory text.




#### Code cell
A code cell contains executable code and displays its output just below.
The following cells are examples of code cells: execute them by clicking the play button or pressing Ctrl+Enter.

In [None]:
# this is an executable code cell. 

# first python snippet
a = 2
b = 4
a*b

8

In [None]:
print(a*3)

6


System aliases: prefix a command with an exclamation mark to run it in the terminal.

In [None]:
# Jupyter includes shortcuts for common operations
!which python
!python --version

/usr/local/bin/python
Python 3.6.9


In [None]:
# what is in this directory? Using command line (unix only)
!ls -alh #all files, long format, human-readable size

total 16K
drwxr-xr-x 1 root root 4.0K Oct 28 16:30 .
drwxr-xr-x 1 root root 4.0K Nov  6 07:52 ..
drwxr-xr-x 1 root root 4.0K Nov  3 17:17 .config
drwxr-xr-x 1 root root 4.0K Oct 28 16:30 sample_data


#### Text cell 
This is a **text cell**! Text cells use *markdown syntax*: a plain-text formatting syntax for writing rich text that can be converted to HTML!

You can include well-formatted text, formulas, and fancy images too!

![image logo](https://prod-discovery.edx-cdn.org/media/course/image/493b81e0-eb2e-4a41-acf4-dc39273c16cf-006892747a54.small.jpg)

To learn more, see  the [markdown guide](/notebooks/markdown_guide.ipynb)


**Practice with some COLAB shortcuts**

Ctrl+Enter: run the selected cell

Ctrl+M and then A: insert code cell above

Ctrl+M and then B: insert code cell below

Ctrl+M and then D: delete cell

Ctrl+M and then M: convert to text

Ctrl+M and then Y: convert to code

Ctrl+F9: run all cells

TOOLS --> COMMAND PALETTE / KEYBOARD SHORTCUTS

# Notebook Under the Hood 
- [jupyter documentation](https://jupyter.readthedocs.io/en/latest/projects/architecture/content-architecture.html)



## Terminal IPython
The original IPython interface runs an interactive shell built with Python. The name *shell* indicates that it is the outermost layer around the kernel (i.e. the engine capable of executing code) and lets the user access it through a user interface.
It does something like this:
```python
while True:
  code = input(">>> ")  # prompt the user for some code
  exec(code)        # execute it
```

This model is often called a REPL, or Read-Eval-Print Loop.
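The loop above only reads and executes. A slightly fuller sketch of the read-eval-print cycle is shown below; it is illustrative only and is not how IPython is actually implemented. The function name `run_repl` is hypothetical, and the input is a fixed list of source lines instead of an interactive prompt so that the example terminates.

```python
import io
from contextlib import redirect_stdout


def run_repl(lines, namespace=None):
    """Run each source line through a read-eval-print cycle and return the printed text."""
    namespace = {} if namespace is None else namespace
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        for source in lines:                                    # Read
            try:
                # Expressions compile in "eval" mode and yield a value.
                code = compile(source, "<repl>", "eval")
            except SyntaxError:
                # Statements (e.g. assignments) are executed instead.
                exec(compile(source, "<repl>", "exec"), namespace)
            else:
                result = eval(code, namespace)                  # Eval
                if result is not None:
                    print(repr(result))                         # Print
    return buffer.getvalue()                                    # Loop ends with the input


transcript = run_repl(["a = 2", "b = 4", "a * b", "print(a * 3)"])
print(transcript, end="")
```

Echoing the value of a bare expression such as `a * b` is what makes the last line of a code cell appear as output without an explicit `print`.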




## Notebook and IPython Kernel
Several interfaces (e.g. Notebooks) use the IPython Kernel. The IPython Kernel is a separate process which is responsible for running user code and for tasks such as computing possible completions. The notebook frontend communicates with the IPython Kernel using JSON messages sent over the sockets provided by a messaging library named ZeroMQ. Sockets are objects for sending and receiving data between local processes (*unix domain sockets*) or across a computer network (*network sockets*).

WebSocket is a communication protocol that allows client-server communication in both directions simultaneously.
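To make the idea of "JSON messages" concrete, the sketch below builds a dictionary shaped like an `execute_request` message and round-trips it through JSON. The field names follow the Jupyter messaging protocol as documented, but the helper `make_execute_request`, the user name, and the version string are assumptions for illustration. Real messages are also signed and split into several ZeroMQ frames, which is not shown here.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id):
    """Build a dict shaped like a Jupyter 'execute_request' message (simplified)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": "student",          # hypothetical user name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",               # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,                   # the cell source the kernel should run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


message = make_execute_request("a * b", session_id=uuid.uuid4().hex)
wire_text = json.dumps(message)             # what the frontend would serialize
decoded = json.loads(wire_text)             # what the kernel side would parse
print(decoded["header"]["msg_type"], "->", decoded["content"]["code"])
```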

![notebook image](https://jupyter.readthedocs.io/en/latest/_images/notebook_components.png)





## Notebook files
Furthermore, the Notebook frontend stores code and output, together with markdown notes, in an editable document called a notebook. 

When you save it, this is sent from your browser to the notebook server, which saves it on disk as a JSON file with a .ipynb extension. JSON stands for JavaScript Object Notation: it is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute-value pairs and array data types. Try to download this notebook and look at it with a text editor!

```
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "1_Introduction.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "metadata": {
        "id": "MB7xR0lO6TNt",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Why Google Colab (Colaboratory)"
      ]
    },
   ...
   ]
   ...
}
```
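Since a notebook is plain JSON, it can also be inspected with a few lines of Python. The sketch below builds a tiny notebook-like document in memory (its content is made up for illustration and mirrors the structure shown above) and lists the type and source of each cell. The function name `summarize_cells` is hypothetical.

```python
import json

# A minimal notebook-like document, serialized to text as it would be on disk.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Why Google Colab (Colaboratory)"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["a = 2\n", "b = 4\n", "a*b"],
        },
    ],
})


def summarize_cells(text):
    """Return a (cell_type, source) pair for every cell in a notebook JSON string."""
    notebook = json.loads(text)
    # "source" is stored as a list of lines, so join it back into one string.
    return [(cell["cell_type"], "".join(cell["source"])) for cell in notebook["cells"]]


summary = summarize_cells(notebook_text)
for cell_type, source in summary:
    print(f"[{cell_type}] {source!r}")
```

To inspect a downloaded notebook instead, the same function can be given the text read from the `.ipynb` file.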



The notebook server, not the kernel, is responsible for saving and loading notebooks, so you can edit notebooks even if you don’t have the kernel for that language—you just won’t be able to run code. The kernel doesn’t know anything about the notebook document: it just gets sent cells of code to execute when the user runs them.



# TensorFlow and Keras

## TensorFlow

TensorFlow is an open-source software library, written in Python and C++, used for expressing and executing machine learning algorithms. It was developed by Google Brain and released in November 2015.

Abstract from **Abadi, Martín, et al. 2015**:

*TensorFlow is **an interface for expressing machine learning algorithms, and an implementation for executing such algorithms**. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from **mobile devices** such as phones and tablets up to **large-scale distributed systems** of hundreds of machines and thousands of computational devices such as **GPU cards**. The system is flexible and can be used to express a wide variety of algorithms, including **training and inference algorithms for deep neural network models**, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including **speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery**. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an **open-source package** under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org*

**TensorFlow 2.0** was released on September 30th, 2019: according to the authors, it is an easier-to-use, more flexible, and more powerful platform for developing and deploying Machine Learning applications.
Read the Tensorflow [blog post](https://medium.com/tensorflow/tensorflow-2-0-is-now-available-57d706c2a9ab) for further information.


## Keras
- https://keras.io/

Keras is a **high-level API** for building and training deep learning models, and it is **capable of running on TensorFlow**, CNTK, and Theano.
   

API stands for "Application Programming Interface". In general, it is a set of *commands*, *functions*, *protocols*, and *objects* that programmers can use to create software or interact with an external system. It makes developing a program easier by providing the building blocks.


Keras, in particular, is used for fast prototyping, advanced research, and production, with three key advantages:
- *User friendly*: Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.
- *Modular and composable*: Keras models are made by connecting configurable building blocks together, with few restrictions.
- *Easy to extend*: Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

Furthermore:
- it allows the same code to run seamlessly on CPU and GPU
- it has built-in support for Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)
- it supports arbitrary network architectures (functional API)
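As an illustration of the last point, the functional API can describe networks that are not a simple stack of layers. The sketch below assumes TensorFlow 2.x is installed and uses its bundled `tf.keras` module. The input sizes, layer widths, and names are arbitrary choices for the example.

```python
import numpy as np
import tensorflow as tf

# Two separate inputs (sizes are arbitrary, for illustration only).
numeric_input = tf.keras.Input(shape=(4,), name="numeric")
extra_input = tf.keras.Input(shape=(3,), name="extra")

# Each input is processed by its own branch...
numeric_branch = tf.keras.layers.Dense(8, activation="relu")(numeric_input)
extra_branch = tf.keras.layers.Dense(8, activation="relu")(extra_input)

# ...and the branches are merged before the output layer.
merged = tf.keras.layers.Concatenate()([numeric_branch, extra_branch])
output = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[numeric_input, extra_input], outputs=output)

# Run the untrained model on a dummy batch of two samples.
prediction = model.predict(
    [np.zeros((2, 4), dtype="float32"), np.zeros((2, 3), dtype="float32")],
    verbose=0,
)
print(prediction.shape)
```

Layers are called like functions on tensors, so branching and merging are expressed with ordinary Python variables.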

### Easy prototyping of Deep Learning models
Typical workflow for developing with Keras:

1. Define training data (input tensor and target tensor)
2. Define a model (network as a combination of layers) that maps input tensors to target tensors
3. Configure the learning process by choosing a loss function, an optimizer, and a metric to monitor
4. Train your model on your data
5. Finally, evaluate your model on test data.
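The five steps can be sketched as follows, assuming TensorFlow 2.x and its `tf.keras` module. The data are random numbers with a made-up labelling rule, and the layer sizes, optimizer, and number of epochs are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
import tensorflow as tf

# 1. Define the data: random inputs and a synthetic binary target (illustrative only).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(256, 4)).astype("float32")
y_train = (x_train.sum(axis=1) > 0).astype("float32")
x_test = rng.normal(size=(64, 4)).astype("float32")
y_test = (x_test.sum(axis=1) > 0).astype("float32")

# 2. Define a model as a stack of layers mapping inputs to targets.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 3. Configure the learning process: loss function, optimizer, and a metric to monitor.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 4. Train the model on the training data.
history = model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)

# 5. Evaluate the model on held-out test data.
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"test loss: {test_loss:.3f}, test accuracy: {test_accuracy:.3f}")
```

The exact numbers vary from run to run, because the weights are initialized randomly.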

### tf.keras

- https://www.tensorflow.org/guide/keras

One of the main novelties of TensorFlow 2.0 is the tight integration of Keras into TensorFlow. Keras was, however, already available as a TensorFlow submodule in the 1.x series (`tf.keras` entered the core API in TensorFlow 1.4).

Instead of resorting to an external library (`keras`), we can directly use a TensorFlow module (`tf.keras`) that implements the same Keras high-level API for TensorFlow.


Release notes for Keras 2.3.0, published on September 17th, 2019, by François Chollet (Keras creator):


- ***This release brings the API in sync with the tf.keras API as of TensorFlow 2.0.*** *However note that it does not support most TensorFlow 2.0 features, in particular eager execution. If you need these features, use tf.keras.*

- *This is also the last major release of multi-backend Keras. Going forward, **we recommend that users consider switching their Keras code to tf.keras in TensorFlow 2.0**.*

**More about it:**
Many other software frameworks and libraries for Deep Learning exist. A comparison is available [here](https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software).

# Some relevant FAQs from [here](https://research.google.com/colaboratory/faq.html).
**Seems too good to be true. What are the limitations?**
>Colab resources are not guaranteed and not unlimited, and the usage limits sometimes fluctuate. This is necessary for Colab to be able to provide resources for free. For more details, see Resource Limits.

**[More on Resource Limits](https://research.google.com/colaboratory/faq.html#resource-limits)**

**What is the difference between Jupyter and Colaboratory?**
>Jupyter is the open source project on which Colaboratory is based. Colaboratory allows you to use and share Jupyter notebooks with others without having to download, install, or run anything on your own computer other than a browser.

**Where are my notebooks stored, and can I share them?**
>All Colaboratory notebooks are stored in Google Drive. Colaboratory notebooks can be shared just as you would with Google Docs or Sheets. Simply click the Share button at the top right of any Colaboratory notebook.

**Where is my code executed? What happens to my execution state if I close the browser window?**
>Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle  (*not being used*) for a while, and have a maximum lifetime enforced by the system.

**How can I get my data out?**
>You can download any Colaboratory notebook that you’ve created from Google Drive following [these instructions](https://support.google.com/drive/answer/2423534), or from within Colaboratory’s File menu. All Colaboratory notebooks are stored in the open source Jupyter notebook format ( .ipynb).



> **IMPORTANT: Remember to terminate the session and disable the GPU when you are not using it**, to avoid frequent downtime.

# Optional
Try to download this notebook and run it on your local machine!
Suggestion: download and install Anaconda from https://www.anaconda.com/download.

Anaconda is a package manager, an environment manager, a Python distribution, and a collection of more than 1,000 open-source packages. Among the available packages:
1. **jupyter**;
2. **numpy**:  fundamental package for scientific computing with Python;
3. **matplotlib**: a Python plotting library;
4. **scikit-learn**: a Machine Learning library in Python;
5. **NLTK** (Natural Language Toolkit): platform for building Python programs to work with human language data.

See the [cheat sheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf) for further details.


# References

- Abadi, Martín, et al. "TensorFlow: A system for large-scale machine learning." 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016.

**email contact**: alessandro.renda@unifi.it