# The Cloud, Part II: Virtualization

## Front Matter
### October 21st 2021 - Version 2.1.0

### Contact Details
<div class="alert alert-warning">

 - Dr. James Percival
 - Room 4.85 RSM building (ask first)
 - email: j.percival@imperial.ac.uk
 - Teams: <code>@Percival, James R</code> in module Team, or DM me.
</div>

### Learning Objectives

- Understand the basic difference between an emulator, a virtual machine and a container
- Know how to provision your own virtual machines on Azure
- Know the basics of building your own Docker container
- Understand the basics of using SQL in Python

## Virtualization & Containerization

### Compatibility layers/Emulation

Before considering full virtualization, there are compatibility layers (such as the Windows Subsystem for Linux version 1, WSL1, which allows Linux programs to run on Windows, or WINE, which does the reverse). Very similar are the various translation layers released for Mac OS X during hardware transitions, from PowerPC to Intel chips and from 32 to 64 bit software. Apple's Rosetta 2 plays the same role in the current transition from Intel to ARM-based Apple silicon chips.

 These software layers act at the level of application, which allows software created for one operating system to run on another operating system by translating commands intended for one interface into commands intended for another. Very similar is emulation, in which software is created to play the part of hardware not physically present in the machine.

### Virtual Machines

Servers in data centres are often designed to maximize the number of processor cores available on every motherboard, with multiple CPU sockets and high core counts on the individual dies. However the number of cores users actually need varies, often as low as 1 core for simple, low priority serial tasks. Virtualization allows Cloud providers to portion out the cores of a particular server board between multiple users, with each of them experiencing behaviour as if they had access to an entire machine.

Conceptually, virtualization is fairly simple: Computer software has developed in layers, which communicate with each other via standard interfaces. This allows equipment from multiple manufacturers to work together. In a physical computer the hardware & firmware (e.g. the BIOS) sit below the operating system. In a virtual machine a second (guest) operating system sits above a software layer, running on the host operating system, which provides an interface to some of the underlying hardware, typically only a subset of the real hardware available to the host.

### A Concrete example: Buying time on an Azure Virtual Machine

Azure provides virtual machine images in Windows and linux. The first exercise talks through setting up your own linux VM via the portal GUI. In short, the instructions are

1. Sign into the [Azure portal](https://portal.azure.com)
2. From the **All services** blade search for and select **Virtual machines**, then click **+ Add** and choose **+ Virtual machine**
3. Set up the machine with appropriate settings in terms of base image and access protocol.

Remember that while the machine is running, it will burn through your subscription credit, so always shut down and delete VMs you no longer want.

## Azure Web services & Web Apps

One of the key questions with cloud services is which protocol to use to access them. For Azure services there are three major options:

- Remote Desktop Protocol (RDP), to access Windows (and some linux) virtual machines and to use them in the same manner as a desktop.
- Secure Shell (SSH), to access a terminal on VMs (or apps on linux through X forwarding)
- Hypertext Transfer Protocol (HTTP/HTTPS) to access services via the web, whether through a browser, or another application.

Let's look further at how these work:

##### RDP

The remote desktop protocol (RDP) allows you to connect to a remote Windows machine via a network and use it as if it were on your desk. Clients are available on Windows, Mac & linux, as well as on Android and iOS.

##### SSH

[SSH](https://v4.software-carpentry.org/shell/ssh.html), the secure shell, provides a cryptographically secured shell (i.e. prompt, terminal or command line interface) connection from one computer to another via a network. It supports login via username & password, or via the exchange of cryptographic keys (that is, via [public key cryptography]()). Implementations, both of the server and the client, are available on Windows, linux and Mac OS X.

<div class="alert alert-warning">
Public key cryptography is based on the idea that some mathematical operations are hard to "undo" without secret information. Perhaps the most famous algorithm, [RSA](), uses a result about prime numbers: given two large primes $p$ and $q$, if

$$ e\cdot d = 1 \mod (p-1)(q-1) $$
with $e$ and $(p-1)(q-1)$ coprime, then it's true that
$$ m^{ed} = m \mod pq $$

So if I *pick* $e$ and work out a suitable $d$, I can keep $d$ secret (my private key), but send you $e$ and $n=pq$.

Now **you** can send **me** a message $m$ by calculating
$$c = m^e \mod n $$
which I read by calculating
$$ c^d \mod n = m^{de}\mod n = m \mod n.$$
Similarly I can send you a message which you know comes from me
by calculating
$$c = m^d \mod n $$
which you read by calculating
$$ c^e \mod n = m^{de}\mod n = m \mod n.$$

As far as we know, the only practical way of breaking this code is to factorise $n = pq$, which gets prohibitively expensive as $p$ and $q$ grow.

</div>
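
The arithmetic in the box above can be checked with a toy example. This is a sketch only: the primes $p=61$ and $q=53$, the exponent $e=17$ and the message $m=65$ are illustrative choices which are far too small to be secure, and real implementations add padding schemes that are not shown here.

```python
# Toy RSA with deliberately tiny primes (insecure, for illustration only).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # (p-1)(q-1)

e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent with e*d = 1 mod phi (Python 3.8+)

m = 65                     # the message, an integer smaller than n

# You encrypt with my public key (e, n); I decrypt with my private key d.
c = pow(m, e, n)
decrypted = pow(c, d, n)

# I sign with my private key d; you verify with my public key (e, n).
signature = pow(m, d, n)
verified = pow(signature, e, n)

print("private exponent:", d)
print("ciphertext:", c)
print("round trips recover m:", decrypted == m, verified == m)
```

The three-argument form of `pow` performs modular exponentiation without ever forming the huge intermediate power, which is what makes the scheme usable with realistic key sizes.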

Using a command in the form `ssh <user>@<server name>`, e.g. `ssh jrper@sshgw.ic.ac.uk` one can open a connection to a remote machine using the default port number (22).

### HTTP/HTML

By now you are probably sick of hearing about HTTP, so I won't say any more here.

### Using Azure App Services to serve apps

[Azure Web Apps Services](https://azure.microsoft.com/en-gb/services/app-service/web/) delivers HTTP-based (especially Flask-based) apps directly from GitHub. We'll do a live demonstration of putting an app on the web from Flask code we put on GitHub.

## Containers

Containers represent a lightweight form of virtualization, where a system runs one operating system kernel, but potentially with many userspaces and filesystems (run from file images on the host system) on top of it. By virtualizing at such a high level, tens or hundreds of containers can run simultaneously on the same machine.

### Docker

The most famous and widely used container system is [Docker](https://docker.com). The most used version of this tool uses the Linux kernel to produce a (somewhat) sandboxed userspace connected to its own filesystem. In Docker terminology, scripts called Dockerfiles are used to configure bundles called "images" containing a frozen system which can be copied and unfrozen as a container, including a default executable to run. A simple example Dockerfile might look like the following:

_Dockerfile_

```
# set base image to build on
FROM python:3.8

# set/create current working directory inside container
WORKDIR /example

# copy a file from the host to the container
COPY requirements.txt .

# run a shell command
RUN pip install -r requirements.txt

# default command to run when container starts
ENTRYPOINT ["python"]
```

As with Bash shell commands, `#` is used to start a comment in a Dockerfile and is not interpreted when building an image. The genuine commands start with an instruction to Docker (written in capital letters here), followed by arguments to that instruction. The full list of possible commands is available [here](https://docs.docker.com/engine/reference/builder/), along with a smaller cheat sheet [here](https://www.docker.com/sites/default/files/d8/2019-09/docker-cheat-sheet.pdf).

As a short summary:

|Command| Example | Usage |
|:-----:|:-------:|:-----:|
|FROM | `FROM python:3.8`| Specify base image to start from.|
| COPY | `COPY file.txt /home` | Copy a local file to the container image.|
| ADD  | `ADD https://example.com/test.zip .` | Copy local or remote file to container image. |
| WORKDIR | `WORKDIR /home/user` | Switch to/create directory in container.|
| RUN     | `RUN python myfile.py`   | Execute command in container image. |
| CMD     | `CMD echo "hello"`       | Set default command for `docker run` on image.|
| ENTRYPOINT | `ENTRYPOINT ["echo"]` | Set default command for `docker run` on image (in this list form it accepts command line arguments). |

For the Dockerfile given above, we can tell a machine with Docker running to build the instructions in the Dockerfile into a container image using a command with the syntax

```
docker build [OPTIONS] PATH
```

inside the directory containing the Dockerfile and `requirements.txt` file. For example:

```
docker build --tag example .
```
Docker then processes the file line by line into an image, storing each stage as an intermediate checkpoint. When the image is complete it can be run as a container using a command of the form

```
docker run [OPTIONS] IMAGE [COMMAND]
```

The default command is set with the CMD or ENTRYPOINT instruction. So to run our Python example interactively connected through the terminal we run

```
docker run -it example
```

This will give you access to the python interpreter _inside_ the container, with only system libraries and the packages from the `requirements.txt` available.

Docker container images can be uploaded to and downloaded from a site called [Docker Hub](https://hub.docker.com/), which acts as a repository for them in the same way GitHub does for code. This allows Docker images to be shared easily. If you attempt to run an image whose tag name is not available on your computer, Docker will try to download it from Docker Hub.

### Azure container instances

Azure provides a service to run web apps and commands packaged into containers. This provides a simple, testable way to run your programs as a web app.


<div class="alert alert-warning">

### Local Python GUIs

There exist a number of GUI Toolkits compatible with Python, including [TK](https://docs.python.org/3/library/tk.html), [GTK+](https://python-gtk-3-tutorial.readthedocs.io/en/latest/) and [QT5](https://www.riverbankcomputing.com/static/Docs/PyQt5/). We'll give an example of the use of the last one, since it interacts well with Anaconda.

The following requires the `qtpy` package, which is a thin wrapper and itself needs a Qt binding such as PyQt5 to be installed.

```bash
pip install qtpy
```

When run, this script creates a basic window with two buttons. The "Greet" button directs a greeting to your console, the "Close" button closes the window. Although small, this toy example demonstrates the use of Python to generate and control a widget, and can easily be extended.

Note that this code is written to work locally in a terminal. If you are attempting to run it in a Jupyter session then:
1. The session will have to be running on a local system or one which you connect to via a windowing system (e.g. RDP, or with a suitable SSH connection with X forwarding).
2. You will need to use the `%gui qt` iPython magic (or whichever is appropriate for your choice of GUI toolkit).
    
</div>

In [1]:
%gui qt

from qtpy import QtWidgets, QtCore
import sys

class MainWindow(QtWidgets.QMainWindow):
    
    def __init__(self, parent=None):

        super().__init__()
        self.setWindowTitle("Hello world!")
        
        widget = QtWidgets.QWidget()
        self.setCentralWidget(widget)

        # create the layout unparented; setLayout() below attaches it to the central widget
        layout = QtWidgets.QVBoxLayout()
        widget.setLayout(layout)
        
        self.label = QtWidgets.QLabel("A qt GUI", self)
        self.label.setAlignment(QtCore.Qt.AlignCenter)
        layout.addWidget(self.label)
        
        self.greet_button = QtWidgets.QPushButton("Greet", self)
        self.greet_button.clicked.connect(self.greet)
        layout.addWidget(self.greet_button)
        
        self.close_button = QtWidgets.QPushButton("Close", self)
        self.close_button.clicked.connect(self.close)
        layout.addWidget(self.close_button)
        
    def greet(self, checked=False):
        # the clicked signal passes the button's "checked" state
        print("Greetings!")
        
    def quit(self):
        QtWidgets.QApplication.instance().exit()
        
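# reuse the application object if the %gui qt magic has already created one
app = QtWidgets.QApplication.instance() or QtWidgets.QApplication(sys.argv)
win = MainWindow()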
win.show()

Greetings!


Qt is installed with Anaconda, but there are a large number of other GUI toolkits out there, including PyGTK for the GTK toolkit and [`tkinter`](https://docs.python.org/3/library/tkinter.html) for Tcl/Tk. The last of these is part of the standard Python library.


## Security and the Cloud

### Firewalls

In general, computers and services connected to the internet for a significant time should expect to be attacked by malicious users, whether in order to gain illicit access to the system to suborn it to their own purposes, or to deny it to others via [denial of service]() attacks, whether from a single location, or from a distributed network. One protection against this is to use [firewalls]() to limit access to systems to a list of [IP addresses]() from which requests are accepted.

Azure in particular provides controls on network interfaces to [limit the ports and services](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/nsg-quickstart-portal) which are available over the network. The default option (and the safest option) is usually to deny access unless it is specifically permitted.
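
The "deny unless specifically permitted" rule can be sketched in a few lines with the standard library `ipaddress` module. This is an illustration of the logic only, not of how Azure implements its network rules; the networks listed and the name `is_allowed` are made up for the example.

```python
import ipaddress

# Hypothetical allowlist: one example office network and one single admin host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(source_ip):
    """Return True only if source_ip falls inside an allowed network."""
    address = ipaddress.ip_address(source_ip)
    # default deny: anything not matched by a rule is rejected
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_allowed("192.0.2.45"))    # inside the office network
print(is_allowed("203.0.113.9"))   # not on the list
```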

### App Authentication & Authorization

#### Single Sign On (SSO)

Understanding of how to deal with passwords has improved over the years, but it is still very easy to make a mistake. On the other hand, as a technically trained person it's possible that it's something you will one day be asked to organize (or manage). Current best practice is at or above the following protocols:

1. Use HTTPS for your initial communication.
2. When a user picks a password, add a random, per-user "salt" to it, and then apply a cryptographic hashing algorithm designed for passwords (i.e. one which is deliberately slow to compute).
3. Store the salt & hashed password along with your immutable user key (not necessarily username) as your password database. Forget the clear text password as soon as possible.
4. When a user logs in (sending the clear text password), apply the same algorithm as in step 2 using the stored salt, and then compare the results.
5. Regardless, secure your database and only grant access on a need to know basis.
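
Steps 2 to 4 can be sketched with the Python standard library alone. This is a minimal illustration, not a vetted implementation: the function names, the choice of PBKDF2 with SHA-256 and the iteration count are all assumptions made for the example, and a production system would normally rely on a maintained password hashing library.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; higher is slower for you and for attackers


def hash_password(password, salt=None):
    """Return (salt, digest) for a clear text password (step 2)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    """Repeat the hash with the stored salt and compare the results (step 4)."""
    _, digest = hash_password(password, salt)
    # constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, stored_digest)


# Step 3: only the salt and digest are stored, never the password itself.
salt, digest = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, digest))
print(check_password("password123", salt, digest))
```

Because every user has a different salt, two users with the same password end up with different stored digests, so an attacker cannot precompute one table of hashes and reuse it across the whole database.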

In terms of password strength

All this is complicated, both for you and the user, and it would often be easier to make it someone else's problem. Single Sign On (SSO) makes this possible by redirecting authentication requests to a single large provider, who then responds with short lived "tokens" which assert the user's identity to the third party website. The full path of communication is shown in the image below.

There are many providers of SSO services, including famous names such as Google, Facebook, Twitter & Weibo. Many of these use a common framework called Open Authorization version 2.0 (also known as OAuth2).
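
To make the idea of a short-lived token concrete, here is a deliberately simplified sketch. It is not the OAuth2 token format: real providers usually sign tokens with a private key so that the third party site only needs the matching public key, whereas this example uses a single shared secret for brevity. The names `SECRET`, `issue_token` and `verify_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical key, known only to the token issuer


def issue_token(user, lifetime=300, now=None):
    """Create a signed token asserting `user`, valid for `lifetime` seconds."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user, "exp": now + lifetime}).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(signature).decode())


def verify_token(token, now=None):
    """Return the user name if the token is genuine and unexpired, else None."""
    now = time.time() if now is None else now
    payload_b64, signature_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(signature_b64)):
        return None  # payload was altered or signed with a different key
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None  # token has expired
    return claims["sub"]


token = issue_token("alice")
print(verify_token(token))
```

The expiry time is part of the signed payload, so a stolen token stops working after a few minutes and cannot be extended without the signing key.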

A variety of SSO helper packages exist for Python. For Azure & Microsoft Active directory, the relevant package is called `msal`. An example use case, leveraging another package called `flask-login` looks something like the following:

_login.py_:

In [None]:
import os
import secrets

import msal

from flask import Flask, request, flash, redirect,\
    url_for, render_template, session
from flask_login import LoginManager, current_user, UserMixin,\
    login_user, logout_user, login_required

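app = Flask(__name__)
# sessions (used by flash and flask-login) need a signing key; this one is
# regenerated on every restart, so a deployed app would load it from the environment
app.secret_key = secrets.token_hex()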

__all__ = ['login', 'logout']

login_manager = LoginManager()
login_manager.login_view = 'login'
login_manager.init_app(app)

client_id = os.environ.get('CLIENT_ID', None)
client_secret = os.environ.get('CLIENT_SECRET', None)
tenant_id = os.environ.get('TENANT_ID', None)

csrf_token = secrets.token_urlsafe()

authority = f'https://login.microsoftonline.com/{tenant_id}'

aad = msal.ConfidentialClientApplication(client_id,
                                         client_secret,
                                         authority)

class User(UserMixin):

    def __init__(self, user_id):
        global aad
        self.id = user_id
        print('account', aad.token_cache._cache)

    @property
    def username(self):
        return self.id.split('@')[0]
            
    @property
    def is_authenticated(self):
        global aad
        account = aad.get_accounts(self.id)
        print('is_authenticated', account)
        if account:
            return 'access_token' in aad.acquire_token_silent([], account[0])
        return False

@login_manager.user_loader
def load_user(user_id):
    print(user_id)
    return User(user_id)

@app.route('/login')
def login():

    if current_user.is_authenticated:
        return redirect(url_for('index'))

    
    code = request.args.get('code')
    if code:
        if request.args.get('state') != csrf_token:
            flash('CSRF error!')
            return redirect(url_for('login'))
        response = aad.acquire_token_by_authorization_code(code,
                                                           [])
        if response and 'access_token' in response:
            user = User(response['id_token_claims']['preferred_username'])
            login_user(user)
            flash('Logged in successfully via AAD.')
            return redirect(url_for('index'))
        
    return redirect(aad.get_authorization_request_url([], state=csrf_token))

@app.route('/logout')
def logout():
    global aad
    account = aad.get_accounts(current_user.get_id())
    if account:
        aad.remove_account(account[0])
    logout_user()

    ms_uri = 'https://login.microsoftonline.com/common/oauth2/v2.0/logout'
    site = 'https://localhost:5050'
    
    return redirect(ms_uri+f'?post_logout_redirect_uri={site}'+url_for('index'))

To use this pattern we must create an application secret inside the Active Directory blade in the Azure portal, as well as looking up the relevant Tenant ID (the hash which identifies which user directory we are going to be using). These are read from the local environment using the `os.environ` object. This is a very common pattern to use for **secret data which should never be stored inside code repositories**.

#### Multifactor Authentication (MFA)

Currently the gold standard for authentication involves 2 factor authentication (or more). Under this philosophy, a user needs to present at least 2 responses from two different categories out of:

1. Something you know (e.g. a password)
2. Something you have (e.g. your phone)
3. Something you are (e.g. your fingerprint).

The idea is that a bad actor needs to steal several things from you in order to obtain unauthorised access. The most common implementation on the web uses a one-time passcode sent via text message. On cost and convenience grounds it is frequently only used when additional security is required (for dangerous behaviour or when permanently modifying profiles).
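
The server side of such a passcode scheme might look something like the following sketch. The six digit length, the five minute lifetime and the function names are assumptions for the example, and actually delivering the code to the user's phone is left out entirely.

```python
import hmac
import secrets
import time

CODE_LIFETIME = 300  # seconds a code stays valid; an assumed policy


def issue_code(now=None):
    """Generate a random six digit code and the time at which it expires."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # cryptographically secure randomness
    return code, now + CODE_LIFETIME


def check_code(submitted, issued, expires_at, now=None):
    """Accept the submitted code only if it matches and has not expired."""
    now = time.time() if now is None else now
    matches = hmac.compare_digest(submitted.encode(), issued.encode())
    return matches and now <= expires_at


code, expires_at = issue_code()
print(check_code(code, code, expires_at))
```

The `secrets` module is used rather than `random` because the codes must be unpredictable to an attacker who has seen earlier ones.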

#### The GDPR and other legal requirements

The UK and the EU countries have all passed similar data protection laws, normally called the General Data Protection Regulation ([GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation)), which protect the personal data of those living in the European Economic Area. The law allows individuals to access and correct their identifiable personal data when stored in easily searchable forms such as on a computer. It also places constraints on the form in which this information is stored, whether it can be transferred outside the EEA and who can access it.

Although computational science is less affected than, for example, medicine, it is still possible that these regulations (or their successors) will one day apply to you. While the core Articles are relatively complex, they boil down to the idea that identifiable personal information (individual records linked to names, addresses, phone numbers etc., or to personal descriptions) should only be kept for as long as strictly necessary, and only be accessible by those who need to access it for the reason it was originally collected.

Individuals have the right to request a copy of the records held on them by companies (or other bodies engaged in "economic activity"), and to correct any wrong information which is being stored. 

<div style="alert alert-warning">

## Other Azure Cloud Services

### Azure Functions

[Azure Functions](https://azure.microsoft.com/en-gb/blog/introducing-azure-functions/) is a service which allows a Python function to be accessed directly from the web via parameters passed through a URL. An example will be shown in the lecture.
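
The underlying idea, independent of Azure's own programming interface, is that the path of a URL selects a function and the query string supplies its arguments. The sketch below shows this with the standard library only; the routing table `FUNCTIONS`, the `/api/add` path and the example URL are all hypothetical, and none of this is the Azure Functions API itself.

```python
from urllib.parse import urlparse, parse_qs


def add(a, b):
    """An ordinary Python function we would like to expose on the web."""
    return float(a) + float(b)


FUNCTIONS = {"/api/add": add}  # hypothetical table mapping URL paths to functions


def call_from_url(url):
    """Look up the function named by the URL path and call it with the query parameters."""
    parsed = urlparse(url)
    function = FUNCTIONS[parsed.path]
    # parse_qs maps each key to a list of values; keep the first of each
    kwargs = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return function(**kwargs)


print(call_from_url("https://example.com/api/add?a=2&b=3.5"))
```

Note that everything arriving through a URL is a string, which is why `add` converts its arguments before using them.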

## Data

Azure has several systems available to store data, depending on its format. This might be unstructured binary data, structured databases or something in between.

### Blob Storage

To quote Microsoft, [blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is designed for:

> - Serving images or documents directly to a browser.
> - Storing files for distributed access.
> - Streaming video and audio.
> - Writing to log files.
> - Storing data for backup and restore, disaster recovery, and archiving.
> - Storing data for analysis by an on-premises or Azure-hosted service.

The data is accessed via a network interface, with charges depending on how frequent access is expected to be and the volume of data transferred. In general a URL is assigned to each item, which can be used in multiple ways, including those listed above, to access the blob object.

### SQL

Azure provides a number of ways to access data in databases. Most of them are built around the [SQL database language](https://www.codecademy.com/articles/sql-commands). SQL, which dates back to 1974, follows a hierarchical approach, with a database server holding databases, each of which can hold multiple tables holding records, each of which has multiple values in multiple columns. A useful mental model is a set of spreadsheet files (e.g. Excel), each containing multiple sheets with rows of data in multiple columns. However, as so often with scriptable text interfaces, access is more powerful, although more difficult for newcomers.

Python comes with inbuilt support for SQL in SQLite format, in which individual databases are stored in local files, via the built-in module `sqlite3`. To use a full fat SQL server on Azure, appropriate additional software should be downloaded, [e.g. the MySQL connector](https://docs.microsoft.com/en-us/azure/mysql/connect-python). However the basic syntax to connect to, read and update individual databases remains similar.

In [None]:
import sqlite3

#Connect to/create db file
conn = sqlite3.connect('my_db.sqlite')

cur = conn.cursor()
try:
    cur.execute("CREATE TABLE fruit(id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(50), price INTEGER)")
    print("Table created")
except sqlite3.OperationalError:
    print("Table exists")

# Write some data
cur.execute("INSERT INTO fruit (name, price) VALUES (?,?);", ("apple", 300))

# Read some data
cur.execute("SELECT * FROM fruit;")
rows = cur.fetchall()

for row in rows:
    print(row)

# parameters are passed as a tuple, even when there is only one
cur.execute("SELECT price FROM fruit WHERE id=?;", (1,))
row = cur.fetchone()
print('Price:', row[0])

conn.commit()
cur.close()
conn.close()

For complicated interactions, packages such as Pandas or SQLAlchemy, which map Python types onto SQL more closely, may be more useful.
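
Before reaching for those packages, it is worth seeing that the same `sqlite3` calls also cover updating records and simple summary queries. The sketch below uses an in-memory database so that it leaves no file behind; the extra fruit and prices are made up for the example.

```python
import sqlite3

# An in-memory database disappears when the connection closes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fruit(id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name VARCHAR(50), price INTEGER)")

# Insert several records in one call.
cur.executemany("INSERT INTO fruit (name, price) VALUES (?, ?);",
                [("apple", 300), ("pear", 250), ("plum", 120)])

# Update one record; values always go in as parameters, never via string formatting.
cur.execute("UPDATE fruit SET price = ? WHERE name = ?;", (320, "apple"))
conn.commit()

# Filter and sort on the database side.
cur.execute("SELECT name, price FROM fruit WHERE price > ? ORDER BY price;", (200,))
expensive = cur.fetchall()

# Aggregate functions summarise a whole table in one row.
cur.execute("SELECT COUNT(*), SUM(price) FROM fruit;")
count, total = cur.fetchone()

print(expensive)
print(count, total)

cur.close()
conn.close()
```

Passing values through `?` placeholders lets the database driver handle quoting, which is the standard defence against SQL injection.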

#### Azure ML Service

[This product](https://azure.microsoft.com/en-gb/services/machine-learning/#product-overview) provides a service somewhere between PaaS and SaaS allowing you to develop black box machine learning solutions in classification and prediction. To understand what's going on, you should probably wait until later in the course, but feel free to study independently.

## Summary

You should now:
- Understand the basic difference between an emulator, a virtual machine and a container
- Know how to provision your own virtual machines on Azure
- Know the basics of building your own Docker container
- Understand the basics of using SQL in Python

## Further Reading

- The [Azure documentation pages](https://docs.microsoft.com/en-us/azure/?product=featured), particularly:
  - The [pages for Python Developers](https://docs.microsoft.com/en-us/azure/developer/python/)
  - The [pages for Virtual Machines](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/)
  - The [pages for container instances](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-quickstart)
- The other Azure Fundamentals [walkthroughs](https://microsoftlearning.github.io/AZ-900T0x-MicrosoftAzureFundamentals/).
- Docker's [documentation](https://docs.docker.com/) pages.
- The Docker [tutorials](https://www.docker.com/101-tutorial).
- The `msal` [documentation](https://github.com/AzureAD/microsoft-authentication-library-for-python).
- More information on [SQL and Python](https://realpython.com/python-sql-libraries/).