# Lecture 1: Tools for Scientific Computing 1 (1.5 Hours) #

### ABSTRACT ###

In this Lecture, we will learn to work remotely, using SSH, and reversibly, using Git. **todo**

---

**TODO**:

- Note that ``ssh`` will prompt for host public keys when you try to log in for the first time.

## Secure Shell for Remote Computing (40 Minutes) ##

In Lecture 0, we covered the use of the command line. Here, we'll take that further by using the SSH (secure shell) protocol to run command lines on remote machines. This is useful for a range of different tasks, but is perhaps most commonly used in scientific computing to run computations on high-performance computing (HPC) resources such as clusters. In this Lecture, we'll also show how to use SSH together with version control to really make our lives simpler.

Let's start by running a new command line session, as described in Lecture 0. We've created temporary user accounts on an example remote server, such that we have something to SSH into. In the new command line session (either PowerShell or bash), run the ``ssh`` command below, replacing ``<user>`` by the username provided on your slip of paper:

```bash
$ ssh <user>@epqis.cgranade.com
```

You will then be prompted for the password associated with this username. Type it in and press **enter** to finish starting your SSH session on the remote machine ``epqis.cgranade.com``. The new SSH session will be heavily encrypted, as is critical for security.

*NB: the ``ssh`` command will not echo your password to you, so the password prompt will look blank. On macOS / OS X, Terminal.app may or may not place a 🔑 emoji in the command line to indicate that echoing is turned off, depending on your version of macOS / OS X.*

Were this an actual useful server, and not (as is actually the case) a Raspberry Pi running in an apartment somewhere, you could then run HPC applications from the comfort of your laptop. Sadly, reality is a little more boring at the moment, so instead feel free to pretend by running a few commands of your choice. Just don't set anything on fire, cause problems with law enforcement, or otherwise act in a very rude fashion with this example server.

Once you're satisfied, run ``exit`` or press **Ctrl-D** to exit the SSH session and return to your own computer's command line. Though this was a useful exercise, it's somewhat limited by one critical problem: we needed to type in our password. This will become very annoying if we have to use SSH as part of an automated process, as will be the case when we learn Git later in this lecture. To solve this, we rely on the *public-key cryptography* built into SSH.

Very roughly, in public-key cryptography, one can create *keypairs*, each consisting of a matching private and public key. Anyone with the public key can encrypt a message that can only be decrypted with the matching private key, or check a signature that can only have been produced with it. In SSH, this concept is used to provide a much better alternative to passwords. If you have provided an SSH server with your public key, then the server can challenge you to prove that you hold the matching private key, which your SSH client does by signing part of the session. Using public keys in this way means that your password does not have to be sent over the network, greatly reducing your attack surface. The security of SSH keys can be further enhanced by using a *passphrase* with your private key. Effectively, a private key with a passphrase cannot be used except by someone who knows that passphrase. Critically, this passphrase is **never** sent across the network, but is only used by your local machine to decrypt the private key stored on disk.
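As a rough, local illustration of proving that you hold a private key, recent versions of OpenSSH (8.1 and later, as far as we're aware) let ``ssh-keygen`` sign a file with a private key and check that signature using only the public key. The sketch below is a toy under several assumptions, not what ``ssh`` does on the wire: it uses a throwaway key with an empty passphrase in a temporary directory, and the identity ``demo@example.com``, the namespace ``demo``, and the helper ``check_signature`` are made up for this example.

```bash
# Work in a throwaway directory so that no real keys are touched.
demo_dir="$(mktemp -d)"

# Generate a throwaway keypair (empty passphrase, for illustration only).
ssh-keygen -q -t rsa -b 3072 -N "" -C "demo@example.com" -f "$demo_dir/demo_key"

# Sign a short message with the *private* key; this writes message.txt.sig.
echo "hello from the key holder" > "$demo_dir/message.txt"
ssh-keygen -Y sign -f "$demo_dir/demo_key" -n demo "$demo_dir/message.txt"

# Record the *public* key as belonging to our made-up identity.
echo "demo@example.com $(cat "$demo_dir/demo_key.pub")" > "$demo_dir/allowed_signers"

# Hypothetical helper: check a message read from standard input against
# the signature, using only the public key listed in allowed_signers.
check_signature() {
    ssh-keygen -Y verify -f "$demo_dir/allowed_signers" -I demo@example.com \
        -n demo -s "$demo_dir/message.txt.sig" > /dev/null 2>&1
}

if check_signature < "$demo_dir/message.txt"; then
    verify_original="accepted"
else
    verify_original="rejected"
fi

if echo "a message that was never signed" | check_signature; then
    verify_tampered="accepted"
else
    verify_tampered="rejected"
fi

echo "original message: $verify_original"
echo "tampered message: $verify_tampered"
```

Only someone holding ``demo_key`` could have produced a signature that verifies, while checking it needs nothing more than ``demo_key.pub``. The temporary directory can be deleted afterwards.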

With this bit of handwavy theory out of the way, let's jump in by generating an SSH key.

*NB: Please skip these steps if you already have an SSH key.*

The SSH client we installed in Lecture 0 comes with a handy command, ``ssh-keygen``, for generating keys.

```bash
$ ssh-keygen
```

You will be prompted where to store the new private key. Press **enter** to accept the default of ``~/.ssh/id_rsa`` (newer versions of OpenSSH may suggest a different default, such as ``~/.ssh/id_ed25519``, in which case use that name in what follows). You will then be asked to choose a passphrase. Pick something approximately as long as an English sentence, and press **enter** to confirm it. You'll be asked to type it again to help prevent errors. Since this passphrase is all that protects your private key if someone obtains a copy of the key file, it should be quite long compared to traditional passwords. Note that this passphrase cannot be recovered if you forget it; this is by design.

*NB: you should **never** enter your passphrase over a network, as it should **only** be used locally to unlock your private key.*
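To see the passphrase at work without touching your real key, the sketch below generates a throwaway key in a temporary directory and then asks ``ssh-keygen -y`` to recompute the public key from the private key file, which only works if the right passphrase is supplied. Passing a passphrase on the command line with ``-N`` or ``-P`` is done here only so that the example runs without prompting; for a real key, let ``ssh-keygen`` prompt you instead. The file name ``demo_key`` and both passphrases are made up for this example.

```bash
# Work in a throwaway directory so that no real keys are touched.
demo_dir="$(mktemp -d)"
passphrase="an example passphrase about as long as a sentence"

# Generate a throwaway keypair protected by the passphrase above.
ssh-keygen -q -t rsa -b 3072 -N "$passphrase" -f "$demo_dir/demo_key"

# Two files now exist: demo_key (private) and demo_key.pub (public).
ls "$demo_dir"

# Print the fingerprint of the public key, which is safe to share.
ssh-keygen -l -f "$demo_dir/demo_key.pub"

# With the right passphrase, the private key can be unlocked locally...
if ssh-keygen -y -P "$passphrase" -f "$demo_dir/demo_key" > /dev/null 2>&1; then
    with_right_passphrase="unlocked"
else
    with_right_passphrase="locked"
fi

# ...but with the wrong one, the key file is of no use.
if ssh-keygen -y -P "not the passphrase" -f "$demo_dir/demo_key" > /dev/null 2>&1; then
    with_wrong_passphrase="unlocked"
else
    with_wrong_passphrase="locked"
fi

echo "right passphrase: $with_right_passphrase"
echo "wrong passphrase: $with_wrong_passphrase"
```

Everything here happens on the local machine: no network connection is involved in unlocking the key.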

To tell an SSH server about your *public* key, named ``~/.ssh/id_rsa.pub`` by default, use the ``ssh-copy-id`` command. This will copy the public key to your ``~/.ssh/authorized_keys`` file on the server, which the server will then use the next time you try to log in. You should be prompted for your password on the server for the last time in this process.

```bash
$ ssh-copy-id <user>@epqis.cgranade.com
$ ssh <user>@epqis.cgranade.com
```

If all went well, you will instead be prompted by your local machine for the passphrase that you used when generating your key.
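Roughly speaking, ``ssh-copy-id`` appends your public key as one line of ``~/.ssh/authorized_keys`` on the server, creating the directory and file with restrictive permissions if needed. The sketch below mimics that by hand, entirely on the local machine: ``fake_home`` is a temporary directory standing in for your home directory on the server, and ``laptop_dir``, ``demo_key``, and the helper ``install_public_key`` are likewise made up for this example. The real tool does more checking than this.

```bash
# Stand-ins for the two machines: a throwaway key on the "laptop" and a
# temporary directory playing the role of our home directory on the server.
laptop_dir="$(mktemp -d)"
fake_home="$(mktemp -d)"
ssh-keygen -q -t rsa -b 3072 -N "" -f "$laptop_dir/demo_key"

# Hypothetical helper. $1: public key file, $2: home directory on the
# pretend server.
install_public_key() {
    mkdir -p "$2/.ssh"
    chmod 700 "$2/.ssh"
    touch "$2/.ssh/authorized_keys"
    chmod 600 "$2/.ssh/authorized_keys"
    # Append the key as one line, unless that exact line is already there.
    if ! grep -qxF -- "$(cat "$1")" "$2/.ssh/authorized_keys"; then
        cat "$1" >> "$2/.ssh/authorized_keys"
    fi
}

install_public_key "$laptop_dir/demo_key.pub" "$fake_home"
install_public_key "$laptop_dir/demo_key.pub" "$fake_home"   # adds nothing new

key_count="$(grep -c "ssh-rsa" "$fake_home/.ssh/authorized_keys")"
echo "authorized_keys now lists $key_count key(s)"
```

Note that only the ``.pub`` file ever leaves the laptop; the private key stays where it was generated.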

A particularly astute and snarky observer may object at this point that we seem to have done something far less convenient than passwords, while justifying our pursuit with convenience as a goal. Indeed, now we need to manage keys *and* type in a far longer string of characters each time we wish to use SSH to do, well, anything. Thankfully, we're not done yet, as we have yet to use an SSH *agent*. An agent is a piece of software running on our local machine that manages SSH keys on our behalf, such that once a key is unlocked by our passphrase, we need not use that passphrase again until the agent decides to lock the key based on its security policy. On macOS / OS X, Keychain acts as an SSH agent and is built into the operating system, such that we should not need to do any further work. Similarly, on Ubuntu, an SSH agent called ``ssh-agent`` is provided by default when we install ``ssh`` using ``apt-get``, as in Lecture 0.
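On systems that use ``ssh-agent`` directly, the agent can also be driven by hand, which may help make the idea concrete. The following is a minimal sketch, assuming ``ssh-agent`` and ``ssh-add`` are on the path: it starts a private agent for the current shell, hands it a throwaway key, lists what the agent holds, and then shuts the agent down. The key ``demo_key`` is made up and has an empty passphrase purely so that the example runs without prompting; with a real key, ``ssh-add`` would ask for the passphrase once at this point.

```bash
# Generate a throwaway key in a temporary directory.
demo_dir="$(mktemp -d)"
ssh-keygen -q -t rsa -b 3072 -N "" -f "$demo_dir/demo_key"

# Start a private agent; this sets SSH_AUTH_SOCK and SSH_AGENT_PID.
eval "$(ssh-agent -s)" > /dev/null

# Hand the key to the agent, which keeps the unlocked key in memory.
ssh-add "$demo_dir/demo_key"

# List the fingerprints of the keys the agent currently holds.
loaded_keys="$(ssh-add -l)"
echo "$loaded_keys"

# Ask the agent to forget all keys, then shut it down.
ssh-add -D
eval "$(ssh-agent -k)" > /dev/null
```

While the agent is running, any ``ssh`` command started from the same shell can use the loaded key without asking for the passphrase again.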

On Windows, the story is a little more complicated, but thankfully we installed everything we need back in Lecture 0. In particular, the problem we run into is that the port of OpenSSH to Windows is in progress, such that even though it supports ``ssh-agent``, this support is not in a form that most Windows programs can make use of. Thus, we will instead use a command-line program ``plink.exe`` that comes with the PuTTY SSH client. This program uses Pageant, the Windows-style SSH agent provided with PuTTY, rather than ``ssh-agent``.

**TODO**:

- Add Pageant to startup menu
- Convert private key to PuTTY format
- Add private key to Pageant
- Run plink
- Add ``GIT_SSH``

- SSH forwarding?

## Version Control with Git (50 Minutes) ##

Version control systems, roughly speaking, provide a structured way of managing *changes* to a project over time, and across a set of collaborators. This allows for telling who changed what and when, undoing changes, and sharing changes with collaborators. Many version control systems, such as Subversion or CVS, are *centralized* in that all changes are uploaded to, downloaded from, and tracked by a single server. This imposes a lot of restrictions, however (try using Subversion when you're offline, working on a plane!), so here we'll learn a bit about *decentralized* version control, and will see how to use Git in particular.

To start off, let's look a bit at how Git stores and manages changes. Effectively, Git is a tool for managing directed acyclic graphs (DAGs), where each node is a set of files that comprises a project. Each edge is then the difference between two sets of files, and describes how to reconstruct each node, given the state of a project at some other node.
A graph of this form is called a *repository*, and each node is called a *commit*. Probably the easiest way to get a handle on the concept is to jump right in.

```bash
$ cd ~
$ mkdir epqis16-tmp-repo
$ cd epqis16-tmp-repo
$ git init
Initialized empty Git repository in C:/Users/cgranade/epqis16-tmp-repo/.git/
```

This makes a new, empty *repository* at ``~/epqis16-tmp-repo``. Before anything else, there's one critical command we need to know about: ``git status``. This command tells us a brief summary of what the current status of a repository is, along with other useful information that we'll explore in more detail later.

```bash
$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)
```

Importantly, ``git status`` has told us that there's nothing to commit, so let's go on and change that by adding some files.

```bash
$ echo "foobar" > a.txt
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        a.txt

nothing added to commit but untracked files present (use "git add" to track)
```

Now ``git status`` tells us that there's a file that's *untracked*. This means that it isn't part of any commit, and thus isn't a part of the repository yet. Let's go on and make a new commit to hold ``a.txt``:

```bash
$ git add a.txt
$ git commit -m "Added an important new file."
[master (root-commit) 3fd06ad] Added an important new file.
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 a.txt
$ git status
On branch master
nothing to commit, working tree clean
$ git log
commit 3fd06ad19d9ace8ffd5334bf277d8ee523e60b1a
Author: Chris Granade <cgranade@cgranade.com>
Date:   Thu Oct 27 12:03:45 2016 +1100

    Added an important new file.
```

In the above example, we used the ``-m`` flag (short for "message") to give a short description of the changes in our new commit. Doing so helps your collaborators (such as future you) figure out what your commit did, so please be polite to yourself and others by writing short but descriptive commit messages.

At any rate, since ``a.txt`` is now in the repository, ``git status`` tells us that there's nothing left to commit at the moment. Suppose, though, that we decide that ``a.txt`` is actually entirely wrong, and need to change it to something else.

```bash
$ echo "baz" > a.txt
$ cat a.txt
baz
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   a.txt

no changes added to commit (use "git add" and/or "git commit -a")
```

Now, ``a.txt`` is in the repo, but our changes to it are not. Helpfully, ``git status`` guides us a bit here, and suggests ``git add a.txt`` will do what we need. Let's do that and check ``git status`` one more time.

```bash
$ git add a.txt
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   a.txt
```
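The distinction between changes in the working directory and changes staged for the next commit can also be inspected directly. As a minimal sketch, run in a throwaway repository under a temporary directory and with a made-up user name and email so that committing works on a fresh machine, ``git diff`` lists changes that are not yet staged, while ``git diff --cached`` lists what ``git add`` has staged:

```bash
# Build a throwaway repository with one commit, as in the steps above.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # made-up identity
git config user.email "user@example.com"

echo "foobar" > a.txt
git add a.txt
git commit -q -m "Added an important new file."

# Change the file, but do not stage the change yet.
echo "baz" > a.txt
unstaged_before="$(git diff --name-only)"          # a.txt
staged_before="$(git diff --cached --name-only)"   # (empty)

# Stage the change; it moves from "not staged" to "to be committed".
git add a.txt
unstaged_after="$(git diff --name-only)"           # (empty)
staged_after="$(git diff --cached --name-only)"    # a.txt

echo "before git add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after  git add: unstaged='$unstaged_after' staged='$staged_after'"
```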

We can finish the commit as before by running ``git commit``.

```bash
$ git commit -m "Fixed important bug with a.txt."
```
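With two commits in place, we can already see a small piece of the graph described earlier. The following sketch rebuilds the same two-commit history in a throwaway repository (with a made-up identity, so that it runs on a fresh machine) and then asks Git for the parent of the newest commit, the difference along the edge between the two commits, and the old contents of ``a.txt``. Here ``HEAD`` names the current commit and ``HEAD~1`` its parent.

```bash
# Rebuild the two-commit history in a throwaway repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # made-up identity
git config user.email "user@example.com"

echo "foobar" > a.txt
git add a.txt
git commit -q -m "Added an important new file."
first_commit="$(git rev-parse HEAD)"

echo "baz" > a.txt
git add a.txt
git commit -q -m "Fixed important bug with a.txt."

# Each commit records its parent: the parent of HEAD is the first commit.
parent_of_head="$(git rev-parse HEAD~1)"
git --no-pager log --oneline

# The edge between the two commits, expressed as a diff.
git --no-pager diff HEAD~1 HEAD

# Nothing was lost: the old version of the file is still in the repository.
old_contents="$(git show HEAD~1:a.txt)"
echo "a.txt used to contain: $old_contents"
```

This is the sense in which work under version control is reversible: the earlier state of ``a.txt`` remains available even though the file in the working directory now says ``baz``.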

Before moving on, we stress that everything we've done so far has been *offline*. We haven't used any server at *all* to do this, because ``git init`` just made a new repository for us entirely. All of our commits have been to that repository on our local machine. In practice, however, Git really shines when it works with one or more servers (being decentralized, we needn't limit ourselves to just one), as we can then *pull* and *push* changes between our repository and external servers. To do this, we can either run our own server (which is not a bad idea, but is well beyond the scope of this workshop), or use a service that provides Git hosting for us. Thankfully, several such services exist, including GitHub, Bitbucket, and GitLab.
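To get a feel for pushing and pulling without any account or network connection, a *bare* repository on the local disk can stand in for a server. The sketch below is only that, a stand-in: the directory names ``server.git``, ``alice``, and ``bob`` and the identity used for committing are made up for this example, and a hosting service would be reached through an SSH or HTTPS address rather than a local path.

```bash
work="$(mktemp -d)"
cd "$work"

# A bare repository (history only, no working files) plays the server.
git init -q --bare server.git

# Alice starts a repository, commits, and pushes to the "server".
git init -q alice
cd alice
git config user.name "Alice Example"       # made-up identity
git config user.email "alice@example.com"
git remote add origin "$work/server.git"
echo "foobar" > a.txt
git add a.txt
git commit -q -m "Added an important new file."
git push -q -u origin HEAD
cd "$work"

# Bob gets his own full copy of the history by cloning.
git clone -q server.git bob

# Alice fixes the file and pushes again.
cd alice
echo "baz" > a.txt
git commit -q -a -m "Fixed important bug with a.txt."
git push -q
cd "$work"

# Bob pulls, and his copy catches up with Alice's.
cd bob
git pull -q --ff-only
bob_contents="$(cat a.txt)"
cd "$work"

echo "bob now sees: $bob_contents"
```

Each of ``alice`` and ``bob`` holds the complete history, which is what lets either of them keep committing while offline and synchronize later.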

For now, we'll use GitHub as an example, owing to its relative popularity, but most of this example should work on other hosting providers with at most minor modifications. Roughly speaking, we'll proceed in four steps:

**todo** fill these out

- Sign up for an account (and any academic discounts)
- Upload SSH *public* keys to new account
- Make a new repository
- Clone the new repository to our laptop

Now that we have a repository that's synced with a server, called a *remote*, let's take a step back and look at how everything works in terms of the directed acyclic graph (DAG) structure.

- DAGs
- Edges as diffs, mention latexdiff
- GitHub, Bitbucket
- Configuring Git to use SSH with Plink (Windows-only), Nano (OS X and Windows only)
- Integration with Sublime Text, VS Code