# DSCI 521 - Computing Platforms for Data Science


## Lecture 3 - More Git, as well as mark-up languages, webpage basics and GitHub Pages


### 2018-09-12

# Lecture learning goals

## By the end of the lecture, students should be able to:


1. [Tell Git to ignore irrelevant files using a `.gitignore` file](#1.-Tell-Git-to-ignore-irrelevant-files-using-a-.gitignore-file)
2. [Get a copy of someone else's repo on GitHub by forking it](#2.-Get-a-copy-of-someone-else's-repo-on-GitHub-by-forking-it)
3. [Catch up to a GitHub repo you forked once you fall behind](#3.-Catch-up-to-a-GitHub-repo-you-forked-once-you-fall-behind)
4. [Use GitHub pages to create and host a website](#4.-Use-GitHub-pages-to-create-and-host-a-website)
5. [Set up keys for SSH for use with GitHub](#5.-Set-up-keys-for-SSH-for-use-with-GitHub)
6. [Use Markdown, HTML tags and LaTeX to format text in literate code documents.](#6.-Use-Markdown,-HTML-tags-and-LaTeX-to-format-text-in-literate-code-documents.)

# 1. Tell Git to ignore irrelevant files using a `.gitignore` file

You may have encountered this before:

```
git status
```

```
On branch timberst-master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/
	.DS_Store

nothing added to commit but untracked files present (use "git add" to track)
```

Git is letting us know about untracked files (ones we have never committed before). We don't care about these files. We'd prefer not to have them clutter our view (so we can pay attention to files we do want to track). What do we do?

## Create a `.gitignore` file

Using the plain text editor of your choice (mine is Atom), create a file called `.gitignore` inside your Git repo. To do this with Atom, I would type:

```
atom .gitignore
```

Inside the text file, list the files and folders you would like to ignore, one per line. For example:

```
.ipynb_checkpoints/
.DS_Store
```

Save the file, and `add` and `commit` it with Git. Then try `git status` again. You should see:

```
On branch timberst-master
nothing to commit, working tree clean
```

## `.gitignore` tips and tricks

- prepend `**/` to a pattern in the `.gitignore` file to have it match in every subdirectory of the repo as well (patterns that contain no `/`, or whose only `/` is at the end, already match at any depth)
- create a [global `.gitignore` file](https://help.github.com/articles/ignoring-files/#create-a-global-gitignore) so that you do not have to create the same `.gitignore` for all your homework repos
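To check whether a pattern really catches a file in a subdirectory, `git check-ignore` can be asked which rule (if any) applies to a path. A small sketch in a scratch repo; the `lab1` folder and checkpoint file name are hypothetical:

```shell
# Scratch repo with a single ignore pattern.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
printf '%s\n' '**/.ipynb_checkpoints/' > .gitignore

# A checkpoint folder nested one level down.
mkdir -p lab1/.ipynb_checkpoints
touch lab1/.ipynb_checkpoints/lab1-checkpoint.ipynb

# -v prints the .gitignore file, line number and pattern responsible
# for ignoring the path; no output means the path is not ignored.
git check-ignore -v lab1/.ipynb_checkpoints/lab1-checkpoint.ipynb
```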

Let's create a `.gitignore` file in our 521 lab 2 repo.

### Steps to follow: 
1. Use a text editor (e.g., Atom, nano, Jupyter) to create a file called `.gitignore` in your 521 lab 2 repo
2. Add `**/.ipynb_checkpoints/` to that file and save it
3. `add` and `commit` it with Git
4. Type `git status` and check that `.ipynb_checkpoints/` is no longer listed as an untracked file

# 2. Get a copy of someone else's repo on GitHub by forking it

Usually you do not write directly to someone else's GitHub repo. Instead, you [fork](https://help.github.com/articles/fork-a-repo/) a copy for yourself on GitHub and then edit that copy. You have write access to your fork, so you can change it freely.

Let's fork the repo for this course. Then you can all have a copy that you can edit and add notes to.

### Steps to follow: 
1. From the [course repo](https://github.ubc.ca/MDS-2018-19/DSCI_521_platforms-dsci_students) click on "Fork" (upper right-hand side)
2. Once you have your own copy of the repo, clone it. 
3. Prove to yourself that you have write access to that copy by editing a file locally and then `add`ing, `commit`ing and `push`ing your changes back to GitHub.
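The clone, edit, `add`, `commit`, `push` cycle from steps 2 and 3 can be sketched without touching GitHub. In this sketch a local bare repository stands in for your fork (an assumption made so the example runs offline); with a real fork you would clone its GitHub URL instead. The file name and identity are placeholders.

```shell
work="$(mktemp -d)"
cd "$work"

# A bare repo plays the role of your fork on GitHub (demo assumption).
git init --quiet --bare fork.git

# Step 2: clone your copy. Cloning an empty repo prints a harmless warning.
git clone --quiet fork.git my-copy
cd my-copy
git config user.name "Demo User"
git config user.email "demo@example.com"

# Step 3: edit a file locally, then add, commit and push.
echo "My lecture notes" > notes.md
git add notes.md
git commit --quiet -m "Add my notes"
git push --quiet origin HEAD

# The pushed commit should now be listed on the remote.
git ls-remote origin
```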


# 3. Catch up to a GitHub repo you forked once you fall behind

As changes are made to the original repo, you will fall behind (those changes are not immediately added to your copy of the repo). To avoid this, you need to actively pull in those changes. 

The first time you need to catch up, you have to tell your computer where the "upstream"/original repo is:

```
git remote add upstream <original_repo_URL>
```

Then to catch up this time (and any other time) you type:

```
git fetch upstream
git merge upstream/master
```

These two commands together are like a `git pull` from the repo you forked.

Now use `git push` to send these changes (which currently only exist locally on your laptop) to your copy of the GitHub repo.
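The catch-up sequence can also be rehearsed entirely on your laptop. In this sketch, a local repo called `original` stands in for the upstream repo and a clone of it stands in for your fork; both names, the file contents and the identity are made up for the demo. The final `git push` is left out because there is no GitHub fork to push to here.

```shell
work="$(mktemp -d)"
cd "$work"

# "original" plays the upstream repo; give it one commit on master.
git init --quiet original
cd original
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.name "Instructor"
git config user.email "instructor@example.com"
echo "Lecture 1" > notes.md
git add notes.md
git commit --quiet -m "Add lecture 1"
cd ..

# Your copy, taken before the original changes.
git clone --quiet original my-copy

# The original moves ahead, so your copy falls behind.
cd original
echo "Lecture 2" >> notes.md
git commit --quiet -am "Add lecture 2"
cd ..

# Catch up: register the upstream remote once, then fetch and merge.
cd my-copy
git remote add upstream "$work/original"
git fetch --quiet upstream
git merge --quiet upstream/master

# Both remotes should be listed, and the log should show both commits.
git remote -v
git log --oneline
```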

Now I will make a change to the course repo, and then you can try to catch up!

### Steps to follow: 
1. Add the course repo as a remote named `upstream` by typing `git remote add upstream https://github.ubc.ca/MDS-2018-19/DSCI_521_platforms-dsci_students.git` (assuming you cloned with HTTPS) at the command line
2. Get the changes I made in the original repo by typing the following at the command line:
    ```
    git fetch upstream
    git merge upstream/master
    ```
3. Type `git push` to send the changes to your copy of the repo on GitHub.
4. Go to your copy of the repo on GitHub and see if you can see the changes.

# 4. Use GitHub pages to create and host a website

In the lab we will be using GitHub Pages to create and host a blog. However, it is useful to know how to use GitHub Pages to create a website in general, so that you can use it to:

- share documentation about a tool you built
- share information for an event you are organizing
- share information for a class you are teaching
- for fun?
- others?

Let's all create our own version of a website from a silly little GitHub repo I made.

### Steps to follow: 
1. Fork [this repo](https://github.com/ttimbers/hello/blob/master/README.md).

2. Make some changes to the README using the pen tool to personalize it.

3. Enable this GitHub repository to be viewed as a website by selecting the master branch as the source in the GitHub Pages panel in the Settings Page, and then clicking "Save".

4. Wait a few minutes and then visit the URL. Can you see your website?

5. Go back to the GitHub repo for the website and commit another change to the README. Wait a few minutes and then visit the URL again. Can you see your changes?

6. Experiment with changing the website theme (using the "Choose a theme" button in the GitHub Pages panel in the Settings Page)
7. (Optional) share your website with the random Slack channel

# 5. Set up keys for SSH for use with GitHub

## Remotely accessing another computer using SSH 

Let's start with some definitions:

### Definitions

**Secure SHell (SSH)** - a common, secure method for remote login to another computer.

**server** - a machine you are SSHing into. The server sits and waits to be contacted.
 
**client** - usually your machine. The client initiates contact with the server.


## Review - password-based authentication

* Passwords are short and tend to be somewhat easy to "break" (guess).
  * Say your password contains 12 characters
  * each character is one of 26 uppercase letters, 26 lowercase letters, 10 digits, or ~10 special characters
  * in total, about 70 possibilities per character, so $70^{12}\approx 10^{22}$ possible passwords
  * This is a HUGE number, except that there are patterns within passwords that make them easier to guess
  * More on this in DSCI 541
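As a quick check of that estimate, taking 70 possibilities for each of the 12 characters:

$$
70^{12} = 10^{12 \log_{10} 70} \approx 10^{12 \times 1.845} \approx 10^{22.1}
$$

So a brute-force search faces roughly $1.4 \times 10^{22}$ candidate passwords. This assumes every character is chosen independently and uniformly at random, which real passwords rarely are.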

## SSH key-based authentication

Two components: 

1. public key
2. private key

These files have an asymmetrical relationship:

- the public key CAN encrypt a message, but CANNOT decrypt it again
- only the private key CAN decrypt a message that was encrypted with the public key


## Understanding public key private key concepts

- Think of a public key not as a key, but as a padlock that you can make copies of and put anywhere you want.
- To put your ‘padlock’ on another machine, you would add it to the `authorized_keys` file in the `~/.ssh` folder on that machine.
- Think of a private key as an actual key: it can open the padlock that is stored on the other machine.
![alt tag](imgs/keys_1.png)
*source: http://blakesmith.me/2010/02/08/understanding-public-key-private-key-concepts.html*

## How the lock works

- Keys are generated using `ssh-keygen`, which makes a private key (usually called `id_rsa`) and a public key (usually called `id_rsa.pub`)
- You can make copies of `id_rsa.pub` (public key/padlock) and distribute them to other machines
- The other machine uses the public key to encrypt a challenge message to you
- You need to show that you can decrypt the message to demonstrate that you are in possession of the associated private key
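Key generation and "installing the padlock" can be tried safely in a scratch directory. This is a minimal sketch: the directory, file names and passphrase are throwaway values chosen for the demo, and the `authorized_keys` file here is a local stand-in for the one in `~/.ssh` on a real server.

```shell
# Keep the demo keys away from your real ~/.ssh folder.
key_dir="$(mktemp -d)"

# Generate a 4096-bit RSA key pair:
#   -f  where to write the private key (the public key gets a .pub suffix)
#   -N  passphrase protecting the private key (demo value, pick your own)
#   -q  suppress the progress output
ssh-keygen -t rsa -b 4096 -f "$key_dir/id_rsa" -N "demo passphrase only" -q

# The private key stays with you; the public key is the "padlock" you hand out.
ls "$key_dir"

# Installing the padlock on a server amounts to appending the public key
# to that account's authorized_keys file (simulated locally here).
cat "$key_dir/id_rsa.pub" >> "$key_dir/authorized_keys"

# Print the key length and fingerprint of the public key.
ssh-keygen -l -f "$key_dir/id_rsa.pub"
```

On your own machine you would normally leave out `-f` and `-N`, so that `ssh-keygen` offers the default location and prompts you for a passphrase.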

## One private key can open many locks

![alt tag](imgs/keys_2.png)
*source: http://blakesmith.me/2010/02/08/understanding-public-key-private-key-concepts.html*

## Keeping your private key safe

- `ssh-keygen` allows you to put a password or passphrase on the private key
- the private key (and its passphrase) should be shared with NO ONE!
- if your private key does fall into the wrong hands, the person must still know the password or passphrase to use the private key

![alt tag](imgs/password_strength.png)
*source - https://xkcd.com/936/*

## Why SSH keys over passwords

SSH keys commonly use the [RSA cryptosystem](https://en.wikipedia.org/wiki/RSA_(cryptosystem)).

The private key is much longer than a password. A common choice now is 4096-bit keys, which means $> 10^{1200}$ possibilities. This makes it much harder for a hacker to break (guess) the key than a password. More on this when we discuss binary numbers next week.
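To get a feel for the difference in scale, the exponent of 10 for each count can be estimated with logarithms. A rough sketch using `awk`; the variable names are just for illustration, and the password figures (12 characters, about 70 possibilities each) are the ones from the password review above.

```shell
# log10(base^n) = n * log10(base); awk's log() is the natural logarithm.
password_exp=$(awk 'BEGIN { printf "%.0f", 12 * log(70) / log(10) }')
key_exp=$(awk 'BEGIN { printf "%.0f", 4096 * log(2) / log(10) }')

echo "12-character password: about 10^${password_exp} possibilities"
echo "4096-bit key:          about 10^${key_exp} possibilities"
```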

Aside: A sufficiently large quantum computer would be able to break RSA encryption. It is very hard to predict whether this is years or decades away.

### Authentication vs. encryption

* The system described above is purely for _authentication_
  * the client needs to prove to the server that the client is authorized to access the server
  * Someone with authority has put the public key in `~/.ssh/authorized_keys` on the server
  * The server now grants access to anyone possessing a private key matching one of these public keys
  
* This is separate from _encryption_ of the data flowing between the client and server.
  * This prevents eavesdroppers from listening to client-server communications

## Take home exercise 1 - Set up SSH key-based authentication for your GitHub account

1. Connect with GitHub using an SSH key as opposed to HTTPS ([instructions here](https://help.github.com/articles/generating-an-ssh-key/))
2. After you have added your new SSH key to GitHub, change the remote URL on one of your existing GitHub repos from HTTPS to SSH ([instructions here](https://help.github.com/articles/changing-a-remote-s-url/#switching-remote-urls-from-https-to-ssh)), and make a small change that you can push to GitHub using SSH.
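Switching a remote from HTTPS to SSH comes down to one `git remote set-url` call. The sketch below uses a scratch repo, and `USERNAME/REPO` is a placeholder rather than a real repository; in practice you would run the `set-url` line inside your own repo with your own account and repo name.

```shell
# Scratch repo standing in for one of your existing repos.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git remote add origin https://github.com/USERNAME/REPO.git

# Switch the remote from HTTPS to SSH.
git remote set-url origin git@github.com:USERNAME/REPO.git

# Confirm that both fetch and push now use the SSH URL.
git remote -v

# With a real account and key in place, this checks that GitHub accepts your key:
# ssh -T git@github.com
```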

# 6. Use Markdown, HTML tags and LaTeX to format text in literate code documents.

Start working on [lab 2](https://github.ubc.ca/MDS-2018-19/DSCI_521_platforms-dsci_students/blob/master/release/lab2/lab2.ipynb)!

## Attribution 
1. [Happy Git and GitHub for the useR by Jenny Bryan and the STAT 545 TAs](http://happygitwithr.com/)
2. [Software Carpentry](https://software-carpentry.org/), specifically the Git lessons