# DSCI 100 - Introduction to Data Science


## Lecture 5 - Collaboration with version control

<img src="https://media.giphy.com/media/0Av9l0VIc01y1isrDw/giphy.gif" width=400>

Source: <https://media.giphy.com/media/0Av9l0VIc01y1isrDw/giphy.gif>

# Housekeeping 
- Group projects posted
- Project contract due next week on Saturday (Oct 12)
- No tutorial assignment this week.
- The midterm will be on Oct 16 during the tutorial. 
    - The midterm's duration will be 70 minutes.


## Course policy on plagiarism

- The quiz format is closed-book; you can only consult the [Python Reference Sheet](https://canvas.ubc.ca/courses/153793/modules/items/7177701).
    - No need to print this; you will have it during the exam.

- You can find more information on what happens if you violate academic integrity here: <https://science.ubc.ca/students/blog/academic-integrity>
    - This also applies to worksheets/tutorials


## What is version control?

- **Version control:** the process of keeping a record of changes to documents, including when the changes were made and who made them
- lets you view earlier versions and revert changes
- facilitates resolving conflicting edits
- originally for software development, but is now used for many tasks (e.g. data analysis!)


## Why do we need tools to help us collaborate? 

No big deal. Just send files to your teammates in emails. Right?

<img align="left" src="http://www.phdcomics.com/comics/archive/phd101212s.gif" width="500" />

Problems:
- which version is the newest?
- who made edits, and when were they made?
- what were you working on? (when you revisit the project 3 months from now)
- can't easily revert changes (if something breaks)
- no sane way to discuss todo items, issues, etc.


## Why do we need tools to help us collaborate? 

OK, fine. Then let's just share and edit files on Dropbox/Google Drive. Right?

<img src="http://www.geekyedge.com/wp-content/uploads/2015/08/Dropbox-to-Google-Drive.jpg" width="500" />

These solve *only* the problem of knowing which version is the newest

Still:
- can't tell who made edits, when they were made
- can't tell what you were working on when you revisit the project 3 months from now
- can't easily revert changes
- no sane way to discuss todo items, issues, etc

(and honestly, you still usually end up with `final_revision_v3_Oct2020_final.docx`...)


### Git and GitHub

In this course we use two major tools for version control:

<img src="img/logo.png" width=400>

**Git:** 
- keeps track of files in a **repository** (a folder that you tell Git to pay attention to)
- responsible for keeping track of changes, sharing files with others, handling conflict resolution, etc
- Git runs on your (and your teammates') machine

**GitHub:**
- a service that hosts your repository in the cloud
- helps manage permissions (who can view your project, who can edit it)
- provides tools for project-specific communication (organized into *issues*)
- can be used to build and host websites/blogs

Git - works on your local computer (e.g., JupyterHub workspace or your laptop)

GitHub - remote repository hosting service (stores a copy of your work on the cloud)

### Key version control concepts and commands

<img src="img/git_intro/Slide1.jpeg" width=1000>

Now we will introduce key version control concepts and commands. To understand how these work, we need to know about four "places": your working directory, the staging area, the hidden `.git` directory, and the remote repository. Unlike the others, the staging area is not a folder you can open; it is a conceptual/abstract place that acts as a holding area. We will learn more about this in a minute.


### Key version control concepts and commands

<img src="img/git_intro/Slide2.jpeg" width=1000>

Here we made changes to three files. However, we only want to share the changes to `README.md` and `analysis.ipynb`; `notes.txt` is our own private notes file that we are not quite ready to share yet (or maybe it's a file we will always keep private).

### Key version control concepts and commands

<img src="img/git_intro/Slide3.jpeg" width=1000>

To tell Git which files' changes we would like to log as part of our version control, we tell Git which files we want to **add**. This moves the changes to an abstract place called the "staging area".
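To make staging concrete, here is a toy Python sketch. It is not Git itself and not how Git is implemented; `working_directory`, `staging_area`, and `add` are hypothetical names made up for illustration. On the command line, the real step would correspond to `git add README.md analysis.ipynb`.

```python
# Toy model of staging; this is NOT how Git is implemented.
# All names here (working_directory, staging_area, add) are hypothetical.

# Files we changed in the working directory, mapped to their new contents.
working_directory = {
    "README.md": "# My project\nUpdated description.",
    "analysis.ipynb": "updated notebook contents",
    "notes.txt": "private notes, not ready to share",
}

# The staging area starts out empty.
staging_area = {}


def add(filename):
    """Copy the current version of one file into the staging area."""
    staging_area[filename] = working_directory[filename]


# Stage only the two files we want to share.
add("README.md")
add("analysis.ipynb")

staged = sorted(staging_area)
not_staged = sorted(set(working_directory) - set(staging_area))
print("Staged:", staged)
print("Not staged:", not_staged)
```

Note that `notes.txt` is still changed in the working directory; it simply has not been selected for the next commit.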

### Key version control concepts and commands

<img src="img/git_intro/Slide4.jpeg" width=1000>

Then, to actually log the changes in our version control history, we tell Git that we'd like to **commit** the changes. As you'll see later in our demo, when we do that we also provide a relevant message that gets stored with the changes, allowing us to later understand what those changes were about.

These changes get archived in a hidden `.git` folder. This special folder contains all the changes we ever logged, as well as who logged them, the messages associated with them, and the address of the remote repository (if one exists).
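As a rough sketch of what gets recorded with a commit, the toy Python model below stores the author, message, time, and a snapshot of the staged files. This is not Git's real storage format; `Commit`, `history`, `commit`, and the placeholder author `A. Student` are all hypothetical and only meant to illustrate the idea.

```python
# Toy model of a commit; this is NOT Git's real storage format.
# Commit, history, and commit() are hypothetical names for illustration.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    author: str          # who logged the changes
    message: str         # what the changes were about
    timestamp: datetime  # when the changes were logged
    files: dict          # snapshot of the staged files


history = []  # stands in for the log kept in the hidden .git folder
staging_area = {
    "README.md": "Updated description.",
    "analysis.ipynb": "updated notebook contents",
}


def commit(author, message):
    """Record everything in the staging area as one entry in the history."""
    entry = Commit(author, message, datetime.now(timezone.utc), dict(staging_area))
    history.append(entry)
    staging_area.clear()  # the staged changes are now part of the history
    return entry


commit("A. Student", "Update README and add first analysis steps")

for entry in history:
    print(entry.author, "|", entry.message, "|", sorted(entry.files))
```

A descriptive message is what lets you (or a teammate) make sense of this entry months later.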

### Key version control concepts and commands

<img src="img/git_intro/Slide5.jpeg" width=1000>

Finally, we tell Git to **push** our changes. When we do this, Git uses the address in the hidden `.git` folder to send the changes from our local computer to the remote repository (e.g., on GitHub).
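One way to picture a push is as "send the remote every commit it does not have yet". The toy Python sketch below models that idea only; it is not how Git transfers data, and `local_history`, `remote_history`, and `push` are hypothetical names. It also assumes the simple case where nobody else has changed the remote in the meantime.

```python
# Toy model of a push; this is NOT how Git transfers data.
# local_history, remote_history, and push() are hypothetical names.
# Assumption: the remote history is a prefix of the local history
# (nobody else has pushed since we last synchronized).

remote_history = ["Initial commit"]  # what the remote (e.g., GitHub) has
local_history = ["Initial commit", "Update README", "Add analysis"]  # what we have


def push():
    """Send the commits the remote does not have yet; return what was sent."""
    new_commits = local_history[len(remote_history):]
    remote_history.extend(new_commits)
    return new_commits


sent = push()
print("Pushed:", sent)
print("Remote matches local:", remote_history == local_history)
```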

### Key version control concepts and commands

<img src="img/git_intro/Slide7.jpeg" width=1000>

Sometimes changes exist on the remote repository, but you don't yet have them on your local computer. This can happen because you edited a file directly using GitHub's web interface, or a collaborator pushed changes to the remote repository.

### Key version control concepts and commands

<img src="img/git_intro/Slide8.jpeg" width=1000>

To get these changes on your local computer, you need to tell Git to **pull** these changes. This will bring the changes into your working directory and the version control history log in the hidden `.git` folder.
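A pull can be pictured as the mirror image of a push: commits that exist only on the remote are copied into the local history, and the files in the working directory are updated to match. The toy Python sketch below illustrates this under simplifying assumptions (no local changes that could conflict); it is not how Git is implemented, and all names in it are hypothetical.

```python
# Toy model of a pull; this is NOT how Git is implemented.
# remote_history, local_history, working_directory, and pull() are
# hypothetical names. Assumption: we have no unshared local changes.

remote_history = [
    {"message": "Initial commit", "files": {"README.md": "# My project"}},
    {"message": "Fix title on GitHub", "files": {"README.md": "# My Project"}},
]

# Locally we only have the first commit so far.
local_history = remote_history[:1]
working_directory = dict(local_history[-1]["files"])


def pull():
    """Copy missing commits into the local history and update our files."""
    new_commits = remote_history[len(local_history):]
    for entry in new_commits:
        local_history.append(entry)
        working_directory.update(entry["files"])
    return [entry["message"] for entry in new_commits]


received = pull()
print("Pulled:", received)
print("README.md is now:", working_directory["README.md"])
```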

## Demo time!

Show the students how to:

1. create a public GitHub repo with a README
    - Use the template repo, since this is what students will use: <https://github.com/UBC-DSCI/dsci-100-project_template>
    - The advantage for the demo is that checkpoint files do not show up as untracked, which makes it less confusing.
2. edit a file there using the pen tool
3. clone that repo to the JupyterHub using the Jupyter Git extension
4. create a new Jupyter notebook that does something simple (like print hello world) and put it under version control (add and commit)
5. push the committed changes to GitHub
    - You also need to create a personal access token (PAT): <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token>
6. visit GitHub and see the changes (ooohhh ahhh!)

### What did we learn?

- 
- 
- 
