# [CPSC 222](https://github.com/GonzagaCPSC222) Intro to Data Science
[Gonzaga University](https://www.gonzaga.edu/)

[Gina Sprint](http://cs.gonzaga.edu/faculty/sprint/)
# Git and Github

What are our learning objectives for this lesson?
* Understand the purpose of version control
* Create a GitHub repository
* Learn the basics of the Git tools in VS Code

Content used in this lesson is based upon information in the following sources:
* [Git](https://git-scm.com)
* [GitHub](https://github.com/)
* [Hubspot Git/GitHub Tutorial](http://product.hubspot.com/blog/git-and-github-tutorial-for-beginners)

## Warm up Task(s)
1. If you haven't already:
    * Install [Git](https://git-scm.com/downloads), install [Anaconda Python Distribution](https://www.anaconda.com/products/individual), and install [VS Code](https://code.visualstudio.com/) 
    * Make a [Github](https://github.com/) account
1. Open VS Code
1. Let's get to know some people! What are 4 non-obvious things your group has in common?

## Today
* Announcements
    * MA1 intro to Python handout is due next class 
    * MA2 HelloWorldClassroom Github repo URL due to Canvas Wednesday night
    * Congrats on finishing week 1 (already!)
* Some quick notes on why Python for data science
    * IEEE Spectrum article about top programming languages: https://spectrum.ieee.org/top-programming-languages-2024
* Welcome questionnaire results and [demo](https://github.com/GonzagaCPSC222/U0-Introduction/tree/master/WelcomeQuestionnaireDemo)
    * "I want to understand everything lower level to do with data science. I want to understand how and why things work, because I believe that will allow me to do the higher level work more effectively in an actual job."
    * "I want to know how to code in python as I think it is a valuable skill, I also want to know how APIs work and how to code APIs. I want to be well educated as I want to be a financial analyst and know that python is an essential skill. I want to go above and beyond in this class to learn as much as I can about python"
    * "I am a little apprehensive about learning a new coding language, but I've heard that it'll be easy because I already know one (C++)."
    * "I am a black belt in taekwondo."
    * "I helped lead my school's soccer team to back to back 14-0 seasons."
    * "I am a singer songwriter :)"
    * "I am very interested in warhammer 40k."
    * "Something interesting about me is I am very good at Blackjack (I have won a tournament and profited about $1000 from playing on a cruise ship) and I have grown up playing an abundance of games with my family."
* Git and Github
    * VS Code example with your personal Github account
    * Note: MA2 has you follow along with a VS Code example with Github Classroom (how you will submit DAs -- data assignments)

## Overview
This lesson is going to be a collection of online resources for learning about Git and Github. While it is important to learn the fundamentals of these tools by reading the following materials, the best way to learn is to actually play with these tools. Follow along on your own machine where appropriate.

## Version Control
From the [Git website](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control):
>What is "version control", and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. 

>If you are a graphic or web designer and want to keep every version of an image or layout (which you would most certainly want to), a Version Control System (VCS) is a very wise thing to use. It allows you to revert files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover. In addition, you get all this for very little overhead.

[Git](https://git-scm.com) is a popular VCS, and it is the version control software that GitHub is built on. [GitHub](https://github.com/) is a code hosting platform for version control and collaboration, where people from all over the world (or in the exact same place) can work together on projects. Code that is hosted on GitHub can be either public or private.
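
To make "recall specific versions later" concrete, here is a minimal sketch that records two versions of a file and then looks back at the first one. The file name `notes.txt`, the commit messages, and the placeholder identity are made up for illustration; everything happens in a throwaway directory.

```shell
# Work in a throwaway directory so nothing on your machine is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Placeholder identity, stored only in this demo repository.
git config user.name "Example Student"
git config user.email "student@example.com"

# Version 1 of a file.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Version 2 of the same file.
echo "second draft" > notes.txt
git add notes.txt
git commit -q -m "Revise notes"

# See the recorded history, newest first.
git log --oneline

# Compare the two versions.
git diff HEAD~1 HEAD -- notes.txt

# Recall the earlier version without changing the working copy.
git show HEAD~1:notes.txt
```

The last command prints the contents of the first version ("first draft") while `notes.txt` on disk still holds the second.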

## GitHub
First, make a free [GitHub](https://github.com/) account.

Next, follow this Hello World tutorial on Github: [https://guides.github.com/activities/hello-world/](https://guides.github.com/activities/hello-world/). This tutorial walks you through the following:

1. Create a repository. A repository is a folder that organizes your GitHub project. Files and folders in your repository are version controlled. You should always include a README.md file in each of your repositories. README files are markdown files (yay!) that describe your repository. Read more about README.md files [here](https://help.github.com/articles/about-readmes/). 
1. Create a branch. A branch is a version of the repository that someone is working on. The "master" branch of your repository is the main branch. When you are going to work on a new feature of the software, you will make a new branch off of the master branch. This new branch is a copy of the master branch. When you are done implementing your new feature, you merge your branch back into master.
1. Create a commit. A commit is a change to file(s) in the repository that you want to save.
1. Create a pull request. When you make a pull request, you are asking someone to pull in your proposed changes from your branch to be merged with another branch (usually master). 
1. Merge a pull request.
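
The tutorial does all of this in the browser. As a rough local equivalent, the sketch below walks through the same ideas with Git itself. The branch name `readme-edits`, the file contents, and the placeholder identity are assumptions for illustration. A pull request is a GitHub feature with no direct Git command, so the merge at the end stands in for merging the pull request.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# -b names the first branch (assumes Git 2.28 or newer); "master" matches this lesson.
git init -q -b master

# Placeholder identity, stored only in this demo repository.
git config user.name "Example Student"
git config user.email "student@example.com"

# 1. The repository starts with a README.md.
echo "# hello-world" > README.md
git add README.md
git commit -q -m "Add README"

# 2. Create a branch off master and switch to it.
git checkout -q -b readme-edits

# 3. Commit a change on the new branch.
echo "A short description of this repository." >> README.md
git commit -q -a -m "Describe the repository in the README"

# 4-5. On GitHub you would open and merge a pull request here.
#      Locally, merging the branch into master has the same end result.
git checkout -q master
git merge -q readme-edits

git log --oneline
```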

## Common Git Commands
So far, we have been using GitHub via its browser interface for simple non-code edits. More commonly, you will be writing code locally on your machine. You will then want to push the code changes from your local repository to the repository on GitHub via a graphical user interface (GUI) tool or the command line interface. In this class, we will use the VS Code GUI tools; however, if you know how to use the command line, I highly recommend you use the command line for great practice and more control over your repository.

1. Create a git repository (`init` command)
    1. In VS Code, open the version control panel on the left side of the app
    1. Click "Initialize Repository"
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_init.png" width="400">

1. Stage a file (`add` command)
    1. This step tells Git which files you want to package into a commit.
    1. You can see the files with changes in the version control panel. You can choose to commit all the files (easiest) or individual files.
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_changes.png" width="400">

    1. With VS Code, this step can be combined with the next step (commit)
1. Create a commit (`commit` command)
    1. A `commit` is a group of code changes plus information that describes them:
        1. Author
        1. Description message
        1. Timestamp
        1. Branches
        1. Identifier
    1. Type a commit message in the textbox, then press cmd + enter (Mac) or ctrl + enter (Windows) to make the commit (or press the checkmark icon). 
    1. The first time you do this, you will see the following pop up. You can choose "Yes" or "Always" to stage and commit all changes automatically. For beginning Git users, this is the best approach.
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_commit.png" width="400">

1. Create Github repo on Github website
    1. Note: If planning to push existing local repo to remote
        1. Don’t initialize with README or .gitignore or license 
    1. Copy the URL for the repo you just made. You will need it to set up the Github repo as your local Git repo's remote.
1. Add a local repository to GitHub (`remote add` command)
    1. Click the three dots icon and "Remote", then "Add Remote"
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_remote_add.png" width="400">

    1. Paste the Github repo URL
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_remote_add_url.png" width="400">

    1. Name the remote "origin" (this is the convention typically used)
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_remote_add_origin.png" width="400">

1. Push local changes to GitHub (`push` command)
    1. Under the same three dots menu icon, click "Pull, Push" then "Push to"
    1. Choose your remote called origin
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_push_to.png" width="400">

    1. Now, whenever you want to push, you can simply press the push/pull icon in the lower left side of the status bar
<img src="https://github.com/GonzagaCPSC222/U1-Git-Github/raw/master/figures/vscode_push_pull_status.png" width="400">

1. Update your local repository with the most up to date code on GitHub (`pull` command)
    1. Under the same three dots menu icon, click "Pull, Push" then "Pull"
1. Clone remote Github repo to create local repo (`clone` command)
    1. Note: can only push to this repo if you have write privileges (e.g. you are owner/collaborator)

## (Optional) Command Line Git Interface
First, install Git onto your machine: [Installing Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).

Next, follow this Git/GitHub tutorial: http://product.hubspot.com/blog/git-and-github-tutorial-for-beginners. This tutorial walks you through the following (I’ve added example commands for working with a repository called Test):

1. Create a git repository (`init` command)
    1. Example:
```
echo "# Test" >> README.md
git init
```

1. Stage a file. Using the `add` command, this step tells Git which files you want to package into a commit.
    1. Example: `git add README.md`
    1. (or `git add -A` or `git add .` to add all files to staging area)
1. Create a commit (`commit` command).
    1. A `commit` is a group of code changes plus information that describes them:
        1. Author
        1. Description message
        1. Timestamp
        1. Branches
        1. Identifier
    1. Example: `git commit -m "initial commit"`
1. Create a branch (`branch` command).
    1. A branch is a sequence of code commits
    1. Master branch is main code (always ready for production)
    1. To develop new feature, make a new development branch
        1. Example: `git branch development` (this creates the branch; run `git checkout development` to switch to it)
    1. Merge development branch back into master when you are done implementing the new feature
        1. Example:
```
git checkout master
git merge development
```
1. Create Github repo on Github website
    1. Note: If planning to push existing local repo to remote
        1. Don’t initialize with README or .gitignore or license 
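        1. If you try to push a local repo to a remote repo that is non-empty, the push will be rejected. First merge the remote's history with `git pull --no-rebase origin master --allow-unrelated-histories` (the flag belongs to `pull`/`merge`, not `push`), then run `git push -u origin master`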
    1. Note: If planning to clone remote repo to initialize local repo
        1. Initialize with README and/or .gitignore and/or license
        1. These will be copied down when you clone
1. Add a local repository to GitHub (`remote add` command).
    1. Example: `git remote add origin https://github.com/gsprint23/Test.git`
1. Push local changes to GitHub (`push` command).
    1. Example: `git push -u origin master`
1. Update your local repository with the most up to date code on GitHub (`pull` command).
    1. Example: `git pull`
1. Clone remote Github repo to create local repo (`clone` command)
    1. Example: `git clone https://github.com/gsprint23/Test.git`
    1. Note: can only push to this repo if you have write privileges (e.g. you are owner/collaborator)
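
The steps above can be strung together into one runnable sketch. To keep it offline, a bare repository in a temporary folder stands in for the empty repo you would create on GitHub, so the remote "URL" here is just a local path. That path, the commit messages, and the placeholder identity are assumptions for illustration; with a real GitHub repo you would use its `https://github.com/...` URL instead.

```shell
work="$(mktemp -d)"

# A bare repository stands in for the empty repo you would create on GitHub.
# -b names the first branch (assumes Git 2.28 or newer).
git init -q --bare -b master "$work/remote.git"

# Create the local repository (init).
mkdir "$work/Test"
cd "$work/Test"
git init -q -b master
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"
echo "# Test" >> README.md

# Stage a file (add) and create a commit (commit).
git add README.md
git commit -q -m "initial commit"

# Develop on a branch (branch/checkout), then merge it back into master (merge).
git checkout -q -b development
echo "Work in progress." >> README.md
git commit -q -a -m "Add a note to the README"
git checkout -q master
git merge -q development

# Connect the local repo to the remote (remote add) and push (push).
git remote add origin "$work/remote.git"
git push -q -u origin master

# Clone the remote into a second local copy (clone).
git clone -q "$work/remote.git" "$work/Test-clone"

# Push one more change from the first copy...
echo "One more line." >> README.md
git commit -q -a -m "Add one more line"
git push -q

# ...and bring the clone up to date (pull).
cd "$work/Test-clone"
git pull -q
git log --oneline
```

After the final `git pull`, both local copies contain the same three commits.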


Here is a great graphic that summarizes the Git commands learned so far (no need to worry about most of these details for now).

<img src="http://assets.osteele.com/images/2008/git-transport.png" width="600">

## Git Ignore Files
A file named .gitignore contains rules for files/folders that git should ignore, meaning they are not put under version control. This is especially useful if you usually stage all files at once (e.g. `git add -A` or `git add .`) instead of individual ones. Here are a few examples of files and folders you might want to have git ignore in this class:
* \_\_pycache\_\_
* .ipynb_checkpoints
* .DS_Store
* .vscode
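
As a quick sketch of how such rules behave, the commands below write entries like those in the list above into a .gitignore file and then ask Git what it sees. The clutter file names (for example `main.cpython-311.pyc`) are made up for illustration, and the trailing `/` on folder entries is one common way to say "this is a directory".

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Write one ignore rule per line into .gitignore.
printf '%s\n' "__pycache__/" ".ipynb_checkpoints/" ".DS_Store" ".vscode/" > .gitignore

# Create some clutter plus one real source file.
mkdir __pycache__ .vscode
touch __pycache__/main.cpython-311.pyc .vscode/settings.json .DS_Store main.py

# Only .gitignore and main.py are listed as untracked.
git status --short

# Ask Git which rule causes a path to be ignored.
git check-ignore -v .DS_Store
```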

Instead of creating a .gitignore file from scratch, head over to https://github.com/github/gitignore and find a standard .gitignore file for the kind of project/programming language you are working on. Then you can customize for your project. For this class, it is convenient to start with the Python.gitignore: https://github.com/github/gitignore/blob/master/Python.gitignore

I recommend adding a .gitignore to your repository when you create it. Otherwise, you'll have to stop tracking the files that should be ignored using `git rm --cached [filenames]` (or `git rm -r --cached .` to untrack all files). The `--cached` flag removes files from Git's index only and leaves them on disk. Then add the .gitignore file and add files for tracking again.
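
Here is a minimal sketch of that recovery, assuming a `__pycache__` folder was committed by mistake before any .gitignore existed. The file names, commit messages, and placeholder identity are made up for illustration.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"

# Oops: a cache folder gets committed before any .gitignore exists.
mkdir __pycache__
touch __pycache__/main.cpython-311.pyc main.py
git add -A
git commit -q -m "initial commit (cache files included by mistake)"

# Add the ignore rule, then untrack everything.
# --cached only touches Git's index; the files stay on disk.
echo "__pycache__/" > .gitignore
git rm -r -q --cached .

# Re-add: this time the .gitignore rules are applied.
git add -A
git commit -q -m "Stop tracking ignored files"

# List what Git tracks now.
git ls-files
```

`git ls-files` now lists only `.gitignore` and `main.py`, while the `__pycache__` folder is still present on disk.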

## Caching your Github Password
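So you don't have to enter your credentials each time you want to push to Github, follow this resource for your operating system: https://help.github.com/articles/caching-your-github-password-in-git/#platform-all. Note that GitHub no longer accepts account passwords for Git operations over HTTPS, so the credential you cache is a personal access token.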

For Mac, use keychain. For Windows, use the [Git Credential Manager for Windows](https://github.com/Microsoft/Git-Credential-Manager-for-Windows). For Linux, use [libsecret](https://wiki.gnome.org/Projects/Libsecret). Follow these steps from this [Ask Ubuntu post](https://askubuntu.com/questions/773455/what-is-the-correct-way-to-use-git-with-gnome-keyring-and-https-repos):
1. You can install libsecret and the development libraries with:
    1. `sudo apt-get install libsecret-1-0 libsecret-1-dev`
1. Then you need to build the credential manager:
    1. `cd /usr/share/doc/git/contrib/credential/libsecret`
    1. `sudo make`
1. Finally, you should point git to the newly created file in your config:
    1. `git config --global credential.helper /usr/share/doc/git/contrib/credential/libsecret/git-credential-libsecret`
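
Whichever helper you choose, it ends up as a `credential.helper` setting in your Git config. The sketch below shows the mechanics in a throwaway repository so that your real configuration is left alone. It swaps in Git's built-in `cache` helper, which keeps credentials in memory for a limited time, purely as a stand-in for the keychain or libsecret helpers above. That helper relies on Unix sockets, so it may not be available on Windows. The one-hour timeout is an arbitrary choice.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Repository-level setting (no --global), so this demo does not change
# your real configuration. The timeout is given in seconds.
git config credential.helper "cache --timeout=3600"

# Read the setting back to confirm which helper this repository will use.
git config --get credential.helper
```

Running `git config --get credential.helper` outside of this demo repository is also a quick way to check which helper, if any, is currently configured on your machine.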