# [CPSC 322](https://github.com/DataScienceAlgorithms) Data Science Algorithms
[Gonzaga University](https://www.gonzaga.edu/) |
[Sophina Luitel](https://www.gonzaga.edu/school-of-engineering-applied-science/faculty/detail/sophina-luitel-phd-0dba6a9d)


# Git and GitHub

What are our learning objectives for this lesson?
* Understand the purpose of version control
* Create a GitHub repository
* Learn the basics of the Git command line interface
* Learn the basics of the Git tools in VS Code

Content used in this lesson is based upon information in the following sources:
* [Git](https://git-scm.com)
* [GitHub](https://github.com/)
* [Hubspot Git/GitHub Tutorial](http://product.hubspot.com/blog/git-and-github-tutorial-for-beginners)
* Dr. Gina Sprint's Data Science Algorithms course, Fall 2024

## Warm up Task(s)
* Make a personal [GitHub](https://github.com/) account
* Install [Git](https://git-scm.com/downloads), [Docker Desktop](https://www.docker.com/products/docker-desktop), and [VS Code](https://code.visualstudio.com/download)
* Open your command line (e.g. terminal or command prompt)
    * Make sure you can run the command `git` at the command line
        * Windows users may need to add Git to their PATH

    

## Today
* Git and GitHub
    * Command line example with your personal GitHub account
    * Use VS Code and Git
* (if time) Intro to Docker

## Overview
This lesson collects online resources for learning Git and GitHub. Reading the materials below will teach you the fundamentals, but the best way to learn these tools is to use them, so follow along on your own machine where appropriate.

## Version Control
From the [Git website](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control):
>What is "version control", and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. 

>If you are a graphic or web designer and want to keep every version of an image or layout (which you would most certainly want to), a Version Control System (VCS) is a very wise thing to use. It allows you to revert files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover. In addition, you get all this for very little overhead.

[Git](https://git-scm.com) is a popular VCS, and it is the version control software that GitHub is built on. [GitHub](https://github.com/) is a code hosting platform for version control and collaboration. People from all over the world (or in the exact same place) can work together on projects. Code that is hosted on GitHub can be either public or private.

## GitHub
First, make a free [GitHub](https://github.com/) account.

Next, follow this Hello World tutorial on GitHub: [https://guides.github.com/activities/hello-world/](https://guides.github.com/activities/hello-world/). This tutorial walks you through the following:
1. Create a repository. A repository is a folder that organizes your GitHub project. Files and folders in your repository are version controlled. You should always include a README.md file in each of your repositories. README files are markdown files that describe your repository. Read more about README.md files [here](https://help.github.com/articles/about-readmes/).
1. Create a branch. A branch is a version of the repository that someone is working on. The main branch of your repository is usually named `main` (older repositories and some Git installations name it `master`). When you are going to work on a new feature of the software, you will make a new branch off of the main branch. This new branch starts as a copy of the main branch. When you are done implementing your new feature, you merge your branch back into the main branch.
1. Create a commit. A commit is a change to file(s) in the repository that you want to save.
1. Create a pull request. When you make a pull request, you are asking someone to pull in your proposed changes from your branch to be merged with another branch (usually `main` or `master`).
1. Merge a pull request.


## Common Git Commands
So far, we have been using GitHub via its browser interface for simple non-code edits. More commonly, you will be writing code locally on your machine. You will then want to push the code changes from your local repository to the repository on GitHub via a command line interface. To do this, we will use Git and Git commands.

First, install Git onto your machine: [Installing Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).

Next, follow this Git/GitHub tutorial: http://product.hubspot.com/blog/git-and-github-tutorial-for-beginners. This tutorial walks you through the following (I’ve added example commands for working with a repository called FirstweekTest):
### **Workflow 1: Start Locally**
#### 1. Create a git repository (`init` command)
* Example:
```
echo "# Test" >> README.md 
git init
```

#### 2. Stage a file (`git add`)
Using the `add` command, this step tells Git which files you want to package into a commit. 
* Example: `git add README.md`
* (or `git add -A` or `git add .` to add all files to staging area)
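
As a rough sketch of steps 1 and 2 together, the commands below create a scratch repository and watch `git status --short` change as files are staged. The temporary directory and the second file `hello.py` are assumptions made for illustration; they are not part of the tutorial.

```shell
repo="$(mktemp -d)"                 # scratch folder (hypothetical); any empty folder works
cd "$repo"
git init --quiet

echo "# Test" >> README.md
echo "print('hello')" > hello.py    # hypothetical second file

git status --short                  # both files are untracked, shown as ??
git add README.md                   # stage only README.md
git status --short                  # README.md is now "A ", hello.py is still ??
git add -A                          # stage everything that is left

staged="$(git diff --cached --name-only)"   # names of all staged files
echo "$staged"
```

Staging nothing is also fine: `git status` simply keeps listing the files as untracked until you add them.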
#### 3. Create a commit (`git commit`).
A `commit` is a snapshot of your code changes and includes descriptions of the changes. Each commit contains:
* Author
* Description message
* Timestamp
* Branches
* Identifier (the commit hash)

Example: `git commit -m "initial commit"`
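
To see what a commit actually records, a minimal sketch like the following makes one commit in a scratch repository and prints its identifier, author, timestamp, and message. The author name and email are placeholders set only for this scratch repository.

```shell
repo="$(mktemp -d)"                            # scratch folder (hypothetical)
cd "$repo"
git init --quiet
git config user.name "Example Student"         # placeholder author, local to this repo
git config user.email "student@example.com"    # placeholder email

echo "# Test" >> README.md
git add README.md
git commit --quiet -m "initial commit"

# Print the pieces of the commit: identifier, author, timestamp, message
git log -1 --format="id: %h%nauthor: %an <%ae>%ndate: %ad%nmessage: %s"
message="$(git log -1 --format=%s)"
```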
 
#### 4. Create a branch (`git branch`). (optional)
Branches allow you to work on features independently of the main code.
* `main` (or `master`) branch: production-ready code
* Create a development branch to work on new features
    * Example: `git branch development` (creates branch but you have to switch to it before working)
    * `git checkout -b development` (creates a branch and switches to it immediately)
* Merge development branch back into master when you are done implementing the new feature
   * Example:
```
git checkout master
git merge development
```
* Note: When you want to merge a feature branch into `main` (or `master`), you need to switch to the branch you want to merge into first, in this case, `master`
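
The branch-and-merge cycle above can be tried end to end in a scratch repository. This is only a sketch: the author identity is a placeholder, and `git branch -M main` is used so the first branch is named `main` regardless of how your Git is configured.

```shell
repo="$(mktemp -d)"                            # scratch folder (hypothetical)
cd "$repo"
git init --quiet
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"

echo "# Test" >> README.md
git add README.md
git commit --quiet -m "initial commit"
git branch -M main                             # name the first branch main

git checkout --quiet -b development            # create the branch and switch to it
echo "A new feature" >> README.md
git commit --quiet -am "add feature"

git checkout --quiet main                      # switch to the branch you are merging INTO
git merge --quiet development                  # bring the feature commit into main
git branch                                     # lists development and main; * marks main
merged="$(git log -1 --format=%s)"
```

Because `main` had no new commits of its own here, Git can simply move `main` forward to the feature commit.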
#### 5. Create GitHub repo on GitHub website
* Note: If planning to push existing local repo to remote
    1. Don’t initialize with README or .gitignore or license
    1. Otherwise you may need `git pull origin main --allow-unrelated-histories`
* Note: If planning to clone remote repo to initialize local repo
    1. Initialize with README and/or .gitignore and/or license
    1. These will be copied down when you clone
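
The first note can be reproduced without GitHub. In the sketch below, a local folder named `remote` stands in for a GitHub repo that was created with a README, and `local` is a separate repo with its own first commit; both names and the identity are hypothetical. The `--no-rebase` and `--no-edit` flags are added so the pull merges without asking questions.

```shell
work="$(mktemp -d)"

# Stand-in for a GitHub repo created WITH a README (a local folder, not a real remote)
git init --quiet "$work/remote"
cd "$work/remote"
git config user.name "Example Student"
git config user.email "student@example.com"
echo "# FirstweekTest" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main

# A separate local repo with its own, unrelated first commit
git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example Student"
git config user.email "student@example.com"
echo "print('hello')" > hello.py
git add hello.py
git commit --quiet -m "initial commit"
git branch -M main
git remote add origin "$work/remote"

# A plain pull is refused because the two histories share no commits
git pull --no-rebase origin main || echo "pull refused: unrelated histories"

# Allowing unrelated histories merges the two lines of history
git pull --quiet --no-rebase --no-edit origin main --allow-unrelated-histories
ls                                             # README.md and hello.py are both present
```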
#### 6. Add a local repository to GitHub (`git remote add`).
Connect your local repo to a remote repo
* Example: `git remote add origin https://github.com/sluitel2025/FirstweekTest.git`
#### 7. Push local changes to GitHub (`git push`).
Send your commits to the remote repository
* Example: `git push -u origin master` (use `main` in place of `master` if that is the name of your branch)
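
Steps 6 and 7 can be rehearsed offline. In this sketch a bare repository in a temporary folder plays the role of the empty GitHub repo, so the "URL" is just a local path; on GitHub you would use the `https://github.com/...` address instead. The identity and folder names are placeholders.

```shell
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"     # stand-in for an empty GitHub repo

git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"
echo "# Test" >> README.md
git add README.md
git commit --quiet -m "initial commit"
git branch -M main

git remote add origin "$work/remote.git"       # connect local repo to the remote
git remote -v                                  # lists origin for fetch and push
git push --quiet -u origin main                # -u remembers origin/main as the upstream

upstream="$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")"
echo "$upstream"
```

After the first `git push -u`, a plain `git push` on this branch sends commits to the same place.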

### **Workflow 2: Start by Cloning a Remote Repo**

#### 1. Clone remote GitHub repo to create local repo
If you are starting from an existing remote repo
* `git clone https://github.com/sluitel2025/FirstweekTest.git`
* Note: you can only push to this repo if you have write privileges (e.g. you are owner/collaborator)
#### 2. Make Changes/Edit files 
#### 3. Stage Files (`git add`)  
#### 4. Create Commit (`git commit`)  
#### 5. Create Branch (`git branch` or `git checkout -b`)  
#### 6. Push Changes (`git push`)  
  
#### 7. Update your local repository with the most up to date code on GitHub (`git pull`).
* When you want to sync with remote changes
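
Here is one way to watch clone and pull work together without a network connection. A bare repository in a temporary folder stands in for GitHub, and two working copies named `teammate` and `mine` (hypothetical names) share it.

```shell
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"     # stand-in for the GitHub repo

# A teammate seeds the remote with one commit
git init --quiet "$work/teammate"
cd "$work/teammate"
git config user.name "Example Teammate"        # placeholder identity
git config user.email "teammate@example.com"
echo "# Test" > README.md
git add README.md
git commit --quiet -m "initial commit"
git branch -M main
git remote add origin "$work/remote.git"
git push --quiet -u origin main

# Workflow 2 starts here: clone the remote to get your own local repo
git clone --quiet --branch main "$work/remote.git" "$work/mine"

# Meanwhile the teammate pushes another commit
echo "More notes" >> README.md
git commit --quiet -am "add notes"
git push --quiet

# Your clone is now one commit behind; pull to catch up
cd "$work/mine"
git pull --quiet --ff-only
latest="$(git log -1 --format=%s)"
echo "$latest"
```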
    

### **Check Repository Status and History**
- Check which files are staged, unstaged, or untracked: 
  `git status`  
- View commit history with messages, author, and timestamps: 
  `git log`  
- Check which branch you are currently on: 
  `git branch`  
  - The current branch will be highlighted with a `*`.

Here is a great graphic that summarizes the Git commands learned so far (no need to worry about most of these details for now).

<img src="http://assets.osteele.com/images/2008/git-transport.png" width="600">
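
The three inspection commands from the status and history list (`git status`, `git log`, `git branch`) can be tried on a scratch repository like the one sketched here. The file names and identity are made up for the example.

```shell
repo="$(mktemp -d)"                            # scratch folder (hypothetical)
cd "$repo"
git init --quiet
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"
echo "# Test" > README.md
git add README.md
git commit --quiet -m "initial commit"
git branch -M main
git branch development                         # second branch, just so there is something to list

echo "tracked change" >> README.md             # modified but not staged
echo "scratch" > notes.txt                     # brand new, untracked

git status --short                             # " M README.md" and "?? notes.txt"
git log --oneline                              # one line per commit: short id and message
git branch                                     # development and main; * marks main
current="$(git rev-parse --abbrev-ref HEAD)"
```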

## VS Code: Common Git Commands
I highly recommend using the command line, both for practice and for finer control over your Git repository; however, VS Code also has good built-in GUI tools. The steps below walk through how to execute common Git commands using VS Code.

1. Create a git repository (`init` command)
    1. In VS Code, open the version control panel on the left side of the app
    1. Click "Initialize Repository"
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_init.png" width="400">

1. Stage a file (`add` command)
    1. This step tells Git which files you want to package into a commit.
    1. You can see the files with changes in the version control panel. You can choose to commit all the files (easiest) or individual files
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_changes.png" width="400">

    1. With VS Code, this step can be combined with the next step (commit)
1. Create a commit (`commit` command)
    1. A `commit` is a group of code changes and descriptions of the changes
        1. Author
        1. Description message
        1. Timestamp
        1. Branches
        1. Identifier
    1. Type a commit message in the textbox, then press cmd + enter (Mac) or ctrl + enter (Windows) to make the commit (or press the checkmark icon). 
    1. The first time you do this, you will see the following pop up. You can choose "Yes" or "Always" to stage and commit all changes automatically. For beginning Git users, this is the best approach.
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_commit.png" width="400">

1. Create GitHub repo on GitHub website
    1. Note: If planning to push existing local repo to remote
        1. Don’t initialize with README or .gitignore or license
    1. Copy the URL for the repo you just made. You will need it to set up the GitHub repo as your local Git repo's remote.
1. Add a local repository to GitHub (`remote add` command)
    1. Click the three dots icon and "Remote", then "Add Remote"
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_remote_add.png" width="400">

    1. Paste the Github repo URL
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_remote_add_url.png" width="400">

    1. Name the remote "origin" (this is the convention typically used)
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_remote_add_origin.png" width="400">

1. Push local changes to GitHub (`push` command)
    1. Under the same three dots menu icon, click "Pull, Push" then "Push to"
    1. Choose your remote called origin
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_push_to.png" width="400">

    1. Now, whenever you want to push, you can simply press the push/pull icon on the lower left side of the status bar
<img src="https://raw.githubusercontent.com/DataScienceAlgorithms/M1_Introduction/main/figures/vscode_push_pull_status.png" width="400">

1. Update your local repository with the most up to date code on GitHub (`pull` command)
    1. Under the same three dots menu icon, click "Pull, Push"
1. Clone remote GitHub repo to create local repo (`clone` command)
    1. Note: you can only push to this repo if you have write privileges (e.g. you are owner/collaborator)

# Classroom Activity
* Fork the repo
* Clone your forked repo
* Edit the file (print your name)
* Add and commit your change
* Push
* Create a pull request
    * Go to your fork on GitHub
    * You will see a Compare and pull request button, click it!
    * Submit the PR back to the class repo
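
The command-line part of this activity can be rehearsed locally before doing it on GitHub. In the sketch below, `class` plays the class repo and `fork.git` plays your fork; both are local folders invented for the example, and the fork and pull request themselves still happen in the browser.

```shell
work="$(mktemp -d)"

# Stand-in for the class repo, with one starter file
git init --quiet "$work/class"
cd "$work/class"
git config user.name "Example Instructor"      # placeholder identity
git config user.email "instructor@example.com"
echo "print('Hello from the class')" > hello.py
git add hello.py
git commit --quiet -m "starter file"
git branch -M main

# Roughly what the Fork button does: make your own copy of the repo
git clone --quiet --bare "$work/class" "$work/fork.git"

# Clone your fork, edit the file, add, commit, push
git clone --quiet --branch main "$work/fork.git" "$work/my-clone"
cd "$work/my-clone"
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"
echo "print('Your Name')" >> hello.py
git add hello.py
git commit --quiet -m "print my name"
git push --quiet origin main

# The fork is now one commit ahead of the class repo
git -C "$work/fork.git" log --oneline main
```

That extra commit on the fork is exactly what a pull request would propose to the class repo.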


## Git Ignore Files
Files named `.gitignore` contain rules for files/folders that Git should ignore, meaning not put under version control. This is useful if you usually stage all files at once (e.g. `git add -A` or `git add .`) instead of individual ones. Here are a few examples of files and folders you might want to have Git ignore in this class:
* \_\_pycache\_\_
* .ipynb_checkpoints
* .DS_Store
* .vscode

Instead of creating a .gitignore file from scratch, head over to https://github.com/github/gitignore and find a standard .gitignore file for the kind of project/programming language you are working on. Then you can customize for your project. For this class, it is convenient to start with the Python.gitignore: https://github.com/github/gitignore/blob/master/Python.gitignore

I recommend adding a .gitignore to your repository when you create it. Otherwise, you'll have to stop tracking files that are already committed using `git rm --cached [filenames]` (or `git rm -r --cached .` to untrack all files); the `--cached` flag removes files from Git's index but leaves them on disk. Then add the .gitignore file and add files for tracking again.
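
The recovery path described above might look like the following sketch, in which a `__pycache__` folder is committed by accident and then untracked. The file names and identity are hypothetical, and the `.gitignore` here contains only the four example entries from this section rather than the full Python template.

```shell
repo="$(mktemp -d)"                            # scratch folder (hypothetical)
cd "$repo"
git init --quiet
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"

mkdir __pycache__
echo "bytecode" > __pycache__/mod.pyc          # hypothetical generated file
echo "print('hi')" > main.py
git add -A                                     # oops: __pycache__ is now tracked
git commit --quiet -m "initial commit"

# Add ignore rules, then stop tracking what should have been ignored
printf '__pycache__/\n.ipynb_checkpoints/\n.DS_Store\n.vscode/\n' > .gitignore
git rm -r --quiet --cached __pycache__         # untrack, but keep the files on disk
git add .gitignore
git commit --quiet -m "add .gitignore and untrack __pycache__"

git ls-files                                   # tracked files: .gitignore and main.py
git check-ignore -v __pycache__/mod.pyc        # shows which rule matches the file
tracked="$(git ls-files)"
```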

## Caching your GitHub Password
So you don't have to enter your credentials each time you want to push to GitHub, follow this resource for your operating system: https://help.github.com/articles/caching-your-github-password-in-git/#platform-all. Note that GitHub no longer accepts your account password for Git operations over HTTPS, so the credential you cache is a personal access token.

For Mac, use keychain. For Windows, use the [Git Credential Manager for Windows](https://github.com/Microsoft/Git-Credential-Manager-for-Windows). For Linux, use [libsecret](https://wiki.gnome.org/Projects/Libsecret), following these steps from this [Ask Ubuntu post](https://askubuntu.com/questions/773455/what-is-the-correct-way-to-use-git-with-gnome-keyring-and-https-repos):
1. You can install libsecret and the development libraries with:
    1. `sudo apt-get install libsecret-1-0 libsecret-1-dev`
1. Then you need to build the credential manager
    1. `cd /usr/share/doc/git/contrib/credential/libsecret`
    1. `sudo make`
1. Finally, you should point git to the newly created file in your config:
    1. `git config --global credential.helper /usr/share/doc/git/contrib/credential/libsecret/git-credential-libsecret`
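
To check what the last step does before touching your real settings, a dry run along these lines may help. It is a sketch that writes to a scratch config file through `--file` (an assumption made for safety); swapping `--file "$cfg"` for `--global` gives the command from the steps above.

```shell
cfg="$(mktemp)"     # scratch config file (hypothetical) so your real settings stay untouched
helper_path=/usr/share/doc/git/contrib/credential/libsecret/git-credential-libsecret

# Same setting as the final step, written to the scratch file
git config --file "$cfg" credential.helper "$helper_path"
configured="$(git config --file "$cfg" --get credential.helper)"
echo "credential.helper = $configured"

# The setting only helps if the make step actually produced the helper
if [ -x "$configured" ]; then
    echo "helper is built and executable"
else
    echo "helper not found; rerun the build step"
fi
```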