<center>
<img src="./images/00_main_arcada.png" style="width:1400px">
</center>

## Lecture 3: GitHub and collaborative development


## Instructor:
Anton Akusok <br/> 
email:  anton.akusok@arcada.fi<br/>
messages: "Anton Akusok" @ Teams

# Goals for today

* Learn about Git, version control, etc... (boring stuff)
* Learn why you have not used GitHub yet
* Learn about automation and CDP
* Learn about tokens, storing them securely, and using them automatically
* Hands-on exercises for all of the above

## Agenda 

1. Intro to Git
    - 1.1 GitHub: the Real Intro to Git
    - 1.2 GitHub demos
2. GitHub workflow
     - 2.1 Exercise 1: GitHub basics
     - 2.2 Exercise 2: GitHub 🫶 Jupyter
3. *break*
4. CDP: Continuous Deployment Pipeline
     - 4.1 GitHub Actions
     - 4.2 Exercise 3: Hello GitHub Actions
5. Secrets: Store sensitive info
     - 5.1 GitHub Secrets
     - 5.2 Exercise 4: Introduction to secret scanning
6. (optional) Hands-on with GitHub Actions
     - 6.1 Secrets in Actions
     - 6.2 The Cat API

# 1. Intro to Git

![git](images/git.png)

Git is a version control system invented by Linus Torvalds to manage changes made by different programmers at different times.

* "invented by Linus Torvalds" is important: he is one of the geniuses of our time
* Git's internal workings are insane; it does ultra-complex stuff, and FAST
* Git is a worldwide standard  

* We learn to *use* Git like we use a smartphone
* Understand the minimum and work around mistakes

```python
from IPython.display import YouTubeVideo

YouTubeVideo('hwP7WQkmECE', width=800, height=300)
```

```python
from IPython.display import YouTubeVideo

YouTubeVideo('e9lnsKot_SQ', width=800, height=300)
```

Local repository vs. local folder

- I can work on new feature A, bug B in production system, and check John's suggestion C
    - these are 3 different views
- all these views exist at once in *repository*, but *folder* shows only 1 view
- a repository has many views, in fact **all** of them!  
    - one commit == one view


- Git itself only cares about the views/commits
- "branch" is a convenient tag
    - but you can use the raw "hash" of a specific view/commit
    - `HEAD` (what you have checked out right now) and the `main` branch are standard names
- creating a branch just creates a new tag; it costs nothing
- "branch" tag follows the last commit of a commit chain
- switch to the branch by `git checkout <branch>`
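The "views and tags" idea can be sketched as a toy model. This is a hypothetical illustration, not Git's real data format: commits are stored snapshots, and a branch is only a label that points at one commit and moves forward when you commit.

```python
# Toy model: commits are snapshots ("views"), branches are movable labels.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    cid: str                 # stands in for Git's hash
    files: dict              # the "view" this commit represents
    parent: "Commit | None"  # previous commit in the chain


@dataclass
class Repo:
    commits: dict = field(default_factory=dict)   # cid -> Commit: every view ever made
    branches: dict = field(default_factory=dict)  # branch name -> cid: just a label
    head: str = "main"                            # which branch the folder shows

    def commit(self, cid, files):
        parent_id = self.branches.get(self.head)
        parent = self.commits.get(parent_id) if parent_id else None
        self.commits[cid] = Commit(cid, dict(files), parent)
        self.branches[self.head] = cid  # the label follows the newest commit

    def branch(self, name):
        # Creating a branch stores a label only; no files are copied
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name

    def folder(self):
        # The working folder shows exactly one view: the commit HEAD points to
        return self.commits[self.branches[self.head]].files


repo = Repo()
repo.commit("a1", {"app.py": "v1"})
repo.branch("feature-A")
repo.checkout("feature-A")
repo.commit("b2", {"app.py": "v2 with feature A"})
repo.checkout("main")
print(repo.folder())  # {'app.py': 'v1'}
```

Both views still live in `repo.commits`; switching branches only changes which one the folder shows.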

## Git is similar to a blockchain

![git hash](images/git_hash.png)

* working directory is a playground
* "staging" are changes prepared for commit, but not committed yet
* when switching branches, uncommitted changes (staged or not) travel with you
* ...unless the other branch has different versions of the same files

* when switching branches, changes that would be overwritten are a problem
    - git does not know what to do with them
    - it will complain and refuse to switch!
    - you can commit the changes, or...
    - you can **stash** the changes (to a pocket :D)
    - `git stash pop` brings them back at any time, or `git stash drop` forgets they ever existed
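The blockchain comparison comes from hashing. In the simplified sketch below, each commit ID is a SHA-1 hash over the commit's content *and* its parent's ID. Because of that, changing an old commit changes every ID after it. The payload layout is made up for illustration and is not Git's actual object format.

```python
# Simplified hash chain: each ID covers the content plus the parent's ID.
import hashlib


def commit_id(content: str, parent_id: str) -> str:
    payload = f"parent {parent_id}\n\n{content}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_chain(contents):
    ids, parent = [], ""
    for content in contents:
        parent = commit_id(content, parent)  # next commit points at this one
        ids.append(parent)
    return ids


original = build_chain(["first", "second", "third"])
tampered = build_chain(["FIRST", "second", "third"])

# Editing only the first commit changes every ID that follows it
print([a == b for a, b in zip(original, tampered)])
```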

## 1.1 GitHub: the Real Intro to Git

![git-features](images/git_features.jpg)

### GitHub is a *social* thing! (for nerds)

![github-what](images/github_what.jpg)

Git:

- a smart version control system
- command line utility
- weird things

vs. GitHub:

- get things done
- enable collaboration in/outside organization
- help each other
- deploy to production

Why is Git hard to learn?

- git really makes sense for collaboration
- github really makes sense for collaboration
- you have no use for git while studying...
- ... but employers expect you to master git!

Actual usage of GitHub

![git-together](images/allison-horst-jenny-bryan-quote.png)

## 1.2 GitHub demos

## [Scikit-learn GitHub Repository](https://github.com/scikit-learn/scikit-learn)

# 2. GitHub workflow

![workflow](images/workflow.jpeg)

### GitHub parts

* repository clone button
* (new!) Copilot button
* Pull Requests (PRs)
* PR -> message thread
* PR -> Files changed
* PR -> Files changed -> Add line comment
* PR -> Files changed -> Approve button
* Settings -> Branches -> Branch protection rules

### GitHub thinking:

1. I need to fix a bug...
2. (make experimental space)  
`git branch 3512-add-country-code-to-campaigns`  
Make a branch any way you want, but give it a **good** name
3. (make collaboration space)  
Create PR + add name, good description, link to Jira ticket  
Remember: someone needs to read and understand it!
4. (get my own sandbox)  
`git fetch && git checkout 3512-<tab>`  
Get the branch to my local machine, [Tab] gives autocomplete
5. Do some coding
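A "good name" like `3512-add-country-code-to-campaigns` is just the ticket number followed by a short slug of the title. The hypothetical helper below builds names in that style. The word limit is an arbitrary assumption, not a Git or GitHub rule.

```python
# Hypothetical helper: ticket number + lowercase slug of the title.
import re


def branch_name(ticket: int, title: str, max_words: int = 6) -> str:
    words = re.findall(r"[a-z0-9]+", title.lower())  # drops punctuation and case
    return "-".join([str(ticket), *words[:max_words]])


print(branch_name(3512, "Add country code to campaigns"))
# 3512-add-country-code-to-campaigns
```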

### GitHub thinking:

5. Do some coding
6. (prepare my changes)  
`git add -A && git commit -m "Load and format country code"`  
Keep changes to a few files. One branch == one feature!
7. (publish my changes)  
`git push` and **wait for checks to complete**  
Don't ask for review before checks!  
New commits can void approvals (if the repo dismisses stale approvals), so one typo means asking again...
8. (review) Answer questions, make requested changes
9. Re-approve and *squash* merge  
Squash means the whole branch is one commit in `main`
10. Mark Jira ticket as complete

### Important things besides coding

* Clearly write the purpose of Pull Request
* Do one thing only in one Pull Request
* Tell what you want to review
    - A quick approval, please!
    - Can you check the function logic?

Reviewing:
* Read through, seriously...
* Mark potential issues "This will hurt us later", but...
* Be kind, and be OK with people solving issues their way, not your way!
* https://rewind.com/blog/best-practices-for-reviewing-pull-requests-in-github/

### Never work in `main` branch

Nobody serious ever sends code directly into the "main" branch!

Don't skip this rule even for your own 1-person repositories.

PRs help make sure an update is *fully working* before it is merged into the code.  
They also help you try out new stuff that may not work, or compare several proposals in parallel.

Committing directly into `main` branch will leave you with broken code, and a very annoying way back to previous working version. Don't do this to yourself.

(also, interviewers will check your GitHub, and you don't want them to see zero pull requests)


## 2.1 Exercise 1: GitHub basics


## https://skills.github.com  
### --> "Review Pull Requests"

## 2.2 Exercise 2: GitHub 🫶 Jupyter


#### Task: report validation performance in this simple ML notebook.

The researcher used all data for training. Split the data into training+validation, compute and print the model accuracy on validation set.

1. Fork the repository on GitHub  
**https://github.com/akusok/github_friendly_jupyter**
2. Clone your fork to your machine
3. Make a branch, open Jupyter, and change the code
4. Stage and commit your changes
5. Push back to GitHub and make a PR
6. Check the changelog of your PR

**Hint: replace `github.com` with `github.dev` to get online text editor!**
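The core of the task is a hold-out split: fit only on the training part, then score on the part the model never saw. The pure-Python sketch below uses made-up toy data and a hypothetical majority-class "model". In the actual notebook you would fit the researcher's model instead. If the notebook uses scikit-learn, `sklearn.model_selection.train_test_split` and `sklearn.metrics.accuracy_score` do the same job.

```python
# Hold-out validation sketch with toy data and a majority-class baseline.
import random
from collections import Counter


def train_val_split(X, y, val_fraction=0.2, seed=42):
    """Shuffle indices reproducibly and hold out val_fraction of them."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    X_train = [X[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]
    y_train = [y[i] for i in train_idx]
    y_val = [y[i] for i in val_idx]
    return X_train, X_val, y_train, y_val


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data for illustration only
X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
X_train, X_val, y_train, y_val = train_val_split(X, y)

majority = Counter(y_train).most_common(1)[0][0]  # "fit" on training data only
y_pred = [majority for _ in X_val]
print(f"Validation accuracy: {accuracy(y_val, y_pred):.2f}")
```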

## Jupytext: meaningful changelog for notebooks in Git


### https://jupytext.readthedocs.io

`pip install jupytext`

Also a VS Code extension:
![jupytext](images/jupytext.png)

Works transparently in JupyterLab

**Paired notebook** updates in real time with the source notebook

![jupytext-lab](images/jupytext-lab.png)

VS Code runs Jupytext directly with outputs in a separate window

![jupytext-vscode](images/jupytext-vscode.png)
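Diffs become readable because a paired text file stores each cell as plain lines instead of notebook JSON. In Jupytext's "percent" format, a `# %%` line marks each cell. The hypothetical helper below produces that layout from a list of cells. Jupytext itself does this for real, plus a metadata header that the sketch omits, when you pair a notebook, for example with `jupytext --set-formats ipynb,py:percent notebook.ipynb`.

```python
# Sketch of the Jupytext "percent" layout (metadata header omitted).
def to_percent_script(cells):
    chunks = []
    for cell_type, source in cells:
        if cell_type == "markdown":
            marker = "# %% [markdown]"
            # Markdown lines are stored as Python comments
            source = "\n".join("# " + line for line in source.splitlines())
        else:
            marker = "# %%"
        chunks.append(f"{marker}\n{source}\n")
    return "\n".join(chunks)


cells = [
    ("markdown", "# Simple ML notebook"),
    ("code", "import random\nrandom.seed(0)"),
]
print(to_percent_script(cells))
```

Changing one line of code in the notebook then shows up as a one-line change in the `.py` file's Git diff.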

## 2.2 Exercise 2, part 2: GitHub 🫶 Jupyter


#### Task: report validation performance in this simple ML notebook.

The researcher used all data for training. Split the data into training+validation, compute and print the model accuracy on validation set.

7. Make a new branch that replaces Jupyter with Jupytext
8. Make a third branch with validation in Jupytext
9. Create PR of 3rd branch --> 2nd branch  
(PRs do not need to end in `main` branch)
10. Check the changelog

# <3. Break>

![break](images/break.jpeg)

# 4. CDP: Continuous Deployment Pipeline

Also called CI/CD for Continuous Integration / Continuous Delivery

![lego](images/cdp.png)

CDP is some code attached to the Pull Requests

Purpose: smooth collaboration across people and systems!

* code quality validation: formatting, unit tests
* enforce conventions e.g. must bump library version
* acting upon external systems: deploy a service or a pipeline
* publish artefacts: upload new library version
* any custom thing
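A check like "must bump library version" is usually a small script that a CDP runs on every PR. A non-zero exit code marks the check as failed. The sketch below assumes plain `X.Y.Z` version strings. It leaves out how the two versions are read, for example from a `pyproject.toml` on `main` and on the PR branch.

```python
# Sketch of a "version must be bumped" check; X.Y.Z format is an assumption.
import sys


def parse_version(text: str) -> tuple:
    return tuple(int(part) for part in text.strip().split("."))


def version_was_bumped(main_version: str, pr_version: str) -> bool:
    # Tuples compare element by element, so (1, 10, 0) > (1, 9, 3)
    return parse_version(pr_version) > parse_version(main_version)


if __name__ == "__main__" and len(sys.argv) == 3:
    main_v, pr_v = sys.argv[1], sys.argv[2]
    if not version_was_bumped(main_v, pr_v):
        print(f"Version {pr_v} is not greater than {main_v}: please bump it")
        sys.exit(1)  # non-zero exit code -> the check fails
    print(f"OK: {main_v} -> {pr_v}")
```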

Large companies probably have their own CDP

GitHub has a CDP called "Actions"

They all do the same job. This is just a way of running code.

## 4.1 GitHub Actions

![actions](images/lego.jpg)

## But I don't know GitHub Actions!

![copilot](images/copilot.jpeg)

We will use Copilot. Or any other LLM, it does not matter; ideally one that integrates with your IDE.

Writing automation with Copilot is **silly fast**. Ridiculously easy.

Like, it feels almost offensive if you learned any programming thing by reading a book before.

(You still need to read books and understand what you are doing. But from that point on, since 2023 A.D., things have become silly fast.)

## Why Copilot / LLM?

(I say Copilot but really mean "LLM AI coding assistant" from now on)

- Because nobody is a "Certified GitHub Actions developer"
- You know **what** to do, but don't know **how**
- Copilot knows **how**, saving hours of Googling or days of reading books
- It also helps debug errors, which is majestic when starting with a new tech
- It has a great integration with VS Code (because Microsoft owns VS Code and GitHub, and is a major investor in OpenAI)

* LLMs are really good at telling you "how", but are utter garbage at telling "what".

* **You** have to think. And LLMs won't replace people any time soon. 

* But they save a ton of time learning skills - like you learned to hold a spoon, to ride a bicycle, or to write specific code.

* Usually a person knows many things "a little" and one thing "deeply". This is called "T-shaped knowledge". LLMs deepen the arms of the "T", making you good with stuff you know a little about.

```python
from IPython.display import YouTubeVideo

YouTubeVideo('67_aMPDk2zw', width=800, height=300)
```

## Automation is someone's computer

- Automation is code running on someone's computer. Just like you run code on a laptop.

- (It probably uses a Docker container - still a computer)

- It runs basic bash scripts. Learning the very basics of bash scripting is very helpful: you will understand what a script means.

https://docs.csc.fi/support/tutorials/env-guide/linux-bash-scripts 

![bash_scripts_readme](images/bash_csc.png)

- For automating something, first literally run it on your laptop in a terminal (Linux and Mac computers have a terminal; Windows can install a Linux terminal too, e.g. with WSL).  

- Then put the same code in automation script.

- There is no difference: both your laptop and an automation virtual machine are regular computers connected to the Internet.

- Automation needs an "event" that starts the run, like you press "Enter" to run a command in a terminal.

*That's all automation is: a computer, a script, and an event.*
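"Run it on your laptop first, then put the same code in automation" works because the script doesn't care where it runs. The sketch below behaves the same in both places. It only reads the `GITHUB_ACTIONS` variable, which GitHub sets to `true` on its runners, to report where it is running. The file name `where_am_i.py` is hypothetical.

```python
# Same script on a laptop and on a GitHub Actions runner.
import os
import platform


def where_am_i(env=None) -> str:
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":  # set by GitHub on its runners
        return "GitHub Actions runner"
    return "local machine"


if __name__ == "__main__":
    # First: `python where_am_i.py` in your terminal; then the same command as a workflow step
    print(f"Hello from a {where_am_i()} ({platform.system()})")
```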

![github_actions](images/github-actions.png)

*some slides about GitHub Actions that I literally asked ChatGPT to generate for me because I am too lazy to do it myself...*

## GitHub Actions: Automating Your Workflow

### What are GitHub Actions?
- GitHub Actions is a powerful automation tool provided by GitHub.
- It allows you to automate tasks and workflows directly within your GitHub repository.

### Key Features
- **Automation**: Set up workflows to automatically perform tasks such as testing, building, and deploying your code.
- **Event-driven**: Trigger workflows based on various events, such as push, pull request, or issue creation.
- **Customizable**: Define your workflows using YAML syntax and customize them to fit your project's needs.
- **Integration**: Easily integrate with other tools and services, such as testing frameworks, cloud providers, and deployment platforms.

### Why GitHub Actions?
- **Streamlines development**: Automating repetitive tasks saves time and effort, allowing you to focus on writing code.
- **Ensures consistency**: With automated workflows, you can ensure consistent testing, building, and deployment processes across your projects.
- **Facilitates collaboration**: Share and reuse workflows across teams to standardize development practices and improve collaboration.


## Anatomy of a GitHub Actions Workflow

### Workflow File
- Workflows are defined in YAML files stored in the `.github/workflows` directory of your repository.
- Each workflow file contains one or more jobs, which consist of a sequence of steps to be executed.

### Events
- Workflows are triggered by events such as push, pull request, or schedule.
- You can specify the events that should trigger your workflow in the workflow file.

### Jobs and Steps
- Each job is a unit of work that runs on its own runner; jobs run in parallel by default unless a job waits for others with `needs`.
- Each job consists of one or more steps, which are individual tasks to be executed.
- Steps can include actions, shell commands, or scripts.

### Actions
- Actions are reusable units of code that perform specific tasks within a workflow.
- You can use built-in actions provided by GitHub or create custom actions tailored to your project's needs.


## Getting Started with GitHub Actions

### Creating a Workflow
- To create a new workflow, add a file to the `.github/workflows` directory of your repository (on GitHub: "Add file" → "Create new file", typing the full path), or use "New workflow" in the "Actions" tab.
- Name your workflow file with a `.yml` or `.yaml` extension and define your workflow using YAML syntax.

### Running Workflows
- Workflows are automatically triggered by events specified in the workflow file.
- You can also manually trigger workflows or schedule them to run at specific times.

### Monitoring Workflows
- View the status and logs of your workflows in the "Actions" tab of your repository.
- Monitor workflow runs, troubleshoot failures, and review logs to ensure your automation is working as expected.

### Example Workflow
```yaml
name: CI

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Install dependencies
      run: npm install
    - name: Run tests
      run: npm test
```
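For a Python project, the `npm install` and `npm test` steps in this workflow would typically become something like `pip install -r requirements.txt` and `python -m pytest` (or `python -m unittest`). Below is a minimal test file that such a step could run. The function under test, `add_country_code`, is hypothetical.

```python
# Minimal unittest file a "Run tests" step could execute with `python -m unittest`.
import unittest


def add_country_code(phone: str, code: str = "+358") -> str:
    # Hypothetical function under test: prefix a local number with a country code
    phone = phone.strip()
    return phone if phone.startswith("+") else code + phone.lstrip("0")


class TestAddCountryCode(unittest.TestCase):
    def test_adds_code_to_local_number(self):
        self.assertEqual(add_country_code("0401234567"), "+358401234567")

    def test_keeps_existing_code(self):
        self.assertEqual(add_country_code("+46701234567"), "+46701234567")
```

If any test fails, the step exits with a non-zero code and the workflow run is marked as failed.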

##  4.2 Exercise 3: Hello GitHub Actions

## https://skills.github.com  
### --> "Hello GitHub Actions"

# 5. Secrets: Store sensitive info

![api-keys](images/secrets.png)

## 5.1 GitHub Secrets

**What are GitHub Secrets?**
  - GitHub Secrets are secure storage for sensitive information like API keys, tokens, passwords, etc.
  - They are encrypted and can be used in GitHub Actions workflows.

**Why Use Secrets?**
  - Protects sensitive data from being exposed in your code repository.
  - Ensures that only authorized workflows can access sensitive information.

**Where are Secrets Stored?**
  - Secrets are stored at the repository level, organization level, or environment level within GitHub.

**Steps to Add a Secret:**
  1. **Navigate to Your Repository:**  Go to the GitHub repository where you want to store the secret.
  2. **Access Settings:** Click on the "Settings" tab of the repository.
  3. **Find Secrets:** In the sidebar, click on "Secrets and variables" and then "Actions".
  4. **Add a New Secret:**  
    - Click on "New repository secret".  
    - Enter a name for the secret in the "Name" field.  
    - Enter the secret value in the "Value" field.  
    - Click "Add secret" to save.

**Accessing Secrets in Workflows:**
  - Secrets can be accessed in GitHub Actions using the `secrets` context.
  - Example usage in a workflow:
    ```yaml
    name: CI

    on:
      push:
        branches:
          - main

    jobs:
      build:
        runs-on: ubuntu-latest

        steps:
        - name: Checkout code
          uses: actions/checkout@v4

        - name: Use Secret in Action
          run: echo "Using secret ${{ secrets.MY_SECRET }}"
    ```

- **Key Points:**
  - Replace `MY_SECRET` with the name of your secret.
  - Secrets are automatically masked in logs to prevent exposure.
  - Secrets are not passed to workflows triggered by pull requests from forks (`GITHUB_TOKEN` is the exception).
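Inside a Python step, a secret usually arrives as an environment variable. The workflow would map it with something like `env: MY_SECRET: ${{ secrets.MY_SECRET }}` on the step. The sketch reads it and stops with a clear message if it is missing. `MY_SECRET` is a placeholder name, as in the YAML example.

```python
# Read a secret passed in as an environment variable; never print it.
import os
import sys


def get_secret(name: str, env=None) -> str:
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        # Fail loudly instead of silently continuing with an empty key
        sys.exit(f"Secret {name} is not set; check the step's env: block")
    return value


# Usage in a workflow step: api_key = get_secret("MY_SECRET")
```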

##  5.2 Exercise 4: Introduction to secret scanning

## https://skills.github.com  
### --> "Introduction to secret scanning"

# 6.1 Secrets in Actions


Task: Save a secret message, print it out in Actions

1. Go to the GitHub website
2. Create a new secret
3. Make another branch
4. Ask an LLM to load the secret in your Actions script
5. Print the secret value with the help of an LLM; make sure it does not exist in plain text in your code
6. Commit and push
7. Check the Actions output
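GitHub masks exact secret values in logs as `***`. One way to confirm the right secret arrived, without exposing it, is to print its length and a short hash fingerprint, then compare with the fingerprint of the value you saved. This is a sketch for a classroom exercise. A short hash prefix of a weak secret, such as a short password, could be brute-forced, so avoid this for real credentials. `MY_SECRET` is a placeholder name.

```python
# Print a non-reversible summary of a secret instead of the secret itself.
import hashlib
import os


def describe_secret(value: str) -> str:
    fingerprint = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"secret of length {len(value)}, sha256 prefix {fingerprint}"


if __name__ == "__main__":
    print(describe_secret(os.environ.get("MY_SECRET", "")))
```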

# 6.2 Cat-as-a-Service

https://thecatapi.com

![cat-api](images/cat-api.png)

Get cats with an API call 

![cats](images/cats2.png)

## The Cat API

- An actual service where you can register and get an API key

- Real calls, receive real data

- ... but no BS like Google Drive API, and very easy to understand

- Same workflow you would use for automation at work
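The API-key workflow looks roughly like the sketch below, which uses only the standard library. The endpoint `https://api.thecatapi.com/v1/images/search` and the `x-api-key` header follow The Cat API documentation as I understand it, so double-check them at thecatapi.com if the call fails. `CAT_API_KEY` is a placeholder environment variable name.

```python
# Fetch a random cat image URL from The Cat API (endpoint/header: verify in docs).
import json
import os
import urllib.request

SEARCH_URL = "https://api.thecatapi.com/v1/images/search"


def build_request(api_key: str) -> urllib.request.Request:
    return urllib.request.Request(SEARCH_URL, headers={"x-api-key": api_key})


def first_image_url(response_body: str) -> str:
    # The search endpoint returns a JSON list of image objects with a "url" field
    images = json.loads(response_body)
    return images[0]["url"]


def fetch_cat_url() -> str:
    request = build_request(os.environ["CAT_API_KEY"])
    with urllib.request.urlopen(request, timeout=10) as response:
        return first_image_url(response.read().decode("utf-8"))


if __name__ == "__main__" and "CAT_API_KEY" in os.environ:
    print(fetch_cat_url())
```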

# Hands-on exercise: Cat-as-a-Service

Task: Load a cat and validate it is actually a cat image
1. Register at The Cat API and get an API key by email
2. Save the API key to GitHub Secrets
3. Make another branch
4. Ask an LLM to help you get the cat image with an API call in Python
5. Ask an LLM to build a very simple computer vision script to check if an image has a cat in it
6. Make another Action with the help of an LLM. Install necessary libraries in the Actions script.
7. Commit and push
8. Make sure it works; debug and fix if not
9. Now add a deployment target: upon merge to `main`, load a cat image and print it in the terminal on a schedule, every 5 minutes (the shortest interval GitHub Actions allows for scheduled workflows).
10. Ask an LLM how to print an image in the terminal with text as graphics.
11. Pass all validations and merge the PR
12. Observe an action actually loading a cat image every 5 minutes
13. Stop all actions once done
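"Text as graphics" usually means ASCII art: map each pixel's brightness to a character, from dense (dark) to sparse (light). The sketch takes a 2D list of grayscale values (0 to 255), so it runs without extra libraries. With Pillow, such values could come from `Image.open(path).convert("L")` after a `resize`. The character ramp and the tiny made-up image are arbitrary choices.

```python
# Map grayscale pixels (0 = black, 255 = white) to characters.
RAMP = "@%#*+=-:. "  # dark -> light


def to_ascii(pixels) -> str:
    lines = []
    for row in pixels:
        chars = [RAMP[value * (len(RAMP) - 1) // 255] for value in row]
        lines.append("".join(chars))
    return "\n".join(lines)


# Tiny made-up 3x4 grayscale image
image = [
    [0, 85, 170, 255],
    [255, 170, 85, 0],
    [0, 255, 0, 255],
]
print(to_ascii(image))
```

Terminal characters are taller than they are wide, so halving the image height before conversion usually keeps the cat's proportions.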