## MLOps Lifecycle Toolkit Chapter 2 Lab
* [Basic Git Commands](#basic-git-commands)
* [Creating Feature Branches and Pull Requests](#create-a-new-branch-called-mlops-feature-branch-101)
* [Git Internals](#git-internals)
* [Cloning Remote Repositories](#cloning-remote-repositories)
* [Troubleshooting Git Issues with Stash and Blame](#troubleshooting-git-issues)


## Basic Git Commands <a class="anchor" id="basic-git-commands"></a>


There are several basic git commands used in data science. Let's go through a few. We recommend running each of these commands in a terminal within VS Code, but you can also run them in a notebook.

*Important: If you run them in a terminal, remove the ! notebook magic at the start of each command.*



In [44]:
!ls

my_first_change.py  my_first_repo  new_file.txt  sample_data


In [45]:
!mkdir my_first_repo/ && ls 

mkdir: cannot create directory ‘my_first_repo/’: File exists


In [46]:
!cd my_first_repo

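When you run git init, git creates the .git directory, a hidden directory where all of git's internal work happens, such as storing and updating snapshots. Note that in a notebook each ! command runs in its own subshell, so the !cd above does not carry over to later cells; this is why the output below reports /content/.git. To change the notebook's working directory, use the %cd magic instead.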

In [47]:
!git init

Reinitialized existing Git repository in /content/.git/
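
The path in the output above is /content/.git rather than my_first_repo/.git, because a directory change made inside one `!` command is gone by the time the next cell runs. The following is a minimal sketch of that behaviour, using a child Python process as a stand-in for the `!cd` subshell. The helper name `run_child` is hypothetical and exists only for this illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_child(code, cwd=None):
    """Run a snippet in a separate Python process and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


parent_before = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    tmp_real = os.path.realpath(tmp)

    # The child changes *its own* directory, much like `!cd some_dir` does.
    child_dir = run_child(f"import os; os.chdir({tmp!r}); print(os.getcwd())")
    parent_after = os.getcwd()

    # Passing cwd= starts the child in that directory instead,
    # similar in spirit to `!cd some_dir && some_command` in one cell.
    started_in = run_child("import os; print(os.getcwd())", cwd=tmp)

print("child moved to:", child_dir)
print("parent unchanged:", parent_before == parent_after)
```

In a notebook, the equivalent choices are chaining commands in a single cell (`!cd my_first_repo && git status`) or using the `%cd` magic, which changes the notebook's own working directory.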


We can use the git status command to take a look at untracked files.

In [48]:
!git status

On branch MLOPS-feature-branch-101
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/

nothing added to commit but untracked files present (use "git add" to track)


## Create a new branch called MLOPS-feature-branch-101. 

In [49]:
!git checkout -b MLOPS-feature-branch-101

fatal: A branch named 'MLOPS-feature-branch-101' already exists.


In [51]:
!git branch

* MLOPS-feature-branch-101
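
The asterisk marks the branch that is currently checked out. Internally, git records this in the .git/HEAD file as a line of the form `ref: refs/heads/<branch>`. As a rough sketch of how a script might read that file, the hypothetical helper below (`current_branch` is not a git API) extracts the branch name from the file's contents. The sample string is an assumption based on the branch name used in this lab.

```python
def current_branch(head_contents):
    """Return the branch name stored in .git/HEAD, or None for a detached HEAD."""
    text = head_contents.strip()
    prefix = "ref: refs/heads/"
    if text.startswith(prefix):
        return text[len(prefix):]
    # Detached HEAD: the file holds a raw commit id instead of a ref.
    return None


# Sample contents, assuming the lab's feature branch is checked out.
sample_head = "ref: refs/heads/MLOPS-feature-branch-101\n"
print(current_branch(sample_head))
```

In a real repository you could pass in the text read from .git/HEAD instead of the sample string.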


# Let's look at the history of commits

In [52]:
!git log

commit 7a3684695da79d9633f337f8d4069d6eda76eba5 (HEAD -> MLOPS-feature-branch-101)
Author: Your Name <you@example.com>
Date:   Sun Mar 19 22:04:46 2023 +0000

    feat: new file

commit be77e1e22db79e39e50aa383e34e031627db4905
Author: Your Name <you@example.com>
Date:   Sun Mar 19 22:02:51 2023 +0000

    feat: my first commit.


## In a brand-new repository there are no commits yet. Let's make a change to a file before running git commit to commit your changes.

In [53]:
!touch my_first_change.py && echo 'print("Hello World!")' >> my_first_change.py

In [54]:
!git status

On branch MLOPS-feature-branch-101
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   my_first_change.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/

no changes added to commit (use "git add" and/or "git commit -a")


## Notice that your change appears under "Changes not staged for commit". We need to add it to the stage using git add first.

In [55]:
!git add my_first_change.py

In [56]:
!git status

On branch MLOPS-feature-branch-101
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   my_first_change.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/



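## Let's try to unstage this file to demonstrate what git add did. Note that git rm --cached on an already-tracked file stages its removal from the index (the file stays on disk), as the output shows; git restore --staged is the command that simply unstages a change.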

In [57]:
!git rm --cached my_first_change.py && git status

rm 'my_first_change.py'
On branch MLOPS-feature-branch-101
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    my_first_change.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	my_first_change.py
	sample_data/



In [58]:
!git add my_first_change.py && git status

On branch MLOPS-feature-branch-101
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   my_first_change.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/



What happened? We added the file to the stage again. For commit messages, we can follow a standard called Conventional Commits.
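
A Conventional Commits header has the shape `type(optional scope): description`, for example `feat: my first commit.`. The sketch below checks a header against that shape with a regular expression. The list of accepted types is an assumption based on commonly used ones (the specification itself only requires `feat` and `fix`), and `parse_header` is a hypothetical helper name.

```python
import re

# Commonly used types; treat this list as an assumption and adjust it to your team.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore")

# type, optional (scope), optional "!" for breaking changes, then ": description"
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[^)]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_header(header):
    """Return the parts of a commit header, or None if it does not match."""
    match = HEADER_PATTERN.match(header)
    return match.groupdict() if match else None


for message in ["feat: my first commit.", "my first commit"]:
    print(repr(message), "->", parse_header(message))
```

A check like this could run in a commit hook or a CI job, so that malformed messages are caught before they reach the shared history.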

In [60]:
!git commit my_first_change.py -m 'feat: This is my first commit'

On branch MLOPS-feature-branch-101
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/

nothing added to commit but untracked files present (use "git add" to track)


## Whoops! If this is your first time using git, the commit fails with an error until you set your user name and email.

In [61]:
!git config --global user.email "you@example.com"
!git config --global user.name "Your Name"

## Let's commit one more time

In [62]:
!git commit my_first_change.py -m "feat: my first commit."

On branch MLOPS-feature-branch-101
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.config/
	sample_data/

nothing added to commit but untracked files present (use "git add" to track)


## Let's check if we have any remotes before pushing our changes.

In [63]:
!git remote -v 

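## Set the upstream for your branch. Replace `<remote>/<branch>` with real names such as origin/MLOPS-feature-branch-101; left as-is, the shell treats the angle brackets as redirection, which causes the error below.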

In [64]:
!git branch --set-upstream-to=<remote>/<branch> MLOPS-feature-branch-101

/bin/bash: remote: No such file or directory


## You should try to do a pull before you push as there could be upstream changes. You may need to handle merge conflicts here.

In [65]:
!git pull 

There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=<remote>/<branch> MLOPS-feature-branch-101



In [30]:
!git push -u origin main

error: src refspec main does not match any
error: failed to push some refs to 'origin'

## Let's check the log again. Can you see your new commit?

---



In [None]:
!git log

commit 0ac75c7d59f8a9a2b8bcc316dbf53f68e090720a (HEAD -> MLOPS-feature-branch-101)
Author: Your Name <you@example.com>
Date:   Sat Mar 18 23:12:16 2023 +0000

    feat: my first commit.


## Git Internals

## Let's look at what's inside the .git directory. We can see the branches, hooks, info, objects and refs directories, along with the HEAD, config, description and index files. All of these are important parts of the hidden .git directory.

In [None]:
!ls -la .git

total 44
drwxr-xr-x 7 root root 4096 Mar 16 00:46 .
drwxr-xr-x 1 root root 4096 Mar 16 00:44 ..
drwxr-xr-x 2 root root 4096 Mar 16 00:44 branches
-rw-r--r-- 1 root root   92 Mar 16 00:44 config
-rw-r--r-- 1 root root   73 Mar 16 00:44 description
-rw-r--r-- 1 root root   41 Mar 16 00:45 HEAD
drwxr-xr-x 2 root root 4096 Mar 16 00:44 hooks
-rw-r--r-- 1 root root   65 Mar 16 00:46 index
drwxr-xr-x 2 root root 4096 Mar 16 00:44 info
drwxr-xr-x 5 root root 4096 Mar 16 00:46 objects
drwxr-xr-x 4 root root 4096 Mar 16 00:44 refs
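
The objects directory is where git keeps file contents, named by a hash of the content itself. As a small sketch of that idea, the code below computes the id git assigns to a file's contents (a "blob"), assuming the default SHA-1 object format: the hash covers a short header, `blob <size>` followed by a NUL byte, and then the raw bytes. The helper names `blob_id` and `object_path` are hypothetical, and the path layout applies to loose (unpacked) objects only.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(object_id: str) -> str:
    """Loose objects live under objects/<first 2 hex chars>/<remaining chars>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty = blob_id(b"")
print(empty)               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(object_path(empty))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id depends only on the content, two files with identical bytes share a single object, which is part of what makes storing snapshots cheap.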


## Let's try the --porcelain flag. It gives the output of the status command we already know in a stable, easy-to-parse format intended for scripts.

In [69]:
!git status --porcelain 

?? .config/
?? sample_data/
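
Each line of this format starts with a two-character status code (the first character for the index, the second for the working tree), then a space, then the path; `??` marks an untracked path. The sketch below parses such output in a script. It is a simplified illustration: the helper name `parse_porcelain` is hypothetical, and renames and quoted paths are not handled.

```python
def parse_porcelain(output):
    """Split `git status --porcelain` text into index state, worktree state and path."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entries.append(
            {"index": line[0], "worktree": line[1], "path": line[3:]}
        )
    return entries


# Sample text copied from the cell output above.
sample = "?? .config/\n?? sample_data/\n"

untracked = [
    entry["path"]
    for entry in parse_porcelain(sample)
    if entry["index"] == "?" and entry["worktree"] == "?"
]
print(untracked)  # ['.config/', 'sample_data/']
```

In a notebook, the text could come from capturing the command's output, for example with Python's subprocess module, rather than from a hard-coded sample.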




## Cloning Remote Repositories

We can clone public repos such as scikit-learn using the git clone command. Note: you should also create a GitHub account at this point, then try forking this repo and cloning your fork.

Instructions for setting up a GitHub account can be found at: https://docs.github.com/en/get-started/signing-up-for-github/signing-up-for-a-new-github-account

In [71]:
!git clone https://github.com/scikit-learn/scikit-learn.git

fatal: destination path 'scikit-learn' already exists and is not an empty directory.


## What did git clone do? Let's list files in our working directory. Do you notice anything different?

In [72]:
!ls 

my_first_change.py  my_first_repo  new_file.txt  sample_data  scikit-learn


# Let's cd (change directory) into the repo. In this case it's called scikit-learn.

In [73]:
!cd scikit-learn && ls -la

total 164
drwxr-xr-x 13 root root  4096 Mar 19 22:20 .
drwxr-xr-x  1 root root  4096 Mar 19 22:20 ..
drwxr-xr-x  3 root root  4096 Mar 19 22:20 asv_benchmarks
-rw-r--r--  1 root root  9830 Mar 19 22:20 azure-pipelines.yml
drwxr-xr-x  2 root root  4096 Mar 19 22:20 benchmarks
drwxr-xr-x  2 root root  4096 Mar 19 22:20 .binder
drwxr-xr-x  7 root root  4096 Mar 19 22:20 build_tools
drwxr-xr-x  2 root root  4096 Mar 19 22:20 .circleci
-rw-r--r--  1 root root  1392 Mar 19 22:20 .cirrus.star
-rw-r--r--  1 root root   921 Mar 19 22:20 .codecov.yml
-rw-r--r--  1 root root   645 Mar 19 22:20 CODE_OF_CONDUCT.md
-rw-r--r--  1 root root   388 Mar 19 22:20 conftest.py
-rw-r--r--  1 root root  2109 Mar 19 22:20 CONTRIBUTING.md
-rw-r--r--  1 root root  1532 Mar 19 22:20 COPYING
-rw-r--r--  1 root root   150 Mar 19 22:20 .coveragerc
drwxr-xr-x 16 root root  4096 Mar 19 22:20 doc
drwxr-xr-x 33 root root  4096 Mar 19 22:20 examples
drwxr-xr-x  8 root root  4096 Mar 19 22:20 .git
-rw-r--r--  1 root root 

## Troubleshooting Git Issues

In [74]:
!git remote -v 

In [75]:
!git pull origin master 

fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


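# Oh no! Suppose there are conflicts and we can't pull remote changes. What do we do? (The error above has a different cause: no remote named origin is configured in this repository.)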

## You will need to stash your changes first.

In [76]:
!git stash

No local changes to save


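## Now try popping your changes back. The full command is git stash pop; git pop on its own is not a git command, as the error below shows.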

In [77]:
!git pop

git: 'pop' is not a git command. See 'git --help'.

The most similar command is
	log


## Finding who was responsible for introducing a bug to the README file. 

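### We can use the git blame command, which works at the file level and shows the commit and author that last changed each line. The path must match a file tracked in HEAD exactly, including case.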




In [78]:
!git blame README.MD

fatal: no such path 'README.MD' in HEAD


# DVC Version Control

First we need to install the dvc package from PyPI.

In [79]:
!pip install dvc 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dvc
  Downloading dvc-2.50.0-py3-none-any.whl (411 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 411.6/411.6 KB 7.6 MB/s eta 0:00:00
Collecting colorama>=0.3.9
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting grandalf<1,>=0.7
  Downloading grandalf-0.8-py3-none-any.whl (41 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.8/41.8 KB 3.7 MB/s eta 0:00:00
Collecting pygtrie>=2.3.2
  Downloading pygtrie-2.5.0-py3-none-any.whl (25 kB)
Collecting dvc-studio-client<1,>=0.5.0
  Downloading dvc_studio_client-0.6.1-py3-none-any.whl (9.8 kB)
Collecting rich>=12
  Downloading rich-13.3.2-py3-none-any.whl (238 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 238.7/238.7 KB 21.8 MB/s eta 0:00:00
Collecting dvc-http
  Downloading dv

Now, we can initialize dvc

In [80]:
!dvc init

Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>

Go ahead and run some dvc commands. Many of them mirror git commands but start with dvc, although not every git command has a DVC counterpart. Try a few on your own!

In [81]:
!dvc status

!If DVC froze, see `hardlink_lock` in <https://man.dvc.org/config#core>
There are no data or pipelines tracked in this project yet.
See <https://dvc.org/doc/start> to get started!

In [82]:
!dvc log

ERROR: argument COMMAND: invalid choice: 'log' (choose from 'init', 'queue', 'get', 'get-url', 'destroy', 'add', 'remove', 'move', 'unprotect', 'run', 'repro', 'pull', 'push', 'fetch', 'status', 'gc', 'import', 'import-url', 'config', 'checkout', 'remote', 'cache', 'metrics', 'params', 'install', 'root', 'list', 'ls', 'list-url', 'ls-url', 'freeze', 'unfreeze', 'dag', 'daemon', 'commit', 'completion', 'diff', 'version', 'doctor', 'update', 'git-hook', 'plots', 'stage', 'experiments', 'exp', 'check-ignore', 'machine', 'data')
usage: dvc
       [-q | -v]
       [-h]
       [-V]
       [--cd <path>]
       COMMAND
       ...

Data Version Control

optional arguments:
  -q, --quiet
    Be quiet.
  -v, --verbose
    Be verbose.
  -h, --help
    Show this help message and exit.
  -V, --version
    Show program's version.
  --cd <path>
    Change to directory before executing.

Available Commands:
  COMMAND
    Use `dvc COMMAND --help` for command-specific help.
    init
    Initial