![IE](../img/ie.png)

# Sessions 1 & 2: git and `PYTHONPATH`

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data (2019-03-25)

## Introduction to git

git is a **version control system** that helps us track changes in our code (and, in fact, in any text file), allowing the user to go back in time to any previous state and to compare any two states.

### References

* Pro Git https://git-scm.com/book/en/v2/
* Changing history, or How to Git pretty http://justinhileman.info/article/changing-history/

### Glossary

* **Repository**: Directory tracked by git. It contains a `.git` folder and is created by `$ git init`
* **Commit**: State or snapshot of the repository. Commits are created by `$ git commit`
* **Branch**: A parallel or separate line of development. The default one is `master`, and new ones are created by `$ git branch` or `$ git checkout -b`

![Branches](https://git-scm.com/book/en/v2/images/advance-master.png)

### Linux command line 101

* `whoami` (who am I)
* `pwd` (print working directory)
* `ls` (list): display contents of current directory
  - `ls --color`: show color
  - `ls -a`: show all files, also hidden ones (those starting with `.`)
  - Two special directories: `.` (current) and `..` (parent)
* `touch`: create empty file
* `nano`: edit a file from the command line
  - Advanced alternative: `vim`
* `cat` (concatenate): print file contents

### Workflow

To be done only once, set your git username: https://help.github.com/en/articles/setting-your-username-in-git#setting-your-git-username-for-every-repository-on-your-computer

1. Create a directory `$ mkdir test_project` and navigate there `$ cd test_project`
2. Init a git repository `$ git init`
3. Check status `$ git status` ("on branch master, no commits yet, nothing to commit")
4. Create some files `$ nano README.txt`
5. Stage the files `$ git add README.txt`
6. Commit the changes `$ git commit -m "First commit"`

Summary:

![Workflow](https://git-scm.com/book/en/v2/images/lifecycle.png)

### Branching

1. Create **and** switch to a new branch `$ git checkout -b new-branch`
2. Commit there (see above)
3. Go back to main branch `$ git checkout master`
4. Merge changes `$ git merge new-branch`
5. Delete branch `$ git branch -d new-branch` (don't forget this step!)

Normally, the `git merge` step happens online using [pull requests](https://help.github.com/en/articles/about-pull-requests) or [merge requests](https://docs.gitlab.com/ee/user/project/merge_requests/index.html), which are **not** git concepts, but GitHub/GitLab concepts.

### Merging

Two types of git merging:

* **Fast-forward merge**: There is no diverging history, and git just "advances the pointer" of the current branch
  - `$ git merge new-branch --ff-only` will fail if a fast-forward merge is not possible
* **Non fast-forward merge**: The history diverged, so git creates a merge commit with two parents that combines the two branches (and hence asks for a commit message)
  - `$ git merge new-branch --no-ff` always creates a merge commit even if a fast-forward merge is possible
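
To make the distinction concrete, here is a minimal sketch in Python (not git's actual implementation). It assumes a toy commit graph stored as a dictionary; the commit ids and the helper names `is_ancestor` and `can_fast_forward` are made up for this illustration. The idea is that a fast-forward is possible when the tip of the current branch is an ancestor of the tip of the branch being merged.

```python
# Toy commit graph: each commit id maps to the list of its parent ids.
# Ids and helper names are hypothetical, chosen only for this illustration.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # commit made on new-branch
    "D": ["B"],  # commit made on master after new-branch was created
}


def is_ancestor(ancestor, commit, parents):
    """Return True if `ancestor` is reachable from `commit` by following parents."""
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return False


def can_fast_forward(current_tip, other_tip, parents):
    """A fast-forward is possible when the current tip is an ancestor of the other tip."""
    return is_ancestor(current_tip, other_tip, parents)


# master is still at B and new-branch is at C: the pointer can simply advance.
print(can_fast_forward("B", "C", parents))  # True
# master moved on to D and new-branch is at C: the history diverged.
print(can_fast_forward("D", "C", parents))  # False
```

In the second case git would need a merge commit with `D` and `C` as its two parents.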

Non fast-forward merges can end up in conflicts. In that case, git will halt the merge operation and leave traces in the affected files like this:

```
$ cat README.txt
If you have questions, please
<<<<<<< HEAD
open an issue
=======
ask your question in IRC.
>>>>>>> branch-a
```

* To abort a merge `$ git merge --abort` (useful if we are scared and don't know what to do)
* To resolve conflicting parts automatically in favor of the incoming branch `$ git merge new-branch --strategy-option theirs` (non-conflicting changes from both branches are still merged)
* To resolve conflicting parts automatically in favor of the current branch `$ git merge new-branch --strategy-option ours`

**Be careful** while editing files that are in conflict. [Use your editor](https://www.jetbrains.com/help/pycharm/resolving-conflicts.html).
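
To illustrate how the conflict markers are laid out, here is a minimal Python sketch that splits a conflicted file into the version from the current branch and the version from the incoming branch. The function name `split_conflict` is hypothetical, and the sketch assumes the default two-sided marker style shown above; it is meant to explain the format, not to replace the editor's merge tool.

```python
def split_conflict(text):
    """Return (ours, theirs): the file as each side of the merge would have it."""
    ours, theirs = [], []
    side = None  # None: outside a conflict; "ours" or "theirs": inside one
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            side = "ours"  # what follows comes from the current branch
        elif line.startswith("=======") and side == "ours":
            side = "theirs"  # what follows comes from the incoming branch
        elif line.startswith(">>>>>>> ") and side == "theirs":
            side = None  # end of the conflicting region
        elif side is None:
            # Lines outside a conflict belong to both versions
            ours.append(line)
            theirs.append(line)
        elif side == "ours":
            ours.append(line)
        else:
            theirs.append(line)
    return "\n".join(ours), "\n".join(theirs)


conflicted = """If you have questions, please
<<<<<<< HEAD
open an issue
=======
ask your question in IRC.
>>>>>>> branch-a"""

ours, theirs = split_conflict(conflicted)
print(ours)
print(theirs)
```

Resolving the conflict by hand means leaving the file in whichever final form we want (either version, or a mix of both) with all marker lines removed, then staging and committing it.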

### Other

* Ignoring files: list the patterns to ignore in `$ nano .gitignore` (this file has to be committed to the repository as well). Rather than writing it from scratch, it is better to generate it at https://www.gitignore.io/
* Amend the last commit: `$ git commit --amend` (for more information, check out the flow chart below)
* Show pretty history: `$ git log --graph --oneline --decorate --all`
* Configuring git aliases: `$ git config --global alias.lg "log --graph --oneline --decorate"` (and now you have `$ git lg`!)

![git flow chart](http://justinhileman.info/article/git-pretty/git-pretty.png)

## Python execution model

### Importing scripts

Python code is normally written in `.py` scripts. For example:

```
$ cat model.py
print("Hello, world!")
```

These scripts can be imported in the same way as any module or package from the [standard library](https://docs.python.org/3/library/index.html):

```
$ python3
>>> import math  # Works, because it's in stdlib
>>> import numpy as np  # Works if you `pip install numpy`'ed in advance
>>> import model  # Works if you are in the same directory
Hello, world!
>>> 
```

When the user imports a script, **Python runs the script** from top to bottom. That is how the functions and classes defined inside it become available.
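
As a minimal sketch of this behavior, the snippet below writes a throwaway script to a temporary directory and loads it step by step with `importlib`, so that the "run the script" step is visible. The file name `model_demo.py` and the function `answer` are hypothetical, made up for this illustration.

```python
import contextlib
import importlib.util
import io
import pathlib
import tempfile

# Hypothetical script: one top-level statement and one function definition
source = 'print("Hello, world!")\n\ndef answer():\n    return 42\n'

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "model_demo.py"
    path.write_text(source)

    # Roughly what `import model_demo` does once the file has been located
    spec = importlib.util.spec_from_file_location("model_demo", path)
    module = importlib.util.module_from_spec(spec)

    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        spec.loader.exec_module(module)  # this step runs the script

printed = captured.getvalue()
print(repr(printed))    # the top-level print ran during the import
print(module.answer())  # the function defined in the script is now available
```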

### The `PYTHONPATH`

However, importing our code only works from the same directory:

```
$ ls
model.py README.txt
$ cd ..
$ ls
test_project
$ python3
>>> import math  # Still works
>>> import model
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'model'
```

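Why? To find what we want to import, Python searches a list of predefined locations: the module search path, referred to here as the "PATH" (not to be confused with the shell's `PATH` variable) and available as `sys.path`: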

```
>>> import sys
>>> sys.path
['', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']
```

Therefore, there are two ways of making our code **globally importable**:

1. Modify the "PATH"
2. Put our code inside a location predefined in the "PATH"

The first option can be achieved like this:

```
>>> sys.path.insert(0, "/home/juanlu/test_project")
>>> import model  # Works!
Hello, world!
>>>
```
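
The same idea as a self-contained sketch: it creates a throwaway module in a temporary directory, shows that the import fails before the directory is on `sys.path`, and succeeds afterwards. The module name `pathdemo_model` and its `GREETING` constant are hypothetical, and the sketch assumes no module with that name already exists on the machine. The cleanup in the `finally` block only keeps the demo from leaving traces behind.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module living outside the current directory
    (pathlib.Path(tmp) / "pathdemo_model.py").write_text("GREETING = 'Hello, world!'\n")

    # Before touching sys.path, Python cannot find the module
    try:
        importlib.import_module("pathdemo_model")
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # After adding the directory, the import succeeds
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("pathdemo_model")
        found_after = True
    finally:
        # Undo the changes so the demo leaves the interpreter as it found it
        sys.path.remove(tmp)
        sys.modules.pop("pathdemo_model", None)

print(found_before, found_after, module.GREETING)
```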

Or, alternatively, from outside of the interpreter:

```
$ export PYTHONPATH=/home/juanlu/test_project
$ python3
>>> import sys
>>> sys.path  # Notice the change!
['', '/home/juanlu/test_project', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']
>>> import model  # Now it works!
Hello, world!
>>>
```
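
The effect of the environment variable can also be checked from a script. The sketch below starts a fresh Python interpreter twice, once with `PYTHONPATH` pointing at a temporary directory and once without it. The module name `envdemo_model` is hypothetical, and the sketch assumes no module with that name exists elsewhere on the machine.

```python
import os
import pathlib
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical script, stored outside the current directory
    (pathlib.Path(tmp) / "envdemo_model.py").write_text('print("Hello, world!")\n')

    command = [sys.executable, "-c", "import envdemo_model"]

    # Child interpreter with PYTHONPATH set: the import works
    env_with = dict(os.environ, PYTHONPATH=tmp)
    with_path = subprocess.run(
        command, env=env_with,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
    )

    # Child interpreter without PYTHONPATH: the import fails
    env_without = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    without_path = subprocess.run(
        command, env=env_without,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True,
    )

print(with_path.stdout.strip())
print(without_path.returncode)  # non-zero exit code signals the failed import
```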

However, **both are bad practices and should be avoided**. In future sessions we will see [the right way to distribute Python code](https://packaging.python.org/tutorials/packaging-projects/).