---
---
# Updating a repo's `.git/info/exclude` file to exclude large files
---

## Two possible ways, only one taken...

There are two ways to exclude files (for whatever reason) within a folder that is a git repo:  
1. Use a `.gitignore` file
2. Use the `exclude` file in the `.git/info` folder

I have chosen the second option to implement the automatic exclusion of files larger than the GitHub limit (currently 100 MB).  
The reasons are logic and portability:  
* There is only one `.git` folder per repo, at least under _normal usage_ of version control with git, and it does not work from any other location.  
* A `.gitignore` file can be copied into multiple subfolders within a repo and amended differently in each; the desired outcome might be achieved, but it depends on your keeping track of the precedence hierarchy git uses to apply the exclusions. Here is a small portion of what the official [Git docs say](https://git-scm.com/docs/gitignore):
>When deciding whether to ignore a path, Git normally checks gitignore patterns from multiple sources, with the following order of precedence, from highest to lowest (within one level of precedence, the last matching pattern decides the outcome):

> * Patterns read from the command line for those commands that support them.
> * Patterns read from a `.gitignore` file in the same directory as the path, or in any parent directory, with patterns in the higher level files (up to the toplevel of the work tree) being overridden by those in lower level files down to the directory containing the file. These patterns match relative to the location of the `.gitignore` file. A project normally includes such `.gitignore` files in its repository, containing patterns for files generated as part of the project build.
> * Patterns read from `$GIT_DIR/info/exclude`.
> * Patterns read from the file specified by the configuration variable `core.excludesFile`.
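
To see which of these sources actually decides the fate of a given path, git itself can be asked with `git check-ignore -v`. The following is a minimal sketch, assuming git is available on the `PATH`; the helper name `which_rule_ignores` is hypothetical and is not part of the module described below.

```python
import subprocess
from pathlib import Path
from typing import Optional


def which_rule_ignores(repo: Path, rel_path: str) -> Optional[str]:
    """Return '<source>:<line>:<pattern>' for the rule matching rel_path, else None.

    Hypothetical helper: it only wraps `git check-ignore -v`.
    """
    result = subprocess.run(
        ["git", "check-ignore", "-v", "--", rel_path],
        cwd=repo,
        capture_output=True,
        text=True,
    )
    # Exit status 0 means a pattern matched; 1 means none did; 128 is a fatal error.
    if result.returncode != 0:
        return None
    # Verbose output has the form: <source>:<linenum>:<pattern><TAB><pathname>
    return result.stdout.split("\t", 1)[0]
```

Calling `which_rule_ignores(Path.cwd(), 'data/big.bin')` reports the file and line holding the deciding pattern, which makes it easy to confirm that a rule comes from `.git/info/exclude` and not from some nested `.gitignore`.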

---
## Implementation:

I only use `pathlib.Path` and the `sys` module, both from the standard library. The latter is used to refine the update message when the code is run from a Jupyter notebook (lab or classic).

I have written 3 functions:

1. `find_bigfiles(data_folder, gte_size=100, verbose=False)`
```
    Return a list of file paths (relative, starting at data_folder.name)
    for all files of at least gte_size MB.
```
2. `is_git_repo(folder_path)`
```
    Return True if folder contains a .git folder, else False.
```
3. `update_git_info_exclude(top_folder_path, data_folder_name)`
```
    Update the $GIT_DIR/info/exclude file with the paths of
    the big files found, so that a separate .gitignore file
    stays generic (portable).
```
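
The module's source is not reproduced here, so the following is only a sketch of how the three functions could be written from their docstrings. Assumptions: 1 MB is taken as 1024 * 1024 bytes, the returned paths use forward slashes, the extra `gte_size` argument of `update_git_info_exclude` is added for illustration, and the function returns the newly added paths instead of printing the update message.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def is_git_repo(folder_path):
    """Return True if folder contains a .git folder, else False."""
    return (Path(folder_path) / ".git").is_dir()


def find_bigfiles(data_folder, gte_size=100, verbose=False):
    """Return paths, starting at data_folder.name, of files >= gte_size MB."""
    data_folder = Path(data_folder)
    found = []
    for path in sorted(data_folder.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size >= gte_size * MB:
            # Keep the data folder's own name as the first path component.
            rel = path.relative_to(data_folder.parent).as_posix()
            found.append(rel)
            if verbose:
                print(f"{rel}: {size / MB:.1f} MB")
    return found


def update_git_info_exclude(top_folder_path, data_folder_name, gte_size=100):
    """Append the big files' paths to .git/info/exclude; return the new entries.

    Note: `gte_size` here is an assumption, not part of the original signature.
    File names containing glob characters would need escaping; not handled here.
    """
    top = Path(top_folder_path)
    if not is_git_repo(top):
        raise ValueError(f"{top} does not contain a .git folder")

    exclude = top / ".git" / "info" / "exclude"
    exclude.parent.mkdir(exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    known = set(text.splitlines())

    # Only add paths that are not already listed, so reruns are harmless.
    new = [p for p in find_bigfiles(top / data_folder_name, gte_size)
           if p not in known]
    if new:
        if text and not text.endswith("\n"):
            text += "\n"
        exclude.write_text(text + "\n".join(new) + "\n")
    return new
```

Because each entry contains a slash, git matches it relative to the top of the work tree, so a path such as `data/sub/big.bin` excludes only that one file.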

## Module: `git_info_exclude.py`

Although I am going to add the functions to my notebook template, I've put them in a module, which also contains the following 'test' function to determine whether the Python code is running in a notebook:
```
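import sys


def test_ipkernel(verbose=False):
    """Return True if the code appears to run in a Jupyter kernel."""
    # Jupyter kernels are started through ipykernel_launcher.py,
    # so that name shows up as the script path in sys.argv[0].
    found = 'ipykernel_launcher.py' in sys.argv[0]
    if verbose:
        which = 'IS' if found else 'IS NOT'
        msg = f'Code *{which}* running in a Jupyter platform (notebook, lab, etc.)'
        print(msg)
    return found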
```
Note: I've only tested the function in JupyterLab and VS Code; it might not work in Spyder.

In [1]:
from pathlib import Path

import git_info_exclude

In [2]:
repo = Path.cwd()

git_info_exclude.update_git_info_exclude(repo, 'data')

Updated .git/info/exclude file.
Enter `%load .git/info/exclude` in a cell to verify.


Uncomment the next cell to verify the file:

In [None]:
#%load .git/info/exclude