# Python Environment for Data Science
> An opinionated way to package Python projects properly with Poetry

- toc: true 
- badges: true
- comments: true
- categories: [python, shell, git, git-hooks, poetry, pyenv, black, mypy, flake8, isort]
- image: poetry_pyenv_imgs/precommit_pipeline.png
- permalink: /poetry/

# Python Environment

## Interpreter

Do you have programs that require different Python versions, or that depend on different versions of the same third-party module, and want to switch between them seamlessly?

> Pyenv will help you do that!

You can install Pyenv with

```shell
curl https://pyenv.run | bash
```

After that, add the following lines to your *.bashrc* (or *.zshrc*) to make pyenv available in your terminal

```shell
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
```

I had to restart my terminal afterwards.

On Ubuntu, install the following build dependencies to avoid problems when pyenv compiles Python

```shell
sudo apt-get install build-essential libsqlite3-dev sqlite3 bzip2 libbz2-dev zlib1g-dev libssl-dev openssl libgdbm-dev libgdbm-compat-dev liblzma-dev libreadline-dev libncursesw5-dev libffi-dev uuid-dev
```

Now, to install a Python interpreter, run

```shell
pyenv install VERSION_YOU_WOULD_LIKE_TO_INSTALL
```

You can list out all versions available via pyenv

```shell
pyenv install --list
```

To make it concrete, let’s install Python 3.8.2 and make it your default global interpreter

```shell
pyenv install 3.8.2
```

```shell
pyenv global 3.8.2
```
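
To double-check that your shell really picks up the pyenv-managed interpreter, a small script like the one below can help. It is only a sketch: the function name `interpreter_summary` is made up for this example, and the version and path it prints depend on your machine.

```python
import platform
import sys


def interpreter_summary() -> str:
    """Return the running interpreter's version and executable path."""
    return f"Python {platform.python_version()} at {sys.executable}"


if __name__ == "__main__":
    # With pyenv set up as above, the path should point into ~/.pyenv
    print(interpreter_summary())
```

If the printed path points to the system interpreter instead, the pyenv lines are most likely missing from the shell configuration file that is actually loaded.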

## Dependency Management via [poetry](https://python-poetry.org/)

The way the authors recommend installing poetry is

```shell
curl -sSL https://install.python-poetry.org | python3 -
```

Another way is using pip and pyenv-virtualenv.

Create a virtual environment called tools that is based on 3.8.2

```shell
pyenv virtualenv 3.8.2 tools
```

Install poetry into the tools virtual env

```shell
pyenv activate tools
```

```shell
python -m pip install poetry
```


Check installed poetry version

```shell
poetry --version
```

Leave the virtual env 

```shell
pyenv deactivate
```

Add your tools virtual env to the globally available ones

```shell
pyenv global 3.8.2 tools
```

I had to restart my terminal afterwards

You can start using poetry

```shell
poetry --version
```

Before using poetry I recommend configuring it, such that it creates your project’s virtual environment in a .venv folder inside the project directory.

```shell
poetry config virtualenvs.in-project true
```
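
Once a project has its own environment, you may want to confirm that your code really runs inside it. The following is a minimal sketch that relies only on the standard library; the function name `in_virtualenv` is an assumption made for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    print("prefix:", sys.prefix)
```

Saved under a hypothetical name such as *check_env.py* and started with `poetry run python check_env.py`, it should report a prefix inside the project's *.venv* folder when the setting above is active.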

Initialize a new project

```shell
poetry new ml-project
cd ml-project
```

Add dependencies. Poetry creates the project's virtual environment the first time you do this.

```shell
poetry add pandas --extras all
```

### Consistent Formatting and Readability

We add black as a development dependency with `--dev`, as we don't need it in production

```shell
poetry add --dev black
```

Black defaults to a line length of 88 characters, but I’d rather keep the 79 characters recommended by PEP 8. To do that, set *line-length = 79* in *pyproject.toml*. Here’s the black section of my *pyproject.toml*:

```
[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
/(
    \.git
  | \.hg
  | \.mypy_cache
  | \.tox
  | \.venv
  | _build
  | buck-out
  | build
  | dist
)/
'''
```

Next, add flake8 as a development dependency for linting

```shell
poetry add --dev flake8
```

Get an overview of your project's dependencies

```shell
poetry show --tree
poetry show --latest
```
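
The same information can also be inspected from within Python. The sketch below uses `importlib.metadata` from the standard library (available since Python 3.8); the helper name `installed_packages` is hypothetical, and the output depends on whatever is installed in the active environment.

```python
from importlib import metadata  # part of the standard library since 3.8
from typing import List, Tuple


def installed_packages() -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs of the installed distributions."""
    pairs = {
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

Unlike `poetry show --tree`, this only lists what is installed and says nothing about which package pulled in which dependency.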


For black to work nicely with [flake8](https://flake8.pycqa.org/en/latest/), we need to tell flake8 to ignore some error codes that conflict with black's formatting. Create a *tox.ini* file with the configuration below:

```shell
touch tox.ini
```

```
# Flake8 Configuration
[flake8]
ignore = E203, D203, E266, E501, W503, F403, F401
exclude =
    .tox,
    .git,
    __pycache__,
    docs/source/conf.py,
    build,
    dist,
    tests/fixtures/*,
    *.pyc,
    *.egg-info,
    .cache,
    .eggs
max-line-length = 79
max-complexity = 18
select = B,C,E,F,W,T4,B9
format = ${cyan}%(path)s${reset}:${yellow_bold}%(row)d${reset}:${green_bold}%(col)d${reset}: ${red_bold}%(code)s${reset} %(text)s
```
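
Two of the ignored codes, E203 and W503, are the ones that clash most directly with black's output. The snippet below is a sketch of code formatted roughly the way black would leave it; the function names and dictionary keys are invented for illustration.

```python
def window(values, start, width):
    # Black puts spaces around ":" when a slice bound is an expression;
    # pycodestyle reports that as E203 (whitespace before ':').
    return values[start : start + width]


def can_ship(order):
    # When a condition is too long for one line, Black breaks *before*
    # the binary operators, which pycodestyle reports as W503.
    return (
        order["is_in_stock_at_the_local_warehouse"]
        and order["has_been_paid_in_full_by_customer"]
        and order["shipping_address_has_been_verified"]
    )


if __name__ == "__main__":
    print(window([10, 20, 30, 40, 50], start=1, width=3))
```

Without those two entries in the `ignore` list, flake8 would complain about code that black has just formatted.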

### [mypy](https://mypy.readthedocs.io/en/stable/) Type-Correctness

Through type annotations, your code becomes easier to understand and maintain, and less prone to errors. Why less prone to errors? Because you can statically check whether the types of your variables and functions match the expected ones.

```shell
poetry add --dev mypy
```

Running mypy might report a lot of errors. You can configure it to warn you only about the things you are interested in by adding a *mypy.ini* file to your project; refer to the [documentation](https://mypy.readthedocs.io/en/latest/config_file.html) for more details.
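
To illustrate what such a static check buys you, here is a minimal sketch of an annotated function. The function `mean` is just an example and not part of any library.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean([1.0, 2.0, 3.0]))
    # A call such as mean("123") only fails at runtime with a TypeError,
    # but mypy reports the incompatible argument type before the code runs.
```

The annotations do not change how the code runs; they only give mypy (and human readers) something to check against.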

### [Isort](https://github.com/PyCQA/isort)

Isort is a Python utility / library to sort imports alphabetically, and automatically separated into sections and by type.

```shell
poetry add --dev isort
```
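
As a rough sketch of the result, assuming isort's default settings, the import block of a module ends up looking like this. The helper `count_suffixes` is hypothetical and only there to make the example runnable.

```python
# Standard-library imports form the first section: plain "import x"
# statements come first, followed by "from x import y", each sorted
# alphabetically.
import json
import sys
from collections import Counter
from pathlib import Path

# Third-party imports (for example pandas) would follow here in their own
# section, separated by a blank line, and first-party imports after that.


def count_suffixes(paths):
    """Count the file extensions in a list of path strings."""
    return Counter(Path(p).suffix for p in paths)


if __name__ == "__main__":
    counts = count_suffixes(["a.py", "b.py", "notes.md"])
    json.dump(counts, sys.stdout)
```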

### [Pre-commit](https://pre-commit.com/)

Pre-commit is a tool that executes checks before you commit code to your repository. When those checks fail, your commit is rejected. That way, your repository never sees unformatted code, code that has not been type-checked, or anything else the checks you include are meant to catch.

You can either install it directly into your project using [poetry](https://python-poetry.org/docs/) or install it on your local machine. I prefer the latter, as pre-commit is only used locally and not on a CI/CD server. In contrast, black and mypy should run on a CI/CD server, so it makes sense to add them to the project’s dev dependencies. Here is how to install it using the already existing tools virtual environment.

Install pre-commit into the tools virtual env

```shell
pyenv activate tools
```

```shell
python -m pip install pre-commit
```

```shell
pyenv deactivate
```

```shell
pre-commit --version
```

To use it, you first need to add a config file called *.pre-commit-config.yaml* to the top-level folder of your project. In that file, you configure all the hooks that should run.

```shell
touch .pre-commit-config.yaml
```

```
repos:
    -   repo: https://github.com/psf/black
        rev: 20.8b1
        hooks:
        - id: black
          language_version: python3.8
    -   repo: https://github.com/pre-commit/mirrors-mypy
        rev: v0.782
        hooks:
        - id: mypy
    -   repo: https://github.com/pycqa/flake8
        rev: 3.8.3  # pick a git hash / tag to point to
        hooks:
        - id: flake8
    -   repo: https://github.com/pycqa/isort
        rev: 5.5.1
        hooks:
        - id: isort
```

In the top-level folder, run

```shell
pre-commit install
```

Pre-commit only checks files that have changed since the last commit, so it is recommended to run it manually on all files once, right after installing the hooks and again whenever you add a new one.

```shell
pre-commit run --all-files --show-diff-on-failure
```

Now the hooks will run on every commit. The black hook will not only check for formatting issues but also reformat the files accordingly.
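
Conceptually, each hook is a program that receives the names of the changed files as arguments and signals failure through a non-zero exit code. The sketch below shows what a small custom check could look like. It is not how black, mypy or flake8 are implemented, and the function names are made up for this example.

```python
import sys
from typing import List


def lines_with_trailing_whitespace(text: str) -> List[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def main(filenames: List[str]) -> int:
    """Check each file and return 1 if any problem was found, else 0."""
    failed = False
    for filename in filenames:
        with open(filename, encoding="utf-8") as handle:
            bad_lines = lines_with_trailing_whitespace(handle.read())
        for number in bad_lines:
            print(f"{filename}:{number}: trailing whitespace")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    # Only a non-zero exit code signals that the check failed.
    exit_code = main(sys.argv[1:])
    if exit_code:
        sys.exit(exit_code)
```

Wiring such a script into *.pre-commit-config.yaml* would require an additional hook entry, which is described in the pre-commit documentation.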

# Results

![](poetry_pyenv_imgs/precommit_pipeline.png)

So what we have is a pipeline that safeguards the project against badly formatted code, and we can now focus on content.

**Credits to**:

- [Simon Hawe](https://towardsdatascience.com/how-to-setup-an-awesome-python-environment-for-data-science-or-anything-else-35d358cc95d5)
- [LJ MIRANDA](https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/)