# Python Formatters

Python has a number of formatters which can be used to make code more readable and consistent. Popular formatters are:

* autopep8
* isort
* black
* ruff

## Categorize Identifiers

This notebook uses the functions ```dir2```, ```variables``` and ```view``` from the custom module ```categorize_identifiers```, which is found in the same directory as this notebook file. ```dir2``` is a variant of ```dir``` that groups identifiers into a ```dict``` by category, ```variables``` is an IPython-based variable inspector, and ```view``` is used to view a ```Collection``` in more detail:
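The ```categorize_identifiers``` module itself is not reproduced here. To give a rough idea of how a ```dir```-grouping helper might work, the following is a minimal, hypothetical sketch; the name ```dir2_sketch``` and its four categories are assumptions for illustration, not the categories used by the real ```dir2```:

```python
def dir2_sketch(obj) -> dict[str, list[str]]:
    """Hypothetical sketch: group the names returned by dir(obj) into categories.

    The category names are illustrative assumptions only.
    """
    groups: dict[str, list[str]] = {"dunder": [], "private": [], "method": [], "attribute": []}
    for name in dir(obj):
        if name.startswith("__") and name.endswith("__"):
            groups["dunder"].append(name)  # e.g. __len__, __class__
        elif name.startswith("_"):
            groups["private"].append(name)  # single leading underscore
        elif callable(getattr(obj, name, None)):
            groups["method"].append(name)  # callable public names
        else:
            groups["attribute"].append(name)  # everything else, e.g. properties
    return groups


print(dir2_sketch(5)["attribute"])
```

Since ```dir``` already returns names in alphabetical order, each category list stays sorted.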

In [1]:
from categorize_identifiers import dir2, variables, view

## Poorly Formatted Python Code

The following Python file will be written. Notice that the spacing is inconsistent and the imports are scattered throughout the file, mimicking code written by someone new to Python:

In [2]:
%%writefile script.py
var1= 'Hello'
var2 ="World"
import numpy as np
x=np.array([0,1,2,3,4])
y=np.array([0,2,4, 6 ,8])
import pandas as pd
df=pd.DataFrame({'x':x,"y":y})
import datetime
now=datetime.datetime(year = 2023,month=12 ,day=1)
hour=datetime.timedelta(hours=1)
import collections
counts=collections.Counter([1, 2,2 ,2,3,3])
import itertools
cycle=itertools.cycle([1,2,3])
import sys, os
sys.getsizeof(cycle)
os.environ['USERPROFILE']
num1 = 0xabb4ab8a
import string

Writing script.py


However, the script file executes without any errors:

In [3]:
%run script.py

The variables are created as expected:

In [4]:
variables()

| Instance Name | Type | Size/Shape | Value |
|---|---|---|---|
| var1 | str | 5 | Hello |
| var2 | str | 5 | World |
| x | ndarray | (5,) | [0, 1, 2, 3, 4] |
| y | ndarray | (5,) | [0, 2, 4, 6, 8] |
| df | DataFrame | (5, 2) | [x, y] |
| now | datetime | | 2023-12-01 00:00:00 |
| hour | timedelta | | 1:00:00 |
| counts | Counter | 3 | Counter({2: 3, 3: 2, 1: 1}) |
| num1 | int | | 2880744330 |


## AutoPEP8 Formatter

The [PEP8 Python Style Guide](https://peps.python.org/pep-0008/) makes a number of recommendations for formatting Python code to make it more readable.

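The autopep8 formatter automatically adjusts code to meet these recommendations. In this example it:

* Moves all imports to the top of the script file and splits ```import sys, os``` into one import per line; the standard-library modules end up above the third-party modules, although neither group is sorted
* Adds spacing:
    * after each delimiter in a collection or a function call, for example ```[0, 1, 2]```
    * around an operator outside a function call, for example ```x = np.array(...)```, while keyword arguments inside a call are left unspaced, for example ```year=2023```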
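These spacing changes are purely cosmetic: a line parses to the same abstract syntax tree before and after formatting. A minimal sketch of that check using the standard-library ```ast``` module (the helper name ```same_ast``` is illustrative):

```python
import ast


def same_ast(source_a: str, source_b: str) -> bool:
    """Return True if two snippets parse to the same syntax tree.

    ast.dump() omits line and column numbers by default, so differences in
    spacing (and in quote style) do not affect the comparison.
    """
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


sloppy = "now=datetime.datetime(year = 2023,month=12 ,day=1)"
tidy = "now = datetime.datetime(year=2023, month=12, day=1)"
print(same_ast(sloppy, tidy))  # True
print(same_ast("x = 1", "x = 2"))  # False
```

Only the source text is parsed, so ```datetime``` does not need to be imported for this comparison.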


In [5]:
!autopep8 script.py

import os
import sys
import string
import itertools
import collections
import datetime
import pandas as pd
import numpy as np
var1 = 'Hello'
var2 = "World"
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({'x': x, "y": y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ['USERPROFILE']
num1 = 0xabb4ab8a


By default, autopep8 only prints the formatted code and leaves the file unchanged. The in-place flag ```-i``` writes the changes back to the file:

In [6]:
!autopep8 -i script.py

The changes can be viewed using the line magic ```%load```:

```python
%load script.py
```

When the cell is run, its contents are replaced with those of the script file, and the line magic itself is commented out. These loaded contents are saved in the cell and are not cleared when the kernel is restarted. When working through this notebook cell by cell, the contents of the cell below can be deleted manually and the IPython magic re-entered to load the current contents of the script file.

In [7]:
# %load script.py
import os
import sys
import string
import itertools
import collections
import datetime
import pandas as pd
import numpy as np
var1 = 'Hello'
var2 = "World"
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({'x': x, "y": y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ['USERPROFILE']
num1 = 0xabb4ab8a
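To see exactly which lines a formatter changed, the original and formatted text can be compared with the standard-library ```difflib``` module. A minimal sketch using two lines from the script above (the helper name ```show_changes``` is illustrative):

```python
import difflib


def show_changes(before: str, after: str, name: str = "script.py") -> str:
    """Return a unified diff of two versions of a file's contents."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
    )
    return "".join(diff)


before = "var1= 'Hello'\nx=np.array([0,1,2,3,4])\n"
after = "var1 = 'Hello'\nx = np.array([0, 1, 2, 3, 4])\n"
print(show_changes(before, after))
```

Identical inputs produce an empty string. autopep8, isort, black and ```ruff format``` also accept a ```--diff``` option that prints a diff instead of rewriting the file.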


## Import Sort Formatter

The PEP8 Python style guide recommends grouping imports in the order standard library, related third-party, then local application imports, with a blank line between each group. However, it does not specify an order within each group. Sorting each group alphabetically is common and can be done with the import sorter isort:

In [8]:
!isort script.py

Fixing C:\Users\phili\OneDrive\Documents\GitHub\python-notebooks\formatters\script.py


isort modifies the file in place by default. The changes can be viewed using:

In [9]:
# %load script.py
import collections
import datetime
import itertools
import os
import string
import sys

import numpy as np
import pandas as pd

var1 = 'Hello'
var2 = "World"
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({'x': x, "y": y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ['USERPROFILE']
num1 = 0xabb4ab8a


Note that isort works best after autopep8 has been run: by default, isort does not move imports that appear after other code to the top of the file.

## Black Formatter

The PEP8 Python style guide does not recommend a particular quotation style, so var1 and var2 are left inconsistent, using single and double quotes respectively. The keys in df are similarly inconsistent. The opinionated formatter black enforces double quotes:
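As a greatly simplified sketch of this kind of quote normalisation, the standard-library ```tokenize``` module can rewrite plain string literals to use a preferred quote character. This is not how black is implemented: it simply skips prefixed, triple-quoted and escaped strings, and the helper name ```normalise_quotes``` is illustrative:

```python
import io
import tokenize


def normalise_quotes(source: str, preferred: str = '"') -> str:
    """Rewrite simple string literals to use the preferred quote character."""
    other = "'" if preferred == '"' else '"'
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if (
            tok.type == tokenize.STRING
            and text.startswith(other)  # no prefix such as r or b
            and not text.startswith(other * 3)  # not triple-quoted
            and preferred not in text[1:-1]  # no quotes that would need escaping
            and "\\" not in text  # no escape sequences
        ):
            # Swapping the delimiters keeps the length, so token positions stay valid
            tok = tok._replace(string=preferred + text[1:-1] + preferred)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


print(normalise_quotes("df = pd.DataFrame({'x': x, \"y\": y})\n"))
```

Passing ```preferred="'"``` gives the opposite, single-quote style instead.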

In [10]:
!black script.py

reformatted script.py

All done! ✨ 🍰 ✨
1 file reformatted.


black also modifies the file in place. The changes can be viewed using:

In [11]:
# %load script.py
import collections
import datetime
import itertools
import os
import string
import sys

import numpy as np
import pandas as pd

var1 = "Hello"
var2 = "World"
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({"x": x, "y": y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ["USERPROFILE"]
num1 = 0xABB4AB8A


Note that black neither moves imports to the top of the file nor sorts them, so it works best after autopep8 and isort have been run.
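black has also rewritten the hexadecimal literal assigned to num1 with uppercase digits. Only the spelling of the literal changes, not its value, which matches the one shown by ```variables()``` earlier:

```python
lower = 0xabb4ab8a  # as written in the original script
upper = 0xABB4AB8A  # as rewritten by black

# Both spellings denote the same integer
print(lower == upper, upper)  # True 2880744330
```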

Unfortunately, the double quotes enforced by black are inconsistent with the Python interpreter itself, which displays strings using single quotes:

In [12]:
"hello"

'hello'

The exception is a string that itself contains a single quote, which is displayed using double quotes:

In [13]:
"The string is 'hello'"

"The string is 'hello'"

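The rule applied here is the one ```repr()``` uses for strings: single quotes by default, and double quotes only when the string contains a single quote but no double quote. A small sketch (```display_quote``` is an illustrative helper):

```python
def display_quote(s: str) -> str:
    """Return the quote character repr() uses when displaying s."""
    return repr(s)[0]


print(display_quote("hello"))  # '
print(display_quote("The string is 'hello'"))  # "
# A string containing both kinds of quote falls back to single quotes with escaping
print(repr("it's \"both\""))  # 'it\'s "both"'
```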
## Ruff Formatter

Note that Ruff, a fast linter and formatter written in Rust, is not preinstalled with Anaconda. It needs to be installed into a custom Python environment, for example using packages from the ```conda-forge``` channel. On Windows, the Windows Terminal needs to be initialised so that the ```ruff``` command can be found. For more details, see the earlier tutorial on installing Anaconda and creating a Python environment.

Ruff formats similarly to black by default but can easily be configured. If ruff is installed in the Python environment, the following command can be used:

In [14]:
!ruff format script.py

1 file left unchanged


ruff also formats the file in place. Here no changes are made, because by default ruff formats this script identically to black. The file contents can be viewed using:

In [15]:
# %load script.py
import collections
import datetime
import itertools
import os
import string
import sys

import numpy as np
import pandas as pd

var1 = "Hello"
var2 = "World"
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({"x": x, "y": y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ["USERPROFILE"]
num1 = 0xABB4AB8A


Note that, like black, the ruff formatter neither moves nor sorts imports, so it works best after autopep8 and isort have been run. ruff is independent of black and does not require it to be installed.

## Ruff Configuration File

A ```ruff.toml``` configuration file can be created to select single quotes:

In [16]:
%%writefile ruff.toml
[format]
# Use single quotes for strings.
quote-style = "single"

Writing ruff.toml


Now when ruff is run on the script:

In [17]:
!ruff format script.py --config ruff.toml

1 file reformatted


The script now uses single quotes, consistent with how the Python interpreter displays strings:

In [18]:
# %load script.py
import collections
import datetime
import itertools
import os
import string
import sys

import numpy as np
import pandas as pd

var1 = 'Hello'
var2 = 'World'
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({'x': x, 'y': y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ['USERPROFILE']
num1 = 0xABB4AB8A


Ruff can also lint the file, checking it for problems such as unused imports:

In [19]:
!ruff check script.py

script.py:5:8: F401 [*] `string` imported but unused
Found 1 error.
[*] 1 fixable with the `--fix` option.


Since the F401 error is marked as fixable (```[*]```), it can be fixed automatically with the ```--fix``` option:

In [20]:
!ruff check script.py --fix

Found 1 error (1 fixed, 0 remaining).


The changes can be viewed as before; notice that the unused import has been removed:

In [21]:
# %load script.py
import collections
import datetime
import itertools
import os
import sys

import numpy as np
import pandas as pd

var1 = 'Hello'
var2 = 'World'
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
df = pd.DataFrame({'x': x, 'y': y})
now = datetime.datetime(year=2023, month=12, day=1)
hour = datetime.timedelta(hours=1)
counts = collections.Counter([1, 2, 2, 2, 3, 3])
cycle = itertools.cycle([1, 2, 3])
sys.getsizeof(cycle)
os.environ['USERPROFILE']
num1 = 0xABB4AB8A
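The removal made by ```--fix``` can be approximated in the same simplified, hypothetical way: delete any top-level ```import``` statement none of whose names is read. A real fixer must also handle ```from``` imports and statements where only some of the imported names are unused; this sketch (```remove_unused_imports``` is an illustrative name) only drops whole ```import``` statements:

```python
import ast


def remove_unused_imports(source: str) -> str:
    """Drop top-level `import` statements whose bound names are all unused."""
    tree = ast.parse(source)
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    drop: set[int] = set()
    for node in tree.body:
        if isinstance(node, ast.Import):
            bound = [a.asname or a.name.split(".")[0] for a in node.names]
            if not any(name in used for name in bound):
                # end_lineno covers statements split over several lines
                drop.update(range(node.lineno, node.end_lineno + 1))
    lines = source.splitlines(keepends=True)
    return "".join(line for number, line in enumerate(lines, start=1) if number not in drop)


before = "import os\nimport string\nimport sys\n\nsys.getsizeof(os.sep)\n"
print(remove_unused_imports(before), end="")
```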


The ```script.py``` and ```ruff.toml``` files will be deleted so the notebook can be rerun:

In [22]:
!del script.py
!del ruff.toml

[Return to Python Tutorials](../readme.md)