This guide covers code style and best practices, when writing Python at GA. It is meant as a reference for writing and reviewing Python code.
Although specific rules regarding style and best practices are usually based on sound arguments, they will often, to some degree, be subjective and arbitrary. So to avoid too much bikeshedding and pointless discussion, this guide is based on a small set of other guides that are considered to have some authority in the wider Python community:
Each chapter takes the form of a list of rules, followed by subsections of clarifications for each rule.
Finally, there are sometimes good reasons for breaking the rules, so use your best judgement.
-
PEP8 is law, never break PEP8. The only exceptions are:
- lines are allowed to be up to 99 characters.
- functions with long argument list should prefer Black style
# PEP8 recommendation def long_function_name( var_one, var_two, var_three, var_four): print(var_one) # Black style formatting def long_function_name( var_one, var_two, var_three, var_four ): print(var_one)
-
Use a linter like
flake8
to make sure that your code follows PEP8. -
Consider running
flake8
as a Github CI check. -
Decide on, and be consistent about, style issues not covered by PEP8, like:
- Usage of
"
vs'
for string literals - multi-line function definitions / calls
- Usage of
-
Make names descriptive, but concise.
-
Symbols that are internal to a module or class, should start with
_
. -
Avoid redundant naming, e.g.
models.UserModel
. -
Use new-style format strings, don't use old style
%
format strings or string concatenation with+
. -
Public functions must be documented with docstrings
-
Keep comments to a minimum, generally only needed when they are necessary for understanding the code.
-
Never commit commented-out code. Just delete it, Git remembers.
-
Remove dead code as early as possible. Don't leave it around just in case we need it, Git remembers.
-
Follow the boy scout rule, leave code better than you found it.
PEP8 forms the basis of this style guide.
Familiarize yourself with the basic rules of PEP8, but don't memorize it.
Instead, use a linter like flake8
in your editor, to make sure that your code is compliant.
Although PEP8 says to keep lines to 79 columns, this seems to be somewhat outdated today. The primary motivation for limiting line length is the ability to have 2 editor windows next to each other without wrapping lines. Since most people use 16:9 monitors today, a max line length of 99 should be fine.
Running a linter in you CI chain (Travis etc.) makes reviews a lot quicker, as the reviewer can immediately see if the code is PEP8 compliant.
PEP8 leaves certain choices open, such as how to break function declarations across multiple lines
and whether to use '
or "
for string literals. Decide what to do on a per-repository basis, but be consistent within each repository.
Keep names as short as possible, while still being descriptive. Avoid abbreviations for anything which is not already an acronym (e.g. HTTP).
Certain abbreviations are well known and widely used within the Python community, e.g. np
for Numpy, dt
for datetime. These are okay to use.
Also, resist the temptation to put the type of the variable into the name, e.g. user_info_dict
, permission_list
, etc.
Functions should have names that clearly describe what they do. If you find yourself struggling to come up with a descriptive name for a function, it might be that the function is trying to do too much. If so, split the function up. Functions should do only 1 thing.
Example of appropriate naming, for a variable holding the number of days between 2 events:
# Too long
number_of_days_between_events = get_dt_between_events().days
# Not descriptive
dbe = get_dt_between_events().days
# Okay
days_between_events = get_dt_between_events().days
Avoid redundant naming.
Redundant naming is when the name of the type or module that a symbol belongs to, is included in the name of the symbol.
For example, if we have models
module, with a User model inside, redundant naming would be calling the user model UserModel
.
Instead, it should simply be called User
. Capitalization indicates that User
is a class, it's location (inside models),
indicate that it's a model.
In cases where this would cause ambiguity or uncertainty when reading the code, try to avoid importing the symbols directly, for example, if we want the User
model from the models submodule of the database package:
from database.models import User
# May not be clear what User is, in this context
my_user = User()
from database import models
# Less ambiguous, we can clearly see where User comes from
my_user = models.User()
The idea is that with models.User()
it's easier to understand what User
is in this context.
When writing docstrings, use one of the existing standards for docstrings, like Numpy docstrings or whatever your automatic documentation generator expects.
Comments are okay, when you:
- Copied some strange logic fom StackOverflow, that you don't quite understand (Link to SO page).
- You just have to do some kind of sick hack as a workaround, for performance, etc. (famous example from Quake 3)
- There is just no way to write the code in a clear and understandable form.
It should be obvious that the situations above should be rare and if you feel the urge to write a comment, always consider if the code could be written in a cleaner, more easily understandable form. Comments also have a tendency to become obsolete or incorrect with time, in which case they are misleading at best. Example of bad comments:
# Get all frobs for users <-- Redundant, stating the obvious
user_frobs = get_user_frobs(user_id)
# Request contains user_id, access_level and name <-- Will become outdated/misleading
request = get_request()
Don't ever commit commented-out code. It creates a lot of noise in the code and it tends to just sit around forever. Likewise, projects have a tendency to accumulate dead code with time. As soon as you identify a piece of code as being dead, delete it. If you remove a call to a piece of code (function, class, etc.) and that code is not used anywhere else, just remove it.
- Don't write Python 2, unless you are changing legacy code.
- Use Python's exceptions for error handling, don't hesitate to write your own exceptions if needed.
- Never return
None
as an indication of error. Raise an exception instead. - Consider using the typing system in Python 3, check types in your CI chain.
- But don't constrain functions unnecessarily, use abstract types, use duck-typing as an advantage, not a liability.
- Python has a rich standard library, you are highly encouraged to use it whenever possible.
- Don't be afraid to use abstractions like list comprehensions, generators, iterators, context managers etc. where appropriate.
- Don't use mutable types as default arguments.
- Prefer pure functions over object-oriented programming, but use OOP when appropriate.
- When writing classes, don't create getters and setters unless necessary, just access the attribute directly.
Python's error handling is built around exceptions. Use them.
Raise exceptions in your code when encountering an invalid state with no clear way to recover from, e.g. when a caller tries to use a function in a way that is not well-defined
def get_persons_by_age(age):
if age < 0:
raise ValueError('age cannot be negative')
...
But in general, you should never check the type of arguments passed to the function. Python's type system is built around the concept of duck-typing. Meaning that as long as an object behaves like it should, i.e. defines the needed interface, it doesn't matter what the type actually is. Checking the type would then limit the generalizability of the function.
As an example, consider a function that needs to iterate over a collection of objects and do something for each of them. This function could take any type as argument, as long as the type is an iterable of some sort. Implementing a check to see if the type is a list would be pointless and actually harmful.
If a caller passes an incompatible object as an argument, the function should eventually fail, often in way where it's not obvious that it was due to an incompatible argument. That is unfortunately the price we pay for having duck-typing.
When running code that might raise exceptions, we should catch exceptions that are expected and that we know how to handle in a good way
def load_config(conf_path):
try:
conf_file = open(conf_path)
except IOError:
print('Could not load config, loading default config')
conf_file = open(DEFAULT_CONFIG_PATH)
...
You should however, only catch exceptions that you know how to handle. If an exception would be unrecoverable, or if it would be an indication of a bug in your code, there is no point in catching it. For this reason, you should only catch specific exceptions. Don't ever do this
try:
user = get_user(user_id)
except Exception:
# Oh no, something bad happened
...
This would catch any exception, even ones caused by bugs in our own code. Even exceptions that we may not even know existed and which we have no way of recovering from.
This is even worse
try:
user = get_user(user_id)
except:
# Oh no, something bad happened
...
As it will even catch exceptions that are not even errors, like KeyboardInterrupt
(user presses ctrl+c).
The only possible cases where catching Exception
is okay, is when we need to log that an exception happened,
but in these cases, the exception should be re-raised immediately
try:
user = get_user(user_id)
except Exception as e:
logging.error(e)
raise
A bare except:
might be okay if we are writing an application and it needs to clean something up before exiting.
Returning None
to indicate an error is never acceptable, as it is now the callers responsibility to check if an error happened.
They most likely won't do that, so now the error just propagates onwards, away from where it actually happened.
None
also doesn't tell us or the user anything about what actually went wrong.
Note however that there can be legitimate reasons to return None
from a function.
If a function would normally return some scalar value, which might not exist and if the non-existence of that value is not an error,
the function may return None
. The documentation of the function should state that it might return None
.
Pythons built-in exceptions are often enough, but sometimes it can be useful for us to define our own exceptions.
One example where this could be useful, is when we want to pass low-level errors up to a higher layer of abstraction. Maybe we are writing a client library for a database and we don't want to expose the library users to database exceptions.
import sqlalchemy.orm.exc as exc
import db
import models
class InvalidUser(Exception):
pass
def get_user(user_id):
try:
user = db.session.query(models.User).filter(models.User.uid == user_id).one()
except exc.NoResultFound:
raise InvalidUser(f'user with id {user_id} does not exist')
Python is a duck-typed language, meaning that any object can be used anywhere, as long as it behaves as it should, that is, as long as it exposes the right methods. Consider the following function
def find_squanched_frob(frobs):
for frob in frobs:
if frob.is_squanched():
return frob
return None
This function works as long as frobs
is an iterable, it doesn't matter if it's a list, dict, generator, etc.
One downside of duck-typing is that many errors that could have been prevented, by having a static type system,
will only reveal themselves at runtime. For example, passing a non-iterable to find_squanched_frobs
by mistake
squanched_frob = find_squanched_frobs(5) # Will raise TypeError
Of course, if the code was well covered with tests, we would find this error before deploying,
but in Python 3, it is also possible to use type hints from the typing
module.
Note that these types are not checked as part of running the code, neither when parsing or at runtime.
So when using types, it is necessary to run a static type checker like mypy
.
This can be run in the editor when linting and as part of the CI chain.
When using type hints, use generic types whenever possible, to retain duck-typing.
from typing import Iterable, List
# Bad
def find_squanched_frob(frobs: List[Frob]) -> Frob:
...
# Good
def find_squanched_frob(frobs: Iterable[Frob]) -> Frob:
...
Always consider if your code could be made more generic.
For example, maybe there are other things than a Frob
that can be squanched.
Perhaps Frob
inherits from the Squanchable
base class or maybe it implements the Squanchable
protocol.
def find_squanched_things(squanchables: Iterable[Squanchable]) -> Squanchable:
...
Whether or not to use type hints should be considered on a per-repository basis. However, you should probably use type hints if:
- You are writing a library/package that others may import and use in their code.
- You are writing code that will run in production.
- You are writing code for which either of the above might be true, at some point in the future.
Finally, there are a few specific rules about typing:
- Parameters that might legally be
None
, should be typed asOptional[]
- Constants should be typed as
Final[]
- Don't use the
Any
type, it defeats the purpose of using types.
Python contains a lot of high-level abstractions that are often considered advanced features of the language. These include things like decorators, context managers, generators, list comprehensions, etc.
Don't be afraid to use these features because you think they might make the code hard to read or too complex. There is of course a right time and place for all of these, but when used right, they often make the code easier to read and less complex.
For example, for building up a list
# Okay
user_emails = []
for user in users:
user_emails.append(user.email)
# Better
user_emails = [user.email for user in users]
The list comprehension immediately expresses the semantics of what is going on, we are building a list. The for-loop does not, as it can be used for many things, not just building lists.
When code is simpler, it is also often less error prone. It's impossible to forget closing a file, with a context manager, for example
# Bad
f = open('studios.csv')
studios_data = f.read()
f.close()
# Good
with open('studios.csv') as f:
studios_data = f.read()
Don't be shy about implementing your own context manager, decorator etc. when appropriate. For example, decorators are very useful for validating inputs.
Finally, Python has a very large and well-documented standard library. If you are finding yourself writing something generic like a config-file parser or secure hashing algorithm (okay, don't ever do that yourself), you should take a look in the standard library, there is probably a module for it.
An important lesson from functional programming, is that mutable state should be avoided. It makes programs more complex, harder to reason about and more difficult to change, without introducing bugs.
In general we should use pure functions when we are implementing the logic of our codebase. A pure function is one that doesn't have any side-effects and is deterministic, meaning it doesn't change any external objects or variables and it always returns the same output, when given the same inputs.
frobs = get_frobs()
# Mutates external state
def squanch_frobs(frobs):
for frob in frobs:
frob.squanch()
# Mutates external state and doesn't say which state
def squanch_frobs():
for frob in frobs:
frob.squanch()
# Pure - doesn't mutate external state
def squanch_frobs(frobs):
return [frob.squanch() for frob in frobs]
As with most other things in this guide, there are exceptions to the rule. Building a new list is of course more expensive than mutating the items in the existing list, so when operating with large amounts of data or if performance is very critical, it is fine to do like this
def squanch_frobs(frobs):
for frob in frobs:
frob.squanch()
return frobs
frobs = get_frobs()
frobs = squanch_frobs(frobs)
When we actually need to do side-effects, like write a file, send an HTTP request etc. We should make sure that our function does just that. In general, functions should do only one thing
# Bad
def calculate_results_and_write_to_file(studio_data, output_path):
# Do some computation here
result_data = []
for row in studio_data:
new_row = ...
result_data.append(new_row)
# Write to file
with open(output_path, 'w') as f:
f.write(result_data)
# Good
def calculate_results(studio_data):
result_data = []
for row in studio_data:
new_row = ...
result_data.append(new_row)
return result_data
def write_results(result_data, output_path):
with open(output_path, 'w') as f:
f.write(result_data)
Python is a multi-paradigm language and has support for all the common object-oriented concepts like classes, inheritance, polymorphism, etc. But there is often no need to use OOP. We can often get by with just a set of pure functions arranged in modules.
That doesn't mean that OOP is not a useful tool. Classes can be incredibly useful for
- Storing related data together, e.g. a User class with name, email, user_id, etc. Check out dataclasses
- Encapsulating concepts that are inherently stateful, like database connections, UI elements, data structures.
- Inheritance can be useful for Polymorphism. This works especially well with the typing system.
As an example of 3.
, imagine that you are writing a library that needs to be able to interact
with 2 kinds of data backends, Druid and SQL.
You could then write an abstract DataBackend class, defining the methods that all data
backends should implement.
Then write 2 classes, DruidBackend and SQLBackend, that inherit from this class,
and use DataBackend as a type for any function that needs to be able to interact with both.
A protocol may sometimes be a better fit.
The Pythonic way of accessing class attributes in general, is not to use getters and setters.
Instead, the user simply accesses the attribute directly. Any attribute which is considered internal
to the class, should be prefixed with _
.
In case it is necessary to run code whenever the user sets or gets an attribute,
use the @property
decorator.
Note that if you start out by using a simple attribute, you can later change it to a @property
decorated getter method
without changing the class interface at all. The users of the class will be none the wiser.
It is also considered good practice to use the @classmethod
decorator for factory methods and @staticmethod
for any static methods.
When writing functions with default arguments, don't use mutable default values like
def squanch_frobs(frobs, excluded_frobs=[]):
...
Intuitively, one would think that the list assigned to excluded_frobs
would be re-initialized as
empty, everytime the function is called. But it is actually constructed once, at import time, so
instead becomes a form of mutable global state. If we defined squanch_frobs
like this
def squanch_frobs(frobs, excluded_frobs=[]):
excluded_frobs.append(1)
print(excluded_frobs)
Then called it multiple times, we would get
>>> squanch_frobs(my_frobs)
[1]
>>> squanch_frobs(my_frobs)
[1, 1]
When you don't need the argument to be mutable inside the function, you could simply use a non-mutable alternative, like a tuple instead of a list.
If you do need the argument to be mutable, the Pythonic way of handling it would be to set it to None
and then initialize in the function body
def squanch_frobs(frobs, excluded_frobs=None):
if excluded_frobs is None:
excluded_frobs = []
...
Proper test coverage is always important, more so in dynamically typed languages like Python, where even trivial type errors may not be discovered until runtime. In general, if you are writing a library or writing code that will run in production or a live environment, you must have proper test coverage.
There are many different test philosophies and methodologies, but one of the most popular testing methods is unit testing. Good unit tests have the following properties:
- Small. Test only 1 thing per test.
- Thorough. Make sure all branches are covered.
- Independent. Don't rely on the existence or execution order of other tests.
- Fast. If your test relies on potentially large input data, try make a small, but realistic sample of data for the test.
There are many different testing frameworks for Python. The most popular one appears to be pytest.
When dealing with external entities and services, like an SQL database, or AWS, it is usually a good idea to mock these entities. Many services even have their own Python libraries, specifically for mocking tests. For example, the popular boto3 library for interacting with AWS, has a mock version called moto3.
It is the responsibility of the developer implementing a feature to make sure that new and changed code is adequately covered with tests. Remember to take tests into account when estimating time for a project.
It is highly recommended to integrate services like codecov or coveralls into your CI chain. A per-repository policy, requiring PRs to have positive coverage diff, can also be a good idea.
Python installations are system-wide, by default. The same goes for Python packages, installed with pip install
.
Since different projects might need different versions of the same package, or even different versions of Python,
it is necessary to separate projects into virtual environments.
Many different virtual env tools exist for Python. Python has a built-in tool called venv, but this tool is very low-level, so it might be a good idea to install a more high-level tool, like virtualenv.
Somewhat recently, a couple of new tools called Poetry and Pipenv have appeared. Both of these tools are intended to replace Pip and offer both virtual environment support and dependency management.
One benefit of using these over pip, is that they promise to make deterministic environments, something that pip unfortunately cannot do. This means that you can install dependencies in your dev environment, test that everything works, deploy to production and be certain that the exact same versions of your dependencies will be installed there, even without pinning the version numbers.
Both offer more or less the same features, except that Poetry makes packaging considerably easier, as it needs only the pyproject.toml
file,
as discussed in the next section.
You should always consider whether your project should be an installable Python package.
If you are writing a library that needs to be imported into other Python projects, your project needs to be a package.
There are other good reasons to package your project. For example, if your project comes with one or more CLI tools,
users could simply install your package, like pip install flask
and immediately be able to run your CLI in the shell
~$ flask --help
Usage: flask [OPTIONS] COMMAND [ARGS]...
This shell command acts as general utility script for Flask applications.
It loads the application configured (through the FLASK_APP environment
variable) and then provides commands either provided by the application or
Flask itself.
If you are building a service or web-app, that is going to be deployed to some infrastructure, you typically don't need to make a package out of your code. But it might still be useful for versioning or deployment considerations.
When developing on a package, it's a good idea to install the package as editable. This tells your package manager that code should be runned or imported directly from the source folder rather than from installed packages. This way you won't have to re-install the package everytime you want to test some changes to the code. Root packages are installed as editable by default in Poetry, here is how to do it in Pipenv and Pip.
Unfortunately, packaging in Python has always been rather messy and lacking in standardization. The recommended recipe has, until recently, been to use setuptools and write a setup.py file.
Recently however, an attempt to standardize packaging has been made, with PEP 517
and PEP 518.
This enables package authors to specify what tools are required to build their package, through the pyproject.toml
file,
without having to implicitly require that the user installs a specific build system, like setuptools.
Python packaging is a very active area of development in the Python community and you are strongly
encouraged to use the latest accepted standards which, at the time of writing, is using pyproject.toml
.
Poetry is it's own build system, so only needs the pyproject.toml
file in order to build both wheels and source distributions.
If you don't use Poetry, you are free to use any other build system, but most users will probably want to use setuptools.
This article contains an explanation of how to use the pyproject.toml
file with setuptools.
Once a package has been built into a wheel, this artifact can then be distributed and installed anywhere, by any package manager.
If you are building a package, especially if you are building a library, you need to version the package. You are free to use any versioning scheme, as long as you are consistent about what the version numbers mean.
A popular versioning standard is semantic versioning or SemVer.
Every version is denoted by a triplet MAJOR.MINOR.PATCH
.
Increment PATCH
when making backwards compatible bug-fixes.
Increment MINOR
when adding functionality that is backwards compatible.
Increment MAJOR
when adding functionality that is not backwards compatible.
A MAJOR
version of 0, indicates that the package is still under development and that anything may change at any time.