Why you want formal dependency injection in Python too

In other languages, e.g., Java, explicit dependency injection is part of daily business. Python projects, however, very rarely make use of this technique. I'd like to make a case for why it might be worth rethinking that habit.

Let's say you have a class implementing some business logic, which depends on external data:

# customer.py
from typing import NamedTuple

class Customer(NamedTuple):
    name: str
    value_in_dollars: int

# domain.py
from customer import Customer

class DomainLogic:
    def get_all_customers(self):
        # Imagine some database query here.
        return [Customer('SmallCorp', 1000),
                Customer('MegaCorp', 1000000)]

    def most_valuable_customer(self):
        return max(self.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)

The function for retrieving the data (get_all_customers) might need some state, e.g., a database connection.

Since you apply the single-responsibility principle, you don't want to manage the connection in DomainLogic. Instead, you'll use an additional data access object, encapsulating it:

# database_connection.py
from typing import List

from customer import Customer

class Connection:
    def __init__(self) -> None:
        # URL, credentials etc. might be retrieved from some config here first.
        print('Connecting to database.')

    def get_all_customers(self) -> List[Customer]:
        # Imagine some database query here.
        return [Customer('SmallCorp', 1000),
                Customer('MegaCorp', 1000000)]

# domain.py
from customer import Customer
from database_connection import Connection

class DomainLogic:
    def __init__(self) -> None:
        self.data_access = DomainLogic.init_data_access()

    @staticmethod
    def init_data_access() -> Connection:
        return Connection()

    def most_valuable_customer(self) -> Customer:
        return max(self.data_access.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)

Now you need to unit test the complex implementation of your business logic. For this, you'll want to replace data_access with some mock. In Python, there are multiple options to do so. A simple and common one, using monkey patching, is to replace the DomainLogic.init_data_access with something that returns a mock-data source:

# test.py
import unittest
from typing import List

from customer import Customer
from database_connection import Connection
from domain import DomainLogic

class MockConnection(Connection):
    def __init__(self):
        self.customers = [Customer('SmallTestCorp', 100),
                          Customer('BigTestCorp', 200)]

    def get_all_customers(self) -> List[Customer]:
        return self.customers

def create_mock_connection() -> Connection:
    return MockConnection()

class DomainLogicTest(unittest.TestCase):
    def test_most_valuable_customer(self) -> None:
        setattr(DomainLogic, 'init_data_access', create_mock_connection)
        domain_logic = DomainLogic()
        mvc = domain_logic.most_valuable_customer()
        self.assertEqual('BigTestCorp', mvc.name)

This works, but it has three big disadvantages:

First, the fact that DomainLogic has a dependency at all (in our case a Connection object) is totally hidden. One has to look into the implementation of DomainLogic to find out about it. Such surprises make it harder for users of your class, i.e., other developers, including your future self, to use and test it properly.

Second, you can no longer safely use a unit-testing framework running multiple tests concurrently, because test cases might perform different monkey patchings on the same classes, and thus step on each other's toes.

Third, in the case of multiple dependencies, not every test needs to patch the same set of things in the class. Then, even with non-concurrent test execution, a test can easily fail to reset the "state" of the class afterwards, and thereby invalidate other tests. Unless all developers take special care, the overall success of the test suite will depend on the order in which the test cases are executed. Yikes!

In general, one runs into the same problems as usual when manually mutating (and depending on) some global state from different places. The only difference is that here it's not some normal runtime value that is mutated, but the class itself.
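To make the order dependency tangible, here is a minimal, self-contained sketch. It condenses the classes from above into one file; first_test and second_test are hypothetical stand-ins for two test cases and are not part of the original example.

```python
from typing import List, NamedTuple


class Customer(NamedTuple):
    name: str
    value_in_dollars: int


class Connection:
    def get_all_customers(self) -> List[Customer]:
        return [Customer('SmallCorp', 1000),
                Customer('MegaCorp', 1000000)]


class MockConnection(Connection):
    def get_all_customers(self) -> List[Customer]:
        return [Customer('SmallTestCorp', 100),
                Customer('BigTestCorp', 200)]


class DomainLogic:
    def __init__(self) -> None:
        self.data_access = DomainLogic.init_data_access()

    @staticmethod
    def init_data_access() -> Connection:
        return Connection()

    def most_valuable_customer(self) -> Customer:
        return max(self.data_access.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)


def first_test() -> str:
    # Patches the class and never restores the original factory.
    setattr(DomainLogic, 'init_data_access', MockConnection)
    return DomainLogic().most_valuable_customer().name


def second_test() -> str:
    # Expects the real connection, but gets whatever is patched in right now.
    return DomainLogic().most_valuable_customer().name


name_before = second_test()  # runs against the unpatched class
first_test()                 # leaves the patch behind
name_after = second_test()   # same code, different result
```

Here name_before is 'MegaCorp', while name_after is 'BigTestCorp', although second_test itself did not change. Restoring the attribute after each test (for example with unittest.mock.patch.object used as a context manager) avoids this particular leak, but the dependency stays hidden.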

Having a globally accessible service locator, actively called from within our DomainLogic constructor to get the required Connection instance, would also not solve the problem. The dependencies would still rely on mutable global state, and be hidden instead of being visible in our classes' APIs.

So the sane approach is to use inversion of control to make the dependency explicit by letting the constructor of DomainLogic take the data-access object to use as a parameter instead of monkeying around with the class itself:

# domain.py
from customer import Customer
from database_connection import Connection

class DomainLogic:
    def __init__(self, data_access: Connection) -> None:
        self.data_access = data_access

    def most_valuable_customer(self) -> Customer:
        return max(self.data_access.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)

(To adhere to the dependency-inversion principle, __init__ could be implemented against an interface instead of a concrete Connection implementation.)
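A minimal sketch of that variant, assuming an abstract base class is used as the interface. The name CustomerSource is hypothetical and only serves this illustration.

```python
from abc import ABC, abstractmethod
from typing import List, NamedTuple


class Customer(NamedTuple):
    name: str
    value_in_dollars: int


class CustomerSource(ABC):
    """Hypothetical interface; the name is an assumption for this sketch."""

    @abstractmethod
    def get_all_customers(self) -> List[Customer]:
        ...


class Connection(CustomerSource):
    def get_all_customers(self) -> List[Customer]:
        # Imagine some database query here.
        return [Customer('SmallCorp', 1000),
                Customer('MegaCorp', 1000000)]


class MockConnection(CustomerSource):
    def get_all_customers(self) -> List[Customer]:
        return [Customer('SmallTestCorp', 100),
                Customer('BigTestCorp', 200)]


class DomainLogic:
    # Depends only on the abstraction, not on a concrete Connection.
    def __init__(self, data_access: CustomerSource) -> None:
        self.data_access = data_access

    def most_valuable_customer(self) -> Customer:
        return max(self.data_access.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)
```

With this, MockConnection no longer needs to inherit from the real Connection (and thus does not have to avoid its constructor); both simply implement the same interface.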

Since we have now cleanly separated the concerns of creating and using the Connection, we can instantiate DomainLogic with whatever data source is suitable for any given situation:

from database_connection import Connection
from domain import DomainLogic

if __name__ == '__main__':
    business = DomainLogic(Connection())

# test.py
class DomainLogicTest(unittest.TestCase):
    def test_most_valuable_customer(self) -> None:
        domain_logic = DomainLogic(create_mock_connection())

So this already solves the three pain points mentioned earlier.
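As a side effect, stock mocking tools work without any patching. The following sketch assumes the constructor-injected DomainLogic from above and uses unittest.mock.Mock from the standard library; the helper make_logic and the test names are made up for this example.

```python
import unittest
from typing import List, NamedTuple
from unittest.mock import Mock


class Customer(NamedTuple):
    name: str
    value_in_dollars: int


class Connection:
    def get_all_customers(self) -> List[Customer]:
        raise NotImplementedError('Stands in for a real database query.')


class DomainLogic:
    def __init__(self, data_access: Connection) -> None:
        self.data_access = data_access

    def most_valuable_customer(self) -> Customer:
        return max(self.data_access.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)


def make_logic(customers: List[Customer]) -> DomainLogic:
    # Each call creates a fresh mock; no class or global is modified.
    data_access = Mock(spec=Connection)
    data_access.get_all_customers.return_value = customers
    return DomainLogic(data_access)


class DomainLogicTest(unittest.TestCase):
    def test_picks_highest_value(self) -> None:
        logic = make_logic([Customer('SmallTestCorp', 100),
                            Customer('BigTestCorp', 200)])
        self.assertEqual('BigTestCorp', logic.most_valuable_customer().name)

    def test_single_customer(self) -> None:
        logic = make_logic([Customer('OnlyCorp', 1)])
        self.assertEqual('OnlyCorp', logic.most_valuable_customer().name)
```

Both tests own their data source, so they can run in any order (e.g., via python -m unittest) without influencing each other.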


Now it's time to make using this technique as convenient as possible.

With larger, real-world dependency trees, the shown manual injection can become quite cumbersome, as the following example, creating a hypothetical Controller, shows:

some_repository = SomeRepository()
my_controller = Controller(
                    SomeService(
                        some_repository
                    ),
                    SomeOtherService(
                        some_repository,
                        SomeOtherRepository()
                    )
                )

Also, some things, like repositories, might need to be a singleton instance, i.e., we only want to instantiate them once in our whole application. Manually taking care of this in the code adds an additional burden.

One possible solution is to externalize these tree definitions, e.g., into a (maybe XML-like) configuration file, and have some framework take care of wiring the instances at runtime.

But fortunately, since we are using static type hints for our constructor parameters, we can avoid this manual work completely. Instead, an appropriate framework can automatically deduce the graph. We can then write something very simple:

business = assemble(DomainLogic)

and let a provided assemble function do all the work.

Since the type hints clearly state that domain.DomainLogic needs a database_connection.Connection, assemble can recognize this, and thus construct and inject the dependency automagically.
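To demystify the idea a bit, here is a deliberately naive sketch of how such a function could work, using typing.get_type_hints from the standard library. The function toy_assemble and its instance cache are assumptions made up for this illustration; this is not how any particular framework implements it.

```python
from typing import Any, Dict, List, NamedTuple, Type, TypeVar, get_type_hints

T = TypeVar('T')


class Customer(NamedTuple):
    name: str
    value_in_dollars: int


class Connection:
    def __init__(self) -> None:
        # URL, credentials etc. might be retrieved from some config here.
        pass

    def get_all_customers(self) -> List[Customer]:
        return [Customer('SmallCorp', 1000),
                Customer('MegaCorp', 1000000)]


class DomainLogic:
    def __init__(self, data_access: Connection) -> None:
        self.data_access = data_access

    def most_valuable_customer(self) -> Customer:
        return max(self.data_access.get_all_customers(),
                   key=lambda customer: customer.value_in_dollars)


# Toy container: one shared instance per class (singleton style).
_instances: Dict[type, Any] = {}


def toy_assemble(cls: Type[T]) -> T:
    """Recursively build cls from the type hints of its constructor."""
    if cls in _instances:
        return _instances[cls]
    if cls.__init__ is object.__init__:
        hints: Dict[str, Any] = {}  # No custom constructor, no dependencies.
    else:
        hints = get_type_hints(cls.__init__)
        hints.pop('return', None)
    # Every annotated constructor parameter is treated as a dependency.
    kwargs = {name: toy_assemble(dependency)
              for name, dependency in hints.items()}
    instance = cls(**kwargs)
    _instances[cls] = instance
    return instance


business = toy_assemble(DomainLogic)
```

A real framework additionally has to deal with things like interfaces with multiple implementations, non-singleton services, and helpful error messages, which this sketch ignores.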

One framework providing such functionality is enterprython. Using it, we only need to annotate our "service" (database_connection.Connection) with @component(). Then assemble, which is provided by the library, can already do its thing, i.e., create our "client" (domain.DomainLogic).

# database_connection.py
from enterprython import component

@component()
class Connection:
    ...

# domain.py
from database_connection import Connection

class DomainLogic:
    def __init__(self, data_access: Connection):
        ...

from enterprython import assemble
from domain import DomainLogic

if __name__ == '__main__':
    business = assemble(DomainLogic)

If the service (database_connection.Connection in this case) itself depended on other things, they would also be auto-created (singleton style) in the library's DI container, and then injected, rinsing and repeating until the full dependency tree is resolved. In case something is missing, an appropriate exception will be raised.

In addition to this minimal example, with enterprython we can also:

  • work with abstract base classes, and thus only depend on abstractions, which enables the use of different implementations for different environments, e.g., test vs. prod.
  • decide whether services should be singletons or not.
  • provide custom service factories.
  • do some other things; see enterprython's features list.

Naturally, in our unit tests, we can still manually constructor-inject a MockConnection into the DomainLogic object we want to test:

domain_logic = DomainLogic(create_mock_connection())

Of course, enterprython might not be the only static-type-annotation-based DI framework available for Python. If this article has convinced you of the general usefulness of the pattern, I'd like to encourage you to give it a try, and to check out other options as well.