<a href="https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/Dependency_Inversion_Principle_in_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build Reliable Machine Learning Pipelines with the Dependency Inversion Principle in Python

## Decouple your ML components for maximum testability, flexibility, and scalability.


| ![space-1.jpg](https://github.com/Tanu-N-Prabhu/Python/blob/master/Img/christina-wocintechchat-com-SqmaKDvcIso-unsplash.jpg?raw=true) |
|:--:|
|Photo by <a href="https://unsplash.com/@wocintechchat?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Christina @ wocintechchat.com</a> on <a href="https://unsplash.com/photos/shallow-focus-photo-of-python-book-SqmaKDvcIso?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a>|

### Introduction
Machine learning systems aren’t just models; they’re complex software systems with data pipelines, model orchestration, and deployment layers. Yet many ML engineers overlook the design principles that make code production-ready. Today, let’s look at the Dependency Inversion Principle (DIP) and how it can transform the way you structure ML systems.

---

### Problem
In typical ML scripts, high-level business logic calls low-level libraries (like Scikit-learn or Pandas) directly, so the two are tightly coupled. This makes systems brittle: swapping a library or a data source means editing the core logic. It also makes unit testing hard, because the logic cannot run without the real data and the real model.

---

### Design Principle
#### Dependency Inversion Principle (DIP)

DIP is the "D" in the SOLID principles. It states:

> *High-level modules should not depend on low-level modules. Both should depend on abstractions.*

In ML terms: your training code shouldn’t care whether you use Scikit-learn, XGBoost, or PyTorch; it should depend on abstract interfaces.


---

### Code Implementation (Clean ML Training with DIP)



In [1]:
# interfaces.py
from abc import ABC, abstractmethod

class IDataLoader(ABC):
    @abstractmethod
    def load_data(self):
        pass

class IModel(ABC):
    @abstractmethod
    def train(self, X, y):
        pass

    @abstractmethod
    def evaluate(self, X, y):
        pass

Note: `interfaces` in the import below refers to the local `interfaces.py` module defined above, not to a PyPI package, so no `pip install` is needed. In a notebook, save the cell above to disk first (for example with the `%%writefile interfaces.py` cell magic) so that the import resolves.


In [None]:
# sklearn_implementations.py
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from interfaces import IDataLoader, IModel

class SklearnDataLoader(IDataLoader):
    def load_data(self):
        data = load_iris()
        return data.data, data.target

class SklearnRFModel(IModel):
    def __init__(self):
        self.model = RandomForestClassifier()

    def train(self, X, y):
        self.model.fit(X, y)

    def evaluate(self, X, y):
        return accuracy_score(y, self.model.predict(X))

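In [None]:
# main.py
from interfaces import IDataLoader, IModel
from sklearn_implementations import SklearnDataLoader, SklearnRFModel

def run_pipeline(data_loader: IDataLoader, model: IModel) -> None:
    # High-level logic: talks only to the IDataLoader and IModel abstractions.
    X, y = data_loader.load_data()
    model.train(X, y)
    # Evaluates on the data used for training, so this is training accuracy.
    accuracy = model.evaluate(X, y)
    print("Model Accuracy:", accuracy)

if __name__ == "__main__":
    # Composition root: the only place that names concrete classes.
    run_pipeline(SklearnDataLoader(), SklearnRFModel())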

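### Output

Model Accuracy: 1.0

Note that this is training accuracy: `run_pipeline` evaluates on the same data it trained on, so a perfect score here says little about how well the model generalizes.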
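Because `run_pipeline` scores the model on the rows it was trained on, one way to report a held-out score without changing `IDataLoader` or `IModel` is to split inside the pipeline. The sketch below is an assumption, not part of the original code: `split_data` and `run_pipeline_with_holdout` are hypothetical helpers written with the standard library only.

```python
import random

def split_data(X, y, test_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle row indices and split them into train/test."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(rows, idx):
        return [rows[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)

def run_pipeline_with_holdout(data_loader, model, test_fraction=0.25, seed=0):
    """Variant of run_pipeline: train on one part of the data, score on the rest."""
    X, y = data_loader.load_data()
    X_train, X_test, y_train, y_test = split_data(X, y, test_fraction, seed)
    model.train(X_train, y_train)
    return model.evaluate(X_test, y_test)
```

The helper works on any indexable sequence and returns plain lists of rows. The loader and model are still passed in, so the dependency direction is unchanged.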

---

### Code Explanation

* `interfaces.py`: Defines abstractions for loading data and training models.

* `sklearn_implementations.py`: Implements these interfaces using Scikit-learn.

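* `main.py`: `run_pipeline` depends only on the abstractions; it calls `load_data`, `train`, and `evaluate` and never names a concrete library. The concrete classes are chosen in one place, the `__main__` block.

* This makes it easy to swap out `SklearnRFModel` for an `XGBoostModel` or even a deep learning model.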
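As a sketch of such a swap, here is a second `IModel` implementation. `MajorityClassModel` is a hypothetical stand-in that is not from the original code and uses only the standard library; an `XGBoostModel` would presumably follow the same shape with a real estimator inside.

```python
from abc import ABC, abstractmethod
from collections import Counter

# Copy of the article's abstraction so the sketch runs on its own.
class IModel(ABC):
    @abstractmethod
    def train(self, X, y):
        pass

    @abstractmethod
    def evaluate(self, X, y):
        pass

class MajorityClassModel(IModel):
    """Hypothetical baseline: always predicts the most frequent training label."""

    def __init__(self):
        self.majority_ = None

    def train(self, X, y):
        # Features are ignored; only the label frequencies matter.
        self.majority_ = Counter(y).most_common(1)[0][0]

    def evaluate(self, X, y):
        # Accuracy of predicting the majority label for every row.
        correct = sum(1 for label in y if label == self.majority_)
        return correct / len(y)

model = MajorityClassModel()
model.train([[1], [2], [3]], ["a", "a", "b"])
accuracy = model.evaluate([[1], [2], [3]], ["a", "a", "b"])
```

Passing `MajorityClassModel()` to `run_pipeline` in place of `SklearnRFModel()` requires no change to the pipeline function itself.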

---

### Why it’s so important

* Promotes flexibility: Swap out components without changing core logic.

* Enables mocking and unit testing: You can fake `IDataLoader` during tests.

* Reduces lock-in: The pipeline is decoupled from any one library (Scikit-learn, TensorFlow, etc.).

* Suits production code: It is a solid pattern for building ML SDKs or APIs.
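The testing benefit can be sketched with two test doubles. `FakeDataLoader` and `RecordingModel` are hypothetical names introduced here for illustration. The abstractions and `run_pipeline` are copied from the code above so the example runs without Scikit-learn.

```python
from abc import ABC, abstractmethod

class IDataLoader(ABC):
    @abstractmethod
    def load_data(self):
        pass

class IModel(ABC):
    @abstractmethod
    def train(self, X, y):
        pass

    @abstractmethod
    def evaluate(self, X, y):
        pass

def run_pipeline(data_loader, model):
    X, y = data_loader.load_data()
    model.train(X, y)
    accuracy = model.evaluate(X, y)
    print("Model Accuracy:", accuracy)

class FakeDataLoader(IDataLoader):
    """Hypothetical test double: returns a tiny in-memory dataset."""

    def load_data(self):
        return [[0.0], [1.0]], [0, 1]

class RecordingModel(IModel):
    """Hypothetical test double: records calls instead of fitting anything."""

    def __init__(self):
        self.calls = []

    def train(self, X, y):
        self.calls.append(("train", len(X)))

    def evaluate(self, X, y):
        self.calls.append(("evaluate", len(X)))
        return 0.5

def test_pipeline_trains_then_evaluates():
    model = RecordingModel()
    run_pipeline(FakeDataLoader(), model)
    # The pipeline should train first, then evaluate, on the loaded rows.
    assert model.calls == [("train", 2), ("evaluate", 2)]

test_pipeline_trains_then_evaluates()
```

The test checks the orchestration logic in isolation: no dataset is downloaded and no real model is fitted.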

---

### UML Class Diagram

| ![space-1.jpg](https://github.com/Tanu-N-Prabhu/Python/blob/master/Img/uml_dip.png?raw=true) |
|:--:|
|Designed by Author|

#### UML Class Diagram Explanation

1. `IDataLoader` (Abstract Class / Interface)
    * This is an abstraction.
    * It declares a method `load_data()`.
    * Any data loader class (e.g., Scikit-learn, CSV, API-based) must implement this method.

2. `IModel` (Abstract Class / Interface)
    * Another abstraction defining two essential ML behaviors:
        * `train(X, y)`
        * `evaluate(X, y)`
    * Different model implementations (Random Forest, XGBoost, etc.) adhere to this interface.

3. `SklearnDataLoader` (Concrete Class)
    * Implements `IDataLoader`.
    * Loads data using Scikit-learn (in this example, the Iris dataset).
    * Fully replaceable with other loaders (e.g., `PandasCSVLoader`, `SQLDataLoader`) without changing the rest of the code.

4. `SklearnRFModel` (Concrete Class)
    * Implements `IModel`.
    * Uses a `RandomForestClassifier` from Scikit-learn internally.
    * Can be swapped out with any model implementing `IModel` (e.g., `XGBoostModel`, `KerasModel`).

5. `main.py`
    * Acts as the high-level module.
    * Its pipeline function, `run_pipeline`, depends on the interfaces `IDataLoader` and `IModel`, not on the concrete implementations; only the `__main__` block names concrete classes.
    * This allows complete flexibility: inject any compatible class without modifying the pipeline logic.
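To illustrate how a replacement loader might look, here is a minimal sketch of a CSV-backed `IDataLoader`. `CsvDataLoader`, its `label_column` parameter, and the sample rows are assumptions made up for this example; it uses the standard-library `csv` module rather than Pandas.

```python
import csv
import io
from abc import ABC, abstractmethod

# Copy of the article's abstraction so the sketch runs on its own.
class IDataLoader(ABC):
    @abstractmethod
    def load_data(self):
        pass

class CsvDataLoader(IDataLoader):
    """Hypothetical loader: numeric feature columns plus one label column."""

    def __init__(self, source, label_column):
        self.source = source              # any text file-like object
        self.label_column = label_column  # name of the target column

    def load_data(self):
        X, y = [], []
        for row in csv.DictReader(self.source):
            # Remove the label from the row, then treat the rest as features.
            y.append(row.pop(self.label_column))
            X.append([float(value) for value in row.values()])
        return X, y

# Made-up sample data standing in for a real file.
sample = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "6.2,2.9,versicolor\n"
)
X, y = CsvDataLoader(sample, label_column="species").load_data()
```

Because the loader returns the same `(X, y)` pair as `SklearnDataLoader`, it can be handed to `run_pipeline` unchanged.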

---

### How This Reflects the Dependency Inversion Principle
* Abstractions (interfaces) define contracts that both the high-level module (`main.py`) and the low-level modules (`SklearnDataLoader`, `SklearnRFModel`) rely on.

* High-level logic does not care how data is loaded or how the model is implemented.

* Dependencies are injected: objects are passed in to `run_pipeline`, not created inside it.
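The "contract" is enforced by Python's `abc` module: a subclass that leaves an abstract method unimplemented cannot be instantiated. The sketch below shows this with a hypothetical `IncompleteModel` that is not part of the original code.

```python
from abc import ABC, abstractmethod

# Copy of the article's abstraction so the sketch runs on its own.
class IModel(ABC):
    @abstractmethod
    def train(self, X, y):
        pass

    @abstractmethod
    def evaluate(self, X, y):
        pass

class IncompleteModel(IModel):
    """Hypothetical class that implements train() but forgets evaluate()."""

    def train(self, X, y):
        pass

try:
    IncompleteModel()
    error_message = None
except TypeError as exc:
    # Python refuses to build the object and names the missing method.
    error_message = str(exc)

print(error_message)
```

The failure happens at construction time, before the pipeline runs, so a broken implementation is caught early.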

---

### Applications

* Plug-and-play AutoML frameworks.

* Scalable ML SDKs for teams or open-source projects.

* Backend ML APIs where models can be swapped dynamically.

* Systems where testing, logging, or monitoring is critical.
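For the dynamic-swapping case, one possible approach is a small registry that maps a name (from a config file or an API request, say) to a factory for an `IModel`. Everything below apart from `IModel` is hypothetical: `ConstantModel`, `MODEL_REGISTRY`, and `build_model` are illustrative names, not part of the original code.

```python
from abc import ABC, abstractmethod

# Copy of the article's abstraction so the sketch runs on its own.
class IModel(ABC):
    @abstractmethod
    def train(self, X, y):
        pass

    @abstractmethod
    def evaluate(self, X, y):
        pass

class ConstantModel(IModel):
    """Hypothetical toy model: always predicts the same label."""

    def __init__(self, label):
        self.label = label

    def train(self, X, y):
        pass  # nothing to learn

    def evaluate(self, X, y):
        # Accuracy of predicting self.label for every row.
        return sum(1 for target in y if target == self.label) / len(y)

# Hypothetical registry: name -> zero-argument factory returning an IModel.
MODEL_REGISTRY = {
    "always_zero": lambda: ConstantModel(0),
    "always_one": lambda: ConstantModel(1),
}

def build_model(name):
    """Look up a model by name and construct it, failing loudly if unknown."""
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown model {name!r}") from None
    return factory()

model = build_model("always_one")
model.train([[0], [1]], [1, 0])
score = model.evaluate([[0], [1]], [1, 0])
```

The caller only ever sees an `IModel`, so adding a new model means adding a registry entry rather than editing the pipeline.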

---

### Conclusion
By following the Dependency Inversion Principle, you elevate your ML projects from experimental notebooks to clean, scalable systems. It's the key to writing machine learning code that doesn't just work but lasts. This is what separates a good ML engineer from a great, software-engineering-minded one. Thanks for reading my article. Let me know in the comments if you have any suggestions or similar implementations. Until then, see you next time. Happy coding!

---

### Before you go
* Be sure to like this article and connect with me.
* Follow me: [Medium](https://medium.com/@tanunprabhu95) | [GitHub](https://github.com/Tanu-N-Prabhu) | [LinkedIn](https://ca.linkedin.com/in/tanu-nanda-prabhu-a15a091b5) | [Python Hub](https://github.com/Tanu-N-Prabhu/Python)
* [Check out my latest articles on Programming](https://medium.com/@tanunprabhu95)



