Make the DBT dependency optional #412
Description
Alternative title: Sudden dependency bloat in data-diff 0.4.0 and up
Overview of the problem
Version 0.3.2 of data-diff required 11 dependencies total, most of which are already very popular libraries, like "rich", "click", and "dsnparse".
Version 0.4.0 of data-diff introduced 66 new dependencies (!).
This is due to the introduction of the mandatory "dbt" dependency, even for users who don't plan to use the --dbt switch.
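For anyone who wants to reproduce a count like this on their own environment, here is a rough sketch that walks the installed package metadata. It is only an approximation, not how the numbers above were produced: the helper names are hypothetical, it ignores environment markers other than extras, and it only sees packages that are actually installed.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Return the leading distribution name of a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0) if match else ""


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def transitive_deps(package: str) -> set:
    """Collect the names of all non-extra dependencies reachable from `package`."""
    root = normalize(package)
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            requirements = metadata.requires(current) or []
        except metadata.PackageNotFoundError:
            # Not installed here, so its own dependencies are unknown.
            continue
        for requirement in requirements:
            # Assumption: optional extras show up as an "extra ==" marker.
            if "extra ==" in requirement:
                continue
            name = normalize(requirement_name(requirement))
            if name and name not in seen:
                seen.add(name)
                stack.append(name)
    seen.discard(root)
    return seen


# Example usage (requires data-diff to be installed):
# print(len(transitive_deps("data-diff")))
```

Running it against the same environment before and after upgrading gives a quick way to see how much the dependency set grew.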
Implications of not fixing this
- Minimum load time for the tool increased from 0.165 seconds to 2.410 seconds (!!).
- Extra requirements might collide with the requirements of other Python libraries, making it harder for our non-dbt users to have data-diff installed alongside them.
- Users who would consider using data-diff as a lightweight tool might be put off by the large number of dependencies.
- The same applies to being included by default in package managers. For example, in Ubuntu you can run apt install python3-lark to install Lark. That would have been much harder if Lark had many dependencies.
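A load-time difference like the one in the first point can be checked with a small timing script. This is a minimal sketch, not the method used for the figures above: `startup_seconds` is a hypothetical helper, and the `data_diff` module name in the usage comment is an assumption about the import name.

```python
import subprocess
import sys
import time


def startup_seconds(python_args, repeats=5):
    """Return the best wall-clock time of running `python <python_args>`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            [sys.executable, *python_args],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


# Example usage (assumes the package imports as `data_diff`):
# baseline = startup_seconds(["-c", "pass"])
# with_tool = startup_seconds(["-c", "import data_diff"])
# print(f"import overhead: {with_tool - baseline:.3f}s")
```

Taking the best of several runs reduces noise from disk caching. For a per-module breakdown, `python -X importtime -c "import data_diff"` shows which imports dominate.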
Implications of fixing this
Users that don't already have dbt installed, and run --dbt, will see a message telling them to use pip install data-diff[dbt] which will install dbt.
However, there is absolutely no point in using the --dbt switch if you don't already have dbt installed and configured.
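The behaviour described above could look roughly like the following. This is only a sketch of the general lazy-import pattern, not data-diff's actual code: `import_optional` is a hypothetical helper, and the `dbt` module name in the usage comment is an assumption.

```python
import importlib


def import_optional(module_name, extra, package="data-diff"):
    """Import an optional dependency, or explain how to install it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only translate the error when the optional module itself is missing;
        # a missing module deeper in the import chain is re-raised unchanged.
        wanted_root = module_name.split(".")[0]
        if exc.name and exc.name.split(".")[0] != wanted_root:
            raise
        raise RuntimeError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc


# Example usage, called only when the user passes --dbt:
# dbt = import_optional("dbt", extra="dbt")
```

Because the import happens only inside the --dbt code path, users who never pass the switch pay neither the install cost nor the load-time cost.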
Conclusion
I think there is no reason to keep dbt as a dependency, but very good reasons to make it optional.