Release 0.6.0
Release 0.6.0
Installation
pip install baselinr==0.6.0
See the full changelog for details.
Release Notes for v0.6.0
New Features
- Airflow 2.x Integration: Complete Apache Airflow integration with operators, sensors, and hooks for data profiling and drift detection
- Airflow Operators:
BaselinrProfileOperator,BaselinrDriftOperator, andBaselinrQueryOperatorfor seamless integration into Airflow DAGs - Root Cause Analysis (RCA) Module: Comprehensive RCA system for analyzing data anomalies with lineage analysis, temporal correlation, and pattern matching
- RCA Collectors: Pipeline run collectors for Airflow, Dagster, and dbt to gather metadata for root cause analysis
- Data Validation Engine: New validation framework for data quality checks
- Airflow RCA Collector: Collects Airflow DAG run metadata via REST API, direct database access, or environment variables
- Docker Airflow Support: Full Airflow stack in Docker with example DAGs and integration testing
Improvements
- SQLAlchemy 2.0 Compatibility: Fixed transaction management to use
engine.begin()for proper DML operations - Enhanced RCA Analysis: Improved lineage-based and temporal correlation analysis for identifying root causes
- Airflow Configuration: Extended RCA collector configuration with Airflow-specific settings for API and database access
- Test Coverage: Comprehensive unit tests for Airflow integration with proper mocking of optional dependencies
Maintenance
- Dependency Management: Added
apache-airflow>=2.0.0,<3.0.0as optional dependency - Code Quality: Fixed all SQLAlchemy 2.0 compatibility issues in storage layer
- Documentation: Added comprehensive Airflow integration guides and quick start documentation