Release 0.6.0

@github-actions released this 03 Dec 03:26 · 196 commits to main since this release · 39766fd

Installation

pip install baselinr==0.6.0

See the full changelog for details.

Release Notes for v0.6.0

New Features

  • Airflow 2.x Integration: Complete Apache Airflow integration with operators, sensors, and hooks for data profiling and drift detection
  • Airflow Operators: BaselinrProfileOperator, BaselinrDriftOperator, and BaselinrQueryOperator for use in Airflow DAGs
  • Root Cause Analysis (RCA) Module: Comprehensive RCA system for analyzing data anomalies with lineage analysis, temporal correlation, and pattern matching
  • RCA Collectors: Pipeline run collectors for Airflow, Dagster, and dbt to gather metadata for root cause analysis
  • Data Validation Engine: New validation framework for data quality checks
  • Airflow RCA Collector: Collects Airflow DAG run metadata via REST API, direct database access, or environment variables
  • Docker Airflow Support: Full Airflow stack in Docker with example DAGs and integration testing
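
To illustrate what collecting DAG run metadata over REST involves, here is a minimal sketch. It is not baselinr's actual collector. It assumes Airflow 2.x's stable REST API, where GET /api/v1/dags/{dag_id}/dagRuns returns a JSON object containing a dag_runs list. The names DagRun, dag_runs_url, and parse_dag_runs are hypothetical, and the sample payload is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import quote


@dataclass
class DagRun:
    """Hypothetical record holding the run fields useful for root cause analysis."""
    dag_id: str
    run_id: str
    state: str
    start_date: Optional[str]
    end_date: Optional[str]


def dag_runs_url(base_url: str, dag_id: str) -> str:
    """Build the Airflow 2.x stable REST API URL that lists runs of one DAG."""
    return f"{base_url.rstrip('/')}/api/v1/dags/{quote(dag_id, safe='')}/dagRuns"


def parse_dag_runs(payload: str) -> List[DagRun]:
    """Turn the JSON body of a dagRuns response into DagRun records."""
    body = json.loads(payload)
    return [
        DagRun(
            dag_id=run["dag_id"],
            run_id=run["dag_run_id"],
            state=run["state"],
            start_date=run.get("start_date"),
            end_date=run.get("end_date"),
        )
        for run in body.get("dag_runs", [])
    ]


# Made-up sample response, shaped like a dagRuns payload (illustration only).
SAMPLE = json.dumps({
    "dag_runs": [
        {
            "dag_id": "orders_etl",
            "dag_run_id": "scheduled__2024-01-01T00:00:00+00:00",
            "state": "failed",
            "start_date": "2024-01-01T00:05:00+00:00",
            "end_date": None,
        },
    ],
    "total_entries": 1,
})

runs = parse_dag_runs(SAMPLE)
# Failed runs are the natural starting point when correlating with an anomaly.
failed = [r.run_id for r in runs if r.state == "failed"]
```

Keeping the parsing separate from the HTTP call means the same records could, in principle, be filled from the database or environment-variable paths as well.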

Improvements

  • SQLAlchemy 2.0 Compatibility: Transaction management now uses engine.begin(), so DML statements run inside a transaction that is committed explicitly
  • Enhanced RCA Analysis: Improved lineage-based and temporal correlation analysis for identifying root causes
  • Airflow Configuration: Extended RCA collector configuration with Airflow-specific settings for API and database access
  • Test Coverage: Comprehensive unit tests for Airflow integration with proper mocking of optional dependencies
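
The engine.begin() pattern named in the SQLAlchemy item can be sketched as follows. This is a minimal illustration against an in-memory SQLite database, not baselinr's storage code; the table name profile_runs and its columns are hypothetical. In SQLAlchemy 2.0 connections no longer autocommit, so DML issued on a plain engine.connect() connection is rolled back unless commit() is called. engine.begin() opens a transaction that commits when the block exits cleanly and rolls back if it raises.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite keeps the sketch self-contained.
engine = create_engine("sqlite:///:memory:")

# engine.begin() commits on clean exit and rolls back on an exception.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE profile_runs (id INTEGER PRIMARY KEY, status TEXT)"))
    conn.execute(
        text("INSERT INTO profile_runs (status) VALUES (:status)"),
        {"status": "completed"},
    )

# A failing block leaves no partial writes behind.
try:
    with engine.begin() as conn:
        conn.execute(
            text("INSERT INTO profile_runs (status) VALUES (:status)"),
            {"status": "partial"},
        )
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# Only the row from the committed block is visible.
with engine.connect() as conn:
    statuses = [row[0] for row in conn.execute(text("SELECT status FROM profile_runs"))]
```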

Maintenance

  • Dependency Management: Added apache-airflow>=2.0.0,<3.0.0 as an optional dependency
  • Code Quality: Fixed all SQLAlchemy 2.0 compatibility issues in storage layer
  • Documentation: Added comprehensive Airflow integration guides and quick start documentation
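
Because apache-airflow is an optional dependency, code that touches it typically has to tolerate its absence. A minimal sketch of one common guard follows, using only the standard library. It is an assumption about how such a check might look, not baselinr's implementation, and the helper names has_module and require_airflow are hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_airflow() -> None:
    """Raise a clear error when the optional Airflow dependency is missing."""
    if not has_module("airflow"):
        raise ImportError(
            "Airflow support needs the optional dependency: "
            'pip install "apache-airflow>=2.0.0,<3.0.0"'
        )


# Evaluated once at import time; Airflow-specific code can branch on this flag.
AIRFLOW_AVAILABLE = has_module("airflow")
```

Checking with find_spec avoids importing Airflow just to learn whether it is installed, which keeps startup cheap for users who never use the integration.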