feat(ingest): unbundle airflow plugin emitter dependencies #7493

Merged 2 commits on Mar 7, 2023
3 changes: 3 additions & 0 deletions docs/how/updating-datahub.md
@@ -6,6 +6,9 @@ This file documents any backwards-incompatible changes in DataHub and assists pe

### Breaking Changes
- #7016 Add `add_database_name_to_urn` flag to the Oracle source, which ensures that dataset urns have the DB name as a prefix to prevent collisions (e.g. {database}.{schema}.{table}). ONLY breaking if you set this flag to true; otherwise behavior remains the same.
- The Airflow plugin no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub-airflow-plugin[datahub-kafka]` for Kafka support.
- The Airflow lineage backend no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub[airflow,datahub-kafka]` for Kafka support.
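For reference, quoted forms of the two install commands above are safer in practice, since some shells (notably zsh) otherwise try to glob the square brackets:

```shell
# Kafka support must now be requested explicitly via an extra.
pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'   # Airflow plugin
pip install 'acryl-datahub[airflow,datahub-kafka]'          # lineage backend
```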


### Potential Downtime

8 changes: 8 additions & 0 deletions docs/lineage/airflow.md
@@ -26,6 +26,12 @@ If you're using Airflow 1.x, use the Airflow lineage plugin with acryl-datahub-a
pip install acryl-datahub-airflow-plugin
```

:::note

The [DataHub Rest](../../metadata-ingestion/sink_docs/datahub.md#datahub-rest) emitter is included in the plugin package by default. To use [DataHub Kafka](../../metadata-ingestion/sink_docs/datahub.md#datahub-kafka), install via `pip install acryl-datahub-airflow-plugin[datahub-kafka]`.

:::

2. Disable lazy plugin loading in your airflow.cfg.
On MWAA you should add this config to your [Apache Airflow configuration options](https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-env-variables.html#configuring-2.0-airflow-override).

@@ -89,6 +95,8 @@ If you are looking to run Airflow and DataHub using docker locally, follow the g

```shell
pip install acryl-datahub[airflow]
# If you need the Kafka-based emitter/hook:
pip install acryl-datahub[airflow,datahub-kafka]
```

2. You must configure an Airflow hook for DataHub. We support both a DataHub REST hook and a Kafka-based hook, but you only need one.
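As a sketch, the REST variant of such a hook is typically backed by an Airflow connection registered from the CLI, roughly like this (the connection id, conn-type, and host below are illustrative assumptions, not taken from this PR; check the DataHub docs for the exact values your version expects):

```shell
# Illustrative only: register a connection for the DataHub REST hook.
# 'datahub_rest_default', 'datahub_rest', and the host URL are assumed names.
airflow connections add 'datahub_rest_default' \
    --conn-type 'datahub_rest' \
    --conn-host 'http://localhost:8080'
```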
1 change: 1 addition & 0 deletions metadata-ingestion-modules/airflow-plugin/setup.py
@@ -125,5 +125,6 @@ def get_long_description():
install_requires=list(base_requirements),
extras_require={
"dev": list(dev_requirements),
"datahub-kafka": f"acryl-datahub[datahub-kafka] == {package_metadata['__version__']}",
},
)
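The setup.py change above pins the new `datahub-kafka` extra to the same release as the core package. A minimal standalone sketch of that pattern (the version string is a placeholder for illustration):

```python
# Sketch of the extras_require pinning pattern from the diff above:
# the optional "datahub-kafka" extra resolves to the core acryl-datahub
# package, with its Kafka extra, at exactly the plugin's own version.
package_metadata = {"__version__": "0.10.0"}  # placeholder version

extras_require = {
    "datahub-kafka": f"acryl-datahub[datahub-kafka] == {package_metadata['__version__']}",
}

print(extras_require["datahub-kafka"])
```

Pinning with `==` keeps the plugin and the core emitter in lockstep, so a plugin upgrade cannot silently pull in a mismatched emitter.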
1 change: 0 additions & 1 deletion metadata-ingestion/setup.py
@@ -251,7 +251,6 @@ def get_long_description():
"airflow": {
"apache-airflow >= 2.0.2",
*rest_common,
*kafka_common,
},
"circuit-breaker": {
"gql>=3.3.0",