Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ #6204

Merged
merged 18 commits into from
Nov 11, 2022

Conversation

hsheth2
Copy link
Collaborator

@hsheth2 hsheth2 commented Oct 14, 2022

Closes #4809 and closes #6132 and fixes #6145.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added devops PR or Issue related to DataHub backend & deployment ingestion PR or Issue related to the ingestion of metadata labels Oct 14, 2022
@github-actions
Copy link

github-actions bot commented Oct 14, 2022

Unit Test Results (metadata ingestion)

       8 files  ±0         8 suites  ±0   1h 0m 8s ⏱️ -18s
   757 tests ±0     754 ✔️ ±0  3 💤 ±0  0 ±0 
1 516 runs  ±0  1 509 ✔️ ±0  7 💤 ±0  0 ±0 

Results for commit 6e91c86. ± Comparison against base commit ae2ea52.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Oct 14, 2022

Unit Test Results (build & test)

613 tests  ±0   609 ✔️ ±0   11m 41s ⏱️ -17s
151 suites ±0       4 💤 ±0 
151 files   ±0       0 ±0 

Results for commit 6e91c86. ± Comparison against base commit ae2ea52.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@koconder koconder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to make one minor change

@hsheth2 hsheth2 changed the title feat(ingest): loosen sqlalchemy dep feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ Oct 25, 2022
@hsheth2 hsheth2 changed the title feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ feat(ingest): loosen sqlalchemy dep & support airflow 2.3 Oct 25, 2022
@tommy-watts-depop
Copy link

Hey, Is there an ETA for this?

@hsheth2
Copy link
Collaborator Author

hsheth2 commented Oct 27, 2022

@tommy-watts-depop Can't commit to an ETA, but I'm hoping to wrap this up mid-next week

@Vimpy
Copy link

Vimpy commented Nov 1, 2022

Hey any ETA to merge?

update pytest-docker

feat(ingest): loosen sqlalchemy dep

ci matrix

update airflow stuff to match

more airflow fixes

always install the right sqlalchemy stubs

sqlalchemy compat

type fixes

fix ci

fix types in oracle

fix more stuff with lint

fix attrs typing

more airflow type annotations

refactor iolet preprocessing

more type fixes

even more type fixes

markupsafe compat

fix airflow tests

more airflow compat

tweak workflow

fix imports

skip dag bag load test

fix sqlalchemy mypy

final type fixes

ignore more mypy issues

add assert to handle mypy issue

fix athena test

modify airflow plugin
@hsheth2 hsheth2 marked this pull request as ready for review November 3, 2022 06:17
@KulykDmytro
Copy link
Contributor

in meanwhile airflow released 2.4.2 version

@Vimpy
Copy link

Vimpy commented Nov 9, 2022

Hey any ETA to merge this?

Copy link
Contributor

@koconder koconder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@koconder
Copy link
Contributor

@Vimpy needs another reviewer to approve the PR

@koconder
Copy link
Contributor

Lint tests failing:

ERROR: /home/runner/work/datahub/datahub/metadata-ingestion/src/datahub_provider/_plugin.py Imports are incorrectly sorted and/or formatted.
--- /home/runner/work/datahub/datahub/metadata-ingestion/src/datahub_provider/_plugin.py:before	2022-11-10 07:44:16.998670
+++ /home/runner/work/datahub/datahub/metadata-ingestion/src/datahub_provider/_plugin.py:after	2022-11-10 07:52:03.617106
ERROR: /home/runner/work/datahub/datahub/metadata-ingestion/src/datahub_provider/_lineage_core.py Imports are incorrectly sorted and/or formatted.

Also smoke tests failing, @hsheth2 can you clean up your commits?

@keerthiis
Copy link

keerthiis commented Nov 11, 2022

@hsheth2 - we see this issue with traitlets & pyspark & grpcio too :
acryl-datahub[redshift] 0.8.45 depends on traitlets<5.2.2; extra == "redshift"
The user requested (constraint file for Airflow 2.3.2) traitlets==5.2.2.post1

acryl-datahub[airflow,datahub-business-glossary,datahub-kafka,datahub-rest,druid,glue,kafka,s3,sagemaker] 0.8.41 depends on pyspark==3.0.3; extra == "s3"
The user requested (constraint) pyspark==3.2.1

 acryl-datahub[airflow,datahub-business-glossary,datahub-kafka,datahub-rest,druid,glue,kafka,sagemaker] 0.8.45 depends on grpcio==1.44.0; extra == "kafka"
The user requested (constraint) grpcio==1.46.3

@hsheth2 hsheth2 changed the title feat(ingest): loosen sqlalchemy dep & support airflow 2.3 feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ Nov 11, 2022
Copy link
Contributor

@treff7es treff7es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@hsheth2
Copy link
Collaborator Author

hsheth2 commented Nov 11, 2022

@keerthiis I'd recommend using the PythonVirtualenvOperator in your airflow setup - your root airflow image should only have acryl-datahub[airflow] or acryl-datahub-airflow-plugin (depending on if you're using the lineage backend or the plugin), and then the specific tasks would depend on acryl-datahub[<source-name>]

I'm planning on adding an example to the docs as well

@hsheth2 hsheth2 merged commit 3e907ab into datahub-project:master Nov 11, 2022
@hsheth2 hsheth2 deleted the loosen-airflow branch November 11, 2022 20:04
hsheth2 added a commit to hsheth2/datahub that referenced this pull request Nov 12, 2022
treff7es pushed a commit that referenced this pull request Nov 14, 2022
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops PR or Issue related to DataHub backend & deployment ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
7 participants