Airflow operator #215

Merged
merged 7 commits into from
Nov 10, 2023

@lprzychodzien (Collaborator) commented Nov 4, 2023

Explanation

Made the following changes:

  • Updated Airflow so each DAG has its own folder
  • Centralized Airflow DAG code under airflow_operator. Airflow-specific functions should go here.
  • Centralized common tasks in the common_dag_tasks file
  • Fixed FDA enforcement to pull the full zip file instead of daily data
  • Fixed purple book to modify the CSV outside of loading, for better maintainability and room for future expansion.
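
As an illustration of the last point, keeping the CSV cleanup separate from the load step might look like the minimal sketch below. It is not the code from this PR: the function names `transform_csv` and `load_rows`, the `header_marker` parameter, and the sample text are all hypothetical, and a plain list stands in for the database.

```python
import csv
import io


def transform_csv(raw_text, header_marker):
    """Drop any preamble lines before the header row and strip whitespace from cells.

    Hypothetical helper: the real cleanup rules for the purple book CSV may differ.
    """
    lines = raw_text.splitlines()
    # Find the first line that starts with the expected header text.
    start = next(i for i, line in enumerate(lines) if line.startswith(header_marker))
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    return [{key.strip(): value.strip() for key, value in row.items()} for row in reader]


def load_rows(rows, sink):
    """Stand-in for the database load: append rows to a list and return its size."""
    sink.extend(rows)
    return len(sink)


# Made-up sample input with a preamble line before the header.
raw = "Sample export (made-up data)\n\nname, code\n drug a , 001\n drug b , 002\n"

rows = transform_csv(raw, "name")
table = []
loaded = load_rows(rows, table)
```

Because the transform takes text and returns rows, it can be tested or extended without touching the load task.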

Not all DAGs were changed. The remaining DAGs to be updated are:

  • Dailymed Daily DAG
  • Dailymed RX Full
  • NADAC
  • RxNorm

Tests

Tested each touched DAG with the following results (✔️ means all tasks ran):

  • dailymed_pharm_class: ✔️
  • dailymed_rxnorm: ✔️
  • dailymed_zip_file_metadata: ✔️
  • fda_enforcement: ✔️
  • fda_excluded: ✔️
  • fda_ndc: ✔️
  • fda_unfinished: ✔️
  • orange_book: ✔️
  • purple_book: ✔️
  • rxterms: ✔️

@jrlegrand (Member) left a comment

Just a few minor changes that I think can be cleaned up with a future PR. Thank you for this work! We've been meaning to clean this stuff up for a long time.

@lprzychodzien - why are these DAG tasks here? Do we need to remove this file since you switched to the zip file method instead of the API method?

Or maybe just keep the load_json and load_df_to_pg tasks and get rid of the extract task?

"fda_enforcement",
con=engine,
schema="datasource",
if_exists="append",

Now that we load the full file every time, we probably want to change this to "replace". "append" was used originally because we were loading week by week.
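
The difference can be sketched with pandas `DataFrame.to_sql`, whose `if_exists` parameter accepts "append" and "replace". This is a minimal illustration, not the DAG's code: it uses an in-memory SQLite connection instead of the project's engine, omits the `schema` argument, and the column names and rows are made up.

```python
import sqlite3

import pandas as pd

# Made-up sample standing in for the full enforcement file.
full_file = pd.DataFrame(
    {
        "recall_number": ["D-0001-2023", "D-0002-2023"],
        "status": ["Ongoing", "Terminated"],
    }
)


def row_count_after_two_loads(if_exists):
    """Load the same full file twice and return the final row count."""
    con = sqlite3.connect(":memory:")
    try:
        for _ in range(2):
            full_file.to_sql("fda_enforcement", con=con, if_exists=if_exists, index=False)
        count = pd.read_sql_query("SELECT COUNT(*) AS n FROM fda_enforcement", con)
        return int(count["n"].iloc[0])
    finally:
        con.close()


# "append" keeps the rows from the first load, so a full reload duplicates them.
append_count = row_count_after_two_loads("append")
# "replace" drops and recreates the table, so only the latest load remains.
replace_count = row_count_after_two_loads("replace")
```

Note that "replace" drops and recreates the table on each run, so anything attached to the old table definition is lost along with the rows.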

@jrlegrand merged commit f1e0448 into coderxio:main Nov 10, 2023

@jrlegrand (Member) commented:

Fixes #214
Fixes #34
