This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Remove `get_*_operator` functions, simplify commoncrawl logic
#301
Merged
AetherUnbound added labels on Dec 9, 2021:
✨ goal: improvement (Improvement to an existing user-facing feature)
💻 aspect: code (Concerns the software code in the repository)
🟩 priority: low (Low priority and doesn't need to be rushed)
zackkrida
reviewed
Dec 9, 2021
Comment on lines +151 to +153
# This was "--default" previously but a task within the DAG
# modified it on DAG parse time to be this value.
CC_INDEX_TEMPLATE,
Nice cleanup!
zackkrida
approved these changes
Dec 9, 2021
sarayourfriend
approved these changes
Dec 10, 2021
LGTM!
Here's the testing output:
#############################################################################
RUNNING "airflow dags list"
#############################################################################
dag_id                    | filepath                | owner          | paused
==========================+=========================+================+=======
airflow_log_cleanup       | maintenance/airflow_log | data-eng-admin | True
                          | _cleanup_workflow.py    |                |
brooklyn_museum_workflow  | providers/brooklyn_muse | data-eng-admin | True
                          | um_workflow.py          |                |
check_new_smithsonian_uni | providers/check_new_smi | data-eng-admin | True
t_codes_workflow          | thsonian_unit_codes_wor |                |
                          | kflow.py                |                |
cleveland_museum_workflow | providers/cleveland_mus | data-eng-admin | True
                          | eum_workflow.py         |                |
commoncrawl_etl_workflow  | commoncrawl/commoncrawl | data-eng-admin | True
                          | _etl.py                 |                |
europeana_ingestion_workf | providers/europeana_ing | data-eng-admin | True
low                       | estion_workflow.py      |                |
europeana_workflow        | providers/europeana_wor | data-eng-admin | True
                          | kflow.py                |                |
finnish_museums_workflow  | providers/finnish_museu | data-eng-admin | True
                          | ms_workflow.py          |                |
flickr_ingestion_workflow | providers/flickr_ingest | data-eng-admin | True
                          | ion_workflow.py         |                |
flickr_workflow           | providers/flickr_workfl | data-eng-admin | True
                          | ow.py                   |                |
freesound_workflow        | providers/freesound_wor | data-eng-admin | True
                          | kflow.py                |                |
image_expiration_workflow | database/image_expirati | data-eng-admin | True
                          | on_workflow.py          |                |
jamendo_workflow          | providers/jamendo_workf | data-eng-admin | True
                          | low.py                  |                |
metropolitan_museum_workf | providers/metropolitan_ | data-eng-admin | True
low                       | museum_workflow.py      |                |
museum_victoria_workflow  | providers/museum_victor | data-eng-admin | True
                          | ia_workflow.py          |                |
nypl_workflow             | providers/nypl_workflow | data-eng-admin | True
                          | .py                     |                |
oauth2_authorization      | oauth2/authorize_dag.py | data-eng-admin | True
oauth2_token_refresh      | oauth2/token_refresh_da | data-eng-admin | True
                          | g.py                    |                |
phylopic_workflow         | providers/phylopic_work | data-eng-admin | True
                          | flow.py                 |                |
rawpixel_workflow         | providers/rawpixel_work | data-eng-admin | True
                          | flow.py                 |                |
recreate_audio_popularity | database/recreate_audio | data-eng-admin | True
_calculation              | _popularity_calculation |                |
                          | .py                     |                |
recreate_image_popularity | database/recreate_image | data-eng-admin | True
_calculation              | _popularity_calculation |                |
                          | .py                     |                |
refresh_all_audio_popular | database/refresh_all_au | data-eng-admin | True
ity_data                  | dio_popularity_data.py  |                |
refresh_all_image_popular | database/refresh_all_im | data-eng-admin | True
ity_data                  | age_popularity_data.py  |                |
refresh_audio_view_data   | database/refresh_audio_ | data-eng-admin | True
                          | view_data.py            |                |
refresh_image_view_data   | database/refresh_image_ | data-eng-admin | True
                          | view_data.py            |                |
science_museum_workflow   | providers/science_museu | data-eng-admin | True
                          | m_workflow.py           |                |
smithsonian_workflow      | providers/smithsonian_w | data-eng-admin | True
                          | orkflow.py              |                |
staten_museum_workflow    | providers/statens_museu | data-eng-admin | True
                          | m_workflow.py           |                |
stocksnap_workflow        | providers/stocksnap_wor | data-eng-admin | True
                          | kflow.py                |                |
sync_commoncrawl_workflow | commoncrawl/sync_common | data-eng-admin | True
                          | crawl_workflow.py       |                |
tsv_to_postgres_loader    | database/loader_workflo | data-eng-admin | True
                          | w.py                    |                |
walters_workflow          | providers/walters_workf | data-eng-admin | True
                          | low.py                  |                |
wikimedia_commons_workflo | providers/wikimedia_wor | data-eng-admin | True
w                         | kflow.py                |                |
wikimedia_ingestion_workf | providers/wikimedia_ing | data-eng-admin | True
low                       | estion_workflow.py      |                |
wordpress_workflow        | providers/wordpress_wor | data-eng-admin | True
                          | kflow.py                |                |
I see two commoncrawl entries. Is that all that should be there?
That's correct!
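The check discussed here (confirming which commoncrawl DAGs show up in `airflow dags list`) could also be scripted. The following is a minimal sketch and not part of the PR: it assumes the wrapped table layout shown in the testing output, where a continuation line has an empty owner column, and the helper name `dag_ids_from_table` and the shortened `SAMPLE` text are hypothetical.

```python
def dag_ids_from_table(table_text: str) -> list[str]:
    """Rebuild full dag_ids from table output whose long cells wrap onto extra lines."""
    dag_ids: list[str] = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header, the ===+=== separator, and any line that is not a 4-column row.
        if len(cells) != 4 or cells[0] == "dag_id":
            continue
        dag_id_part, _filepath, owner, _paused = cells
        if owner:
            # A non-empty owner column marks the first line of a new row.
            dag_ids.append(dag_id_part)
        elif dag_ids:
            # Continuation line: append the wrapped remainder (possibly empty).
            dag_ids[-1] += dag_id_part
    return dag_ids


# Shortened, hand-written sample in the same layout as the testing output.
SAMPLE = "\n".join(
    [
        "dag_id                    | filepath                | owner          | paused",
        "==========================+=========================+================+=======",
        "commoncrawl_etl_workflow  | commoncrawl/commoncrawl | data-eng-admin | True",
        "                          | _etl.py                 |                |",
        "europeana_ingestion_workf | providers/europeana_ing | data-eng-admin | True",
        "low                       | estion_workflow.py      |                |",
        "sync_commoncrawl_workflow | commoncrawl/sync_common | data-eng-admin | True",
        "                          | crawl_workflow.py       |                |",
    ]
)

all_dag_ids = dag_ids_from_table(SAMPLE)
commoncrawl_dag_ids = [dag_id for dag_id in all_dag_ids if "commoncrawl" in dag_id]
print(commoncrawl_dag_ids)
```

Keying on the owner column rather than the dag_id column is a design choice: a wrapped row may have an empty dag_id cell on its continuation line, but the owner cell is only filled on the first line of each row in the output above.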
This was referenced Jul 12, 2022
Fixes
Fixes #242 by @AetherUnbound
Description
As described in #242, many of the `get_*_operator` functions are only used once and create an unnecessary level of indirection when trying to view a DAG definition. This PR attempts to remove all `get_*_operator` functions and replace them with operator definitions directly within the DAG definition.

The commoncrawl DAGs were the most complex to refactor, and I ended up creating a utility function to house some of the utilities that were defined in the `operators.py` file initially.

While attempting to verify this I also found that our `.airflowignore` file was a bit too permissive and was preventing the commoncrawl DAGs from being parsed. I corrected this behavior and the DAGs parsed just fine.

Testing Instructions
`just test` and `just run airflow dags list`, specifically checking for the presence of commoncrawl DAGs!

Checklist
Update index.md
).
main
) or a parent feature branch.
Developer Certificate of Origin
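As a rough illustration of the `.airflowignore` problem mentioned in the description: in the Airflow 2.x releases available when this PR was written, each line of `.airflowignore` is treated as a regular expression that is searched for in file paths, so a short pattern can match more than intended. The sketch below uses only Python's `re` module. The patterns, the paths, and the `is_ignored` helper are hypothetical and are not taken from the repository's actual `.airflowignore`.

```python
import re


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if any pattern is found anywhere in the path (re.search semantics)."""
    return any(re.search(pattern, path) for pattern in patterns)


# Hypothetical pattern meant to skip a shared "common" helper package.
# Because it is searched for anywhere in the path, it also hits "commoncrawl".
TOO_PERMISSIVE = ["common"]

# Hypothetical narrower pattern: only matches a path segment named exactly "common".
NARROWER = ["common/"]

# Hypothetical paths, relative to the DAG folder.
PATHS = [
    "common/storage/util.py",
    "commoncrawl/commoncrawl_etl.py",
    "providers/flickr_workflow.py",
]

for path in PATHS:
    print(path, is_ignored(path, TOO_PERMISSIVE), is_ignored(path, NARROWER))
```

With the broad pattern, the commoncrawl DAG file is skipped along with the helper package; with the narrower one, only the helper package is skipped.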