Add example DAG to load/export from Bigquery #265
Codecov Report
@@ Coverage Diff @@
## main #265 +/- ##
=======================================
Coverage 91.15% 91.15%
=======================================
Files 36 36
Lines 1526 1526
Branches 257 257
=======================================
Hits 1391 1391
Misses 109 109
Partials 26 26
Continue to review full report at Codecov.
@kaxil it looks great! I have a minor suggestion and a question, inline.
- And loads it into a Google Cloud Storage bucket in a CSV file

Pre-requisites:
- Create an Airflow Connection to connect to Bigquery Table. Example:
What are your thoughts on encouraging users to use Airflow default connections as opposed to creating a new one?
For production use cases we recommend running airflow db upgrade (not airflow db init; see the Airflow docs), which does not create the default connections.
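Since airflow db upgrade does not create the default connections, one way to satisfy the connection prerequisite is to define the connection through an environment variable. Airflow resolves a connection ID from a variable named AIRFLOW_CONN_ followed by the upper-cased ID, whose value is a connection URI. The sketch below builds such a variable for google_cloud_default (Airflow's default Google Cloud connection ID). Whether the example DAG uses that ID is an assumption, and the project ID my-gcp-project is hypothetical. The extra__google_cloud_platform__ key prefix follows the older Google provider docs; newer provider versions may expect different extra field names.

```python
# Sketch: define an Airflow connection via an environment variable instead of
# relying on the default connections that `airflow db init` creates.
# Assumptions: conn_id "google_cloud_default" is what the DAG uses, and
# "my-gcp-project" is a hypothetical project ID. The extra-field key uses the
# older "extra__google_cloud_platform__" prefix from the Google provider docs.
import os
from urllib.parse import urlencode


def airflow_conn_env_var(conn_id: str) -> str:
    """Airflow looks up connections from env vars named AIRFLOW_CONN_<CONN_ID>."""
    return "AIRFLOW_CONN_" + conn_id.upper()


def gcp_conn_uri(project_id: str) -> str:
    """Build a URI-format connection string for the Google Cloud connection type."""
    extras = {"extra__google_cloud_platform__project": project_id}
    return "google-cloud-platform://?" + urlencode(extras)


if __name__ == "__main__":
    name = airflow_conn_env_var("google_cloud_default")
    uri = gcp_conn_uri("my-gcp-project")
    # Setting it here only affects this process; in practice export it in the
    # environment of the scheduler and workers.
    os.environ[name] = uri
    print(f"export {name}='{uri}'")
```

Defining connections this way keeps them out of the metadata database, which fits a setup where only airflow db upgrade is run.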
- Finds the Top 5 movies based on Rating using pandas dataframe
- And loads it into a Google Cloud Storage bucket in a CSV file

Pre-requisites:
It may be worth adding to the pre-requisites that the user needs to install astro, using either astro[all] or astro[google].
Done on L9. I deliberately didn't add the library name, so we won't have to update it when we change the package name.
This Example DAG:
- Pulls a CSV file from GitHub and loads it into BigQuery.
- Extracts the data from BigQuery and loads it into an in-memory pandas DataFrame.
- Finds the top 5 movies by rating using the pandas DataFrame.
- Loads the result into a Google Cloud Storage bucket as a CSV file.

closes #150
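The pandas step in the middle of this DAG (finding the top 5 movies by rating) can be sketched as below. This is not the code from the PR: the column names title and rating and the sample rows are hypothetical stand-ins for the data extracted from BigQuery, and the upload to Google Cloud Storage is left out, with to_csv only showing the CSV text that would be written.

```python
# Sketch of the "top 5 movies by rating" step on an in-memory pandas DataFrame.
# Assumptions (hypothetical): columns "title" and "rating"; the sample rows
# stand in for the data extracted from BigQuery.
import pandas as pd


def top_five_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Return the 5 highest-rated rows, best first, with a fresh 0..4 index."""
    return df.nlargest(5, "rating").reset_index(drop=True)


if __name__ == "__main__":
    movies = pd.DataFrame(
        {
            "title": ["A", "B", "C", "D", "E", "F", "G"],
            "rating": [7.1, 9.3, 8.0, 6.5, 8.8, 9.0, 5.2],
        }
    )
    top = top_five_movies(movies)
    # CSV text as it would be written to the GCS bucket (upload not shown).
    print(top.to_csv(index=False))
```

DataFrame.nlargest avoids a full sort-then-head and, with its default keep="first", breaks ties by original row order.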