Add example DAG to load/export from Bigquery #265

Merged: 1 commit, Apr 4, 2022

@kaxil (Collaborator) commented Apr 1, 2022

This example DAG:

  • Pulls a CSV file from GitHub and loads it into BigQuery.
  • Extracts the data from BigQuery and loads it into an in-memory pandas DataFrame.
  • Finds the top 5 movies by rating using the pandas DataFrame.
  • Loads the result into a Google Cloud Storage bucket as a CSV file.

closes #150
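
For readers who want to see the middle steps in isolation, here is a minimal sketch of the "top 5 by rating" transformation and the CSV serialization, using plain pandas only. It is not the DAG added in this PR: the BigQuery load, the Airflow wiring, and the Google Cloud Storage upload are left out, and the sample data, the column names `title` and `rating`, and the helper names `top_five_movies` and `to_csv_text` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the CSV pulled from GitHub.
# The column names "title" and "rating" are assumptions, not the real schema.
SAMPLE_CSV = """title,rating
Movie A,7.1
Movie B,9.0
Movie C,8.2
Movie D,6.5
Movie E,8.8
Movie F,7.9
Movie G,9.3
"""


def top_five_movies(df: pd.DataFrame, rating_col: str = "rating") -> pd.DataFrame:
    """Return the five rows with the highest rating, best first."""
    return df.nlargest(5, rating_col).reset_index(drop=True)


def to_csv_text(df: pd.DataFrame) -> str:
    """Serialize the DataFrame to CSV text without the index column."""
    return df.to_csv(index=False)


# Load the sample into an in-memory DataFrame, pick the top 5, serialize.
movies = pd.read_csv(io.StringIO(SAMPLE_CSV))
top5 = top_five_movies(movies)
csv_payload = to_csv_text(top5)
print(csv_payload)
```

In the actual example DAG, the DataFrame would come from BigQuery and the CSV would be written to a bucket rather than printed; the sketch only shows the shape of the pandas step.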

codecov bot commented Apr 2, 2022

Codecov Report

Merging #265 (8faac98) into main (78dcfcd) will not change coverage.
The diff coverage is n/a.

❗ Current head 8faac98 differs from pull request most recent head caa57e1. Consider uploading reports for the commit caa57e1 to get more accurate results

@@           Coverage Diff           @@
##             main     #265   +/-   ##
=======================================
  Coverage   91.15%   91.15%           
=======================================
  Files          36       36           
  Lines        1526     1526           
  Branches      257      257           
=======================================
  Hits         1391     1391           
  Misses        109      109           
  Partials       26       26           

Last update d4b825f...caa57e1.

@tatiana (Collaborator) left a comment

@kaxil it looks great! I have a minor suggestion and a question, inline.

- And loads it into a Google Cloud Storage bucket in a CSV file

Pre-requisites:
- Create an Airflow Connection to connect to Bigquery Table. Example:
Collaborator

What are your thoughts on encouraging users to use Airflow default connections as opposed to creating a new one?

Collaborator Author

For production use cases, we recommend running airflow db upgrade (not airflow db init; ref: Airflow docs), which does not create default connections.

- Finds the Top 5 movies based on Rating using pandas dataframe
- And loads it into a Google Cloud Storage bucket in a CSV file

Pre-requisites:
Collaborator

It may be worth adding to the pre-requisites that the user needs to install astro using either astro[all] or astro[google].

Collaborator Author

Done on L9. I purposely didn't add the library name, so we don't have to rename it when we change the package name.

@kaxil merged commit 784bcb7 into main on Apr 4, 2022
@kaxil deleted the Add-BigQuery-example branch on April 4, 2022 14:05
Linked issue that this pull request may close: Add BigQuery example
2 participants