Create a multiple node DAG for GCP #30

Closed
8 tasks done
hannelita opened this issue Jun 22, 2020 · 4 comments

hannelita commented Jun 22, 2020

Using the same structure as the multiple-node DAG that assumes local files, create a DAG that handles resources in the cloud (CSV and JSON).

Acceptance

  • DAG with multiple nodes doing the entire pipeline on GCP.

Tasks

DAG tasks:

  • 1. Upload CSV from CKAN instance to bucket / read remote CSV
  • 2. delete_datastore
  • 3. create_datastore
  • 4. Create JSON file on bucket
  • 5. convert_csv_to_json
  • 6. Send converted JSON file to CKAN
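
As a rough illustration of step 5, a CSV-to-JSON conversion can be sketched with the Python standard library alone. The function body below is an assumption made for illustration; it is not the project's actual `convert_csv_to_json` task, and it leaves out the bucket and CKAN I/O entirely.

```python
import csv
import io
import json


def convert_csv_to_json(csv_text: str) -> str:
    """Turn CSV text with a header row into a JSON array of objects.

    Hypothetical sketch: the real task may read from and write to a bucket.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every value stays a string; type inference is out of scope here.
    return json.dumps(list(reader))


if __name__ == "__main__":
    sample = "id,name\n1,Alice\n2,Bob\n"
    print(convert_csv_to_json(sample))
```

In an Airflow DAG a function like this would typically be wrapped in a task and chained after `create_datastore`, but that wiring is not shown here.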

ckannext-aircan (connector) tasks:
- [x] 1. create endpoint to receive Airflow response after processing
- [x] 2. Handle Airflow response
- [ ] 3. If success, download processed json file from bucket

NOTE: Downloading from the bucket (task 3) is not the strategy. We will send the processed JSON via API.

[Image: Screen Shot 2020-07-13 at 07 33 35]

Analysis

After this long task is complete, we still need to:
- [ ] Handle errors (this will be in the next milestone)
- [ ] Handle absence of a response from Airflow (this will be in the next milestone)
- [ ] Delete remote file; create a separate DAG for that (this will be in the next milestone)

@hannelita

Before anything else, try to specify the remote file locations for the CSV and JSON.

@hannelita

Remote CSV fetching works

@hannelita

Problem with encoding when reading the remote file in the Airflow DAG
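
The thread does not say what the encoding problem was or how it was fixed, so the following is only a generic sketch of one common approach: decode the downloaded bytes by trying a short list of encodings in order. The function name and the encoding list are assumptions, not taken from the project.

```python
def decode_remote_bytes(raw: bytes, encodings=("utf-8-sig", "cp1252")) -> str:
    """Decode bytes fetched from a remote file, trying each encoding in turn.

    Hypothetical helper: the candidate encodings are an assumption.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort never fails.
    return raw.decode("latin-1")


if __name__ == "__main__":
    print(decode_remote_bytes("café,1\n".encode("cp1252")))
```

Using `utf-8-sig` first also strips a leading byte-order mark, which otherwise ends up glued to the first CSV column name.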

@hannelita

Resolved with #53

hannelita mentioned this issue Jul 15, 2020 (4 tasks)