Re-organizing the docs + Google Cloud Composer #38

Merged
hannelita merged 1 commit into master from docs/readme-improvement
Jun 26, 2020

Conversation

hannelita
Contributor

Fixes #37

@hannelita hannelita requested a review from mbeilin June 24, 2020 14:04
@hannelita
Contributor Author

@mbeilin Now there are instructions for GCP 😊 Can you review + check if they are more or less accurate?

@hannelita hannelita mentioned this pull request Jun 24, 2020
5 tasks

`ckan_api_load_multiple_steps` performs the same steps as `api_ckan_load_single_node`, but it uses multiple nodes (tasks). You can repeat the steps of the previous section, running `ckan_api_load_multiple_steps` instead.


Contributor

@mbeilin mbeilin Jun 25, 2020

I get the following error at runtime:

 airflow trigger_dag ckan_api_load_multiple_steps  --conf='{ "resource_id": "be9f63d3-40c1-4261-acb1-232bf675da64", "schema_fields_array": [ 'id', 'full_text', 'FID' ], "csv_input": "/home/michael/airflow/example.csv", "json_output": "/home/michael/airflow/my2.json" }'
[2020-06-25 13:27:05,058] {__init__.py:51} INFO - Using executor SequentialExecutor
[2020-06-25 13:27:05,058] {dagbag.py:396} INFO - Filling up the DagBag from /home/michael/PycharmProjects/aircan_project/aircan/aircan/dags/api_ckan_load_multiple_nodes.py
Traceback (most recent call last):
  File "/home/michael/.local/bin/airflow", line 37, in <module>
    args.func(args)
  File "/home/michael/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 75, in wrapper
    return f(*args, **kwargs)
  File "/home/michael/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 237, in trigger_dag
    execution_date=args.exec_date)
  File "/home/michael/.local/lib/python3.6/site-packages/airflow/api/client/local_client.py", line 34, in trigger_dag
    execution_date=execution_date)
  File "/home/michael/.local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
    replace_microseconds=replace_microseconds,
  File "/home/michael/.local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 86, in _trigger_dag
    run_conf = json.loads(conf)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 83 (char 82)

Contributor Author

Look at your "schema_fields_array": the field names must be in double quotes, since `--conf` is parsed as JSON and JSON strings cannot use single quotes. For example: "schema_fields_array": [ "FID", "Mkt-RF", "SMB", "HML", "RF" ]
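To make the quoting issue concrete, here is a minimal sketch that feeds the `--conf` payload from the failing command to `json.loads`, which is the call that raises in the traceback. It assumes the parser received the single quotes literally; if the shell stripped them instead, leaving a bare `id`, parsing fails at the same position. The helper `try_parse` is hypothetical and exists only for this illustration.

```python
import json

# The --conf payload from the failing command, assuming the single quotes
# around the field names reached the JSON parser unchanged.
bad_conf = (
    '{ "resource_id": "be9f63d3-40c1-4261-acb1-232bf675da64", '
    '"schema_fields_array": [ \'id\', \'full_text\', \'FID\' ], '
    '"csv_input": "/home/michael/airflow/example.csv", '
    '"json_output": "/home/michael/airflow/my2.json" }'
)

# The same payload with the field names in double quotes.
good_conf = bad_conf.replace("'", '"')


def try_parse(conf):
    """Hypothetical helper: return (parsed, None) or (None, error)."""
    try:
        return json.loads(conf), None
    except json.JSONDecodeError as err:
        return None, err


_, error = try_parse(bad_conf)
print(error)  # Expecting value: line 1 column 83 (char 82)

parsed, _ = try_parse(good_conf)
print(parsed["schema_fields_array"])  # ['id', 'full_text', 'FID']
```

Character 82 is the first quote after `[ `, which matches the position reported in the traceback.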

`ckan_api_load_multiple_steps` performs the same steps as `api_ckan_load_single_node`, but it uses multiple nodes (tasks). You can repeat the steps of the previous section, running `ckan_api_load_multiple_steps` instead.



Contributor

@mbeilin mbeilin Jun 25, 2020

On the other hand, triggering the DAG without passing parameters on the CLI:
airflow trigger_dag ckan_api_load_multiple_steps
and setting the DAG parameters directly in `args` in the code seems to work:

from airflow.utils.dates import days_ago

# Default arguments with the parameters hard-coded instead of passed via --conf.
args = {
    'start_date': days_ago(0),
    'params': {
        "resource_id": "be9f63d3-40c1-4261-acb1-232bf675da64",
        "schema_fields_array": "[ 'id', 'full_text', 'FID' ]",
        "csv_input": "/home/michael/airflow/example.csv",
        "json_output": "/home/michael/airflow/my.json"
    }
}

Contributor Author

The idea is to be able to pass the params at trigger time. Imagine someone just grabbing our DAGs: we want something transparent, where users do not need to modify the code.
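One way to keep trigger-time parameters free of hand-written quoting mistakes is to generate the `--conf` argument rather than type it. The sketch below is only an illustration: `build_trigger_command` is a hypothetical helper, not part of this project or of Airflow. It relies on `json.dumps` always emitting double-quoted strings and on `shlex.quote` to protect the payload from the shell.

```python
import json
import shlex


def build_trigger_command(dag_id, conf):
    """Hypothetical helper: build an `airflow trigger_dag` command line."""
    # json.dumps always writes strings with double quotes, so the payload
    # is valid JSON regardless of how the dict was written in Python.
    payload = json.dumps(conf)
    # shlex.quote wraps the payload so the shell passes it through unchanged.
    return "airflow trigger_dag {} --conf={}".format(dag_id, shlex.quote(payload))


conf = {
    "resource_id": "be9f63d3-40c1-4261-acb1-232bf675da64",
    "schema_fields_array": ["id", "full_text", "FID"],
    "csv_input": "/home/michael/airflow/example.csv",
    "json_output": "/home/michael/airflow/my2.json",
}

command = build_trigger_command("ckan_api_load_multiple_steps", conf)
print(command)
```

The printed command can be pasted into a shell as-is. The DAG code stays untouched, which is the goal described above.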

Contributor

Sure, that was just to get things working. The double quotes do make the difference 💯

@mbeilin
Contributor

mbeilin commented Jun 25, 2020

LGTM

@hannelita hannelita merged commit 4dd9ef1 into master Jun 26, 2020
@hannelita hannelita deleted the docs/readme-improvement branch June 26, 2020 00:57
Successfully merging this pull request may close these issues.

Update and organize the README after recent changes
2 participants