[Refactor][l] Splitting main lib files #18
Conversation
…db + api) files + refactoring DAGs with new library file locations
Looks really good. Some minor stuff on the README you could fix in the next commit.
- ### Example 2: Local file to CKAN DataStore
+ ### Example 2: Local file to CKAN DataStore using the Datastore API
👏
We'll assume you have:

* a local CKAN set up and running at https://localhost:5000
* a dataset (for example, `my-dataset`) with a resource (for example, `my-resource`, with `my-res-id-123` as its `resource_id`)
For a really nice quickstart we would provide them with a script to do that - I think this was in the original README. But it is kind of a bonus.
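A minimal sketch of such a script, using CKAN's standard `package_create` and `resource_create` API actions. The site URL and the dataset/resource names come from the README above; the API key handling and the resource URL are assumptions:

```bash
# Sketch only: create the example dataset and resource via the CKAN API.
# Assumes a local CKAN at https://localhost:5000 and an admin API key in CKAN_API_KEY.
export CKAN_SITE_URL=https://localhost:5000

# Create the dataset
curl -X POST "$CKAN_SITE_URL/api/3/action/package_create" \
  -H "Authorization: $CKAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-dataset"}'

# Create a resource in it; the "id" field of the response is the resource_id
curl -X POST "$CKAN_SITE_URL/api/3/action/resource_create" \
  -H "Authorization: $CKAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"package_id": "my-dataset", "name": "my-resource", "url": "http://example.com/data.csv"}'
```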
3. Make sure you have these environment variables properly set up:

```bash
export LC_ALL=en_US.UTF-8
```
Why are these needed? Maybe add a small note.
Also indentation.
```bash
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```
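The likely answer, based on common Airflow-on-macOS setups (an assumption, not confirmed by this repo), could go straight into the README as comments:

```bash
# Force a UTF-8 locale so Python reads non-ASCII CSV/JSON data consistently
export LC_ALL=en_US.UTF-8
# macOS only: stop Objective-C fork-safety checks from crashing forked Airflow workers
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```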
4. Run the airflow webserver: `airflow webserver` |
This should have been happening as part of the Airflow setup(?) Or do we have to redo it because we reset the dags folder?
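If it does need re-running, a typical local session looks like this (a sketch assuming a default Airflow 1.x install, not taken from this README):

```bash
# Serve the Airflow UI (default port 8080)
airflow webserver
# In a second terminal: the scheduler, which actually picks up and runs the DAGs
airflow scheduler
```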
9. Check your CKAN instance and verify that the data has been loaded.

10. Trigger the DAG with
Why do I run this after seeing if the data is loaded?
* The DAG will convert your CSV file to a JSON file and then upload it. `json_output` specifies the path where you want to dump your JSON file.
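Triggering could look roughly like this with the Airflow 1.x CLI; the DAG id placeholder and the assumption that `json_output` and `resource_id` are passed via the run conf are mine, not confirmed by this PR:

```bash
# Hypothetical: substitute the DAG id defined in dags/api_ckan_load_single_node.py
airflow trigger_dag <your_dag_id> \
  --conf '{"resource_id": "my-res-id-123", "json_output": "/tmp/my-resource.json"}'
```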
9. Check your CKAN instance and verify that the data has been loaded. |
How do I do this? Perhaps a small curl example.
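Something like this, using the resource id from the setup steps (`limit` just keeps the output short):

```bash
# Fetch the first few DataStore rows for the resource to confirm the load worked
curl "https://localhost:5000/api/3/action/datastore_search?resource_id=my-res-id-123&limit=5"
```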
 logging.info('Loading CSV via API')
 try:
     with open('/Users/hannelita/Development/freelance/Datopian/aircan/dags/r3.json') as f:
         records = json.load(f)
-        return load_csv_via_api(
+        return load_resource_via_api(
             "0d1505b5-a8ef-4c3b-b19b-101b8e594d6e", records, get_config())
Why is there an explicit UUID here?
@@ -0,0 +1,43 @@
# Standard library imports
Nice and clean 👏
Tackling the Housekeeping task from #10:

* Split `load.py` into two separate files: `hybrid_load` and `api_load`
* Added `dags/api_ckan_load_single_node.py`, which contains a single node DAG that runs on Airflow
* Refactored `hybrid_load` to support a `resource_id`
* Renamed `load_csv_via_api` to `load_resource_via_api`