Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor][l] Splitting main lib files #18

Merged
merged 2 commits into from
Jun 16, 2020
Merged

Conversation

hannelita
Copy link
Contributor

@hannelita hannelita commented Jun 16, 2020

Tackling the Housekeeping task from #10:

  • Splitting previous load.py into two separate files: hybrid_load and api_load
  • Splitting DAG files. This PR focuses on dags/api_ckan_load_single_node.py, which contains a single node DAG that runs on airflow.
  • Cleans up unused code / imports/ comments
  • Gets rid of hard-coded json and any other pieces of that. Everything now is parameterized on DAG config.
  • Refactors hybrid_load to support a resource_id
  • Better method names (e.g. load_resource_via_api instead of load_csv_via_api
  • Docs

…db + api) files + refactroing DAGs with new library file locations
@hannelita hannelita added the WIP Work in progress label Jun 16, 2020
@hannelita hannelita removed the WIP Work in progress label Jun 16, 2020
Copy link
Member

@rufuspollock rufuspollock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. Some minor stuff on README you could fix in next commit.


### Example 2: Local file to CKAN DataStore

### Example 2: Local file to CKAN DataStore using the Datastore API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

We'll assume you have:

* a local CKAN setup and running at https://localhost:5000
* a dataset (for example, `my-dataset`) with a resource (for example, `my-resource` with the `my-res-id-123` as its `resource_id`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For really nice quickstart we would provide them with a script to do that - i think this was in original README. But it is kind of a bonus.


3. Make sure you have these environment variables properly set up:
```bash
export LC_ALL=en_US.UTF-8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these needed. Maybe a small note.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also indentation.

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```

4. Run the airflow webserver: `airflow webserver`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been happening as part of airflow setup (?) Or do we have to redo because we reset the dags folder?


9. Check your CKAN instance and verify that the data has been loaded.

10. Trigger the DAG with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do i run this after seeing if the data is loaded?

* The DAG will convert your CSV file to a JSON file and the upload it. `json_output` specifies the path where you want to dump your JSON file.


9. Check your CKAN instance and verify that the data has been loaded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do i do this? perhaps a small curl example.

logging.info('Loading CSV via API')
try:
with open('/Users/hannelita/Development/freelance/Datopian/aircan/dags/r3.json') as f:
records = json.load(f)
return load_csv_via_api(
return load_resource_via_api(
"0d1505b5-a8ef-4c3b-b19b-101b8e594d6e", records, get_config())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there an explicit uuid here.

@@ -0,0 +1,43 @@
# Standard library imports
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean 👏

@rufuspollock rufuspollock merged commit 47ad7aa into master Jun 16, 2020
@hannelita hannelita deleted the feat/lib-load-refactor branch June 19, 2020 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants