This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

[API Integration - TEXT] Unglue.it #193

Closed
annatuma opened this issue Nov 15, 2019 · 0 comments
Labels
✨ goal: improvement Improvement to an existing feature providers 🙅 status: discontinued Not suitable for work as repo is in maintenance

Comments

annatuma commented Nov 15, 2019

This is a provider of texts, so it is currently blocked: the Catalog is not yet ready to ingest that content type.

Provider API Endpoint / Documentation

https://unglue.it/api/help

Internal users only: CC has an API key for this service, please check CC's password manager.

Provider description

A provider of openly licensed ebooks, some of which are available from Project Gutenberg.

Licenses Provided

They indicate that the works on their site are CC licensed or carry another open license. We'd need to restrict ingestion to CC licenses.
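
Restricting ingestion to CC licenses could be done by checking each record's license URL. This is a minimal sketch, not an agreed approach: the function name `is_cc_license_url` is hypothetical, and it assumes that only URLs under `creativecommons.org/licenses/` should count (whether CC0 and other public domain tools should also be accepted is an open question).

```python
from urllib.parse import urlparse


def is_cc_license_url(url):
    """Return True if the URL is under creativecommons.org/licenses/.

    Assumption: only CC license deeds qualify, so public domain tools
    such as CC0 (under /publicdomain/) are rejected here.
    """
    parsed = urlparse(url or "")
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return (
        host == "creativecommons.org"
        and parsed.path.lower().startswith("/licenses/")
    )
```

With a check like this, records under other open licenses would simply be skipped during ingestion.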

Provider API Technical info

There isn't a clear way for a frontend user to filter books on the site by license type.

The basic API documentation doesn't include license info at the top level:
https://unglue.it/api/v1/?format=json

However, they reference an ONIX structure, where rights information is returned in the Epub License field. A sample record carries a license name, an expression type code, and a license link:

CC BY-NC-ND
01
https://creativecommons.org/licenses/by-nc-nd/3.0/

For example:
https://unglue.it/api/onix/by-nc-nd/epub/?max=20

More work is needed to determine whether we can get all the information we need for ingestion.
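
As a starting point for that investigation, here is a rough sketch of how the license fields might be pulled out of an ONIX response. Several things are assumptions: the element names (`EpubLicense`, `EpubLicenseName`, `EpubLicenseExpressionType`, `EpubLicenseExpressionLink`) follow the ONIX 3.0 standard and have not been checked against the actual Unglue.it feed, the URL pattern is generalized from the single example above, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ONIX_BASE = "https://unglue.it/api/onix"


def build_onix_url(license_slug, fmt="epub", max_results=20):
    """Build a feed URL, assuming the pattern of the example above."""
    query = urlencode({"max": max_results})
    return f"{ONIX_BASE}/{license_slug}/{fmt}/?{query}"


def _local(tag):
    # Drop any '{namespace}' prefix so lookups work with or without one.
    return tag.rsplit("}", 1)[-1]


def extract_licenses(onix_xml):
    """Return (name, expression_type, link) for each license element."""
    root = ET.fromstring(onix_xml)
    licenses = []
    for elem in root.iter():
        if _local(elem.tag).lower() != "epublicense":
            continue
        # Flatten the license element's descendants into a name -> text map.
        fields = {
            _local(child.tag).lower(): (child.text or "").strip()
            for child in elem.iter()
        }
        licenses.append((
            fields.get("epublicensename"),
            fields.get("epublicenseexpressiontype"),
            fields.get("epublicenseexpressionlink"),
        ))
    return licenses
```

The response body would come from a request to `build_onix_url("by-nc-nd")`. Fetching is left out here so the parsing can be tested against saved sample responses.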

General Recommendations for implementation

  • The script should be in the src/cc_catalog_airflow/dags/provider_api_scripts/ directory.
  • The script should have a test suite in the same directory.
  • The script must use the ImageStore class (import this from
    src/cc_catalog_airflow/dags/provider_api_scripts/common/storage/image.py).
  • The script should use the DelayedRequester class (import this from
    src/cc_catalog_airflow/dags/provider_api_scripts/common/requester.py).
  • The script must not use anything from
    src/cc_catalog_airflow/dags/provider_api_scripts/modules/etlMods.py, since
    that module is deprecated.
  • If the provider API can be queried by 'upload date' or something similar,
    the script should take a --date parameter when run as a script, giving the
    date for which we should collect images. The form should be YYYY-MM-DD (so
    the script can be run via python my_favorite_provider.py --date 2018-01-01).
  • The script must provide a main function that takes the same parameters as
    the CLI. In our example from above, we'd then have a main function
    my_favorite_provider.main(date). Calling main should do the same thing as
    running the script from the CLI.
  • The script must conform to PEP8. Please use pycodestyle (available via
    pip install pycodestyle) to check for compliance.
  • The script should use small, testable functions.
  • The test suite for the script may break PEP8 rules regarding long lines where
    appropriate (e.g., long strings for testing).

Examples of other Provider API Scripts

For example Provider API Scripts and accompanying test suites, please see

  • src/cc_catalog_airflow/dags/provider_api_scripts/flickr.py and
  • src/cc_catalog_airflow/dags/provider_api_scripts/test_flickr.py, or
  • src/cc_catalog_airflow/dags/provider_api_scripts/wikimedia_commons.py and
  • src/cc_catalog_airflow/dags/provider_api_scripts/test_wikimedia_commons.py.
@annatuma annatuma created this issue from a note in CC Catalog Pipeline (Pending Review) Nov 15, 2019
@annatuma annatuma moved this from Pending Review to Blocked in CC Catalog Pipeline Feb 24, 2020
@annatuma annatuma changed the title https://unglue.it/ [API Integration - TEXT] Unglue.it Feb 24, 2020
@kgodey kgodey added 🚧 status: blocked Blocked & therefore, not ready for work ✨ goal: improvement Improvement to an existing feature and removed blocked labels Sep 22, 2020
@cc-open-source-bot cc-open-source-bot added the 🏷 status: label work required Needs proper labelling before it can be worked on label Dec 2, 2020
@kgodey kgodey added this to [TEMPORARY] Deprioritize in Active Sprint Dec 2, 2020
@kgodey kgodey removed this from [TEMPORARY] Deprioritize in Active Sprint Dec 2, 2020
@kgodey kgodey added this to Pending Review in Backlog Dec 2, 2020
@kgodey kgodey added 🙅 status: discontinued Not suitable for work as repo is in maintenance and removed 🏷 status: label work required Needs proper labelling before it can be worked on 🚧 status: blocked Blocked & therefore, not ready for work labels Dec 16, 2020
@kgodey kgodey closed this as completed Dec 16, 2020
@kgodey kgodey moved this from Pending Review to Done in Backlog Dec 16, 2020
@TimidRobot TimidRobot removed this from Blocked in CC Catalog Pipeline Jan 12, 2022