This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

[API Integration - AUDIO] IMSLP #363

Closed
6 tasks
annatuma opened this issue Apr 9, 2020 · 1 comment
Labels
✨ goal: improvement Improvement to an existing feature providers 🙅 status: discontinued Not suitable for work as repo is in maintenance

Comments

@annatuma
Contributor

annatuma commented Apr 9, 2020

Provider API Endpoint / Documentation

https://imslp.org/api.php

Provider description

This is a provider of sheet music and music recordings. There is CC-licensed content in both categories. For this ticket, we'd like to ingest the CC-licensed audio. In the future we may also want to ingest sheet music, but that is out of scope here.

Example file:
https://imslp.org/wiki/Dilatate_sunt_tribulationes_(Abbatini%2C_Antonio_Maria)

Ticket work required beyond this point

Licenses Provided

Provider API Technical info

Checklist to complete before beginning development

No development should be done on a Provider API Script until the following info is gathered:

  • Verify there is a way to retrieve the entire relevant portion of the provider's collection in a systematic way via their API.
  • Verify the API provides license info (license type and version; license URL provides both, and is preferred)
  • Verify the API provides stable direct links to individual works.
  • Verify the API provides a stable landing page URL to individual works.
  • Note other info the API provides, such as thumbnails, dimensions, attribution info (required if non-CC0 licenses will be kept), title, description, other metadata, tags, etc.
  • Attach example responses to API queries that have the relevant info.
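
The license item in the checklist above notes that a license URL carries both the license type and the version. As a rough illustration of why the URL is preferred, here is a minimal sketch that splits a Creative Commons license URL into those two parts. The function name parse_license_url is hypothetical and is not part of the catalog codebase; the sketch assumes the standard creativecommons.org path layouts (/licenses/<type>/<version>/ and /publicdomain/zero/<version>/) and returns (None, None) for anything else.

```python
from urllib.parse import urlparse


def parse_license_url(license_url):
    """Return (license_type, version) for a CC license URL.

    Hypothetical helper for illustration only. Returns (None, None)
    when the URL does not look like a creativecommons.org license URL.
    """
    parsed = urlparse(license_url)
    host = parsed.netloc.lower()
    if host not in ("creativecommons.org", "www.creativecommons.org"):
        return None, None

    # Drop empty segments produced by leading/trailing slashes.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 3:
        return None, None

    # Standard licenses: /licenses/<type>/<version>/
    if parts[0] == "licenses":
        return parts[1].lower(), parts[2]

    # CC0 dedication: /publicdomain/zero/<version>/
    if parts[0] == "publicdomain" and parts[1] == "zero":
        return "cc0", parts[2]

    return None, None
```

If the provider's API returns only a free-text license name rather than a URL, a mapping step would be needed instead; that case is not covered here.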

General Recommendations for implementation

  • The script should be in the src/cc_catalog_airflow/dags/provider_api_scripts/ directory.
  • The script should have a test suite in the same directory.
  • The script must use the ImageStore class (Import this from
    src/cc_catalog_airflow/dags/provider_api_scripts/common/storage/image.py).
  • The script should use the DelayedRequester class (Import this from
    src/cc_catalog_airflow/dags/provider_api_scripts/common/requester.py).
  • The script must not use anything from
    src/cc_catalog_airflow/dags/provider_api_scripts/modules/etlMods.py, since
    that module is deprecated.
  • If the provider API can be queried by 'upload date' or something similar,
    the script should take a --date parameter when run as a script, giving the
    date for which we should collect images. The form should be YYYY-MM-DD (so
    the script can be run via python my_favorite_provider.py --date 2018-01-01).
  • The script must provide a main function that takes the same parameters as
    the CLI. In our example from above, we'd then have a main function
    my_favorite_provider.main(date). Calling main should do the same thing as
    running the script from the CLI.
  • The script must conform to PEP8. Please use pycodestyle (available via
    pip install pycodestyle) to check for compliance.
  • The script should use small, testable functions.
  • The test suite for the script may break PEP8 rules regarding long lines where
    appropriate (e.g., long strings for testing).
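
The --date and main(date) recommendations above can be sketched roughly as follows. This is a minimal skeleton under stated assumptions, not the project's actual template: the helper names validate_date and parse_args are hypothetical, --date is made optional here so the skeleton runs without arguments, and the real collection logic (which would use the ImageStore and DelayedRequester classes named above) is left out so the block stays self-contained.

```python
import argparse
from datetime import datetime


def validate_date(date_string):
    """Return date_string unchanged if it is a valid YYYY-MM-DD date.

    Raises ValueError otherwise. The round trip through strftime rejects
    non-padded forms such as 2018-1-1, which strptime alone would accept.
    """
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    if parsed.strftime("%Y-%m-%d") != date_string:
        raise ValueError(f"date must be YYYY-MM-DD, got {date_string!r}")
    return date_string


def main(date=None):
    """Entry point shared by the CLI and by programmatic callers.

    Placeholder body: a real script would request and store records
    for the given date here. This sketch only reports what it would do.
    """
    if date is None:
        return "Would collect records with no date filter"
    validate_date(date)
    return f"Would collect records for {date}"


def parse_args(argv=None):
    """Parse CLI arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Provider API script sketch")
    parser.add_argument(
        "--date",
        type=validate_date,
        default=None,
        help="Date to collect records for, in the form YYYY-MM-DD",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(main(args.date))
```

Keeping argument parsing in its own function means both main and parse_args can be exercised directly from a test suite, which fits the recommendation to use small, testable functions.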

Examples of other Provider API Scripts

For example Provider API Scripts and accompanying test suites, please see

  • src/cc_catalog_airflow/dags/provider_api_scripts/flickr.py and
  • src/cc_catalog_airflow/dags/provider_api_scripts/test_flickr.py, or
  • src/cc_catalog_airflow/dags/provider_api_scripts/wikimedia_commons.py and
  • src/cc_catalog_airflow/dags/provider_api_scripts/test_wikimedia_commons.py.
@annatuma annatuma added this to Pending Review in CC Catalog Pipeline via automation Apr 9, 2020
@kgodey kgodey added this to Pending Review in Backlog Apr 9, 2020
@annatuma annatuma moved this from Pending Review to Blocked in CC Catalog Pipeline Apr 9, 2020
@annatuma annatuma removed this from Pending Review in Backlog Apr 9, 2020
@mathemancer
Contributor

I love IMSLP! However, note that almost all of the audio you'll find there will be MIDI 'performances' (just play the mp3 for the Example file). Given that, I'm not sure this should be highly prioritized.

@kgodey kgodey added 🚧 status: blocked Blocked & therefore, not ready for work ✨ goal: improvement Improvement to an existing feature 🧹 status: ticket work required Needs more details before it can be worked on and removed blocked labels Sep 22, 2020
@cc-open-source-bot cc-open-source-bot added the 🏷 status: label work required Needs proper labelling before it can be worked on label Dec 2, 2020
@kgodey kgodey added this to [TEMPORARY] Deprioritize in Active Sprint Dec 2, 2020
@kgodey kgodey removed this from [TEMPORARY] Deprioritize in Active Sprint Dec 2, 2020
@kgodey kgodey added this to Pending Review in Backlog Dec 2, 2020
@kgodey kgodey added 🙅 status: discontinued Not suitable for work as repo is in maintenance and removed 🏷 status: label work required Needs proper labelling before it can be worked on 🚧 status: blocked Blocked & therefore, not ready for work 🧹 status: ticket work required Needs more details before it can be worked on labels Dec 16, 2020
@kgodey kgodey closed this as completed Dec 16, 2020
@kgodey kgodey moved this from Pending Review to Done in Backlog Dec 16, 2020
@TimidRobot TimidRobot removed this from Blocked in CC Catalog Pipeline Jan 12, 2022