[API Integration] Example WordPress REST API Photo Directory #1705

Closed
3 of 7 tasks
zackkrida opened this issue Sep 24, 2021 · 2 comments · Fixed by WordPress/openverse-catalog#223
Labels
✨ goal: improvement Improvement to an existing user-facing feature

@zackkrida
Member

Provider API Endpoint / Documentation

This is a sample endpoint for a photo directory using custom endpoints with the WordPress REST API. Building an integration like this will help us work with future WordPress REST APIs and learn their concepts of pagination, query params, response shapes, and more.

Our example endpoint has a homepage at: https://photodir.zack.cat/ with documentation on the available endpoints.

Licenses Provided

For our purposes here, assume all images are CC0.
License URL: https://creativecommons.org/share-your-work/public-domain/cc0

Provider API Technical info

WordPress REST API Docs

Pay special attention to pagination, to ordering by date in descending order, and to requesting 100 results per page:

?orderby=date&order=desc&after=2016-10-13T17:00:00&per_page=100

Please note the example API does not implement these query params.

Single-result landing pages, which this example API also does not implement, are at /photos/photo/{slug}.
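
As a rough sketch of how the query string above could be assembled page by page: the endpoint URL below is an assumption (the example API may expose a different path), and `page` is the standard WordPress REST API pagination parameter, which the example API may not honor either.

```python
from urllib.parse import urlencode

# Assumed endpoint path; adjust to whatever the provider actually exposes.
PHOTOS_ENDPOINT = "https://photodir.zack.cat/api/wp-json/wp/v2/photos"


def build_query_params(after, page=1, per_page=100):
    """Return query params for one page of results, newest first."""
    return {
        "orderby": "date",
        "order": "desc",
        "after": after,  # ISO 8601 timestamp, e.g. 2016-10-13T17:00:00
        "per_page": per_page,
        "page": page,
    }


def build_page_url(after, page=1):
    """Return the full URL for a given page of photos."""
    return f"{PHOTOS_ENDPOINT}?{urlencode(build_query_params(after, page))}"
```

A script would typically increment `page` until the API returns an empty list or an error for a page past the end.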

Checklist to complete before beginning development

No development should be done on a Provider API Script until the following info is gathered:

  • Verify there is a way to retrieve the entire relevant portion of the provider's collection in a systematic way via their API.
  • Verify the API provides license info (license type and version; license URL provides both, and is preferred)
  • Verify the API provides stable direct links to individual works.
  • Verify the API provides a stable landing page URL to individual works.
  • Note other info the API provides, such as thumbnails, dimensions, attribution info (required if non-CC0 licenses will be kept), title, description, other meta data, tags, etc.
  • Attach example responses to API queries that have the relevant info.

General Recommendations for implementation

  • The script should be in the openverse_catalog/dags/provider_api_scripts/ directory.
  • The script should have a test suite in the same directory.
  • The script must use the ImageStore class (Import this from
    openverse_catalog/dags/provider_api_scripts/common/storage/image.py).
  • The script should use the DelayedRequester class (Import this from
    openverse_catalog/dags/provider_api_scripts/common/requester.py).
  • The script must not use anything from
    openverse_catalog/dags/provider_api_scripts/modules/etlMods.py, since
    that module is deprecated.
  • If the provider API can be queried by 'upload date' or something similar,
    the script should take a --date parameter when run as a script, giving the
    date for which we should collect images. The form should be YYYY-MM-DD (so,
    the script can be run via python my_favorite_provider.py --date 2018-01-01).
  • The script must provide a main function that takes the same parameters as
    the CLI. In our example from above, we'd then have a main function
    my_favorite_provider.main(date). Calling main should do the same thing as
    running the script from the CLI.
  • The script must conform to PEP8. Please use pycodestyle (available via
    pip install pycodestyle) to check for compliance.
  • The script should use small, testable functions.
  • The test suite for the script may break PEP8 rules regarding long lines where
    appropriate (e.g., long strings for testing).
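
To illustrate the --date and main(date) recommendations, here is a minimal sketch of the entry points only. The helper names parse_date, parse_args and run_from_cli are hypothetical, and the actual fetching and storing (DelayedRequester, ImageStore) is left out, so main just returns the date it was given.

```python
import argparse
from datetime import datetime


def parse_date(value):
    """Validate a YYYY-MM-DD string and return it unchanged."""
    datetime.strptime(value, "%Y-%m-%d")  # raises ValueError if malformed
    return value


def parse_args(argv=None):
    """Parse CLI arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Hypothetical provider script")
    parser.add_argument(
        "--date",
        type=parse_date,
        default=None,
        help="Collect images uploaded on this date (YYYY-MM-DD)",
    )
    return parser.parse_args(argv)


def main(date=None):
    """Entry point shared by the CLI and by direct callers."""
    # A real script would request and store the records for `date` here.
    # This sketch only returns the date so it can run without network access.
    return date


def run_from_cli(argv=None):
    """Do exactly what main(date) does, with the date taken from the CLI."""
    return main(parse_args(argv).date)

# In a real script, run_from_cli() would be called under the usual
# `if __name__ == "__main__":` guard.
```

Keeping argument parsing separate from main means a scheduler can call main("2018-01-01") directly, and tests can exercise both paths.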

Examples of other Provider API Scripts

For example Provider API Scripts and accompanying test suites, please see

  • openverse_catalog/dags/provider_api_scripts/flickr.py and
  • openverse_catalog/dags/provider_api_scripts/test_flickr.py, or
  • openverse_catalog/dags/provider_api_scripts/wikimedia_commons.py and
  • openverse_catalog/dags/provider_api_scripts/test_wikimedia_commons.py.

Implementation

  • 🙋 I would be interested in implementing this feature.
@zackkrida added the 🧹 status: ticket work required, 🚦 status: awaiting triage, and ✨ goal: improvement labels, and removed the 🧹 status: ticket work required and 🚦 status: awaiting triage labels, Sep 24, 2021
@krysal krysal self-assigned this Sep 28, 2021
@krysal
Member

krysal commented Oct 4, 2021

After exploring the responses of the Example API I have some questions I would like to clarify.

First, this is an example response of the /wp-json/wp/v2/photos endpoint:

List of photos
[
  {
    "id": 43,
    "date": "2021-06-08T07:37:45",
    "date_gmt": "2021-06-08T07:37:45",
    "guid": {
      "rendered": "https://photodir.zack.cat/?post_type=photo&p=43"
    },
    "modified": "2021-06-08T07:37:45",
    "modified_gmt": "2021-06-08T07:37:45",
    "slug": "56560bf1d6",
    "status": "publish",
    "type": "photo",
    "link": "https://photodir.zack.cat/photo/56560bf1d6/",
    "title": {
      "rendered": "56560bf1d6"
    },
    "content": {
      "rendered": "<p>Lupinus polyphyllus (aka Washington lupine)</p>\n",
      "protected": false
    },
    "author": 3606,
    "featured_media": 44,
    "template": "",
    "meta": {
      "spay_email": ""
    },
    "photo-categories": [
      8
    ],
    "photo-colors": [
      15,
      36
    ],
    "photo-orientations": [
      24
    ],
    "photo-tags": [
      35,
      28
    ],
    "_links": {
      "self": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photos/43"
        }
      ],
      "collection": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photos"
        }
      ],
      "about": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/types/photo"
        }
      ],
      "author": [
        {
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/users/3606"
        }
      ],
      "version-history": [
        {
          "count": 1,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photos/43/revisions"
        }
      ],
      "predecessor-version": [
        {
          "id": 45,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photos/43/revisions/45"
        }
      ],
      "wp:featuredmedia": [
        {
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/media/44"
        }
      ],
      "wp:attachment": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/media?parent=43"
        }
      ],
      "wp:term": [
        {
          "taxonomy": "photo_category",
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photo-categories?post=43"
        },
        {
          "taxonomy": "photo_color",
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photo-colors?post=43"
        },
        {
          "taxonomy": "photo_orientation",
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photo-orientations?post=43"
        },
        {
          "taxonomy": "photo_tag",
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/photo-tags?post=43"
        }
      ],
      "curies": [
        {
          "name": "wp",
          "href": "https://api.fake.com/{rel}",
          "templated": true
        }
      ]
    }
  },
...
]

and this is an example response of the /wp-json/wp/v2/media endpoint:

Image details
[
  {
    "id": 44,
    "date": "2021-06-08T07:34:17",
    "date_gmt": "2021-06-08T07:34:17",
    "guid": {
      "rendered": "https://storage.googleapis.com/fake.zack.cat/2021/05/56560bf1d69971f38.94814132.jpg"
    },
    "modified": "2021-09-23T21:56:53",
    "modified_gmt": "2021-09-23T21:56:53",
    "slug": "washington-lupine",
    "status": "inherit",
    "type": "attachment",
    "link": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132.jpg",
    "title": {
      "rendered": "washington-lupine"
    },
    "author": 3606,
    "comment_status": "open",
    "ping_status": "closed",
    "template": "",
    "meta": {
      "spay_email": ""
    },
    "description": {
      "rendered": "<p class=\"attachment\"><a href='https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132.jpg'><img width=\"225\" height=\"300\" src=\"https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-225x300.jpg\" class=\"attachment-medium size-medium\" alt=\"\" loading=\"lazy\" srcset=\"https://storage.googleapis.com/fake.zack.cat/2021/05/56560bf1d69971f38.94814132-225x300.jpg 225w, https://storage.googleapis.com/fake.zack.cat/2021/05/56560bf1d69971f38.94814132-768x1024.jpg 768w, https://storage.googleapis.com/fake.zack.cat/2021/05/56560bf1d69971f38.94814132-1152x1536.jpg 1152w, https://storage.googleapis.com/fake.zack.cat/2021/05/56560bf1d69971f38.94814132-1536x2048.jpg 1536w\" sizes=\"(max-width: 225px) 100vw, 225px\" /></a></p>\n<p>Lupinus polyphyllus (aka Washington lupine)</p>\n"
    },
    "caption": {
      "rendered": "<p>Lupinus polyphyllus (aka Washington lupine)</p>\n"
    },
    "alt_text": "",
    "media_type": "image",
    "mime_type": "image/jpeg",
    "media_details": {
      "width": 3024,
      "height": 4032,
      "file": "2021/05/56560bf1d69971f38.94814132.jpg",
      "sizes": {
        "medium": {
          "file": "56560bf1d69971f38.94814132-225x300.jpg",
          "width": 225,
          "height": 300,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-225x300.jpg"
        },
        "large": {
          "file": "56560bf1d69971f38.94814132-768x1024.jpg",
          "width": 768,
          "height": 1024,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-768x1024.jpg"
        },
        "thumbnail": {
          "file": "56560bf1d69971f38.94814132-150x150.jpg",
          "width": 150,
          "height": 150,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-150x150.jpg"
        },
        "medium_large": {
          "file": "56560bf1d69971f38.94814132-768x1024.jpg",
          "width": 768,
          "height": 1024,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-768x1024.jpg"
        },
        "1536x1536": {
          "file": "56560bf1d69971f38.94814132-1152x1536.jpg",
          "width": 1152,
          "height": 1536,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-1152x1536.jpg"
        },
        "2048x2048": {
          "file": "56560bf1d69971f38.94814132-1536x2048.jpg",
          "width": 1536,
          "height": 2048,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132-1536x2048.jpg"
        },
        "full": {
          "file": "56560bf1d69971f38.94814132.jpg",
          "width": 3024,
          "height": 4032,
          "mime_type": "image/jpeg",
          "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132.jpg"
        }
      },
      "image_meta": {
        "aperture": "1.8",
        "credit": "Zack Krida",
        "camera": "Pixel 2 XL",
        "caption": "",
        "created_timestamp": "1591460495",
        "copyright": "",
        "focal_length": "4.459",
        "iso": "45",
        "shutter_speed": "0.00171",
        "title": "",
        "orientation": "1",
        "keywords": []
      },
      "gcs_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132.jpg",
      "gcs_name": "2021/05/56560bf1d69971f38.94814132.jpg",
      "gcs_bucket": "fake.zack.cat"
    },
    "post": 43,
    "source_url": "https://photodir.zack.cat/cdn/2021/05/56560bf1d69971f38.94814132.jpg",
    "_links": {
      "self": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/media/44"
        }
      ],
      "collection": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/media"
        }
      ],
      "about": [
        {
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/types/attachment"
        }
      ],
      "author": [
        {
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/users/3606"
        }
      ],
      "replies": [
        {
          "embeddable": true,
          "href": "https://photodir.zack.cat/api/wp-json/wp/v2/comments?post=44"
        }
      ]
    }
  },
  ...
]

We can't get all of the image data from the first endpoint alone, so we would need to make several requests per image:

  1. One request for most of the image details: title, url, height, width, filetype, thumbnail, meta_data...
    • The first endpoint returns a title, but it is the same text as the slug and not descriptive, so the field from the details endpoint seems to be the correct one to use.
  2. One request for author information, as the previous endpoints seem to return only links or references, so I guess there should be an extra endpoint for this. Perhaps /wp-json/wp/v2/users/<user_id>?
  3. N requests for tags and categories, as the first endpoint returns lists of ids, not the descriptive text.
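
For step 1, a sketch of pulling the image details out of one item of the media response shown above. The function name and the output keys are hypothetical, not the names ImageStore expects. The input keys are the ones visible in the example response.

```python
def extract_image_details(media):
    """Pick the image fields out of one item from the media endpoint."""
    details = media.get("media_details", {})
    sizes = details.get("sizes", {})
    mime_type = media.get("mime_type", "")
    return {
        "image_url": media.get("source_url"),
        "width": details.get("width"),
        "height": details.get("height"),
        # "image/jpeg" -> "jpeg"; None when the mime type is missing
        "filetype": mime_type.split("/")[-1] or None,
        "thumbnail_url": sizes.get("thumbnail", {}).get("source_url"),
    }
```

Using .get with defaults lets a record with missing media_details come back with None values instead of raising, so the caller can decide whether to skip it.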

So some questions are:

  • What do we use for the foreign_identifier field? There are both id and slug. The slug is used for the foreign_landing_url, but it seems there could also be a shorter link built from the id, returned in the _links.self[0].href field in both endpoints, which leads to the second question.
  • There seem to be two more links that can lead to the same place:
    • From the 1st endpoint: /api/wp-json/wp/v2/photos/<photo_id>
    • From the 2nd endpoint: /api/wp-json/wp/v2/media/<photo_id>
      Just to confirm: do we still want to use the /photos/photo/{slug} form for the foreign_landing_url?
  • Do we want to include colors as tags or meta_data?

@zackkrida
Member Author

Great questions!

  • The foreign_identifier should be the slug from the photos endpoint

  • The foreign_landing_url should be the link from the photos endpoint

  • colors should be meta_data, I think; we won't show these to the user quite yet

  • The title should actually be the content.rendered on the photos endpoint, with the HTML tags stripped. FYI this is the same as caption.rendered on the media endpoint.

  • For authors, there's a /wp-json/wp/v2/users endpoint that lists all users, with the same pagination as the other endpoints, so we could either hit that endpoint and build a list of all authors, or make multiple queries per-photo. The first seems better to me performance-wise.

  • tags and categories would work the same way.
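
A sketch of how the answers above might fit together for one item from the photos endpoint. It assumes the users, tags and colors endpoints return items with id and name keys, as core WordPress endpoints do (not confirmed for this API). The function names and the creator key are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(rendered):
    """Return the text of an HTML fragment with tags removed."""
    parser = _TextExtractor()
    parser.feed(rendered)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


def build_lookup(items):
    """Map id -> name for a full listing of users, tags, or colors."""
    return {item["id"]: item["name"] for item in items}


def build_record(photo, users, tags, colors):
    """Combine one photo item with the pre-built id -> name lookups."""
    return {
        "foreign_identifier": photo["slug"],
        "foreign_landing_url": photo["link"],
        "title": strip_html(photo["content"]["rendered"]),
        "creator": users.get(photo["author"]),
        "tags": [tags[i] for i in photo.get("photo-tags", []) if i in tags],
        "meta_data": {
            "colors": [colors[i] for i in photo.get("photo-colors", []) if i in colors]
        },
    }
```

Building each lookup once from the paginated list endpoints trades a few up-front requests for no per-photo author or tag requests, which is the performance argument made above. Categories could be resolved the same way.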

I'm going to try to get you access to the real API today @krysal, so that might make finishing this a bit easier 😄

@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023