Auckland Museum #1771

Closed
obulat opened this issue Apr 21, 2021 · 4 comments · Fixed by #3258
Assignees
Labels
help wanted Open to participation from the community 🟩 priority: low Low priority and doesn't need to be rushed ☁️ provider: images Image provider 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧹 status: ticket work required Needs more details before it can be worked on
Projects

Comments

@obulat
Contributor

obulat commented Apr 21, 2021

Source Site

https://www.aucklandmuseum.com/discover/collections/our-data

Provider API Endpoint / Documentation

Tutorial: https://github.com/AucklandMuseum/API/wiki/Tutorial#fields-available-for-query-string-searches
Endpoint: https://api.aucklandmuseum.com/

Provider description

The Auckland Museum displays a wide variety of historical artifacts and collections, and also provides education and research resources.

Provider API Technical info

Rate limit: 10 requests per second, 1000 requests per day.


This issue has been migrated from the CC Search Catalog repository

Author: akshgpt7
Date: Mon Feb 24 2020
Labels: Hacktoberfest,help wanted,providers,✨ goal: improvement,🙅 status: discontinued

Original Comments:

akshgpt7 commented on Tue Feb 25 2020:

@annatuma, @mathemancer I found out that the Auckland Museum's online collection has CC licensed images, which can be identified by the copyright field in the response JSON.

mathemancer commented on Tue Feb 25 2020:

@akshgpt7 That's great, thank you! Further info to gather would be:

  • What is the overall volume of objects to be queried?
  • Is it possible to catalog all CC-licensed items via API calls in some reasonable way? (I.e., there must be some way to systematically loop through all objects, gathering their metadata)

akshgpt7 commented on Fri Feb 28 2020:

@mathemancer

  • The current volume of objects with CC in the copyright field is a bit over 100k.
  • To catalog the CC-licensed items, the following endpoint can be used: https://api.aucklandmuseum.com/search/collectionsonline/_search?q=copyright:CC&has_image=true. Each object has a primaryRepresentation field that contains the image URL, so I believe the items can be extracted this way.
    Most of the images appear to be licensed under CC BY 4.0: http://creativecommons.org/licenses/by/4.0/ (needs confirmation).
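
A minimal sketch of the query described above, for illustration only. It assumes the response follows an Elasticsearch-style layout (`hits.hits[]._source`), which this thread does not confirm; the helper names `build_search_url` and `extract_image_urls` are hypothetical. Only the endpoint, the `q` and `has_image` parameters, and the `primaryRepresentation` field come from the comment above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.aucklandmuseum.com/search/collectionsonline/_search"


def build_search_url(query="copyright:CC", has_image=True):
    """Build the search URL for CC-licensed records that have an image."""
    params = {"q": query, "has_image": str(has_image).lower()}
    # Keep ":" unescaped so the URL matches the form quoted in the thread.
    return f"{SEARCH_ENDPOINT}?{urlencode(params, safe=':')}"


def extract_image_urls(response_json):
    """Collect primaryRepresentation values from a parsed response.

    ASSUMPTION: records live under hits.hits[]._source, as in a typical
    Elasticsearch response. Adjust if the real payload differs.
    """
    urls = []
    for hit in response_json.get("hits", {}).get("hits", []):
        image_url = hit.get("_source", {}).get("primaryRepresentation")
        if image_url:
            urls.append(image_url)
    return urls
```

Records without a `primaryRepresentation` value are skipped rather than raising, since it is not yet known whether every matching object carries one.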

mathemancer commented on Mon Mar 02 2020:

This is great info @akshgpt7 , thank you!

annatuma commented on Thu Mar 05 2020:

@akshgpt7 you are welcome to tackle this integration if you're interested in doing so. Let us know; for now, this ticket is assigned to you.

akshgpt7 commented on Fri May 29 2020:

@mathemancer I have been working on this script.
I have one question moving forward. There is a rate limit of 1000 requests per day. How should I make sure the script does not exceed that in a day?

I was thinking of recording the date at the start of the script and maintaining a request_count. On each request, the script would check the date again, then stop if request_count hits 1000 on the same day, or reset the count if the day rolls over before 1000 requests have been made.

However, I'm not sure if this is the right way to handle the 1000 requests/day limit. Moreover, how do we make sure to resume the next day from the page where we left off on the previous day?
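
The counting idea above could be sketched roughly as follows. This is only an illustration of the proposal, not code from the catalog: `DailyRequestBudget`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and storing the next page in a small JSON file is an assumed way to resume on the following day.

```python
import datetime
import json
from pathlib import Path


class DailyRequestBudget:
    """Count requests per calendar day and refuse once the limit is hit."""

    def __init__(self, limit=1000, today=datetime.date.today):
        self.limit = limit
        self._today = today  # injectable so the rollover can be tested
        self.day = today()
        self.request_count = 0

    def try_acquire(self):
        """Return True if one more request may be sent today."""
        current = self._today()
        if current != self.day:
            # The day rolled over: start a fresh budget.
            self.day = current
            self.request_count = 0
        if self.request_count >= self.limit:
            return False
        self.request_count += 1
        return True


def save_checkpoint(path, next_page):
    """Persist the page to fetch next so a later run can resume."""
    Path(path).write_text(json.dumps({"next_page": next_page}))


def load_checkpoint(path):
    """Return the saved page number, or 0 if no checkpoint exists."""
    checkpoint = Path(path)
    if not checkpoint.exists():
        return 0
    return json.loads(checkpoint.read_text())["next_page"]
```

A script would call `try_acquire()` before each request, and write a checkpoint and exit when it returns False.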

mathemancer commented on Mon Jun 15 2020:

@akshgpt7 I suggest using the DelayedRequester class with a delay of 87 seconds. This will keep the overall number of requests under the limit.
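
For reference, the 87-second figure follows from the daily limit: 86400 seconds / 1000 requests = 86.4 seconds, rounded up. The sketch below shows that arithmetic together with a simple fixed-delay throttle. It is not the catalog's DelayedRequester class, whose interface is not shown in this thread; `min_delay_seconds` and `Throttle` are hypothetical names used only to illustrate the idea.

```python
import math
import time

SECONDS_PER_DAY = 24 * 60 * 60


def min_delay_seconds(requests_per_day):
    """Smallest whole-second delay that stays within a daily request limit."""
    return math.ceil(SECONDS_PER_DAY / requests_per_day)


class Throttle:
    """Block so that consecutive calls to wait() are at least `delay` apart."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock  # injectable for testing
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

With an 87-second delay, a full day allows about 993 requests, which leaves a small margin under the 1000-request limit.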

@dhruvkb dhruvkb added help wanted Open to participation from the community 🧹 status: ticket work required Needs more details before it can be worked on labels Jun 14, 2021
@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 24, 2023
@obulat obulat added the 🟩 priority: low Low priority and doesn't need to be rushed label Mar 8, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@AetherUnbound AetherUnbound added the ☁️ provider: images Image provider label May 6, 2023
@obulat obulat changed the title [API Integration] Auckland Museum (original #280) Auckland Museum Jun 20, 2023
@ngken0995
Collaborator

@obulat Can I be assigned to this?

@ngken0995
Collaborator

@obulat A request to https://api.aucklandmuseum.com/search/collectionsonline/_search without any query simply returns all records. The hits section shows the total number of records that matched the search query, and the maximum appears to be 10000. http://api.aucklandmuseum.com/search/collectionsonline/_search?q=copyright:CC also reports a total of 10000. Will this prevent us from retrieving all the data?

@obulat
Contributor Author

obulat commented Oct 23, 2023

> @obulat A request to https://api.aucklandmuseum.com/search/collectionsonline/_search without any query simply returns all records. The hits section shows the total number of records that matched the search query, and the maximum appears to be 10000. http://api.aucklandmuseum.com/search/collectionsonline/_search?q=copyright:CC also reports a total of 10000. Will this prevent us from retrieving all the data?

I think that we should

  1. ingest the 10000 records, even if that's not the whole collection.
  2. try to reach out to them to ask if it's possible to get all of the data somehow. But 2 should not prevent us from doing 1.
    @WordPress/openverse-catalog, can we reach out to the museum and ask them if there's a way of accessing more than 10000 results?
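
As a rough illustration of step 1, the sketch below plans page requests up to the apparent 10000-record ceiling. It assumes the API accepts Elasticsearch-style `from` and `size` paging parameters, which has not been confirmed in this thread; `page_offsets` and the default page size of 100 are hypothetical.

```python
def page_offsets(reported_total, page_size=100, cap=10_000):
    """Return (from, size) pairs covering the records that can be reached.

    ASSUMPTION: the API pages with Elasticsearch-style `from`/`size`
    parameters and will not serve results beyond `cap`.
    """
    reachable = min(reported_total, cap)
    return [
        (start, min(page_size, reachable - start))
        for start in range(0, reachable, page_size)
    ]
```

With a page size of 100, the reachable 10000 records would take 100 requests, well within the 1000 requests per day limit.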

@ngken0995
Collaborator

Duplicate: #2598

Licenses Provided

Not everything is openly licensed but some things are PDM or CC: https://www.aucklandmuseum.com/legal/rights-and-permissions

Reference from #2598

Projects
Archived in project
Openverse
  
Backlog
4 participants