Add dataset: brill_iconclass #30
Comments
Looks great! Removing the candidate tag 😃 |
#self-assign |
#ready-for-review |
I have written a script for loading this one: https://huggingface.co/datasets/biglam/cultural_heritage_metadata_accuracy

@albertvillanova I didn't make this one streaming. I did have a version that supported streaming, but it was quite a bit slower to load. It's possible I missed something obvious though. I used the following to generate examples in the streaming version:

    import json
    from zipfile import ZipFile

    from PIL import Image

    def _generate_examples(self, download_dir):
        # data.json maps each image path inside the archive to its labels
        with ZipFile(download_dir) as myzip:
            with myzip.open("data.json") as json_file:
                data = json.load(json_file)
            for row, (filepath, labels) in enumerate(data.items()):
                # Open each image straight from the archive, without extracting it
                image = Image.open(myzip.open(filepath))
                yield row, {"image": image, "label": labels}

My own feeling is that streaming is less important for this one, but if I've missed an obvious way of supporting streaming, happy to hear it! |
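One variation that might be worth trying is to yield the raw image bytes instead of decoding every image with PIL inside the generator. The sketch below is only an illustration under assumptions: it is a standalone function rather than a builder method, it assumes `data.json` maps each file path in the archive to a list of labels (as the snippet above suggests), and `build_demo_archive` with its file names and `CODE_*` labels is a made-up stand-in for the real zip. As far as I know, the `datasets` `Image` feature accepts a `{"path": ..., "bytes": ...}` dict, but whether this is actually faster when streaming is untested.

```python
import io
import json
from zipfile import ZipFile


def generate_examples(archive):
    """Yield (key, example) pairs, deferring image decoding to the consumer."""
    with ZipFile(archive) as myzip:
        with myzip.open("data.json") as json_file:
            labels_by_path = json.load(json_file)
        for row, (filepath, labels) in enumerate(labels_by_path.items()):
            # Read the raw bytes only; nothing is decoded with PIL here.
            with myzip.open(filepath) as image_file:
                image_bytes = image_file.read()
            yield row, {
                "image": {"path": filepath, "bytes": image_bytes},
                "label": labels,
            }


def build_demo_archive():
    """Build a tiny in-memory zip with placeholder content (hypothetical data)."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as myzip:
        index = {"a.jpg": ["CODE_A"], "b.jpg": ["CODE_B", "CODE_C"]}
        myzip.writestr("data.json", json.dumps(index))
        myzip.writestr("a.jpg", b"fake-a")  # not real image data
        myzip.writestr("b.jpg", b"fake-b")  # not real image data
    buffer.seek(0)
    return buffer


examples = list(generate_examples(build_demo_archive()))
```

Because the generator is lazy, a consumer that only needs the first few examples never reads the rest of the archive.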
BTW, I am not affiliated with Brill any more, so my contact address should be updated. You can use info@iconclass.org for testset-related matters. |
Is it an idea to add the core data of the Iconclass system as a dataset? |
I will update that 🙂 |
That would be great. I had originally planned to also add a configuration of this dataset with a 'translation' of the Iconclass labels, i.e. turning each Iconclass code into its associated description. I know there used to be a Python library that allowed for these queries, but I think it's no longer maintained? Adding the core data as a dataset would be nice in its own right and could also potentially be used to do this 'translation'. |
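A rough sketch of what that 'translation' step could look like if the core data were available as a simple code-to-description mapping. Everything here is hypothetical: `translate_labels`, the `descriptions` dict, and the `CODE_*` keys are placeholders rather than real Iconclass notations or an existing API.

```python
def translate_labels(codes, descriptions, missing=None):
    """Map each code to its description; unknown codes map to `missing`."""
    return [descriptions.get(code, missing) for code in codes]


# Placeholder mapping standing in for the Iconclass core data (not real entries).
descriptions = {
    "CODE_A": "placeholder description for CODE_A",
    "CODE_B": "placeholder description for CODE_B",
}

translated = translate_labels(["CODE_A", "CODE_X", "CODE_B"], descriptions)
```

Returning `missing` for unknown codes keeps the output aligned with the input labels, so gaps in the mapping stay visible instead of silently dropping labels.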
Ouch, yes. That Python library was also made by me, but it has terrible (read: non-existent) documentation. If there is interest in using it, that will galvanise me to give it some spit-and-polish. |
That would be great, happy to offer some help with that if useful. It's possible some of the functionality could be replicated by having the underlying data available on the hub, but some areas might be better served by a specific library. |
I have updated the location of this testset, it is now on: |
Thanks for letting me know — just updated the URLs. |
A URL for this dataset
https://labs.brill.com/ictestset/
Dataset description
A test dataset and challenge to apply machine learning to collections described with the Iconclass classification system.
Iconclass is a metadata standard used by some LAM institutions. This dataset is of particular interest for the following reasons:
Dataset modality
Image
Dataset licence
Creative Commons Public Domain Dedication and Certification
Other licence
No response
How can you access this data
As a download from a repository/website
Confirm the dataset has an open licence
Contact details for data custodian
posthumus@brill.com