Automatically add metadata to Hugging Face Hub repos when uploading projects #790

juhoinkinen · 2024-05-28T11:07:58Z

The 🤗 Hugging Face Hub has good support for metadata on model, dataset and Space repositories through the YAML section of the Model Cards (i.e. the README.md files of the repos).

While updating Annif-tutorial, specifically writing the Model card section of the Hugging Face Hub exercise, I noted that it is possible to create the Model card programmatically with the HFH client and also to directly update the metadata of the Model card.

Adding the metadata to the Model card is easy via the HFH website UI, but I still assume most users won't remember or bother to do it, so it would be nice if some metadata was added automatically when running annif upload. It could be the default behaviour, with an option of annif upload to opt out.

For example, to set the model task to "text-classification" and a custom tag "annif" this code could be used:

from huggingface_hub import metadata_update

metadata_update(
    "juhoinkinen/Annif-models-upload-testing",
    {
        "pipeline_tag": "text-classification",
        "tags": ["annif"]
    }
)

I think these are the most important metadata, and they could be hard-coded, but the language(s) could also be considered, as they are available in the vocabulary. See e.g. the NatLibFi/FintoAI-data-YSO repo for some other (manually added) metadata.
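A minimal sketch of how the hard-coded values and the vocabulary languages could be combined into one metadata dict. The function names `build_hub_metadata` and `push_hub_metadata` are hypothetical, and how the language codes are read from the vocabulary is left out; only `metadata_update` and the `language` metadata key come from the Hub itself.

```python
def build_hub_metadata(languages):
    """Return the metadata dict for the Model card's YAML section.

    `languages` is an iterable of language codes, assumed to have been
    collected from the project vocabulary by the caller.
    """
    metadata = {
        "pipeline_tag": "text-classification",
        "tags": ["annif"],
    }
    # Deduplicate and sort so that repeated uploads yield the same YAML.
    unique_languages = sorted(set(languages))
    if unique_languages:
        metadata["language"] = unique_languages
    return metadata


def push_hub_metadata(repo_id, languages):
    """Write the metadata to the Model card of `repo_id` (needs network access)."""
    # Imported lazily so that building the dict does not require the client.
    from huggingface_hub import metadata_update

    return metadata_update(repo_id, build_hub_metadata(languages))
```

Keeping the dict construction separate from the network call makes the hard-coded defaults easy to unit test without touching the Hub.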

There is just one small issue with using metadata_update(): if README.md does not exist in a repo, it is created with the ModelCard template contents, which I feel is not well suited to Annif projects, because the template expects a lot of information. So either an empty README.md should be uploaded before calling metadata_update(), or maybe that function of the HFH client could be modified to accept a parameter that controls whether the Model card template is used.
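The first workaround could look roughly like the sketch below. The helper name `ensure_readme` is hypothetical; `file_exists` and `upload_file` are methods of `huggingface_hub.HfApi`, and the sketch assumes the Hub accepts a zero-byte file (if it does not, a single newline could be uploaded instead). The API object is passed in as a parameter so that the logic can be exercised without network access.

```python
def ensure_readme(api, repo_id):
    """Upload an empty README.md if the repo has none.

    `api` is expected to be a huggingface_hub.HfApi instance (or an object
    with the same file_exists/upload_file methods). Returns True if a file
    was uploaded, False if README.md already existed.
    """
    if api.file_exists(repo_id, "README.md"):
        return False
    api.upload_file(
        path_or_fileobj=b"",  # empty content, so no template text is added
        path_in_repo="README.md",
        repo_id=repo_id,
        commit_message="Add empty README.md",
    )
    return True
```

In real use this would be called as `ensure_readme(HfApi(), repo_id)` right before `metadata_update()`, so that the metadata is written into the empty file rather than into a freshly generated template.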

This would be a continuation of PR #762.


osma · 2024-05-29T13:08:27Z

Instead of adding a totally empty README.md, could we use our own model card template that's better aligned with Annif models? For example it could include the Annif version used for training, the backend, vocabulary name and size, possibly some of the hyperparameters / configuration settings as well.
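As a rough illustration of that idea, an Annif-specific card could be rendered from a small template like the one below. The template layout, the field names and the function `render_model_card` are all assumptions made for this sketch, not an existing Annif or Hub format; only the YAML keys `pipeline_tag` and `tags` are taken from the discussion above.

```python
from string import Template

# Hypothetical Annif model card: YAML metadata followed by a short summary.
CARD_TEMPLATE = Template(
    """\
---
pipeline_tag: text-classification
tags:
- annif
---

# $project_name

- Annif version: $annif_version
- Backend: $backend
- Vocabulary: $vocab_name
- Vocabulary size: $vocab_size
"""
)


def render_model_card(project_name, annif_version, backend, vocab_name, vocab_size):
    """Return the README.md text for one project as a string."""
    return CARD_TEMPLATE.substitute(
        project_name=project_name,
        annif_version=annif_version,
        backend=backend,
        vocab_name=vocab_name,
        vocab_size=vocab_size,
    )
```

The rendered text could then be uploaded as README.md, for example with `ModelCard(content).push_to_hub(repo_id)` from `huggingface_hub`, which would avoid both the empty file and the generic template.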

juhoinkinen added the enhancement label May 28, 2024

juhoinkinen mentioned this issue Jun 17, 2024

Automatically add metadata to Hugging Face Hub repos when uploading projects #793

Merged

juhoinkinen linked a pull request Jun 17, 2024 that will close this issue

Automatically add metadata to Hugging Face Hub repos when uploading projects #793

Merged

juhoinkinen added this to the 1.2 milestone Sep 17, 2024

juhoinkinen closed this as completed in #793 Sep 27, 2024
