Skip to content

Add ConTextTab model #1528

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 18, 2025
Merged

Add ConTextTab model #1528

merged 9 commits into from
Jun 18, 2025

Conversation

MaxSchambach
Copy link
Contributor

@MaxSchambach MaxSchambach commented Jun 13, 2025

Adding ConTextTab model to library and snippets.
The checkpoints are linked to https://github.com/SAP-samples/contexttab and a custom download stats filter is added.

Copy link
Member

@Vaibhavs10 Vaibhavs10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your PR 🤗 - left some minor nits to be addressed and then we can proceed to merge.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Thanks for the PR 🤗 Following up on @Vaibhavs10 's previous comments, I would also suggest top-level weights and a multi-repo structure. This is what we recommend to library authors to get the best user experience on the Hub. Let me know what you think :)

@@ -132,6 +132,61 @@ wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)`,
];

export const contexttab = (): string[] => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, these 3 snippets don't really show how to use the model in the repo. Adding snippets would be very beneficial if contexttab had a from_pretrained method for instance to load a pre-trained model from a repo_id. By doing so, you allow users to share their own weights in new repos which will result in personalized snippets to reuse their model. Does that make sense to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand the confusion! I've omitted the default parameters, but both ConTextTabClassifier and ConTextTabRegressor shown in the example use the HF hub model weights, taking a checkpoint and checkpoint_revision argument pointing to the HF repo. I can add it to be more verbose if you like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wauplin : done ✅

prettyLabel: "ConTextTab",
repoName: "contexttab",
repoUrl: "https://github.com/SAP-samples/contexttab",
countDownloads: `path:"l2/base.pt"`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting back to the discussion above #1528 (comment).

I do think the best structure would be:

  • store the weights at top level in the repo as mentioned by @Vaibhavs10
  • create 1 repo per checkpoint that you want to add
  • add in each readme why the checkpoint is particular (how it has been trained, which data, etc.). It is still possible to refer to a "base repo" for more details about the family of models
  • have all models tracked with library_name: contexttab

Doing so has many advantages:

  1. you can define a common structure for all contexttab repos on the Hub and therefore implement methods like from_pretrained/push_to_hub in your library that will help other users build on top of your work and share their results
  2. related to 1., the snippets section should be much more relevant with personalized snippets (see above)
  3. you will have 1 download counter per checkpoint. If everything is in the same repo, the counter will mix everything and you won't really know which ones got the most traction
  4. no need to maintain a countDownloads` rule with hardcoded paths
  5. improves discoverability for users with correct tags and filtering

This is why we usually strongly recommend 1 model == 1 repo instead of having all of them in one repo. If you want, you can use the Collection feature to nicely group several repos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context! I absolutely understand the benefits, however we have to go through an approval process for open sourcing repositories so this approach would add quite a lot of overhead on our side. Hence, having the folder structure in the repo together with version tagging is, for us, much more efficient to maintain. I do understand the downsides related to, e.g., download counting but would propose to keep it as is and only create new repositories for major model changes. Would that work for you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the constraints, so yes let's keep it like this.

Copy link
Contributor

@Wauplin Wauplin Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw we double-checked with @Vaibhavs10 and path_extension:"pt" works correclty on sub-folders so you won't have to manage a list of hardcoded paths

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, perfect, than I'll revert that back!

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating!

@Wauplin Wauplin merged commit 7d32a9e into huggingface:main Jun 18, 2025
3 of 4 checks passed
@MaxSchambach MaxSchambach deleted the add-contexttab branch June 19, 2025 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants