-
Notifications
You must be signed in to change notification settings - Fork 437
Add ConTextTab model #1528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ConTextTab model #1528
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your PR 🤗 - left some minor nits to be addressed and then we can proceed to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! Thanks for the PR 🤗 Following up on @Vaibhavs10 's previous comments, I would also suggest top-level weights and a multi-repo structure. This is what we recommend to library authors to get the best user experience on the Hub. Let me know what you think :)
@@ -132,6 +132,61 @@ wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH) | |||
ta.save("test-2.wav", wav, model.sr)`, | |||
]; | |||
|
|||
export const contexttab = (): string[] => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, these 3 snippets don't really show how to use the model in the repo. Adding snippets would be very beneficial if contexttab had a from_pretrained
method for instance to load a pre-trained model from a repo_id. By doing so, you allow users to share their own weights in new repos which will result in personalized snippets to reuse their model. Does that make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand the confusion! I've omitted the default parameters, but both ConTextTabClassifier
and ConTextTabRegressor
shown in the example use the HF hub model weights, taking a checkpoint
and checkpoint_revision
argument pointing to the HF repo. I can add it to be more verbose if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Wauplin : done ✅
prettyLabel: "ConTextTab", | ||
repoName: "contexttab", | ||
repoUrl: "https://github.com/SAP-samples/contexttab", | ||
countDownloads: `path:"l2/base.pt"`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting back to the discussion above #1528 (comment).
I do think the best structure would be:
- store the weights at top level in the repo as mentioned by @Vaibhavs10
- create 1 repo per checkpoint that you want to add
- add in each readme why the checkpoint is particular (how it has been trained, which data, etc.). It is still possible to refer to a "base repo" for more details about the family of models
- have all models tracked with
library_name: contexttab
Doing so has many advantages:
- you can define a common structure for all
contexttab
repos on the Hub and therefore implement methods likefrom_pretrained
/push_to_hub
in your library that will help other users build on top of your work and share their results - related to 1., the
snippets
section should be much more relevant with personalized snippets (see above) - you will have 1 download counter per checkpoint. If everything is in the same repo, the counter will mix everything and you won't really know which ones got the most traction
- no need to maintain a countDownloads` rule with hardcoded paths
- improves discoverability for users with correct tags and filtering
This is why we usually strongly recommend 1 model == 1 repo instead of having all of them in one repo. If you want, you can use the Collection feature to nicely group several repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context! I absolutely understand the benefits, however we have to go through an approval process for open sourcing repositories so this approach would add quite a lot of overhead on our side. Hence, having the folder structure in the repo together with version tagging is, for us, much more efficient to maintain. I do understand the downsides related to, e.g., download counting but would propose to keep it as is and only create new repositories for major model changes. Would that work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the constraints, so yes let's keep it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw we double-checked with @Vaibhavs10 and path_extension:"pt"
works correclty on sub-folders so you won't have to manage a list of hardcoded paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice, perfect, than I'll revert that back!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating!
Adding ConTextTab model to library and snippets.
The checkpoints are linked to
https://github.com/SAP-samples/contexttab
and a custom download stats filter is added.