Canine model and High VRAM usage #115

Open

Qubitium opened this issue Feb 11, 2024 · 4 comments

Comments

Qubitium commented Feb 11, 2024

@bminixhofer We are observing very high VRAM usage with the CANINE model. The wtp-canine-s-12l-no-adapters fp32 weights are only about 515 MB, so we naively expected batch=1 in fp16 mode to use roughly 257.5 MB of VRAM for the weights plus runtime/inference overhead. We didn't expect batch=1 to use 1.3 GB of VRAM. The input is a ~230 KB text file.

Is this a bug, or is this normal for the CANINE architecture? If it's the norm, is there anything we can do to reduce the memory footprint? Thanks.

from wtpsplit import WtP

wtp = WtP("wtp-canine-s-12l-no-adapters")
wtp.half().to(device="cuda")
batch    VRAM (GB)
1        1.309
2        1.335
4        1.385
6        1.428
8        1.487
10       1.542
12       1.583
14       1.639
16       1.688
32       2.094
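
For what it's worth, here is a minimal sketch (not from the original report) of how such a peak-VRAM number could be measured with PyTorch's allocator statistics; the "input.txt" filename is a hypothetical stand-in for the ~230 KB file mentioned above.

import torch
from wtpsplit import WtP

wtp = WtP("wtp-canine-s-12l-no-adapters")
wtp.half().to("cuda")

# Hypothetical stand-in for the ~230 KB input file mentioned above.
with open("input.txt", encoding="utf-8") as f:
    text = f.read()

# Reset the peak counter (weights already on the GPU still count toward it),
# run one split, and report the peak. Note this only tracks PyTorch
# allocations; nvidia-smi additionally shows CUDA context overhead.
torch.cuda.reset_peak_memory_stats()
sentences = wtp.split(text)
print(f"peak VRAM during split: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")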
bminixhofer (Owner) commented Feb 28, 2024

Hi, thanks for these benchmarks! And sorry for being slow to respond.

You could debug this by checking how much memory the vanilla CANINE (https://huggingface.co/google/canine-s) takes for a forward pass vs. a forward pass of the WtP model (see e.g. here: https://github.com/bminixhofer/wtpsplit/?tab=readme-ov-file#advanced-usage).

If there's a discrepancy there, I'll investigate it. It's possible that CANINE just needs a lot of memory, though; I'm not super happy with that architecture and will upgrade the models to a different arch soon(ish).
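
As a rough comparison, here is a minimal sketch (assuming the standard transformers API; the short placeholder input is not the 230 KB file from the report) for measuring the peak VRAM of a single fp16 forward pass of vanilla google/canine-s:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/canine-s")
model = AutoModel.from_pretrained("google/canine-s").half().to("cuda").eval()

# CANINE operates on Unicode code points; 2048 is its maximum sequence length.
inputs = tokenizer("placeholder text " * 200, return_tensors="pt",
                   truncation=True, max_length=2048).to("cuda")

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    model(**inputs)
print(f"peak VRAM for one CANINE forward pass: "
      f"{torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")

If vanilla CANINE alone already peaks near the numbers in the table above, the memory comes from the architecture itself rather than from the WtP wrapper.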

Qubitium (Author) commented

Will do. Btw, if you need GPU compute to train the next model, I can provide you with an A100 80 GB+. You can ping me on Twitter at qbitium.

bminixhofer (Owner) commented

Thanks! And that's very generous. I'm deferring to @markus583 since he is doing the training, but we are using TPUs, so there is probably no need.

markus583 (Collaborator) commented

Very generous indeed! Thanks, but the TPUs are very strong. I'd also be very curious whether there is a discrepancy.
