A natural approach to faster SAE training is data parallelism. Perhaps we could simply use DDP to make 8 copies of the TransformerLens model to generate activations and synchronize the SAE gradients. This could accelerate activation generation, which is the speed bottleneck for larger LMs.
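A minimal sketch of that idea, assuming one process per GPU, a frozen TransformerLens model per rank, and hypothetical `make_lm` / `make_sae` factories and per-rank token shards (the hook point is just an example):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_sae_ddp(rank, world_size, make_lm, make_sae, token_shards):
    # One process per GPU; each holds a full copy of the frozen LM.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}")

    lm = make_lm().to(device).eval()              # frozen TransformerLens model (hypothetical factory)
    sae = DDP(make_sae().to(device), device_ids=[rank])
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

    for tokens in token_shards[rank]:             # each rank consumes its own data shard
        tokens = tokens.to(device)
        with torch.no_grad():                     # activation generation, no LM gradients
            _, cache = lm.run_with_cache(tokens)
            acts = cache["blocks.6.hook_resid_pre"]  # example hook point

        recon = sae(acts)
        loss = (recon - acts).pow(2).mean()       # plus a sparsity penalty in practice
        opt.zero_grad()
        loss.backward()                           # DDP all-reduces SAE gradients here
        opt.step()

    dist.destroy_process_group()
```

Note that only the SAE is wrapped in DDP; the LM stays frozen on every rank, so gradient synchronization cost is limited to the SAE parameters.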
This may not work for larger models, say 70B. The ultimate solution may be a producer-consumer design pattern. Let's leave that for later.
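For concreteness, a producer-consumer split could look roughly like this: dedicated producer processes run the large LM and stream activations through a queue, while a separate consumer trains the SAE. This is a sketch under assumed names (`make_lm`, `make_sae`, `token_batches` are hypothetical placeholders), not the repo's API:

```python
import torch
import torch.multiprocessing as mp

def producer(queue, make_lm, token_batches, device):
    # Dedicated process: runs the (large) LM and streams activations out.
    lm = make_lm().to(device).eval()
    with torch.no_grad():
        for tokens in token_batches:
            _, cache = lm.run_with_cache(tokens.to(device))
            queue.put(cache["blocks.6.hook_resid_pre"].cpu())  # example hook point
    queue.put(None)  # sentinel: no more activations

def consumer(queue, make_sae, device):
    # Separate process: trains the SAE on whatever the producer emits.
    sae = make_sae().to(device)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
    while (acts := queue.get()) is not None:
        acts = acts.to(device)
        loss = (sae(acts) - acts).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    mp.set_start_method("spawn")
    q = mp.Queue(maxsize=8)  # bounded queue applies backpressure to the producer
    p = mp.Process(target=producer, args=(q, make_lm, token_batches, "cuda:0"))
    c = mp.Process(target=consumer, args=(q, make_sae, "cuda:1"))
    p.start(); c.start(); p.join(); c.join()
```

The advantage over plain DDP is that activation generation and SAE training can scale independently: a 70B model could be sharded across several producer GPUs while the SAE trains elsewhere.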
Support DDP
BTW, I made a modification to get past a bug in the DDP code.
Error message without the modification:
...
[rank3]: File "/home/alan/dev/sae/Language-Model-SAEs/TransformerLens/transformer_lens/components/embed.py", line 34, in forward
[rank3]: return self.W_E[tokens, :]
[rank3]: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:0)
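One workaround consistent with this error is to move the token indices onto the same device as `W_E` before indexing, so each DDP replica indexes with tensors on its own GPU. A minimal sketch (the actual patch may differ, and the real `forward` in TransformerLens has a fuller signature):

```python
# Hypothetical patch to transformer_lens/components/embed.py:
def forward(self, tokens):
    # Move indices to W_E's device so indexing works on every replica.
    return self.W_E[tokens.to(self.W_E.device), :]
```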
We are currently working on it. Initially we did not take industry-scale models into account, nor did we plan for DDP. We may need about a week of refactoring to support that.
8B models do work on a single A100 GPU with a small batch size.
If that does not fit your scenario, you may have to wait a while xd.