Arguments for a simple H3 model that could learn Causal or Masked LM modeling? #3

Closed
kingb12 opened this issue Mar 12, 2023 · 5 comments

kingb12 commented Mar 12, 2023

Hi all! I've been trying to get the training experiments to run and have been struggling with errors where Hydra cannot parse the configs given at /experiment/pile/h3 (I was following the instructions in experiments.md for the Pile). I'm actually hoping to train on a different dataset entirely, though, for which I already have a working pipeline. Given a correct installation of this repo's dependencies, what is the best way to instantiate an H3 model suitable for causal language modeling and/or masked LM?

For example, I'm hoping to come up with something comparable to this and just use it in my existing pipeline:

from transformers import AutoConfig, RobertaForCausalLM

config = AutoConfig.from_pretrained(
    "roberta-base",
    vocab_size=tokenizer.vocab_size,
    random_init=True,
    is_decoder=True,
)

model = RobertaForCausalLM(config)
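
For reference, the rest of my pipeline drives that baseline through the standard Hugging Face causal-LM interface, roughly like the sketch below (tokenizer here is just whatever tokenizer the pipeline already uses):

# Hypothetical sketch of the existing training step using the standard HF API.
# For a causal LM head, passing labels=input_ids makes the model compute the
# shifted next-token cross-entropy loss internally.
batch = tokenizer(["some training text"], return_tensors="pt")
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["input_ids"])
outputs.loss.backward()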

Here is what I have for Causal LM, though I am still trying to sort out the dependencies to get it to run.

# Roughly GPT-2-small sizing (12 layers, d_model=768, 4x MLP), with layer="h3"
# and attention layers at the indices given by attn_layer_idx.
model = ConvLMHeadModel(
    d_model=768, n_layer=12, d_inner=768 * 4,
    vocab_size=tokenizer.vocab_size, resid_dropout=0.0, embed_dropout=0.1,
    layer="h3", attn_layer_idx=[1, 8],
    attn_cfg=None,
    fused_mlp=True,
    fused_dropout_add_ln=True,
    residual_in_fp32=True,
    pad_vocab_size_multiple=8,
)
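
If it helps clarify the surrounding pipeline: since I'm not assuming ConvLMHeadModel computes a loss from labels the way HF models do, I compute the shifted next-token loss outside the model, roughly like the sketch below (the exact return type of forward is an assumption on my part, so I unwrap it defensively):

import torch.nn.functional as F

# Hypothetical training-step sketch: the forward pass only takes input_ids,
# so the next-token shift and the cross-entropy loss live outside the model.
out = model(input_ids)
# Assumption: forward returns either an output object with .logits or a tuple
# whose first element has .logits, depending on the repo version.
logits = (out[0] if isinstance(out, tuple) else out).logits
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = input_ids[:, 1:].contiguous()
loss = F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                       shift_labels.view(-1))
loss.backward()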

Are these reasonable choices for these arguments, and/or are there others I would need to specify? Are there important pieces of the training or data preparation specified in the configs that I would need to replicate in another pipeline designed for an HF transformer LM? Sorry if these are obvious/documented; I was just having a hard time reading through the configs.

DanFu09 (Contributor) commented Mar 12, 2023 via email

kingb12 commented Mar 12, 2023

Awesome, thanks! This worked great. Just for my understanding, is ConvLMHeadModel a synonym for SSMLMHeadModel, or do they just share the same constructor signature?

I also wanted to ask whether there are any necessary steps to apply attention masks / prevent forward leakage of information when training this model as a causal LM. My experiments so far suggest there is no such leakage, but I was surprised that no attention_mask or similar argument is accepted in forward. Apologies if I'm missing something that should be obvious for SSMs/H3; I'm not very familiar with how they work and am mostly trying to help some students set up and debug the model for one of their experiments.

DanFu09 (Contributor) commented Mar 12, 2023 via email

DanFu09 (Contributor) commented Mar 12, 2023 via email

kingb12 commented Mar 19, 2023

Hi, I had forgotten to come back to this, but this was very helpful and everything works as expected! Thanks for the help and the insights.

kingb12 closed this as completed Mar 19, 2023