# VTX
---
[AIGen](https://github.com/LuciferianInk/aigen) is a text generation and training library, originally forked from [AITextGen](https://aitextgen.minimaxir.com/) (which is now defunct).

AIGen is also the foundation of [VTX](https://github.com/0-5788719150923125/vtx).

To use this notebook on Kaggle, you must first enable Internet access:

1. Find "Notebook options" in the sidebar on the right-hand side of this page.
2. If required, verify your phone number.
3. Choose "Internet on".

In [None]:
# Kaggle uses an old version of CUDA, so we need to install a version of PyTorch that was built for that version.
# The requirement is quoted so the shell does not treat ">" as an output redirect.
!pip install 'torch>=2.1.0' --no-build-isolation --index-url https://download.pytorch.org/whl/cu110

# We don't even use this, but have to install it because of Kaggle bugs
!pip install torchaudio

# Now we install AIGen
!pip install 'git+https://github.com/LuciferianInk/aigen.git'

# Install our fork of ModuleFormer
!pip install 'git+https://github.com/LuciferianInk/ModuleFormer.git@enable-gradient-checkpointing'
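
After the installs finish, it can help to confirm which versions actually landed in the environment. The sketch below uses only the standard library. The distribution names are assumptions (in particular, the names under which AIGen and ModuleFormer register themselves are not stated in this notebook), and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed distribution names; adjust them to match what pip reports.
for name in ("torch", "torchaudio", "aigen", "moduleformer"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A result of "not installed" for a package that pip claimed to install usually means the distribution name guessed here is wrong, not that the install failed.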

## Configuration

We could expose many settings as variables here, but for clarity we hardcode most of them. The only values set up front are the `focus` name and two environment variables.

In [2]:
# Set some variables
import os

focus = 'frame'
os.environ["FOCUS"] = focus
os.environ["TOKENIZERS_PARALLELISM"] = 'false'
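
If the hardcoded values ever need to be overridden from outside the notebook, one option is to read them from the environment with a fallback. This is a sketch and not part of the original notebook; `read_setting` is a hypothetical helper.

```python
import os


def read_setting(name, default):
    """Return the environment value for `name`, falling back to `default`."""
    return os.environ.get(name, default)


# Falls back to the hardcoded value when FOCUS is not already set.
focus = read_setting("FOCUS", "frame")
os.environ["FOCUS"] = focus
print(focus)
```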

## ModuleFormer

We want to pretrain a ModuleFormer, so we import its code and configure the settings (based upon the defaults from MoLM).

In [None]:
# Pretrain configs for ModuleFormer
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

from moduleformer import (
    ModuleFormerConfig,
    ModuleFormerForCausalLM,
    ModuleFormerForSequenceClassification,
)

AutoConfig.register("moduleformer", ModuleFormerConfig)
AutoModelForCausalLM.register(ModuleFormerConfig, ModuleFormerForCausalLM)
AutoModelForSequenceClassification.register(
    ModuleFormerConfig, ModuleFormerForSequenceClassification
)

base_model = "ibm/MoLM-350M-4B"

pretrain_config = AutoConfig.from_pretrained(base_model)
overrides = {
    "universal": True,
    "world_size": 23,
    "activation_function": 'gelu',
    "n_layer": 16,
    "n_head": 2,
    "k_att": 3,
    "k_mlp": 3,
    "n_att_experts": 8,
    "n_mlp_experts": 16,
    "n_ctx": 2048, # history_length * n_layer
    "n_embd": 768,
    "att_hidden": 256,
    "ffd_hidden": 512,
    "block_size": 128,
    "gate_type": 'gmm',
    "gating_size": 64,
    "aux_loss_type": 'mi',
    "aux_loss_weight": 0.1,
    "history_length": 128,
    "resid_pdrop": 0.1,
    "embd_pdrop": 0.1,
    "attn_pdrop": 0.1,
    "moe_pdrop": 0.1,
    "sample_topk": 2,
    "tie_word_embeddings": True,
}
setattr(pretrain_config, "_name_or_path", focus)
for k, v in overrides.items():
    setattr(pretrain_config, k, v)
print("modified pretrain config:")
print(pretrain_config)
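
The inline comment on `n_ctx` ties it to `history_length * n_layer`. A quick check of that arithmetic, using values copied from `overrides`, might look like the sketch below. Reading `k_att` and `k_mlp` as the number of experts selected per token is an assumption about ModuleFormer's configuration, not something this notebook states.

```python
# Values copied from the overrides dictionary in the cell above.
cfg = {
    "n_layer": 16,
    "history_length": 128,
    "n_ctx": 2048,
    "k_att": 3,
    "n_att_experts": 8,
    "k_mlp": 3,
    "n_mlp_experts": 16,
}

# The inline comment states n_ctx = history_length * n_layer.
expected_n_ctx = cfg["history_length"] * cfg["n_layer"]
assert cfg["n_ctx"] == expected_n_ctx

# Assumption: k_* experts are chosen out of n_*_experts for each token.
att_fraction = cfg["k_att"] / cfg["n_att_experts"]
mlp_fraction = cfg["k_mlp"] / cfg["n_mlp_experts"]
print(expected_n_ctx, att_fraction, mlp_fraction)
```

If `n_layer` or `history_length` is changed later, the assertion flags an `n_ctx` value that no longer matches the stated relationship.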

## Load the model

For training, we tweak the tokenizer a bit before loading the model.

In [None]:
# Tweak the tokenizer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    cache_dir="/kaggle/working/models",
    padding="max_length",
    padding_side="left",
    use_fast=True,
    return_overflowing_tokens=True,
    truncation=True,
    trust_remote_code=True,
)
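
To make the `padding_side="left"` and `padding="max_length"` options concrete, here is a toy illustration in plain Python. It does not call the tokenizer; `left_pad` and `chunk` are hypothetical helpers, and the pad id of 0 is an arbitrary choice for the example. The chunking step only loosely mirrors what truncation with overflowing tokens does.

```python
def left_pad(token_ids, max_length, pad_id=0):
    """Pad on the left so the real tokens sit at the end of the sequence."""
    missing = max(0, max_length - len(token_ids))
    return [pad_id] * missing + list(token_ids)


def chunk(token_ids, max_length):
    """Split a long sequence into consecutive pieces of at most max_length."""
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), max_length)]


ids = [11, 12, 13, 14, 15, 16, 17]  # made-up token ids
pieces = [left_pad(piece, 4) for piece in chunk(ids, 4)]
print(pieces)  # [[11, 12, 13, 14], [0, 15, 16, 17]]
```

Left padding keeps the most recent tokens adjacent to the position where generation continues, which is why it is a common choice for causal language models.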

In [None]:
# Instantiate your model
import os
from aigen import aigen

prototype = aigen(
    model=base_model,
    tokenizer=tokenizer,
    cache_dir="/kaggle/working/models",
    precision=16,
    gradient_checkpointing=False,
    config=pretrain_config
)

print(prototype)

## Parameter-Efficient Fine-Tuning (PEFT)

Here is a basic example of Low-Rank Adaptation (LoRA) training. It is commented out because it is not used in pretraining.

In [None]:
# # Prepare model for PEFT training

# from peft import (
#     LoraConfig,
#     get_peft_model,
#     prepare_model_for_kbit_training,
# )

# peft_config = LoraConfig(
#     task_type="CAUSAL_LM",
#     r=4,
#     lora_alpha=16,
#     lora_dropout=0.01,
#     bias="all",
#     target_modules=[
#       "embed_in",
#       "query_key_value",
#       "dense",
#       "dense_h_to_4h",
#       "dense_4h_to_h",
#       "embed_out"
#     ]
# )

# prototype.model = prepare_model_for_kbit_training(
#     prototype.model, use_gradient_checkpointing=True
# )

# prototype.model = get_peft_model(prototype.model, peft_config)

# prototype.model.print_trainable_parameters()
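
For a sense of why LoRA is called parameter-efficient: a rank-`r` adapter on a `d_in x d_out` weight adds `r * (d_in + d_out)` parameters, and the low-rank update is scaled by `lora_alpha / r`. The sketch below applies that arithmetic to a hypothetical 768 x 768 projection. The 768 is borrowed from `n_embd` above; the actual shapes of the targeted layers are not stated in this notebook.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


r, lora_alpha = 4, 16   # values from the commented-out LoraConfig
d_in = d_out = 768      # hypothetical layer shape, for illustration only

added = lora_param_count(d_in, d_out, r)
full = d_in * d_out
scaling = lora_alpha / r

print(added, full, scaling)  # 6144 589824 4.0
```

This count ignores bias terms, which `bias="all"` would also mark as trainable.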

## Training

Finally, we train the model with the settings defined above.

In [None]:
# Train the model

# Put the model and all of its submodules into training mode
prototype.model.train()

prototype.train(
    devices="auto",
    strategy="auto",
    streaming_data=[
        {"dataset": "tiiuae/falcon-refinedweb", "content_key": "content", "padding": "max_length"}
    ],
    output_dir="/kaggle/working/trained",
    batch_size=4,
    gradient_accumulation_steps=64,
    block_size=512,
    num_steps=10000,
    warmup_steps=100,
    optimizer="Lion",
    learning_rate=0.000333,
    weight_decay=0.01,
    gradient_clip_val=1.0,
    scheduler="cosine",
    generate_every=500,
    save_every=1000,
)
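
The training arguments imply a token budget that is easy to work out by hand. The sketch below assumes a single device and that `num_steps` counts optimizer steps; neither is confirmed by this notebook. It also shows one common form of linear warmup followed by cosine decay. The exact schedule AIGen applies for `scheduler="cosine"` may differ, and `lr_at` is a hypothetical helper.

```python
import math

batch_size = 4
gradient_accumulation_steps = 64
block_size = 512
num_steps = 10000
warmup_steps = 100
learning_rate = 0.000333

# Sequences and tokens consumed per optimizer step (single-device assumption).
sequences_per_step = batch_size * gradient_accumulation_steps
tokens_per_step = sequences_per_step * block_size
total_tokens = tokens_per_step * num_steps


def lr_at(step):
    """Linear warmup to the peak rate, then cosine decay toward zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, num_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(sequences_per_step, tokens_per_step, total_tokens)
print(lr_at(0), lr_at(warmup_steps), lr_at(num_steps))
```

Under these assumptions each optimizer step sees 256 sequences, or 131,072 tokens, for roughly 1.3 billion tokens over the full run.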

## Testing

For testing via this notebook, we just run an interactive inference session.

In [None]:
# Test inference

while True:
    print("PROMPT:\n")
    prompt = input()
    completion = prototype.generate(
        prompt=prompt,
        do_sample=True,
        min_length=23,
        max_new_tokens=111,
        temperature=0.9,
        eta_cutoff=0.0003,
        penalty_alpha=0.6,
        top_k=4,
        repetition_penalty=1.023,
        no_repeat_ngram_size=13,
        renormalize_logits=True,
        remove_invalid_values=True,
        max_time=60,
        use_cache=True,
    )
    print("COMPLETION:\n")
    print(completion)
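
Two of the sampling knobs above, `temperature` and `top_k`, are easy to illustrate in isolation. This toy sketch works on a hand-written list of logits instead of the model's output, and it ignores the other arguments passed to `generate`; `top_k_probs` is a hypothetical helper.

```python
import math


def top_k_probs(logits, temperature, top_k):
    """Scale logits by temperature, keep the top_k largest, and renormalize."""
    scaled = [x / temperature for x in logits]
    # Indices of the top_k largest scaled logits.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    keep = set(order[:top_k])
    peak = max(scaled[i] for i in keep)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [
        math.exp(scaled[i] - peak) if i in keep else 0.0
        for i in range(len(scaled))
    ]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.5, 0.3, -1.0, 0.9, -0.5]  # made-up values
probs = top_k_probs(logits, temperature=0.9, top_k=4)
print([round(p, 3) for p in probs])
```

With `top_k=4`, only the four highest-scoring tokens keep a nonzero probability. A temperature below 1 sharpens the distribution among them.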