Cerebras GPT

In [1]:
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-111M")
# GPT-2 style tokenizers ship without a pad token. Reuse EOS as the pad token
# so that no new token id is added beyond the model's 50257-entry embedding table.
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-111M",
                                max_length = 2048, max_position_embeddings = 2048,
                                    ignore_mismatched_sizes = True)

In [2]:
model.config

GPT2Config {
  "_name_or_path": "cerebras/Cerebras-GPT-111M",
  "activation_function": "gelu",
  "attn_pdrop": 0.0,
  "bos_token_id": 50256,
  "embd_pdrop": 0.0,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "max_length": 2048,
  "model_type": "gpt2",
  "n_embd": 768,
  "n_head": 12,
  "n_inner": 3072,
  "n_layer": 10,
  "n_positions": 2048,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.0,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "transformers_version": "4.28.0.dev0",
  "use_cache": true,
  "vocab_size": 50257
}
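
As a sanity check on the "111M" in the model name, the parameter count can be estimated from the configuration printed above. This is a rough sketch that assumes the standard GPT-2 layout (learned position embeddings, a bias on every projection, and an output head tied to the token embeddings); the helper name `gpt2_param_count` is made up for this illustration.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_inner, n_layer):
    """Estimate the parameter count of a GPT-2 style model (tied LM head assumed)."""
    embeddings = vocab_size * n_embd + n_positions * n_embd  # wte + wpe
    layer_norm = 2 * n_embd                                  # weight + bias
    # Attention: fused q/k/v projection (c_attn) plus output projection (c_proj)
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expansion (c_fc) plus contraction (c_proj)
    mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp
    return embeddings + n_layer * block + layer_norm         # last term is ln_f

total = gpt2_param_count(
    vocab_size=50257, n_positions=2048, n_embd=768, n_inner=3072, n_layer=10
)
print(f"{total:,}")  # 111,050,496
```

Roughly a third of the total sits in the token embedding table, which is typical for a model this small.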

In [3]:
import os

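# Make CUDA kernel launches synchronous so errors surface at the failing call
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
# Note: device-side assertions are normally enabled by building PyTorch with
# TORCH_USE_CUDA_DSA, so setting it at runtime may have no effect.
os.environ['TORCH_USE_CUDA_DSA'] = '1'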

In [4]:
device_index = 0 # index of the GPU device you want to use
device = torch.device('cuda', device_index)
device

device(type='cuda', index=0)

In [5]:
# Set the seed for reproducibility
torch.manual_seed(0)
if device.type == 'cuda':
    torch.cuda.manual_seed_all(0)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

In [6]:
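# Cap the allocator's split size at 128 MB to reduce memory fragmentation.
# This is configured through PYTORCH_CUDA_ALLOC_CONF and must be set before the
# first CUDA allocation; torch.backends.cuda has no max_split_size_bytes setting.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'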

In [7]:
model = model.to(device)
model.device

device(type='cuda', index=0)

In [8]:
# prompt = 'How many enterprise Tactics IDs are in the Mitre Att&ck v13 matrix released in April 2023?\n\n Assistant:'
# prompt = """ 
# Explain the following payload. Payload is GET /jenkins/postnuke/index.php?module=My_eGallery&do=showpic&pid=-1/**/AND/**/1=2/**/UNION/**/ALL/**/SELECT/**/0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,concat(0x3C7230783E,pn_uname,0x3a,pn_pass,0x3C7230783E),0,0,0/**/FROM/**/md_users/**/WHERE/**/pn_uid=$id/* HTTP/1.1 Content-Length: 36 Content-Type: application/x-www-form-urlencoded Host: www.test.go.kr Connection: Keep-Alive User-Agent: Mozilla/5.00 (Nikto/2.1.6) (Evasions:None).\n\nAssistant:
# """
prompt = 'User: What is xss attack?\n\nAssistant: '

In [9]:
from transformers import pipeline

# pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device = 0)

generated_text = pipe(prompt, max_length=256, do_sample=False, no_repeat_ngram_size=2)[0]
print(generated_text['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


User: What is xss attack?

Assistant: 
   - I want to know what is the problem.
- I have a question about the xs attack.  I don't know how to solve it. I know I can't find a solution. But I'm not sure how I could solve this problem, I just want it to work.
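
The `no_repeat_ngram_size=2` argument in the pipeline call above prevents any 2-gram from appearing twice in the output. The idea can be sketched in plain Python as follows. This is a simplified illustration operating on a list of token ids, not the transformers implementation, and `banned_next_tokens` is a hypothetical helper name.

```python
def banned_next_tokens(tokens, n):
    """Return the token ids that would complete an n-gram already present in `tokens`."""
    # The last n-1 tokens form the prefix of the n-gram about to be completed
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same prefix, ban its final token
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

history = [5, 7, 9, 5]  # toy token ids; the 2-grams so far are (5,7), (7,9), (9,5)
print(banned_next_tokens(history, n=2))  # {7}
```

During greedy decoding the banned ids would have their scores set to negative infinity before the next token is picked, which explains why the output above avoids exact repeats while still circling around the same topic.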




PyTorch-based inference

In [10]:
# Encode the prompt using the tokenizer
input_ids = tokenizer.encode(prompt, return_tensors="pt")

In [11]:
# Generate text based on the encoded prompt
with torch.no_grad():
    output = model.generate(
        # input_ids=input_ids,
        input_ids=input_ids.to(device),

        do_sample=True,
        top_p=0.75,
        top_k=85,
        temperature=1.99,
        typical_p=1,
        repetition_penalty=1.3,
        max_length=2048,  # Maximum total length: prompt tokens plus generated tokens
        num_beams=5,    # Number of beams; combined with do_sample=True this is beam-search sampling
#         early_stopping=True,  # Stop beam search once num_beams finished candidates exist
    )

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
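
To make the roles of `temperature`, `top_k`, and `top_p` in the call above more concrete, here is a minimal pure-Python sketch of how such filters reshape a next-token distribution. It is a simplification under stated assumptions: the order of operations (temperature, then top-k, then top-p) and the handling of ties may differ from what transformers does internally, and `filter_distribution` and the toy logits are invented for this example.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    scaled = [x / temperature for x in logits]
    # Indices sorted from most to least likely
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k] if top_k > 0 else order
    # Softmax over the surviving tokens (shifted by the max for numerical stability)
    peak = scaled[kept[0]]
    weights = {i: math.exp(scaled[i] - peak) for i in kept}
    norm = sum(weights.values())
    probs = {i: w / norm for i, w in weights.items()}
    # Nucleus: smallest prefix whose cumulative probability reaches top_p
    nucleus, cumulative = [], 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in nucleus)
    return {i: probs[i] / mass for i in nucleus}

toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
cool = filter_distribution(toy_logits, temperature=1.0, top_k=3, top_p=0.75)
hot = filter_distribution(toy_logits, temperature=1.99, top_k=3, top_p=0.75)
print(sorted(cool), sorted(hot))  # [0, 1] [0, 1]
```

A temperature close to 2, as used above, flattens the distribution, so the top candidate loses probability mass to its neighbours. With a 111M-parameter base model this tends to produce fluent but off-topic text.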


In [12]:
# Decode the generated text and print it
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

User: What is xss attack?

Assistant:  
Cards,

Gives me a bit more on how to run the game. In my opinion the game is really going to be extremely easy to find. You are able to do things with very high speed. You can learn how it's going to work if you can't have good control and you will know what to do.

And just to give you all these tips in there are two steps to get the most out of your game.

First: The first step is to put it all together and play it all up in one shot. Next, move it all up in one shot until it becomes a shot. This is exactly the same way as the first step.

So, if you want a little bit more control than a few shots, you will be ready to play the second one. It is also a great idea to put both arms around the ball (not the ball) while playing the third one.

Then, when you have the ball off the ball, you will almost certainly need to move the ball closer to the ball so it doesn't have to jump right in that position.

Then, the ball is now going to come through t

PyTorch-based fine-tuning

In [13]:
import datasets
dataset = datasets.load_dataset('json', data_files='security_base_sample.json', field='train')

Found cached dataset json (/home/ngcsm/.cache/huggingface/datasets/json/default-99d06e9db9babc1b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)



In [14]:
dataset

DatasetDict({
    train: Dataset({
        features: ['input', 'instruction', 'output'],
        num_rows: 90
    })
})

In [15]:
cutoff_len = 2048

def generate_prompt(entry):
    if entry['input']:
        return f"User: {entry['instruction']}: {entry['input']}\n\nAssistant: {entry['output']}"
    else:
        return f"User: {entry['instruction']}\n\nAssistant: {entry['output']}"

def tokenize(item, add_eos_token=True):
    result = tokenizer(
        generate_prompt(item),
        truncation=True,
        max_length=cutoff_len,
        padding=False,
        return_tensors=None,
    )

    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < cutoff_len
        and add_eos_token
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)

    result["labels"] = result["input_ids"].copy()

    return result
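
The prompt template and the EOS rule used by `generate_prompt` and `tokenize` can be tried without loading a tokenizer. The sketch below mirrors that logic on plain Python lists; the sample entry and the token ids are invented for illustration and are not taken from `security_base_sample.json`, and `build_prompt` and `append_eos` are hypothetical helper names.

```python
def build_prompt(entry):
    """Same template as generate_prompt: the optional input is appended after a colon."""
    if entry["input"]:
        return f"User: {entry['instruction']}: {entry['input']}\n\nAssistant: {entry['output']}"
    return f"User: {entry['instruction']}\n\nAssistant: {entry['output']}"

def append_eos(input_ids, eos_token_id, cutoff_len):
    """Append EOS unless it is already present or the sequence was truncated at cutoff_len."""
    ids = list(input_ids)
    if ids[-1] != eos_token_id and len(ids) < cutoff_len:
        ids.append(eos_token_id)
    return ids

# Hypothetical training record, for illustration only
entry = {"instruction": "Explain the payload", "input": "GET /index.php?id=1", "output": "It is a plain request."}
print(build_prompt(entry))

print(append_eos([10, 11, 12], eos_token_id=50256, cutoff_len=2048))  # [10, 11, 12, 50256]
print(append_eos([10, 11, 12], eos_token_id=50256, cutoff_len=3))     # [10, 11, 12]
```

Ending each example with EOS gives the model a signal for where an answer stops. Sequences that were truncated at the cutoff are left without it, since their answer was cut off rather than finished.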

In [16]:
train_val = dataset["train"].train_test_split(test_size=0.05, shuffle=True, seed=595)
train_data = train_val["train"].shuffle().map(tokenize)
val_data = train_val["test"].shuffle().map(tokenize)

Loading cached split indices for dataset at /home/ngcsm/.cache/huggingface/datasets/json/default-99d06e9db9babc1b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-101ac43da4056645.arrow and /home/ngcsm/.cache/huggingface/datasets/json/default-99d06e9db9babc1b/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-0a3e7f6d279aa471.arrow



In [17]:
len(train_data)

85

In [18]:
len(val_data)

5
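
The 85/5 split follows from `test_size=0.05` applied to 90 rows. A small worked calculation, assuming the split rounds the test share up and the train share down (the scikit-learn style convention, which `datasets` appears to follow), and using a hypothetical helper `split_sizes`:

```python
import math

def split_sizes(n_rows, test_size):
    """Row counts for a fractional split: test share rounded up, train share rounded down."""
    n_test = math.ceil(test_size * n_rows)
    n_train = math.floor((1 - test_size) * n_rows)
    return n_train, n_test

print(split_sizes(90, 0.05))  # (85, 5)
```

With only 5 validation rows, the validation loss will be a very noisy estimate.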

In [None]:
if 'model' in globals(): 
    del model
    # torch.cuda.empty_cache()

model = transformers.AutoModelForCausalLM.from_pretrained(
    'cerebras/Cerebras-GPT-111M',    
    
    load_in_8bit=True,
    torch_dtype=torch.float16,

    device_map={'': 0}
    # device = torch.device("cpu")
    # device_map = 'auto'
)

In [None]:
import peft

model = peft.prepare_model_for_int8_training(model)

model = peft.get_peft_model(model, peft.LoraConfig(
    r=8,
    lora_alpha=16,
    # target_modules=["q_proj", "v_proj"],
    target_modules=["c_attn"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
))
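
For a sense of scale, the number of trainable LoRA weights implied by `r=8` on `c_attn` can be worked out by hand. This sketch assumes each adapted layer gets two low-rank matrices of shape `r x in_features` and `out_features x r`, that `c_attn` maps 768 to 2304 features in each of the 10 blocks, and a nominal base size of 111M parameters; `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters in one LoRA adapter: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r

n_embd, n_layer, r = 768, 10, 8
per_layer = lora_param_count(n_embd, 3 * n_embd, r)  # c_attn: 768 -> 2304
trainable = n_layer * per_layer
base = 111_000_000  # nominal size of the frozen base model (assumption)

print(trainable)  # 245760
print(f"{100 * trainable / (base + trainable):.2f}% of all parameters")
```

Under these assumptions only about a quarter of a million weights receive gradients, which is why the adapter trains quickly and the saved adapter is small.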


In [21]:
import os
import wandb 

output_dir = 'cerebras-gpt111m-finetune'

use_wandb = True
wandb_run_name = f"{output_dir}-{wandb.util.generate_id()}"

# set the wandb project where this run will be logged
os.environ["WANDB_PROJECT"]=output_dir

# save your trained model checkpoint to wandb
os.environ["WANDB_LOG_MODEL"]="true"

# turn off watch to log faster
os.environ["WANDB_WATCH"]="false"

In [23]:
training_args = transformers.TrainingArguments(
    # per_device_train_batch_size=16, 
    per_device_train_batch_size=1, 

    gradient_accumulation_steps=8,  
    # An error occurs when the number of training epochs is 20 or more!
    num_train_epochs=19,  
    learning_rate=1e-4, 
    # fp16 mixed precision can only be used on CUDA devices.
    fp16=True,
    
    optim="adamw_torch",
    logging_steps=10, 
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=200,
    save_steps=200,
    output_dir=output_dir, 
    save_total_limit=3,

    report_to="wandb" if use_wandb else "none",
    run_name=wandb_run_name if use_wandb else None,    
)
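
The batch size, gradient accumulation, and epoch settings above determine how many optimizer steps the run performs. The following back-of-the-envelope calculation assumes a single GPU, the 85 training rows from the split, and that the Trainer floors the number of updates per epoch; the variable names are illustrative.

```python
import math

n_train = 85            # rows in train_data
per_device_batch = 1    # per_device_train_batch_size
grad_accum = 8          # gradient_accumulation_steps
epochs = 19             # num_train_epochs
eval_steps = 200        # eval_steps / save_steps

batches_per_epoch = math.ceil(n_train / per_device_batch)    # 85
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)  # 10
total_updates = updates_per_epoch * epochs                   # 190
# Epoch counter as a fraction of the data actually consumed
reported_epoch = total_updates * grad_accum * per_device_batch / n_train

print(total_updates, round(reported_epoch, 2))  # 190 17.88
print(total_updates >= eval_steps)              # False
```

These numbers agree with the `train/global_step` of 190 and `train/epoch` of 17.88 in the run summary further down. Because 190 is below `eval_steps=200`, step-based evaluation and checkpointing would never trigger in this run, which may explain the empty validation table.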

In [24]:
model.device

device(type='cuda', index=0)

In [25]:
trainer = transformers.Trainer(
    # model=model, 
    model=model.to(device), 

    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_args, 
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=1, return_tensors="pt", padding=True
    )
)
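
The collator's job is to pad variable-length examples to a common length. A plain-Python sketch of that step is shown below, assuming right-side padding and the conventional label padding value of -100, which the loss ignores. `pad_batch` and the toy features are made up for this example and do not reproduce the `DataCollatorForSeq2Seq` code.

```python
def pad_batch(features, pad_token_id, label_pad_token_id=-100):
    """Right-pad every feature in the batch to the length of the longest one."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        gap = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * gap)
        batch["attention_mask"].append(f["attention_mask"] + [0] * gap)  # 0 = ignore
        batch["labels"].append(f["labels"] + [label_pad_token_id] * gap)
    return batch

# Two toy examples of different lengths
features = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "labels": [1, 2, 3]},
    {"input_ids": [4, 5], "attention_mask": [1, 1], "labels": [4, 5]},
]
batch = pad_batch(features, pad_token_id=50256)
print(batch["input_ids"])  # [[1, 2, 3], [4, 5, 50256]]
print(batch["labels"])     # [[1, 2, 3], [4, 5, -100]]
```

With `per_device_train_batch_size=1`, as configured in this notebook, every batch holds a single example, so no padding is actually added during training.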

In [26]:
model.device

device(type='cuda', index=0)

In [27]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [28]:
model.config.use_cache = False

result = trainer.train()
model.save_pretrained(output_dir)
# tokenizer.save_pretrained(output_dir)  # also save the tokenizer

wandb.finish()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mkhuam1216[0m ([33mngcsm[0m). Use [1m`wandb login --relogin`[0m to force relogin


You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss


Run history:
train/epoch                      ▁▁▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇███
train/global_step                ▁▁▂▂▃▃▃▄▄▅▅▅▆▆▆▇▇███
train/learning_rate              ██▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▁▁
train/loss                       █▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos                 ▁
train/train_loss                 ▁
train/train_runtime              ▁
train/train_samples_per_second   ▁
train/train_steps_per_second     ▁

Run summary:
train/epoch                      17.88
train/global_step                190.0
train/learning_rate              0.0
train/loss                       0.0342
train/total_flos                 148491301109760.0
train/train_loss                 0.3358
train/train_runtime              83.5317
train/train_samples_per_second   19.334
train/train_steps_per_second     2.275


PyTorch-based fine-tuned model loading & inference

In [29]:
model.config
print(model.dtype)

model.half()
# model.eval()

torch.float32


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(2048, 768)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-9): 10 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): GELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [30]:
# Load the saved model
output_dir = 'cerebras-gpt111m-finetune'

model = transformers.AutoModelForCausalLM.from_pretrained(output_dir,
            load_in_8bit=True, 
            torch_dtype=torch.float16,
            device_map={'':0} if torch.cuda.is_available() else 'auto'
            )

# tokenizer = transformers.AutoTokenizer.from_pretrained(output_dir)

model.half()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(2048, 768)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-9): 10 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): GELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [31]:
# prompt = 'What is XSS (Cross Site Scripting) attack?'
prompt = 'How many enterprise Tactics IDs are in the Mitre Att&ck v13 matrix released in April 2023?'
# prompt = """ 
#Explain the following payload. Payload is GET /jenkins/postnuke/index.php?module=My_eGallery&do=showpic&pid=-1/**/AND/**/1=2/**/UNION/**/ALL/**/SELECT/**/0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,concat(0x3C7230783E,pn_uname,0x3a,pn_pass,0x3C7230783E),0,0,0/**/FROM/**/md_users/**/WHERE/**/pn_uid=$id/* HTTP/1.1 Content-Length: 36 Content-Type: application/x-www-form-urlencoded Host: www.test.go.kr Connection: Keep-Alive User-Agent: Mozilla/5.00 (Nikto/2.1.6) (Evasions:None).\n\n Assistant:
# """

# prompt = 'What is TA0006 (Credential Access)'

In [32]:
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(model.device)
# input_ids = inputs["input_ids"]

In [33]:
generation_config = transformers.GenerationConfig(
    max_new_tokens=100,
    temperature=0.2,
    top_p=0.75,
    top_k=50,
    repetition_penalty=1.2,
    do_sample=True,
    early_stopping=True,
    # num_beams=5,
    
    pad_token_id=model.config.pad_token_id,
    eos_token_id=model.config.eos_token_id,
)

In [34]:
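with torch.no_grad():
    output = model.generate(
        input_ids=input_ids,  # already moved to model.device above
        # Use the tokenizer's attention mask instead of a hand-built one
        attention_mask=inputs["attention_mask"].to(model.device),
        generation_config=generation_config
    )[0]  # generate() returns a tensor on the model's device already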

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [35]:
result = tokenizer.decode(output, skip_special_tokens=True).strip()
print(result)

How many enterprise Tactics IDs are in the Mitre Att&ck v13 matrix released in April 2023?

Assistant: as of the mitre att&ck v13 framework, there are 14 enterprise tactics in the matrix. these tactics represent the various phases of a cyber attack life cycle. each tactic consists of multiple techniques that adversaries use to achieve their objectives.
