# **Module 6:** Generative AI for Cyber Security

## Fine-tuning a Model for Cyber Security

Fine-tuning, in the context of pre-trained language models (including masked language models such as BERT), is the process of taking a pre-trained model and training it further on a smaller, task-specific dataset to adapt it to a downstream task. The idea is to reuse the knowledge learned during the initial pre-training on a large corpus and then adjust the model so that it performs well on the task of interest.

1. **Pre-training on a Large Corpus**

    Initially, the language model is pre-trained on a large and diverse dataset. During this phase, the model learns general language patterns, syntax, and contextual relationships between words.

1. **Task-Specific Data**

    After pre-training, the model is fine-tuned on a smaller dataset that is specific to the task you want the model to perform well on. This dataset is typically labeled and consists of examples relevant to the downstream task.

1. **Architecture and Parameters**

    The architecture of the model remains the same, but the parameters learned during pre-training are further adjusted based on the task-specific data. The fine-tuning process updates the weights of the model to make it more suited for the specific task.


1. **Task-Specific Objective Function**

    The objective function used during fine-tuning is tailored to the downstream task. For example, in classification tasks, the model might be fine-tuned using a cross-entropy loss function.

1. **Learning Rate and Training Hyperparameters**

    Fine-tuning usually uses a smaller learning rate than pre-training, along with carefully chosen hyperparameters such as batch size and number of epochs, so that training on the smaller dataset is effective. This helps prevent overfitting while still letting the model adapt to the specific task.
   
1. **Transfer of Knowledge**

    The knowledge gained during pre-training, such as understanding of language structures and semantics, is transferred to the task-specific model. Fine-tuning allows the model to specialize while retaining much of the general language understanding acquired during pre-training.
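
The steps above can be sketched without any deep learning library. The toy example below is a simplification: it treats a handful of fixed feature vectors as a stand-in for the outputs of a frozen pre-trained encoder and trains only a small softmax classification head with a cross-entropy loss and a modest learning rate. The feature values, labels, and helper names (`softmax`, `cross_entropy`, `train_head`, `predict`) are illustrative assumptions, not part of any real model or library.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[label])

def class_logits(weights, bias, x):
    """Linear classification head: one score per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def train_head(features, labels, num_classes, lr=0.1, epochs=200):
    """Train the head with plain stochastic gradient descent; returns weights, bias, loss history."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for x, y in zip(features, labels):
            probs = softmax(class_logits(weights, bias, x))
            total_loss += cross_entropy(probs, y)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                bias[c] -= lr * grad
        history.append(total_loss / len(features))
    return weights, bias, history

def predict(weights, bias, x):
    """Index of the highest-scoring class."""
    logits = class_logits(weights, bias, x)
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical 2-d "encoder outputs" for benign (0) and malicious (1) log lines.
features = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
labels = [0, 0, 1, 1]

weights, bias, history = train_head(features, labels, num_classes=2)
print(f"average loss, first epoch: {history[0]:.4f}")
print(f"average loss, last epoch:  {history[-1]:.4f}")
print("predictions:", [predict(weights, bias, x) for x in features])
```

In a real fine-tuning run the encoder weights are usually updated as well, but the loss and the update rule follow the same pattern.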

![](https://www.labellerr.com/blog/content/images/2023/08/Fine-tune-example.png)

Fine-tuning is particularly useful when you have a limited amount of task-specific data. By starting with a pre-trained model, you can benefit from the knowledge embedded in the model and fine-tune it to achieve good performance on your specific task, even with a smaller dataset. This approach has been successful in various natural language processing (NLP) tasks, ranging from sentiment analysis to named entity recognition.

## Case Study: SecureBERT

[SecureBERT](https://arxiv.org/pdf/2204.02685) is a domain-specific language model for cybersecurity text, trained on a large amount of in-domain text crawled from online resources.

SecureBERT can be used as the base model for downstream tasks such as text classification, NER, Seq-to-Seq, and QA.

- SecureBERT has demonstrated significantly higher performance in predicting masked words within the text when compared to existing models like RoBERTa (base and large), SciBERT, and SecBERT.
- SecureBERT has also demonstrated promising performance in preserving general English language understanding (representation).

![](https://user-images.githubusercontent.com/46252665/195998237-9bbed621-8002-4287-ac0d-19c4f603d919.png)

In [None]:
#!pip install transformers
#!pip install torch
#!pip install tokenizers

In [94]:
from transformers import RobertaTokenizer, RobertaModel
import torch

tokenizer = RobertaTokenizer.from_pretrained("ehsanaghaei/SecureBERT")
model = RobertaModel.from_pretrained("ehsanaghaei/SecureBERT")

inputs = tokenizer("This is SecureBERT!", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state


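import torch
import transformers
from transformers import RobertaTokenizerFast

# Reload the checkpoint with its masked-language-modeling head, which is what
# fill-in-the-mask prediction needs (a causal-LM head targets next-token prediction).
tokenizer = RobertaTokenizerFast.from_pretrained("ehsanaghaei/SecureBERT")
model = transformers.RobertaForMaskedLM.from_pretrained("ehsanaghaei/SecureBERT")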

def predict_mask(sent, tokenizer, model, topk=10, print_results=True):
    """Return the top-k predicted tokens for every <mask> position in `sent`."""
    # Note: print_results is currently unused; it is kept so existing calls still work.
    token_ids = tokenizer.encode(sent, return_tensors='pt')
    # Positions of all <mask> tokens in the encoded sentence.
    masked_position = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero()
    masked_pos = [mask.item() for mask in masked_position]
    with torch.no_grad():
        output = model(token_ids)

    # output[0] holds the vocabulary logits for each token position.
    logits = output[0].squeeze()

    list_of_list = []
    for mask_index in masked_pos:
        mask_logits = logits[mask_index]
        # Indices of the k highest-scoring vocabulary entries.
        idx = torch.topk(mask_logits, k=topk, dim=0)[1]
        words = [tokenizer.decode(i.item()).strip() for i in idx]
        words = [w.replace(' ', '') for w in words]
        list_of_list.append(words)

    return list_of_list


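from IPython.display import display, HTML

def input_masked_sentence(sentence):
    """Display a masked sentence followed by SecureBERT's top predictions for each mask."""
    def escape_mask(text):
        # Insert a zero-width joiner so the browser does not parse <mask> as an HTML tag.
        return text.replace('<mask>', '<&zwj;mask>')
    display(HTML(f"<b>Input:</b> {escape_mask(sentence)}"))
    for output in predict_mask(sentence, tokenizer, model):
        display(HTML(f"<b>SecureBERT:</b> {' | '.join(output)}"))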

Some weights of the model checkpoint at ehsanaghaei/SecureBERT were not used when initializing RobertaModel: ['lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ehsanaghaei/SecureBERT and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
If you want 

The `<mask>` token is central to masked language models (MLMs) and their pre-training. Masked language modeling is a form of self-supervised learning in which a model is trained to predict missing or masked tokens in a sequence of text. The `<mask>` token marks the positions in the input text where tokens have been hidden during training.

Here's a general overview of how the `<mask>` token works in the context of MLMs:

**Masking during Training:**

During the pre-training phase, a percentage of the tokens in the input text (commonly around 15%) is randomly selected for masking, and the selected tokens are replaced with the `<mask>` token. The model is trained to predict the original identity of the masked tokens from the context provided by the surrounding tokens.

**Objective Function:**

The training objective is typically to maximize the likelihood of the correct tokens at the masked positions. This training process helps the model learn contextual relationships and dependencies between words in a given language.
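
The masking step and its objective can be sketched in a few lines of plain Python. This is a simplified illustration, not the procedure used to train SecureBERT: it replaces every selected token with `<mask>`, whereas real pre-training recipes add refinements (BERT, for instance, sometimes swaps in a random token or leaves the selected token unchanged). The function names `mask_tokens` and `mlm_loss`, and the dictionary format used for predicted probabilities, are assumptions made for this example.

```python
import math
import random

MASK = "<mask>"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly hide tokens.

    Returns (masked_tokens, labels), where labels holds the original token at
    each masked position and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

def mlm_loss(predicted_probs, labels):
    """Average negative log-likelihood of the correct token, over masked positions only.

    predicted_probs is a list with one {token: probability} dict per position.
    """
    losses = [
        -math.log(probs[label])
        for probs, label in zip(predicted_probs, labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0

tokens = "adversaries may inject malicious content into compromised sites".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Minimizing `mlm_loss` is equivalent to maximizing the likelihood of the correct tokens at the masked positions; unmasked positions do not contribute to the loss.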

![](https://user-images.githubusercontent.com/46252665/195998153-f5682f7c-60a8-486d-b2c1-9ef5732c24ba.png)

**Fine-Tuning and Downstream Tasks:**

After pre-training, the model can be fine-tuned on specific downstream tasks (such as text classification or named entity recognition). The knowledge gained during pre-training helps the model perform well on a range of natural language processing (NLP) tasks.

**Prediction during Inference:**

At inference time, the `<mask>` token can be used to predict missing tokens in a given sequence. For example, if you provide a sentence with some tokens replaced by `<mask>`, the model can predict the most likely tokens for those masked positions.

Overall, the `<mask>` token is a key element in the training process of MLMs, enabling them to learn rich representations of language and perform well on a variety of NLP tasks. The most well-known model that uses this approach is BERT (Bidirectional Encoder Representations from Transformers), which writes its mask token as `[MASK]`; RoBERTa-based models such as SecureBERT use `<mask>`.
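
To make the inference step concrete, the sketch below shows how the scores a model produces for one masked position can be turned into a ranked list of candidate tokens, which is the same idea `predict_mask` applies with `torch.topk`. The five-word vocabulary and the logit values are made up for illustration; a real model scores its entire vocabulary.

```python
import math

def top_k_tokens(logits, vocab, k=3):
    """Return the k most likely (token, probability) pairs for one masked position."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # softmax, shifted for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical vocabulary and scores for the mask in "Paris is the <mask> of France."
vocab = ["capital", "city", "heart", "virus", "packet"]
logits = [6.0, 3.5, 2.0, -1.0, -2.0]

for token, prob in top_k_tokens(logits, vocab):
    print(f"{token}: {prob:.3f}")
```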



In [59]:
input_masked_sentence("Adversaries may also compromise sites then include <mask> content designed to collect website authentication cookies from visitors.")

In [60]:
input_masked_sentence("One example of this is MS14-068, which targets <mask> and can be used to forge Kerberos tickets using domain user permissions.")

In [61]:
input_masked_sentence("Paris is the <mask> of France.")

In [62]:
input_masked_sentence("Virus causes <mask>.")

In [63]:
input_masked_sentence("Sending huge amount of packets through network leads to <mask>.")

In [64]:
input_masked_sentence("A <mask> injection occurs when an attacker inserts malicious code into a server")

## Using LLMs as Conversational Models

In [153]:
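# Note: the "conversational" pipeline and the Conversation class were removed from
# recent transformers releases, so this cell assumes an older version that still provides them.
from transformers import pipeline, Conversation
converse = pipeline("conversational", model="facebook/blenderbot-400M-distill")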

A conversational language model is a type of language model designed to generate human-like responses in a dialogue. Such models are commonly built on causal language modeling, in which the model predicts the next token from the tokens that precede it, and they aim to understand and generate text in a way that simulates natural conversation. The main differences between a conversational language model and the masked model used earlier are:

- **Context-Awareness**

    Conversational language models are designed to be context-aware, meaning they consider the preceding dialogue to generate relevant and coherent responses. They often utilize context from the conversation history to understand the user's intent and provide more accurate replies.

- **Sequential Processing**

    Conversational models process input sequentially, considering the conversation history turn by turn. This sequential nature enables them to generate responses based on the evolving context.

- **Long-Term Dependency**

    Effective conversational models need to capture long-term dependencies in the conversation. They should remember important information and references from earlier turns to provide meaningful and contextually appropriate responses.
  
- **User Intent Recognition**

    Understanding user intent is crucial in conversational language models. These models often incorporate intent recognition mechanisms to identify the user's goals, queries, or commands from the input text.

- **Dynamic Context Handling**

    Conversational models dynamically update their understanding of the conversation as new information is provided. This dynamic context handling allows them to adapt to changes in user queries or context shifts during the conversation.

- **Ethical Considerations**

    Conversational language models should be designed with ethical considerations in mind. This includes addressing biases, preventing the generation of inappropriate or harmful content, and ensuring responsible use of the technology.
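
Several of the properties above (context-awareness, sequential processing, and dynamic context handling) come down to how the dialogue history is passed to the model on each turn. The sketch below illustrates one simple way to do that. The `ChatSession` class, the `max_turns` limit, and the `echo_model` placeholder are all hypothetical; `echo_model` only stands in for a real model so the example runs on its own.

```python
class ChatSession:
    """Keeps the running dialogue so each reply can see earlier turns."""

    def __init__(self, respond, system_context="", max_turns=6):
        self.respond = respond              # callable: prompt string -> reply string
        self.system_context = system_context
        self.max_turns = max_turns          # only the most recent turns are sent
        self.history = []                   # list of (speaker, text) tuples

    def build_prompt(self):
        """Join the optional context line and the most recent turns into one prompt."""
        lines = [self.system_context] if self.system_context else []
        lines += [f"{speaker}: {text}" for speaker, text in self.history[-self.max_turns:]]
        return "\n".join(lines)

    def ask(self, user_text):
        """Record the user turn, query the responder, and record its reply."""
        self.history.append(("user", user_text))
        reply = self.respond(self.build_prompt())
        self.history.append(("assistant", reply))
        return reply


def echo_model(prompt):
    """Placeholder responder that reports how many lines of context it received."""
    return f"(saw {len(prompt.splitlines())} lines of context)"


session = ChatSession(echo_model, system_context="Topic: network and cyber security.")
print(session.ask("What is DDoS?"))
print(session.ask("How can we prevent it?"))
```

Because the second question is sent together with the first exchange, a real model behind `respond` would have what it needs to resolve "it" in the follow-up. Capping the history at `max_turns` is one simple way to stay within a model's input limit, at the cost of forgetting older turns.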

![](https://assets-global.website-files.com/6305e5d52c28356b4fe71bac/63f8cfaeb05eed305bbc24f4_Holistic-AI-Figure-1.png)

*Language modeling approaches. (a) Masked language modeling predicts hidden words in the sequence. (b) Causal language modeling predicts the next word in the sequence.*
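
One way to see the difference between the two approaches is through the attention pattern each typically uses and the training pairs a causal model learns from. The sketch below is illustrative only; the helper names (`bidirectional_mask`, `causal_mask`, `next_token_pairs`) are assumptions made for this example, with `1` meaning "may attend" and `0` meaning "blocked".

```python
def bidirectional_mask(n):
    """Masked LM style: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """Causal LM style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def next_token_pairs(tokens):
    """Training pairs for causal LM: (tokens seen so far, next token to predict)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = ["DDoS", "attacks", "flood", "servers"]

print("bidirectional:")
for row in bidirectional_mask(len(tokens)):
    print(row)

print("causal:")
for row in causal_mask(len(tokens)):
    print(row)

for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```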

Prominent examples of conversational language models include OpenAI's ChatGPT, which is a powerful language model capable of generating context-aware responses in a conversational setting. These models have found applications in virtual assistants, chatbots, customer support systems, and various other conversational interfaces.

In [154]:
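conversations = []
# Note: every call wraps the question in a new Conversation, so the model does not see earlier turns.
def ask_model(text):
    return converse(Conversation("Under the context of network and cyber security. " + text))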

In [157]:
ask_model("What is DDoS")

In [158]:
ask_model("How can we prevent it?")