# Use Case Description

As a SOC analyst on a security team, you often face an overwhelming volume of security-related text and need a way to quickly identify which items are critical. <br>
With our models, you can classify cybersecurity-related descriptions, such as excerpts from security blogs or security alert messages, into categories like MITRE ATT&CK IDs or CVSS classes.

## Model used for this use case
The base model (Foundation-Sec-8B) is best suited for this use case because the task is classification and the output is a single word. <br>
**These use cases are covered in more detail in the [finetuning section](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/3_adoptions/finetuning), which shows how to improve performance and stability.**

## Setup
The setup scripts below are essentially the same as those in the [Quickstart (Foundation-Sec-8B)](https://github.com/RobustIntelligence/foundation-ai-cookbook/blob/main/1_quickstarts/Quickstart_Foundation-Sec-8B.ipynb).

In [1]:
import os
import transformers
import torch
from IPython.display import display, Markdown

HF_TOKEN = os.getenv("HF_TOKEN")

def _get_device():
    if torch.cuda.is_available():
        return "cuda"
    elif torch.backends.mps.is_available():
        return "mps"
    else:
        return "cpu"

DEVICE = _get_device()

In [2]:
MODEL_ID = "fdtn-ai/Foundation-Sec-8B"

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path = MODEL_ID,
    device_map = "auto",
    torch_dtype=torch.bfloat16, # Foundation-Sec-8B's tensor_type is BF16
)


In [3]:
generation_args = {
    "max_new_tokens": 3, # a few new tokens are enough because the expected output is a short class label
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
    "use_cache": True,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}

In [4]:
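def inference(prompt):
    """Run greedy generation and return the decoded text (prompt plus completion)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            **generation_args,
        )
    # outputs[0] holds the prompt tokens followed by the newly generated tokens,
    # so callers split the decoded string to isolate the answer.
    response = tokenizer.decode(outputs[0], skip_special_tokens = True)
    return response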

## MITRE ATT&CK ID classification

This task involves mapping a given context to the corresponding MITRE ATT&CK technique ID. <br>
To ensure the model responds in the correct format, five example pairs are provided. <br>
The prompt ends with "T", the first character of every technique ID, so the model stays on the expected format and only has to complete the rest of the ID.

**The model is fine-tuned on a complete dataset of context-technique pairs in [finetuning_classification_head.ipynb](https://github.com/RobustIntelligence/foundation-ai-cookbook/blob/main/3_adoptions/finetuning/finetuning_classification_head.ipynb).**
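
The few-shot layout described above (example pairs followed by an open query that ends in `T`) can also be assembled programmatically instead of by hand. The following is a minimal sketch; `build_attack_prompt`, `demo_examples`, and `demo_prompt` are hypothetical names that do not appear in the original notebook.

```python
def build_attack_prompt(examples, query):
    """Assemble a few-shot prompt from (context, technique_id) pairs.

    The final block leaves the answer open and ends with "technique: T",
    so the model only has to continue with the rest of the ID.
    """
    blocks = [f"context: {ctx}\ntechnique: {tid}" for ctx, tid in examples]
    blocks.append(f"context: {query}\ntechnique: T")
    # A leading newline and blank lines between pairs mirror the hand-written prompt.
    return "\n" + "\n\n".join(blocks)


# Hypothetical usage with two of the example pairs from this notebook.
demo_examples = [
    ("Email phishing credential theft", "T1566"),
    ("download and deploy Trickbot on the user's machine", "T1105"),
]
demo_prompt = build_attack_prompt(
    demo_examples,
    "they are served a ZIP archive containing a malicious LNK file.",
)
print(demo_prompt)
```

Keeping the examples in a list makes it easier to swap or reorder them when experimenting with how sensitive the model is to the chosen demonstrations.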

In [5]:
prompt = '''
context: This downloader is unique per system and contains a customized backdoor written in Assembler
technique: T1059

context: This malware was capable of stealing significant system and network information
technique: T1082

context: Email phishing credential theft
technique: T1566

context: they are served a ZIP archive containing a malicious LNK file.
technique: T1204

context: download and deploy Trickbot on the user's machine
technique: T1105

context: POSHSPY's use of WMI to both store and persist the backdoor code makes it nearly invisible to anyone not familiar with the intricacies of WMI.
technique: T'''

In [6]:
output = inference(prompt)
output.split("technique: ")[-1].strip()

'T1047'

The model's output is T1047 (Windows Management Instrumentation), which is the correct answer.
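
Because a base model is free to generate any text, it may be worth checking that the extracted string really looks like a technique ID before using it downstream. The sketch below is one way to do that, under the assumption that only top-level IDs of the form `T` plus four digits are of interest; `extract_technique_id` is a hypothetical helper, not part of the notebook.

```python
import re

# Top-level ATT&CK technique IDs are "T" followed by four digits.
# Any sub-technique suffix such as ".001" is ignored by this pattern.
TECHNIQUE_RE = re.compile(r"T\d{4}")


def extract_technique_id(output):
    """Return the ID after the last "technique: " marker, or None if malformed."""
    tail = output.split("technique: ")[-1].strip()
    match = TECHNIQUE_RE.match(tail)
    return match.group(0) if match else None


# Hypothetical decoded output: prompt text followed by the model's completion.
sample = "context: a\ntechnique: T1059\n\ncontext: b\ntechnique: T1047"
print(extract_technique_id(sample))
```

Returning `None` for malformed output lets the caller decide whether to retry, fall back to a default, or flag the item for manual review.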

## Common Vulnerability Scoring System (CVSS) vector classification

This task involves classifying a given description into the correct label for a given CVSS category. <br>
For example, if the category is Attack Vector, the choices are Network, Adjacent, Local, or Physical; if the category is Integrity Impact, the choices are None, Low, or High.

**The model is fine-tuned on a complete dataset of context-label pairs in [finetuning_causal_ml.ipynb](https://github.com/RobustIntelligence/foundation-ai-cookbook/blob/main/3_adoptions/finetuning/finetuning_causal_ml.ipynb).**
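
Since each category has its own fixed list of choices, the prompt blocks can be generated from a small lookup table. The sketch below assumes the block layout used in this notebook's prompt (`description:`, a `Regarding ...` line, then `My choice:`); the names `CVSS_CHOICES`, `format_choices`, and `build_cvss_block` are hypothetical, and the table only covers categories whose choices are spelled out in this notebook.

```python
# Choice lists per category, copied from the categories used in this notebook.
CVSS_CHOICES = {
    "Attack Vector": ["Network", "Adjacent", "Local", "Physical"],
    "Attack Complexity": ["Low", "High"],
    "Privileges Required": ["None", "Low", "High"],
    "Integrity Impact": ["None", "Low", "High"],
    "Availability Impact": ["None", "Low", "High"],
}


def format_choices(choices):
    """Render ["A", "B", "C"] as "A, B or C"."""
    if len(choices) == 1:
        return choices[0]
    return ", ".join(choices[:-1]) + " or " + choices[-1]


def build_cvss_block(description, category, answer=""):
    """Build one prompt block; leave `answer` empty for the final, open query."""
    question = (
        f"Regarding {category} I will answer only one of the following "
        f"choices in 1 word: {format_choices(CVSS_CHOICES[category])}"
    )
    choice_line = "My choice:" + (f" {answer}" if answer else "")
    return "\n".join([f"description: {description}", question, choice_line])


print(build_cvss_block("Example description.", "Attack Complexity"))
```

Joining several answered blocks and one open block with blank lines reproduces the few-shot structure used for this task.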

In [7]:
prompt = '''
I have a description about a threat intelligence analysis.

description: Cross Site Scripting vulnerability in the input parameter in eyoucms v.1.6.5 allows a remote attacker to run arbitrary code via crafted URL.
Regarding Integrity Impact I will answer only one of the following choices in 1 word: None, Low or High
My choice: Low

description: The EventON WordPress plugin before 2.2 does not sanitise and escape some of its settings, which could allow high privilege users such as admin to perform Stored HTML Injection attacks even when the unfiltered_html capability is disallowed.
Regarding Availability Impact I will answer only one of the following choices in 1 word: None, Low or High
My choice: None

description: A vulnerability, which was classified as critical, was found in Youke365 up to 1.5.3. Affected is an unknown function of the file /app/api/controller/caiji.php of the component Parameter Handler. The manipulation of the argument url leads to server-side request forgery. It is possible to launch the attack remotely. The exploit has been disclosed to the public and may be used. VDB-249870 is the identifier assigned to this vulnerability.
Regarding Attack Vector Impact I will answer only one of the following choices in 1 word: Network, Adjacent, Local or Physical
My choice: Network

description: ASQL injection vulnerability in EmpireCMS v7.5, allows remote attackers to execute arbitrary code and obtain sensitive information via the DoExecSql function.
Regarding Confidentiality Impact I will answer only one of the following choices in 1 word: Low or High
My choice: High

description: IBM WebSphere Application Server Liberty 17.0.0.3 through 24.0.0.4 is vulnerable to a denial of service, caused by sending a specially crafted request. A remote attacker could exploit this vulnerability to cause the server to consume memory resources.  IBM X-Force ID:  280400.
Regarding Privileges Required I will answer only one of the following choices in 1 word: None, Low or High
My choice: None

description: jshERP v3.3 is vulnerable to Arbitrary File Upload. The jshERP-boot/systemConfig/upload interface does not check the uploaded file type, and the biz parameter can be spliced into the upload path, resulting in arbitrary file uploads with controllable paths.
Regarding Attack Complexity I will answer only one of the following choices in 1 word: Low or High
My choice:'''

In [8]:
output = inference(prompt)
output.split("My choice:")[-1].strip()

'Low'

The model's output is Low, which is the correct answer.
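
With up to three new tokens, the generated text can run past the label (for example into a newline or the start of another block), so it can help to map the raw completion back onto the allowed choices. This is a minimal sketch under that assumption; `parse_choice` is a hypothetical helper and not part of the notebook.

```python
def parse_choice(output, choices):
    """Return the allowed choice that the final answer starts with, else None."""
    tail = output.split("My choice:")[-1].strip().lower()
    for choice in choices:
        if tail.startswith(choice.lower()):
            return choice
    return None


# Hypothetical decoded output where the completion runs past the label.
sample_output = "description: a\nMy choice: None\n\ndescription: b\nMy choice: Low\n\ndescription"
print(parse_choice(sample_output, ["Low", "High"]))
```

A `None` result signals that the model answered outside the allowed set, which is a useful condition to count when comparing the base model against a fine-tuned one.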