# Quickstart (Foundation-Sec-8B)

This notebook demonstrates how to download Foundation AI's base model, Foundation-Sec-8B, from Hugging Face and run an initial inference as a starting point. <br>
If you’re interested in more detailed cybersecurity [use cases](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/2_examples) or [adoptions](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/3_adoptions) (e.g., fine-tuning), please refer to the corresponding sections.

## Notes
**Foundation-Sec-8B** is a completion model, meaning it generates additional text that is likely to follow a given prompt. The model is ideal for generating short responses (e.g., words or brief sentences) and serves as a base model for further fine-tuning. However, if you need a model to answer questions, perform reasoning, or generate full texts (e.g., security reports), it is recommended to use a reasoning model or an instruction-based model.

Please also refer to [the model card](https://huggingface.co/fdtn-ai/Foundation-Sec-8B).

## Setup
We recommend running the scripts with NVIDIA GPU(s) for optimal performance. <br>
While the code should work with both single and multiple GPUs, unexpected issues may arise with multiple GPUs. In such cases, minor code adjustments or limiting usage to one GPU (e.g., by setting CUDA_VISIBLE_DEVICES='0') might be necessary.
<br> Ensure a minimum of 20 GB of available storage and memory for the model.
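
As a rough pre-flight check for the notes above, the sketch below pins the process to a single GPU and verifies free disk space. The helper names `pin_to_single_gpu` and `has_free_disk` are hypothetical, and the 20 GB threshold simply mirrors the figure quoted above. It also assumes the model will be cached on the filesystem that holds the checked path, which may not match your Hugging Face cache location.

```python
import os
import shutil


def pin_to_single_gpu(index: int = 0) -> str:
    """Expose only one GPU to this process via CUDA_VISIBLE_DEVICES.

    Call this before torch initializes CUDA; changing the variable
    afterwards does not affect an already-initialized process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


def has_free_disk(path: str = ".", min_gb: float = 20.0) -> bool:
    """Return True if the filesystem holding `path` has at least `min_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= min_gb


pin_to_single_gpu(0)
print("enough disk space:", has_free_disk("."))
```

Running this cell first, before the imports below, keeps the GPU restriction in effect for the rest of the notebook.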

### Want to try models on laptops (CPUs)?
Our quantized models are optimized to run efficiently on laptop CPUs. <br> For detailed information and sample code, refer to [the quantization section](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/3_adoptions/quantization).

In [1]:
import transformers
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print("device:", DEVICE)

device: cuda


In [2]:
MODEL_ID = "fdtn-ai/Foundation-Sec-8B"

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path = MODEL_ID,
    device_map = "auto",
    torch_dtype=torch.bfloat16, # Foundation-Sec-8B's tensor_type is BF16
)


### Configurations
You can adjust the model's text generation behavior by tuning its generation arguments. <br>
Below is an example configuration that uses greedy decoding (`do_sample=False`) so that outputs are reproducible. <br>
For a complete list of arguments and detailed explanations, refer to the [text generation documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation).

In [3]:
generation_args = {
    "max_new_tokens": 256,
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
    "use_cache": True,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
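
If varied outputs matter more than reproducibility, a sampling configuration could look like the sketch below. The values `temperature=0.7` and `top_p=0.9` are illustrative assumptions, not recommended settings for this model, and the helper `make_sampling_args` is hypothetical. `do_sample`, `temperature`, and `top_p` are standard `generate()` arguments.

```python
def make_sampling_args(base_args: dict, temperature: float = 0.7, top_p: float = 0.9) -> dict:
    """Return a copy of `base_args` switched from greedy decoding to sampling."""
    args = dict(base_args)  # copy so the greedy configuration stays untouched
    args["do_sample"] = True
    args["temperature"] = temperature
    args["top_p"] = top_p
    return args


# Stand-in for the greedy configuration above (token IDs omitted for brevity).
greedy_args = {
    "max_new_tokens": 256,
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
}
sampling_args = make_sampling_args(greedy_args)
print(sampling_args)
```

With sampling enabled, outputs differ between runs unless a seed is fixed first, for example with `transformers.set_seed(42)`.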

## Inference

To perform inference from a remote endpoint, deploy the models on a host server. <br>
We’ve provided sample code for deploying on Amazon SageMaker AI and performing inference. Check it out [here](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/3_adoptions/deployment/sagemaker).

In [4]:
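def inference(prompt):
    """Generate a completion for `prompt` and return the prompt plus the generated text."""
    # Move the inputs to the device that holds the model's first layer
    # (with device_map="auto" this may differ from a hard-coded device).
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Gradients are not needed for inference.
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            **generation_args,
        )
    # outputs[0] contains the prompt tokens followed by the newly generated tokens.
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response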

Our base model is supported by the Hugging Face Inference Provider (powered by [Featherless.ai](https://featherless.ai/)). <br>
You can run inference directly against the endpoint, eliminating the need to prepare computing resources or download the model.

Note that you'll need a Hugging Face token (saved locally as `HF_TOKEN`) and will incur costs. <br>
For more information, refer to the Hugging Face documentation: https://huggingface.co/docs/inference-providers/pricing
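
To fail early with a readable message when the token is missing, a small check along the following lines may help. The helper `get_hf_token` is hypothetical; it only reads the environment variable named above and does not verify that the token is valid.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "create a Hugging Face access token and export it first."
        )
    return token


try:
    get_hf_token()
    print("HF_TOKEN found")
except RuntimeError as err:
    print(err)
```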

In [None]:
generation_args_hf_inference = {
    "max_new_tokens": 256,
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
}

import os
from huggingface_hub import InferenceClient


def hf_inference(prompt, generation_args=generation_args_hf_inference):
    """Run Foundation AI models using the Hugging Face inference provider (featherless-ai)"""

    client = InferenceClient(
        provider="featherless-ai",
        api_key=os.environ["HF_TOKEN"],
    )
    
    completion = client.text_generation(
        model=MODEL_ID,
        prompt=prompt,
        **generation_args,
    )
    
    return prompt + completion

If you want to know what MITRE ATT&CK means, phrase the prompt as the beginning of a sentence for the model to complete, as shown below. <br>
It is not advisable to ask 'What is MITRE ATT&CK?' directly, as this model is not trained to answer questions and may produce unexpected responses. <br>
If you intend to use models this way, consider using a reasoning model or an instruct model, which will be released soon.

In [5]:
print(inference("MITRE ATT&CK is"))

MITRE ATT&CK is a globally accessible knowledge base of adversary tactics and techniques based on real-world observations. The goal is to help cyber defense professionals and tool providers improve their detection and response capabilities by providing them with a common taxonomy for describing how adversaries operate.
The MITRE Corporation, which operates federally funded research centers in the United States, created this framework as part of its work supporting U.S. government cybersecurity initiatives. It was first released publicly in 2015 but has since been adopted widely across industries worldwide due to its comprehensive coverage of various attack vectors used by malicious actors today.
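
One way to apply the advice above is to rewrite simple definitional questions into completion-style stems before calling the model. The sketch is purely illustrative: `to_completion_stem` is a hypothetical helper that handles only "What is X?" and "What are X?" patterns and leaves every other prompt unchanged.

```python
import re

# Matches "What is X?" / "What are X?" (case-insensitive, question mark optional).
_QUESTION = re.compile(r"^\s*what\s+(is|are)\s+(.+?)\s*\?*\s*$", re.IGNORECASE)


def to_completion_stem(prompt: str) -> str:
    """Turn 'What is X?' into 'X is'; return other prompts unchanged."""
    match = _QUESTION.match(prompt)
    if match is None:
        return prompt
    verb, subject = match.group(1).lower(), match.group(2)
    return f"{subject} {verb}"


print(to_completion_stem("What is MITRE ATT&CK?"))
```

The resulting stem can then be passed to `inference` in place of the original question.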


### Few-shot inference
If you want the model to respond in a specific format, you can use a few-shot approach. <br>
For instance, to map a CVE ID to its CWE ID, provide five example CVE descriptions with their CWE IDs, followed by the CVE you want to query.

In [6]:
prompt="""CVE-2021-44228 is a remote code execution flaw in Apache Log4j2 via unsafe JNDI lookups (“Log4Shell”). The CWE is CWE-502.

CVE-2017-0144 is a remote code execution vulnerability in Microsoft’s SMBv1 server (“EternalBlue”) due to a buffer overflow. The CWE is CWE-119.

CVE-2014-0160 is an information-disclosure bug in OpenSSL’s heartbeat extension (“Heartbleed”) causing out-of-bounds reads. The CWE is CWE-125.

CVE-2017-5638 is a remote code execution issue in Apache Struts 2’s Jakarta Multipart parser stemming from improper input validation of the Content-Type header. The CWE is CWE-20.

CVE-2019-0708 is a remote code execution vulnerability in Microsoft’s Remote Desktop Services (“BlueKeep”) triggered by a use-after-free. The CWE is CWE-416.

CVE-2015-10011 is a vulnerability about OpenDNS OpenResolve improper log output neutralization. The CWE is"""

# Update the max_new_tokens to 3, as we just want to get CWE-ID.
generation_args["max_new_tokens"] = 3
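
The few-shot prompt above could also be assembled from a list of examples, which makes it easier to swap examples in and out. This is a minimal sketch: `build_few_shot_prompt` is a hypothetical helper, and it reuses two of the example descriptions from the cell above (with straight quotes) rather than introducing new data.

```python
def build_few_shot_prompt(examples, query_description):
    """Join solved examples and one open query into a single completion prompt.

    examples: list of (description, cwe_id) tuples
    query_description: description of the CVE whose CWE should be predicted
    """
    shots = [f"{description} The CWE is {cwe_id}." for description, cwe_id in examples]
    # The final line stops right before the answer so the model completes it.
    shots.append(f"{query_description} The CWE is")
    return "\n\n".join(shots)


examples = [
    (
        'CVE-2014-0160 is an information-disclosure bug in OpenSSL\'s heartbeat '
        'extension ("Heartbleed") causing out-of-bounds reads.',
        "CWE-125",
    ),
    (
        'CVE-2019-0708 is a remote code execution vulnerability in Microsoft\'s '
        'Remote Desktop Services ("BlueKeep") triggered by a use-after-free.',
        "CWE-416",
    ),
]
few_shot_prompt = build_few_shot_prompt(
    examples,
    "CVE-2015-10011 is a vulnerability about OpenDNS OpenResolve improper log output neutralization.",
)
print(few_shot_prompt)
```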

In [7]:
response = inference(prompt)

# Remove the prompt from the response, as it is redundant
response = response.replace(prompt, "").strip()
print(response)

CWE-117
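
Because `response.replace(prompt, "")` depends on the decoded text reproducing the prompt exactly, a slightly more defensive post-processing step might strip the prompt only when it is a prefix and then pull out the ID with a regular expression. The helper `extract_cwe_id` is hypothetical, and it assumes the response either starts with the prompt or contains only the completion.

```python
import re
from typing import Optional


def extract_cwe_id(response: str, prompt: str) -> Optional[str]:
    """Return the first CWE ID found in the generated part of `response`, or None."""
    # Drop the echoed prompt only if the response actually starts with it.
    completion = response[len(prompt):] if response.startswith(prompt) else response
    match = re.search(r"CWE-\d+", completion)
    return match.group(0) if match else None


demo_prompt = "Example description. The CWE is"
print(extract_cwe_id(demo_prompt + " CWE-117", demo_prompt))
```

An alternative is to avoid decoding the prompt at all by slicing the generated tokens before decoding, for example `outputs[0][inputs["input_ids"].shape[1]:]` inside `inference`.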
