# [Preview] Quickstart (Reasoning Model)

This notebook demonstrates how to download Foundation AI's reasoning model from Hugging Face and run an initial inference as a starting point. <br>
If you’re interested in more detailed cybersecurity [use cases](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/2_examples) or [adoptions](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/3_adoptions), please refer to the corresponding sections.

## Notes
This is a reasoning model, designed to tackle complex tasks that require multi-step reasoning and explicit logical thinking. It is particularly effective for tasks such as automated red-teaming and advanced incident investigation.

## Setup
We recommend running the scripts with NVIDIA GPU(s) for optimal performance. <br>
While the code should work with both single and multiple GPUs, unexpected issues may arise with multiple GPUs. In such cases, minor code adjustments or limiting usage to one GPU (e.g., by setting CUDA_VISIBLE_DEVICES='0') might be necessary.
<br> Ensure a minimum of 40 GB of available storage and memory for the model.
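
The setup notes above can be checked before loading anything. The snippet below is a minimal sketch, not part of the original notebook: it pins the process to one GPU and warns when free disk space is below 40 GB. The GPU index `"0"`, the helper name `free_disk_gb`, and the checked path `"."` are assumptions. Point the path at your Hugging Face cache directory if it lives on a different volume.

```python
import os
import shutil

# Assumption: GPU index "0" is the one to use. Set this before importing torch,
# otherwise CUDA may already have enumerated every device.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

REQUIRED_GB = 40  # storage guidance from the Setup section


def free_disk_gb(path="."):
    """Return the free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


# Hypothetical check location; the model is usually stored in the Hugging Face cache.
if free_disk_gb(".") < REQUIRED_GB:
    print(f"Warning: less than {REQUIRED_GB} GB free; the download may fail.")
```

`setdefault` leaves any value you already exported untouched, so this does not override an existing multi-GPU configuration.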

## Important
Since the model is in preview mode, you'll need to log in to your authorized Hugging Face account and use the correct token. <br> 
If you have any questions, please contact our team.

In [1]:
import os

# Export your Hugging Face token as the HF_TOKEN environment variable before running this cell
HF_TOKEN = os.getenv("HF_TOKEN")
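
Because `os.getenv` returns `None` when the variable is unset, a missing token only surfaces later as a download error. The sketch below fails early instead. The helper name `require_hf_token` is hypothetical and not part of the original notebook.

```python
import os


def require_hf_token(env=os.environ):
    """Return the token stored in HF_TOKEN, or raise if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before loading the preview model."
        )
    return token
```

With this helper, the cell above could read `HF_TOKEN = require_hf_token()`.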

In [2]:
import transformers
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print("device:", DEVICE)

device: cuda


In [3]:
MODEL_ID = "" # To be replaced with the final model name

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=HF_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=MODEL_ID,
    device_map="auto",
    torch_dtype=torch.float32, # this model's tensor_type is float32
    token=HF_TOKEN,
)

Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]

### Configurations
You can adjust the model's text generation behavior by tuning its arguments. <br>
Below is an example configuration to ensure reproducible outputs. <br>
For a complete list of arguments and detailed explanations, refer to the [text generation documentation](https://huggingface.co/docs/transformers/en/main_classes/text_generation).

In [4]:
generation_args = {
    "max_new_tokens": 1024,
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
    "use_cache": True,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
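
The configuration above uses greedy decoding (`do_sample=False`), which is why its outputs are reproducible. If you want more varied outputs, one option is to derive a sampling variant from it. This is a minimal sketch: the helper `with_sampling` is hypothetical, and the `temperature` and `top_p` values are illustrative rather than recommendations for this model. `example_args` stands in for `generation_args` so the snippet runs on its own.

```python
def with_sampling(base_args, temperature=0.7, top_p=0.9):
    """Return a copy of `base_args` switched from greedy decoding to sampling."""
    args = dict(base_args)  # copy so the original configuration is not modified
    args.update({"do_sample": True, "temperature": temperature, "top_p": top_p})
    return args


# Stand-in for the generation_args dictionary defined in the notebook
example_args = {"max_new_tokens": 1024, "temperature": None, "do_sample": False}
sampling_args = with_sampling(example_args)
```

Sampled outputs vary from run to run. Calling `transformers.set_seed` before generation is one way to make them repeatable.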

## Inference

In [5]:
import re

def inference(prompt):
    messages = [
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Appending the <think> token is essential for the reasoning model.
    inputs += "<think>\n"
    
    inputs = tokenizer(inputs, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            **generation_args,
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens = False)

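    # Keep the generated text between <think> and the end-of-text token.
    match = re.search(r"<think>(.*?)<\|end_of_text\|>", response, re.DOTALL)
    if match is None:
        # No end-of-text token (e.g., generation stopped at max_new_tokens):
        # fall back to everything after <think>.
        return response.split("<think>", 1)[-1].strip()

    return match.group(1).strip()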
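
The text returned by `inference` can contain both the model's reasoning and its final answer. The sketch below separates the two, assuming the model closes its reasoning with a `</think>` tag. The notebook does not confirm that tag, and the helper `split_reasoning` is hypothetical.

```python
def split_reasoning(text, close_tag="</think>"):
    """Split generated text into (reasoning, answer) at the first `close_tag`.

    If the tag is absent, the whole text is returned as the answer.
    """
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    return reasoning.strip(), answer.strip()
```

For example, `reasoning, answer = split_reasoning(inference(prompt))` would let you display only `answer`. Treating untagged text as the answer is a design choice. If generation is cut off mid-reasoning, you may prefer to treat it as reasoning instead.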

The simple query below asks for security best practices. <br>
The reasoning model can also handle more complex tasks, such as identifying vulnerabilities in security logs or triaging alerts. <br>
For more examples, refer to the [use cases](https://github.com/RobustIntelligence/foundation-ai-cookbook/tree/main/2_examples).

In [6]:
from IPython.display import display, Markdown
display(Markdown(inference("What are the best practices for security when developing RAG applications?")))

When developing Retrieval-Augmented Generation (RAG) applications, which integrate large language models with a retrieval system to provide context-aware responses, several key security considerations must be addressed. Here's an overview of best practices that can help ensure your application is secure:

### 1. Data Privacy and Security

- **Anonymization:** Ensure that retrieved documents do not contain sensitive information about individuals. Use anonymization techniques where necessary.
- **Data Encryption:** Encrypt data both at rest and in transit. This includes encryption of stored documents and any communication between clients and servers.
- **Access Control:** Implement strict access controls. Only authorized personnel should have access to retrieve or store documents within the system.

### 2. Secure Data Storage and Retrieval

- **Secure APIs:** When using external databases or APIs for document storage, use HTTPS and authenticate requests securely (e.g., OAuth).
- **Indexing Sensitive Information Minimally:** Avoid indexing personal identifiable information (PII), financial details, etc., unless absolutely necessary. If such data is indexed, implement strong protections against unauthorized access.
- **Regular Audits:** Conduct regular audits on what data is being stored and how it’s accessed. Remove unnecessary data from indexes.

### 3. Model Safety and Bias Mitigation

- **Bias Assessment:** Evaluate the training datasets used by LLMs for biases. The same applies to the retrieved content; ensure they don't propagate harmful stereotypes or misinformation.
- **Safety Measures:** Deploy safety measures like prompt evaluation, output filtering, and adversarial testing to prevent generation of unsafe outputs.
- **Model Updates:** Regularly update models to address newly discovered vulnerabilities and improve fairness and bias reduction.

### 4. Input Validation and Sanitization

- **Input Filtering:** Validate and sanitize inputs to prevent injection attacks (like SQLi if interacting with databases directly). For text-based inputs, this might involve removing malicious code patterns.
- **Rate Limiting:** Apply rate limiting to prevent denial-of-service (DoS) attacks via excessive queries.

### 5. Output Handling and Moderation

- **Content Moderation:** Integrate robust moderation systems to filter generated outputs for toxic content, hate speech, or other unwanted materials.
- **User Feedback Loops:** Allow users to flag inappropriate answers, helping refine future generations and retrievals.

### 6. Compliance with Regulations

- **GDPR/CCPA Compliance:** Adhere to privacy regulations regarding user data processing and storage. Obtain necessary consents and inform users about their rights.
- **HIPAA Compliance** (if handling health info): Ensure all processes comply with HIPAA standards for protecting patient data.

### 7. Incident Response Plan

- **Monitoring:** Set up monitoring tools to detect anomalies or suspicious activities.
- **Incident Response Team:** Have a team ready to respond quickly to breaches or leaks.
- **Post-Incident Analysis:** After incidents, conduct thorough analyses to understand causes and improve defenses.

### 8. Third-party Dependencies Management

- **Vendor Risk Management:** Assess third-party services' security postures, especially those hosting or managing parts of your infrastructure or data.
- **Up-to-date Libraries:** Keep dependencies updated to mitigate known vulnerabilities in libraries and frameworks used.

By following these best practices, developers can significantly enhance the security posture of their RAG applications, ensuring safer interactions and better protection of user data.