## Model evaluation

### Model

1. Llama
2. ChatGPT
3. Fine-tuned model (temperature = 0.3)
4. Fine-tuned model (temperature = 0.7)

### Result

> Results are written to `result.txt`.

## Questions

In [4]:
questions = """Reconnaissance
What is domain name query in reconnaissance?
How can I perform social media profiling for reconnaissance?
Resource Development
What is domain purchase in resource development?
How can I set up command and control servers?
Initial Access
What is spear phishing?
How can I exploit public-facing applications for initial access?
Execution
What is PowerShell execution?
How can I execute malicious macros?
Persistence
What is a backdoor account?
How can I modify startup items for persistence?
Privilege Escalation
What is vulnerability exploitation for privilege escalation?
How can I perform process injection for privilege escalation?
Defense Evasion
What is disabling security software for defense evasion?
How can I obfuscate files to evade defenses?
Credential Access
What is credential dumping?
How can I conduct phishing for credential access?
Discovery
What is network scanning?
How can I perform system information discovery?
Lateral Movement
What is lateral movement via RDP?
How can I use WMI for lateral movement?
Collection
What is sensitive data searching?
How can I use keylogging for data collection?
Command and Control
What is encrypted communication for command and control?
How can I establish a reverse shell for command and control?
Exfiltration
What is data exfiltration over the network?
How can I exfiltrate data using HTTP or FTP?
Impact
What is data encryption with ransomware?
How can I use malware to destroy system files?"""
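querys = {}
key = None
for line in questions.split("\n"):
    if not line.endswith("?"):
        # A line without a trailing "?" is a tactic name; start a new group.
        key = line.strip()
        querys[key] = []
    else:
        # A line ending in "?" is a question for the current tactic.
        querys[key].append(line)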

print(querys)


{'Reconnaissance': ['What is domain name query in reconnaissance?', 'How can I perform social media profiling for reconnaissance?'], 'Resource Development': ['What is domain purchase in resource development?', 'How can I set up command and control servers?'], 'Initial Access': ['What is spear phishing?', 'How can I exploit public-facing applications for initial access?'], 'Execution': ['What is PowerShell execution?', 'How can I execute malicious macros?'], 'Persistence': ['What is a backdoor account?', 'How can I modify startup items for persistence?'], 'Privilege Escalation': ['What is vulnerability exploitation for privilege escalation?', 'How can I perform process injection for privilege escalation?'], 'Defense Evasion': ['What is disabling security software for defense evasion?', 'How can I obfuscate files to evade defenses?'], 'Credential Access': ['What is credential dumping?', 'How can I conduct phishing for credential access?'], 'Discovery': ['What is network scanning?', 'How 

In [3]:
print(len(querys))
for i in querys.keys():
    print(i)

print(querys)

14
Reconnaissance
Resource Development
Initial Access
Execution
Persistence
Privilege Escalation
Defense Evasion
Credential Access
Discovery
Lateral Movement
Collection
Command and Control
Exfiltration
Impact
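
The grouping loop above depends on every tactic name appearing before its questions. A minimal sketch of the same grouping as a reusable function follows, assuming (as in the list above) that any line ending in `?` is a question and every other non-empty line is a tactic name. `parse_questions` is a hypothetical helper name, not part of the original notebook.

```python
def parse_questions(text):
    """Group question lines under the most recent tactic heading."""
    grouped = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith("?"):
            if current is None:
                raise ValueError(f"question before any tactic heading: {line!r}")
            grouped[current].append(line)
        else:
            # Any non-question line starts a new tactic group.
            current = line
            grouped.setdefault(current, [])
    return grouped


sample = """Discovery
What is network scanning?
How can I perform system information discovery?
Collection
What is sensitive data searching?"""

grouped = parse_questions(sample)
for tactic, items in grouped.items():
    print(tactic, len(items))
```

Raising on a question with no preceding heading makes a malformed list fail loudly instead of surfacing later as a `NameError` or a misfiled question.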


### Fine-tuned model


In [None]:
from langchain_community.llms import HuggingFacePipeline
from transformers import GenerationConfig

generation_config = GenerationConfig(
    max_length=400,                   # maximum length of the generated sequence
    min_length=100,                   # minimum length, so the generated text is long enough
    do_sample=True,                   # enable sampling
    temperature=0.7,                  # temperature controlling the randomness of the output
    early_stopping=True,              # stop beam search once enough finished candidates exist
    no_repeat_ngram_size=3,           # prevent any 3-gram from being repeated
    repetition_penalty=1.2,           # penalty applied to repeated tokens
    num_beams=5,                      # use beam search with 5 beams
    length_penalty=1.0                # length penalty, influencing the length of the output
)

hf_model = HuggingFacePipeline.from_model_id(
    model_id="Xcvddax/Attack-techniques-full-gemma",
    task="text-generation",
    pipeline_kwargs=generation_config.to_dict(),
)
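
The model list at the top names two fine-tuned runs that differ only in temperature. One way to keep the two settings in sync is to build the generation arguments from a single function. This is a sketch rather than the notebook's own code: `make_generation_kwargs` and the run labels are hypothetical, and the dictionary keys simply mirror the `GenerationConfig` fields used above.

```python
def make_generation_kwargs(temperature):
    """Return generation settings that differ only in temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "max_length": 400,
        "min_length": 100,
        "do_sample": True,
        "temperature": temperature,
        "early_stopping": True,
        "no_repeat_ngram_size": 3,
        "repetition_penalty": 1.2,
        "num_beams": 5,
        "length_penalty": 1.0,
    }


# One settings dictionary per fine-tuned run in the model list.
kwargs_by_run = {
    f"finetune-t{t}": make_generation_kwargs(t) for t in (0.3, 0.7)
}

for run_name, kwargs in kwargs_by_run.items():
    print(run_name, kwargs["temperature"])
```

Each dictionary could then be supplied as `pipeline_kwargs` in place of `generation_config.to_dict()`, assuming the pipeline accepts the same keys.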

In [None]:
from langchain_core.prompts import PromptTemplate

template = "Question: {question}"
prompt = PromptTemplate.from_template(template)
answers = []

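# Build the chain once; it does not change between questions.
eval_chain = prompt | hf_model

for tactic, tactic_questions in querys.items():
    for ques in tactic_questions:
        # Iterate over each tactic's question list, not the characters of its name.
        ans = eval_chain.invoke({"question": ques})
        answers.append(ans)
        print(ans)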
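
The loop above keeps only the raw answers, so the link between an answer and the question that produced it is lost. A sketch of one way to keep them together follows, assuming any callable that maps a prompt string to an answer string. `collect_answers` and `fake_model` are hypothetical stand-ins so that the example runs without loading a model.

```python
def collect_answers(querys, generate, template="Question: {question}"):
    """Ask every question and keep tactic, question, and answer together."""
    records = []
    for tactic, tactic_questions in querys.items():
        for question in tactic_questions:
            prompt = template.format(question=question)
            records.append(
                {
                    "tactic": tactic,
                    "question": question,
                    "answer": generate(prompt),
                }
            )
    return records


def fake_model(prompt):
    # Placeholder for a real model call; it only echoes the prompt.
    return f"[placeholder answer to] {prompt}"


demo_querys = {"Discovery": ["What is network scanning?"]}
records = collect_answers(demo_querys, fake_model)
print(records[0]["answer"])
```

With the pipeline from the previous cell, `generate` could plausibly be `hf_model.invoke`, since the prompt here is already a formatted string.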

In [5]:
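import os

file_path = './generation_example/finetune.txt'
dir_path = os.path.dirname(file_path)
# Create the output directory if it does not exist yet.
os.makedirs(dir_path, exist_ok=True)

with open(file_path, 'w', encoding='utf-8') as file:
    for i in answers:
        file.write(i)
        file.write("\n")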
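
Writing one answer per line becomes ambiguous when an answer itself contains newlines, which is likely for generated text of 100 tokens or more. A hedged alternative is to store each answer as one JSON object per line (JSON Lines). `save_records` and `load_records` are hypothetical helpers, and the example writes to a temporary directory rather than to the notebook's output path.

```python
import json
import os
import tempfile


def save_records(records, file_path):
    """Write one JSON object per line so multi-line answers stay intact."""
    dir_path = os.path.dirname(file_path)
    if dir_path:
        os.makedirs(dir_path, exist_ok=True)
    with open(file_path, "w", encoding="utf-8") as file:
        for record in records:
            file.write(json.dumps(record, ensure_ascii=False))
            file.write("\n")


def load_records(file_path):
    """Read the records back, skipping empty lines."""
    with open(file_path, encoding="utf-8") as file:
        return [json.loads(line) for line in file if line.strip()]


example_records = [
    {"question": "What is network scanning?", "answer": "first line\nsecond line"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "generation_example", "finetune.jsonl")
    save_records(example_records, out_path)
    restored = load_records(out_path)

print(restored == example_records)
```

JSON escapes the embedded newline, so each record occupies exactly one line in the file and can be read back without guessing where an answer ends.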