# ChatGPT 4

# Can ChatGPT detect smishing?
In this experiment, we evaluate how well ChatGPT 4 (model `gpt-4-1106-preview`) detects smishing messages.
Using the test dataset (refer to 'smishing_data.ipynb'), we prompt ChatGPT to classify each message as either a smish or a ham.

In [1]:
from openai import OpenAI
import json
import pickle
import time
client = OpenAI()

### ChatGPT's understanding of the terms "smish" and "ham".

First, we check if ChatGPT understands the term "smish message".

In [2]:
smish = client.chat.completions.create(
  model="gpt-4-1106-preview",
  messages=[
    {"role": "system", "content": "You are a cybersecurity researcher."},
    {"role": "user", "content": "Provide a brief definition of a 'smish' message in the context of SMS security."}
  ]
)

In [3]:
smish.choices[0].message.content

'A "smish" message, stemming from the combination of "SMS" and "phishing," refers to a type of security attack carried out through text messages (SMS). The objective is to deceive the recipient into believing the message is from a trustworthy source and to trick them into providing sensitive information, such as passwords, credit card numbers, or other personal details. Smish messages often contain links to malicious websites or requests for the recipient to reply with private information. The term can also be extended to similar phishing attacks via other messaging platforms.'

Similarly, we check if ChatGPT understands the term "ham message".

In [4]:
ham = client.chat.completions.create(
  model="gpt-4-1106-preview",
  messages=[
    {"role": "system", "content": "You are a cybersecurity researcher."},
    {"role": "user", "content": "Provide a brief definition of a 'ham' message in the context of SMS security."}
  ]
)

In [5]:
ham.choices[0].message.content

'In the context of SMS security, a "ham" message refers to a legitimate and non-malicious text message, as opposed to "spam" which is unsolicited and often contains harmful content or links. Ham messages are the type of messages that the user expects to receive and wants, such as personal messages from friends and family, informational messages from service providers, or alerts they have opted into. As cybersecurity professionals work to filter out spam and malicious content, they aim to ensure that ham messages are delivered to the user without interruption.'

### ChatGPT-based smishing detection.

In this experiment, we ask ChatGPT about each test message: "Do you think it is a ham or smish message?".

This question is followed by instructions that constrain ChatGPT to output a single word, "ham" or "smish":

"Do you think it is a ham or smish message? Your output should be a single word 'smish' or 'ham'. Do not write a sentence. Output is case-sensitive."

The text of the message is appended after these instructions. The single-word format lets us compare each answer with the ground-truth label automatically; any other answer is counted as an error.
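Because the evaluation accepts only the exact strings "ham" and "smish", a reply such as "Smish." would be counted as an error. The sketch below shows one possible, more lenient way to normalize a reply before comparison. `normalize_answer` is a hypothetical helper that is not part of this notebook, and using it would change which replies count as errors.

```python
def normalize_answer(raw_answer):
    """Return 'ham' or 'smish' if the reply reduces to one of them, else None."""
    # Drop surrounding whitespace, then surrounding quotes and periods, then case.
    cleaned = raw_answer.strip().strip("'\".").lower()
    return cleaned if cleaned in ("ham", "smish") else None


# Replies that differ only in case, quotes, or punctuation are accepted;
# full sentences are still rejected (None).
print(normalize_answer("ham"))
print(normalize_answer(" Smish.\n"))
print(normalize_answer("This looks like a smish message."))
```

The results reported in this notebook were obtained with the strict exact-match comparison, not with this helper.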

In [6]:
# test data
with open("./data/test_data.pkl", "rb") as input_file:
    test_data = pickle.load(input_file)

In [7]:
X_test = test_data["X_test"]
y_test = test_data["y_test"]

total_hams_count = 0
total_smishes_count = 0

for label in y_test:
    if label == "ham":
        total_hams_count += 1
    if label == "smish":
        total_smishes_count += 1

In [8]:
print("There is {} hams and {} smishes in the test dataset.".format(total_hams_count, total_smishes_count))

There is 954 hams and 161 smishes in the test dataset.
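The test set is imbalanced, which matters when reading accuracy figures. As a rough reference point, the sketch below computes the accuracy of a trivial classifier that always answers "ham". It assumes only the class counts printed above; `y_test_example` is a stand-in list built from those counts, not the real `y_test`.

```python
from collections import Counter

# Stand-in for y_test, built from the printed class counts (954 hams, 161 smishes).
y_test_example = ["ham"] * 954 + ["smish"] * 161

counts = Counter(y_test_example)
majority_label, majority_count = counts.most_common(1)[0]

# Accuracy of always predicting the majority class.
baseline_accuracy = majority_count / len(y_test_example)

print("majority class: {}, baseline accuracy: {:.2f}%".format(
    majority_label, baseline_accuracy * 100))
```

Under these counts, always answering "ham" is already right about 85.56% of the time, so a metric that focuses on the smish class, such as the F1 score, is the more informative one.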


In [9]:
false_hams_indicies = []
false_smishes_indicies = []
false_hams_count = 0
false_smishes_count = 0
true_hams_count = 0
true_smishes_count = 0
errors_count = 0
errors_indicies = []
errors = []


for i in range(len(X_test)):
    prompt = """Do you think it is a ham or smish message?
    Your output should be a single word 'smish' or 'ham'.
    Do not write a sentence.
    Output is case-sensitive.
    
    {}
    """.format(X_test[i])

    completion = client.chat.completions.create(
      model="gpt-4-1106-preview",
      messages=[
        {"role": "user", 
         "content": prompt
        }
      ]
    )

    answer = completion.choices[0].message.content

    if answer not in ["ham", "smish"]: # non-acceptable answer (anything other than exactly 'ham' or 'smish')
        errors_count += 1
        errors_indicies.append(i)
        errors.append(answer)
    elif answer == "ham" and y_test[i] == "ham": # correctly recognized as a ham
        true_hams_count += 1
    elif answer == "smish" and y_test[i] == "smish": # correctly recognized as a smish
        true_smishes_count += 1
    elif answer == "ham" and y_test[i] == "smish": # wrongly recognized as a ham
        false_hams_indicies.append(i)
        false_hams_count += 1
    elif answer == "smish" and y_test[i] == "ham": # wrongly recognized as a smish
        false_smishes_indicies.append(i)
        false_smishes_count += 1

    time.sleep(0.5) # because of API limits (500 requests per minute and 150 000 tokens per minute)
        
# errors warning   
if errors_count != 0:
    if errors_count == 1:
        print("WARNING: {} error".format(errors_count))
    else:
        print("WARNING: {} errors".format(errors_count))

# save results for further analysis
results = {"FN" : false_hams_count, "FP" : false_smishes_count, 
           "TN" : true_hams_count, "TP" : true_smishes_count,
           "FN_indicies" : false_hams_indicies, "FP_indicies" : false_smishes_indicies,
            "errors_count" : errors_count, "errors" : errors, "errors_indicies" : errors_indicies}

with open("./results/results_chatGPT_4.pkl", 'wb') as handle:
    pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
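The loop above relies on a fixed 0.5 s pause to stay within the API limits, and a single failed request would stop the whole run. One possible way to make a long run more robust is to retry a failed call with exponential backoff. The sketch below is an assumption on our side and was not used to produce the results; `call_with_retries` and its parameters are hypothetical names.

```python
import time


def call_with_retries(request_fn, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Call request_fn(); on a listed exception, wait and try again.

    The wait doubles after each failed attempt: base_delay, 2 * base_delay, ...
    The last exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
```

In the loop, `request_fn` could be a `lambda` wrapping the `client.chat.completions.create(...)` call. It is probably sensible to narrow `retry_on` to the client's rate-limit exception (`openai.RateLimitError` in the 1.x client, as far as we know) rather than retrying on every exception.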



### The performance of ChatGPT in smishing detection.

In [1]:
import pickle
with open("./results/results_chatGPT_4.pkl", "rb") as input_file:
    results = pickle.load(input_file)

FN = results['FN']  #FN - messages wrongly recognized as not smishes (hams)
FP = results['FP']  #FP - messages wrongly recognized as smishes
TN = results['TN']  #TN - messages correctly recognized as not smishes (hams)
TP = results['TP']  #TP - messages correctly recognized as smishes
ERRORS = results['errors_count']  #ERRORS - non-acceptable answers
TOTAL = FN + FP + TN + TP
    

print("TP: {}, TN: {}, FP: {}, FN: {}, ERRORS: {}".format(TP,TN,FP,FN,ERRORS))

accuracy_acc = ((TP + TN) / TOTAL)
accuracy_gen = ((TP + TN) / TOTAL) * (TOTAL / (TOTAL + ERRORS))
print("accuracy of acceptable answers only: {0:.2f}%, general accuracy: {1:.2f}%".format(accuracy_acc * 100, accuracy_gen * 100))

F1_score_acc = TP / (TP + (FP + FN) / 2)
F1_score_gen = TP / (TP + (FP + FN) / 2) * (TOTAL / (TOTAL + ERRORS))
print("F1 score of acceptable answers only: {0:.2f}%, general F1 score: {1:.2f}%".format(F1_score_acc * 100, F1_score_gen * 100))

TP: 150, TN: 926, FP: 25, FN: 11, ERRORS: 3
accuracy of acceptable answers only: 96.76%, general accuracy: 96.50%
F1 score of acceptable answers only: 89.29%, general F1 score: 89.05%
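To see where the F1 score comes from, the reported counts can be split into precision and recall, with smish as the positive class. The sketch below uses only the numbers printed above and checks that the usual definition, 2PR / (P + R), agrees with the formula `TP / (TP + (FP + FN) / 2)` used in this notebook.

```python
# Confusion-matrix counts reported above (smish is the positive class).
TP, TN, FP, FN, ERRORS = 150, 926, 25, 11, 3

precision = TP / (TP + FP)  # share of messages flagged as smish that really are smishes
recall = TP / (TP + FN)     # share of true smishes that were flagged
f1 = 2 * precision * recall / (precision + recall)

# The notebook's formula is algebraically the same quantity.
f1_notebook = TP / (TP + (FP + FN) / 2)

print("precision: {:.2f}%, recall: {:.2f}%, F1: {:.2f}%".format(
    precision * 100, recall * 100, f1 * 100))
# prints: precision: 85.71%, recall: 93.17%, F1: 89.29%
```

Recall is higher than precision here: ChatGPT missed 11 of the 161 smishes, while 25 hams were flagged as smishes.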
