# Testing 

This notebook contains the code used to test the models on real-world data.

We will see if a text is **written by an AI** or by a **human**.

In [1]:
import tensorflow, keras, numpy as np

In [2]:
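import re
import nltk
from nltk.corpus import stopwords

# nltk.data.find raises LookupError (it does not return a falsy value)
# when the resource is missing, so download the stopword list on demand.
try:
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('stopwords')

stop_words = set(stopwords.words('english'))


def clean_text(text):
    # Normalize case so stopword matching is case-insensitive.
    text = text.lower()

    # Strip URLs (http\S+ also covers https links).
    text = re.sub(r"http\S+|www\S+", '', text, flags=re.MULTILINE)

    # Replace every character that is neither a word character nor whitespace.
    text = re.sub(r'[^\w\s]', ' ', text)

    # Drop English stopwords and collapse repeated whitespace.
    text = ' '.join([word for word in text.split() if word not in stop_words])

    return text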
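To see what each cleaning step does without downloading NLTK data, here is a minimal sketch of the same pipeline. It swaps the NLTK list for a tiny hand-written stopword set; `TOY_STOP_WORDS` and `clean_text_sketch` are hypothetical names used only for illustration.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list (illustration only).
TOY_STOP_WORDS = {"i", "a", "the", "is", "to", "and", "it"}


def clean_text_sketch(text, stop_words=TOY_STOP_WORDS):
    """Same steps as clean_text: lowercase, drop URLs, drop punctuation, drop stopwords."""
    text = text.lower()
    text = re.sub(r"http\S+|www\S+", "", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(word for word in text.split() if word not in stop_words)


print(clean_text_sketch("I LOVE the docs at https://example.com, and it's great!"))
# → love docs at s great
```

Note that `\S+` also swallows punctuation glued to a URL (the trailing comma here), and the punctuation step splits contractions into fragments such as `s`.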

Preprocessing for the BERT and Contrastive models

In [3]:
from transformers import AutoTokenizer

model_check = 'bert-base-uncased'

tokenizer = AutoTokenizer.from_pretrained(model_check)

def encode_text(text, tokenizer):
    # `text` is a list of strings. Each string gets [CLS]/[SEP] added, is
    # truncated to 512 tokens (BERT's maximum sequence length) and is
    # padded up to exactly 512 tokens.
    encoded = tokenizer.batch_encode_plus(
        text,
        add_special_tokens=True,
        max_length=512,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors="tf",
    )

    # Convert the TensorFlow tensors to int32 NumPy arrays.
    input_ids = np.array(encoded["input_ids"], dtype="int32")
    attention_masks = np.array(encoded["attention_mask"], dtype="int32")

    return {
        "input_ids": input_ids,
        "attention_masks": attention_masks
    }

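With `padding='max_length'` and `truncation=True`, every text becomes exactly `max_length` token IDs, and the attention mask marks real tokens with 1 and padding with 0. The pure-Python sketch below mimics that bookkeeping for one text; `pad_and_mask` is a hypothetical helper, the input token IDs are made up, and the special IDs assume the `bert-base-uncased` vocabulary (`[CLS]` = 101, `[SEP]` = 102, `[PAD]` = 0).

```python
def pad_and_mask(token_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Mimic add_special_tokens + truncation + padding='max_length' for one text."""
    # Leave room for [CLS] and [SEP] when truncating; the tail is dropped.
    body = token_ids[: max_length - 2]
    ids = [cls_id] + body + [sep_id]
    mask = [1] * len(ids)
    # Pad on the right until the sequence is exactly max_length long.
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


ids, mask = pad_and_mask([7592, 2088], max_length=8)
print(ids)   # [101, 7592, 2088, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

Because the model only attends to positions whose mask is 1, short texts are not distorted by padding, while texts longer than 510 WordPiece tokens lose their ending.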


Preprocessing for the Feed-Forward Neural Network (FFN).

In [4]:
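import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def preprocess_data_ffn(dataset):
    # NOTE: this fits a *new* Tokenizer on the test texts, so its word
    # indices will generally not match the vocabulary the FFN was trained
    # with. The training-time tokenizer should ideally be saved
    # (e.g. with tokenizer.to_json()) and reloaded here instead.
    tokenizer = Tokenizer(oov_token="<OOV>")
    texts = [data['text'] for data in dataset]
    tokenizer.fit_on_texts(texts)
    # Map each text to a list of word indices.
    sequences = tokenizer.texts_to_sequences(texts)
    # Pad at the end (or truncate from the start) to the FFN's 977-token input.
    padded_sequences = pad_sequences(sequences, maxlen=977, padding='post')
    return padded_sequences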
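Why refitting the tokenizer matters: `Tokenizer.fit_on_texts` ranks words by frequency in whatever corpus it sees, with the OOV token taking index 1, so the same word gets a different integer depending on the texts used for fitting. The sketch below approximates that frequency-ranked indexing in plain Python; `build_word_index` and `to_sequence` are hypothetical helpers, not the Keras implementation, and the corpora are made up.

```python
from collections import Counter


def build_word_index(texts, oov_token="<OOV>"):
    """Frequency-ranked word index: OOV gets 1, then the most frequent words first."""
    counts = Counter(word for text in texts for word in text.split())
    # sorted() is stable, so ties keep first-appearance order.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate([oov_token] + ranked, start=1)}


def to_sequence(text, word_index, oov_token="<OOV>"):
    """Map words to indices, falling back to the OOV index for unseen words."""
    return [word_index.get(word, word_index[oov_token]) for word in text.split()]


train_index = build_word_index(["stack overflow stack model", "model model"])
test_index = build_word_index(["overflow overflow stack"])

print(to_sequence("stack overflow", train_index))  # [3, 4]
print(to_sequence("stack overflow", test_index))   # [3, 2]
```

The same two words map to different IDs, so an embedding layer trained on the first index would look up the wrong rows for the second. Keras provides `Tokenizer.to_json()` and `tokenizer_from_json` for persisting the training tokenizer instead.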

Load all the trained models

In [5]:
from transformers import TFBertModel
from keras.models import load_model

# TFBertModel is registered as a custom object so Keras can deserialize the
# BERT layers stored in the .h5 files (harmless for models without it).
custom_objects = {'TFBertModel': TFBertModel}

bert = {
    'model': load_model(filepath='../Models/Finetuned-BERT.h5', custom_objects=custom_objects, compile=False),
    'name': 'BERT Finetuned'
}

ffn = {
    'model': load_model(filepath='../Models/SimpleFFN.h5', custom_objects=custom_objects, compile=False),
    'name': 'Simple FFN'
}

contrastive = {
    'model': load_model(filepath='../Models/contrastive_classifier (1).h5', custom_objects=custom_objects, compile=False),
    'name': 'Contrastive'
}

models = [bert, contrastive, ffn]

Dataset creation

In [13]:
human1 = """There is a moral obligation for everyone to do what they can to avoid bloodshed and aggression everywhere. Now Stack Overflow is a collaborative effort. It works best when it's completely apolitical, i.e. serves people everywhere equally. In that we should be fully aware that it can do both good and bad, please don't entertain any illusion there.

Example: Maybe Russian military attack systems were programmed with the help of knowledge from Stack Overflow and that might have increased their efficacy. Given the large amount of knowledge stored here and the large number of visits to Stack Overflow, that might even be very likely.

I don't know exactly what you have in mind, but if for example, Stack Overflow were to temporarily switch off its Russian language localized site or restrict access to the English main site based on geography, I would have some sympathies for that and would say we should discuss and consider it.

That would put Russia on one level as for example Iran, but well, I think that might be kind of appropriate.

Given that Stack Overflow Inc. is a private owned company, I do not think that to be very likely though. Morality for companies has yet to be developed.

On the other hand, Stack Overflow can hardly do something significant about wars. The software used now is already written. If you really want to do something, your countries' government is a much bigger factor. Petition to your government and use your political influence as citizen to make a small positive change in the world.
"""

ai1 = """Stack Overflow, like any other company, may face questions about its moral obligation in response to global events, particularly those with potential military implications. Whether they should go beyond publishing a statement and take direct action is subjective and depends on various factors.

Some argue that companies have a responsibility to contribute positively to society beyond their primary business objectives. This could include actions such as donations to humanitarian causes, supporting affected communities, or using their platform to raise awareness.

However, other perspectives emphasize the importance of neutrality and impartiality. Companies may prefer to remain apolitical, positioning themselves as platforms for knowledge exchange rather than political entities.

The expectations of Stack Overflow's user community also play a role. If a significant portion of the community expects or demands a response, the company may feel compelled to address the situation more actively.

Ultimately, the decision depends on Stack Overflow's internal values, the expectations of their user base, and their assessment of the situation's ethical and moral dimensions. Legal and ethical considerations, as well as potential risks, need to be taken into account when deciding on any direct action.
"""

human2 = """
A popular theory among the One Piece community is that Imu might be based on the Japanese folktale of Kaguya, the Moon Princess. There are actually some similarities like the fact that both Imu and Kaguya have 5 people surrounding them (the Gorosei in Imu's case), trying to fulfil any of their wishes. The other evidence for the canonical existence of Kaguya in the One Piece story is Pandaman. Yes, Pandaman. Pandaman is kind of a meme Oda puts here and there. You can find him in the most random places in random chapters across the entirety of the One Piece manga and he was even mentioned in a Poneglyph.

However, I actually believe that the three main actors of the One Piece world Luffy, Blackbeard and Imu might be based on the Japanese myth of Izanagi's Three Precious Children. Izanagi had three children, you definitely have heard their names before. Amaterasu, the Sun deity. Tsukuyomi, the Moon deity. And Susanoo, the Deity of (Storms and) the Sea. It is said that Amaterasu was born when Izanagi washed his left eye, Tsukuyomi was born when he washed his right one and Susanoo when he washed his nose. Luffy, who is ate the mythical devil fruit based on a Sun deity (Nika) could be Amaterasu (duh). Blackbeard, who has been depicted with a moon by his side and literally controls the darkness with his devil fruit powers could resemble Tsukuyomi. And Imu, which might be an inversion of "Umi" the Japanese word for "sea" or "ocean" could be Oda's version of Susanoo."""

ai2 = """
The concept of the "One Piece" in Eiichiro Oda's manga and anime series has captivated the imagination of fans worldwide. The enigma surrounding this elusive treasure, located at the end of the Grand Line on the island of Raftel (or Laugh Tale), serves as a linchpin for the overarching narrative in "One Piece."

Eiichiro Oda, the series' creator, has intentionally maintained a veil of secrecy around the true nature of the One Piece, contributing to the suspense and anticipation that permeates the storyline. While fans have engaged in fervent speculation, ranging from the traditional idea of treasure to profound historical revelations, Oda has masterfully woven a tale that keeps readers and viewers guessing.

The pursuit of the One Piece is central to the protagonist Monkey D. Luffy's journey and his quest for the title of Pirate King. The mystery adds depth to the world-building and encourages a sense of wonder among the fan base. Oda's narrative choices and the gradual revelation of the One Piece contribute to the enduring popularity and longevity of the "One Piece" series.

It's important to note that any opinion or speculation about the One Piece is subjective, as the true nature of this legendary treasure is known only to Eiichiro Oda himself. As fans eagerly await each new chapter or episode, the allure of the One Piece remains an integral aspect of the series, and the eventual revelation promises to be a momentous and highly anticipated event in the world of "One Piece."
"""

human3 = """
I really like to do some walks sometimes with my dog. In the future I want to live in a small city with all the possible 
services and public transportations. Is that possible? I mean, is not something so hard to find. Even so, 
I don't know if this will be possible for someone like me...
"""

ai3 = """
In the quaint embrace of a small city, Alex found an unexpected refuge from the chaos of urban life. Dreaming of a simpler existence, Alex yearned to escape the bustling metropolis for the serene charm of narrow streets and familiar faces. However, beneath the desire for a quieter life lurked a perpetual pessimism that colored every aspiration with a tinge of doubt. Even amid the serene landscapes and neighborly conversations, Alex couldn't quite shake the habit of anticipating the worst outcomes. The small city's embrace, though comforting in many ways, struggled to dispel the persistent clouds of negativity that shadowed Alex's perception of the world.
"""

metin2_wikipedia = """
Metin2 is a massively multiplayer online role-playing game (MMORPG) originally developed by Ymir Entertainment (now owned by Webzen Games) and originally released in Korea in 2004.[1] It has since been published in many European countries and in the United States by Gameforge 4D GmbH.[2] Other versions exist in Asian languages.

Gameplay
Experience points are earned every time the player kills enemies or completes a mission from an NPC. The game's combat is based on a hack and slash system. Players can also gather groups of creatures and land basic attacks and skills on each one of them simultaneously while they are all attacking the character at the same time. The game has several different character classes, all of which have 2 different sets of skills they can use, which can also be upgraded.

The in-game currency is called Yang, used to purchase items from the different NPC's. Players can use this to trade with other players or to their own shop. Players can also make their own guild of fighters and guilds can have wars with each other.
"""

human4 = """
Where the fuck is Accord? What's the kingdom of the night, what happened after Zero (or Queen Beast) left the universe? What are dragons? What are the watchers? Maso? And many more questions. Yeah, Drakengard universe is far from full.

With that said, I want a Drakengard remake collection with the edgy, stupid, twisted shit uncensored and in a playable form. If that's not possible then a new game is perfectly fine for me
"""

In [14]:
dataset = [
    {
        "text": human1,
        "label": 0
    },
    {
        "text": ai1,
        "label": 1
    },
    {
        "text": human2,
        "label": 0
    },
    {
        "text": ai2,
        "label": 1
    },
    {
        "text": human3,
        "label": 0
    },
    {
        "text": ai3,
        "label": 1
    },
    {
        "text": metin2_wikipedia,
        "label": 0
    },
    {
        "text": human4,
        "label": 0
    }
]

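# dataset.copy() is shallow: both lists would share the same dicts, so the
# BERT preprocessing below would also modify dataset_ffn. Deep-copy instead.
import copy
dataset_ffn = copy.deepcopy(dataset)

# Clean the texts for the BERT-based models.
for data in dataset:
    data["text"] = clean_text(data["text"])

print(dataset)

# Add BERT input_ids and attention_masks (each of shape (1, 512)) to every sample.
for data in dataset:
    data.update(encode_text([data["text"]], tokenizer))

print(dataset)

# Clean the texts for the FFN.
for data in dataset_ffn:
    data["text"] = clean_text(data["text"])

print(dataset_ffn)

# Turn the FFN texts into padded integer sequences of shape (n_samples, 977).
dataset_ffn = preprocess_data_ffn(dataset_ffn)

print(dataset_ffn)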


[{'text': 'moral obligation everyone avoid bloodshed aggression everywhere stack overflow collaborative effort works best completely apolitical e serves people everywhere equally fully aware good bad please entertain illusion example maybe russian military attack systems programmed help knowledge stack overflow might increased efficacy given large amount knowledge stored large number visits stack overflow might even likely know exactly mind example stack overflow temporarily switch russian language localized site restrict access english main site based geography would sympathies would say discuss consider would put russia one level example iran well think might kind appropriate given stack overflow inc private owned company think likely though morality companies yet developed hand stack overflow hardly something significant wars software used already written really want something countries government much bigger factor petition government use political influence citizen make small posi
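Why `dataset.copy()` is not enough for building `dataset_ffn`: `list.copy()` copies the list but not the dicts inside it, so mutating a sample through one list changes it in the other. A small self-contained demonstration with a made-up sample:

```python
import copy

samples = [{"text": "Hello World", "label": 0}]

shallow = samples.copy()
deep = copy.deepcopy(samples)

# Mutate the dict through the original list, as the BERT preprocessing loop does.
samples[0]["text"] = samples[0]["text"].lower()
samples[0]["input_ids"] = [101, 102]

print(shallow[0])  # {'text': 'hello world', 'label': 0, 'input_ids': [101, 102]}
print(deep[0])     # {'text': 'Hello World', 'label': 0}
```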

In [15]:
predictions = []

for i in range(len(dataset)):
    for model in models:
        if model['name'] == 'Simple FFN':
            # The FFN takes one padded integer sequence; add a batch dimension.
            input_ids = np.expand_dims(dataset_ffn[i], axis=0)
            p = model['model'].predict(input_ids)
        else:
            # BERT and Contrastive take (input_ids, attention_masks).
            p = model['model'].predict((dataset[i]['input_ids'], dataset[i]['attention_masks']))

        # p has shape (1, 1): the model's score for label 1 (AI-written).
        # Index [0][0] for every model so the FFN score is a scalar, not a 1-element array.
        score = p[0][0]
        prediction = {
            'model': model['name'],
            'prediction': 0 if score < 0.5 else 1,
            'label': dataset[i]['label'],
            # Despite the key name, this is the confidence in the predicted
            # class (its probability), not the model's accuracy.
            'accuracy': 1 - score if score < 0.5 else score
        }
        predictions.append(prediction)
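The `'accuracy'` field above is really each model's confidence in its chosen class; accuracy proper compares predictions with labels across all samples. A minimal plain-Python sketch of both computations follows; `to_label_and_confidence` and `accuracy_per_model` are hypothetical helpers, not part of the notebook, and the example data is made up.

```python
from collections import defaultdict


def to_label_and_confidence(score, threshold=0.5):
    """Map a score for label 1 (AI-written) to (predicted label, confidence)."""
    score = float(score)  # also accepts NumPy float scalars
    if score < threshold:
        return 0, 1.0 - score
    return 1, score


def accuracy_per_model(predictions):
    """Fraction of samples each model labels correctly.

    Expects dicts with 'model', 'prediction' and 'label' keys, like the
    entries of the `predictions` list built above.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p in predictions:
        total[p["model"]] += 1
        correct[p["model"]] += int(p["prediction"] == p["label"])
    return {name: correct[name] / total[name] for name in total}


# Hypothetical inputs, for illustration only.
print(to_label_and_confidence(0.25))  # (0, 0.75)
example = [
    {"model": "A", "prediction": 1, "label": 1},
    {"model": "A", "prediction": 0, "label": 1},
    {"model": "B", "prediction": 0, "label": 0},
]
print(accuracy_per_model(example))  # {'A': 0.5, 'B': 1.0}
```

With only eight test texts, per-model accuracy computed this way is a rough indication rather than a reliable estimate.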
    




In [16]:

for prediction in predictions:
    print('Model: ' + prediction['model'])
    print('Prediction: ' + str(prediction['prediction']))
    print('Real: ' + str(prediction['label']))
    print('Accuracy: ' + str(prediction['accuracy']))
    print('------------------------')


Model: BERT Finetuned
Prediction: 0
Real: 0
Accuracy: 0.9981754288310185
------------------------
Model: Contrastive
Prediction: 0
Real: 0
Accuracy: 0.8021265119314194
------------------------
Model: Simple FFN
Prediction: 0
Real: 0
Accuracy: [0.7062034]
------------------------
Model: BERT Finetuned
Prediction: 1
Real: 1
Accuracy: 0.99965775
------------------------
Model: Contrastive
Prediction: 1
Real: 1
Accuracy: 0.9955323
------------------------
Model: Simple FFN
Prediction: 0
Real: 1
Accuracy: [0.9821847]
------------------------
Model: BERT Finetuned
Prediction: 1
Real: 0
Accuracy: 0.65478605
------------------------
Model: Contrastive
Prediction: 0
Real: 0
Accuracy: 0.8758720308542252
------------------------
Model: Simple FFN
Prediction: 0
Real: 0
Accuracy: [0.99861825]
------------------------
Model: BERT Finetuned
Prediction: 1
Real: 1
Accuracy: 0.9996568
------------------------
Model: Contrastive
Prediction: 1
Real: 1
Accuracy: 0.87285984
------------------------
Model: S