# DAIGT Notebook

In this notebook we try to predict whether a text was written by an LLM or by a human. The dataset has about 45k rows (44,868) and five columns, of which we use `text` and `label`. The `label` column is the binary target variable.

Let's look at the data.

In [31]:
# List the input files (assumes the Kaggle directory layout)
import os

# Print the path of every file under /kaggle/input
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))




In [21]:
# Load the training data
import pandas as pd

# Read the CSV (the first row holds the column names)
dataset = pd.read_csv('data/train_v2_drcat_02.csv', sep=',', header=0)



In [22]:
# Shape
print(dataset.shape)

# Head
print(dataset.head(20))

(44868, 5)
                                                 text  label  \
0   Phones\n\nModern humans today are always on th...      0   
1   This essay will explain if drivers should or s...      0   
2   Driving while the use of cellular devices\n\nT...      0   
3   Phones & Driving\n\nDrivers should not be able...      0   
4   Cell Phone Operation While Driving\n\nThe abil...      0   
5   Cell phone use should not be legal while drivi...      0   
6   Phones and Driving\n\nDriving is a good way to...      0   
7   PHONES AND DRIVING\n\nIn this world in which w...      0   
8   People are debating whether if drivers should ...      0   
9   Texting and driving\n\nOver half of drivers in...      0   
10  explain if drivers should or should not be abl...      0   
11  Should drivers be able to be on their phone wh...      0   
12  Everyone knows that texting and driving is a t...      0   
13  Operating a motor vehicle while on your cell p...      0   
14  Phones & Driving Essay\n\

In [23]:
# Replace newlines, carriage returns, and tabs with a space
dataset['text'] = dataset['text'].str.replace(r'[\n\r\t]', ' ', regex=True)
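
The replacements above turn each newline, carriage return, or tab into one space, so a blank line such as `\n\n` leaves a double space (visible in the next output as `Phones  Modern`). If single spaces are preferred, one possible variant is to collapse every whitespace run in a single pass. The sketch below is an alternative, not what this notebook runs; `normalize_whitespace` is a hypothetical helper name and the sample string is made up.

```python
import re

# Hypothetical helper: collapse any run of whitespace (spaces, \n, \r, \t)
# into a single space and trim both ends.
_WHITESPACE_RUN = re.compile(r"\s+")

def normalize_whitespace(text: str) -> str:
    return _WHITESPACE_RUN.sub(" ", text).strip()

sample = "Phones\n\nDrivers should\tnot text.\r\n"  # made-up sample string
print(normalize_whitespace(sample))
```

On the DataFrame this could be applied with `dataset['text'].map(normalize_whitespace)`, assuming every entry is a string.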

In [24]:
# Print the head
print(dataset.head(20))

                                                 text  label  \
0   Phones  Modern humans today are always on thei...      0   
1   This essay will explain if drivers should or s...      0   
2   Driving while the use of cellular devices  Tod...      0   
3   Phones & Driving  Drivers should not be able t...      0   
4   Cell Phone Operation While Driving  The abilit...      0   
5   Cell phone use should not be legal while drivi...      0   
6   Phones and Driving  Driving is a good way to g...      0   
7   PHONES AND DRIVING  In this world in which we ...      0   
8   People are debating whether if drivers should ...      0   
9   Texting and driving  Over half of drivers in t...      0   
10  explain if drivers should or should not be abl...      0   
11  Should drivers be able to be on their phone wh...      0   
12  Everyone knows that texting and driving is a t...      0   
13  Operating a motor vehicle while on your cell p...      0   
14  Phones & Driving Essay  I believe th

In [25]:
# Drop the source and RDizzl3_seven columns, which are not used for training
dataset = dataset.drop(['source', 'RDizzl3_seven'], axis=1)

# Print the head
print(dataset.head(20))

                                                 text  label  \
0   Phones  Modern humans today are always on thei...      0   
1   This essay will explain if drivers should or s...      0   
2   Driving while the use of cellular devices  Tod...      0   
3   Phones & Driving  Drivers should not be able t...      0   
4   Cell Phone Operation While Driving  The abilit...      0   
5   Cell phone use should not be legal while drivi...      0   
6   Phones and Driving  Driving is a good way to g...      0   
7   PHONES AND DRIVING  In this world in which we ...      0   
8   People are debating whether if drivers should ...      0   
9   Texting and driving  Over half of drivers in t...      0   
10  explain if drivers should or should not be abl...      0   
11  Should drivers be able to be on their phone wh...      0   
12  Everyone knows that texting and driving is a t...      0   
13  Operating a motor vehicle while on your cell p...      0   
14  Phones & Driving Essay  I believe th

In [26]:
# Split the dataset into train and test sets (scikit-learn's default test_size is 25%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(dataset['text'], dataset['label'], random_state=0)


In [27]:
# Print the shape
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)



(33651,)
(11217,)
(33651,)
(11217,)
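
The shapes above follow from scikit-learn's default `test_size=0.25`: the test set gets the ceiling of 25% of the rows and the training set gets the remainder. A small sketch of that arithmetic, using only the standard library (`split_sizes` is a hypothetical helper, not part of scikit-learn):

```python
import math

def split_sizes(n_samples: int, test_size: float = 0.25) -> tuple:
    """Return (n_train, n_test) for a float test_size with train_size left unset."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(44868))  # (33651, 11217)
```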


In [28]:
# Print the first 20 training texts
print(X_train.head(20))

# Print the first 20 training labels
print(y_train.head(20))

23438    Distance learning school does not offer many p...
11452    I think the use of technology to read emotions...
42633    The United States of America has come a long w...
26408    I believe that community service should be opt...
4127     Should student projects be designed by teacher...
37741    Hey, I'm just a regular 8th grader, so bear wi...
25614    Making Decisions  When making a decision or as...
32430    Limiting car usage has several notable benefit...
42225    In a society where things are always changing ...
16930    Although there are many negative aspects of dr...
26560    Hey, ya'll! Today, I'm gonna talk about how ha...
41907    Title: The Benefits of Limiting Car Usage  It’...
24047    Many students rather take online classes so th...
26468    Hey, I'm super excited to be writing this essa...
32207    The impact of having a positive attitude on ac...
41892     Dear State Senator,  Presently, the United St...
14614    Dear principle.  I Think community service is .
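
The model trained further down is scored with binary accuracy, so it helps to know the share of each class first: always predicting the majority class already reaches that share. The sketch below computes this baseline with the standard library on a made-up label list. `majority_baseline` is a hypothetical helper, and on the real data one would pass `y_train` instead of the toy list.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

toy_labels = [0, 0, 0, 1, 1]  # made-up labels, not taken from the dataset
print(majority_baseline(toy_labels))  # 0.6
```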

In [29]:
# Import libraries
import tensorflow as tf
import tensorflow_hub as hub

# if not installed tensorflow_text
# %pip install tensorflow_text

# Import tensorflow_text
import tensorflow_text as text
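
The BERT preprocessing layer loaded in the next cell turns raw strings into three fixed-length integer tensors: `input_word_ids`, `input_mask`, and `input_type_ids` (length 128 in the model summary further down). The toy sketch below mimics those steps in plain Python to show the shape of the result. It is not the TF Hub implementation: it splits on whitespace instead of using WordPiece, and the vocabulary, its IDs, and the helper name `toy_preprocess` are made up for illustration.

```python
# Made-up vocabulary and IDs; real BERT uses a WordPiece vocabulary of about 30k entries.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "phones": 4, "and": 5, "driving": 6}

def toy_preprocess(text, seq_length=8):
    # 1. Tokenization (lowercased whitespace split, a stand-in for WordPiece)
    tokens = text.lower().split()
    # 2. Special tokens: truncate so that [CLS] and [SEP] still fit
    tokens = ["[CLS]"] + tokens[: seq_length - 2] + ["[SEP]"]
    # 3. Encode tokens as IDs, mapping unknown words to [UNK]
    ids = [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]
    # 4. Attention mask: 1 for real tokens, 0 for padding
    mask = [1] * len(ids)
    # 5. Pad both lists up to the fixed sequence length
    n_pad = seq_length - len(ids)
    ids += [TOY_VOCAB["[PAD]"]] * n_pad
    mask += [0] * n_pad
    # Segment IDs are all 0 for a single-segment input
    type_ids = [0] * seq_length
    return {"input_word_ids": ids, "input_mask": mask, "input_type_ids": type_ids}

print(toy_preprocess("Phones and Driving essay"))
```

Here `essay` is not in the toy vocabulary, so it maps to `[UNK]`, and the last two positions are padding with a mask value of 0.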

In [30]:
# Build the model

bert_preprocess_model = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=True)

# Define the input and run it through the preprocessing layer
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
preprocessed_text = bert_preprocess_model(text_input)

# The preprocessing layer essentially does the following:
# - Tokenization
# - Adding the special tokens
# - Padding
# - Encoding the tokens as IDs
# - Creating the attention masks

outputs = bert_encoder(preprocessed_text)

# Layers of the neural network
L = tf.keras.layers.Dropout(0.1, name='dropout')(outputs['pooled_output'])
L = tf.keras.layers.Dense(1, activation='sigmoid', name='classifier')(L)

# Use the input and the output to create the model
model = tf.keras.Model(inputs=[text_input], outputs=[L])

# Print the model summary (summary() prints directly and returns None)
model.summary()

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])

# Train the model
history = model.fit(x=X_train,
                    y=y_train,
                    validation_data=(X_test, y_test),
                    batch_size=32,
                    epochs=2)

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_type_ids':                                                
                                (None, 128),                                                      
                                 'input_word_ids':                                                
                                (None, 128)}                                                  

ResourceExhaustedError: Graph execution error:

Detected at node 'gradients/transformer/layer_4/activation/Gelu/Pow_grad/Pow' defined at (most recent call last):
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel_launcher.py", line 17, in <module>
      app.launch_new_instance()
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\traitlets\config\application.py", line 1046, in launch_instance
      app.start()
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\kernelapp.py", line 736, in start
      self.io_loop.start()
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\tornado\platform\asyncio.py", line 195, in start
      self.asyncio_loop.run_forever()
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 596, in run_forever
      self._run_once()
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 1890, in _run_once
      handle._run()
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\asyncio\events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\kernelbase.py", line 516, in dispatch_queue
      await self.process_one()
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\kernelbase.py", line 505, in process_one
      await dispatch(*args)
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\kernelbase.py", line 412, in dispatch_shell
      await result
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\kernelbase.py", line 740, in execute_request
      reply_content = await reply_content
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\ipkernel.py", line 422, in do_execute
      res = shell.run_cell(
    File "C:\Users\danie\AppData\Roaming\Python\Python39\site-packages\ipykernel\zmqshell.py", line 546, in run_cell
      return super().run_cell(*args, **kwargs)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3024, in run_cell
      result = self._run_cell(
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3079, in _run_cell
      result = runner(coro)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
      coro.send(None)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3284, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3466, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\IPython\core\interactiveshell.py", line 3526, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "C:\Users\danie\AppData\Local\Temp\ipykernel_28196\60729901.py", line 36, in <module>
      history = model.fit(x=X_train,
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\training.py", line 1564, in fit
      tmp_logs = self.train_function(iterator)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\training.py", line 1160, in train_function
      return step_function(self, iterator)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\training.py", line 1146, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\training.py", line 1135, in run_step
      outputs = model.train_step(data)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\training.py", line 997, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 576, in minimize
      grads_and_vars = self._compute_gradients(
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 634, in _compute_gradients
      grads_and_vars = self._get_gradients(
    File "c:\Users\danie\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 510, in _get_gradients
      grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradients/transformer/layer_4/activation/Gelu/Pow_grad/Pow'
failed to allocate memory
	 [[{{node gradients/transformer/layer_4/activation/Gelu/Pow_grad/Pow}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_75041]
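
The `ResourceExhaustedError` above means the device ran out of memory while computing gradients through the trainable BERT encoder. Common remedies are a smaller `batch_size`, a shorter sequence length, or freezing the encoder (`trainable=False`); which of these is enough depends on the hardware, which the notebook does not state. Lowering the batch size costs more optimizer steps per epoch, and the sketch below computes that trade-off for the 33,651 training rows. The candidate batch sizes other than 32 are assumptions for illustration, not settings verified on this model.

```python
import math

N_TRAIN = 33651  # number of training rows reported by X_train.shape

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

# 32 is the notebook's value; the smaller sizes are hypothetical candidates
for batch_size in (32, 16, 8, 4):
    print(batch_size, steps_per_epoch(N_TRAIN, batch_size))
```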

In [None]:
# Plot the training and validation accuracy
import matplotlib.pyplot as plt
plt.plot(history.history['binary_accuracy'])
plt.plot(history.history['val_binary_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()



NameError: name 'history' is not defined