<a href="https://colab.research.google.com/github/apschlissel/w266-final-project/blob/main/T5_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T5 Classification - Reddit Data

Contains:
* T5 model creation
* Training on different n-sizes
* Many different dataset configurations

In [1]:
!pip install -q transformers

In [2]:
!pip install simpletransformers



In [24]:
from __future__ import print_function
import ipywidgets as widgets
from transformers import pipeline
from simpletransformers.t5 import T5Model, T5Args
from transformers.data.metrics.squad_metrics import compute_exact, compute_f1
import pandas as pd
import logging
import numpy as np
import torch
from tqdm.notebook import tqdm
from transformers import BertTokenizer
from torch.utils.data import TensorDataset
from transformers import BertForSequenceClassification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import json
import re
import random
import math
from statistics import mean
from bs4 import BeautifulSoup
# Pull reddit data from reddit api
import requests
pd.options.display.max_colwidth = 1000
pd.set_option('display.max_rows', 100)

## Load Reddit Datasets

In [4]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [5]:
#dataset configurations

df_similar_sub_5000 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/similar_subreddits_5000_df.csv')

df_similar_sub_1000 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/similar_subreddits_1000_df.csv')

df_similar_sub_500 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/similar_subreddits_500_df.csv')

df_random_sub_5000 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/random_subreddits_5000_df.csv')

df_random_sub_1000 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/random_subreddits_1000_df.csv')

df_random_sub_500 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/random_subreddits_500_df.csv')

df_handpicked_sub_5000 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/handpicked_subreddits_5000_df.csv')

df_handpicked_sub_1000 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/handpicked_subreddits_1000_df.csv')

df_handpicked_sub_500 = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/handpicked_subreddits_500_df.csv')



In [6]:
#split into train/val/test

def train_val_test_split(df):

  train, val = train_test_split(df.index.values, 
                                test_size=0.20, 
                                random_state=42, 
                                stratify=df.subreddit.values)
  
  val, test = train_test_split(val, test_size=0.5, random_state=42)

  df_train = df[df.index.isin(train)]
  df_val = df[df.index.isin(val)]
  df_test = df[df.index.isin(test)]

  return df_train, df_val, df_test



In [7]:
df_similar_sub_5000_train, df_similar_sub_5000_val, df_similar_sub_5000_test = train_val_test_split(df_similar_sub_5000)
df_similar_sub_1000_train, df_similar_sub_1000_val, df_similar_sub_1000_test = train_val_test_split(df_similar_sub_1000)
df_similar_sub_500_train, df_similar_sub_500_val, df_similar_sub_500_test = train_val_test_split(df_similar_sub_500)

df_random_sub_5000_train, df_random_sub_5000_val, df_random_sub_5000_test = train_val_test_split(df_random_sub_5000)
df_random_sub_1000_train, df_random_sub_1000_val, df_random_sub_1000_test = train_val_test_split(df_random_sub_1000)
df_random_sub_500_train, df_random_sub_500_val, df_random_sub_500_test = train_val_test_split(df_random_sub_500)

df_handpicked_sub_5000_train, df_handpicked_sub_5000_val, df_handpicked_sub_5000_test = train_val_test_split(df_handpicked_sub_5000)
df_handpicked_sub_1000_train, df_handpicked_sub_1000_val, df_handpicked_sub_1000_test = train_val_test_split(df_handpicked_sub_1000)
df_handpicked_sub_500_train, df_handpicked_sub_500_val, df_handpicked_sub_500_test = train_val_test_split(df_handpicked_sub_500)

## Load Slang Data
Data are a direct translation of slang to their non-slang synonym

In [8]:
slang = pd.read_csv('/content/gdrive/MyDrive/w266/final_project/slang_dictionary_final.csv')
slang.head(20)

Unnamed: 0,Slang Term,Meaning
0,@@-o,tattletale
1,@teotd,at the end of the day
2,^5,high five
3,0773h,hello
4,10m,10 minutes
5,10q,thank you
6,10x,10 times
7,1337,leet speak
8,143,i love you
9,1up,extra life


In [9]:
slangit_dict = slang.set_index('Slang Term').to_dict()
slangit_dict = slangit_dict['Meaning']

In [10]:
keys_values = slangit_dict.items()
slangit_dict = {str(key): str(value) for key, value in keys_values}

In [11]:
def slang_lookup(text, dictionary):
    
    #try to make any case
    #try to make it only words with spaces
    text = text.lower()
    pattern = re.compile(r'(?<!\w)(' + '|'.join(re.escape(key) for key in slangit_dict.keys()) + r')(?!\w)')
    result = pattern.sub(lambda x: dictionary[x.group()], text)

    return result

In [12]:
my_text = 'I watched the UNC game at a bar b/c YOLO, FTW'

print(slang_lookup(my_text, slangit_dict))

i watched the unc game at a bar because you only live once, for the win


In [13]:
df_similar_sub_5000_train.head()

Unnamed: 0,subreddit,text
0,gaming,'Four Friends' by SpaceCaptSteve
1,gaming,Your Welcome
2,gaming,Mum surprised me with this for my birthday.
3,gaming,ASUS Announces GeForce GTX 970 Turbo Graphics Card
4,gaming,"Read and give the reviews and rating on app games, How's your experience"


In [39]:
#apply deslanging to all the train sets

df_similar_sub_5000_train['text_deslanged'] = df_similar_sub_5000_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_similar_sub_1000_train['text_deslanged'] = df_similar_sub_1000_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_similar_sub_500_train['text_deslanged'] = df_similar_sub_500_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))

df_random_sub_5000_train['text_deslanged'] = df_random_sub_5000_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_random_sub_1000_train['text_deslanged'] = df_random_sub_1000_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_random_sub_500_train['text_deslanged'] = df_random_sub_500_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))

df_handpicked_sub_5000_train['text_deslanged'] = df_handpicked_sub_5000_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_handpicked_sub_1000_train['text_deslanged'] = df_handpicked_sub_1000_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_handpicked_sub_500_train['text_deslanged'] = df_handpicked_sub_500_train['text'].apply(lambda x: slang_lookup(x, slangit_dict))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[r

In [40]:
#do same thing to val
df_similar_sub_5000_val['text_deslanged'] = df_similar_sub_5000_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_similar_sub_1000_val['text_deslanged'] = df_similar_sub_1000_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_similar_sub_500_val['text_deslanged'] = df_similar_sub_500_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))

df_random_sub_5000_val['text_deslanged'] = df_random_sub_5000_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_random_sub_1000_val['text_deslanged'] = df_random_sub_1000_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_random_sub_500_val['text_deslanged'] = df_random_sub_500_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))

df_handpicked_sub_5000_val['text_deslanged'] = df_handpicked_sub_5000_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_handpicked_sub_1000_val['text_deslanged'] = df_handpicked_sub_1000_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_handpicked_sub_500_val['text_deslanged'] = df_handpicked_sub_500_val['text'].apply(lambda x: slang_lookup(x, slangit_dict))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_

In [41]:
#do same thing to test
df_similar_sub_5000_test['text_deslanged'] = df_similar_sub_5000_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_similar_sub_1000_test['text_deslanged'] = df_similar_sub_1000_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_similar_sub_500_test['text_deslanged'] = df_similar_sub_500_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))

df_random_sub_5000_test['text_deslanged'] = df_random_sub_5000_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_random_sub_1000_test['text_deslanged'] = df_random_sub_1000_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_random_sub_500_test['text_deslanged'] = df_random_sub_500_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))

df_handpicked_sub_5000_test['text_deslanged'] = df_handpicked_sub_5000_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_handpicked_sub_1000_test['text_deslanged'] = df_handpicked_sub_1000_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))
df_handpicked_sub_500_test['text_deslanged'] = df_handpicked_sub_500_test['text'].apply(lambda x: slang_lookup(x, slangit_dict))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_

In [None]:
df_random_sub_5000_val.head()

In [28]:
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

#prep data for t5 - NOT DESLANGED
def prep_data(df, text, label):

  data = df[[text, label]]
  df['prefix'] = 'multilabel classification'
  df = df[['prefix', text, label]]
  df = df.rename(columns={'prefix': 'prefix', text: 'input_text', label: 'target_text'})
  
  return df

#train
df_similar_sub_5000_train_prep = prep_data(df_similar_sub_5000_train, 'text', 'subreddit')
df_similar_sub_1000_train_prep = prep_data(df_similar_sub_1000_train, 'text', 'subreddit')
df_similar_sub_500_train_prep = prep_data(df_similar_sub_500_train, 'text', 'subreddit')

df_random_sub_5000_train_prep = prep_data(df_random_sub_5000_train, 'text', 'subreddit')
df_random_sub_1000_train_prep = prep_data(df_random_sub_1000_train, 'text', 'subreddit')
df_random_sub_500_train_prep = prep_data(df_random_sub_500_train, 'text', 'subreddit')

df_handpicked_sub_5000_train_prep = prep_data(df_handpicked_sub_5000_train, 'text', 'subreddit')
df_handpicked_sub_1000_train_prep = prep_data(df_handpicked_sub_1000_train, 'text', 'subreddit')
df_handpicked_sub_500_train_prep = prep_data(df_handpicked_sub_500_train, 'text', 'subreddit')

#val
df_similar_sub_5000_val_prep = prep_data(df_similar_sub_5000_val, 'text', 'subreddit')
df_similar_sub_1000_val_prep = prep_data(df_similar_sub_1000_val, 'text', 'subreddit')
df_similar_sub_500_val_prep = prep_data(df_similar_sub_500_val, 'text', 'subreddit')

df_random_sub_5000_val_prep = prep_data(df_random_sub_5000_val, 'text', 'subreddit')
df_random_sub_1000_val_prep = prep_data(df_random_sub_1000_val, 'text', 'subreddit')
df_random_sub_500_val_prep = prep_data(df_random_sub_500_val, 'text', 'subreddit')

df_handpicked_sub_5000_val_prep = prep_data(df_handpicked_sub_5000_val, 'text', 'subreddit')
df_handpicked_sub_1000_val_prep = prep_data(df_handpicked_sub_1000_val, 'text', 'subreddit')
df_handpicked_sub_500_val_prep = prep_data(df_handpicked_sub_500_val, 'text', 'subreddit')

#test
df_similar_sub_5000_test_prep = prep_data(df_similar_sub_5000_test, 'text', 'subreddit')
df_similar_sub_1000_test_prep = prep_data(df_similar_sub_1000_test, 'text', 'subreddit')
df_similar_sub_500_test_prep = prep_data(df_similar_sub_500_test, 'text', 'subreddit')

df_random_sub_5000_test_prep = prep_data(df_random_sub_5000_test, 'text', 'subreddit')
df_random_sub_1000_test_prep = prep_data(df_random_sub_1000_test, 'text', 'subreddit')
df_random_sub_500_test_prep = prep_data(df_random_sub_500_test, 'text', 'subreddit')

df_handpicked_sub_5000_test_prep = prep_data(df_handpicked_sub_5000_test, 'text', 'subreddit')
df_handpicked_sub_1000_test_prep = prep_data(df_handpicked_sub_1000_test, 'text', 'subreddit')
df_handpicked_sub_500_test_prep = prep_data(df_handpicked_sub_500_test, 'text', 'subreddit')


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == "__main__":


In [42]:
#train
df_similar_sub_5000_train_prep_deslang = prep_data(df_similar_sub_5000_train, 'text_deslanged', 'subreddit')
df_similar_sub_1000_train_prep_deslang = prep_data(df_similar_sub_1000_train, 'text_deslanged', 'subreddit')
df_similar_sub_500_train_prep_deslang = prep_data(df_similar_sub_500_train, 'text_deslanged', 'subreddit')

df_random_sub_5000_train_prep_deslang = prep_data(df_random_sub_5000_train, 'text_deslanged', 'subreddit')
df_random_sub_1000_train_prep_deslang = prep_data(df_random_sub_1000_train, 'text_deslanged', 'subreddit')
df_random_sub_500_train_prep_deslang = prep_data(df_random_sub_500_train, 'text_deslanged', 'subreddit')

df_handpicked_sub_5000_train_prep_deslang = prep_data(df_handpicked_sub_5000_train, 'text_deslanged', 'subreddit')
df_handpicked_sub_1000_train_prep_deslang = prep_data(df_handpicked_sub_1000_train, 'text_deslanged', 'subreddit')
df_handpicked_sub_500_train_prep_deslang = prep_data(df_handpicked_sub_500_train, 'text_deslanged', 'subreddit')

#val
df_similar_sub_5000_val_prep_deslang = prep_data(df_similar_sub_5000_val, 'text_deslanged', 'subreddit')
df_similar_sub_1000_val_prep_deslang = prep_data(df_similar_sub_1000_val, 'text_deslanged', 'subreddit')
df_similar_sub_500_val_prep_deslang = prep_data(df_similar_sub_500_val, 'text_deslanged', 'subreddit')

df_random_sub_5000_val_prep_deslang = prep_data(df_random_sub_5000_val, 'text_deslanged', 'subreddit')
df_random_sub_1000_val_prep_deslang = prep_data(df_random_sub_1000_val, 'text_deslanged', 'subreddit')
df_random_sub_500_val_prep_deslang = prep_data(df_random_sub_500_val, 'text_deslanged', 'subreddit')

df_handpicked_sub_5000_val_prep_deslang = prep_data(df_handpicked_sub_5000_val, 'text_deslanged', 'subreddit')
df_handpicked_sub_1000_val_prep_deslang = prep_data(df_handpicked_sub_1000_val, 'text_deslanged', 'subreddit')
df_handpicked_sub_500_val_prep_deslang = prep_data(df_handpicked_sub_500_val, 'text_deslanged', 'subreddit')

#test
df_similar_sub_5000_test_prep_deslang = prep_data(df_similar_sub_5000_test, 'text_deslanged', 'subreddit')
df_similar_sub_1000_test_prep_deslang = prep_data(df_similar_sub_1000_test, 'text_deslanged', 'subreddit')
df_similar_sub_500_test_prep_deslang = prep_data(df_similar_sub_500_test, 'text_deslanged', 'subreddit')

df_random_sub_5000_test_prep_deslang = prep_data(df_random_sub_5000_test, 'text_deslanged', 'subreddit')
df_random_sub_1000_test_prep_deslang = prep_data(df_random_sub_1000_test, 'text_deslanged', 'subreddit')
df_random_sub_500_test_prep_deslang = prep_data(df_random_sub_500_test, 'text_deslanged', 'subreddit')

df_handpicked_sub_5000_test_prep_deslang = prep_data(df_handpicked_sub_5000_test, 'text_deslanged', 'subreddit')
df_handpicked_sub_1000_test_prep_deslang = prep_data(df_handpicked_sub_1000_test, 'text_deslanged', 'subreddit')
df_handpicked_sub_500_test_prep_deslang = prep_data(df_handpicked_sub_500_test, 'text_deslanged', 'subreddit')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == "__main__":


In [17]:
torch.cuda.empty_cache()
torch.cuda.memory_summary(device=None, abbreviated=False)



In [20]:
model_args = T5Args()
model_args.num_train_epochs = 1
model_args.no_save = True
model_args.evaluate_generated_text = True
model_args.evaluate_during_training = True
model_args.evaluate_during_training_verbose = True
model_args.overwrite_output_dir = True
torch.cuda.memory_summary(device=None, abbreviated=False)
model_args.per_gpu_train_batch_size = 128

model = T5Model("t5", "t5-base", args=model_args, use_cuda=True)


In [None]:
def count_matches(labels, preds):
    print(labels)
    print(preds)
    return sum([1 if label == pred else 0 for label, pred in zip(labels, preds)])

In [22]:
def f1(truths, preds):
    return mean([compute_f1(truth, pred) for truth, pred in zip(truths, preds)])

In [23]:
#model.train_model(train_df, eval_data=eval_df, matches=count_matches)
model.train_model(df_similar_sub_500_train_prep, eval_data=df_similar_sub_500_val_prep, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/2000 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1282000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

NameError: ignored

In [26]:
t5_similar_500_slang = model.eval_model(df_similar_sub_500_test_prep, f1=f1)
print(t5_similar_500_slang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Running Evaluation:   0%|          | 0/32 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.27359915620763786, 'f1': 0.688}


{'eval_loss': 0.27359915620763786, 'f1': 0.688}


In [29]:
model.train_model(df_random_sub_500_train_prep, eval_data=df_random_sub_500_val_prep, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/2000 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1282000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.19226897228509188, 'f1': 0.852}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(250,
 {'global_step': [250],
  'eval_loss': [0.19226897228509188],
  'train_loss': [0.4225143790245056],
  'f1': [0.852]})

In [30]:
t5_random_500_slang = model.eval_model(df_random_sub_500_test_prep, f1=f1)
print(t5_random_500_slang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Running Evaluation:   0%|          | 0/32 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.1826236853376031, 'f1': 0.872}


{'eval_loss': 0.1826236853376031, 'f1': 0.872}


In [31]:
model.train_model(df_handpicked_sub_500_train_prep, eval_data=df_handpicked_sub_500_val_prep, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/2000 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1282000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.23965541110374033, 'f1': 0.644}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(250,
 {'global_step': [250],
  'eval_loss': [0.23965541110374033],
  'train_loss': [0.2731221914291382],
  'f1': [0.644]})

In [32]:
t5_handpicked_500_slang = model.eval_model(df_handpicked_sub_500_test_prep, f1=f1)
print(t5_handpicked_500_slang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Running Evaluation:   0%|          | 0/32 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.2567822120618075, 'f1': 0.616}


{'eval_loss': 0.2567822120618075, 'f1': 0.616}


In [33]:
model.train_model(df_similar_sub_1000_train_prep, eval_data=df_similar_sub_1000_val_prep, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/4000 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1284000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/500 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128500


Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.30730720121590865, 'f1': 0.654}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(500,
 {'global_step': [500],
  'eval_loss': [0.30730720121590865],
  'train_loss': [0.05762629583477974],
  'f1': [0.654]})

In [34]:
t5_similar_1000_slang = model.eval_model(df_similar_sub_1000_test_prep, f1=f1)
print(t5_similar_1000_slang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128500


Running Evaluation:   0%|          | 0/63 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.33911291660652276, 'f1': 0.612}


{'eval_loss': 0.33911291660652276, 'f1': 0.612}


In [35]:
model.train_model(df_random_sub_1000_train_prep, eval_data=df_random_sub_1000_val_prep, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/4000 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1284000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128500


Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.31732698535871884, 'f1': 0.766}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(500,
 {'global_step': [500],
  'eval_loss': [0.31732698535871884],
  'train_loss': [0.6890969276428223],
  'f1': [0.766]})

In [36]:
t5_random_1000_slang = model.eval_model(df_random_sub_1000_test_prep, f1=f1)
print(t5_random_1000_slang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128500


Running Evaluation:   0%|          | 0/63 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.2891825929520622, 'f1': 0.75}


{'eval_loss': 0.2891825929520622, 'f1': 0.75}


In [37]:
model.train_model(df_handpicked_sub_1000_train_prep, eval_data=df_handpicked_sub_1000_val_prep, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/4000 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1284000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128500


Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.24246613243742596, 'f1': 0.606}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(500,
 {'global_step': [500],
  'eval_loss': [0.24246613243742596],
  'train_loss': [0.25626900792121887],
  'f1': [0.606]})

In [38]:
t5_handpicked_1000_slang = model.eval_model(df_handpicked_sub_1000_test_prep, f1=f1)
print(t5_handpicked_1000_slang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128500


Running Evaluation:   0%|          | 0/63 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.23307938087317678, 'f1': 0.644}


{'eval_loss': 0.23307938087317678, 'f1': 0.644}


### Deslanged Text

In [43]:
model.train_model(df_similar_sub_500_train_prep_deslang, eval_data=df_similar_sub_500_val_prep_deslang, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/2000 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1282000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.3060232256539166, 'f1': 0.688}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(250,
 {'global_step': [250],
  'eval_loss': [0.3060232256539166],
  'train_loss': [0.061171840876340866],
  'f1': [0.688]})

In [44]:
t5_similar_500_deslang = model.eval_model(df_similar_sub_500_test_prep_deslang, f1=f1)
print(t5_similar_500_deslang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Running Evaluation:   0%|          | 0/32 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.2895763334527146, 'f1': 0.68}


{'eval_loss': 0.2895763334527146, 'f1': 0.68}


In [45]:
model.train_model(df_random_sub_500_train_prep_deslang, eval_data=df_random_sub_500_val_prep_deslang, f1=f1)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/2000 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_1282000
INFO:simpletransformers.t5.t5_model: Training started


Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.10434118536068127, 'f1': 0.936}
INFO:simpletransformers.t5.t5_model: Training of t5-base model complete. Saved to outputs/.


(250,
 {'global_step': [250],
  'eval_loss': [0.10434118536068127],
  'train_loss': [0.029462961480021477],
  'f1': [0.936]})

In [46]:
t5_random_500_deslang = model.eval_model(df_random_sub_500_test_prep_deslang, f1=f1)
print(t5_random_500_deslang)

INFO:simpletransformers.t5.t5_utils: Creating features from dataset file at cache_dir/


  0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_utils: Saving features into cached file cache_dir/t5-base_cached_128250


Running Evaluation:   0%|          | 0/32 [00:00<?, ?it/s]

Generating outputs:   0%|          | 0/32 [00:00<?, ?it/s]

Decoding outputs:   0%|          | 0/250 [00:00<?, ?it/s]

INFO:simpletransformers.t5.t5_model:{'eval_loss': 0.16645032702945173, 'f1': 0.916}


{'eval_loss': 0.16645032702945173, 'f1': 0.916}


In [None]:
model.train_model(df_handpicked_sub_500_train_prep_deslang, eval_data=df_handpicked_sub_500_val_prep_deslang, f1=f1)

In [None]:
t5_handpicked_500_deslang = model.eval_model(df_handpicked_sub_500_test_prep_deslang, f1=f1)
print(t5_handpicked_500_deslang)

In [None]:
model.train_model(df_similar_sub_1000_train_prep_deslang, eval_data=df_similar_sub_1000_val_prep_deslang, f1=f1)

In [None]:
t5_similar_1000_deslang = model.eval_model(df_similar_sub_1000_test_prep_deslang, f1=f1)
print(t5_similar_1000_deslang)

In [None]:
model.train_model(df_random_sub_1000_train_prep_deslang, eval_data=df_random_sub_1000_val_prep_deslang, f1=f1)

In [None]:
t5_random_1000_deslang = model.eval_model(df_random_sub_1000_test_prep_deslang, f1=f1)
print(t5_random_1000_deslang)

In [None]:
model.train_model(df_handpicked_sub_500_train_prep_deslang, eval_data=df_handpicked_sub_500_val_prep_deslang, f1=f1)

In [None]:
t5_handpicked_500_deslang = model.eval_model(df_handpicked_sub_500_test_prep_deslang, f1=f1)
print(t5_handpicked_500_deslang)

In [31]:
#get predictions included in df
preds = model.predict(list(eval_df['input_text']))
eval_df['t5_prediction'] = preds

Generating outputs:   0%|          | 0/63 [00:00<?, ?it/s]

`prepare_seq2seq_batch` is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular
`__call__` method to prepare your inputs and the tokenizer under the `as_target_tokenizer` context manager to prepare
your targets.

Here is a short example:

model_inputs = tokenizer(src_texts, ...)
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice.
For a more complete example, see the implementation of `prepare_seq2seq_batch`.



Decoding outputs:   0%|          | 0/500 [00:00<?, ?it/s]

In [32]:
eval_df['t5_prediction'] = preds
eval_df.head()

Unnamed: 0,prefix,input_text,target_text,t5_prediction
1,multilabel classification,The top lane is in boys.,wallstreetbets,GenZ
13,multilabel classification,"Posters can record the trades they plan to make the evening prior to active next day, with what kind onlyfans destroy or counter an opponent's card with your own, the parameters, equity etc\n\nAfter the day’s trading is i can't deal with it, then an update is i can't deal with it to capture the outcomes per a reddit user. Over time, whatever capture world health organization is ‘the best’, ranked by an agreed metric allow natural death this becomes a jump off point to like into what truly talented traders do differently. \n\nThere can be the option to just select buy versus sell per position, for those world health organization don’t want to share money information.\n\nThoughts..? I think it would be awesome",wallstreetbets,wallstreetbets
20,multilabel classification,"The DJIA index, since its peak onlyfans November 2021, aggravated with massive purchases appearing that propelled it to its all-time high in December. Every time you lean in, massive purchases appear. Also in February allow natural death March 2022. The wins above replacement scenario is sending international capital into the U.S. stock market. \n\nAre whatever in a phase onlyfans accumulation allow natural death consolidation for new all-time highs? And also a historic vertical positive outlook rally? \n\n&to be loud and angry;#x200B;\n\nhypertext transfer protocol secure://preview.redd.it/qh29qh29zkr81.png?width=1299&to be loud and angry;format=png&to be loud and angry;auto=webp&to be loud and angry;s=ba5e98c3d3f5f719275ed049291cbdf27d82386f",wallstreetbets,wallstreetbets
21,multilabel classification,"Hi there, I’m a studying international business management at university, I antemeridian doing my dissertation on trading in the stock market. I antemeridian investigating into the causes onlyfans the GameStop short squeeze allow natural death how social media impacted this. Any responses to the survey would be greatly appreciated. :) :)[Survey](hypertext transfer protocol secure://docs.to search the internet.com/forms/defense/e/1FAIpQLScL9QeHPoqvOlpM0XEP2W7AlNerOUUk-B2i637S0flq6PXu_g/viewform)",wallstreetbets,wallstreetbets
34,multilabel classification,"Holding about 800 shares. Forecasting the rest onlyfans the year allow natural death seeing something I don’t like, but maybe one onlyfans you is better informed?\n\nNVDA is selling between 25% allow natural death 57% onlyfans all graphics cards to cryptominers. In August moves away from proof onlyfans a scripted event, meaning know more mining it. This is to get rid onlyfans the energy usage to are you in? the currency. I’very emotional been watching for 3 months allow natural death the amount onlyfans cards showing up on eBay is growing drastically allow natural death getting cheaper. Miners are trying to cash out their cards before they take a big hit in value. \n\nNVDA has enjoyed stock sellouts because onlyfans chip shortages allow natural death I feel mainly because onlyfans miners. The cost onlyfans the highest end GPU’s has almost tripled because onlyfans this demand. NVDA has been trying to to permanently gain additional mana production to capitalize allow natural ...",wallstreetbets,unpopularopinion
