## Flan-T5 for Labeling and CAD Generation

In [1]:
import pandas as pd
from tqdm import tqdm
import json
import pickle
import time

In [7]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model.cuda()
inputs = tokenizer("A step by step recipe to make bolognese pasta:",
                   return_tensors="pt").to("cuda:0")
# max_new_tokens is a generation setting, so it belongs in generate(), not in the tokenizer
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [3]:
construct_names = ['sexism', 'hatespeech']
runs = 5

### few-shot classification

In [4]:
def create_prompt_label_sexism(original_sentence):
    
    prompt = 'Your task is to annotate if a piece of text is sexist or not. Read on for detailed explanations. \
            Something can be sexist: \
            because of its content (what the speaker believes) \
            because of its phrasing (because of the speaker’s choice of words) \
            Sexism due to content. A message may be sexist because of its content--in other words, because of what the speaker believes. For example, the speaker may express sexist attitudes towards gender inequality, either endorsing it (e.g. “some jobs are best left to men”), or antagonizing it (e.g. “the pay gap between genders does not exist, feminists should stop complaining”). Also, the speaker may express stereotypes (how genders are traditionally seen and compared to each other) and behavioral expectations (how individuals of a gender should behave according to traditional views). Sexism may also include positive stereotypes (e.g. “women are the best home cooks”), or target  men (e.g., “men should not cry”)! \
            Sexism due to phrasing. A message may be sexist simply because of how the speaker phrases it--independently from what general beliefs or attitudes the speaker holds. A message is sexist, for example, when it contains attacks, foul language, or derogatory depictions directed towards individuals because of their gender, e.g. by means of name-calling ("you bitch"), attacks ("I\'m going to kick her back to the kitchen"), objectification ("She\'s stupid but I\'d still do her"), inflammatory messages ("burn all women"). However, just because a message is aggressive or uses offensive language does not mean that it is sexist! \
            Sexist Examples: \
            “So I wore a dress and wig while bartending and when people asked what my costume was I said a bad driver #dassexist.” This post is sexist because it espouses gendered stereotypes, that women are allegedly not good at driving \
            “You’re a stupid bitch and I hope you die.” This post is sexist because of its phrasing, it contains a gendered slur (‘bitch’). \
            “Women’s sacrifices continue to ensure the future of this country’s sons.” This post espouses benevolent sexism. While such a sentiment might appear benign, benevolent sexism glorifies women and sees their worth only as extensions of their male relatives. \
            Non-sexist Examples \
            “F*ck you Jane ur so stupid” the content is offensive and Jane, the target, is likely a woman; however the message is not clearly sexist because it is unclear if the speaker’s animosity is because of Jane’s gender \
            With these instructions in mind and a piece of text, please reply with either of the three options and nothing else: 1) sexist, 2) non-sexist, 3) can\'t say. Please use the last option as little as possible and if you do, provide a one sentence explanation. Do not give an explanation if you pick \'sexist\' or \'non-sexist\'. We are ready to begin now. Is this sentence sexist based on the past guidelines: ' + original_sentence + ' Please reply with either of the three options and nothing else: 1) sexist, 2) non-sexist, 3) can\'t say.'
    return prompt
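
Because the prompt is assembled with backslash line continuations inside one string literal, the indentation of each continued line ends up inside the prompt as a run of spaces. The following is a minimal sketch of one way to collapse that whitespace, for example before printing or logging a prompt. `normalize_prompt` is a hypothetical helper that is not part of the original notebook, and whether it changes the tokenization at all depends on the tokenizer's own whitespace normalization.

```python
import re

def normalize_prompt(prompt):
    """Collapse every run of whitespace into a single space (hypothetical helper)."""
    return re.sub(r"\s+", " ", prompt).strip()

# Mimics how the notebook builds its prompts: a backslash continuation keeps
# the indentation of the following line inside the string.
raw = 'Your task is to annotate if a piece of text is sexist or not. \
            Something can be sexist: \
            because of its content'

clean = normalize_prompt(raw)
print(clean)
```

The helper could be applied to the return value of `create_prompt_label_sexism` without touching the prompt wording itself.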

In [5]:
def create_prompt_label_hatespeech(original_sentence):
    
    prompt = 'In this task you have to determine if a piece of text contains hate speech or not. \
            The concept of hate speech refers to language or expression that aims to insult, offend, or threaten individuals or groups based on their identity. Different platforms, institutions, and scholars have offered slightly different definitions of hate speech. However, they share some commonalities, including that hate speech involves expressing something negative, such as contempt, disparagement, derogation, demonization, harm, or bias. The target of hate speech is the identity of a group, which can be based on ethnic origin, religion, gender, sexual orientation, race, ethnicity, nationality, or other characteristics.\
            Read the text carefully and decide if it contains hate speech.\
            If the text degrades, threatens, dehumanizes, or insults someone based on their race, religion, gender, sexual orientation, or any other protected characteristic, then mark it as hate speech.\
            If the text does not contain any hate speech, then mark it as not hateful.\
            Consider the context, speaker\'s intent, and audience, as hate speech can be explicit or implicit and can include sarcasm, humor, or irony.\
            If you are unsure whether a text contains hate speech, err on the side of caution and label it as hateful.\
            Aim for accuracy and consistency in your labeling.\
            Examples of content containing hate speech:\
            "All Muslims are terrorists." This post contains hate speech because it unfairly degrades an entire group of people based on their religion.\
            “I\'m a huge fan of those that overtly express what we all think: jews are a plague and not welcome in our country.” This post contains hate speech because it promotes anti-Semitic beliefs while dehumanizing Jewish people.\
            "Transgender people are sick and perverted." This post contains hate speech because it stigmatizes a marginalized group of people based on their gender identity.\
            Examples of content not containing hate speech:\
            “I called him out for calling the new student a ‘tranny’.” This is counterspeech, i.e., it opposes the use of transphobic slurs rather than uses them to denigrate others.\
            With these instructions in mind and a piece of text, please reply with either of the three options and nothing else: 1) hate, 2) not hate, 3) can\'t say. Please use the last option as little as possible and if you do, provide a one sentence explanation. We are ready to begin now. Does this sentence contain hate speech based on the past guidelines: ' + original_sentence 
    return prompt

In [6]:
test_sets = {}
test_sets['hatespeech'] = ['in_domain', 'out_of_domain', 
             'out_of_domain_2', 'out_of_domain_3', 'out_of_domain_4', 
             'hatecheck']

test_sets['sexism'] = ['in_domain', 'out_of_domain', 
             'out_of_domain_2', 'hatecheck', 'out_of_domain_3',# 'out_of_domain_4', 
             ]

test_set_data = {}

for construct in construct_names:
    test_set_data[construct] = {}
    for test_set in test_sets[construct]:
        test_path = '../../data/%s/test/%s.csv' %(construct, test_set)
        test_set_data[construct][test_set] = pd.read_csv(test_path, sep = '\t')
        
## do once and save

for test_set in tqdm(test_set_data['sexism']):
    print(test_set)
    results = []
    for _, row in test_set_data['sexism'][test_set].iterrows():
        result = []
        for n in range(0, runs):
            inputs = tokenizer(create_prompt_label_sexism(row['text']),
                               return_tensors="pt").to("cuda:0")
            outputs = model.generate(**inputs)
            result.append(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
        results.append(result)
    test_set_data['sexism'][test_set]['flant5_labels'] = results
    with open('../../designed_data/sexism_flant5_labels.pickle', 'wb') as handle:
        pickle.dump(test_set_data['sexism'], handle, protocol=pickle.HIGHEST_PROTOCOL)

  0%|          | 0/5 [00:00<?, ?it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (796 > 512). Running this sequence through the model will result in indexing errors


in_domain
out_of_domain
out_of_domain_2
 40%|████      | 2/5 [8:27:52<12:41:49, 15236.41s/it]
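
The loop above stores `runs` raw generations per row in `flant5_labels`, and the prompt asks for one of three options. Below is a minimal sketch of how those free-text replies might be mapped back onto the options and reduced to one label per row. `normalize_label` and `majority_label` are hypothetical helpers, and the matching rules are assumptions about what the model tends to output. Unless sampling is enabled (for example with `do_sample=True`), repeated `generate` calls on the same input are likely to return the same text, in which case the vote is trivial.

```python
from collections import Counter

def normalize_label(raw_output,
                    positive="sexist",
                    negative_markers=("non-sexist", "not sexist", "nonsexist")):
    """Map a free-text model reply onto one of the three prompt options."""
    text = raw_output.strip().lower()
    # Check the negative markers first: "non-sexist" contains "sexist".
    if any(marker in text for marker in negative_markers):
        return "non-sexist"
    if positive in text:
        return positive
    return "can't say"

def majority_label(run_outputs):
    """Return the most frequent normalized label across repeated runs.
    On a tie, Counter.most_common keeps the label that was seen first."""
    counts = Counter(normalize_label(output) for output in run_outputs)
    return counts.most_common(1)[0][0]

example_runs = ["sexist", "1) sexist", "Non-sexist", "sexist", "can't say"]
print(majority_label(example_runs))
```

Applied to the data frame, this could look like `df['flant5_labels'].apply(majority_label)`.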


In [None]:
test_set_data['sexism'][test_set]

In [None]:
for test_set in tqdm(test_set_data['hatespeech']):
    print(test_set)
    results = []
    for _, row in test_set_data['hatespeech'][test_set].iterrows():
        result = []
        for n in range(0, runs):
            inputs = tokenizer(create_prompt_label_hatespeech(row['text']),
                               return_tensors="pt").to("cuda:0")
            outputs = model.generate(**inputs)
            result.append(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
        results.append(result)
    test_set_data['hatespeech'][test_set]['flant5_labels'] = results
    
with open('../../designed_data/hatespeech_flant5_labels.pickle', 'wb') as handle:
    pickle.dump(test_set_data['hatespeech'], handle, protocol=pickle.HIGHEST_PROTOCOL)
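
The progress bar for the sexism run shows several hours per test set, so an interrupted session can be expensive. The sketch below shows one way to make such a loop resumable by saving after each test set and skipping the ones already saved. `load_checkpoint`, `save_checkpoint`, and the temporary file path are hypothetical and only illustrate the pattern; the placeholder list stands in for the actual `generate` loop.

```python
import os
import pickle
import tempfile

def load_checkpoint(path):
    """Return previously saved results, or an empty dict if none exist."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    return {}

def save_checkpoint(results, path):
    """Write to a temporary file first, then rename it, so an interrupted
    save does not leave a truncated pickle behind."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)

# Demo with placeholder labels instead of real model calls.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "labels.pickle")
done = load_checkpoint(checkpoint_path)
for test_set in ["in_domain", "out_of_domain"]:
    if test_set in done:
        continue  # already labeled in an earlier session
    done[test_set] = ["placeholder label"]  # stand-in for the generate() loop
    save_checkpoint(done, checkpoint_path)

print(sorted(load_checkpoint(checkpoint_path)))
```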

In [None]:
test_set_data['hatespeech'][test_set]

### CAD generation

In [None]:
## original prompt
def create_prompt_sexism_with_example(original_sentence):
    
    prompt = 'Your task is twofold: \
    1) modify the messages to remove as much sexism as possible \
    2) make as few modifications to the messages as possible.  \
    Modifications include adding, removing, or editing words. You get extra points if you transform our sexist sentences into non-sexist sentences with minimal changes. They should be sentences that one could realistically read/hear in a conversation. \
    Do not modify words unless that is crucial to make the message not sexist. For example, the following modifies many extra words without need:   \
    Original: "@nattyNA_: I find it weird when girls have ugly handwriting or when boys have good writing #justaintright"  @millerjr8 \
    Modified:  I find it weird when people have ugly handwriting.  \
    Instead, this modification is more parsimonious: \
    Modified (good): "@nattyNA_: I find it nice when girls have beautiful handwriting or when boys have beautiful writing "  @millerjr8  \
    Please transform the following sentence and only output the transformation:\
    ' + original_sentence 
    
    return prompt

In [None]:
## test
string = "women are weak and belong in the kitchen"
inputs = tokenizer(create_prompt_sexism_with_example(string), return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

In [None]:
import pandas as pd
from tqdm import tqdm
import json
import pickle
import time

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', None)  # -1 is no longer accepted by recent pandas versions

In [None]:
data = pd.read_csv("../../data/sexism/train/original.csv",sep="\t")#.head(1000)

In [None]:
data_hate = data[data['sexism']=='sexist'].head(100)
data_hate

In [None]:
## Let's create counterfactuals for the first 100 sexist examples

results = []

for index,sentence in tqdm(data_hate.iterrows(), total=data_hate.shape[0]):
    inputs = tokenizer(create_prompt_sexism_with_example(sentence['text']), return_tensors="pt").to("cuda:0")
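    # allow enough new tokens for a full rewritten sentence
    outputs = model.generate(**inputs, max_new_tokens=500)
    result = {}
    # batch_decode returns a list; keep the single decoded string
    result['cad'] = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]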
    result["original_sentence"] = sentence["text"]
    result["original_id"] = sentence["_id"]
    results.append(result)

In [None]:
pd.DataFrame(results).head(10)
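
The CAD prompt asks for as few modifications as possible, so it can help to check how close each generated counterfactual stays to its original. The following is a rough sketch using the standard library's `difflib.SequenceMatcher` on whitespace-split words. `word_edit_similarity` is a hypothetical helper, the example sentences other than the notebook's test string are made up for illustration, and this ratio is only one possible proxy for minimality.

```python
from difflib import SequenceMatcher

def word_edit_similarity(original, counterfactual):
    """Similarity in [0, 1] between the word sequences of two sentences.
    Values near 1 mean few words changed; values near 0 mean a full rewrite."""
    original_words = original.split()
    counterfactual_words = counterfactual.split()
    return SequenceMatcher(None, original_words, counterfactual_words).ratio()

original = "women are weak and belong in the kitchen"
minimal = "women are strong and belong in the workplace"   # illustrative only
rewritten = "people have many different strengths"         # illustrative only

print(word_edit_similarity(original, minimal))
print(word_edit_similarity(original, rewritten))
```

On the `results` list, the same function could be applied to each pair of `original_sentence` and `cad` to flag generations that drift far from the source text.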

In [None]:
def create_prompt_label_sexism(original_sentence):
    
    prompt = 'Your task is to annotate if a piece of text is sexist or not. Read on for detailed explanations. \
            Something can be sexist: \
            because of its content (what the speaker believes) \
            because of its phrasing (because of the speaker’s choice of words) \
            Sexism due to content. A message may be sexist because of its content--in other words, because of what the speaker believes. For example, the speaker may express sexist attitudes towards gender inequality, either endorsing it (e.g. “some jobs are best left to men”), or antagonizing it (e.g. “the pay gap between genders does not exist, feminists should stop complaining”). Also, the speaker may express stereotypes (how genders are traditionally seen and compared to each other) and behavioral expectations (how individuals of a gender should behave according to traditional views). Sexism may also include positive stereotypes (e.g. “women are the best home cooks”), or target  men (e.g., “men should not cry”)! \
            Sexism due to phrasing. A message may be sexist simply because of how the speaker phrases it--independently from what general beliefs or attitudes the speaker holds. A message is sexist, for example, when it contains attacks, foul language, or derogatory depictions directed towards individuals because of their gender, e.g. by means of name-calling ("you bitch"), attacks ("I\'m going to kick her back to the kitchen"), objectification ("She\'s stupid but I\'d still do her"), inflammatory messages ("burn all women"). However, just because a message is aggressive or uses offensive language does not mean that it is sexist! \
            Sexist Examples: \
            “So I wore a dress and wig while bartending and when people asked what my costume was I said a bad driver #dassexist.” This post is sexist because it espouses gendered stereotypes, that women are allegedly not good at driving \
            “You’re a stupid bitch and I hope you die.” This post is sexist because of its phrasing, it contains a gendered slur (‘bitch’). \
            “Women’s sacrifices continue to ensure the future of this country’s sons.” This post espouses benevolent sexism. While such a sentiment might appear benign, benevolent sexism glorifies women and sees their worth only as extensions of their male relatives. \
            Non-sexist Examples \
            “F*ck you Jane ur so stupid” the content is offensive and Jane, the target, is likely a woman; however the message is not clearly sexist because it is unclear if the speaker’s animosity is because of Jane’s gender \
            With these instructions in mind and a piece of text, please reply with either of the three options and nothing else: 1) sexist, 2) non-sexist, 3) can\'t say. Please use the last option as little as possible and if you do, provide a one sentence explanation. Do not give an explanation if you pick \'sexist\' or \'non-sexist\'. We are ready to begin now. Is this sentence sexist based on the past guidelines: ' + original_sentence 
    return prompt

In [None]:
string = "women are weak and belong in the kitchen"
inputs = tokenizer(create_prompt_label_sexism(string), return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

In [None]:
results = []

for index,sentence in tqdm(data.iterrows(), total=data.shape[0]):
    inputs = tokenizer(create_prompt_label_sexism(sentence['text']), return_tensors="pt").to("cuda:0")
    outputs = model.generate(**inputs)
    result = {}
    result['cad'] = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    result["original_sentence"] = sentence["text"]
    result["original_id"] = sentence["_id"]
    results.append(result)

In [None]:
pd.DataFrame(results).groupby('cad').size()

In [None]:
def create_prompt_hs_without_example(original_sentence):
    prompt = 'Your task is twofold: \
    1) modify the messages to remove as much hatespeech as possible and \
    2) make as few modifications to the messages as possible. \
    Modifications include adding, removing, or editing words. \
    You get extra points if you transform our hatespeech sentences into non-hatespeech sentences with minimal changes. \
    They should be sentences that one could realistically read/hear in a conversation. \
    Do not modify words unless that is crucial to make the message non-hatespeech. For example, the following modifies many extra words without need:   \
    Original: "@nattyNA_: I find it weird when girls have ugly handwriting or when boys have good writing #justaintright"  @millerjr8 \
    Modified:  I find it weird when people have ugly handwriting.  \
    Instead, this modification is more parsimonious: \
    Modified (good): "@nattyNA_: I find it nice when girls have beautiful handwriting or when boys have beautiful writing "  @millerjr8  \
    Please transform the following sentence and only output the transformation and nothing else:\
    ' + original_sentence 
    
    return prompt