# Prep sentences

In this notebook, I prepare the semantically bleached template sentences.

### Table of Contents:
- [1. Set up templates, targets, and attributes](#1-set-up-templates-targets-and-attributes)
- [2. Populate templates with targets and attributes](#2-populate-templates-with-targets-and-attributes)

In [1]:
# Import dependencies
import pandas as pd

In [11]:
# Declare variables
LEXICON_INPUT_PATH = '../data/input/lexicon.csv' # processed lexicon, ready for use (created in ./prep_lexicon.ipynb)
SENTENCES_INPUT_PATH = '../data/input/sentences.csv' # processed sentences, ready for use (created in this notebook)

## 1. Set up templates, targets, and attributes

Templates: 
- A [TARGET] person is [ATTRIBUTE].
- [TARGET] people are [ATTRIBUTE].
- A person who is [TARGET] is [ATTRIBUTE]. 
- People who are [TARGET] are [ATTRIBUTE].

Targets: 
- disabled/abled
- deaf/hearing
- blind/sighted

Attributes come from the extended lexicon created by [Nicolas et al. (2021)](https://onlinelibrary.wiley.com/doi/epdf/10.1002/ejsp.2724). I prepared them in `./prep_lexicon.ipynb` and saved them to `LEXICON_INPUT_PATH`.

In [7]:
templates = {'person_first': ['A person who is [TARGET] is [ATTRIBUTE].', 
                              'People who are [TARGET] are [ATTRIBUTE].'], 
             'identity_first': ['A [TARGET] person is [ATTRIBUTE].', 
                                '[TARGET] people are [ATTRIBUTE].']}

targets = ['deaf', 'hearing', 'blind', 'sighted', 'disabled', 'abled']

# Load lexicon
scm_df = pd.read_csv(LEXICON_INPUT_PATH)
scm_df.head()

Unnamed: 0,Attribute,Unsociable,Sociable,Immoral,Moral,Dependent,Independent,Unable,Able
0,aberrant,False,False,True,False,False,False,False,False
1,abhorrent,False,False,True,False,False,False,False,False
2,abject,False,False,False,False,True,False,False,False
3,able,False,False,False,False,False,False,False,True
4,abnormal,False,False,True,False,False,False,False,False


## 2. Populate templates with targets and attributes

Requirements: 
- Source data frame has an "Attribute" column
- We have defined templates in this notebook
- We have defined targets in this notebook

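Notes: 
- This code handles multi-word attributes in `Sentence_TAM` by emitting one `[MASK]` per word (we don't have any multi-word attributes, as explained in `./prep_lexicon.ipynb`). `Sentence_AM` replaces the attribute with a single `[MASK]` regardless of its length.
- This code only handles single-word targets: a target is always replaced by a single `[MASK]`.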
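
To make the four sentence variants concrete, here is a minimal sketch that fills one template for one target and one attribute with plain string replacement. The helper name `fill_template` is hypothetical and not part of this notebook, and the multi-word attribute `hard working` is an invented example, since the lexicon contains none.

```python
TARGET_TOKEN = '[TARGET]'
ATTRIBUTE_TOKEN = '[ATTRIBUTE]'
MASK_TOKEN = '[MASK]'

def fill_template(template, target, attribute):
    """Hypothetical helper: return the four sentence variants for one combination."""
    # One [MASK] per space-separated word in the attribute (used for Sentence_TAM only)
    attribute_masks = ' '.join([MASK_TOKEN] * len(attribute.split(' ')))
    return {
        'Sentence': template.replace(TARGET_TOKEN, target).replace(ATTRIBUTE_TOKEN, attribute),
        'Sentence_TM': template.replace(TARGET_TOKEN, MASK_TOKEN).replace(ATTRIBUTE_TOKEN, attribute),
        'Sentence_AM': template.replace(TARGET_TOKEN, target).replace(ATTRIBUTE_TOKEN, MASK_TOKEN),
        'Sentence_TAM': template.replace(TARGET_TOKEN, MASK_TOKEN).replace(ATTRIBUTE_TOKEN, attribute_masks),
    }

variants = fill_template('People who are [TARGET] are [ATTRIBUTE].', 'deaf', 'hard working')
for name, sentence in variants.items():
    print(f'{name}: {sentence}')
```

Here `Sentence_TAM` comes out as `People who are [MASK] are [MASK] [MASK].`, while `Sentence_AM` keeps a single `[MASK]` for the attribute, mirroring the replacement logic in `process_df`.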

In [8]:
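def process_df(source_df):
    """Build one row per (target, template, attribute) combination.

    Reads the notebook-level `targets` and `templates`; `source_df` needs an "Attribute" column.
    """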
    target_token = '[TARGET]'
    attribute_token = '[ATTRIBUTE]'
    mask_token = '[MASK]'
    dataframes = []

    for target in targets:
        for template_type, template_list in templates.items(): 
            for template in template_list:
                if target == 'abled': # 'abled' needs "An" instead of "A"; extend this check for any other targets that need a grammar fix
                    template = template.replace('A [TARGET]', 'An [TARGET]')

                df = source_df.copy()
                df['Target'] = target
                df['Template'] = template 
                df['Template_Type'] = template_type
                df['Sentence'] = df.apply(
                    lambda x: x['Template'].replace(target_token, x['Target']).replace(attribute_token, x['Attribute']),
                    axis=1)
                df['Sentence_TM'] = df.apply(
                    lambda x: x['Template'].replace(target_token, mask_token).replace(attribute_token, x['Attribute']), 
                    axis=1)
                df['Sentence_AM'] = df.apply(
                    lambda x: x['Template'].replace(target_token, x['Target']).replace(attribute_token, mask_token),
                    axis=1
                )
                df['Sentence_TAM'] = df.apply(
                    lambda x: x['Template'].replace(target_token, mask_token).replace(attribute_token, ' '.join([mask_token] * len(x['Attribute'].split(' ')))), # one [MASK] per word, to account for multi-word attributes
                    axis=1)
                dataframes.append(df)

    return pd.concat(dataframes).reset_index(drop=True)
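
The grammar fix in `process_df` is hard-coded for `abled`. If more targets were added, one possible generalization is to pick the article from the target's first letter. The sketch below assumes a simple vowel-letter heuristic, which only approximates English usage (it would get words such as `unique` wrong); `fix_article` is a hypothetical helper and not part of this notebook.

```python
VOWEL_LETTERS = ('a', 'e', 'i', 'o', 'u')

def fix_article(template, target):
    """Hypothetical helper: use 'An' before targets that start with a vowel letter."""
    # Assumption: the first letter decides the article (a heuristic, not a full grammar rule)
    if target[:1].lower() in VOWEL_LETTERS:
        return template.replace('A [TARGET]', 'An [TARGET]')
    return template

targets = ['deaf', 'hearing', 'blind', 'sighted', 'disabled', 'abled']
for target in targets:
    fixed = fix_article('A [TARGET] person is [ATTRIBUTE].', target)
    print(fixed.replace('[TARGET]', target))
```

For the six targets used here, only `abled` starts with a vowel letter, so this sketch gives the same result as the hard-coded check.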

In [10]:
sentences_df = process_df(scm_df)
sentences_df.head()

Unnamed: 0,Attribute,Unsociable,Sociable,Immoral,Moral,Dependent,Independent,Unable,Able,Target,Template,Template_Type,Sentence,Sentence_TM,Sentence_AM,Sentence_TAM
0,aberrant,False,False,True,False,False,False,False,False,deaf,A person who is [TARGET] is [ATTRIBUTE].,person_first,A person who is deaf is aberrant.,A person who is [MASK] is aberrant.,A person who is deaf is [MASK].,A person who is [MASK] is [MASK].
1,abhorrent,False,False,True,False,False,False,False,False,deaf,A person who is [TARGET] is [ATTRIBUTE].,person_first,A person who is deaf is abhorrent.,A person who is [MASK] is abhorrent.,A person who is deaf is [MASK].,A person who is [MASK] is [MASK].
2,abject,False,False,False,False,True,False,False,False,deaf,A person who is [TARGET] is [ATTRIBUTE].,person_first,A person who is deaf is abject.,A person who is [MASK] is abject.,A person who is deaf is [MASK].,A person who is [MASK] is [MASK].
3,able,False,False,False,False,False,False,False,True,deaf,A person who is [TARGET] is [ATTRIBUTE].,person_first,A person who is deaf is able.,A person who is [MASK] is able.,A person who is deaf is [MASK].,A person who is [MASK] is [MASK].
4,abnormal,False,False,True,False,False,False,False,False,deaf,A person who is [TARGET] is [ATTRIBUTE].,person_first,A person who is deaf is abnormal.,A person who is [MASK] is abnormal.,A person who is deaf is [MASK].,A person who is [MASK] is [MASK].
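
As a quick sanity check on the size of the output, `process_df` produces one row per combination of target, template, and attribute. Below is a minimal sketch of that count using the templates and targets defined above and a three-word stand-in for the lexicon (the full lexicon size is not shown in this notebook).

```python
from itertools import product

templates = {'person_first': ['A person who is [TARGET] is [ATTRIBUTE].',
                              'People who are [TARGET] are [ATTRIBUTE].'],
             'identity_first': ['A [TARGET] person is [ATTRIBUTE].',
                                '[TARGET] people are [ATTRIBUTE].']}
targets = ['deaf', 'hearing', 'blind', 'sighted', 'disabled', 'abled']
attributes = ['aberrant', 'abhorrent', 'abject']  # stand-in: first three lexicon entries

# Flatten the template dictionary into a single list of template strings
all_templates = [t for template_list in templates.values() for t in template_list]

# Every (target, template, attribute) combination corresponds to one output row
combinations = list(product(targets, all_templates, attributes))
rows_per_attribute = len(targets) * len(all_templates)

print(rows_per_attribute)  # 24
print(len(combinations))   # 72
```

In the notebook itself, the analogous check would be that `len(sentences_df)` equals `len(scm_df) * 24`.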


In [12]:
# Save
sentences_df.to_csv(SENTENCES_INPUT_PATH, index=False)