# Polyjuice
<b>Date:</b> October 5, 2023\
<b>Author:</b> Dimitris Lymperopoulos\
<b>Description:</b> A notebook for experimenting with the Polyjuice framework


## Package Installation
Run the cell below to download and install the Python packages that Polyjuice needs.

In [None]:
!pip install torch
!pip install polyjuice_nlp  # PyPI distribution that provides the `polyjuice` module
!python -m spacy download en_core_web_sm

## Imports

In [1]:
import numpy as np
import pandas as pd
from polyjuice import Polyjuice



## Experiments

In [2]:
# initial parameters and variables
# load the pretrained Polyjuice generator; is_cuda=True requests GPU inference
pj = Polyjuice(model_path="uw-hai/polyjuice", is_cuda=True)
text = "A dog is embraced by the woman."

In [9]:
# Experiment 1 - Simplest use of Polyjuice with default parameters
perturbations = pj.perturb(text)
print("\n".join(perturbations))

A dog is being cradled by a person.
A dog is embraced by no woman.
No dog is outside in the grass.


In [16]:
# Experiment 2 - Customized perturbations
perturbations = pj.perturb(
    orig_sent=text,
    # Where to put the blank: a single blanked sentence or a list of them.
    # If None, the blank positions are selected automatically.
    blanked_sent=None,
    # Control code: a single code or a list of codes, chosen from
    # 'resemantic', 'restructure', 'negation', 'insert', 'lexical', 'shuffle', 'quantifier', 'delete'.
    ctrl_code='negation',
    # Perplexity threshold for filtering generated sentences.
    perplex_thred=None,
    # Number of perturbations to return.
    num_perturbations=5,
    # Additional keyword arguments are passed on to the Hugging Face generator.
    num_beams=6
)
print("\n".join(perturbations))

A dog is n't embraced by the woman.
A dog is not being held by the woman.
A dog is not by the woman.
A dog is not embraced by the woman.
A dog is not being attacked by the woman.
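
To compare what the different control codes produce for the same sentence, the call above can be repeated once per code. The cell below is a minimal sketch under stated assumptions: `perturb_by_code` and `CTRL_CODES` are hypothetical names introduced here, not part of Polyjuice, and the perturbation function is passed in as an argument so the helper can be tried with a stub before running the real model.

```python
from typing import Callable, Dict, List, Sequence

# Control codes listed in the Experiment 2 comments.
CTRL_CODES = [
    "resemantic", "restructure", "negation", "insert",
    "lexical", "shuffle", "quantifier", "delete",
]


def perturb_by_code(
    perturb_fn: Callable[..., Sequence[str]],
    sentence: str,
    codes: Sequence[str] = CTRL_CODES,
) -> Dict[str, List[str]]:
    """Call perturb_fn once per control code and collect the results.

    perturb_fn is assumed to accept the keyword arguments `orig_sent`
    and `ctrl_code`, as `pj.perturb` does in the cells above.
    """
    results: Dict[str, List[str]] = {}
    for code in codes:
        try:
            results[code] = list(perturb_fn(orig_sent=sentence, ctrl_code=code))
        except Exception:
            # Record an empty list when generation fails for this code.
            results[code] = []
    return results


# Intended usage with the objects defined earlier in this notebook:
# by_code = perturb_by_code(pj.perturb, text)
# for code, sentences in by_code.items():
#     print(code, sentences)
```

An empty list for a code means that either no perturbation was generated or the call raised an exception, so the two cases are not distinguished in this sketch.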


## IMDB Reviews Counterfactual Generation

In [2]:
df = pd.read_csv("../Data/IMDB_reviews.csv")
df.head()

Unnamed: 0,Source_Sentences,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive


In [4]:
df.shape

(50000, 2)

In [7]:
def create_adversarial_sent(sentence):
    """
    Use Polyjuice to generate an adversarial (negated) version of a sentence.
    If no such sentence can be generated, return the original sentence.

    :param sentence: string representing the original sentence
    :returns: string representing either the adversarial or the original sentence
    """

    try:
        adversarial_sentences = pj.perturb(orig_sent=sentence, ctrl_code='negation')
    except Exception:
        # fall back to the original sentence if generation fails
        return sentence

    return adversarial_sentences[0] if len(adversarial_sentences) > 0 else sentence
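
Because the function above returns the original sentence on failure, the resulting dataset does not show which reviews were actually perturbed. The following is a minimal sketch of a variant that also returns a flag. The name `perturb_or_keep` is hypothetical, and the perturbation function is passed in explicitly (for example `pj.perturb`) rather than read from a global.

```python
from typing import Callable, Sequence, Tuple


def perturb_or_keep(
    perturb_fn: Callable[..., Sequence[str]],
    sentence: str,
    ctrl_code: str = "negation",
) -> Tuple[str, bool]:
    """Return (sentence, was_perturbed).

    perturb_fn is assumed to accept `orig_sent` and `ctrl_code` keyword
    arguments, as `pj.perturb` does in this notebook.
    """
    try:
        candidates = perturb_fn(orig_sent=sentence, ctrl_code=ctrl_code)
    except Exception:
        # Generation failed: keep the original sentence.
        return sentence, False

    if len(candidates) > 0:
        return candidates[0], True
    # No perturbation was produced: keep the original sentence.
    return sentence, False


# Intended usage with the objects defined earlier in this notebook:
# new_text, changed = perturb_or_keep(pj.perturb, text)
```

Summing the flags over a dataset gives the number of reviews for which a perturbation was generated.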

In [None]:
%%time

# create the adversarial dataset
# %%time runs the cell once, so each review is perturbed exactly once
df['Source_Sentences'] = df['Source_Sentences'].apply(create_adversarial_sent)
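
The `df.head()` output above shows that the reviews contain HTML line breaks such as `<br /><br />`. Removing them before perturbation may give the generator cleaner input. The cell below is a minimal sketch of one way to do that; `clean_review` is a hypothetical helper, and whether this preprocessing improves the generated sentences has not been checked in this notebook.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)


def clean_review(review: str) -> str:
    """Replace HTML line breaks with spaces and collapse repeated whitespace."""
    without_tags = _BR_TAG.sub(" ", review)
    return " ".join(without_tags.split())


# Intended usage before creating the adversarial dataset:
# df['Source_Sentences'] = df['Source_Sentences'].apply(clean_review)
```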

