If you want a BAE-style attack that only inserts words (without any replacements), note that WordSwapMaskedLM cannot do this: its method argument accepts "bae" or "bert-attack", and both replace existing words. TextAttack ships a separate transformation for masked-LM insertions, WordInsertionMaskedLM. Here's how you can set it up in TextAttack:

# Step-by-Step Configuration for Insertion-Only BAE
# 1. Adjust the Transformation:
Use WordInsertionMaskedLM in place of WordSwapMaskedLM. It places a mask token between existing words and fills it with the masked language model's predictions, so no existing word is replaced.

In [None]:
from textattack.transformations import WordInsertionMaskedLM

# Insertion-only transformation: proposes new words, never replaces existing ones
transformation = WordInsertionMaskedLM(max_candidates=30)


WordInsertionMaskedLM only proposes word insertions.
max_candidates=30 limits the number of insertion suggestions per position.
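
To make the idea of an insertion slot concrete, here is a minimal pure-Python sketch of the masked inputs an insertion transformation could hand to the language model. It is an illustration, not TextAttack's internal code: the insertion_templates helper, the whitespace tokenization, and the choice to include a slot after the last word are all assumptions made for this example.

```python
def insertion_templates(sentence, mask_token="[MASK]"):
    """Return one masked sentence per insertion slot.

    Slots are before the first word, between each pair of words,
    and after the last word (an assumption of this sketch).
    """
    words = sentence.split()
    templates = []
    for position in range(len(words) + 1):
        masked = words[:position] + [mask_token] + words[position:]
        templates.append(" ".join(masked))
    return templates


templates = insertion_templates("This is an amazing movie")
for template in templates:
    print(template)
```

A masked language model would then propose fillers for each [MASK], and a cap such as max_candidates keeps only the top-scoring fillers per slot.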

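# 2. Set Up Constraints:
Constraints are meant to keep the perturbed sentence semantically and grammatically close to the original. One caveat for insertions: in current TextAttack releases, WordEmbeddingDistance and PartOfSpeech are only compatible with word-swap transformations, so candidates produced by an insertion transformation pass through them unchecked. A sentence-level constraint such as UniversalSentenceEncoder (in textattack.constraints.semantics.sentence_encoders) is the usual way to bound meaning drift for inserted words.

In [None]:
from textattack.constraints.semantics import WordEmbeddingDistance
from textattack.constraints.grammaticality import PartOfSpeech
from textattack.constraints.overlap import MaxWordsPerturbed

# Semantic similarity constraint (word-level; only enforced for word swaps)
semantic_constraint = WordEmbeddingDistance(min_cos_sim=0.8)

# Grammatical constraint (also only enforced for word swaps)
grammatical_constraint = PartOfSpeech()

# Limit the fraction of words that may be perturbed
max_perturbation_constraint = MaxWordsPerturbed(max_percent=0.2)

constraints = [semantic_constraint, grammatical_constraint, max_perturbation_constraint]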
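
The two numeric thresholds above can be read as simple checks. The sketch below is a standalone illustration, not TextAttack code: the helper names and the toy 3-dimensional embeddings are made up, and rounding the perturbation budget up is an assumption about how a fractional limit turns into a word count. It does not model how TextAttack counts changed words after an insertion.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def passes_similarity(u, v, min_cos_sim=0.8):
    """Accept a candidate only if its embedding stays close to the reference."""
    return cosine_similarity(u, v) >= min_cos_sim


def perturbation_budget(num_words, max_percent=0.2):
    """Maximum number of changed words, assuming the fraction is rounded up."""
    return math.ceil(num_words * max_percent)


# Toy embeddings, made up for illustration
reference = [1.0, 0.0, 0.0]
close = [0.9, 0.1, 0.0]
far = [0.0, 1.0, 0.0]

print(passes_similarity(reference, close))
print(passes_similarity(reference, far))
print(perturbation_budget(5))
print(perturbation_budget(12))
```

Lowering min_cos_sim admits candidates that drift further from the reference, and raising max_percent allows more changed words per sentence.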


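# 3. Build the Attack:
Combine the insertion-only transformation with the constraints, a goal function, and a search method to build the attack.

In [None]:
import transformers
from textattack import Attack
from textattack.goal_functions import UntargetedClassification
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.search_methods import GreedySearch

# Load a fine-tuned classifier and its tokenizer, then wrap them for TextAttack
# (the checkpoint name is an example; use the model you want to attack)
model_name = "textattack/bert-base-uncased-imdb"
hf_model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = HuggingFaceModelWrapper(hf_model, tokenizer)

# Define the goal function
goal_function = UntargetedClassification(model)

# Attack also requires a search method; GreedySearch works with insertions
search_method = GreedySearch()

# Build the attack (order: goal function, constraints, transformation, search method)
attack = Attack(goal_function, constraints, transformation, search_method)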
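
To show how these pieces interact, here is a toy version of the loop in plain Python. Everything in it is a hypothetical stand-in chosen for illustration (the three-word vocabulary, the length-based constraint, the keyword "classifier"); it is not how TextAttack implements its search.

```python
def toy_transformation(words):
    """Propose every single-word insertion from a tiny hypothetical vocabulary."""
    vocabulary = ["not", "really", "very"]
    candidates = []
    for position in range(len(words) + 1):
        for new_word in vocabulary:
            candidates.append(words[:position] + [new_word] + words[position:])
    return candidates


def toy_constraint(original, candidate, max_added=1):
    """Keep candidates that add at most max_added words."""
    return len(candidate) - len(original) <= max_added


def toy_model(words):
    """Stand-in classifier: label 0 if a negation is present, else label 1."""
    return 0 if "not" in words else 1


def toy_attack(sentence, true_label):
    """Return the first constrained candidate that changes the model's label."""
    original = sentence.split()
    for candidate in toy_transformation(original):
        if not toy_constraint(original, candidate):
            continue
        if toy_model(candidate) != true_label:  # untargeted goal reached
            return " ".join(candidate)
    return None


adversarial = toy_attack("this movie is amazing", true_label=1)
print(adversarial)
```

The transformation proposes candidates, the constraints filter them, and the goal function decides when to stop. A search method determines the order in which candidates are tried.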


# 4. Test the Insertion-Only Attack:
You can now run the insertion-only attack on a single sentence or an entire dataset.

Attack on a Single Sentence:

In [None]:
# Test the attack on a single sentence
input_text = "This is an amazing movie!"
ground_truth_label = 1  # 1 = positive in this example
result = attack.attack(input_text, ground_truth_label)

# Print the result
print(result)


Attack on a Dataset:

In [None]:
from textattack import AttackArgs, Attacker
from textattack.datasets import HuggingFaceDataset

# Load the IMDB test split
dataset = HuggingFaceDataset("imdb", split="test")

# Attack only the first 5 examples
attack_args = AttackArgs(num_examples=5)
attacker = Attacker(attack, dataset, attack_args)
attack_results = attacker.attack_dataset()

# Print the results
for result in attack_results:
    print(result)


Key Differences for Insertion-Only BAE
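Transformation: Use WordInsertionMaskedLM instead of WordSwapMaskedLM; the latter only supports the replacement methods "bae" and "bert-attack".
Constraints: The perturbation limit still applies, but word-level constraints such as WordEmbeddingDistance and PartOfSpeech are only enforced for word swaps, so insertions need a sentence-level similarity constraint to stay meaningful.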
No Replacements: No words are replaced; new words are inserted between the existing ones.

Customization Options for Insertion
Insertions at Specific Positions: Customize the insertion logic to target specific positions (e.g., after nouns or verbs).
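Increase Semantic Diversity: Raise max_candidates or lower min_confidence in WordInsertionMaskedLM to allow more diverse insertions; if you add a sentence-level similarity constraint, lowering its threshold has a similar effect.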
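
As a rough sketch of position targeting, the snippet below picks insertion slots that directly follow nouns or verbs. The slots_after_tags and insert_at helpers are hypothetical, and the part-of-speech tags are written by hand; a real pipeline would get them from a tagger and would need to pass the chosen positions into the transformation.

```python
def slots_after_tags(tagged_words, target_tags=("NOUN", "VERB")):
    """Return insertion indices that fall directly after words with a target tag."""
    return [
        index + 1
        for index, (_, tag) in enumerate(tagged_words)
        if tag in target_tags
    ]


def insert_at(words, slot, new_word):
    """Return a copy of words with new_word inserted at the given slot."""
    return words[:slot] + [new_word] + words[slot:]


# Hand-written tags for illustration only
tagged = [("the", "DET"), ("movie", "NOUN"), ("was", "VERB"), ("amazing", "ADJ")]
words = [word for word, _ in tagged]

slots = slots_after_tags(tagged)
print(slots)
print(insert_at(words, slots[0], "itself"))
```

Restricting the slots this way shrinks the candidate set, which also reduces the number of model queries per attacked sentence.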