## **Transformers for Outlier Detection**

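Transformers can be used for anomaly detection on text. A pre-trained language model such as BERT turns each document into a fixed-length embedding vector, and an unsupervised detector then flags the embeddings that lie far from the rest. Common detector choices include clustering, distance-based scores, and tree-based methods such as the Isolation Forest used in this notebook.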
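
To make the distance-based idea concrete, here is a minimal sketch that scores each embedding by its mean cosine distance to its k nearest neighbours, so points far from all others get high scores. The function `knn_outlier_scores`, the use of cosine distance, `k=2`, and the toy 3-D vectors are illustrative assumptions and not part of this notebook's pipeline; in practice the rows would be sentence embeddings.

```python
import numpy as np

def knn_outlier_scores(embeddings: np.ndarray, k: int = 2) -> np.ndarray:
    """Hypothetical helper: mean cosine distance from each row to its k nearest rows.

    Higher scores mean the point sits farther from the rest of the data.
    """
    # L2-normalise rows so that dot products become cosine similarities
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = 1.0 - unit @ unit.T          # pairwise cosine distances
    np.fill_diagonal(dist, np.inf)      # ignore each point's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy 3-D "embeddings": three similar vectors and one pointing elsewhere
X = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0],
])
scores = knn_outlier_scores(X, k=2)
print(scores.argmax())  # → 3
```

Cosine distance ignores vector length and compares only direction; Euclidean distance would slot into the same structure if magnitudes matter for your data.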

**Imports**

```python
!pip install transformers torch scikit-learn

from transformers import BertTokenizer, BertModel
from sklearn.ensemble import IsolationForest
import torch
```

**Load Pre-trained BERT**

```python
# Download (on first use) the tokenizer and encoder for uncased BERT-base
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
```

**Sample Text Data**

```python
documents = [
    "This is a normal sentence.",
    "Another usual example of text.",
    "Claim your prize now!!!",
    "Warning! Unauthorized login detected.",
]
```

**Convert Text to Embeddings**

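```python
def get_embeddings(texts):
    """Return one mean-pooled BERT embedding (768 floats) per input text."""
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
        # Average token vectors over real tokens only: padding positions are
        # zeroed by the attention mask and excluded from the token count
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        return (summed / counts).numpy()

embeddings = get_embeddings(documents)  # shape: (len(documents), 768)
```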
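
With `padding=True`, the tokenizer pads shorter texts up to the longest one in the batch, and `attention_mask` marks real tokens with 1 and padding with 0. A plain `.mean(dim=1)` over `last_hidden_state` averages the padding positions in as well, so a text's embedding can shift depending on what else is in the batch. The NumPy sketch below compares a naive mean with a mask-weighted mean on made-up hidden states; the padded positions are set to zero for readability, whereas a real model's outputs there are not zero.

```python
import numpy as np

# Made-up hidden states: batch of 2 sequences, padded to length 4, hidden size 2
# Shape: (batch, seq_len, hidden)
hidden = np.array([
    [[1.0, 1.0], [3.0, 3.0], [0.0, 0.0], [0.0, 0.0]],  # 2 real tokens + 2 pads
    [[2.0, 4.0], [4.0, 2.0], [6.0, 6.0], [8.0, 8.0]],  # 4 real tokens
])
attention_mask = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 1],
])

# Naive mean: padding positions drag the first sequence's vector toward zero
naive = hidden.mean(axis=1)

# Masked mean: sum only real tokens, divide by the number of real tokens
mask = attention_mask[:, :, None].astype(float)
masked = (hidden * mask).sum(axis=1) / mask.sum(axis=1)

print(naive[0])   # [1. 1.]
print(masked[0])  # [2. 2.]
```

For the unpadded second sequence both methods agree; only padded sequences are affected.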

**Apply Isolation Forest for Outlier Detection**

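```python
# contamination=0.25 tells the model to expect about 1 in 4 documents to be
# anomalous; it sets the score threshold that predict() uses
isolation_forest = IsolationForest(contamination=0.25, random_state=42)
isolation_forest.fit(embeddings)
```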
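
How `contamination` turns scores into labels can be checked directly. In scikit-learn, fitting with a numeric `contamination` sets `offset_` so that roughly that fraction of training points fall below it, `decision_function` returns `score_samples` minus `offset_`, and `predict` labels every point with a negative decision value as `-1`. The sketch below verifies this on synthetic 8-dimensional data standing in for BERT embeddings; the data and `contamination=0.1` are assumptions for illustration. With the four sample documents and `contamination=0.25`, the threshold lands between the lowest and second-lowest score, so one document is flagged (barring tied scores).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in for embeddings: 40 points near the origin, 4 far away
X = np.vstack([
    rng.normal(0.0, 1.0, size=(40, 8)),
    rng.normal(6.0, 1.0, size=(4, 8)),
])

iso = IsolationForest(contamination=0.1, random_state=42).fit(X)

# decision_function is score_samples shifted by the fitted offset_
decision = iso.decision_function(X)
assert np.allclose(decision, iso.score_samples(X) - iso.offset_)

# predict() marks exactly the negative-decision points as anomalies (-1)
labels = iso.predict(X)
print((labels == -1).sum(), "points flagged")
```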

**Predict Anomalies**

```python
predictions = isolation_forest.predict(embeddings)  # 1 = normal, -1 = anomaly
for doc, pred in zip(documents, predictions):
    status = "Anomaly" if pred == -1 else "Normal"
    print(f"Text: {doc} | Status: {status}")
```
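
Binary labels hide how strongly each text deviates. As a sketch of an alternative, `score_samples` returns a score where lower means more abnormal, so sorting in ascending order lists the most unusual texts first. Placeholder texts and random 16-dimensional vectors (one shifted far from the rest) stand in for `documents` and `get_embeddings(documents)`; these are assumptions for illustration, and with real data you would score `embeddings` with the fitted `isolation_forest`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder texts and synthetic vectors standing in for real embeddings
texts = ["doc A", "doc B", "doc C", "doc D", "doc E"]
rng = np.random.default_rng(1)
vectors = rng.normal(0.0, 1.0, size=(5, 16))
vectors[4] += 10.0  # push one vector far from the others

iso = IsolationForest(random_state=42).fit(vectors)

# score_samples: lower means more anomalous, so sort ascending
scores = iso.score_samples(vectors)
ranking = np.argsort(scores)
for idx in ranking:
    print(f"{scores[idx]:.3f}  {texts[idx]}")
```

Ranking avoids committing to a `contamination` value up front, which is hard to choose when the true anomaly rate is unknown.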