In [None]:
# Meme classification: Multimodal hate speech detection
* Venkata Koushik Nagasarapu

In [None]:
Motivation: I chose this paper because it addresses a pressing real-world problem—hate speech detection in memes—while highlighting 
the unique challenges of truly multimodal reasoning. Memes combine text and imagery in subtle, often sarcastic ways, making unimodal 
approaches inadequate. This aligns perfectly with DA623’s focus on multimodal data analysis.

In [None]:
Historical Perspective in Multimodal Learning:
1. Early Vision + Language Tasks: Image captioning (e.g. MS COCO Captions) and Visual Question Answering (VQA) revealed strong unimodal biases (e.g. VQA models answering from language priors alone).
2. Contrastive & Counterfactual Data: Subsequent work (e.g. CRIC, CLEVR) introduced contrast sets to force true multimodal understanding.
3. Hateful Memes Challenge: Builds on this by embedding “benign confounders” so that text or image alone is insufficient.
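
To make the benign-confounder idea in item 3 concrete, here is a minimal sketch on a hypothetical three-meme toy set. The texts, image descriptions, and the `best_accuracy` helper are illustrative assumptions, not data or code from the paper. The helper computes the best accuracy any classifier could reach when it sees only the text, only the image, or both.

```python
from collections import Counter, defaultdict

# Hypothetical toy memes: (text, image description, label), where 1 = hateful.
memes = [
    ("love the way you smell today", "skunk", 1),   # original hateful meme
    ("love the way you smell today", "rose", 0),    # benign image confounder
    ("look how many people love you", "skunk", 0),  # benign text confounder
]

def best_accuracy(rows, key):
    """Accuracy of the best classifier that only observes key(row).

    For every distinct observed value, the best it can do is predict
    the majority label among the rows sharing that value.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[key(row)][row[2]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(rows)

text_only = best_accuracy(memes, key=lambda r: r[0])
image_only = best_accuracy(memes, key=lambda r: r[1])
both = best_accuracy(memes, key=lambda r: (r[0], r[1]))

print(f"text only:  {text_only:.2f}")
print(f"image only: {image_only:.2f}")
print(f"text+image: {both:.2f}")
```

On this toy set each single modality tops out at 2/3, which is no better than always predicting "non-hateful", while the pair of modalities separates all three memes.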

In [None]:
A. Dataset Construction:
  1. Extract ~162k memes, keep those that are English and non-violating, license replacement images from Getty, and reconstruct each meme (PNG/SVG).
  2. Four annotation phases: Filtering → Reconstruction → Hatefulness Rating → Benign Confounder Creation.
B. Hatefulness Definition: “A direct or indirect attack on people based on protected characteristics (race, religion, gender, etc.)…”
C. Benign Confounders: For each hateful meme, create minimal text or image swaps that flip the label → forces multimodal fusion.
D. Data Splits: Total 10k memes → 5% dev, 10% test (balanced: 40% multimodal hate, 10% unimodal hate, 20% benign text confounders,
   20% benign image confounders, 10% random non-hateful).
E. Analysis Highlights:
    1. Moderate inter-annotator agreement: Cohen's κ = 0.684.
    2. Protected categories: Race (47.1%), Religion (39.3%), Gender (14.8%), etc. The shares sum to more than 100%, so a meme can fall into several categories.
    3. Attack types: e.g. comparison to criminals (17.2%), negative stereotypes (15.6%).
F. Baseline Results:
    | Model                | Test AUROC           |
    | -------------------- | -------------------- |
    | Human                | n/a (accuracy: 84.7) |
    | Text-only (BERT)     | 69.0                 |
    | Vision-only (ResNet) | 53.7                 |
    | Late Fusion          | 69.3                 |
    | MMBT-Region          | 73.8                 |
    | ViLBERT CC           | 74.5                 |
    | VisualBERT COCO      | 75.4                 |
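
The table above reports AUROC, which can be read as the probability that a randomly chosen hateful meme receives a higher score than a randomly chosen non-hateful one (ties count as half). The following is a minimal pure-Python sketch of that definition; the `auroc` helper and the toy labels and scores are assumptions made for illustration, and in practice `sklearn.metrics.roc_auc_score` does the same job.

```python
def auroc(labels, scores):
    """Pairwise (rank-based) AUROC for binary labels in {0, 1}."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 hateful (1) and 3 non-hateful (0) memes with model scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.3f}")  # 7 of the 9 pairs are ranked correctly

# A model that gives every meme the same score sits at chance level.
chance_auroc = auroc(toy_labels, [0.5] * 6)
```

Because AUROC depends only on the ranking of scores, it is insensitive to the decision threshold, which is one reason it suits a balanced benchmark like this one.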

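The split percentages in item D translate into concrete counts as sketched below. This assumes exactly 10,000 memes and that the remaining 85% form the training set; both are assumptions for the arithmetic, and the variable names are hypothetical.

```python
total = 10_000  # assumption: the "10k" in the notes is taken as exactly 10,000
split_fraction = {"dev": 0.05, "test": 0.10}

# Composition of the balanced dev/test splits, as listed in item D.
composition = {
    "multimodal hate": 0.40,
    "unimodal hate": 0.10,
    "benign text confounder": 0.20,
    "benign image confounder": 0.20,
    "random non-hateful": 0.10,
}
assert abs(sum(composition.values()) - 1.0) < 1e-9

split_sizes = {name: round(total * frac) for name, frac in split_fraction.items()}
# Assumption: everything not in dev or test is used for training.
split_sizes["train"] = total - sum(split_sizes.values())

# Per-category counts inside each balanced split.
category_counts = {
    split: {cat: round(split_sizes[split] * share) for cat, share in composition.items()}
    for split in split_fraction
}

# "Balanced" means half of each split is hateful (multimodal + unimodal hate).
hateful_share = composition["multimodal hate"] + composition["unimodal hate"]

print(split_sizes)
print(category_counts["dev"])
print(f"hateful share: {hateful_share:.0%}")
```
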

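The κ in item E.1 is a chance-corrected agreement score, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. Below is a minimal two-annotator sketch; the `cohen_kappa` helper and the ratings are made up for illustration, and how the paper aggregated across its annotators is not reproduced here.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two annotators rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both annotators must rate the same non-empty item list")
    n = len(ratings_a)
    # Observed agreement: fraction of items with identical ratings.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings (1 = hateful, 0 = not hateful) for ten memes.
annotator_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
annotator_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")  # 80% raw agreement, 50% expected by chance
```

Here the annotators agree on 8 of 10 memes, but half of that agreement is expected by chance, so κ lands at about 0.6 rather than 0.8.
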
In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
from IPython.display import display
import os

# Load dataset (JSONL format)
df = pd.read_json('hateful_memes_dev.jsonl', lines=True)
print("Dataset shape:", df.shape)
display(df.head())  # a bare df.head() only renders when it is the last line of a cell

# Map label to text
df['label_text'] = df['label'].map({0: 'Non-Hateful', 1: 'Hateful'})

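# === 1. Show an example hateful meme ===
# Only the hateful meme is displayed here; locating its benign confounder would need a separate lookup.

sample_hateful = df[df['label'] == 1].iloc[0]
print("Example hateful meme text:", sample_hateful['text'])
# 'img' is a relative path, so this assumes the dataset's image folder sits in the working directory
img = Image.open(sample_hateful['img'])
display(img)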

# === 2. Text Length Distribution ===
df['text_length'] = df['text'].apply(lambda x: len(x.split()))
sns.histplot(data=df, x='text_length', hue='label_text', bins=15)
plt.title('Text Length Distribution')
plt.show()

# === 3. Word Frequencies ===
from collections import Counter
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))

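def get_word_counts(texts):
    """Count lowercase alphabetic tokens, excluding English stopwords.

    Tokens carrying punctuation (e.g. "day." or "you're") fail isalpha() and are dropped.
    """
    all_words = ' '.join(texts).lower().split()
    words = [w for w in all_words if w.isalpha() and w not in stop_words]
    return Counter(words)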

hateful_counts = get_word_counts(df[df['label']==1]['text'])
non_hateful_counts = get_word_counts(df[df['label']==0]['text'])

# Plot top 10 words
hateful_df = pd.DataFrame(hateful_counts.most_common(10), columns=['word', 'count'])
non_hateful_df = pd.DataFrame(non_hateful_counts.most_common(10), columns=['word', 'count'])

fig, axs = plt.subplots(1, 2, figsize=(14,5))
sns.barplot(data=hateful_df, x='count', y='word', ax=axs[0], palette='Reds_r')
axs[0].set_title('Top Hateful Words')
sns.barplot(data=non_hateful_df, x='count', y='word', ax=axs[1], palette='Blues_r')
axs[1].set_title('Top Non-Hateful Words')
plt.show()

# === 4. Class Distribution ===
sns.countplot(data=df, x='label_text', palette='pastel')
plt.title('Class Distribution')
plt.show()

# === 5. Simple Text-Only Baseline (TF-IDF + Logistic Regression) ===
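from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# TF-IDF features on the full set (reused by the fusion baseline in step 7)
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(df['text'])
y = df['label']

# Logistic Regression inside a pipeline: TF-IDF is re-fit on each training fold,
# so no vocabulary or IDF statistics leak from the held-out fold into training
model = make_pipeline(TfidfVectorizer(max_features=1000), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, df['text'], y, cv=5, scoring='roc_auc')
print("Text-Only Baseline AUROC: {:.2f} (+/- {:.2f})".format(scores.mean()*100, scores.std()*100))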

# === 6. Simple Vision-Only Baseline (ImageNet features + Logistic Regression) ===
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from tqdm import tqdm

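# Pre-trained ResNet50 with ImageNet weights
# (`pretrained=True` is deprecated; the `weights=` argument needs torchvision >= 0.13)
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.eval()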

# Remove last layer to get 2048-D features
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

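def extract_features(img_path):
    """Return the 2048-D pooled ResNet50 feature vector for one image file."""
    with Image.open(img_path) as img:
        img_t = transform(img.convert('RGB')).unsqueeze(0)  # batch of 1
    with torch.no_grad():
        feat = feature_extractor(img_t).flatten().numpy()  # (1, 2048, 1, 1) -> (2048,)
    return feat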

# Extract features for all images
img_feats = []
for path in tqdm(df['img'].tolist(), desc="Extracting image features"):
    img_feats.append(extract_features(path))

X_img = np.array(img_feats)

# Logistic Regression
scores_img = cross_val_score(LogisticRegression(max_iter=1000), X_img, y, cv=5, scoring='roc_auc')
print("Image-Only Baseline AUROC: {:.2f} (+/- {:.2f})".format(scores_img.mean()*100, scores_img.std()*100))

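# === 7. Optional: Fusion Baseline (early fusion: concatenate text + image features) ===
# Note: this differs from the "Late Fusion" row in the results table above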
from scipy import sparse

X_fusion = sparse.hstack([X, sparse.csr_matrix(X_img)])
scores_fusion = cross_val_score(LogisticRegression(max_iter=1000), X_fusion, y, cv=5, scoring='roc_auc')
print("Fusion Baseline AUROC: {:.2f} (+/- {:.2f})".format(scores_fusion.mean()*100, scores_fusion.std()*100))
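
In [None]:
For contrast with the concatenation baseline in step 7, here is a minimal sketch of one common form of late fusion: train one classifier per modality and average their predicted probabilities. It runs on synthetic features so that it is self-contained; `X_text_demo`, `X_img_demo`, `y_demo`, and `oof_proba` are hypothetical stand-ins, and the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 400
y_demo = rng.integers(0, 2, size=n)

# Synthetic stand-ins for the text and image feature matrices:
# Gaussian noise with a small class-dependent shift in every dimension.
X_text_demo = rng.normal(size=(n, 20)) + 0.5 * y_demo[:, None]
X_img_demo = rng.normal(size=(n, 30)) + 0.3 * y_demo[:, None]

def oof_proba(features, labels):
    """Out-of-fold P(label = 1) from a logistic regression, using 5-fold CV."""
    clf = LogisticRegression(max_iter=1000)
    proba = cross_val_predict(clf, features, labels, cv=5, method="predict_proba")
    return proba[:, 1]

p_text = oof_proba(X_text_demo, y_demo)
p_img = oof_proba(X_img_demo, y_demo)

# Late fusion: average the two unimodal probability estimates.
p_late = (p_text + p_img) / 2

for name, p in [("text only", p_text), ("image only", p_img), ("late fusion", p_late)]:
    print(f"{name:>11}: AUROC = {roc_auc_score(y_demo, p) * 100:.2f}")
```

To try this on the notebook's data, the synthetic matrices would be swapped for `X` and `X_img` from the cell above. Averaging treats the modalities independently, so by design it cannot model interactions between text and image, which is what the benign confounders are built to require.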


In [None]:
Reflections:
A. Surprises:
  1. Even strong multimodal models fall well short of human performance.
  2. The benign confounder strategy effectively neutralizes unimodal shortcuts.
B. Scope for Improvement:
  1. Better multimodal pretraining (e.g. larger contrastive objectives).
  2. Incorporating world knowledge (e.g. geopolitics, historical events).
  3. Dynamic data augmentations to cover evolving hate symbols.

In [None]:
References:
1. Kiela, D., Firooz, H., et al. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes. arXiv:2005.04790 (2021).
2. Facebook AI. MMF code & starter kit: https://github.com/facebookresearch/mmf/projects/hateful_memes