# FIT5230 Week 9: Defense vs. Generative AI Fakes

## 1. Recap of Generative AI Models

Generative AI models are designed to create new content based on patterns learned from their training data. There are several prominent architectures, each with unique strengths.

### Key Generative Model Types

* **Generative Adversarial Networks (GANs)**: Use a competitive two-network system (a Generator and a Discriminator) to produce new data.
* **Variational Autoencoders (VAEs)**: Learn a compressed, latent representation of data and use a decoder to generate new samples from that representation.
* **Diffusion Models**: Work by gradually adding noise to data and then learning the reverse "denoising" process to generate high-quality samples.
* **Neural Radiance Fields (NeRFs)**: A technique for synthesizing novel views of a 3D scene. It uses a neural network that takes a 5D coordinate (position `(x,y,z)` and viewing direction `(θ, φ)`) as input and outputs a color (RGB) and volume density (`σ`).
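
The "gradually adding noise" half of a diffusion model can be sketched in a few lines. This is a minimal sketch, assuming the common DDPM-style closed form `x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε` with a linear noise schedule. The schedule values and the names `make_alpha_bar` and `forward_noise` are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t directly from x_0 and return it with the noise that was used."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = np.ones((8, 8))  # stand-in for a normalized image

x_early, _ = forward_noise(x0, 10, alpha_bar, rng)   # still close to x0
x_late, _ = forward_noise(x0, 999, alpha_bar, rng)   # almost pure noise
```

A denoising network would then be trained to predict `eps` from `xt` and `t`; generation runs that learned reverse process starting from pure noise.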

### Strengths and Trade-offs

Different models excel in different areas:
* **GANs** are known for **Fast Sampling**.
* **Diffusion Models** are celebrated for generating **High Quality Samples**.
* **VAEs** are valued for their **Mode Coverage**, meaning they are good at capturing the diversity of the training data.


---

## 2. Unimodal vs. Multimodal AI

AI models can be categorized by the number of data types (modalities) they process.

* **Unimodal AI**: Processes a **single type** of data input, such as only text or only images. Examples include text models like GPT-3 and BERT. These are best suited for tasks involving a single data type, like speech recognition.
* **Multimodal AI**: Processes **multiple data types** simultaneously (e.g., text, images, audio) to gain a richer, more contextual understanding. Examples include DALL-E and CLIP. These are used for complex tasks like video captioning. Multimodal AI generally requires more computational resources than unimodal AI.

When choosing which to use, one must consider factors like **data availability**, **task relevance**, **encoding and fusion complexity**, and the desired **performance**.

---

## 3. The Threat Landscape: AI-Powered Attacks

Generative AI introduces significant security risks, primarily through its ability to create convincing fakes and automate malicious activities.

### Data Leakage

Models trained on sensitive data (e.g., PII, confidential business information) can inadvertently leak it in their outputs. Malicious users can also craft prompts that deliberately extract training data, which poses a severe risk in sectors like healthcare and finance.

### AI-Enhanced Phishing

Generative AI makes phishing attacks more dangerous, scalable, and effective.

1.  **Automated Phishing**: AI can create hyper-personalized phishing emails tailored to a victim's interests. These emails have fewer grammatical errors and can incorporate real-time details from news or corporate websites to appear more legitimate. An experiment by Singapore's Government Technology Agency found that people are more likely to click on links in AI-generated phishing emails than on links in human-written ones.
2.  **Spear Phishing**: This is a highly targeted form of phishing. AI can automate the research process, using social engineering and data from breaches to craft convincing messages that appear to come from a trusted source.
3.  **Vishing (Voice Phishing)**: This attack uses phone calls and voice messages. Generative AI supercharges this by enabling attackers to **clone the voice** of a trusted contact (like a CEO or CFO) to create deepfake audio, which can be used to authorize fraudulent bank transfers or extract sensitive information.

---

## 4. Adversarial Attacks on AI Systems

An **adversarial attack** involves creating a special input, often by adding a small, human-imperceptible perturbation, designed to cause an ML model to make an incorrect classification.

### Attack Types

* **White-Box Attack**: The attacker has **complete knowledge** of the AI model, including its architecture and parameters. This allows them to use gradient-based methods to find the most effective perturbations.
    * **Fast Gradient Sign Method (FGSM)**: A common white-box technique. It calculates the gradient of the model's loss function with respect to the input image and then adds a small perturbation in the direction of the gradient's sign. This single step increases the loss and can push the input across the decision boundary, causing a misclassification.
* **Black-Box Attack**: The attacker has **limited or no information** about the target model and can only query it.
    * **Attacks on Perceptual Hashing**: Perceptual hashes are designed to be robust to minor image changes (like rotation or noise), unlike cryptographic hashes which change completely with any bit flip. This "smoothness" can be exploited.
        * **Hash Reversal**: An attacker can train a GAN (like Pix2Pix) to learn the inverse of a hashing function. This allows them to reconstruct a recognizable version of an original image from just its perceptual hash string.
        * **Hash Poisoning**: An attacker uses a generative model to create an image that looks benign to a human reviewer but shares the same perceptual hash as a "target" image (e.g., a corporate logo). If the attacker gets the benign-looking image added to a blocklist (e.g., a database of banned content), the target logo will also be blocked due to the hash collision.
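
The FGSM step described above is usually written as `x_adv = x + ε · sign(∇ₓ J(θ, x, y))`. Below is a minimal sketch of that step, assuming a toy logistic-regression "model" so the input gradient can be written by hand. The weights, input, and `epsilon` are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """One FGSM step against a logistic-regression model p = sigmoid(w.x + b).

    For binary cross-entropy loss, the gradient with respect to the input
    is (p - y) * w, so no autodiff library is needed in this toy case.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

# Hypothetical toy model and input.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.1])
y = 1.0  # true label

x_adv = fgsm(x, y, w, b, epsilon=0.3)
pred_clean = sigmoid(w @ x + b) > 0.5
pred_adv = sigmoid(w @ x_adv + b) > 0.5
print("clean prediction:", pred_clean, "| adversarial prediction:", pred_adv)
```

Every feature moves by exactly `epsilon`, which bounds the perturbation in the L∞ norm. With a deep network the same step applies, but the input gradient would come from backpropagation rather than a closed form.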

---

## 5. Defense Strategies Against AI Attacks

Several strategies can be employed to make AI systems more robust against these threats.

* **Adversarial Training**: The most common defense, this involves injecting adversarial examples into the model's training data so it learns to correctly classify them.
* **Robust Architectures**: Designing neural networks that are inherently more resistant to perturbations, for example by using additional layers or non-linear functions.
* **Ensemble Methods**: Combining the predictions of multiple different models. An attack that fools one model is less likely to fool the entire ensemble.
    * **Random Forest**: An ensemble of decision trees where the final prediction is determined by a majority vote.
    * **Gradient Boosting**: Models are built sequentially, with each new model trained to correct the errors (residuals) of the previous one, creating a powerful final predictor.
* **Input Preprocessing**: Applying transformations to input data, such as adding noise or reducing dimensionality, to disrupt adversarial perturbations before they reach the model.
* **Data Augmentation**: Diversifying the training set with random transformations (rotation, cropping, brightness changes) to help the model generalize better and prevent overfitting to specific patterns that attackers might exploit.
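
To illustrate why an ensemble can absorb an attack that fools a single model, here is a minimal majority-vote sketch. The labels and the three per-model predictions are hypothetical; in practice they would come from independently trained classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the per-model predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three models for one perturbed stop-sign image:
# the attack fools the first model, but the other two still agree.
votes = ["speed_limit", "stop", "stop"]
ensemble_label = majority_vote(votes)
print(ensemble_label)
```

On a tie, `Counter.most_common` returns the label that was seen first, so a real system would want an odd number of models or an explicit tie-breaking rule.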

---

## 6. Differential Privacy (DP)

**Differential Privacy** is a mathematical framework for data privacy. A system is considered differentially private if its output statistics do not change significantly whether or not any single individual's data is included in the dataset. This provides a strong guarantee that the output reveals very little about any specific individual.
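
Stated formally, using the standard ε-differential-privacy definition (the symbols `M`, `D`, `D'`, `S`, and `ε` are introduced here and do not appear in the lecture notes): a randomized mechanism `M` is ε-differentially private if, for every pair of datasets `D` and `D'` that differ in one individual's record and for every set of outputs `S`,

$$\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]$$

A smaller `ε` forces the two output distributions closer together, which means stronger privacy at the cost of more noise.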

### How it Works

* DP is typically achieved by **adding carefully calibrated random noise** to the data or during the training process. For large datasets, this noise tends to average out, allowing for the extraction of valuable insights while protecting individual identities.
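* **Randomized Response**: A technique used in surveys to protect privacy. For a sensitive question, a respondent uses a randomizer (e.g., a die roll) to decide whether to answer truthfully or to give a prescribed alternative response. This gives the respondent plausible deniability. In the variant where the respondent answers the sensitive question with probability `p` and the "opposite" question with probability `1 - p`, the surveyed percentage of "yes" answers is `S = pT + (1 - p)(1 - T)`, so the true percentage `T` can be estimated using the formula:
    $$T = \frac{S + p - 1}{2p - 1}$$
    where `p` is the probability of answering the sensitive question directly (so `1 - p` is the probability of answering the "opposite" question) and `p ≠ 0.5`.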
* **Application to Generative AI**: LLMs are at risk of memorizing and reproducing private training data. By fine-tuning a model using a DP algorithm like **Differentially Private Stochastic Gradient Descent (DP-SGD)**, it's possible to generate synthetic data that captures the statistical properties of the original private data without leaking specific information.
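
The randomized-response estimator can be checked with a small simulation. This is a sketch under a stated assumption: each respondent answers the sensitive question with probability `p` and its opposite with probability `1 - p`, which is the reading under which `T = (S + p - 1) / (2p - 1)` holds. The values of `p`, `true_rate`, and `n` are made up for the demonstration.

```python
import random

def randomized_answer(truth, p, rng):
    """With probability p answer the sensitive question; otherwise answer its opposite."""
    return truth if rng.random() < p else not truth

def estimate_true_rate(s, p):
    """Invert S = p*T + (1 - p)*(1 - T) to recover T; requires p != 0.5."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the answers uninformative")
    return (s + p - 1) / (2 * p - 1)

rng = random.Random(42)
p = 0.75           # assumed randomizer probability
true_rate = 0.30   # hidden ground truth, used only to simulate the survey
n = 100_000

truths = [rng.random() < true_rate for _ in range(n)]
answers = [randomized_answer(t, p, rng) for t in truths]
s = sum(answers) / n               # surveyed "yes" rate S
t_hat = estimate_true_rate(s, p)   # estimate of the true rate T
print(f"surveyed S = {s:.3f}, estimated T = {t_hat:.3f}")
```

No single answer reveals a respondent's true status, yet the aggregate estimate lands close to the hidden rate because the injected randomness averages out over many respondents.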

---

## 7. Ethical and Legal Considerations

The proliferation of generative AI raises critical ethical and legal challenges.

* **Misinformation**: AI's ability to create realistic fakes can distort public perception and fuel propaganda campaigns.
* **Bias and Discrimination**: Generative models reflect the biases present in their training data.
* **Copyright Infringement**: AI can generate content based on copyrighted materials or be used to remove watermarks, creating legal challenges.
* **Privacy and Data Security**: The potential for data leakage can lead to severe legal repercussions and a loss of user trust.
* **Accountability**: Determining who is responsible when an AI model causes harm is complex, leading to potential legal and brand reputation issues.

# Tutorial - Generative Diffusion Models
1. Why are diffusion models considered more robust than GANs in generative tasks?  
Diffusion models are more scalable: they extend to different domains more readily.  
Training is also more stable, because there is no adversarial generator-versus-discriminator game to balance.  

2. Give two real-world applications of multimodal generative AI and explain why multimodality improves results.  


3. How can generative AI make phishing attacks more dangerous compared to traditional methods?  
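It increases both the scale and the personalisation of attacks: AI can mass-produce fluent, hyper-personalised messages and clone trusted voices for vishing.  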

4. Explain why FGSM white-box adversarial perturbations are particularly concerning from an AI safety perspective.  


5. A financial institution intends to fine-tune an LLM on proprietary documents.  
(a) What are the risks of data leakage?  


(b) Suggest a mitigation strategy grounded in AI safety principles.  


6. In fraud detection, would a unimodal or multimodal generative AI model be more appropriate? Justify.  


7. Autonomous Driving Case: Attackers add an adversarial perturbation to a stop sign, causing the self-driving car’s vision model to misclassify it as a speed-limit sign.  


(a) Why is this attack effective?  


(b) Suggest two countermeasures.  


8. Medical Imaging Case: An adversary applies small perturbations to CT scans so a tumor is missed by the AI detection system.  


(a) What risks does this pose?  


(b) Which defense methods could improve robustness?  


