In [1]:
# What is AI Security?

Welcome to the **AI Security Playground**. This notebook gives a beginner-friendly overview of what AI security is, why it matters, and some common examples.

You can read through it like a mini-tutorial and run the code cells as you go.

## 1. Why care about AI security?

Machine learning and AI systems are used in many places: email filtering, fraud detection, recommendation systems, chatbots, and more.

If an attacker can fool or control these systems, they can:
- Bypass spam filters or fraud detectors
- Poison training data to change model behavior
- Steal or leak sensitive model information
- Abuse LLMs with prompt injection or jailbreaks

**AI security** is about understanding and reducing these risks.

## 2. Basic terminology

Some terms you will see often in AI security:

- **Model**: The trained ML/AI system that makes predictions or generates outputs.
- **Adversary / Attacker**: Someone who intentionally tries to make the model misbehave.
- **Threat model**: What we assume the attacker can and cannot do.
- **Adversarial example**: An input that has been slightly changed to fool a model.
- **Data poisoning**: Injecting malicious data into the training set.
- **Evasion attack**: Crafting inputs at inference time to avoid detection.
- **LLM prompt injection**: Crafting input that causes a language model to ignore or override its original instructions.

## 3. A tiny toy example (no real security yet)

To get started, we will train a very small text classifier using scikit-learn. This is **not** a secure model. It is just a simple example of how a model is trained and evaluated. Later projects will explore how such models can be attacked.

Run the cell below to import some basic libraries.

In [2]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report


### 3.1 Create a tiny dataset

We will create a very small dataset of short messages that are either:
- Clearly harmless, or
- Suspicious / phishing-like.

This is **not** a real dataset, just a simple toy example.

In [3]:
texts = [
    'Hey, are we still meeting for lunch today?',
    'Reminder: your package will be delivered tomorrow.',
    'URGENT: Your account has been compromised, click here to reset your password now!',
    'Congratulations, you have won a free prize, click this link!',
    'Can you review this document when you have time?',
    'Security alert: suspicious login attempt detected on your account.',
]

# 0 = benign, 1 = suspicious / phishing-like
labels = np.array([0, 0, 1, 1, 0, 1])

texts, labels


(['Hey, are we still meeting for lunch today?',
  'Reminder: your package will be delivered tomorrow.',
  'URGENT: Your account has been compromised, click here to reset your password now!',
  'Congratulations, you have won a free prize, click this link!',
  'Can you review this document when you have time?',
  'Security alert: suspicious login attempt detected on your account.'],
 array([0, 0, 1, 1, 0, 1]))

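### 3.2 Train a simple model

We use a **bag-of-words** representation (`CountVectorizer`) plus logistic regression. Each message becomes a vector of word counts, and the classifier learns a weight for each word.

This is a very common baseline for text classification.

With only six messages, the 70/30 split leaves four for training and two for testing, so the metrics printed below are purely illustrative.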

In [4]:
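# Hold out 30% of the messages (2 of 6) for testing.
# stratify=labels keeps one message of each class in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts,
    labels,
    test_size=0.3,
    random_state=42,
    stratify=labels,
)

# Bag-of-words features feeding a logistic regression classifier.
model = make_pipeline(
    CountVectorizer(),
    LogisticRegression(max_iter=1000),
)

# Fit on the training messages, then predict the held-out ones.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)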
print('Test texts:', X_test)
print('True labels:', y_test)
print('Predicted labels:', y_pred)

print()
print(classification_report(y_test, y_pred, zero_division=0))


Test texts: ['URGENT: Your account has been compromised, click here to reset your password now!', 'Can you review this document when you have time?']
True labels: [1 0]
Predicted labels: [1 1]

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.50      1.00      0.67         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2
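
With only two test messages, a single wrong prediction changes accuracy by 50 percentage points, so the report above says little about the model. One common way to get a slightly steadier estimate on a tiny dataset is leave-one-out cross-validation. The following is an optional sketch under that assumption; it redefines the toy data so it can run on its own, and the names `loo_model`, `loo_pred`, and `loo_accuracy` are introduced here for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

texts = [
    'Hey, are we still meeting for lunch today?',
    'Reminder: your package will be delivered tomorrow.',
    'URGENT: Your account has been compromised, click here to reset your password now!',
    'Congratulations, you have won a free prize, click this link!',
    'Can you review this document when you have time?',
    'Security alert: suspicious login attempt detected on your account.',
]
labels = np.array([0, 0, 1, 1, 0, 1])  # 0 = benign, 1 = suspicious

loo_model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

# Each message is held out once and predicted by a model
# trained on the remaining five messages.
loo_pred = cross_val_predict(loo_model, texts, labels, cv=LeaveOneOut())
loo_accuracy = accuracy_score(labels, loo_pred)

print('Held-out predictions:', loo_pred)
print('Leave-one-out accuracy:', loo_accuracy)
```

Even this estimate rests on six messages, so treat it as a sanity check rather than a measure of real-world performance.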



### 3.3 Try your own examples

Edit the `examples` list in the cell below to try your own short messages and see what the model predicts.

Remember: this model is **very small and fragile**. It already misclassified one of its two test messages above, it is easy to fool, and it is not suitable for real security use.

In [5]:
examples = [
    'Please update your payment information immediately or your account will be closed.',
    'Hi, just checking in about our meeting next week.',
    'Click this link to claim your urgent refund.',
]

preds = model.predict(examples)
for text, label in zip(examples, preds):
    print(f'{label} - {text}')


0 - Please update your payment information immediately or your account will be closed.
0 - Hi, just checking in about our meeting next week.
1 - Click this link to claim your urgent refund.
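
Hard labels hide how uncertain the model is. As a rough sketch, `predict_proba` reports the estimated probability of each class for every message. The code below rebuilds the toy model with the same split settings so that it runs standalone. With only four training messages these probabilities are not calibrated, so read them as a loose indication at best.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    'Hey, are we still meeting for lunch today?',
    'Reminder: your package will be delivered tomorrow.',
    'URGENT: Your account has been compromised, click here to reset your password now!',
    'Congratulations, you have won a free prize, click this link!',
    'Can you review this document when you have time?',
    'Security alert: suspicious login attempt detected on your account.',
]
labels = np.array([0, 0, 1, 1, 0, 1])  # 0 = benign, 1 = suspicious

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

examples = [
    'Please update your payment information immediately or your account will be closed.',
    'Hi, just checking in about our meeting next week.',
    'Click this link to claim your urgent refund.',
]

# Columns of proba follow model.classes_, so column 1 is the suspicious class.
proba = model.predict_proba(examples)
for text, p_suspicious in zip(examples, proba[:, 1]):
    print(f'{p_suspicious:.2f} - {text}')
```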


## 4. Where does security come in?

In this toy example, attackers might:

- Slightly change words to avoid detection (e.g., `cl1ck th1s l1nk`).
- Add benign-looking text to hide malicious intent.
- Poison the training data with crafted examples so the model learns the wrong patterns.

More advanced attacks include **adversarial examples** for deep learning models and **prompt injection** for LLMs.

The rest of this repository explores these ideas in more detail, with:
- **Beginner tracks**: background and simple examples.
- **Intermediate tracks**: more realistic models and data.
- **Advanced tracks**: adversarial attacks, defenses, and research ideas.
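
To see why a substitution such as `cl1ck th1s l1nk` can slip past a bag-of-words model, the sketch below fits a `CountVectorizer` on the six toy messages and lists which tokens of a message the vocabulary actually recognizes. The helper `known_tokens` is a hypothetical name introduced here for illustration; it is not part of scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    'Hey, are we still meeting for lunch today?',
    'Reminder: your package will be delivered tomorrow.',
    'URGENT: Your account has been compromised, click here to reset your password now!',
    'Congratulations, you have won a free prize, click this link!',
    'Can you review this document when you have time?',
    'Security alert: suspicious login attempt detected on your account.',
]

vectorizer = CountVectorizer().fit(texts)
analyzer = vectorizer.build_analyzer()  # same lowercasing and tokenization as fit


def known_tokens(message):
    """Return the tokens of `message` that exist in the fitted vocabulary."""
    return [tok for tok in analyzer(message) if tok in vectorizer.vocabulary_]


original = 'click this link'
obfuscated = 'cl1ck th1s l1nk'

print(known_tokens(original))    # ['click', 'this', 'link']
print(known_tokens(obfuscated))  # []
```

Because none of the obfuscated tokens are in the vocabulary, the message is turned into an all-zero feature vector, and the classifier has nothing but its intercept to base a prediction on.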

## 5. Next steps

From here, you can:

- Explore other notebooks in `tracks/00-intro-to-ai-security`.
- Look at mini-projects in `projects/`, starting with the phishing email detector.
- Open an issue in the GitHub repo if you have ideas for improving this notebook or adding new ones.

Welcome to the AI Security Playground!