# DeepSeek LLM Backdoor Detection Experiments

This experiments systematically investigates the presence of hidden backdoors in the deepseek-ai/DeepSeek-R1-0528-Qwen3-8B Large Language Model, from Hugging Face Library.

# Setup and Imports

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import logging
import psutil
import requests
import time
import os
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import numpy as np


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\jibin\AppData\Local\Programs\Python\Python312\Lib\site-packages\ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "C:\Users\jibin\AppData\Local\Programs\Python\Python312\Lib\site-packages\traitlets\config\application.py", line 1075, in launch_instance
    app.start()
  File "C:\Users\jibin\AppData\Local\Programs\Python\Python312\Lib\site-packages\ipykernel\kernelapp.py", 

In [2]:
# Set up logging
logging.basicConfig(level=logging.INFO, filename='deepseek_experiment.log', filemode='w',
                    format='%(asctime)s - %(levelname)s - %(message)s')
print('Setup complete.')

Setup complete.


# Data Flow and Activation Tracking

In [4]:
model_name = 'deepseek-ai/DeepSeek-R1-0528-Qwen3-8B'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True
)

# Register hook to inspect tensor at each layer
def hook_fn(module, input, output):
    # Safely print input and output shapes
    try:
        input_shape = input[0].shape if isinstance(input, tuple) and hasattr(input[0], 'shape') else 'Unknown'
        output_shape = output.shape if hasattr(output, 'shape') else 'Unknown'
        print(f"{module.__class__.__name__} - Input shape: {input_shape} - Output shape: {output_shape}")
    except Exception as e:
        print(f"Hook error in {module.__class__.__name__}: {e}")

hooks = []
for name, layer in model.named_modules():
    if "layer" in name.lower():  # Filter transformer layers more safely
        try:
            hooks.append(layer.register_forward_hook(hook_fn))
        except Exception as e:
            print(f"Could not register hook for {name}: {e}")

# Run dummy input
input_text = 'Hello, how are you?'
inputs = tokenizer(input_text, return_tensors='pt').to(model.device)

with torch.no_grad():
    _ = model(**inputs)

# Clean up
for h in hooks:
    h.remove()

Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'attn_factor'}


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Qwen3RMSNorm - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 4096])
Linear - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 4096])
Qwen3RMSNorm - Input shape: torch.Size([1, 6, 32, 128]) - Output shape: torch.Size([1, 6, 32, 128])
Linear - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 1024])
Qwen3RMSNorm - Input shape: torch.Size([1, 6, 8, 128]) - Output shape: torch.Size([1, 6, 8, 128])
Linear - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 1024])
Linear - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 4096])
Hook error in Qwen3Attention: tuple index out of range
Qwen3RMSNorm - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 4096])
Linear - Input shape: torch.Size([1, 6, 4096]) - Output shape: torch.Size([1, 6, 12288])
SiLU - Input shape: torch.Size([1, 6, 12288]) - Output shape: torch.Size([1, 6, 12288])
Linear - Input shape: torch.Si

# Summary of Data Flow and Activation Shapes
| Component       | Shape Flow                                        | Purpose                                 |
|-----------------|---------------------------------------------------|-----------------------------------------|
| Token Input     | `[1, 6]` → `[1, 6, 4096]`                         | Embedding of tokens into vector space   |
| RMSNorm         | `[1, 6, 4096]` → `[1, 6, 4096]`                   | Normalizing input                       |
| Attention Heads | `[1, 6, 4096]` → `[1, 6, 32, 128]`                | Preparing for multi-head attention      |
| Attention Proj. | `[1, 6, 4096]` → `[1, 6, 1024]`                   | QKV or output projections               |
| MLP Block       | `[1, 6, 4096]` → `[1, 6, 12288]` → `[1, 6, 4096]` | Feed-forward for richer representations |
| Decoder Layer   | `[1, 6, 4096]`                                    | One full transformer block              |
