# FiMMIA: scaling semantic perturbation-based membership inference across modalities

<p align="center">
  <picture>
    <img alt="FiMMIA" src="../docs/FiMMIA_system_overview.png" style="max-width: 100%;">
  </picture>
</p>

FiMMIA is the first collection of models and pipelines for membership inference attacks (MIA) against multimodal large language models (MLLMs). It was built with priority on the Russian language and can be extended to other languages and datasets.
The pipeline supports image, audio and video modalities. Our experiments focus on the [MERA datasets](https://github.com/MERA-Evaluation/MERA), but the pipeline generalizes to other languages. The system consists of models and Python scripts hosted in a GitHub repository.

For each of the image, audio and video modalities we provide two main functionalities: inference with a membership detection model, and a training pipeline for new datasets.

Pretrained models are available in the 🤗 HuggingFace [FiMMIA collection](https://huggingface.co/collections/ai-forever/fimmia).

## Installation
Clone the repository and install the requirements:

```bash
git clone https://github.com/ai-forever/data_leakage_detect
cd data_leakage_detect
pip install -r requirements.txt
```

## Example
As a usage example, we take the [MERA-evaluation/CommonVideoQA](https://huggingface.co/datasets/MERA-evaluation/CommonVideoQA) dataset, train a FiMMIA model on it and run inference.

### Train FiMMIA model on dataset MERA-evaluation/CommonVideoQA
The training pipeline consists of the following steps:
* [Data preparation](#Data-preparation)
* [SFT MLLM finetuning](#SFT-MLLM-finetuning)
* [Neighbor generation](#Neighbor-generation)
* [Embedding generation](#Embedding-generation)
* [Loss computation](#Loss-computation)
* [Training the attack model](#Training-the-attack-model)
* [Run MIA inference](#Run-MIA-inference)

#### Data preparation
Download the data and convert it to the working format, a pandas DataFrame with the following columns:

| input | answer | video | ds_name  |
|----------|--------|-------|----------|

* `input` example (English translation; the original prompt is in Russian):

```text
I would very much like to get the solution to this task. In this task, you need to choose the correct answer from the four given options based on the provided question and video.

There is 1 video file

Please review the data and solve the task by choosing one or more correct answer options.

Video file: <video>
Question:
Why did the basketball player pass the ball to a player in a jersey of a different color?

A. He confused an opponent with a player from his own team.
B. These players train together, so the color of their clothes does not matter.
C. He decided to do so because his team is losing anyway.
D. He is deliberately playing along with the other team.

The first of the proposed answer options is assigned the letter A, the second the letter B, the third the letter C, and so on through the English alphabet. The answer should be the letter corresponding to the correct option. It is best to use the following format: after finishing your reasoning, write the word ANSWER, then, separated by a space, output the chosen letter. ANSWER: 
```

* `answer` example: `B`.
* `video` is the modality column and holds the path to the media file. For audio data use an `audio` column, for images an `image` column.
* `ds_name` is the dataset name, for example `CommonVideoQA`.
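Before running the later steps, it can be useful to check the records against the column layout above. The sketch below uses a hypothetical helper: `validate_record` is not part of the repository. It only checks that the required columns and the chosen modality column are present as non-empty strings.

```python
# Hypothetical helper (not part of the FiMMIA repository): check that one
# record follows the input schema above: input, answer, <modality>, ds_name.
REQUIRED_COLUMNS = ("input", "answer", "ds_name")
MODALITY_COLUMNS = ("image", "audio", "video")


def validate_record(record: dict, modality_key: str = "video") -> list[str]:
    """Return a list of problems found in one record (empty if it looks valid)."""
    problems = []
    if modality_key not in MODALITY_COLUMNS:
        problems.append(f"unknown modality column: {modality_key!r}")
    for column in (*REQUIRED_COLUMNS, modality_key):
        value = record.get(column)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty column: {column!r}")
    return problems


record = {
    "input": "Video file: <video>\nQuestion: ...",
    "answer": "B",
    "video": "data/CommonVideoQA/samples/video787.mp4",
    "ds_name": "CommonVideoQA",
}
print(validate_record(record))  # []
```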

```python
import os
import sys
import warnings
from fractions import Fraction

import numpy as np
import pandas as pd
import torch
import torchvision
from datasets import load_dataset

sys.path.append("../")
warnings.filterwarnings("ignore")


def save_video(line, video_path, video_codec="libx264"):
    """Decode the video of one dataset sample and save it as a file."""
    video = line["inputs"]["video"]
    container = video.container
    video_fps = Fraction(video.get_metadata()["video"]["fps"][0])
    # Rewind and decode all frames as RGB arrays of shape (H, W, 3).
    container.seek(0)
    frames = list(container.decode(video=0))
    # Stack into a (T, H, W, 3) uint8 tensor, as expected by write_video.
    video_array = torch.from_numpy(np.stack([x.to_ndarray(format="rgb24") for x in frames]))
    torchvision.io.write_video(
        filename=video_path,
        video_array=video_array,
        fps=video_fps,
        video_codec=video_codec,
    )
```

```python
working_dir = "data/"
ds_name = "CommonVideoQA"
ds_dir = os.path.join(working_dir, ds_name)
ds_samples_dir = os.path.join(ds_dir, "samples")
os.makedirs(ds_samples_dir, exist_ok=True)

ds = load_dataset("MERA-evaluation/CommonVideoQA")
```

```python
lines = []
for line in ds["test"]:
    idx = line["meta"]["id"]
    video_path = os.path.join(ds_samples_dir, f"video_{idx}.mp4")
    save_video(line, video_path)
    lines.append({
        # Fill the prompt template; "ОТВЕТ: " means "ANSWER: ".
        "input": line["instruction"].format(**line["inputs"]) + "ОТВЕТ: ",
        "answer": line["outputs"],
        "video": video_path,
        "ds_name": ds_name,
    })

# Collect the samples into a DataFrame and save it.
df = pd.DataFrame(lines)
df_path = os.path.join(working_dir, "train_all.csv")
df.to_csv(df_path, index=False)

df = pd.read_csv(df_path)
df.head()
```

|   | input | answer | video | ds_name |
|---|-------|--------|-------|---------|
| 0 | `Hi! Can you help?\n\nI came across this task...` | B | data/CommonVideoQA/samples/video787.mp4 | CommonVideoQA |
| 1 | `I would very much like to get the solution to this task...` | B | data/CommonVideoQA/samples/video451.mp4 | CommonVideoQA |
| 2 | `Attention!\n\nIn the dataset, the task comes with the following...` | D | data/CommonVideoQA/samples/video912.mp4 | CommonVideoQA |
| 3 | `Help needed.\n\nYou need to do the following. ...` | C | data/CommonVideoQA/samples/video625.mp4 | CommonVideoQA |
| 4 | `Here is a task.\n\nThere is 1 video file\n\nSolve the ta...` | A | data/CommonVideoQA/samples/video737.mp4 | CommonVideoQA |


**Split data into train and test**

```python
from sklearn.model_selection import train_test_split

# Hold out 10% of the samples as the test set.
train_df, test_df = train_test_split(df, test_size=0.1)
train_df.to_csv("data/train.csv", index=False)
test_df.to_csv("data/test.csv", index=False)
```

---

#### SFT MLLM finetuning
To obtain positive (leaked) examples, we fine-tune the MLLM on the data. This gives two MLLMs: the original model $\mathcal{M}$ without the leak and the fine-tuned model $\mathcal{M}_{leak}$ with the leak.

**Running command**

```bash
python job_launcher.py --script="fimmia.video.train_qwen25vl" \
  --train_df_path="data/train_all.csv" \
  --test_df_path="data/test.csv" \
  --num_train_epochs=5 \
  --model_id="Qwen/Qwen2.5-VL-3B-Instruct" \
  --output_dir="data/Qwen2.5-VL-3B-Instruct"
```

---

#### Neighbor generation
For each original data point $(t, s)$ we generate $K=24$ perturbed "neighbors" $(t_k^{\prime}, s_k^{\prime})$. We apply four perturbation techniques to the text $t$, each of them 6 times ($4 \times 6 = 24$):
* Random masking, with the masks predicted by the [ai-forever/FRED-T5-1.7B](https://huggingface.co/ai-forever/FRED-T5-1.7B) model
* Deletion of random tokens
* Duplication of random tokens
* Swapping of random tokens

In our experiments the modality data stays unchanged, $s_k^{\prime} = s$ for all $k$, but the pipeline can be modified to generate neighbors in other modalities as well.
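Three of the four perturbations listed above (deletion, duplication and swapping) can be sketched in a few lines of plain Python. This is a minimal illustration and not the `fimmia.neighbors` implementation. It assumes whitespace-separated words as tokens and a fixed random seed. It leaves out mask filling with FRED-T5 because that step needs the model. All function names are hypothetical.

```python
import random

# Hypothetical sketch of three of the four perturbations (mask filling with
# FRED-T5 is omitted). Tokens are assumed to be whitespace-separated words.


def delete_word(words, rng):
    """Remove one randomly chosen word."""
    i = rng.randrange(len(words))
    return words[:i] + words[i + 1:]


def duplicate_word(words, rng):
    """Repeat one randomly chosen word right after itself."""
    i = rng.randrange(len(words))
    return words[: i + 1] + [words[i]] + words[i + 1:]


def swap_words(words, rng):
    """Exchange the positions of two randomly chosen words."""
    i, j = rng.sample(range(len(words)), 2)
    out = list(words)
    out[i], out[j] = out[j], out[i]
    return out


def make_neighbors(text, per_technique=6, n_edits=1, seed=0):
    """Return 3 * per_technique neighbors, grouped by technique."""
    words = text.split()  # also collapses newlines into single spaces
    if len(words) < 2:
        raise ValueError("need at least two words to perturb")
    rng = random.Random(seed)
    neighbors = []
    for perturb in (delete_word, duplicate_word, swap_words):
        for _ in range(per_technique):
            perturbed = words
            for _ in range(n_edits):
                perturbed = perturb(perturbed, rng)
            neighbors.append(" ".join(perturbed))
    return neighbors


text = "Video file: <video>\nQuestion:\nWhat is happening in the video?"
neighbors = make_neighbors(text)
print(len(neighbors))  # 18: 6 deletion, 6 duplication, 6 swapping neighbors
```

With masking added, this gives 24 neighbors per sample. The examples in this README change several words per neighbor, which `n_edits` only roughly imitates. The real script may also tokenize differently.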

**Run command:**

```bash
python job_launcher.py --script="fimmia.neighbors" \
  --model_path="ai-forever/FRED-T5-1.7B" \
  --dataset_path="data/train.csv" \
  --max_text_len=4000
```

Here
* `model_path` - model used to fill the masked spans when generating masking neighbors
* `dataset_path` - path to the dataset for which neighbors are generated
* `max_text_len` - maximum text length, in characters

This command overwrites the input file with a copy that contains an additional **neighbors** column.

```python
df = pd.read_csv("data/train.csv")
df.head()
```

|   | input | answer | video | ds_name | neighbors |
|---|-------|--------|-------|---------|-----------|
| 0 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `['Video file: <video> Question: What is happening in ...` |
| 1 | `Help needed.\n\nYou need to do the following. ...` | C | data/CommonVideoQA/samples/video625.mp4 | CommonVideoQA | `['Help needed. You need to do the following. ...` |
| 2 | `The task is stated.\n\nThe task requires ...` | A | data/CommonVideoQA/samples/video859.mp4 | CommonVideoQA | `['The task is stated. _In_ the task _propos...` |
| 3 | `Attention!\n\nIn the dataset, the task comes with the following...` | C | data/CommonVideoQA/samples/video538.mp4 | CommonVideoQA | `['Attention! In the _attachment_ to the _video_ comes the follow...` |
| 4 | `Help needed.\n\nYou need to do the following. ...` | A | data/CommonVideoQA/samples/video835.mp4 | CommonVideoQA | `['Help needed. You need to do the following. ...` |


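```python
import ast

inpt = df.input[0]
answer = df.answer[0]
# The neighbors column stores a Python list literal; parse it safely.
neighbors = ast.literal_eval(df.neighbors[0])
# The indices assume the 24 neighbors are grouped by technique (6 each)
# in the order masking, deletion, duplication, swapping.
masking_neighbor = neighbors[3]
deletion_neighbor = neighbors[7]
duplication_neighbor = neighbors[13]
swapping_neighbor = neighbors[-1]
```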

* Original **input** of the sample (translated from Russian)

Video file: \<video\>

Question:

What is happening in the video from 00.510 to 11.460 seconds?

A. Stirring the <span style="color:blue">**heated-up dish**</span> with a spoon in a frying pan.

B. Stirring the ready dish <span style="color:blue">**with a spoon**</span> in a saucepan.

C. Breaking up the <span style="color:blue">**frozen food**</span> with a spoon in a frying pan, helping it thaw.

D. Covering the frying pan with a lid.

* **Random masking** neighbor

Video file: \<video\>

Question:

What is happening in the video from 00.510 to 11.460 seconds?

A. Stirring <span style="color:red">**rooibos rooibos and the ready dish**</span> with a spoon in a frying pan.

B. Stirring <span style="color:red">**rooibos and** </span>the ready dish in a saucepan.

C. Breaking up <span style="color:red">**rooibos and rooibos**</span> with a spoon in a frying pan, helping it thaw.

D. Covering the frying pan with a lid.

---

* **Deletion** neighbor

Video file: \<video\>

Question:

What is happening in the video from 00.510 to 11.460 seconds?

A. Stirring the ~~**heated-up**~~ dish with a spoon in a frying pan.

B. Stirring the ready dish ~~**with a spoon**~~ in a saucepan.

C. Breaking up the frozen food with a spoon in a frying pan, ~~**helping**~~ it thaw.

D. Covering the frying pan with a lid.

---

* **Duplication** neighbor

Video file: \<video\>

Question:

**What** What **is happening** is happening in the video from 00.510 to 11.460 seconds?

A. Stirring the heated-up **dish** dish with a spoon in a frying pan.

B. Stirring the ready dish **with a spoon** with a spoon in a saucepan.

C. Breaking up the **frozen** frozen food with a spoon in a frying pan, helping it thaw.

D. Covering the frying pan with a lid.

---

* **Swapping** neighbor

<span style="color:blue">**dish**</span> \<video\>

<span style="color:green">**thaw.**</span>

What is happening in the video from <span style="color:red">**B.**</span> to 11.460 seconds?

A. Stirring the heated-up <span style="color:blue">**Video file:**</span> with a spoon in a frying pan.

<span style="color:red">**00.510**</span> Stirring the ready dish with a spoon in a saucepan.

C. Breaking up the frozen food with a spoon in a frying pan, helping it <span style="color:green">**Question:**</span>

D. Covering the frying pan with a lid.

---

#### Embedding generation
Next, for each original text $t$ and each of its neighbors $t_k^{\prime}$ we extract text embeddings with a fixed encoder:
$$e=\mathcal{E}(t), \quad e_{k}^{\prime} = \mathcal{E}(t_k^{\prime})$$
where $\mathcal{E}$ is [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) in our experiments. At the time of our experiments it was state of the art on the [MTEB](https://huggingface.co/blog/mteb) benchmark.

**Run command:**

```bash
python job_launcher.py --script="fimmia.embeds_text_calc" \
  --embed_model="intfloat/e5-mistral-7b-instruct" \
  --df_path="data/train.csv"
```

Here
* `embed_model` - path or Hub ID of the embedding model
* `df_path` - path to the dataset for which embeddings are generated

This command creates the directory `data/train/embeds/` with files `part_0.csv`, `part_1.csv` and so on. The number of files depends on the dataset size. Each row corresponds to one (sample, neighbor) pair, and each file has two new columns:
* **neighbor_embeds** - embedding vector of the neighbor text
* **input_embeds** - embedding vector of the original text

All neighbors of a sample share the same original text, so **input_embeds** is filled only in the first row of each sample and is `NaN` in the other rows. For example, with $K=24$ neighbors, **input_embeds** is not `NaN` for rows 0 and 24.
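The row layout can be illustrated with a toy encoder. This is a hypothetical sketch of the file structure described above, not the code of `fimmia.embeds_text_calc`. `toy_embed` stands in for the real encoder, and the function names are made up.

```python
import math

# Hypothetical sketch of the embeds file layout: one row per (sample, neighbor)
# pair; the original text's embedding is stored only on each sample's first row.


def explode_with_embeddings(samples, embed):
    rows = []
    for sample in samples:
        input_embeds = embed(sample["input"])  # computed once per sample
        for k, neighbor in enumerate(sample["neighbors"]):
            rows.append({
                "input": sample["input"],
                "neighbor": neighbor,
                "neighbor_embeds": embed(neighbor),
                "input_embeds": input_embeds if k == 0 else math.nan,
            })
    return rows


def forward_fill_input_embeds(rows):
    """Copy each sample's input embedding down to its remaining neighbor rows."""
    last = None
    for row in rows:
        value = row["input_embeds"]
        if isinstance(value, float) and math.isnan(value):
            row["input_embeds"] = last
        else:
            last = value
    return rows


def toy_embed(text):
    # Toy 2-d "embedding" (character count, word count) instead of e5-mistral.
    return [float(len(text)), float(len(text.split()))]


samples = [
    {"input": "a b c", "neighbors": ["a c", "a b b c", "c b a"]},
    {"input": "d e", "neighbors": ["d", "e d", "d d e"]},
]
rows = explode_with_embeddings(samples, toy_embed)
filled = [i for i, r in enumerate(rows) if not isinstance(r["input_embeds"], float)]
print(filled)  # [0, 3]
```

In pandas, `df["input_embeds"].ffill()` achieves the same forward fill, provided that the rows of each sample are contiguous.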

```python
df = pd.read_csv("data/train/embeds/part_0.csv")
df.head(2)
```

|   | input | answer | video | ds_name | neighbor | neighbor_embeds | input_embeds |
|---|-------|--------|-------|---------|----------|-----------------|--------------|
| 0 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> Question: What What is happening ...` | [0.017215704545378685, -0.010462397709488869, ... | [0.02046734280884266, -0.011482938192784786, 0... |
| 1 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> Question: ready is happening ...` | [0.01675490476191044, -0.010078263469040394, -... | NaN |


---

#### Loss computation

We compute the multimodal loss for both models $\mathcal{M}$ and $\mathcal{M}_{leak}$ on both the original and neighbor data points:
$$\mathcal{L} = \mathcal{L}(\mathcal{M}, t, s), \quad \mathcal{L}_k^{\prime} = \mathcal{L}(\mathcal{M}, t_k^{\prime}, s_k^{\prime})$$
The text ($t$ or $t_k^{\prime}$) is passed to the MLLM together with the corresponding modality input (image, video or audio) in its original, unchanged form, since $s_k^{\prime} = s$.
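The sketch below shows what $\mathcal{L}$ and $\mathcal{L}_k^{\prime}$ measure by computing a mean negative log-likelihood from per-token log-probabilities. This is the usual definition of a causal language-model loss. Which tokens the FiMMIA loss scripts actually score (prompt, answer or both) is an assumption of this sketch, and the probabilities are made up.

```python
import math

# Hypothetical sketch: mean negative log-likelihood over the scored tokens.
# In FiMMIA the log-probabilities would come from the MLLM given text + modality s.


def mean_nll(token_logprobs, target_mask):
    """Average of -log p(token) over the positions where target_mask is True."""
    picked = [-lp for lp, use in zip(token_logprobs, target_mask) if use]
    if not picked:
        raise ValueError("no tokens selected by target_mask")
    return sum(picked) / len(picked)


# Made-up token probabilities for the original text t and one neighbor t'_k.
loss_input = mean_nll([math.log(0.5), math.log(0.25)], [True, True])
loss_neighbor = mean_nll(
    [math.log(0.5), math.log(0.125), math.log(0.9)], [True, True, False]
)
print(round(loss_input, 4), round(loss_neighbor, 4))  # 1.0397 1.3863
```

Here the loss on $t$ is lower than on its neighbor ($\mathcal{L} - \mathcal{L}_k^{\prime} \approx -0.35$). Neighbor-based membership inference relies on this kind of signal. FiMMIA learns to interpret it with a trained classifier instead of a fixed rule.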

**Run command:**

```bash
python job_launcher.py --script="fimmia.video.loss_calc_qwen25" \
  --model_id=Qwen/Qwen2.5-VL-3B-Instruct \
  --model_name=Qwen2.5-VL-3B-Instruct \
  --label=0 \
  --df_path="data/train.csv" \
  --part_size=5000
```
Here
* `model_id` - path or Hub ID of the MLLM
* `model_name` - name of the MLLM, used when storing the results
* `label` - dataset label: `0` for the model without leak, `1` for the model with leak (default `0`)
* `df_path` - path to the dataset for which losses are computed
* `part_size` - number of rows per part when the dataframe is split into smaller frames

This command creates the directory `data/train/loss/Qwen2.5-VL-3B-Instruct/no_leak` or `data/train/loss/Qwen2.5-VL-3B-Instruct/leak`, depending on `label`. It contains files `part_0.csv`, `part_1.csv` and so on, depending on the dataset size. Each file has two new columns:
* **neighbor_loss** - loss on the neighbor text $t_k^{\prime}$ together with the modality input $s$
* **input_loss** - loss on the original text $t$ together with the modality input $s$

For training, loss files are needed for both models $\mathcal{M}$ and $\mathcal{M}_{leak}$.

For the fine-tuned SFT model $\mathcal{M}_{leak}$, **run** the same command with `--label=1`:

```bash
python job_launcher.py --script="fimmia.video.loss_calc_qwen25" \
  --model_id=data/Qwen2.5-VL-3B-Instruct \
  --model_name=Qwen2.5-VL-3B-Instruct \
  --label=1 \
  --df_path="data/train.csv" \
  --part_size=5000
```

```python
df = pd.read_csv("data/train/loss/Qwen2.5-VL-3B-Instruct/no_leak/part_0.csv")
# Point the video paths at the local samples directory.
df["video"] = [os.path.join(ds_samples_dir, os.path.split(x)[-1]) for x in df["video"]]
df.head(5)
```

|   | input | answer | video | ds_name | neighbor | neighbor_loss | input_loss |
|---|-------|--------|-------|---------|----------|---------------|------------|
| 0 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> What Question: is happening in the vi...` | 0.872269 | 0.863068 |
| 1 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> Question: What is happening in the vi...` | 0.93577 | 0.863068 |
| 2 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> Question: What in the video from 00.51...` | 0.862926 | 0.863068 |
| 3 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> Question: is happening video from ...` | 1.260783 | 0.863068 |
| 4 | `Video file: <video>\nQuestion:\nWhat is happening in ...` | A | data/CommonVideoQA/samples/video419.mp4 | CommonVideoQA | `Video file: <video> Question: What is happening fry...` | 1.256082 | 0.863068 |


---

#### Training the attack model
The core of FiMMIA is a binary neural network classifier trained to distinguish between models that have and have not seen the data. For each neighbor $k$ we create two training examples by computing feature differences:
$$\Delta \mathcal{L} = \mathcal{L} - \mathcal{L}_k^{\prime}, \quad \Delta e = e - e_k^{\prime}$$

These feature vectors are paired with labels $y \in \{0, 1\}$ indicating whether the losses came from $\mathcal{M}$ (not leaked) or $\mathcal{M}_{leak}$ (leaked). However, the absolute values of these statistics may vary across datasets and models. To make the system more stable, we apply z-score normalization. During training, we calculate the mean $\mu$ and standard deviation $\sigma$ of the loss differences $\Delta \mathcal{L}$ on the evaluation data and normalize:
$$\Delta \mathcal{L}_{norm} = \frac{\Delta \mathcal{L}-\mu}{\sigma}$$

This yields training triplets $(\Delta \mathcal{L}_{norm}, \Delta e, y)$, one for each neighbor and model per original data point, which are drawn in random batches during training. The FiMMIA detector $f_{FiMMIA}$ is trained to predict the probability $p=f_{FiMMIA}(\Delta \mathcal{L}_{norm}, \Delta e)$ that the input features come from a model that has been trained on the target data.
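The sketch below assembles the triplets for one original data point directly from the formulas above. It is a hypothetical illustration that uses plain Python lists instead of tensors and made-up losses. The values of $\mu$ and $\sigma$ are rounded from the statistics computed later in this README. $\Delta e$ depends only on the texts, so both models contribute the same $\Delta e$ and differ only in $\Delta \mathcal{L}_{norm}$.

```python
# Hypothetical sketch: build (dL_norm, de, y) triplets for one original sample.
# Labels follow the README: y = 0 for the original model M, y = 1 for M_leak.


def feature_differences(loss, neighbor_losses, emb, neighbor_embs):
    """Return (delta_L, delta_e) for every neighbor k."""
    diffs = []
    for loss_k, emb_k in zip(neighbor_losses, neighbor_embs):
        delta_loss = loss - loss_k
        delta_emb = [a - b for a, b in zip(emb, emb_k)]
        diffs.append((delta_loss, delta_emb))
    return diffs


def make_triplets(per_model, emb, neighbor_embs, mu, sigma):
    """per_model maps label y to (loss on t, losses on the K neighbors)."""
    triplets = []
    for y, (loss, neighbor_losses) in per_model.items():
        for delta_loss, delta_emb in feature_differences(
            loss, neighbor_losses, emb, neighbor_embs
        ):
            # z-score normalization of the loss difference
            triplets.append(((delta_loss - mu) / sigma, delta_emb, y))
    return triplets


emb = [1.0, 0.0]                          # e = E(t)
neighbor_embs = [[0.5, 0.5], [1.0, 1.0]]  # e'_k for K = 2 neighbors
per_model = {
    0: (0.86, [0.87, 0.94]),  # M: made-up losses
    1: (0.20, [0.60, 0.70]),  # M_leak: made-up losses
}
triplets = make_triplets(per_model, emb, neighbor_embs, mu=0.0095, sigma=0.157)
print(len(triplets))  # 4 = 2 models x 2 neighbors
```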

##### Detailed FiMMIA architecture

1. **Input data**
* *loss\_input*: a tensor fed into the *loss\_component*.
* *embedding\_input*: a tensor fed into the *embedding\_component*.

2. **loss\_component**:
* A Linear layer: 1 input feature $\rightarrow$ *projection\_size* output features.
* *Dropout(0.2)* and *ReLU* activation.
3. **embedding\_component**:
* A Linear layer: *embedding\_size* $\rightarrow$ *embedding\_size // 2*.
* *Dropout(0.2)* and *ReLU*.
* A Linear layer: *embedding\_size // 2* $\rightarrow$ 512.
* *Dropout(0.2)* and *ReLU*.
4. **Concatenation (torch.hstack)**:
* The outputs of the *loss\_component* (size *projection\_size*) and the *embedding\_component* (size 512) are concatenated into a single vector of size *projection\_size + 512*, which equals *2 \* projection\_size* for *projection\_size* = 512.
5. **attack\_encoding**:
* Five fully connected *Linear* layers with *Dropout(0.2)* and *ReLU* activations between them: *projection\_size + 512* $\rightarrow$ 512 $\rightarrow$ 256 $\rightarrow$ 128 $\rightarrow$ 64 $\rightarrow$ 32.
* A final linear layer: 32 $\rightarrow$ 2 (output logits for classification), making six linear layers in total.
* A final *ReLU* activation after the last layer.

6. **Output**
* The model returns the logits (size 2).
* If labels are provided, it also calculates and returns the cross-entropy loss.
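As a worked check of the layer sizes above, the following pure-Python sketch lists the shape of every linear layer and counts the parameters. It assumes `embedding_size` = 4096 and `projection_size` = 512, and it assumes every Linear layer has a bias, as in the printed model later in this README. It does not build the real model.

```python
# Worked example (pure arithmetic, not the real model): linear layer shapes
# of FiMMIA as described above, and the resulting parameter count.


def linear_shapes(embedding_size=4096, projection_size=512):
    """Return (in_features, out_features) for every Linear layer, in order."""
    loss_component = [(1, projection_size)]
    embedding_component = [
        (embedding_size, embedding_size // 2),
        (embedding_size // 2, 512),
    ]
    concat_size = projection_size + 512  # torch.hstack of both branches
    attack_encoding = [
        (concat_size, 512), (512, 256), (256, 128), (128, 64), (64, 32), (32, 2),
    ]
    return loss_component + embedding_component + attack_encoding


def n_params(shapes):
    """Linear(in, out) with bias has in * out weights plus out biases."""
    return sum(i * o + o for i, o in shapes)


shapes = linear_shapes()
print(shapes[3])         # (1024, 512): first attack_encoding layer
print(n_params(shapes))  # 10140194
```

The 4096 $\rightarrow$ 2048 layer of the embedding branch accounts for about 8.4M of the roughly 10.1M parameters. With a different `projection_size`, for example 256, the first attack_encoding layer becomes 768 $\rightarrow$ 512.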

After computing the losses and embeddings, we merge them into one dataset and calculate the z-score statistics for training the FiMMIA network.

---

##### Create dataset for training FiMMIA neural network

**Run command**

```bash
python job_launcher.py --script="fimmia.utils.mds_dataset" \
  --save_dir="data/train_mds" \
  --model_name="Qwen2.5-VL-3B-Instruct" \
  --origin_df_path="data/train.csv" \
  --shuffle=0 \
  --labels="0,1" \
  --modality_key="video" \
  --single_file=1
```

Here
* `save_dir` - path for saving the merged dataset
* `model_name` - name of the MLLM, as used when storing the loss results
* `origin_df_path` - path to the original dataset
* `shuffle` - `1` to shuffle the data, `0` to keep the original order
* `labels` - comma-separated list of labels to include
* `modality_key` - name of the modality column
* `single_file` - whether to process a single file or batches

This produces a dataset in MDS format that contains the losses and embeddings.

```python
from fimmia.utils.data import get_mean_std, get_streaming_ds

ds = get_streaming_ds("data/train_mds")
# Take the first record to inspect its fields.
x = next(iter(ds))
x
```

```text
{'answer': 'D',
 'ds_name': 'CommonVideoQA',
 'embedding_input': array([-0.00378995,  0.00234417, -0.00746693, ...,  0.00404607,
        -0.00334608,  0.00647654]),
 'hash': '-1220520432446283628',
 'input': 'Video file: <video>\nQuestion:\nWhat will the most likely next action be?\n\nA. Cleaning the cucumber of plastic wrap.\nB. Washing the cucumber.\nC. Slicing the cucumber.\nD. Chopping the lettuce.',
 'label': 0,
 'loss_input': 1.46,
 'model_name': 'Qwen2.5-VL-3B-Instruct',
 'neighbor': 'Video file: <video> Question: What will the most _effective_ next action be? A. Cleaning the _cucumber_ of tongs _and_  wrap. B. Washing the cucumber. C. Slicing the cucumber. D. Chopping the lettuce.',
 'num_part': 0,
 'video': 'data/CommonVideoQA/samples/video035.mp4'}
```

**Apply the same steps to the test set `data/test.csv`:**
* [Neighbor generation](#Neighbor-generation)
* [Embedding generation](#Embedding-generation)
* [Loss computation](#Loss-computation)

---

##### Calculate z-score statistics

```python
from fimmia.utils.utils import save_json, load_json

# Mean and standard deviation of the loss differences, keyed by dataset and model name.
sigmas = get_mean_std(["data/train_mds"])
save_json(sigmas, "data/train_sigmas.json")

sigmas = load_json("data/train_sigmas.json")
sigmas
```

```text
{'CommonVideoQA_Qwen2.5-VL-3B-Instruct': {'mean': 0.009537732455924316,
  'std': 0.15681258803478634}}
```
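The statistics above can be reproduced in spirit with the standard library. This is a hypothetical sketch and not `get_mean_std`. The key format `<ds_name>_<model_name>` is inferred from the printed output, the use of the population standard deviation is an assumption, and the input values are made up.

```python
import statistics

# Hypothetical sketch of z-score statistics per "<ds_name>_<model_name>" key
# (key format inferred from the output above; population std is an assumption).


def compute_sigmas(records):
    """records: iterable of dicts with ds_name, model_name and delta_loss."""
    groups = {}
    for record in records:
        key = f"{record['ds_name']}_{record['model_name']}"
        groups.setdefault(key, []).append(record["delta_loss"])
    return {
        key: {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
        for key, values in groups.items()
    }


def normalize(delta_loss, ds_name, model_name, sigmas):
    """Apply the z-score normalization with the stored mean and std."""
    stats = sigmas[f"{ds_name}_{model_name}"]
    return (delta_loss - stats["mean"]) / stats["std"]


records = [
    {"ds_name": "CommonVideoQA", "model_name": "Qwen2.5-VL-3B-Instruct", "delta_loss": d}
    for d in (-0.2, 0.0, 0.2, 0.4)  # made-up loss differences
]
toy_sigmas = compute_sigmas(records)
print(normalize(0.1, "CommonVideoQA", "Qwen2.5-VL-3B-Instruct", toy_sigmas))  # 0.0
```

Keying the statistics by dataset and model lets one JSON file hold normalization parameters for several datasets and MLLMs.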

---

##### Training FiMMIA neural network

Once the data is prepared, train the FiMMIA attack network:

```bash
python job_launcher.py --script="fimmia.train" \
  --train_dataset_path="data/train_mds" \
  --model_name="FiMMIABaseLineModelLossNormSTDV2" \
  --output_dir="data/FiMMIA-Video" \
  --num_train_epochs=10 \
  --optim="adafactor" \
  --learning_rate=0.00005 \
  --max_grad_norm=10 \
  --warmup_ratio=0.03 \
  --sigmas_path="data/train_sigmas.json" \
  --sigmas_type="std"
```

Here
* `train_dataset_path` - path to train mds dataset
* `val_dataset_path` - path to test mds dataset
* `model_name` - name FiMMIA neural network architecture
* `num_train_epochs` - number of training epochs
* `output_dir` - path to save FiMMIA model
* `optim` - pytorch optimizer name
* `learning_rate` - learning rate
* `max_grad_norm` - max gradient normalization
* `warmup_ratio` - warmup ratio for optimization
* `sigmas_path` - path for dict with normalization parameters
* `sigmas_type` - type of normalization

The trained FiMMIA network is saved in `data/FiMMIA-Video`. We can now load it and run membership inference attacks against multimodal LLMs.

```python
from fimmia.fimmia_inference import ModelArguments, init_model

model_args = ModelArguments(
    model_name="FiMMIABaseLineModelLossNormSTDV2",
    model_path="data/FiMMIA-Video",
)
model = init_model(model_args)
model
```

```text
FiMMIABaseLineModelLossNormSTDV2(
  (loss_component): Sequential(
    (0): Linear(in_features=1, out_features=512, bias=True)
    (1): Dropout(p=0.2, inplace=False)
    (2): ReLU()
  )
  (embedding_component): Sequential(
    (0): Linear(in_features=4096, out_features=2048, bias=True)
    (1): Dropout(p=0.2, inplace=False)
    (2): ReLU()
    (3): Linear(in_features=2048, out_features=512, bias=True)
    (4): Dropout(p=0.2, inplace=False)
    (5): ReLU()
  )
  (attack_encoding): Sequential(
    (0): Linear(in_features=1024, out_features=512, bias=True)
    (1): Dropout(p=0.2, inplace=False)
    (2): ReLU()
    (3): Linear(in_features=512, out_features=256, bias=True)
    (4): Dropout(p=0.2, inplace=False)
    (5): ReLU()
    (6): Linear(in_features=256, out_features=128, bias=True)
    (7): Dropout(p=0.2, inplace=False)
    (8): ReLU()
    (9): Linear(in_features=128, out_features=64, bias=True)
    (10): Dropout(p=0.2, inplace=False)
    (11): ReLU()
    (12): Linear(in_features=64, o
```

### Run MIA inference

Our pretrained models are available in the 🤗 HuggingFace [FiMMIA collection](https://huggingface.co/collections/ai-forever/fimmia). For video, download the following model:

```bash
git clone https://huggingface.co/ai-forever/FiMMIA-Video
```

To run the FiMMIA model on new data, **run the command**:

```bash
python job_launcher.py --script="fimmia.fimmia_inference" \
  --model_name="FiMMIABaseLineModelLossNormSTDV2" \
  --model_path="FiMMIA-Video" \
  --test_path="data/train_mds" \
  --save_path="data/predictions.csv" \
  --save_metrics_path="data/metrics.csv" \
  --sigmas_path="data/train_sigmas.json" \
  --sigmas_type="std"
```
Here
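* `model_name` - name of the FiMMIA network architecture
* `model_path` - path to the FiMMIA model
* `test_path` - path to the dataset to evaluate
* `save_path` - path for saving the predictions
* `save_metrics_path` - path for saving the metrics
* `sigmas_path`, `sigmas_type` - normalization parameters and normalization type, as in training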

The command writes a prediction for each neighbor of every sample to `data/predictions.csv`. If the dataset contains both labels 0 and 1, it also writes the metrics to `data/metrics.csv`.
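In `data/predictions.csv` each `score` is stored as the string form of a NumPy array, for example `[0.543804 0.08863425]`. That is why the code applies `convert_str` after loading. A minimal stand-alone equivalent is sketched below. It assumes whitespace-separated numbers inside square brackets, and `parse_score` is a hypothetical name that is not part of the repository.

```python
def parse_score(text: str) -> list[float]:
    """Parse a NumPy-style array string such as '[0.543804 0.08863425]'."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not an array string: {text!r}")
    # str.split() handles runs of spaces and line breaks from NumPy's printing.
    return [float(token) for token in inner[1:-1].split()]


print(parse_score("[0.        0.5009828]"))  # [0.0, 0.5009828]
```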

```python
from fimmia.utils.metrics import get_sample_scores, convert_str

preds = pd.read_csv("data/predictions.csv")
# The scores are stored as strings in the CSV; parse them with convert_str.
preds["score"] = preds["score"].apply(convert_str)
metrics = pd.read_csv("data/metrics.csv")

preds.head()
```

|   | ds_name | hash | label | score |
|---|---------|------|-------|-------|
| 0 | CommonVideoQA | -2819276857036119732 | 0 | [0. 0.5009828] |
| 1 | CommonVideoQA | -2819276857036119732 | 0 | [0.543804 0.08863425] |
| 2 | CommonVideoQA | -2819276857036119732 | 0 | [1.0642736 0. ] |
| 3 | CommonVideoQA | -2819276857036119732 | 0 | [0.5575552 0.09166689] |
| 4 | CommonVideoQA | -2819276857036119732 | 0 | [1.1353273 0. ] |


```python
metrics
```

|   | method | auroc | fpr95 | tpr05 | acc | per_neighbors_acc |
|---|--------|-------|-------|-------|-----|-------------------|
| 0 | smia | 100.0% | 0.0% | 100.0% | 0.985417 | 0.8783 |
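The `auroc` column can be read through the rank interpretation of the ROC AUC. It is the probability that a randomly chosen sample with label 1 gets a higher score than a randomly chosen sample with label 0, with ties counting one half. The sketch below implements that standard definition in plain Python on made-up scores. `auroc` is a hypothetical helper, and the repository's metric code may compute the value differently, for example with scikit-learn.

```python
def auroc(labels, scores):
    """Probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.6, 0.9]))   # 1.0: perfect separation
```

AUROC does not depend on a decision threshold. It can therefore reach 100% while `acc`, which is presumably measured at a fixed threshold, stays below 1, as in the table above.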


To obtain a prediction for each sample in the dataset, we aggregate its neighbor scores:

$$A(t, m) = \frac{1}{K}\sum_{k=1}^{K}f_{FiMMIA}({\Delta \mathcal{L}^k_{norm}}, \Delta e^k)$$

```python
cscores, labels, y_true, y_pred = get_sample_scores(preds)
print(cscores[:5])
```

```text
[0.2637179046869278, 0.12685959537823996, 0.3438585201899211, 0.2937004665533702, 0.035671127339204155]
```


To make a binary decision, a sample is flagged as leaked when $A(t, m) > 0.5$.
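The aggregation and decision rule can be sketched as follows. The sketch groups the per-neighbor rows by `hash`, averages them into $A(t, m)$ and compares the average with 0.5. It assumes that the per-neighbor value of $f_{FiMMIA}$ is the softmax probability of class 1 computed from the two output values. `get_sample_scores` may derive it differently, and `leak_probability` and `aggregate` are hypothetical names.

```python
import math
from collections import defaultdict

# Hypothetical sketch of the sample-level score A(t, m) and the 0.5 decision.
# Assumption: f_FiMMIA per neighbor = softmax probability of class 1.


def leak_probability(outputs):
    """Softmax over the two output values, probability of class 1 (leak)."""
    m = max(outputs)
    exps = [math.exp(v - m) for v in outputs]  # shift by max for stability
    return exps[1] / sum(exps)


def aggregate(rows, threshold=0.5):
    """rows: (sample_hash, outputs) pairs -> {sample_hash: (A, is_leak)}."""
    per_sample = defaultdict(list)
    for sample_hash, outputs in rows:
        per_sample[sample_hash].append(leak_probability(outputs))
    result = {}
    for sample_hash, probs in per_sample.items():
        score = sum(probs) / len(probs)  # average over the K neighbors
        result[sample_hash] = (score, score > threshold)
    return result


rows = [
    ("sample-a", [0.0, 0.0]),
    ("sample-a", [0.0, math.log(3)]),
    ("sample-b", [math.log(3), 0.0]),
]
for sample_hash, (score, is_leak) in aggregate(rows).items():
    print(sample_hash, round(score, 3), is_leak)
# sample-a 0.625 True
# sample-b 0.25 False
```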