Added example for the attack #108
Conversation
📝 Walkthrough

This PR refactors the Tartan–Federer membership inference attack by externalizing data preparation logic from the core attack module to the example runner script.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive) · ✅ Passed checks (1 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
examples/ensemble_attack/README.md (1)
**24-42: README mixes Ensemble Attack docs with Tartan–Federer run command; make entrypoints consistent.**

This README is titled/structured for Ensemble Attack (config paths and `./examples/ensemble_attack/run.sh`), but Line 29 now instructs running `examples.tartan_federer_attack.run_attack` while Line 41 still uses `examples.ensemble_attack.run_attack`. Pick one and align the surrounding text/config references accordingly to avoid sending users down the wrong pipeline.

Suggested fix (if this README is meant for Ensemble Attack):

```diff
-python -m examples.tartan_federer_attack.run_attack
+python -m examples.ensemble_attack.run_attack
```

If instead the intent is to document the Tartan–Federer example here, I'd strongly recommend renaming/splitting the README and updating *all* references (`config.yaml` path, `run.sh`, terminology) to the Tartan–Federer locations to keep the doc coherent.

src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py (2)

**470-496: Remove debug print statements.**

The model indexing logic correctly handles optional validation, but Lines 478-479 contain debug print statements that should be removed.

Apply this diff:

```diff
     if model_number in train_indices:
-        print("Preparing training dataframe...")
-        print(f"Model dir: {model_dir}")
         population_df_for_training = prepare_dataframe(
```
**532-563: Remove debug print statements.**

The validation scoring logic is correct, but Lines 533 and 563 contain debug print statements that should be removed.

Apply this diff:

```diff
     elif val_indices is not None and model_number in val_indices:
-        print("Getting validation scores...")
         batch_size = sample_per_val_model * 2
```

And:

```diff
         val_count += 1
-        print("Val count:", val_count)
         fitted_regression_model = fit_model(
```
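If some of that progress output is still wanted, here is a minimal sketch of a logging-based alternative, assuming Python's standard `logging` module is acceptable here; the `log_progress` helper and logger setup are hypothetical illustrations, not toolkit code:

```python
import logging

logger = logging.getLogger(__name__)  # hypothetical module-level logger, not part of the toolkit


def log_progress(model_dir: str, val_count: int) -> None:
    # Debug-level messages are silenced by default and can be enabled through logging
    # configuration, unlike the print statements flagged above.
    logger.debug("Preparing training dataframe for model dir %s", model_dir)
    logger.debug("Validation models scored so far: %d", val_count)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    log_progress("tabddpm_2", 3)
```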
🧹 Nitpick comments (1)
src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py (1)
**378-388: Refactor error messages to follow best practices.**

The data sufficiency checks are essential for preventing runtime errors, but the error messages should be simplified per best practices (TRY003).

Apply this diff:

```diff
     if df_exclusive.shape[0] < samples_per_model:
         raise ValueError(
-            f"Not enough data to sample non-members from. Requested {samples_per_model} but only "
-            f"{df_exclusive.shape[0]} available."
+            f"Insufficient non-member samples: need {samples_per_model}, have {df_exclusive.shape[0]}"
         )
     if raw_data.shape[0] < samples_per_model:
         raise ValueError(
-            f"Not enough data to sample members members from. Requested {samples_per_model} but only "
-            f"{raw_data.shape[0]} available."
+            f"Insufficient member samples: need {samples_per_model}, have {raw_data.shape[0]}"
         )
```

As per static analysis hints.
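As an alternative to shortening the strings, TRY003 can also be satisfied by moving the message into a dedicated exception class. A rough sketch under that assumption; `InsufficientSamplesError` and `check_sample_counts` are hypothetical names, not part of the toolkit:

```python
class InsufficientSamplesError(ValueError):
    """Raised when a population dataframe cannot supply the requested number of samples."""

    def __init__(self, kind: str, requested: int, available: int) -> None:
        # Building the message inside the exception class keeps the raise sites short,
        # which is the style TRY003 nudges toward.
        super().__init__(f"Insufficient {kind} samples: need {requested}, have {available}")


def check_sample_counts(n_exclusive: int, n_raw: int, samples_per_model: int) -> None:
    # n_exclusive and n_raw stand in for df_exclusive.shape[0] and raw_data.shape[0] above.
    if n_exclusive < samples_per_model:
        raise InsufficientSamplesError("non-member", samples_per_model, n_exclusive)
    if n_raw < samples_per_model:
        raise InsufficientSamplesError("member", samples_per_model, n_raw)
```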
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `tests/integration/attacks/tartan_federer/assets/population_data/population_dataset_for_training_attack.csv` is excluded by `!**/*.csv`
- `tests/integration/attacks/tartan_federer/assets/population_data/population_dataset_for_validating_attack.csv` is excluded by `!**/*.csv`
- `tests/integration/attacks/tartan_federer/assets/tabddpm_models/tabddpm_2/data_for_validating_MIA.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (7)
- `examples/ensemble_attack/README.md` (1 hunks)
- `examples/tartan_federer_attack/README.md` (1 hunks)
- `examples/tartan_federer_attack/configs/experiment_config.yaml` (1 hunks)
- `examples/tartan_federer_attack/run_attack.py` (1 hunks)
- `src/midst_toolkit/attacks/tartan_federer/data_utils.py` (0 hunks)
- `src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py` (15 hunks)
- `tests/integration/attacks/tartan_federer/test_tartan_federer_attack.py` (2 hunks)
💤 Files with no reviewable changes (1)
- src/midst_toolkit/attacks/tartan_federer/data_utils.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-11T16:08:49.024Z
Learnt from: lotif
Repo: VectorInstitute/midst-toolkit PR: 107
File: examples/gan/synthesize.py:1-47
Timestamp: 2025-12-11T16:08:49.024Z
Learning: When using SDV (version >= 1.18.0), prefer loading a saved CTGANSynthesizer with CTGANSynthesizer.load(filepath) instead of sdv.utils.load_synthesizer(). This applies to Python code across the repo (e.g., any script that loads a CTGANSynthesizer). Ensure the SDV version is >= 1.18.0 before using CTGANSynthesizer.load, and fall back to sdv.utils.load_synthesizer() only if a compatible alternative is required.
Applied to files:
- `tests/integration/attacks/tartan_federer/test_tartan_federer_attack.py`
- `examples/tartan_federer_attack/run_attack.py`
- `src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py`
🧬 Code graph analysis (4)
examples/ensemble_attack/README.md (2)
- examples/ensemble_attack/run_attack.py (1)
  - `main` (48-87)
- examples/ept_attack/run_ept_attack.py (1)
  - `main` (80-102)

tests/integration/attacks/tartan_federer/test_tartan_federer_attack.py (2)
- src/midst_toolkit/common/random.py (2)
  - `set_all_random_seeds` (11-55)
  - `unset_all_random_seeds` (58-67)
- src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py (1)
  - `tartan_federer_attack` (577-725)

examples/tartan_federer_attack/run_attack.py (2)
- src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py (1)
  - `tartan_federer_attack` (577-725)
- src/midst_toolkit/common/random.py (2)
  - `set_all_random_seeds` (11-55)
  - `unset_all_random_seeds` (58-67)

src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py (1)
- src/midst_toolkit/attacks/tartan_federer/data_utils.py (1)
  - `evaluate_attack_performance` (160-209)
🪛 Ruff (0.14.8)
examples/tartan_federer_attack/run_attack.py
43-43: Avoid specifying long messages outside the exception class
(TRY003)
63-63: Avoid specifying long messages outside the exception class
(TRY003)
75-75: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF002)
117-117: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF002)
122-122: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
142-142: Unpacked variable mia_performance_train is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
142-142: Unpacked variable mia_performance_val is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
142-142: Unpacked variable mia_performance_test is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py
379-382: Avoid specifying long messages outside the exception class
(TRY003)
385-388: Avoid specifying long messages outside the exception class
(TRY003)
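For reference, the RUF059 findings above are usually silenced by underscoring the unpacked values that are deliberately ignored; a tiny sketch with a hypothetical stand-in for the attack call (the real `run_attack.py` may reasonably keep the names as-is for readability, as noted later in the review):

```python
from __future__ import annotations

from typing import Any


def attack_stub() -> tuple[dict[str, Any], dict[str, Any] | None, dict[str, Any]]:
    # Hypothetical stand-in returning train/validation/test MIA metrics.
    return {"auc": 0.5}, None, {"auc": 0.5}


# Underscore prefixes tell Ruff these unpacked values are intentionally unused.
_mia_performance_train, _mia_performance_val, _mia_performance_test = attack_stub()
```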
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration-tests
- GitHub Check: unit-tests
- GitHub Check: run-code-check
🔇 Additional comments (13)
examples/tartan_federer_attack/README.md (1)
**1-33: LGTM: Clear documentation with minor typo in config comment.**

The README provides clear guidance on the data processing flow and execution. The structure is well-organized with distinct sections for data processing and running the attack.

Note: Line 2 of the YAML config (experiment_config.yaml) contains a typo: `tets_attack_model.py` should be `test_attack_model.py`.

tests/integration/attacks/tartan_federer/test_tartan_federer_attack.py (2)

**24-24: LGTM: Population data directory added.**

The addition of `population_data_dir` aligns with the refactored data loading approach.

**179-180: LGTM: Test execution added.**

The new tests are properly registered in the main execution block.

src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py (6)

**400-416: LGTM: Optional validation parameter added.**

The function signature correctly supports optional validation by accepting `val_indices: list[int] | None`.

**447-448: LGTM: CSV-based population loading implemented.**

The refactor to load population datasets directly from CSV files simplifies the data flow and aligns with the PR objectives.

**458-463: LGTM: Conditional validation arrays.**

The conditional creation of validation arrays properly handles the optional validation flow.

**577-598: LGTM: Function signature updated.**

The `tartan_federer_attack` function signature correctly includes the new `population_data_dir` parameter and optional `val_indices`.

**660-663: LGTM: Conditional model indexing.**

The logic correctly constructs `model_folders_indices` based on whether validation is enabled.

**705-710: LGTM: Conditional validation evaluation.**

The evaluation logic properly handles the case when `val_indices is None` by setting `mia_performance_val = None`.

examples/tartan_federer_attack/run_attack.py (4)

**19-71: LGTM: Data preparation logic moved to example.**

The `prepare_population_dataset_for_attack` function is well-implemented with proper validation. Moving this from the toolkit to the example is a good separation of concerns.

Note: The error messages on Lines 43 and 63 could be simplified per TRY003, but this is a minor style consideration.

**73-112: LGTM: Data processing orchestration.**

The `run_data_processing` function cleanly orchestrates population dataset preparation and CSV output.

**114-167: LGTM: Hydra-based attack runner.**

The `run_attack` function properly uses Hydra for configuration management, sets deterministic seeds, conditionally runs data processing, and invokes the attack with correct parameters.

Note: The unpacked variables at Line 142 are not used, but this is acceptable for an entry point where you may want to observe all return values for logging/debugging purposes.

**169-170: LGTM: Entry point defined.**

The `__main__` block correctly invokes the Hydra-decorated `run_attack` function.
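For readers unfamiliar with the pattern being praised here, a minimal sketch of a Hydra-decorated entry point, assuming hydra-core is installed and the YAML sits at `configs/experiment_config.yaml` next to the script (which mirrors this example's layout). The body below is illustrative, not the actual `run_attack` implementation:

```python
import hydra
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="configs", config_name="experiment_config")
def run_attack(cfg: DictConfig) -> None:
    # Hydra resolves configs/experiment_config.yaml into cfg before this body executes.
    attack_cfg = cfg["attack_config"]
    classifier_cfg = cfg["classifier_config"]
    # Placeholder for seeding, optional data processing, and the attack call itself.
    print(attack_cfg, classifier_cfg)


if __name__ == "__main__":
    run_attack()
```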
On branch bz/tf_example Your branch is ahead of 'origin/bz/tf_example' by 2 commits. (use "git push" to publish your local commits) Changes to be committed: modified: src/midst_toolkit/attacks/tartan_federer/tartan_federer_attack.py
fatemetkl
left a comment
I have left some comments and questions, but most of them are minor.
Looks good overall!
```python
        models_base_dir=model_data_dir,
        columns_for_deduplication=columns_for_deduplication,
    )
    population_df_for_training = pd.read_csv(population_data_dir / "population_dataset_for_training_attack.csv")
```
Maybe we can log here that the training dataset is loaded successfully. Also display an error message in case the file is not found under population_data_dir, prompting the user to place the file at this address.
I think I used to have this, but David mentioned that pd.read_csv will handle throwing an error if the file does not exist. I am still happy to add it :)
That's right, its default error message can be enough.
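If the friendlier message is ever added back, a small sketch of how it could look, assuming the standard `logging` module; `load_population_csv` is a hypothetical helper for illustration, not existing toolkit code:

```python
import logging
from pathlib import Path

import pandas as pd

logger = logging.getLogger(__name__)


def load_population_csv(population_data_dir: Path, filename: str) -> pd.DataFrame:
    csv_path = population_data_dir / filename
    if not csv_path.exists():
        # More actionable than pandas' default FileNotFoundError: tells the user where to place the file.
        raise FileNotFoundError(
            f"{filename} not found. Place it under {population_data_dir} or rerun the data processing step."
        )
    dataframe = pd.read_csv(csv_path)
    logger.info("Loaded %s with %d rows.", filename, len(dataframe))
    return dataframe


# Example usage (hypothetical directory):
# load_population_csv(Path("population_data"), "population_dataset_for_training_attack.csv")
```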
```python
        columns_for_deduplication=columns_for_deduplication,
    )
    population_df_for_training = pd.read_csv(population_data_dir / "population_dataset_for_training_attack.csv")
    population_df_for_validation = pd.read_csv(population_data_dir / "population_dataset_for_validating_attack.csv")
```
Same as the above comment.
```python
    attack_cfg = cfg["attack_config"]
    classifier_cfg = cfg["classifier_config"]

    mia_performance_train, mia_performance_val, mia_performance_test = tartan_federer_attack(
```
Just curious, I am missing where the user can define the type of the attack (white-box or black-box). Do we give this option to the user?
This example seems to be black-box, but I might be wrong. Adding this information to the README would also be great!
So there is really not a difference. If we set the target_model_subdir to be the one containing the shadow model trained using synthetic data, the attack is essentially black-box. We currently do not have those trained using the toolkit, so perhaps I can add that as a todo for Elahe as well and update the code then.
Got it! So it depends on what is in the config.target_shadow_model_subdir directory. If there are trained shadow models there, it would be black-box, but in this config, it is set to target_shadow_model_subdir: "." So it will be white-box. It wasn't entirely clear to me at first, but I suppose users will already be aware of this.
Or you can optionally note that in a comment in experiment_config.yaml.
```diff
@@ -1,18 +1,16 @@
 # Ensemble experiment configuration
```
This should be Tartan-Federer instead 🙂
fatemetkl
left a comment
LGTM! 🚀
Added a few minor and optional comments.
```python
    attack_cfg = cfg["attack_config"]
    classifier_cfg = cfg["classifier_config"]

    mia_performance_train, mia_performance_val, mia_performance_test = tartan_federer_attack(
```
Nitpicking comment (sorry): it might be a bit more readable if the inputs followed the function signature order.
PR Type
Other: Example
Short Description
Modified the attack so that population preparation is done separately, which lightens the tests and makes it possible to add an example.
Tests Added
Added a test to make sure the attack can run without validation.