diff --git a/deepguard/MS_EffGCViT.md b/deepguard/MS_EffGCViT.md
index 7c4f5bf..bfaee2c 100644
--- a/deepguard/MS_EffGCViT.md
+++ b/deepguard/MS_EffGCViT.md
@@ -45,10 +45,6 @@ much larger SOTA models while using a fraction of the parameters and compute.
-| Variant | Test@Acc | Test@AUC | Test@LogLoss |
-| :------ | :------: | :------: | :----------: |
-| ms_eff_gcvit_b0 | 0.9842 | 0.9965 | 0.0283 |
-| ms_eff_gcvit_b5 | 0.9981 | 0.9984 | 0.0089 |
@@ -56,10 +52,6 @@ much larger SOTA models while using a fraction of the parameters and compute.
-| Variant | Test@Acc | Test@AUC | Test@LogLoss |
-| :------ | :------: | :------: | :----------: |
-| ms_eff_gcvit_b0 | 0.9808 | 0.9969 | 0.0637 |
-| ms_eff_gcvit_b5 | 0.9850 | 0.9974 | 0.0492 |
@@ -67,10 +59,6 @@ much larger SOTA models while using a fraction of the parameters and compute.
-| Variant | Test@Acc | Test@AUC | Test@LogLoss |
-| :------ | :------: | :------: | :----------: |
-| ms_eff_gcvit_b0 | 0.9655 | 0.9792 | 0.1237 |
-| ms_eff_gcvit_b5 | 0.9792 | 0.9974 | 0.0492 |
 ## Model Indroduction
diff --git a/deepguard/MS_EffViT.md b/deepguard/MS_EffViT.md
index 8ee7312..b9e2918 100644
--- a/deepguard/MS_EffViT.md
+++ b/deepguard/MS_EffViT.md
@@ -12,6 +12,8 @@ This Repository presents the PyTorch implementation of **Multi Scale Efficient V
 This model is a **frame-level** and **spatial-domain** architecture, designed to perform classification tasks on both **static images** and **video sequences**
+
+
 ## 💥 News 💥
 - [**02.03.2026**] 🔥🔥 We have released **FaceForensics++** fine-tuned **MS-Eff-ViT B5** model weightes for **384X384**
@@ -21,18 +23,32 @@ This model is a **frame-level** and **spatial-domain** architecture, designed to
 ## Model Performance
-MS_Eff_ViT achieves state-of-the-art(SOTA) results across deepfake video classification. On Celeb_DF(v2) dataset, MS_EFF_GCViT variants with `5.9M`, `52.0M` parameters achieve `0.9742`, `0.9900` Accuracy. Notably, the MS_EFF_ViT_B0 variant demonstrates exceptional efficiency, matching or exceeding SOTA performance even with a siginificantly lower parameter
+**MS-EFF-ViT achieves state-of-the-art (SOTA) results across two DeepFake benchmarks.**
+The model ships in two variants from a single architecture — **Fast (b0)** for real-time / edge
+deployment and **Pro (b5)** for enterprise-grade accuracy. Notably, **Fast** matches or exceeds
+much larger SOTA models while using a fraction of the parameters and compute.
+

+ +

-### Test Result of Celeb_DF(v2)
-
+> On **Celeb-DF(v2)**, Pro reaches **0.9900 Acc** (rank #2) and Fast **0.9742** (rank #4) among 20 architectures.
+
+
+📊 Celeb-DF (v2) — Accuracy & Efficiency
+
+ + +
-Test Result of FaceForensics++
-
+📊 FaceForensics++ — Accuracy & Efficiency
+
+
+
 ## Model Introduction
 Multi Scale Efficient Vision Transformer is an optimized multi-scale hybrid architecture that integrates CNN-driven spatial inductive bias with self-attention mechanisms to effectively identify subtle(local) artifacts and macro(global) artifacts for robust deepfake forensics."
diff --git a/docs/benchmarks/celeb_df_v2_vit.png b/docs/benchmarks/celeb_df_v2_vit.png
index ebba05f..edb0b22 100644
Binary files a/docs/benchmarks/celeb_df_v2_vit.png and b/docs/benchmarks/celeb_df_v2_vit.png differ
diff --git a/docs/benchmarks/celeb_df_v2_vit_2.png b/docs/benchmarks/celeb_df_v2_vit_2.png
new file mode 100644
index 0000000..21fd9e6
Binary files /dev/null and b/docs/benchmarks/celeb_df_v2_vit_2.png differ
diff --git a/docs/benchmarks/ff_vit.png b/docs/benchmarks/ff_vit.png
index ad28527..831af92 100644
Binary files a/docs/benchmarks/ff_vit.png and b/docs/benchmarks/ff_vit.png differ
diff --git a/docs/benchmarks/vit_summary_bars.png b/docs/benchmarks/vit_summary_bars.png
new file mode 100644
index 0000000..5e08f9c
Binary files /dev/null and b/docs/benchmarks/vit_summary_bars.png differ