[egs] Add a new pre-trained model (X-UMXL) (#665)

* [Fix] Update "egs/musdb18/X-UMX/requirements.txt"
* [Fix] Bug of X-UMX and README.md
* [egs] Add X-UMX Large (X-UMXL) to the list of pretrained models
* Apply black

asteroid-team · May 21, 2023 · 78e419e
1 parent bd3caa3
commit 78e419e

Showing 3 changed files with 34 additions and 2 deletions.
diff --git a/asteroid/utils/hub_utils.py b/asteroid/utils/hub_utils.py
@@ -25,6 +25,7 @@
     "tmirzaev-dotcom/ConvTasNet_Libri3Mix_sepnoisy": "https://zenodo.org/record/4020529/files/model.pth?download=1",
     "mhu-coder/ConvTasNet_Libri1Mix_enhsingle": "https://zenodo.org/record/4301955/files/model.pth?download=1",
     "r-sawata/XUMX_MUSDB18_music_separation": "https://zenodo.org/record/4704231/files/pretrained_xumx.pth?download=1",
+    "r-sawata/XUMXL_MUSDB18_music_separation": "https://zenodo.org/record/7128659/files/pretrained_xumxl.pth?download=1",
 }
 
 SR_HASHTABLE = {k: 8000.0 if not "DeMask" in k else 16000.0 for k in MODELS_URLS_HASHTABLE}
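
A minimal sketch of how the new entry could be looked up. The table is a two-entry subset copied from the hunk above. `resolve_url` is a hypothetical helper written only for illustration and is not part of asteroid's API.

```python
# Two-entry subset of MODELS_URLS_HASHTABLE, copied from the hunk above.
MODELS_URLS_HASHTABLE = {
    "r-sawata/XUMX_MUSDB18_music_separation": "https://zenodo.org/record/4704231/files/pretrained_xumx.pth?download=1",
    "r-sawata/XUMXL_MUSDB18_music_separation": "https://zenodo.org/record/7128659/files/pretrained_xumxl.pth?download=1",
}

# Equivalent to the comprehension in the diff: every key that does not
# contain "DeMask" is mapped to 8000.0, all others to 16000.0.
SR_HASHTABLE = {k: 16000.0 if "DeMask" in k else 8000.0 for k in MODELS_URLS_HASHTABLE}


def resolve_url(model_id):
    """Hypothetical helper: return the download URL registered for model_id."""
    try:
        return MODELS_URLS_HASHTABLE[model_id]
    except KeyError:
        raise ValueError(f"Unknown pretrained model: {model_id!r}") from None


if __name__ == "__main__":
    print(resolve_url("r-sawata/XUMXL_MUSDB18_music_separation"))
```

Because the comprehension keys only on the substring "DeMask", the new X-UMXL identifier also receives the 8000.0 default in `SR_HASHTABLE`.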

diff --git a/egs/musdb18/X-UMX/README.md b/egs/musdb18/X-UMX/README.md
@@ -1,6 +1,6 @@
 #  CrossNet-Open-Unmix (X-UMX)
 
-This recipe contains __CrossNet-Open-Unmix (X-UMX)__, an improved version of [Open-Unmix (UMX)](https://github.com/sigsep/open-unmix-nnabla) for music source separation. X-UMX achieves an improved performance without additional learnable parameters compared to the original UMX model. Details of X-UMX can be found in [this paper](https://arxiv.org/abs/2010.04228). X-UMX is one of the two official baseline models for the [Music Demixing (MDX) Challenge 2021](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021).
+This recipe contains __CrossNet-Open-Unmix (X-UMX)__, an improved version of [Open-Unmix (UMX)](https://github.com/sigsep/open-unmix-nnabla) for music source separation. X-UMX achieves an improved performance without additional learnable parameters compared to the original UMX model. Details of X-UMX can be found in [this paper](https://arxiv.org/abs/2010.04228). X-UMX is one of the two official baseline models for the [Music Demixing (MDX) Challenge 2021](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021). Furthermore, the extended version trained on large-scale data, named X-UMX Large (X-UMXL), can be found in [this paper](https://arxiv.org/abs/2305.07855).
 
 __Related Projects:__ [umx-pytorch](https://github.com/sigsep/open-unmix-pytorch) | [umx-nnabla](https://github.com/sigsep/open-unmix-nnabla) | x-umx-pytorch | [x-umx-nnabla](https://github.com/sony/ai-research-code/tree/master/x-umx) | [musdb](https://github.com/sigsep/sigsep-mus-db) | [museval](https://github.com/sigsep/sigsep-mus-eval)
 
@@ -9,13 +9,19 @@ Pretrained models on MUSDB18 for X-UMX, which reproduce the results from our pap
 ```
 python eval.py --no-cuda --root [Path to MUSDB18]
 ```
+
+You can also use X-UMXL by adding the option `--large` as follows:
+```
+python eval.py --no-cuda --root [Path to MUSDB18] --large
+```
 The separations along with the evaluation scores will be saved in `./results_using_pre-trained`.
 
 Please note that X-UMX requires quite some memory due to its crossing architecture. Hence, switching on `--no-cuda` to prevent out-of-memory error is recommended.
 
 
 ### Results on MUSDB18
 
+#### X-UMX
 | Median of Median |   SDR   |   SIR  |  ISR   |  SAR  |
 |:----------------:|:-------:|:------:|:------:|:------|
 |      vocals      |  6.612  | 14.167 | 11.774 | 6.750 |
@@ -29,3 +35,18 @@ Please note that X-UMX requires quite some memory due to its crossing architectu
 |      drums       |  5.793  | 11.167 | 10.164 | 5.687 |
 |      bass        |  4.558  | 8.797  | 9.786  | 5.828 |
 |      other       |  4.348  | 6.952  | 9.402  | 4.849 |
+
+#### X-UMXL
+| Median of Median |   SDR   |   SIR  |  ISR   |  SAR  |
+|:----------------:|:-------:|:------:|:------:|:------|
+|      vocals      |  7.565  | 16.737 | 13.900 | 7.469 |
+|      drums       |  7.394  | 13.726 | 13.533 | 7.337 |
+|      bass        |  6.283  | 12.650 | 10.421 | 6.022 |
+|      other       |  4.833  | 8.315  | 11.352 | 5.292 |
+
+|  Mean of Median  |   SDR   |   SIR  |  ISR   |  SAR  |
+|:----------------:|:-------:|:------:|:------:|:------|
+|      vocals      |  5.135  | 9.816  | 12.728 | 6.546 |
+|      drums       |  7.257  | 12.478 | 12.274 | 7.005 |
+|      bass        |  5.826  | 11.157 | 9.860  | 5.670 |
+|      other       |  4.959  | 7.897  | 11.009 | 5.150 |
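
The table titles above are not defined in the README text shown here. The sketch below assumes the usual museval-style convention: take the median over frames for each track, then take either the median or the mean of those per-track values. The track names and scores are made-up numbers for illustration, not MUSDB18 results.

```python
from statistics import mean, median

# Hypothetical frame-wise SDR values (dB) per track; NOT real MUSDB18 results.
frame_sdr_per_track = {
    "track_a": [5.0, 6.0, 7.0],
    "track_b": [2.0, 4.0, 9.0],
    "track_c": [8.0, 8.5, 10.0],
}


def aggregate(frame_scores):
    """Aggregate frame-wise scores under the assumed convention."""
    # Step 1: one score per track, the median over its frames.
    per_track = [median(frames) for frames in frame_scores.values()]
    # Step 2: combine the per-track medians across tracks in two ways.
    return {
        "median_of_median": median(per_track),
        "mean_of_median": mean(per_track),
    }


if __name__ == "__main__":
    print(aggregate(frame_sdr_per_track))
```

The mean across tracks is more sensitive to a few very poor or very good tracks than the median, which is one reason both aggregates may be reported.
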
diff --git a/egs/musdb18/X-UMX/eval.py b/egs/musdb18/X-UMX/eval.py
@@ -167,6 +167,7 @@ def eval_main(
     softmask=False,
     residual_model=False,
     model_name="xumx",
+    large=False,
     outdir=None,
     start=0.0,
     duration=-1.0,
@@ -176,7 +177,11 @@ def eval_main(
     model_name = os.path.abspath(model_name)
     if not (os.path.exists(model_name)):
         outdir = os.path.abspath("./results_using_pre-trained")
-        model_name = "r-sawata/XUMX_MUSDB18_music_separation"
+        model_name = (
+            "r-sawata/XUMXL_MUSDB18_music_separation"
+            if large
+            else "r-sawata/XUMX_MUSDB18_music_separation"
+        )
     else:
         outdir = os.path.join(
             os.path.abspath(outdir),
@@ -276,6 +281,10 @@ def eval_main(
         help="Audio chunk duration in seconds, negative values load full track",
     )
 
+    parser.add_argument(
+        "--large", action="store_true", default=False, help="Download and use X-UMX Large (X-UMXL)"
+    )
+
     parser.add_argument(
         "--no-cuda", action="store_true", default=False, help="disables CUDA inference"
     )
@@ -292,6 +301,7 @@ def eval_main(
         niter=args.niter,
         residual_model=args.residual_model,
         model_name=model,
+        large=args.large,
         outdir=args.outdir,
         start=args.start,
         duration=args.duration,
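
The flag handling added in this file can be sketched in isolation as follows. The two identifiers and the `--large` / `--no-cuda` arguments are taken from the diff. `select_pretrained` and `build_parser` are hypothetical names used only for this illustration, and no model is downloaded.

```python
import argparse

XUMX_ID = "r-sawata/XUMX_MUSDB18_music_separation"
XUMXL_ID = "r-sawata/XUMXL_MUSDB18_music_separation"


def select_pretrained(large=False):
    """Mirror the conditional added to eval_main: pick X-UMXL when large is set."""
    return XUMXL_ID if large else XUMX_ID


def build_parser():
    """Build a parser with only the two boolean flags relevant to this sketch."""
    parser = argparse.ArgumentParser(description="Sketch of the --large flag")
    parser.add_argument(
        "--large", action="store_true", default=False, help="Download and use X-UMX Large (X-UMXL)"
    )
    parser.add_argument(
        "--no-cuda", action="store_true", default=False, help="disables CUDA inference"
    )
    return parser


if __name__ == "__main__":
    # Equivalent to passing "--no-cuda --large" on the command line.
    args = build_parser().parse_args(["--no-cuda", "--large"])
    print(select_pretrained(args.large))
```

In the diff, this selection is only reached when `model_name` does not exist as a local path. Otherwise the locally trained model is used and `--large` has no effect on which weights are loaded.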