<a href="https://colab.research.google.com/github/78juli/AudioSR-Colab-Fork/blob/main/AudioSR_Colab_Fork.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AudioSR-Colab-Fork v0.5

---
Colab adaptation of AudioSR, with some tweaks:

v0.5
- input audio is resampled according to 'input_cutoff' (instead of being lowpass filtered)
- each processed chunk is normalised to the same LUFS level as the corresponding input chunk (fixes the volume drop issue)
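
A minimal sketch of the per-chunk loudness matching described above, assuming `pyloudnorm` (installed by the Installation cell) is used for the measurement. The names `lufs_gain` and `match_loudness` are illustrative and are not taken from `inference.py`; the actual implementation may differ.

```python
import math


def lufs_gain(measured_lufs, target_lufs):
    """Linear gain that moves a signal from measured_lufs to target_lufs."""
    return 10.0 ** ((target_lufs - measured_lufs) / 20.0)


def match_loudness(processed, reference, processed_sr, reference_sr):
    """Scale `processed` so its integrated loudness matches `reference`.

    Both arrays are shaped (samples,) or (samples, channels) and must be
    longer than the meter's 0.4 s block size.
    """
    import pyloudnorm as pyln  # lazy import: only needed when measuring

    target = pyln.Meter(reference_sr).integrated_loudness(reference)
    measured = pyln.Meter(processed_sr).integrated_loudness(processed)
    # Silent chunks measure as -inf; leave them untouched in that case.
    if not (math.isfinite(target) and math.isfinite(measured)):
        return processed
    return processed * lufs_gain(measured, target)
```

Separate meters are used because the input chunk and the generated chunk are usually at different sample rates.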

v0.4
- code reworked; inference.py added for local CLI usage.
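
For local use, the script can also be launched from Python. The sketch below only assembles the argument list; the flag names are the ones passed in this notebook's Inference cell, the defaults mirror the values shown there, and the helper name `build_inference_command` is hypothetical.

```python
import subprocess
import sys


def build_inference_command(input_path, output_dir, ddim_steps=20,
                            guidance_scale=1.5, seed=0, chunk_size=10.24,
                            overlap=0.04, multiband_ensemble=True,
                            input_cutoff=14000, script="inference.py"):
    """Return the argv list for inference.py (hypothetical helper)."""
    return [
        sys.executable, script,          # run with the current interpreter
        "--input", str(input_path),
        "--output", str(output_dir),
        "--ddim_steps", str(ddim_steps),
        "--guidance_scale", str(guidance_scale),
        "--seed", str(seed),
        "--chunk_size", str(chunk_size),
        "--overlap", str(overlap),
        # The notebook passes the Python bool as the text "True"/"False".
        "--multiband_ensemble", str(multiband_ensemble),
        "--input_cutoff", str(input_cutoff),
    ]


# Usage (uncomment to actually run the script):
# subprocess.run(build_inference_command("in.wav", "out"), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.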

v0.3
- added: multiband ensemble option, which uses the original audio below the given cutoff frequency and the generated audio above it.
- fixed: error when saving the final audio for inputs other than .wav
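
The multiband ensemble idea can be illustrated with a simple split in the frequency domain. This is a sketch only: it uses a brick-wall FFT crossover for clarity, not necessarily the filter used in `inference.py`, and it assumes both signals are mono, time-aligned and at the same sample rate. The function name is illustrative.

```python
import numpy as np


def multiband_ensemble(original, generated, sr, cutoff_hz):
    """Keep `original` below cutoff_hz and `generated` at or above it."""
    n = min(len(original), len(generated))
    spec_orig = np.fft.rfft(original[:n])
    spec_gen = np.fft.rfft(generated[:n])
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    # Pick each frequency bin from one source or the other.
    combined = np.where(freqs < cutoff_hz, spec_orig, spec_gen)
    return np.fft.irfft(combined, n=n)
```

A brick-wall split can ring on transients; a smoother crossover filter is a common alternative.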

v0.2
- added a chunking feature to process input of any length
- added stereo handling: stereo input channels are split and processed independently (dual mono), then recombined into stereo audio.
- added an overlap feature to smooth the transitions between chunks (don't use high values: AudioSR is not 100% phase accurate, so this creates weird phase cancellations across the overlapping regions)
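
Chunking with overlap can be sketched as splitting the signal into overlapping windows and crossfading them back together. The example below is an assumption-laden illustration: the function names are hypothetical, a linear crossfade is used, and the chunks are stitched at the same sample rate they were cut at (the real pipeline stitches upsampled output). `overlap_len` must be smaller than `chunk_len`.

```python
import numpy as np


def split_chunks(audio, chunk_len, overlap_len):
    """Return (start, chunk) pairs; consecutive chunks share overlap_len samples."""
    hop = chunk_len - overlap_len
    chunks = []
    start = 0
    while True:
        chunks.append((start, audio[start:start + chunk_len]))
        if start + chunk_len >= len(audio):
            break
        start += hop
    return chunks


def stitch_chunks(chunks, total_len, overlap_len):
    """Overlap-add the chunks with a linear crossfade in the shared regions."""
    out = np.zeros(total_len)
    fade_in = np.linspace(0.0, 1.0, overlap_len, endpoint=False)
    fade_out = 1.0 - fade_in
    for i, (start, chunk) in enumerate(chunks):
        chunk = np.array(chunk, dtype=float)  # work on a copy
        if i > 0:
            chunk[:overlap_len] *= fade_in
        if i < len(chunks) - 1:
            chunk[len(chunk) - overlap_len:] *= fade_out
        out[start:start + len(chunk)] += chunk
    return out
```

With unmodified chunks the crossfade weights sum to one, so the input is reconstructed. With generated chunks that differ in phase, the overlapping regions can partially cancel, which is why small overlap values are advised above.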
---
Adaptation & tweaks by [jarredou](https://github.com/jarredou/)

Original work [AudioSR: Versatile Audio Super-resolution at Scale](https://github.com/haoheliu/versatile_audio_super_resolution) by Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley



In [1]:
#@markdown # Installation
from google.colab import drive
drive.mount('/content/drive')

!git clone https://github.com/haoheliu/versatile_audio_super_resolution.git
%cd versatile_audio_super_resolution
!pip install cog huggingface_hub unidecode phonemizer einops torchlibrosa transformers ftfy timm librosa pyloudnorm
!pip install huggingface_hub transformers==4.30.2 gradio soundfile progressbar librosa audiosr unidecode
#!pip install -r requirements.txt

!wget https://raw.githubusercontent.com/jarredou/AudioSR-Colab-Fork/main/inference.py
#from IPython.display import clear_output
#clear_output(wait=False)

Mounted at /content/drive
Cloning into 'versatile_audio_super_resolution'...
remote: Enumerating objects: 425, done.
remote: Counting objects: 100% (193/193), done.
remote: Compressing objects: 100% (80/80), done.
remote: Total 425 (delta 143), reused 113 (delta 113), pack-reused 232 (from 1)
Receiving objects: 100% (425/425), 20.77 MiB | 7.51 MiB/s, done.
Resolving deltas: 100% (201/201), done.
/content/versatile_audio_super_resolution
Collecting cog
  Downloading cog-0.13.7-py3-none-any.whl.metadata (39 kB)
Collecting unidecode
  Downloading Unidecode-1.3.8-py3-none-any.whl.metadata (13 kB)
Collecting phonemizer
  Downloading phonemizer-3.3.0-py3-none-any.whl.metadata (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.2/48.2 kB 4.8 MB/s eta 0:00:00
Collecting torchlibrosa
  Downloading torchlibrosa-0.1.0-py3-none-any.whl.metadata (3.5 kB)
Collecting ftfy
  Downloading ftfy-6.3.1-py3-none-any.whl.metadata (7.3 kB)
Collecting py

--2025-02-14 21:26:47--  https://raw.githubusercontent.com/jarredou/AudioSR-Colab-Fork/main/inference.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10078 (9.8K) [text/plain]
Saving to: ‘inference.py.1’


2025-02-14 21:26:47 (62.7 MB/s) - ‘inference.py.1’ saved [10078/10078]



### **IMPORTANT NOTE**

#### If the inference cell crashes, restart the runtime (do not disconnect, just restart it); otherwise it will cause memory errors!

*If you are doing multiple runs, also restart the runtime every 4 or 5 files to free up memory.*

In [1]:
%cd /content/versatile_audio_super_resolution
import gc
import os
import random
import numpy as np
from scipy.signal.windows import hann
import soundfile as sf
import torch
from cog import BasePredictor, Input, Path
import tempfile
import librosa
from audiosr import build_model, super_resolution
from scipy import signal
import warnings
warnings.filterwarnings("ignore")

os.environ["TOKENIZERS_PARALLELISM"] = "true"
torch.set_float32_matmul_precision("high")


#@markdown # Inference
input_file_path = '/content/drive/MyDrive/input' #@param {type:"string"}
output_folder = '/content/drive/MyDrive/output_folder' #@param {type:"string"}
#@markdown ---
ddim_steps = 20 #@param {type:"slider", min:20, max:200, step:10}
overlap = 0.04 #@param {type:"slider", min:0, max:0.96, step:0.04}
guidance_scale = 1.5 #@param {type:"slider", min:1, max:15, step:0.5}
seed = 0 # @param {type:"integer"}
chunk_size = 10.24 # @param [5.12, 10.24, 20.48] {type:"raw"}
multiband_ensemble = True # @param {type:"boolean"}
input_cutoff = "14000" #@param [20000, 19000, 18000, 17000, 16000, 14000, 13000, 12000, 11000, 10000, 9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000]
input_cutoff = int(input_cutoff)



!python inference.py --input "{input_file_path}" \
                     --output "{output_folder}" \
                     --ddim_steps {ddim_steps} \
                     --guidance_scale {guidance_scale} \
                     --seed {seed} \
                     --chunk_size {chunk_size} \
                     --overlap {overlap} \
                     --multiband_ensemble {multiband_ensemble} \
                     --input_cutoff {input_cutoff}

/content/versatile_audio_super_resolution


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]



Loading Model...
Loading AudioSR: basic
Loading model on cuda:0
pytorch_model.bin: 100% 6.18G/6.18G [02:29<00:00, 41.4MB/s]
DiffusionWrapper has 258.20 M params.
^C
