## Silence Removal (Energy-Based)

**Detect Non-Silent Intervals**:
   - Use `librosa.effects.split()`, which finds intervals where the audio energy is above a threshold.
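   - The `top_db` parameter sets the threshold in decibels below the reference level (by default, the loudest frame). A smaller `top_db` treats more of the signal as silence; a larger one keeps quieter passages.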

**Concatenate Non-Silent Parts**:
   - Slice and combine all non-silent parts to reconstruct the speech without silence.

- The threshold is expressed in decibels, a logarithmic scale that tracks perceived loudness more closely than raw amplitude does.
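
The two steps above can be sketched in plain NumPy to show what an energy threshold does. This is a simplified illustration, not librosa's implementation: the helper `split_by_energy`, its unpadded framing, and the synthetic signal are all assumptions made for the example.

```python
import numpy as np

def split_by_energy(y, top_db=20.0, frame_length=2048, hop_length=512):
    """Hypothetical helper: return (start, end) sample intervals whose
    frame RMS lies within `top_db` decibels of the loudest frame."""
    # Frame-wise RMS energy (no padding, to keep the sketch short)
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # Decibels relative to the loudest frame (floor avoids log of zero)
    floor = 1e-10
    db = 20.0 * np.log10(np.maximum(rms, floor) / max(rms.max(), floor))
    non_silent = db > -top_db
    # Find runs of consecutive non-silent frames
    padded = np.concatenate([[False], non_silent, [False]])
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[0::2], edges[1::2]
    # Convert frame indices to sample indices
    return [(int(s) * hop_length, min(int(e) * hop_length, len(y)))
            for s, e in zip(starts, ends)]

# Synthetic signal (assumed for illustration): silence, a 440 Hz tone, silence
demo_sr = 16000
t = np.arange(8000) / demo_sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([np.zeros(8000), tone, np.zeros(8000)])

# Step 1: detect non-silent intervals
intervals = split_by_energy(signal, top_db=20)

# Step 2: slice and concatenate the non-silent parts
trimmed = np.concatenate([signal[start:end] for start, end in intervals])
print(intervals, len(signal), len(trimmed))
```

Because the decision is made per frame, the detected interval starts slightly before the tone and ends slightly after it, so a little of the surrounding silence is retained.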

In [7]:
import librosa
import numpy as np
from IPython.display import Audio, display

In [8]:
input_audio = 'common_voice_en_18499990.mp3'

# Load audio
y, sr = librosa.load(input_audio, sr=None)

# Play Original Audio
print("Original Audio:")
display(Audio(y, rate=sr))

Original Audio:


In [9]:
def remove_silence(input_file, top_db=20):
    """
    Remove silent periods from an audio file and play the result.

    Args:
    - input_file (str): Path to the input audio file.
    - top_db (float): Threshold (in decibels) below reference to consider as silence.
    """
    try:
        # Load audio
        y, sr = librosa.load(input_file, sr=None)

        # Detect non-silent intervals
        non_silent_intervals = librosa.effects.split(y, top_db=top_db)

        # Concatenate all non-silent intervals
        non_silent_audio = np.concatenate([y[start:end] for start, end in non_silent_intervals])

        # Play the cleaned audio
        print("clean Audio:")
        display(Audio(non_silent_audio, rate=sr))

    except Exception as e:
        print(f"Error processing {input_file}: {e}")


In [10]:
remove_silence(input_audio)

clean Audio:
