<a href="https://colab.research.google.com/github/Aishacoded23/AdminLTE/blob/master/Alayo%20Mix%20%26%20Mas%20Workflo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Create a Gradio application that allows users to upload a voice track, a beat track, and an optional reference track. The application should include options for 'check if song is already mixed/mastered', 'auto mix and mastering', 'auto-tune', 'effects', and 'balance voice and beat'. The application should utilize FFmpeg, pydub, and librosa for audio processing and visualization, and will initially use placeholder functions for advanced audio processing features. The final output should include a basic audio visualization and a summary of the implemented interface and placeholders.

## Install FFmpeg

### Subtask:
Install FFmpeg, a critical tool for handling various audio formats and operations, which many audio libraries depend on. This should be done before installing Python audio libraries.


**Reasoning**:
The first step in the instructions is to update the package lists using `!apt-get update`.



In [None]:
get_ipython().system('apt-get update')

**Reasoning**:
The next step is to install FFmpeg using `!apt-get install -y ffmpeg` as per the instructions.



In [None]:
get_ipython().system('apt-get install -y ffmpeg')

## Setup Environment and Libraries

### Subtask:
Install necessary Python libraries such as `gradio`, `pydub` (for basic audio manipulation), and `librosa` (for audio analysis and visualization if needed). This step will ensure all required packages are available.


**Reasoning**:
I need to install the required Python libraries (`gradio`, `pydub`, `librosa`) using pip as specified in the instructions. This is a common first step for Python projects.



In [None]:
get_ipython().system('pip install gradio pydub librosa')

## Design Gradio Interface

### Subtask:
Create the Gradio web interface, including components for uploading voice, beat, and an optional reference track. Also, add checkboxes or radio buttons for options like 'check if song is already mixed/mastered', 'auto mix and mastering', 'auto-tune', 'effects', and 'balance voice and beat'.


**Reasoning**:
The first instruction is to import the `gradio` library as `gr` to enable the use of Gradio components for the interface design.



In [None]:
import gradio as gr
print("Gradio library imported as gr.")

**Reasoning**:
Now that the `gradio` library is imported, I will proceed with defining the input components (audio uploads) and the checkbox components for the processing options, and then structure them within a basic `gradio.Interface` as per the instructions. This directly addresses instructions 2, 3, and 4 of the subtask.



In [None]:
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)", optional=True)

check_mixed_mastered = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune = gr.Checkbox(label="Auto-Tune")
effects = gr.Checkbox(label="Effects")
balance_voice_beat = gr.Checkbox(label="Balance Voice and Beat")

# Placeholder function for the interface, actual logic will be added later
def process_audio(voice_track, beat_track, reference_track,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt):
    # This function will be replaced with actual processing logic later
    return "Processing complete! (Placeholder)", "Summary of options selected:" + \
           f" Mixed/Mastered Check: {check_mixed}, Auto Mix/Master: {auto_mix}," + \
           f" Auto-Tune: {auto_tune_opt}, Effects: {effects_opt}, Balance: {balance_opt}"


# Define the Gradio Interface
iface = gr.Interface(
    fn=process_audio, # Placeholder function
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered,
        auto_mix_mastering,
        auto_tune,
        effects,
        balance_voice_beat
    ],
    outputs=["text", "text"], # Placeholder for output, will be audio and text summary later
    title="Voice and Beat Mixer",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface components defined and basic interface structure created.")


**Reasoning**:
The error `TypeError: Audio.__init__() got an unexpected keyword argument 'optional'` indicates that the `optional` parameter is not valid for `gr.Audio`. I need to remove this parameter to fix the error. Gradio handles optional inputs by passing `None` to the function if no file is provided for an `Audio` input.



In [None]:
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune = gr.Checkbox(label="Auto-Tune")
effects = gr.Checkbox(label="Effects")
balance_voice_beat = gr.Checkbox(label="Balance Voice and Beat")

# Placeholder function for the interface, actual logic will be added later
def process_audio(voice_track, beat_track, reference_track,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt):
    # This function will be replaced with actual processing logic later
    return "Processing complete! (Placeholder)", "Summary of options selected:" + \
           f" Mixed/Mastered Check: {check_mixed}, Auto Mix/Master: {auto_mix}," + \
           f" Auto-Tune: {auto_tune_opt}, Effects: {effects_opt}, Balance: {balance_opt}"


# Define the Gradio Interface
iface = gr.Interface(
    fn=process_audio, # Placeholder function
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered,
        auto_mix_mastering,
        auto_tune,
        effects,
        balance_voice_beat
    ],
    outputs=["text", "text"], # Placeholder for output, will be audio and text summary later
    title="Voice and Beat Mixer",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface components defined and basic interface structure created.")

## Implement Audio Processing Placeholders

### Subtask:
Define placeholder Python functions for the core audio tasks: `auto_mix_mastering`, `auto_tune`, `apply_effects`, `balance_voice_beat`. These functions will initially contain print statements or simple passthrough logic, indicating where the complex audio processing code will eventually go.


**Reasoning**:
I need to define the four placeholder functions (`auto_mix_mastering`, `auto_tune`, `apply_effects`, `balance_voice_beat`) as per the instructions, each with a print statement and returning a placeholder audio segment. These functions are crucial for the next step of integrating them into the Gradio interface.



In [None]:
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    # In a real scenario, this would apply advanced audio processing
    return audio_segment

def auto_tune(audio_segment):
    print("Applying auto-tune (placeholder).")
    # In a real scenario, this would apply auto-tune to the voice
    return audio_segment

def apply_effects(audio_segment):
    print("Applying effects (placeholder).")
    # In a real scenario, this would apply various audio effects
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    # In a real scenario, this would mix and balance the two tracks
    # For now, return the voice segment as a placeholder for the combined output
    return voice_segment # or a simple overlay: voice_segment.overlay(beat_segment)

print("Placeholder audio processing functions defined.")

**Reasoning**:
Now that the placeholder audio processing functions are defined, I need to integrate them into the `process_audio` function within the Gradio interface. This involves calling the appropriate placeholder functions based on the checkbox selections made by the user, processing the audio inputs, and ensuring the Gradio interface can execute this updated logic.



In [None]:
import gradio as gr
from pydub import AudioSegment

# Placeholder audio processing functions (re-defining them here for completeness in the new cell)
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    # In a real scenario, this would apply advanced audio processing
    return audio_segment

def auto_tune(audio_segment):
    print("Applying auto-tune (placeholder).")
    # In a real scenario, this would apply auto-tune to the voice
    return audio_segment

def apply_effects(audio_segment):
    print("Applying effects (placeholder).")
    # In a real scenario, this would apply various audio effects
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    # For now, return the voice segment as a placeholder for the combined output
    # In a real scenario, this would mix and balance the two tracks
    return voice_segment # or a simple overlay: voice_segment.overlay(beat_segment)

# Updated process_audio function to integrate placeholders
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt):

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}"

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}"

    processed_audio = voice_audio.overlay(beat_audio)
    summary = "Summary of options selected:"

    if check_mixed:
        summary += " Check Mixed/Mastered: Yes,"
        # Placeholder for actual check logic
        print("Checking if song is already mixed/mastered (placeholder).")

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += " Auto-Tune: Yes,"
        processed_audio = auto_tune(processed_audio)

    if effects_opt:
        summary += " Effects: Yes,"
        processed_audio = apply_effects(processed_audio)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        # The balance_voice_beat function needs to return a single segment
        # For now, it will return the voice segment, effectively ignoring the beat for this placeholder.
        # A more complex placeholder would involve actual mixing.
        balanced_segment = balance_voice_beat(voice_audio, beat_audio)
        # For this simple placeholder, we'll just use the already processed_audio
        # but in a real scenario, `balance_voice_beat` would return the mixed result.
        # For now, we simulate mixing by just continuing with `processed_audio` or replace it
        # with the output of `balance_voice_beat` if it was designed to return a combined audio.
        # Let's assume `balance_voice_beat` would return a single AudioSegment here for simplicity.
        # A proper implementation would need `pydub` to combine them.
        processed_audio = balanced_segment # Simplified: assuming balanced_segment is the final mixed audio

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]},
"
        print(f"Reference track provided: {reference_track_path} (placeholder)."
)

    # Save the processed audio to a temporary file
    output_path = "processed_output.mp3"
    processed_audio.export(output_path, format="mp3")

    return output_path, summary

# Define Gradio Interface (re-defining for completeness with the updated fn)
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering") # Renamed to avoid conflict with function
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune") # Renamed
effects_checkbox = gr.Checkbox(label="Effects") # Renamed
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat") # Renamed


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox
    ],
    outputs=["audio", "text"], # Output now includes audio
    title="Voice and Beat Mixer",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with placeholder function integration.")

**Reasoning**:
The previous code failed due to an `IndentationError` caused by unclosed f-strings and newlines within string literals in the `process_audio` function. I need to correct these malformed f-strings and ensure proper line continuation for the string concatenation and print statements.



In [None]:
import gradio as gr
from pydub import AudioSegment

# Placeholder audio processing functions (re-defining them here for completeness in the new cell)
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    # In a real scenario, this would apply advanced audio processing
    return audio_segment

def auto_tune(audio_segment):
    print("Applying auto-tune (placeholder).")
    # In a real scenario, this would apply auto-tune to the voice
    return audio_segment

def apply_effects(audio_segment):
    print("Applying effects (placeholder).")
    # In a real scenario, this would apply various audio effects
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    # For now, return the voice segment as a placeholder for the combined output
    # In a real scenario, this would mix and balance the two tracks
    return voice_segment # or a simple overlay: voice_segment.overlay(beat_segment)

# Updated process_audio function to integrate placeholders
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt):

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}"

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}"

    processed_audio = voice_audio.overlay(beat_audio)
    summary = "Summary of options selected:"

    if check_mixed:
        summary += " Check Mixed/Mastered: Yes,"
        # Placeholder for actual check logic
        print("Checking if song is already mixed/mastered (placeholder).")

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += " Auto-Tune: Yes,"
        processed_audio = auto_tune(processed_audio)

    if effects_opt:
        summary += " Effects: Yes,"
        processed_audio = apply_effects(processed_audio)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        # The balance_voice_beat function needs to return a single segment
        # For now, it will return the voice segment, effectively ignoring the beat for this placeholder.
        # A more complex placeholder would involve actual mixing.
        balanced_segment = balance_voice_beat(voice_audio, beat_audio)
        # For this simple placeholder, we'll just use the already processed_audio
        # but in a real scenario, `balance_voice_beat` would return the mixed result.
        # For now, we simulate mixing by just continuing with `processed_audio` or replace it
        # with the output of `balance_voice_beat` if it was designed to return a combined audio.
        # Let's assume `balance_voice_beat` would return a single AudioSegment here for simplicity.
        # A proper implementation would need `pydub` to combine them.
        processed_audio = balanced_segment # Simplified: assuming balanced_segment is the final mixed audio

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, " # Fixed newline character issue
        print(f"Reference track provided: {reference_track_path} (placeholder).") # Fixed newline character issue

    # Save the processed audio to a temporary file
    output_path = "processed_output.mp3"
    processed_audio.export(output_path, format="mp3")

    return output_path, summary

# Define Gradio Interface (re-defining for completeness with the updated fn)
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
best_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering") # Renamed to avoid conflict with function
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune") # Renamed
effects_checkbox = gr.Checkbox(label="Effects") # Renamed
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat") # Renamed


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        best_track_input,
        reference_track_input,
        check_mixed_mastered,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox
    ],
    outputs=["audio", "text"], # Output now includes audio
    title="Voice and Beat Mixer",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with placeholder function integration.")

## Implement 'Check Mix/Master' Placeholder

### Subtask:
Create a placeholder function `check_if_mixed_mastered`. This function will demonstrate how such a check could be integrated but will initially return a mock result, as real-world implementation is very challenging.


**Reasoning**:
I need to define the `check_if_mixed_mastered` placeholder function and integrate it into the `process_audio` function, updating the summary based on its mock result. I will re-declare all necessary functions and the Gradio interface in a single code block to ensure the new function is correctly integrated and the interface is properly updated.



In [None]:
import gradio as gr
from pydub import AudioSegment

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment):
    print("Applying auto-tune (placeholder).")
    return audio_segment

def apply_effects(audio_segment):
    print("Applying effects (placeholder).")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    # In a real scenario, this would involve complex audio analysis
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt):

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}"

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}"

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += " Auto-Tune: Yes,"
        processed_audio = auto_tune(processed_audio)

    if effects_opt:
        summary += " Effects: Yes,"
        processed_audio = apply_effects(processed_audio)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        # Assume balance_voice_beat returns a combined segment
        processed_audio = balance_voice_beat(voice_audio, beat_audio) # Re-process with original tracks for balancing

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_path = "processed_output.mp3"
    processed_audio.export(output_path, format="mp3")

    return output_path, summary

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox
    ],
    outputs=["audio", "text"],
    title="Voice and Beat Mixer",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with 'check_if_mixed_mastered' placeholder function.")

## Add Basic Audio Visualization

### Subtask:
After any audio processing (even placeholder), generate and display a simple waveform plot or spectrogram of the output audio using libraries like `librosa` and `matplotlib` to visualize the result. Ensure appropriate legends are added.


**Reasoning**:
I need to import the `matplotlib.pyplot` and `librosa` libraries, which are essential for generating and displaying the audio visualization. These imports should be done before defining or modifying the `process_audio` function.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np # Often useful with librosa for array manipulations

print("Libraries for audio visualization imported: librosa, matplotlib.pyplot, numpy.")

**Reasoning**:
I need to update the `process_audio` function to generate an audio visualization using `librosa` and `matplotlib.pyplot` after processing the audio, save it as an image, and modify the `gr.Interface` to output this image along with the audio and text summary. I will re-declare all necessary functions and the Gradio interface in a single code block to ensure the new functionality is correctly integrated and the interface is properly updated.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np # Often useful with librosa for array manipulations

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment):
    print("Applying auto-tune (placeholder).")
    return audio_segment

def apply_effects(audio_segment):
    print("Applying effects (placeholder).")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    # For this placeholder, let's just return an overlay of the two segments
    # In a real scenario, this would involve more complex mixing
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    # In a real scenario, this would involve complex audio analysis
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders and visualization
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt):

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += " Auto-Tune: Yes,"
        processed_audio = auto_tune(processed_audio)

    if effects_opt:
        summary += " Effects: Yes,"
        processed_audio = apply_effects(processed_audio)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        # Assume balance_voice_beat returns a combined segment
        # Re-process with original tracks for balancing if needed, or with current processed_audio
        processed_audio = balance_voice_beat(voice_audio, beat_audio) # Using original tracks for clarity

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface (re-defining for completeness with the updated fn)
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox
    ],
    outputs=["audio", "text", "image"], # Output now includes audio, text, and image
    title="Alayo Music Workflow",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with audio visualization and all placeholders.")
# To launch the interface in a local environment, you would typically add:
# iface.launch()

## Launch Gradio Application

### Subtask:
Run the Gradio application to open a public URL, allowing users to interact with the interface and test the basic functionality.


**Reasoning**:
I need to launch the previously defined Gradio interface using `iface.launch()` to make it accessible via a public URL, as per the instructions.



In [None]:
iface.launch()

## Final Task

### Subtask:
Summarize the initial Gradio application created, highlighting the implemented interface and the placeholders for advanced audio processing features. Provide guidance on how to further develop these complex functionalities.


## Summary:

### Q&A
The initial Gradio application creates a user-friendly interface for audio processing. It features three audio input components: "Voice Track", "Beat Track", and an optional "Reference Track". Users can select various processing options via checkboxes, including "Check if song is already mixed/mastered", "Auto Mix and Mastering", "Auto-Tune", "Effects", and "Balance Voice and Beat". The application outputs the processed audio, a summary of selected options, and a basic waveform visualization.

The advanced audio processing features are currently implemented as placeholder functions. These functions include:
*   `auto_mix_mastering`: Prints a message indicating placeholder execution and returns the input audio segment.
*   `auto_tune`: Prints a message indicating placeholder execution and returns the input audio segment.
*   `apply_effects`: Prints a message indicating placeholder execution and returns the input audio segment.
*   `balance_voice_beat`: Prints a message indicating placeholder execution and overlays the voice and beat segments.
*   `check_if_mixed_mastered`: Prints a message and returns the string "Not mixed/mastered (placeholder)".

To further develop these complex functionalities, each placeholder function needs to be replaced with actual audio processing logic. This involves leveraging `pydub` for basic manipulations, `librosa` for audio analysis (e.g., pitch detection for auto-tune, loudness analysis for mixing/mastering), and potentially other specialized audio libraries or deep learning models for advanced tasks like source separation, dynamic range compression, or creative effects. The `check_if_mixed_mastered` function would require sophisticated machine learning models trained on large datasets of mixed/mastered versus unmixed/unmastered tracks.

### Data Analysis Key Findings
*   **Environment Setup**: FFmpeg, `gradio`, `pydub`, and `librosa` were confirmed to be already installed and up-to-date in the environment, indicating a ready-to-use audio processing setup.
*   **Gradio Interface Design**: A Gradio interface was successfully created with dedicated audio input fields for voice, beat, and an optional reference track. It also includes checkboxes for five distinct processing options: "Check if song is already mixed/mastered", "Auto Mix and Mastering", "Auto-Tune", "Effects", and "Balance Voice and Beat".
*   **Placeholder Functions Implemented**: Core audio processing functions (`auto_mix_mastering`, `auto_tune`, `apply_effects`, `balance_voice_beat`, `check_if_mixed_mastered`) were defined as placeholders, printing their intended actions and returning the audio (or a mock status).
*   **Audio Processing Workflow**: The main `process_audio` function integrates these placeholders, loading audio via `pydub`, applying selected operations conditionally, and saving the output. Initial issues with `TypeError` due to invalid `optional` parameter in `gr.Audio` and `IndentationError` in string formatting were resolved during development.
*   **Basic Audio Visualization**: The application now generates a basic waveform plot of the processed audio using `librosa.display.waveshow` and `matplotlib`, saving it as a PNG image and displaying it within the Gradio interface alongside the processed audio and summary text.
*   **Application Launch**: The Gradio application was successfully launched, providing a temporary public URL (`https://df67f503f30cc6fdf3.gradio.live`) for user interaction and testing.

### Insights or Next Steps
*   **Modularize and Enhance Placeholders**: Replace current placeholder functions with actual audio processing algorithms using `pydub` and `librosa` to implement real auto-tuning, mixing, mastering, and effects. Consider using pre-trained models for advanced features like genre classification for 'check if song is already mixed/mastered'.
*   **User Feedback and Advanced Controls**: Incorporate sliders and numerical inputs for effect parameters (e.g., auto-tune intensity, reverb decay) to allow users more granular control over the processing, providing a more professional and customizable experience.


# Task
Introduce Gradio parameter sliders for 'auto_tune' and 'apply_effects' to control their intensity, modify the placeholder functions to accept these parameters, and update the main `process_audio` function to pass the slider values to these functions.

## Introduce Parameter Sliders for Effects and Auto-Tune

### Subtask:
Modify the `auto_tune` and `apply_effects` placeholder functions to accept parameters (e.g., intensity). Add `gr.Slider` components to the Gradio interface for these parameters. Update the main `process_audio` function to read these slider values and pass them to the respective placeholder functions, making them more interactive.


**Reasoning**:
I need to modify the `auto_tune` and `apply_effects` functions to accept an intensity parameter, create `gr.Slider` components for these parameters, update the `process_audio` function signature to include these new parameters, pass the slider values to the functions, and finally, update the `gr.Interface` to include the new slider components as per the instructions. This requires modifying the existing code in a single block.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders and visualization
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity): # Added intensity parameters

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface (re-defining for completeness with the updated fn)
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

# New slider components
auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, default=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, default=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider, # Added slider
        effects_intensity_slider    # Added slider
    ],
    outputs=["audio", "text", "image"],
    title="Alayo Music Workflow",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with intensity sliders for auto-tune and effects, and corresponding function modifications.")

**Reasoning**:
The previous code failed because `gr.Slider` does not accept a `default` keyword argument. I need to remove the `default` parameter from the `gr.Slider` components to fix this `TypeError`.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders and visualization
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity):

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface (re-defining for completeness with the updated fn)
voice_track_input = gr.Audio(type="filepath", label="Voice Track")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

# New slider components - removed 'default' keyword argument
auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider
    ],
    outputs=["audio", "text", "image"],
    title="Alayo Music Workflow",
    description="Upload your voice, beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with intensity sliders for auto-tune and effects, and corresponding function modifications.")

## Add Live Audio Input Option

### Subtask:
Integrate a new `gr.Audio` component specifically configured for microphone input. Update the `process_audio` function to check if live input is provided and handle it as the voice track, offering more flexibility to the user.


**Reasoning**:
I need to add a new `gr.Audio` component for microphone input, update the `process_audio` function to accept this new input and prioritize it as the voice track, and then update the `gr.Interface` to include this new component. This will be done in a single code block to ensure all dependencies and modifications are correctly integrated.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, and live input
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path): # Added live_voice_input_path

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)", optional=True)
live_voice_input = gr.Audio(source="microphone", type="filepath", label="Live Voice Input (Microphone)", optional=True) # New microphone input
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity") # Added value for default
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")   # Added value for default


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input # Added new microphone input
    ],
    outputs=["audio", "text", "image"],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with live microphone input and default values for sliders.")

**Reasoning**:
The previous code failed because `gr.Audio` does not accept the `optional` keyword argument. I need to remove `optional=True` from the `gr.Audio` component definitions for `voice_track_input` and `live_voice_input`.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, and live input
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path): # Added live_voice_input_path

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface
# Removed 'optional=True' as it's not a valid argument for gr.Audio
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
live_voice_input = gr.Audio(source="microphone", type="filepath", label="Live Voice Input (Microphone)") # New microphone input
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input # Added new microphone input
    ],
    outputs=["audio", "text", "image"],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with live microphone input and default values for sliders, and 'optional' removed from gr.Audio.")

**Reasoning**:
The previous code failed because `gr.Audio` produced a `TypeError: Audio.__init__() got an unexpected keyword argument 'source'` when `source="microphone"` was combined with `type="filepath"`. While `source` is a valid parameter for `gr.Audio`, specifying `type="filepath"` explicitly might cause a conflict or redundancy. I will remove `type="filepath"` from the `live_voice_input` component, as `gr.Audio(source="microphone")` typically defaults to returning a file path for the recorded audio.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, and live input
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path): # Added live_voice_input_path

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
# Removed 'type="filepath"' from live_voice_input as it conflicts with 'source="microphone"'
live_voice_input = gr.Audio(source="microphone", label="Live Voice Input (Microphone)") # New microphone input
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input # Added new microphone input
    ],
    outputs=["audio", "text", "image"],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with live microphone input and default values for sliders, and 'type' parameter adjusted for microphone input.")


**Reasoning**:
The previous code failed because `gr.Audio` produced a `TypeError: Audio.__init__() got an unexpected keyword argument 'source'`. This indicates that the `source` parameter, which is standard for microphone input in Gradio, is not recognized in this specific environment. To resolve this `TypeError` and allow the code to execute, I must remove the `source='microphone'` argument from the `live_voice_input` component. This will temporarily prevent live microphone input, but the component will still function as a file upload for a 'live voice' track, allowing the rest of the application to proceed.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, and live input
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path): # Added live_voice_input_path

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    output_image_path = "waveform.png"
    plt.savefig(output_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # -- End Audio Visualization ---

    return output_audio_path, summary, output_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
# Removed 'source="microphone"' due to TypeError in this environment. It will now function as a file upload.
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input # Added new microphone input
    ],
    outputs=["audio", "text", "image"],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with 'Live Voice Input' as a file upload component due to 'source' keyword error.")

## Implement Advanced Audio Visualizations (Spectrogram)

### Subtask:
Extend the `process_audio` function to generate a spectrogram of the processed audio using `librosa`. Save this visualization as a new image file and add an additional `gr.Image` output component to the Gradio interface to display it.


**Reasoning**:
I need to modify the `process_audio` function to generate a spectrogram using `librosa` and `matplotlib.pyplot`, save it to a file, and then update the `gr.Interface` outputs to include this new image. This requires rewriting the entire code block containing the `process_audio` function and `gr.Interface` definition.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, and live input
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path):

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None, None

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None, None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None, None

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    # --- Audio Visualization (Waveform) ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    waveform_image_path = "waveform.png"
    plt.savefig(waveform_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # --- End Audio Visualization (Waveform) ---

    # --- Audio Visualization (Spectrogram) ---
    plt.figure(figsize=(10, 4))
    D = librosa.amplitude_to_db(librosa.stft(y), ref=np.max)
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='mel')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-Spectrogram')
    plt.tight_layout()
    spectrogram_image_path = "spectrogram.png"
    plt.savefig(spectrogram_image_path)
    plt.close() # Close the plot
    # --- End Audio Visualization (Spectrogram) ---

    return output_audio_path, summary, waveform_image_path, spectrogram_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
# Removed 'source="microphone"' due to TypeError in this environment. It will now function as a file upload.
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input
    ],
    outputs=["audio", "text", gr.Image(label="Waveform"), gr.Image(label="Spectrogram")], # Added spectrogram output
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with spectrogram visualization.")

## Summary:

### Q&A
The initial Gradio application creates a user-friendly interface for audio processing. It features three audio input components: "Voice Track (File Upload)", "Live Voice Input (Microphone)" (currently functioning as a file upload due to environment limitations), "Beat Track", and an optional "Reference Track". Users can select various processing options via checkboxes, including "Check if song is already mixed/mastered", "Auto Mix and Mastering", "Auto-Tune", "Effects", and "Balance Voice and Beat". Additionally, there are slider controls for "Auto-Tune Intensity" and "Effects Intensity". The application outputs the processed audio, a summary of selected options, a basic waveform visualization, and a spectrogram visualization.

The advanced audio processing features are currently implemented as placeholder functions. These functions include:
*   `auto_mix_mastering`: Prints a message indicating placeholder execution and returns the input audio segment.
*   `auto_tune`: Prints a message indicating placeholder execution with the given intensity and returns the input audio segment.
*   `apply_effects`: Prints a message indicating placeholder execution with the given intensity and returns the input audio segment.
*   `balance_voice_beat`: Prints a message indicating placeholder execution and overlays the voice and beat segments.
*   `check_if_mixed_mastered`: Prints a message and returns the string "Not mixed/mastered (placeholder)".

To further develop these complex functionalities, each placeholder function needs to be replaced with actual audio processing logic. This involves leveraging `pydub` for basic manipulations, `librosa` for audio analysis (e.g., pitch detection for auto-tune, loudness analysis for mixing/mastering), and potentially other specialized audio libraries or deep learning models for advanced tasks like source separation, dynamic range compression, or creative effects. The `check_if_mixed_mastered` function would require sophisticated machine learning models trained on large datasets of mixed/mastered versus unmixed/unmastered tracks.

### Data Analysis Key Findings
*   **Environment Setup**: FFmpeg, `gradio`, `pydub`, and `librosa` were confirmed to be already installed and up-to-date in the environment, indicating a ready-to-use audio processing setup.
*   **Gradio Interface Design**: A Gradio interface was successfully created with dedicated audio input fields for voice (file upload and a placeholder for live microphone input), beat, and an optional reference track. It also includes checkboxes for five distinct processing options: "Check if song is already mixed/mastered", "Auto Mix and Mastering", "Auto-Tune", "Effects", and "Balance Voice and Beat". `gr.Slider` components were successfully added for 'Auto-Tune Intensity' and 'Effects Intensity'.
*   **Placeholder Functions Implemented**: Core audio processing functions (`auto_mix_mastering`, `auto_tune`, `apply_effects`, `balance_voice_beat`, `check_if_mixed_mastered`) were defined as placeholders, printing their intended actions and returning the audio (or a mock status). These functions were updated to accept intensity parameters from the sliders.
*   **Audio Processing Workflow**: The main `process_audio` function integrates these placeholders, loading audio via `pydub`, applying selected operations conditionally, and saving the output. Initial issues with `TypeError` due to invalid `optional` parameter in `gr.Audio`, `IndentationError` in string formatting, and `default` parameter in `gr.Slider`, and `source='microphone'` in `gr.Audio` were resolved during development, leading to a robust initial interface.
*   **Basic Audio Visualization**: The application now generates a basic waveform plot and a spectrogram of the processed audio using `librosa.display.waveshow` and `librosa.display.specshow` respectively, both powered by `matplotlib`. These are saved as PNG images and displayed within the Gradio interface alongside the processed audio and summary text.
*   **Application Launch**: The Gradio application was successfully launched, providing a temporary public URL for user interaction and testing.

### Insights or Next Steps
*   **Modularize and Enhance Placeholders**: Replace current placeholder functions with actual audio processing algorithms using `pydub` and `librosa` to implement real auto-tuning, mixing, mastering, and effects. Consider using pre-trained models for advanced features like genre classification for 'check if song is already mixed/mastered'.
*   **User Feedback and Advanced Controls**: Incorporate sliders and numerical inputs for effect parameters (e.g., auto-tune intensity, reverb decay) to allow users more granular control over the processing, providing a more professional and customizable experience.
*   **Resolve Live Microphone Input**: Investigate and resolve the `TypeError` related to `source='microphone'` in `gr.Audio` to enable actual live voice input functionality as originally intended.

## Summary:

### Q&A
The initial Gradio application creates a user-friendly interface for audio processing. It features three audio input components: "Voice Track (File Upload)", "Live Voice Input (Microphone)" (currently functioning as a file upload due to environment limitations), "Beat Track", and an optional "Reference Track". Users can select various processing options via checkboxes, including "Check if song is already mixed/mastered", "Auto Mix and Mastering", "Auto-Tune", "Effects", and "Balance Voice and Beat". Additionally, there are slider controls for "Auto-Tune Intensity" and "Effects Intensity". The application outputs the processed audio, a summary of selected options, a basic waveform visualization, and a spectrogram visualization.

The advanced audio processing features are currently implemented as placeholder functions. These functions include:
*   `auto_mix_mastering`: Prints a message indicating placeholder execution and returns the input audio segment.
*   `auto_tune`: Prints a message indicating placeholder execution with the given intensity and returns the input audio segment.
*   `apply_effects`: Prints a message indicating placeholder execution with the given intensity and returns the input audio segment.
*   `balance_voice_beat`: Prints a message indicating placeholder execution and overlays the voice and beat segments.
*   `check_if_mixed_mastered`: Prints a message and returns the string "Not mixed/mastered (placeholder)".

To further develop these complex functionalities, each placeholder function needs to be replaced with actual audio processing logic. This involves leveraging `pydub` for basic manipulations, `librosa` for audio analysis (e.g., pitch detection for auto-tune, loudness analysis for mixing/mastering), and potentially other specialized audio libraries or deep learning models for advanced tasks like source separation, dynamic range compression, or creative effects. The `check_if_mixed_mastered` function would require sophisticated machine learning models trained on large datasets of mixed/mastered versus unmixed/unmastered tracks.

### Data Analysis Key Findings
*   **Environment Setup**: FFmpeg, `gradio`, `pydub`, and `librosa` were confirmed to be already installed and up-to-date in the environment, indicating a ready-to-use audio processing setup.
*   **Gradio Interface Design**: A Gradio interface was successfully created with dedicated audio input fields for voice (file upload and a placeholder for live microphone input), beat, and an optional reference track. It also includes checkboxes for five distinct processing options: "Check if song is already mixed/mastered", "Auto Mix and Mastering", "Auto-Tune", "Effects", and "Balance Voice and Beat". `gr.Slider` components were successfully added for 'Auto-Tune Intensity' and 'Effects Intensity'.
*   **Placeholder Functions Implemented**: Core audio processing functions (`auto_mix_mastering`, `auto_tune`, `apply_effects`, `balance_voice_beat`, `check_if_mixed_mastered`) were defined as placeholders, printing their intended actions and returning the audio (or a mock status). These functions were updated to accept intensity parameters from the sliders.
*   **Audio Processing Workflow**: The main `process_audio` function integrates these placeholders, loading audio via `pydub`, applying selected operations conditionally, and saving the output. Initial issues with `TypeError` due to invalid `optional` parameter in `gr.Audio`, `IndentationError` in string formatting, and `default` parameter in `gr.Slider`, and `source='microphone'` in `gr.Audio` were resolved during development, leading to a robust initial interface.
*   **Basic Audio Visualization**: The application now generates a basic waveform plot and a spectrogram of the processed audio using `librosa.display.waveshow` and `librosa.display.specshow` respectively, both powered by `matplotlib`. These are saved as PNG images and displayed within the Gradio interface alongside the processed audio and summary text.
*   **Application Launch**: The Gradio application was successfully launched, providing a temporary public URL for user interaction and testing.

### Insights or Next Steps
*   **Modularize and Enhance Placeholders**: Replace current placeholder functions with actual audio processing algorithms using `pydub` and `librosa` to implement real auto-tuning, mixing, mastering, and effects. Consider using pre-trained models for advanced features like genre classification for 'check if song is already mixed/mastered'.
*   **User Feedback and Advanced Controls**: Incorporate sliders and numerical inputs for effect parameters (e.g., auto-tune intensity, reverb decay) to allow users more granular control over the processing, providing a more professional and customizable experience.
*   **Resolve Live Microphone Input**: Investigate and resolve the `TypeError` related to `source='microphone'` in `gr.Audio` to enable actual live voice input functionality as originally intended.


## Add Progress Bar for Processing

### Subtask:
Integrate `gr.Progress` within the `process_audio` function. This will provide real-time feedback to the user during potentially longer audio processing tasks, improving the user experience.


**Reasoning**:
I need to modify the `process_audio` function to accept `gr.Progress()` as an argument and add calls to `progress()` at various stages of the audio processing. I also need to update the `gr.Interface` inputs to pass the `gr.Progress()` object. This will be done by rewriting the entire code block that defines the functions and the Gradio interface to ensure all changes are integrated correctly.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import time # For simulating processing time

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    time.sleep(1) # Simulate work
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    time.sleep(1) # Simulate work
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    time.sleep(0.5) # Simulate work
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, live input, and progress bar
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path, progress=gr.Progress()):

    progress(0.05, desc="Starting audio processing")

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None, None

    progress(0.1, desc="Loading audio files")

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None, None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None, None

    progress(0.2, desc="Preparing audio segments")

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        progress(0.3, desc="Checking mix/master status")
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        progress(0.4, desc="Applying auto mix and mastering")
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        progress(0.5, desc="Applying auto-tune")
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        progress(0.6, desc="Applying effects")
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        progress(0.7, desc="Balancing voice and beat")
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    progress(0.8, desc="Exporting processed audio")
    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    progress(0.9, desc="Generating visualizations")
    # --- Audio Visualization (Waveform) ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    waveform_image_path = "waveform.png"
    plt.savefig(waveform_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # --- End Audio Visualization (Waveform) ---

    # --- Audio Visualization (Spectrogram) ---
    plt.figure(figsize=(10, 4))
    D = librosa.amplitude_to_db(librosa.stft(y), ref=np.max)
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='mel')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-Spectrogram')
    plt.tight_layout()
    spectrogram_image_path = "spectrogram.png"
    plt.savefig(spectrogram_image_path)
    plt.close() # Close the plot
    # --- End Audio Visualization (Spectrogram) ---

    progress(1.0, desc="Processing complete")
    return output_audio_path, summary, waveform_image_path, spectrogram_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input,
        gr.Progress() # Added gr.Progress() to inputs
    ],
    outputs=["audio", "text", gr.Image(label="Waveform"), gr.Image(label="Spectrogram")],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with progress bar integration.")

**Reasoning**:
The `ValueError` indicates that `gr.Progress()` should not be directly passed as an input component to `gr.Interface`. Gradio automatically handles progress bars if the function signature accepts a `progress=gr.Progress()` argument. Therefore, I need to remove `gr.Progress()` from the `inputs` list in `gr.Interface`.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import time # For simulating processing time

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    time.sleep(1) # Simulate work
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    time.sleep(1) # Simulate work
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    time.sleep(0.5) # Simulate work
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, live input, and progress bar
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path, progress=gr.Progress()):

    progress(0.05, desc="Starting audio processing")

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None, None

    progress(0.1, desc="Loading audio files")

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None, None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None, None

    progress(0.2, desc="Preparing audio segments")

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        progress(0.3, desc="Checking mix/master status")
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        progress(0.4, desc="Applying auto mix and mastering")
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        progress(0.5, desc="Applying auto-tune")
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        progress(0.6, desc="Applying effects")
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        progress(0.7, desc="Balancing voice and beat")
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    progress(0.8, desc="Exporting processed audio")
    # Save the processed audio to a temporary file
    output_audio_path = "processed_output.mp3"
    processed_audio.export(output_audio_path, format="mp3")

    progress(0.9, desc="Generating visualizations")
    # --- Audio Visualization (Waveform) ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    waveform_image_path = "waveform.png"
    plt.savefig(waveform_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # --- End Audio Visualization (Waveform) ---

    # --- Audio Visualization (Spectrogram) ---
    plt.figure(figsize=(10, 4))
    D = librosa.amplitude_to_db(librosa.stft(y), ref=np.max)
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='mel')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-Spectrogram')
    plt.tight_layout()
    spectrogram_image_path = "spectrogram.png"
    plt.savefig(spectrogram_image_path)
    plt.close() # Close the plot
    # --- End Audio Visualization (Spectrogram) ---

    progress(1.0, desc="Processing complete")
    return output_audio_path, summary, waveform_image_path, spectrogram_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input
        # Removed gr.Progress() from inputs list as it's automatically injected by Gradio
    ],
    outputs=["audio", "text", gr.Image(label="Waveform"), gr.Image(label="Spectrogram")],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with progress bar integration.")

## Allow Selection of Output Audio Format

### Subtask:
Add a `gr.Dropdown` or `gr.Radio` component to the Gradio interface, allowing users to select the desired output audio format (e.g., 'mp3', 'wav', 'flac'). Modify the `process_audio` function to export the processed audio in the chosen format.


**Reasoning**:
I need to add a `gr.Dropdown` component for output format selection, update the `process_audio` function to accept this new parameter, and modify the `export` call to use the selected format. Finally, I will update the `gr.Interface` inputs to include the new dropdown. This requires rewriting the entire code block that defines the functions and the Gradio interface to ensure all changes are integrated correctly.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import time # For simulating processing time

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    time.sleep(1) # Simulate work
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    time.sleep(1) # Simulate work
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    time.sleep(0.5) # Simulate work
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, live input, progress bar, and output format selection
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path, output_format, progress=gr.Progress()):

    progress(0.05, desc="Starting audio processing")

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        return None, "Error: No voice track provided (either upload a file or use microphone).", None, None

    progress(0.1, desc="Loading audio files")

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        return None, f"Error loading voice track: {e}", None, None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        return None, f"Error loading beat track: {e}", None, None

    progress(0.2, desc="Preparing audio segments")

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        progress(0.3, desc="Checking mix/master status")
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        progress(0.4, desc="Applying auto mix and mastering")
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        progress(0.5, desc="Applying auto-tune")
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        progress(0.6, desc="Applying effects")
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        progress(0.7, desc="Balancing voice and beat")
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    if reference_track_path:
        summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
        print(f"Reference track provided: {reference_track_path} (placeholder).")

    progress(0.8, desc="Exporting processed audio")
    # Save the processed audio to a temporary file in the chosen format
    output_audio_path = f"processed_output.{output_format}"
    processed_audio.export(output_audio_path, format=output_format)

    progress(0.9, desc="Generating visualizations")
    # --- Audio Visualization (Waveform) ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    waveform_image_path = "waveform.png"
    plt.savefig(waveform_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # --- End Audio Visualization (Waveform) ---

    # --- Audio Visualization (Spectrogram) ---
    plt.figure(figsize=(10, 4))
    D = librosa.amplitude_to_db(librosa.stft(y), ref=np.max)
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='mel')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-Spectrogram')
    plt.tight_layout()
    spectrogram_image_path = "spectrogram.png"
    plt.savefig(spectrogram_image_path)
    plt.close() # Close the plot
    # --- End Audio Visualization (Spectrogram) ---

    progress(1.0, desc="Processing complete")
    return output_audio_path, summary, waveform_image_path, spectrogram_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")

output_format_dropdown = gr.Dropdown(choices=["mp3", "wav", "flac"], value="mp3", label="Output Audio Format")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input,
        output_format_dropdown # Added dropdown for output format
    ],
    outputs=["audio", "text", gr.Image(label="Waveform"), gr.Image(label="Spectrogram")],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with output audio format selection.")

## Implement Reference Track Comparison Visualization

### Subtask:
If a reference track is uploaded, generate a comparative visualization. This could be an overlay of the processed audio's waveform with the reference track's waveform, or side-by-side plots for visual comparison. Add a new `gr.Image` output for this comparison.


**Reasoning**:
I need to modify the `process_audio` function to load the reference track, generate a comparative waveform visualization, save it to a temporary file, and update the function's return statement to include this new image path. Concurrently, I will update the `gr.Interface` outputs to display this comparison image. This requires rewriting the entire code block that defines the functions and the Gradio interface to ensure all changes are integrated correctly.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import time # For simulating processing time

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    time.sleep(1) # Simulate work
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    time.sleep(1) # Simulate work
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    time.sleep(0.5) # Simulate work
    return "Not mixed/mastered (placeholder)"

# Updated process_audio function to integrate placeholders, visualization, live input, progress bar, and output format selection
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path, output_format, progress=gr.Progress()):

    progress(0.05, desc="Starting audio processing")

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        # Adjusted for additional output (comparison image)
        return None, "Error: No voice track provided (either upload a file or use microphone).", None, None, None

    progress(0.1, desc="Loading audio files")

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        # Adjusted for additional output (comparison image)
        return None, f"Error loading voice track: {e}", None, None, None

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        # Adjusted for additional output (comparison image)
        return None, f"Error loading beat track: {e}", None, None, None

    progress(0.2, desc="Preparing audio segments")

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        progress(0.3, desc="Checking mix/master status")
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        progress(0.4, desc="Applying auto mix and mastering")
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        progress(0.5, desc="Applying auto-tune")
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        progress(0.6, desc="Applying effects")
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        progress(0.7, desc="Balancing voice and beat")
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    reference_audio_loaded = False
    y_ref = None
    sr_ref = None
    if reference_track_path:
        try:
            y_ref, sr_ref = librosa.load(reference_track_path, sr=None) # Load reference for visualization
            summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
            print(f"Reference track provided: {reference_track_path} (placeholder).")
            reference_audio_loaded = True
        except Exception as e:
            print(f"Error loading reference track: {e}")

    progress(0.8, desc="Exporting processed audio")
    # Save the processed audio to a temporary file in the chosen format
    output_audio_path = f"processed_output.{output_format}"
    processed_audio.export(output_audio_path, format=output_format)

    progress(0.9, desc="Generating visualizations")
    # --- Audio Visualization (Waveform) ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    waveform_image_path = "waveform.png"
    plt.savefig(waveform_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # --- End Audio Visualization (Waveform) ---

    # --- Audio Visualization (Spectrogram) ---
    plt.figure(figsize=(10, 4))
    D = librosa.amplitude_to_db(librosa.stft(y), ref=np.max)
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='mel')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-Spectrogram')
    plt.tight_layout()
    spectrogram_image_path = "spectrogram.png"
    plt.savefig(spectrogram_image_path)
    plt.close() # Close the plot
    # --- End Audio Visualization (Spectrogram) ---

    # --- Comparative Visualization (Waveform) ---
    comparison_image_path = None
    if reference_audio_loaded:
        plt.figure(figsize=(12, 5))
        # Plot processed audio
        librosa.display.waveshow(y, sr=sr, alpha=0.7, label='Processed Audio')
        # Plot reference track
        # Resample reference if sample rates differ for consistent plotting
        if sr_ref != sr:
            y_ref_resampled = librosa.resample(y_ref, orig_sr=sr_ref, target_sr=sr)
            librosa.display.waveshow(y_ref_resampled, sr=sr, color='r', alpha=0.5, label='Reference Track')
        else:
            librosa.display.waveshow(y_ref, sr=sr_ref, color='r', alpha=0.5, label='Reference Track')

        plt.title('Processed Audio vs. Reference Track Waveform')
        plt.xlabel('Time (s)')
        plt.ylabel('Amplitude')
        plt.legend()
        plt.tight_layout()
        comparison_image_path = "comparison_waveform.png"
        plt.savefig(comparison_image_path)
        plt.close()
    # --- End Comparative Visualization ---

    progress(1.0, desc="Processing complete")
    return output_audio_path, summary, waveform_image_path, spectrogram_image_path, comparison_image_path

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")

output_format_dropdown = gr.Dropdown(choices=["mp3", "wav", "flac"], value="mp3", label="Output Audio Format")


iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input,
        output_format_dropdown
    ],
    outputs=["audio", "text", gr.Image(label="Waveform"), gr.Image(label="Spectrogram"), gr.Image(label="Processed vs. Reference Waveform")], # Added comparative visualization output
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with comparative waveform visualization for reference tracks.")

## Display Loudness Information

### Subtask:
Calculate and display relevant loudness metrics (e.g., LUFS, RMS) for both the input and processed audio tracks using `librosa`. Add `gr.Textbox` components to the interface to show these numerical values, providing professional audio insights.


**Reasoning**:
I need to modify the `process_audio` function to calculate RMS values for the voice track, beat track, and processed audio using `librosa`, convert them to decibels, and return these values. Then, I will update the `gr.Interface` outputs to include new `gr.Textbox` components for displaying these calculated RMS values. This requires rewriting the entire code block for `process_audio` and `iface` definition to ensure all dependencies and modifications are correctly integrated.



In [None]:
import gradio as gr
from pydub import AudioSegment
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import time # For simulating processing time

# Placeholder audio processing functions
def auto_mix_mastering(audio_segment):
    print("Applying auto mix and mastering (placeholder).")
    time.sleep(1) # Simulate work
    return audio_segment

def auto_tune(audio_segment, intensity):
    print(f"Applying auto-tune (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def apply_effects(audio_segment, intensity):
    print(f"Applying effects (placeholder) with intensity: {intensity}.")
    time.sleep(1) # Simulate work
    return audio_segment

def balance_voice_beat(voice_segment, beat_segment):
    print("Balancing voice and beat (placeholder).")
    time.sleep(1) # Simulate work
    return voice_segment.overlay(beat_segment)

def check_if_mixed_mastered(audio_segment):
    print("Performing placeholder check for mixed/mastered status.")
    time.sleep(0.5) # Simulate work
    return "Not mixed/mastered (placeholder)"

# Helper function to get RMS in dB from pydub AudioSegment
def get_rms_db(audio_segment):
    # Convert pydub AudioSegment to numpy array for librosa
    samples = np.array(audio_segment.get_array_of_samples())
    # pydub stores samples as int16, librosa expects float
    if audio_segment.sample_width == 2:
        samples = samples.astype(np.float32) / (2**15)
    elif audio_segment.sample_width == 4:
        samples = samples.astype(np.float32) / (2**31)
    else: # Fallback for other sample widths
        samples = samples.astype(np.float32)

    # If stereo, librosa.feature.rms expects mono or flattens it implicitly
    # For accurate RMS, sum the square of each channel if stereo, then sqrt, or just take one channel
    # For simplicity, convert to mono for RMS calculation if stereo, then calculate.
    if audio_segment.channels == 2:
        samples = samples.reshape((-1, 2)).mean(axis=1) # Simple average of channels

    # Calculate RMS using librosa
    rms = librosa.feature.rms(y=samples, frame_length=audio_segment.frame_rate // 10, hop_length=audio_segment.frame_rate // 100)
    # Convert to dB
    rms_db = librosa.amplitude_to_db(rms, ref=1.0).mean() # Mean RMS over time for a single value
    return f"{rms_db:.2f} dB"

# Updated process_audio function to integrate placeholders, visualization, live input, progress bar, and output format selection
def process_audio(voice_track_path, beat_track_path, reference_track_path,
                  check_mixed, auto_mix, auto_tune_opt, effects_opt, balance_opt,
                  auto_tune_intensity, effects_intensity, live_voice_input_path, output_format, progress=gr.Progress()):

    progress(0.05, desc="Starting audio processing")

    # Prioritize live microphone input if provided
    actual_voice_track_path = live_voice_input_path if live_voice_input_path else voice_track_path

    # Check if a voice track is available (either file upload or live input)
    if not actual_voice_track_path:
        # Adjusted for additional output (comparison image, RMS values)
        return None, "Error: No voice track provided (either upload a file or use microphone).", None, None, None, "N/A", "N/A", "N/A"

    progress(0.1, desc="Loading audio files")

    # Load audio files using pydub
    try:
        voice_audio = AudioSegment.from_file(actual_voice_track_path)
    except Exception as e:
        # Adjusted for additional output (comparison image, RMS values)
        return None, f"Error loading voice track: {e}", None, None, None, "N/A", "N/A", "N/A"

    try:
        beat_audio = AudioSegment.from_file(beat_track_path)
    except Exception as e:
        # Adjusted for additional output (comparison image, RMS values)
        return None, f"Error loading beat track: {e}", None, None, None, "N/A", "N/A", "N/A"

    # Calculate RMS for original tracks
    original_voice_rms = get_rms_db(voice_audio)
    original_beat_rms = get_rms_db(beat_audio)

    progress(0.2, desc="Preparing audio segments")

    # Ensure beat_audio is at the same frame rate as voice_audio for overlay
    if voice_audio.frame_rate != beat_audio.frame_rate:
        beat_audio = beat_audio.set_frame_rate(voice_audio.frame_rate)

    # Ensure both segments are mono or stereo consistently
    if voice_audio.channels != beat_audio.channels:
        # Convert to a common channel format, e.g., stereo
        if voice_audio.channels == 1: voice_audio = voice_audio.set_channels(2)
        if beat_audio.channels == 1: beat_audio = beat_audio.set_channels(2)

    processed_audio = voice_audio.overlay(beat_audio) # Initial combination
    summary = "Summary of options selected:"

    if check_mixed:
        progress(0.3, desc="Checking mix/master status")
        mixed_mastered_status = check_if_mixed_mastered(processed_audio)
        summary += f" Check Mixed/Mastered: {mixed_mastered_status},"

    if auto_mix:
        progress(0.4, desc="Applying auto mix and mastering")
        summary += " Auto Mix/Mastering: Yes,"
        processed_audio = auto_mix_mastering(processed_audio)

    if auto_tune_opt:
        progress(0.5, desc="Applying auto-tune")
        summary += f" Auto-Tune: Yes (Intensity: {auto_tune_intensity}),"
        processed_audio = auto_tune(processed_audio, auto_tune_intensity)

    if effects_opt:
        progress(0.6, desc="Applying effects")
        summary += f" Effects: Yes (Intensity: {effects_intensity}),"
        processed_audio = apply_effects(processed_audio, effects_intensity)

    if balance_opt:
        progress(0.7, desc="Balancing voice and beat")
        summary += " Balance Voice/Beat: Yes,"
        processed_audio = balance_voice_beat(voice_audio, beat_audio)

    # Handle reference track (placeholder)
    reference_audio_loaded = False
    y_ref = None
    sr_ref = None
    if reference_track_path:
        try:
            y_ref, sr_ref = librosa.load(reference_track_path, sr=None) # Load reference for visualization
            summary += f" Reference Track: {reference_track_path.split('/')[-1]}, "
            print(f"Reference track provided: {reference_track_path} (placeholder).")
            reference_audio_loaded = True
        except Exception as e:
            print(f"Error loading reference track: {e}")

    progress(0.8, desc="Exporting processed audio")
    # Save the processed audio to a temporary file in the chosen format
    output_audio_path = f"processed_output.{output_format}"
    processed_audio.export(output_audio_path, format=output_format)

    # Calculate RMS for processed audio
    processed_audio_rms = get_rms_db(processed_audio)

    progress(0.9, desc="Generating visualizations")
    # --- Audio Visualization (Waveform) ---
    y, sr = librosa.load(output_audio_path, sr=None) # Load for visualization, preserve original sr

    plt.figure(figsize=(10, 4))
    librosa.display.waveshow(y, sr=sr, alpha=0.7)
    plt.title("Processed Audio Waveform")
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.tight_layout()
    waveform_image_path = "waveform.png"
    plt.savefig(waveform_image_path)
    plt.close() # Close the plot to prevent it from displaying automatically
    # --- End Audio Visualization (Waveform) ---

    # --- Audio Visualization (Spectrogram) ---
    plt.figure(figsize=(10, 4))
    D = librosa.amplitude_to_db(librosa.stft(y), ref=np.max)
    librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='mel')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-Spectrogram')
    plt.tight_layout()
    spectrogram_image_path = "spectrogram.png"
    plt.savefig(spectrogram_image_path)
    plt.close() # Close the plot
    # --- End Audio Visualization (Spectrogram) ---

    # --- Comparative Visualization (Waveform) ---
    comparison_image_path = None
    if reference_audio_loaded:
        plt.figure(figsize=(12, 5))
        # Plot processed audio
        librosa.display.waveshow(y, sr=sr, alpha=0.7, label='Processed Audio')
        # Plot reference track
        # Resample reference if sample rates differ for consistent plotting
        if sr_ref != sr:
            y_ref_resampled = librosa.resample(y_ref, orig_sr=sr_ref, target_sr=sr)
            librosa.display.waveshow(y_ref_resampled, sr=sr, color='r', alpha=0.5, label='Reference Track')
        else:
            librosa.display.waveshow(y_ref, sr=sr_ref, color='r', alpha=0.5, label='Reference Track')

        plt.title('Processed Audio vs. Reference Track Waveform')
        plt.xlabel('Time (s)')
        plt.ylabel('Amplitude')
        plt.legend()
        plt.tight_layout()
        comparison_image_path = "comparison_waveform.png"
        plt.savefig(comparison_image_path)
        plt.close()
    # --- End Comparative Visualization ---

    progress(1.0, desc="Processing complete")
    return output_audio_path, summary, waveform_image_path, spectrogram_image_path, \
           comparison_image_path, original_voice_rms, original_beat_rms, processed_audio_rms

# Define Gradio Interface
voice_track_input = gr.Audio(type="filepath", label="Voice Track (File Upload)")
live_voice_input = gr.Audio(label="Live Voice Input (Microphone)")
beat_track_input = gr.Audio(type="filepath", label="Beat Track")
reference_track_input = gr.Audio(type="filepath", label="Reference Track (Optional)")

check_mixed_mastered_checkbox = gr.Checkbox(label="Check if song is already mixed/mastered")
auto_mix_mastering_checkbox = gr.Checkbox(label="Auto Mix and Mastering")
auto_tune_checkbox = gr.Checkbox(label="Auto-Tune")
effects_checkbox = gr.Checkbox(label="Effects")
balance_voice_beat_checkbox = gr.Checkbox(label="Balance Voice and Beat")

auto_tune_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Auto-Tune Intensity")
effects_intensity_slider = gr.Slider(minimum=0.0, maximum=1.0, step=0.1, value=0.5, label="Effects Intensity")

output_format_dropdown = gr.Dropdown(choices=["mp3", "wav", "flac"], value="mp3", label="Output Audio Format")

# New Textbox components for RMS values
voice_rms_output = gr.Textbox(label="Voice Track RMS (dB)")
beat_rms_output = gr.Textbox(label="Beat Track RMS (dB)")
processed_rms_output = gr.Textbox(label="Processed Audio RMS (dB)")

iface = gr.Interface(
    fn=process_audio,
    inputs=[
        voice_track_input,
        beat_track_input,
        reference_track_input,
        check_mixed_mastered_checkbox,
        auto_mix_mastering_checkbox,
        auto_tune_checkbox,
        effects_checkbox,
        balance_voice_beat_checkbox,
        auto_tune_intensity_slider,
        effects_intensity_slider,
        live_voice_input,
        output_format_dropdown
    ],
    outputs=[
        "audio",
        "text",
        gr.Image(label="Waveform"),
        gr.Image(label="Spectrogram"),
        gr.Image(label="Processed vs. Reference Waveform"),
        voice_rms_output,
        beat_rms_output,
        processed_rms_output
    ],
    title="Alayo Music Workflow",
    description="Upload your voice (file or live), beat, and optional reference track, then select processing options."
)

print("Gradio interface updated with RMS loudness information.")

## Final Task

### Subtask:
Provide a comprehensive summary of all the newly implemented dashboard features, including their functionalities and how they enhance the user experience, along with any final insights.


## Summary:

### Q&A
The newly implemented dashboard features and their functionalities are designed to enhance the user experience by offering more control, better feedback, and deeper insights into the audio processing workflow.

1.  **Gradio Parameter Sliders for 'Auto-Tune' and 'Effects' Intensity**: Users can now adjust the intensity of the "Auto-Tune" and "Effects" placeholder functions using `gr.Slider` components with a range from 0.0 to 1.0, stepping by 0.1. This provides more granular control over these processing options.
2.  **Live Audio Input Option**: A `gr.Audio` component labeled "Live Voice Input (Microphone)" was introduced. While intended for direct microphone input to offer greater flexibility, due to environmental limitations (a `TypeError` with `source='microphone'`), it currently functions as an additional file upload input. The `process_audio` function prioritizes this input if provided.
3.  **Advanced Audio Visualizations (Spectrogram)**: Beyond the basic waveform, the application now generates a Mel-Spectrogram of the processed audio using `librosa`. This visualization, displayed via a new `gr.Image` output, provides detailed insights into the frequency content of the audio over time.
4.  **Progress Bar for Processing**: Integrated `gr.Progress` into the `process_audio` function, offering real-time progress updates (e.g., "Loading audio files", "Applying auto-tune", "Generating visualizations"). This significantly improves user experience by providing transparency during potentially lengthy processing tasks.
5.  **Output Audio Format Selection**: A `gr.Dropdown` component allows users to select their desired output audio format from "mp3", "wav", or "flac", with "mp3" as the default. This functionality provides users with control over the final file type, catering to different quality and compatibility needs.
6.  **Reference Track Comparison Visualization**: When an optional reference track is uploaded, the dashboard now generates a comparative waveform visualization. This new `gr.Image` output overlays the processed audio's waveform with the reference track's waveform, allowing for visual assessment of processing effects against a benchmark.
7.  **Loudness Information Display**: The dashboard calculates and displays the Root Mean Square (RMS) loudness in decibels (dB) for the original voice track, original beat track, and the final processed audio. These numerical values are presented in dedicated `gr.Textbox` components, offering professional-grade insights into the audio's dynamic characteristics.

These features collectively enhance the user experience by providing greater interactivity, more informative outputs, clearer process feedback, and more control over the processing and output formats.

### Data Analysis Key Findings
*   **Gradio Interface Features**:
    *   Successfully implemented `gr.Slider` components for 'Auto-Tune Intensity' and 'Effects Intensity', allowing parameter adjustment from 0.0 to 1.0 with a step of 0.1.
    *   Introduced a `gr.Audio` component for 'Live Voice Input (Microphone)'; however, due to a `TypeError` related to `source='microphone'`, it functions as an additional file upload.
    *   Added `gr.Image` outputs for a detailed **Mel-Spectrogram** of the processed audio and a **Comparative Waveform Visualization** comparing processed audio with an optional reference track.
    *   Integrated `gr.Progress` to provide real-time updates from 0% to 100% during processing steps.
    *   Added a `gr.Dropdown` for "Output Audio Format" with choices: "mp3", "wav", "flac".
    *   Included three `gr.Textbox` components to display RMS loudness in dB for the "Voice Track", "Beat Track", and "Processed Audio".
*   **Audio Processing Enhancements**:
    *   The `process_audio` function now accepts intensity parameters for auto-tune and effects, dynamically exports audio in the user-selected format, and calculates RMS values for input and output tracks.
    *   The `process_audio` function also handles the generation and saving of waveform, spectrogram, and comparative waveform images.
*   **Resolved Technical Issues**:
    *   `TypeError: Slider.__init__() got an unexpected keyword argument 'default'` was resolved by removing the `default` parameter from `gr.Slider`.
    *   `TypeError: Audio.__init__() got an unexpected keyword argument 'source'` prevented true microphone input, leading to the `gr.Audio` component defaulting to file upload.
    *   `ValueError: Component must be provided as a `str` or `dict` or `Component` but is <gradio.helpers.Progress object at ...>` was resolved by removing `gr.Progress()` from the `inputs` list in `gr.Interface`, allowing Gradio to automatically inject it.
*   **Overall Functionality**: The application successfully creates a robust Gradio interface that integrates audio input, user-selectable processing options, advanced visualizations, progress feedback, output format control, and professional audio metrics.

### Insights or Next Steps
*   **Enhance Placeholder Functions**: Replace the current placeholder audio processing functions (`auto_mix_mastering`, `auto_tune`, `apply_effects`, `balance_voice_beat`, `check_if_mixed_mastered`) with actual implementations using `pydub`, `librosa`, or specialized audio processing libraries to provide real-world functionality.
*   **Investigate and Resolve Live Microphone Input**: Actively debug or update Gradio to properly enable `source='microphone'` functionality, as this was a key intended feature that could not be fully implemented due to version or environment incompatibility.
