<a href="https://colab.research.google.com/github/MK316/workspace/blob/main/SR01/SR_manipulation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2nd trial

In [None]:
!pip install gtts pydub
!apt install ffmpeg

## Steps to take to modify speech rate

```
from gtts import gTTS

def generate_and_play(text, rate=1.0):
    # Convert text to speech and save it as an MP3 file
    tts = gTTS(text=text, lang='en')
    filename = "temp_audio.mp3"
    tts.save(filename)
```

- This part of the code uses the gTTS (Google Text-to-Speech) library to convert the provided text into speech and save it as an MP3 file named "temp_audio.mp3".

- Next, AudioSegment.from_file(filename, format="mp3") uses pydub to load the generated MP3 file into an AudioSegment object. This object lets us manipulate properties of the audio, including the frame rate, which we use to change the speech rate.

- This part is crucial for adjusting the speech rate:

> audio._spawn(audio.raw_data, overrides={...}): _spawn is a private (underscore-prefixed), low-level method in pydub. It creates a new AudioSegment instance from the raw audio data of the original audio object, with the properties listed in the overrides dictionary replaced.

> "frame_rate": int(audio.frame_rate * rate): This is where the speech rate manipulation happens. The new segment keeps the same samples but declares a frame rate equal to the original frame rate multiplied by rate. For instance, if the original frame rate is 44,100 Hz (common for many audio files) and rate=0.5, the new frame rate is 22,050 Hz, so the audio plays at half speed and lasts twice as long. Because the samples themselves are unchanged, the pitch shifts along with the speed.

> .set_frame_rate(audio.frame_rate): After the override, this method resamples the modified audio back to the original frame rate. The changed speed, and the pitch shift that comes with it, is preserved. The resampling only ensures that the exported file uses a standard frame rate that players handle correctly; it does not restore the original pitch.

- After modifying the audio's frame rate (and thus its speech rate), the code exports the modified AudioSegment object to a new MP3 file named "modified_audio.mp3".

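- Finally, return IPAudio(modified_filename) wraps the exported file in an IPython Audio object so that it can be displayed and played in Colab.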
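
The frame-rate override described above can be tried without gTTS or pydub. The following is a minimal sketch using only Python's standard `wave` module; the helper names (`write_tone`, `override_frame_rate`, `duration_seconds`, `demo`) and the 440 Hz test tone are assumptions made for illustration and are not part of the notebook. It copies the samples unchanged, declares a scaled frame rate, and compares the durations.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, frame_rate=44100, seconds=1.0, freq=440.0):
    """Write a mono 16-bit sine tone to a WAV file (hypothetical test signal)."""
    n_frames = int(frame_rate * seconds)
    samples = (
        int(12000 * math.sin(2 * math.pi * freq * i / frame_rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(frame_rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def override_frame_rate(src, dst, rate):
    """Copy the samples unchanged but declare a frame rate scaled by `rate`."""
    with wave.open(src, "rb") as wav:
        params = wav.getparams()
        frames = wav.readframes(params.nframes)
    with wave.open(dst, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(int(params.framerate * rate))
        out.writeframes(frames)


def duration_seconds(path):
    """Duration implied by the header: number of frames / frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def demo(rate=0.5):
    """Return (original duration, duration after the override) in seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        original_path = os.path.join(tmp, "tone.wav")
        slowed_path = os.path.join(tmp, "tone_slow.wav")
        write_tone(original_path)
        override_frame_rate(original_path, slowed_path, rate)
        return duration_seconds(original_path), duration_seconds(slowed_path)


print(demo())  # (1.0, 2.0)
```

With rate=0.5 the same 44,100 samples are declared at 22,050 Hz, so the clip lasts twice as long. The 440 Hz tone is also played back as 220 Hz, which is the pitch change that accompanies this method.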


In [None]:
from gtts import gTTS
from pydub import AudioSegment
from IPython.display import display, Audio as IPAudio

def generate_and_play(text, rate=1.0):
    # Convert text to audio
    tts = gTTS(text=text, lang='en')
    filename = "temp_audio.mp3"
    tts.save(filename)

    # Load audio using pydub
    audio = AudioSegment.from_file(filename, format="mp3")

    # Change frame rate to adjust playback speed
    modified_audio = audio._spawn(audio.raw_data, overrides={
        "frame_rate": int(audio.frame_rate * rate)
    }).set_frame_rate(audio.frame_rate)

    modified_filename = "modified_audio.mp3"
    modified_audio.export(modified_filename, format="mp3")

    # Display in Colab
    return IPAudio(modified_filename)



In [None]:
# Default rate
audio = generate_and_play("This is the default speech rate.")
display(audio)

# Slightly slower (0.9x)
audio = generate_and_play("This is at 0.9 times the speech rate.", rate=0.9)
display(audio)

# Slightly faster (1.1x)
audio = generate_and_play("This is at 1.1 times the speech rate.", rate=1.1)
display(audio)


# Create and save the audio

In [None]:
!pip install gtts pydub
!apt install ffmpeg

In [None]:
from gtts import gTTS
from pydub import AudioSegment
from IPython.display import display, Audio as IPAudio

def generate_and_play(text, rate=1.0):
    # Convert text to audio
    tts = gTTS(text=text, lang='en')
    temp_filename = "temp_audio.mp3"
    tts.save(temp_filename)

    # Load audio using pydub
    audio = AudioSegment.from_file(temp_filename, format="mp3")

    # Change frame rate to adjust playback speed
    modified_audio = audio._spawn(audio.raw_data, overrides={
        "frame_rate": int(audio.frame_rate * rate)
    }).set_frame_rate(audio.frame_rate)

    # Define the modified filename based on rate
    modified_filename = f"modified_audio_rate_{rate}.wav"
    modified_audio.export(modified_filename, format="wav")

    # Display in Colab
    return IPAudio(modified_filename)


In [None]:
# Default rate
audio = generate_and_play("This is the default speech rate.", rate=1.0)
display(audio)

# Slightly slower (0.95x)
audio = generate_and_play("This is at 0.95 times the speech rate.", rate=0.95)
display(audio)

# Slightly faster (1.05x)
audio = generate_and_play("This is at 1.05 times the speech rate.", rate=1.05)
display(audio)
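
The rates used in the cells above stay close to 1.0. A rough way to see why small values matter is to compute what each rate does to duration and pitch. The sketch below assumes the frame-rate method behaves like plain resampled playback, so duration scales by 1/rate and pitch shifts by 12·log2(rate) semitones; the helper name `rate_effect` is hypothetical, and the small rounding introduced by int() on the frame rate is ignored.

```python
import math


def rate_effect(rate):
    """Return (duration_factor, pitch_shift_in_semitones) for a playback rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    duration_factor = 1.0 / rate         # rate < 1 makes the clip longer
    semitones = 12.0 * math.log2(rate)   # rate < 1 lowers the pitch
    return duration_factor, semitones


for rate in (0.9, 0.95, 1.0, 1.05, 1.1):
    factor, semitones = rate_effect(rate)
    print(f"rate={rate:<4}  duration x{factor:.3f}  pitch {semitones:+.2f} semitones")
```

Under these assumptions, a rate of 0.5 or 2.0 moves the voice a full octave, while rates between 0.9 and 1.1 keep the shift under two semitones.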
