
How to load pretrained model in local? #69

Open
Melona-BS opened this issue Feb 1, 2024 · 2 comments

Comments


Melona-BS commented Feb 1, 2024

Hello, this project is awesome!
But I have a small problem using it.

I wanted to download the pretrained model ("s2a-q4-small-en+pl.model") and use it in an existing project.

An error occurs when I download the pretrained model you provided and pass it to the Pipeline as a dict with a 'local_filename' key.

[code]

from whisperspeech.pipeline import Pipeline
en_text_prompt = "Hello? I'm calling to reserve a room, but is there a room left?"

pipe = Pipeline(s2a_ref={'local_filename': "s2a-q4-tiny-en+pl.model"})
pipe.generate_to_file(file_path, en_text_prompt)

print("WhisperSpeech Test Done!")

[error]

AttributeError: 'dict' object has no attribute 'seek'. You can only torch.load from a file that is seekable. 
Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

Do you have any instructions or a guide for using the pretrained models you provide?
I looked through the code, but only the load_model methods of the t2s and s2a classes worked for me.

Thank you for your research and contribution!

jpc (Contributor) commented Feb 1, 2024

Hey, you can pass the file name as a string, like this:

pipe = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")

If you want to avoid downloading anything automatically, you'll also need to download the T2S model and pass it as t2s_ref in the same way. You may need to download the Vocos and EnCodec models too. We have an example script for Docker here:
https://github.com/collabora/WhisperFusion/blob/main/docker/scripts/setup-whisperfusion.sh#L19-L23
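For reference, the s2a_ref string can be either a plain local path (as above) or a Hugging Face repo:filename pair like 'collabora/whisperspeech:s2a-q4-tiny-en+pl.model'. A small hypothetical helper (parse_ref is not part of WhisperSpeech, just an illustration of the two string forms) could distinguish them:

```python
# Hypothetical helper -- NOT part of WhisperSpeech; it only illustrates
# the two ref formats you can pass to Pipeline as a string.
def parse_ref(ref: str):
    """Split a model ref into (repo_id, filename).

    'collabora/whisperspeech:s2a-q4-tiny-en+pl.model' -> remote repo + file
    's2a-q4-tiny-en+pl.model'                         -> local file only
    """
    if ":" in ref:
        repo_id, filename = ref.split(":", 1)
        return repo_id, filename
    return None, ref  # no repo prefix: treat as a local filename

print(parse_ref("collabora/whisperspeech:s2a-q4-tiny-en+pl.model"))
print(parse_ref("s2a-q4-tiny-en+pl.model"))
```

Either way, the important part is that the ref is a string, not a dict.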

BBC-Esq (Contributor) commented Feb 2, 2024

Here's another option, a script I made that works alright. The text to be spoken is hardcoded into the script for testing purposes, but I believe it has the structure you're looking for, enough to get started:

from pydub import AudioSegment
import numpy as np
from whisperspeech.pipeline import Pipeline

# pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-small-en+pl.model')
pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')
# pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-base-en+pl.model')

audio_tensor = pipe.generate("""
 According to the provided context from Georgia Juvenile Practice and Procedure with Forms the preliminary protective hearing in a dependency case must be held promptly after a child is removed from the home and no later than 72 hours after the child is placed in foster care. If this 72-hour time frame expires on a weekend or legal holiday, the hearing should be scheduled for no later than the next day that is not a weekend or legal holiday.
""")

# generate uses CUDA if available; therefore, it's necessary to move to CPU before converting to NumPy array
audio_np = (audio_tensor.cpu().numpy() * 32767).astype(np.int16)

if len(audio_np.shape) == 1:
    audio_np = np.expand_dims(audio_np, axis=0)
else:
    audio_np = audio_np.T

print("Array shape:", audio_np.shape)
print("Array dtype:", audio_np.dtype)

try:
    audio_segment = AudioSegment(
        audio_np.tobytes(), 
        frame_rate=24000, 
        sample_width=2, 
        channels=1
    )
    audio_segment.export('output_audio.wav', format='wav')
    print("Audio file generated: output_audio.wav")
except Exception as e:
    print(f"Error writing audio file: {e}")

Just FYI: it relies on pydub to write the audio rather than the library's built-in saving path, but it gets the job done.
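If you'd rather avoid the pydub dependency entirely, the same int16 conversion can be written to a WAV file with Python's standard-library wave module. A minimal sketch, assuming a mono float waveform at 24 kHz as in the script above (here a synthetic sine wave stands in for the model output):

```python
import wave
import numpy as np

# Synthetic stand-in for audio_tensor.cpu().numpy(): one second of a
# 440 Hz sine wave at 24 kHz, as floats in [-1, 1].
sample_rate = 24000
t = np.arange(sample_rate) / sample_rate
audio_float = 0.5 * np.sin(2 * np.pi * 440 * t)

# Same scaling as the pydub script: float [-1, 1] -> 16-bit PCM.
audio_int16 = (audio_float * 32767).astype(np.int16)

with wave.open("output_audio.wav", "wb") as wf:
    wf.setnchannels(1)            # mono
    wf.setsampwidth(2)            # 16-bit samples
    wf.setframerate(sample_rate)  # 24 kHz, matching WhisperSpeech output
    wf.writeframes(audio_int16.tobytes())
```

This trades pydub's export conveniences (mp3, ogg, etc.) for zero extra dependencies; for plain WAV output it is equivalent.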
