
How to load pretrained model in local? #69

Open
Melona-BS opened this issue Feb 1, 2024 · 2 comments

Comments


Melona-BS commented Feb 1, 2024

Hello, this project is awesome!
But I have a small problem using it.

I wanted to download the pretrained model ("s2a-q4-small-en+pl.model") and use it in an existing project.

An error occurs when I download the pretrained model you provided and pass it to the Pipeline as a dict with a 'local_filename' key.

[code]

from whisperspeech.pipeline import Pipeline
en_text_prompt = "Hello? I'm calling to reserve a room, but is there a room left?"

pipe = Pipeline(s2a_ref={'local_filename': "s2a-q4-tiny-en+pl.model"})
pipe.generate_to_file(file_path, en_text_prompt)

print("WhisperSpeech Test Done!")

[error]

AttributeError: 'dict' object has no attribute 'seek'. You can only torch.load from a file that is seekable. 
Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

Do you have any instructions or a guide for using the pretrained models you provide?
I looked through the code, but only the load_model methods of the t2s and s2a classes worked for me.

Thank you for your research and contribution!

jpc (Contributor) commented Feb 1, 2024

Hey, you can pass the file name as a string, like this:

pipe = Pipeline(s2a_ref="s2a-q4-tiny-en+pl.model")

If you want to avoid downloading anything automatically, you'll also need to download the T2S model and pass it as t2s_ref in the same way. You may need to download the Vocos and EnCodec models too. We have an example script for Docker here:
https://github.com/collabora/WhisperFusion/blob/main/docker/scripts/setup-whisperfusion.sh#L19-L23
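For reference, the s2a_ref string can be either a plain local path (as above) or a Hugging Face repo:filename pair like 'collabora/whisperspeech:s2a-q4-tiny-en+pl.model'. A small hypothetical helper (parse_ref is not part of WhisperSpeech, just an illustration of the two string forms) could distinguish them:

```python
# Hypothetical helper -- NOT part of WhisperSpeech; it only illustrates
# the two ref formats you can pass to Pipeline as a string.
def parse_ref(ref: str):
    """Split a model ref into (repo_id, filename).

    'collabora/whisperspeech:s2a-q4-tiny-en+pl.model' -> remote repo + file
    's2a-q4-tiny-en+pl.model'                         -> local file only
    """
    if ":" in ref:
        repo_id, filename = ref.split(":", 1)
        return repo_id, filename
    return None, ref  # no repo prefix: treat as a local filename

print(parse_ref("collabora/whisperspeech:s2a-q4-tiny-en+pl.model"))
print(parse_ref("s2a-q4-tiny-en+pl.model"))
```

Either way, the important part is that the ref is a string, not a dict.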

BBC-Esq (Contributor) commented Feb 2, 2024

Here's another option, a script I made that works alright. The text to be spoken is hardcoded into the script for testing purposes, but I believe it has the structure you're looking for, enough to get started:

from pydub import AudioSegment
import numpy as np
from whisperspeech.pipeline import Pipeline

# pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-small-en+pl.model')
pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')
# pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-base-en+pl.model')

audio_tensor = pipe.generate("""
 According to the provided context from Georgia Juvenile Practice and Procedure with Forms the preliminary protective hearing in a dependency case must be held promptly after a child is removed from the home and no later than 72 hours after the child is placed in foster care. If this 72-hour time frame expires on a weekend or legal holiday, the hearing should be scheduled for no later than the next day that is not a weekend or legal holiday.
""")

# generate uses CUDA if available; therefore, it's necessary to move to CPU before converting to NumPy array
audio_np = (audio_tensor.cpu().numpy() * 32767).astype(np.int16)

if len(audio_np.shape) == 1:
    audio_np = np.expand_dims(audio_np, axis=0)
else:
    audio_np = audio_np.T

print("Array shape:", audio_np.shape)
print("Array dtype:", audio_np.dtype)

try:
    audio_segment = AudioSegment(
        audio_np.tobytes(), 
        frame_rate=24000, 
        sample_width=2, 
        channels=1
    )
    audio_segment.export('output_audio.wav', format='wav')
    print("Audio file generated: output_audio.wav")
except Exception as e:
    print(f"Error writing audio file: {e}")

Just FYI: it relies on pydub to write the audio rather than the library's built-in saving path, but it gets the job done.
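If you'd rather avoid the pydub dependency entirely, the same int16 conversion can be written to a WAV file with Python's standard-library wave module. A minimal sketch, assuming a mono float waveform at 24 kHz as in the script above (here a synthetic sine wave stands in for the model output):

```python
import wave
import numpy as np

# Synthetic stand-in for audio_tensor.cpu().numpy(): one second of a
# 440 Hz sine wave at 24 kHz, as floats in [-1, 1].
sample_rate = 24000
t = np.arange(sample_rate) / sample_rate
audio_float = 0.5 * np.sin(2 * np.pi * 440 * t)

# Same scaling as the pydub script: float [-1, 1] -> 16-bit PCM.
audio_int16 = (audio_float * 32767).astype(np.int16)

with wave.open("output_audio.wav", "wb") as wf:
    wf.setnchannels(1)            # mono
    wf.setsampwidth(2)            # 16-bit samples
    wf.setframerate(sample_rate)  # 24 kHz, matching WhisperSpeech output
    wf.writeframes(audio_int16.tobytes())
```

This trades pydub's export conveniences (mp3, ogg, etc.) for zero extra dependencies; for plain WAV output it is equivalent.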
