
Multi threading #100

Closed
Joepetey opened this issue Mar 31, 2023 · 11 comments

Comments

@Joepetey

I saw a few people talking about using multiple threads. Is there any documentation or code examples I can see to accomplish this?

@guillaumekln
Contributor

guillaumekln commented Mar 31, 2023

There are 2 levels of multithreading:

  • Running one transcription on CPU with multiple threads
  • Running multiple transcriptions in parallel

Running one transcription on CPU with multiple threads

This number of threads can be configured with the argument cpu_threads (4 by default):

model = WhisperModel("large-v2", device="cpu", cpu_threads=8)

This is the number of threads used by the model itself (usually the number of OpenMP threads). The input is not split and processed in multiple parts.
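For completeness, a minimal single-file transcription with such a model could look like the following (the audio file name is a placeholder):

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cpu", cpu_threads=8)

# The segments are generated lazily: the transcription runs while iterating.
segments, info = model.transcribe("audio.mp3")
print("Detected language:", info.language)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))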

Running multiple transcriptions in parallel

Multiple transcriptions can run in parallel when the model is using multiple workers or running on multiple GPUs:

# Create a model running on CPU with 4 workers each using 2 threads:
model = WhisperModel("large-v2", device="cpu", num_workers=4, cpu_threads=2)

# Create a model running on multiple GPUs:
model = WhisperModel("large-v2", device="cuda", device_index=[0, 1, 2, 3])

# Using multiple workers on a single GPU is also possible but will not increase the throughput by much:
# model = WhisperModel("large-v2", device="cuda", num_workers=2)

Then you can call model.transcribe from multiple Python threads. Of course there are multiple ways to do that. If you are using this library in a webserver, it may already use multiple threads so there is nothing to do.

Just as an example, here's how you can submit multiple transcriptions using a ThreadPoolExecutor. If there are enough files you will see num_workers * cpu_threads active CPU threads.

import concurrent.futures

from faster_whisper import WhisperModel

num_workers = 4
model = WhisperModel("large-v2", device="cpu", num_workers=num_workers, cpu_threads=2)

files = [
    "audio1.mp3",
    "audio2.mp3",
    "audio3.mp3",
    "audio4.mp3",
]


def transcribe_file(file_path):
    segments, info = model.transcribe(file_path)
    segments = list(segments)
    return segments


with concurrent.futures.ThreadPoolExecutor(num_workers) as executor:
    results = executor.map(transcribe_file, files)

    for path, segments in zip(files, results):
        print(
            "Transcription for %s:%s"
            % (path, "".join(segment.text for segment in segments))
        )

@Joepetey
Author

Thank you @guillaumekln for your help!

@supratim1121992

(Quoting the reply above about the two levels of multithreading.)

I am running the model on an AWS p3.16xlarge SageMaker instance with 8 GPUs (16 GB each), and I am looking to achieve parallelization. Would ThreadPoolExecutor work in this case as well, after I create the model on multiple GPUs using device_index?

@guillaumekln
Contributor

Yes, the ThreadPoolExecutor example would also work to transcribe multiple files on multiple GPUs.

Another approach is to launch multiple Python processes (e.g. using multiprocessing) and load a model on a different GPU in each process.

@supratim1121992

(Quoting the reply above about ThreadPoolExecutor and multiprocessing.)

Could you please share a code snippet implementing the multiprocessing route with the above code, instead of using ThreadPoolExecutor?

@guillaumekln
Contributor

When using multiprocessing there is nothing specific to faster-whisper. You can look at the dozens of multiprocessing examples on the Web. You just want to make sure to load a single model in each process.
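For illustration, here is a minimal sketch of that pattern, assuming 4 GPUs and placeholder audio file names: one process is started per GPU, and each process loads its own model and pulls files from a shared queue.

import multiprocessing as mp

from faster_whisper import WhisperModel


def worker(gpu_id, file_queue, result_queue):
    # Each process loads a single model on its own GPU.
    model = WhisperModel("large-v2", device="cuda", device_index=gpu_id)
    while True:
        path = file_queue.get()
        if path is None:  # sentinel: no more work
            break
        segments, info = model.transcribe(path)
        result_queue.put((path, "".join(segment.text for segment in segments)))


if __name__ == "__main__":
    # "spawn" avoids issues with CUDA and the default "fork" start method on Linux.
    mp.set_start_method("spawn")

    gpu_ids = [0, 1, 2, 3]
    files = ["audio1.mp3", "audio2.mp3", "audio3.mp3", "audio4.mp3"]

    file_queue = mp.Queue()
    result_queue = mp.Queue()

    processes = [
        mp.Process(target=worker, args=(gpu_id, file_queue, result_queue))
        for gpu_id in gpu_ids
    ]
    for p in processes:
        p.start()

    for path in files:
        file_queue.put(path)
    for _ in processes:
        file_queue.put(None)  # one sentinel per worker

    for _ in files:
        path, text = result_queue.get()
        print("Transcription for %s:%s" % (path, text))

    for p in processes:
        p.join()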

@brajeshvisio01

brajeshvisio01 commented Aug 10, 2023

@guillaumekln I am using the model on GPU with the line model = WhisperModel("large-v2", device="cuda", device_index=[0, 1, 2, 3]), but the response time keeps adding up. It is not handling all the requests at once; it processes them like a queue. Below is my Flask app code:
import base64
import concurrent.futures
from flask import *
from flask_cors import CORS, cross_origin
import os
import time
import wave
import threading
from faster_whisper import WhisperModel

model_size = "large-v2"
app = Flask(__name__)

os.environ["OMP_NUM_THREADS"] = "6"

model = WhisperModel("large-v2", device="cuda", device_index=[0, 1, 2, 3], num_workers=4, compute_type="int8")


@app.route("/transcribe", methods=["POST"])
@cross_origin()
def transcribe():
    start_time = time.time()
    try:
        audio_file = request.files["audio"]
        audio_file.save(audio_file.filename)
        segments, info = model.transcribe(
            audio_file.filename,
            language="en",
            task="transcribe",
            beam_size=5,
            temperature=0.2,
            vad_filter=True,
        )
        result = ""
        for segment in segments:
            result = result + " " + segment.text
        end_time = time.time()
        dur = end_time - start_time
        return {"start_time": start_time, "end_time": end_time, "duration": round(dur, 3), "text": result}
    except Exception as e:
        print("Error::", str(e))
        return str(e)


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

@guillaumekln
Contributor

Try adding threaded=True when calling app.run.

@brajeshvisio01

brajeshvisio01 commented Aug 10, 2023

@guillaumekln
I added threaded=True:
app.run(debug=True, threaded=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

The time is still increasing, and I am getting this in the log:
Screenshot (14)
This is showing the increasing response time:
Screenshot (16)

@brajeshvisio01

@guillaumekln
Just for your information, the code above (https://github.com/guillaumekln/faster-whisper/issues/100#issuecomment-1672616398) ran earlier (07-Aug-2023) with an average response time of around 600 ms, but now it is giving the result above (https://github.com/guillaumekln/faster-whisper/issues/100#issuecomment-1672652559).
Kindly help.
Thanks and regards

@wwfcnu

wwfcnu commented Nov 16, 2023

I ran 4 processes on a single GPU at the same time, but the speed did not improve.
