Multi threading #100
There are 2 levels of multithreading:
**Running one transcription on CPU with multiple threads**

The number of threads can be configured with the `cpu_threads` argument:

```python
model = WhisperModel("large-v2", device="cpu", cpu_threads=8)
```

This is the number of threads used by the model itself (usually the number of OpenMP threads). The input is not split and processed in multiple parts.

**Running multiple transcriptions in parallel**

Multiple transcriptions can run in parallel when the model is using multiple workers or running on multiple GPUs:

```python
# Create a model running on CPU with 4 workers, each using 2 threads:
model = WhisperModel("large-v2", device="cpu", num_workers=4, cpu_threads=2)

# Create a model running on multiple GPUs:
model = WhisperModel("large-v2", device="cuda", device_index=[0, 1, 2, 3])

# Using multiple workers on a single GPU is also possible but will not increase the throughput by much:
# model = WhisperModel("large-v2", device="cuda", num_workers=2)
```

Then you can call `model.transcribe` from multiple Python threads. Just as an example, here's how you can submit multiple transcriptions using a `ThreadPoolExecutor`:

```python
import concurrent.futures

from faster_whisper import WhisperModel

num_workers = 4
model = WhisperModel("large-v2", device="cpu", num_workers=num_workers, cpu_threads=2)

files = [
    "audio1.mp3",
    "audio2.mp3",
    "audio3.mp3",
    "audio4.mp3",
]

def transcribe_file(file_path):
    segments, info = model.transcribe(file_path)
    # The returned segments are a generator: consuming it here, inside the
    # worker thread, is what actually runs the transcription.
    segments = list(segments)
    return segments

with concurrent.futures.ThreadPoolExecutor(num_workers) as executor:
    results = executor.map(transcribe_file, files)

    for path, segments in zip(files, results):
        print(
            "Transcription for %s:%s"
            % (path, "".join(segment.text for segment in segments))
        )
```
|
Thank you @guillaumekln for your help! |
I am running the model on an AWS p3.16xlarge SageMaker instance with 8 GPUs (16 GB each), and I am looking to achieve parallelization. Would `ThreadPoolExecutor` work in this case as well, after I create the model on multiple GPUs using `device_index`? |
Yes, the `ThreadPoolExecutor` example above also works when the model is loaded on multiple GPUs. Another approach is to launch multiple Python processes (e.g. using the `multiprocessing` module), with each process loading its own model on a different GPU. |
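As an illustration of the thread-based route on that 8-GPU instance, here is a minimal sketch (the file list is a placeholder; the model name and GPU count are taken from the thread):

```python
import concurrent.futures

from faster_whisper import WhisperModel

# One model with a replica on each of the 8 GPUs.
num_gpus = 8
model = WhisperModel("large-v2", device="cuda", device_index=list(range(num_gpus)))

files = ["audio%d.mp3" % i for i in range(16)]  # placeholder input files

def transcribe_file(file_path):
    segments, _info = model.transcribe(file_path)
    # Consuming the generator runs the actual transcription.
    return "".join(segment.text for segment in segments)

# One thread per GPU replica keeps all devices busy.
with concurrent.futures.ThreadPoolExecutor(num_gpus) as executor:
    for path, text in zip(files, executor.map(transcribe_file, files)):
        print("Transcription for %s:%s" % (path, text))
```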
Can you share a code snippet implementing the multiprocessing route with the above code, instead of using `ThreadPoolExecutor`, please? |
When using `multiprocessing`, each process should create its own `WhisperModel` instance, since a loaded model cannot be shared across processes. |
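A minimal sketch of the multiprocessing route, under the assumption that each process loads its own copy of the model on one GPU (the file list and GPU ids are placeholders):

```python
import multiprocessing

from faster_whisper import WhisperModel

def worker(gpu_id, task_queue, result_queue):
    # Each process loads its own model on the GPU assigned to it.
    model = WhisperModel("large-v2", device="cuda", device_index=gpu_id)
    while True:
        file_path = task_queue.get()
        if file_path is None:  # sentinel value: no more work
            break
        segments, _info = model.transcribe(file_path)
        result_queue.put((file_path, "".join(segment.text for segment in segments)))

if __name__ == "__main__":
    gpu_ids = [0, 1, 2, 3]
    files = ["audio1.mp3", "audio2.mp3", "audio3.mp3", "audio4.mp3"]

    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    for file_path in files:
        task_queue.put(file_path)
    for _ in gpu_ids:
        task_queue.put(None)  # one sentinel per worker

    workers = [
        multiprocessing.Process(target=worker, args=(gpu_id, task_queue, result_queue))
        for gpu_id in gpu_ids
    ]
    for process in workers:
        process.start()

    for _ in files:
        path, text = result_queue.get()
        print("Transcription for %s:%s" % (path, text))

    for process in workers:
        process.join()
```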
@guillaumekln I am using the model on GPU with `model = WhisperModel("large-v2", device="cuda", device_index=[0, 1, 2, 3])`, but the response times keep adding up: it is not resolving all the requests at the same time, it is resolving them like a queue. Below is my Flask app code:

```python
model_size = "large-v2"
os.environ["OMP_NUM_THREADS"] = "6"
model = WhisperModel("large-v2", device="cuda", device_index=[0, 1, 2, 3], num_workers=4, compute_type="int8")

@app.route("/transcribe", methods=["POST"])
# ...

if __name__ == "__main__":
# ...
```
|
Try adding `threaded=True` to `app.run(...)` so the development server can handle several requests at once. |
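For illustration, a minimal sketch of how such a Flask app could be wired up (the route body, the upload field name, and the temp-file handling are assumptions; only the model configuration comes from the comment above):

```python
import os
import tempfile

from flask import Flask, jsonify, request
from faster_whisper import WhisperModel

os.environ["OMP_NUM_THREADS"] = "6"

app = Flask(__name__)

# One replica per GPU, so up to four requests can be transcribed concurrently.
model = WhisperModel(
    "large-v2",
    device="cuda",
    device_index=[0, 1, 2, 3],
    compute_type="int8",
)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Hypothetical request handling: expects an uploaded file named "audio".
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        request.files["audio"].save(tmp.name)
        segments, _info = model.transcribe(tmp.name)
        text = "".join(segment.text for segment in segments)
    return jsonify({"text": text})

if __name__ == "__main__":
    # threaded=True makes the development server handle requests in
    # parallel threads instead of one at a time.
    app.run(threaded=True)
```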
@guillaumekln Still the time is increasing, and I am getting this in the log ---> |
@guillaumekln |
I ran 4 processes on a GPU at the same time, but the speed did not improve. |
I saw a few people talking about using multiple threads. Is there any documentation or code example I can look at to accomplish this?