Batching inference commit should be reverted and applied part-by-part for community adaptation !!!! #937

aligokalppeker opened this issue Jul 28, 2024 · 13 comments

@aligokalppeker

aligokalppeker commented Jul 28, 2024

Inspecting the commit itself, it is a really messy update: it aims to improve performance but in fact pushes faster-whisper into a very bad state;

  • messed-up dependencies; faster-whisper had a clean and lean dependency set, but now it depends on torch just for some FFT ops. Audio processing has been switched to torchaudio, which also brings codec-compatibility problems.
  • not a speed-up but a slowdown; for non-batched and CPU-based operation, a slowdown is observed instead of an optimization.
  • code copied from other projects; it pulls in code from the not-so-active whisperX project, which is hard to maintain and possibly unsuitable for real-time use.
  • complexity; to make the code batched and "optimized", really bulky code and extra dependencies have crept in.
  • it is not worth the changes; in the end, there are alternative ways to do batched processing, as mentioned in this space, without increasing code complexity.
  • optimizations could be done in a simpler alternative way; there is no need to move all the code to torch tensors. Instead, speed-ups can be achieved via GPU/CPU-optimized numpy alternatives such as numpy 2.0, cupy, or numba (see the sketch after this list).
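
For illustration only, here is a minimal sketch of the kind of drop-in GPU/CPU-optimized numpy alternative meant above (the function and all names are hypothetical, not code proposed for the repo; cupy mirrors the numpy API on the GPU):

    import numpy as np
    try:
        import cupy as xp  # GPU-accelerated drop-in for numpy, if available
    except ImportError:
        xp = np            # fall back to plain numpy on CPU

    def stft_magnitudes(audio, n_fft=400, hop_length=160):
        # Framewise magnitude spectra using only the numpy/cupy array API.
        window = xp.hanning(n_fft)
        audio = xp.asarray(audio)
        frames = [
            xp.abs(xp.fft.rfft(audio[start:start + n_fft] * window))
            for start in range(0, audio.shape[0] - n_fft + 1, hop_length)
        ]
        return xp.stack(frames)  # shape: (n_frames, n_fft // 2 + 1)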

I do not know how the PR review was handled, but it should not have been merged;

  • it is too big to review; there is too much code in one PR. Who can say, just by looking at so much code, that this PR does not affect any scenario?
  • too many dependency changes to analyze;
  • it should have been incremental; the PR could have been split into 2-3 smaller PRs, one per feature, so each could be merged and tested independently.

This PR will be the doom of faster-whisper, forcing everyone to stay on older versions, dump the project, or fork from 1.0.3.

It should be reverted and reworked to land incrementally.

@MahmoudAshraf97
Contributor

Hi, I'm one of the co-authors of the batching PR and was one of the top contributors to whisperX. I'll try to address your points one by one, and hopefully it'll lead to a fruitful discussion.

  • messed-up dependencies; faster-whisper had a clean and lean dependency set, but now it depends on torch just for some FFT ops. Audio processing has been switched to torchaudio, which also brings codec-compatibility problems.

  • optimizations could be done in a simpler alternative way; there is no need to move all the code to torch tensors. Instead, speed-ups can be achieved via GPU/CPU-optimized numpy alternatives such as numpy 2.0, cupy, or numba.

Torch is indeed used for feature extraction, but it does provide more than an 8x speedup on CPU, and even more on GPU, compared to the previous numpy implementation, and more speedup can be squeezed out with further optimization. This is the only use of torch after the PR, so I don't understand where we moved ALL the code to torch. As for an alternative optimized implementation, feel free to give it a try and open a PR; we would be more than pleased to replace torch with a better alternative.
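
For context, this is roughly what a torch-based log-mel front end looks like. It is an illustrative sketch patterned after the usual Whisper preprocessing, not the actual faster-whisper code; mel_filters is assumed to be a precomputed (n_mels, n_fft // 2 + 1) matrix. The speedup comes mainly from replacing a per-frame numpy FFT loop with a single batched torch.stft call:

    import torch

    def log_mel_sketch(audio: torch.Tensor, mel_filters: torch.Tensor,
                       n_fft: int = 400, hop_length: int = 160) -> torch.Tensor:
        # One batched STFT over the whole waveform instead of a per-frame FFT loop.
        window = torch.hann_window(n_fft, device=audio.device)
        stft = torch.stft(audio, n_fft, hop_length, window=window, return_complex=True)
        magnitudes = stft.abs() ** 2                       # (n_freqs, n_frames)
        mel_spec = mel_filters.to(audio.device) @ magnitudes
        log_spec = torch.clamp(mel_spec, min=1e-10).log10()
        log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
        return (log_spec + 4.0) / 4.0                      # normalized log-mel features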

  • not a speed-up but a slowdown; for non-batched and CPU-based operation, a slowdown is observed instead of an optimization.

The slowdown is observed in low-resource environments. We are still working to pinpoint the issue and solve it, and while this is a valid use case, it does not represent the majority of the user base, and those who face issues have the option to stick to v1.0.3 for now until it is resolved.

  • code copied from other projects; it pulls in code from the not-so-active whisperX project, which is hard to maintain and possibly unsuitable for real-time use.

WhisperX has indeed been inactive for a while, but I and several other faster-whisper contributors are also contributors to whisperX, are familiar with both codebases, and have the ability to maintain the code, so rest assured. As for real-time usage, while it's not a direct use case of Whisper, I don't see how the PR affects it, as it's totally backwards compatible.

  • complexity; to make the code batched and "optimized", really bulky code and extra dependencies have crept in.
    it is not worth the changes; in the end, there are alternative ways to do batched processing, as mentioned in this space, without increasing code complexity.

The main PR went through a rigorous review process over more than a month, with over 200 public comments and reviews from multiple users. The code is being refined and cleaned up in several follow-up PRs such as #921 and #936. Regardless of that, you are totally free to leave a review comment with constructive feedback on any PR or, even better, open your own PR implementing any changes you see as useful, or a simpler batching implementation. If the batching PR isn't worth it for your use case, as I mentioned before, you have the option to stick to an older version; faster-whisper is pretty mature, so you won't be missing out on anything 😉

  • it is too big to review; there is too much code in one PR. Who can say, just by looking at so much code, that this PR does not affect any scenario?

Feel free to add more tests to the test suite; any contribution is welcome.

In the end, this PR is the result of tens of hours of work volunteered by several people, and while this is not the first issue opened about the batching PR, it is the first one to be so disrespectful and entitled.
So I suggest showing some respect and sticking to the community guidelines.

@aligokalppeker
Author

aligokalppeker commented Jul 29, 2024

Let me respond to your answers and describe further how this PR ruins faster-whisper. Before that, let me say that by trying to add batching to faster-whisper, you have made it an ineffective, bloated, and harder-to-maintain project;

Torch is indeed used for feature extraction, but it does provide more than an 8x speedup on CPU, and even more on GPU, compared to the previous numpy implementation, and more speedup can be squeezed out with further optimization. This is the only use of torch after the PR, so I don't understand where we moved ALL the code to torch. As for an alternative optimized implementation, feel free to give it a try and open a PR; we would be more than pleased to replace torch with a better alternative.

You have added torch as a pip dependency, and all the interfaces and other structures in the project now depend on the torch tensor type. This makes the code bloated with torch. You changed the core transcription code and put torch all over it.

The slowdown is observed in low-resource environments. We are still working to pinpoint the issue and solve it, and while this is a valid use case, it does not represent the majority of the user base, and those who face issues have the option to stick to v1.0.3 for now until it is resolved.

Even if you manage to fix this issue, there will be many other cases in which the current design and updates make faster-whisper ineffective. For example, one may not need batch processing for streaming transcription.

WhisperX has indeed been inactive for a while, but I and several other faster-whisper contributors are also contributors to whisperX, are familiar with both codebases, and have the ability to maintain the code, so rest assured. As for real-time usage, while it's not a direct use case of Whisper, I don't see how the PR affects it, as it's totally backwards compatible.

Who are you to define the usage targets of faster-whisper, real-time or not? You are not the community, nor an authority. Version 1.0.3 provides a decent way to use faster-whisper in this context, and you should not have the right to degrade that.

The main PR went through a rigorous review process over more than a month, with over 200 public comments and reviews from multiple users. The code is being refined and cleaned up in several follow-up PRs such as #921 and #936. Regardless of that, you are totally free to leave a review comment with constructive feedback on any PR or, even better,

I am looking at the follow-up PRs. Your contributor team's review does not reflect community review. Some of the changes in the PR are so big that even GitHub does not display them by default. I have left my constructive feedback on the PR and in this issue: REVERT THIS COMMIT. Then, if your code includes meaningful parts, you can commit them independently, reaching your desired state part by part (if it is worth it).

open your own PR implementing any changes you see as useful, or a simpler batching implementation. If the batching PR isn't worth it for your use case, as I mentioned before, you have the option to stick to an older version; faster-whisper is pretty mature, so you won't be missing out on anything 😉

You should not be the one redirecting people to older versions because of your bad design decisions. You cannot condemn faster-whisper to a batched-inference solution, or condemn other usages to older versions. You do not have the right to do that. If you want that, do it in your whisper/whisperX repo and leave this one clean and of high quality. Before any further PRs and alternative solutions, this commit should be reverted.

Feel free to add more tests to the test suite; any contribution is welcome.

It is not just tests that ensure code quality and performance; it is also the design and implementation. Even if your suggestion seems viable, it is not an answer to my argument. Clean code and design are needed.

In the end, spending time and effort on complex, bloated work does not make it sacred or beneficial. On the contrary, it creates a mess that is hard to maintain. I respect the high-quality, lean, and flexible design that the original faster-whisper provides. With something similar to that, you would earn the respect you deserve.

Besides that, the effort you and your team have put into this repository does not make you an authority or a decision maker.

@ibrahimdevs

@MahmoudAshraf97 thanks for your precious time and effort in adding batching support to this project 🙏

I think low-resource and CPU-only setups are mostly a development use case. In real products, powerful GPUs are mostly used, so it is better to focus on the question: given more resources, can we make it faster, e.g. with a batching algorithm? There are some performance problems on CPU and in low-resource environments.

@aligokalppeker I think it is better to fork this project from the previous commit, which does not include batching support, and maintain and improve your fork. We are all trying to make open-source projects better for ourselves and others. We spend our precious time on it. We have to be polite and respectful to each other.

@aligokalppeker
Author

aligokalppeker commented Aug 5, 2024

@MahmoudAshraf97 thanks for your precious time and effort in adding batching support to this project 🙏

I think low-resource and CPU-only setups are mostly a development use case. In real products, powerful GPUs are mostly used, so it is better to focus on the question: given more resources, can we make it faster, e.g. with a batching algorithm? There are some performance problems on CPU and in low-resource environments.

@aligokalppeker I think it is better to fork this project from the previous commit, which does not include batching support, and maintain and improve your fork. We are all trying to make open-source projects better for ourselves and others. We spend our precious time on it. We have to be polite and respectful to each other.

@ibrahimdevs You should give this advice to the other guys. If they want to do something so different, they can create a separate project that the community can benefit from.

Your argument boils down to "ignorance is bliss", and you may well be of that kind. But forking is just the easiest way to avoid this mess. What I am trying to do is keep the project alive and true to its roots. The community will understand the merits of this conservative action sooner or later.

By the way, it is doubtful that you spend your precious time on this, as you have little to no activity on GitHub, so keep away from false allegations.

aligokalppeker changed the title from "Revert the batching inference commit !!!!" to "Batching inference commit should be reverted and applied part-by-part for community adaptation !!!!" on Aug 5, 2024
@dodysw

dodysw commented Aug 7, 2024

Hello, while still on topic, can someone:

  1. help identify why, after syncing my fork with this commit, I get this error:

File "/home/dody/playsub/venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 523, in transcribe
features = torch.stack(
RuntimeError: stack expects a non-empty TensorList

  2. explain these continuous warnings, which also confuse me:

Model was trained with pyannote.audio 0.0.1, yours is 3.3.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.4.0+cu121. Bad things might happen unless you revert torch to 1.x.

The latest commit has raised multiple module version conflicts that I had to resolve manually...

@MahmoudAshraf97
Contributor

@dodysw the first issue means that VAD found no speech in the audio.

@dodysw

dodysw commented Aug 8, 2024

@dodysw the first issue means that VAD found no speech in the audio.

Shouldn't the library handle the situation when there's no speech in the audio, e.g. just return early instead of crashing?
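
Something along these lines is the kind of early return I mean (a hypothetical sketch, not the library's actual code; chunk_features is an illustrative name):

    import torch

    def stack_chunk_features(chunk_features):
        # If VAD produced no speech chunks, return None so the caller can yield
        # zero segments instead of crashing on torch.stack of an empty list.
        if not chunk_features:
            return None
        return torch.stack(chunk_features)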

@aligokalppeker
Author

Hello, while still on topic, can someone:

  1. help identify why, after syncing my fork with this commit, I get this error:

File "/home/dody/playsub/venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 523, in transcribe features = torch.stack( RuntimeError: stack expects a non-empty TensorList

  2. explain these continuous warnings, which also confuse me:

Model was trained with pyannote.audio 0.0.1, yours is 3.3.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.4.0+cu121. Bad things might happen unless you revert torch to 1.x.

The latest commit has raised multiple module version conflicts that I had to resolve manually...

Unfortunately, this is another result of the PR's big, bulky change. More and more issues will appear, as the changes are too big to test and verify in all scenarios.

@aligokalppeker
Author

Issue #954 is another sign of the problems caused by this commit.

@ben256

ben256 commented Aug 9, 2024

@aligokalppeker There may be some issues with the current implementation, but there is quite clearly a PR which does resolve all the issues, as I stated in #954. Batched inference with the PR changes actually improved the output over sequential inference, so I think this commit and the overall approach are for the better.

@Purfview
Contributor

Ha, the unhappy customers... Looks like my comment was prophetic. 😜

I still think that merging that PR was not a wise decision; some things there are suboptimal and breaking for some users. Anyway, there is no need to be rude about it; in the end you can fork it and maintain a "cleaner" repo of FW.

@aligokalppeker
Author

Ha, the unhappy customers... Looks like my comment was prophetic. 😜

I still think that merging that PR was not a wise decision; some things there are suboptimal and breaking for some users. Anyway, there is no need to be rude about it; in the end you can fork it and maintain a "cleaner" repo of FW.

@Purfview I have only now seen your comment, since you referenced it in this issue. It summarizes the whole problem with this PR. There are too many things in one PR that can break, and even if all of that is solved, its design turns faster-whisper into "bulky-whisper".

For each feature, alternatives can be discussed and progressed accordingly, to keep faster-whisper fast and lean.

I think, your "fork" suggestion is more appropriate for the authors as you've also mentioned in your comment.

@tammam1998

tammam1998 commented Aug 16, 2024

I agree that this change should be reverted and we should maintain the spirit of keeping faster-whisper lean with minimal bloat and dependencies.

In fact, one of the reasons I decided to use faster-whisper instead of whisperX was that I found whisperX bloated and full of code smells. I would rather we not push faster-whisper in the whisperX direction. Some whisperX features fill a specific need for specific kinds of users, which is great, but I don't think we should push those features down into this repo, especially since whisperX depends on this repo.

I'm just putting my opinion here, and I do acknowledge the hard work people have put in. It's up to the repo owners to decide how to move forward; for now I'm likely sticking to v1.0.3.
