forked from hayabhay/frogbase
Merge pull request hayabhay#13 from hayabhay/dev
Rewrote app to enable saving, browsing & searching transcriptions.
Showing 17 changed files with 902 additions and 325 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
# Custom ignore | ||
local/ | ||
data/ | ||
|
||
### Python template | ||
# Byte-compiled / optimized / DLL files | ||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
|
||
## Changelog | ||
All notable changes to this project will be documented in this file. | ||
|
||
### `v1.0.0a` (2023-02-07) | ||
Since there was some appetite for this, I've rewritten the app to make it a tad cleaner, with a few additional features based on issues raised and personal preferences. | ||
1. Ability to download entire YouTube playlists and upload multiple files at once | ||
2. Ability to browse, filter, and search through saved audio files (for now, this is done with a simple SQLite database & SQLAlchemy ORM) | ||
3. Auto-export of transcriptions in multiple formats (was a feature request) | ||
4. Simple substring-based search for transcript segments. This is done with a simple `LIKE` query on the SQLite database. | ||
5. Fully reworked UI with a cleaner layout and more intuitive navigation. | ||
6. Ability to save Whisper configurations and reuse them, so you don't have to re-enter the same parameters every time. | ||
7. Removed the ability to crop audio after download to simplify the codebase. Also, temporarily removed summarization until GPT-3 integration is complete. | ||
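The substring search in item 4 can be sketched roughly as follows. This is a minimal illustration only: it uses Python's built-in `sqlite3` module rather than the SQLAlchemy ORM the app uses, and the `segments` table, its columns, and the `search_segments` helper are hypothetical names, not taken from the codebase.

```python
import sqlite3


def search_segments(conn, query):
    """Return (id, text) rows whose text contains `query` as a substring."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (
        query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cursor = conn.execute(
        "SELECT id, text FROM segments WHERE text LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return cursor.fetchall()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO segments (text) VALUES (?)",
    [("Hello world",), ("Whisper transcribes audio",), ("100% done",)],
)

matches = search_segments(conn, "whisper")
for segment_id, text in matches:
    print(segment_id, text)
```

Note that SQLite's `LIKE` is case-insensitive for ASCII characters by default, so a lowercase query also matches capitalized text. Passing the pattern as a bound parameter keeps user input out of the SQL string.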
### `v0.0.1` (2022-10-17) | ||
Initial release for demand testing ([PR #1](https://github.com/hayabhay/whisper-ui/pull/1)). | ||
|
||
Features: | ||
- Ability to process media from YouTube & local files | ||
- Whisper transcription | ||
- Basic Hugging Face integration for summarization | ||
|
||
|
||
## Roadmap | ||
[Planned] | ||
|
||
1. Live transcription with Whisper - Will use the [streamlit-webrtc](https://github.com/whitphx/streamlit-webrtc) library. This enables live transcription of audio from a microphone and can be used to take voice notes. | ||
2. CLIP embeddings of transcribed text segments + Faiss index for semantic search | ||
3. GPT-3 integration - One approach is to simply allow an instruct prompt to be entered for a transcript and save the results. Will await feedback before implementing. | ||
4. ... |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,29 +1,33 @@ | ||
# Streamlit UI for OpenAI's Whisper transcription & analytics | ||
|
||
https://user-images.githubusercontent.com/6735526/196173369-27c5ceec-733a-4928-8acb-17cbc2e77a04.mp4 | ||
# Streamlit UI for OpenAI's Whisper | ||
|
||
This is a simple [Streamlit UI](https://streamlit.io/) for [OpenAI's Whisper speech-to-text model](https://openai.com/blog/whisper/). | ||
It lets you automatically select media by YouTube URL or select local files & then runs Whisper on them. | ||
Following that, it will display some basic analytics on the transcription. | ||
Feel free to send a PR if you want to add any more analytics or features! | ||
It lets you download and transcribe media from YouTube videos, playlists, or local files. | ||
You can then browse, filter, and search through your saved audio files. | ||
Feel free to raise an issue for bugs or feature requests or send a PR. | ||
|
||
https://user-images.githubusercontent.com/6735526/216852681-53b6c3db-3e74-4c86-806f-6f6774a9003a.mp4 | ||
|
||
## Setup | ||
This was built & tested on Python 3.9 but should also work on Python 3.7+, as with the original [Whisper repo](https://github.com/openai/whisper). | ||
This was built & tested on Python 3.11 but should also work on Python 3.7+, as with the original [Whisper repo](https://github.com/openai/whisper). | ||
You'll need to install `ffmpeg` on your system. Then, install the requirements with `pip`. | ||
|
||
``` | ||
sudo apt install ffmpeg | ||
pip install -r requirements.txt | ||
``` | ||
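If you want to confirm that `ffmpeg` is visible before launching the app, a quick check along these lines may help. This is only a sketch and not part of the repository; the `find_tool` helper is a hypothetical name.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


ffmpeg_path = find_tool("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; install it before running the app.")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
```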
## Usage | ||
|
||
Once you're set up, you can run the app with: | ||
|
||
``` | ||
streamlit run 01_Transcribe.py | ||
streamlit run app/01_🏠_Home.py | ||
``` | ||
|
||
This will open a new tab in your browser with the app. You can then select a YouTube URL or local file & click "Run Whisper" to run the model on the selected media. | ||
|
||
## Changelog | ||
All notable changes to this project, alongside a potential feature roadmap, will be documented [in this file](CHANGELOG.md). | ||
|
||
## License | ||
Whisper is licensed under [MIT](https://github.com/openai/whisper/blob/main/LICENSE) while Streamlit is licensed under [Apache 2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE). | ||
Everything else is licensed under [MIT](https://github.com/hayabhay/whisper-ui/blob/main/LICENSE). |