Merge pull request hayabhay#13 from hayabhay/dev

Rewrote app to enable saving, browsing & searching transcriptions.
bluelinden · Feb 5, 2023 · 91c938e
2 parents ee92f38 + 334d81c
commit 91c938e
Showing 17 changed files with 902 additions and 325 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,5 +1,6 @@
 # Custom ignore
 local/
+data/
 
 ### Python template
 # Byte-compiled / optimized / DLL files

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -5,7 +5,7 @@ default_stages: [commit]
 repos:
   # Common pre-commit hooks
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.3.0
+    rev: v4.4.0
     hooks:
       - id: check-added-large-files
         args: ['--maxkb=20000']
@@ -27,7 +27,7 @@ repos:
       - id: trailing-whitespace
 
   - repo: https://github.com/Lucas-C/pre-commit-hooks
-    rev: v1.3.1
+    rev: v1.4.2
     hooks:
       - id: forbid-crlf
         name: CRLF end-lines checker
@@ -41,7 +41,7 @@ repos:
         language: python
 
   - repo: https://github.com/hadialqattan/pycln
-    rev: v2.1.1
+    rev: v2.1.3
     hooks:
       - id: pycln
         name: pycln
@@ -62,7 +62,7 @@ repos:
         language: system
 
   - repo: https://github.com/mwouts/jupytext
-    rev: v1.14.1
+    rev: v1.14.4
     hooks:
       - id: jupytext
         name: jupytext
@@ -75,7 +75,7 @@ repos:
           - black==22.1.0
 
   - repo: https://github.com/psf/black
-    rev: 22.8.0
+    rev: 23.1.0
     hooks:
       - id: black
         name: black
@@ -89,7 +89,7 @@ repos:
           - "--line-length=120"
 
   - repo: https://github.com/timothycrosley/isort
-    rev: 5.10.1
+    rev: 5.12.0
     hooks:
       - id: isort
         name: isort
@@ -109,7 +109,7 @@ repos:
           - "--trailing-comma"
 
   - repo: https://github.com/PyCQA/flake8
-    rev: 5.0.4
+    rev: 6.0.0
     hooks:
       - id: flake8
         name: flake8

diff --git a/01_Transcribe.py b/01_Transcribe.py
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,29 @@
+
+## Changelog
+All notable changes to this project will be documented in this file.
+
+### `v1.0.0a` (2023-02-07)
+Since there was some appetite for this, I've rewritten the app to make it a tad cleaner, with a few additional features based on issues raised and personal preferences.
+1. Ability to download entire YouTube playlists and upload multiple files at once
+2. Ability to browse, filter, and search through saved audio files (for now, this is done with a simple SQLite database & SQLAlchemy ORM)
+3. Auto-export of transcriptions in multiple formats (was a feature request)
+4. Simple substring-based search for transcript segments. This is done with a `LIKE` query on the SQLite database.
+5. Fully reworked UI with a cleaner layout and more intuitive navigation.
+6. Ability to save Whisper configurations and reuse them, so the same parameters don't have to be re-entered every time.
+7. Removed the ability to crop audio after download to simplify the codebase. Also, temporarily removed summarization until GPT-3 integration is complete.
+### `v0.0.1` (2022-10-17)
+Initial release for demand testing ([PR #1](https://github.com/hayabhay/whisper-ui/pull/1)).
+
+Features:
+- Ability to process media from YouTube & local files
+- Whisper transcription
+- Basic Hugging Face integration for summarization
+
+
+## Roadmap
+[Planned]
+
+1. Live transcription with Whisper - Will use the [streamlit-webrtc](https://github.com/whitphx/streamlit-webrtc) library. This enables live transcription of audio from a microphone and can be used to take voice notes.
+2. CLIP embeddings of transcribed text segments + Faiss index for semantic search
+3. GPT-3 integration - One approach is to simply allow an instruct prompt to be entered for a transcript and save the results. Will await feedback before implementing.
+4. ...
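
Note on changelog item 4 above: the substring search it describes can be sketched roughly as follows. This is a minimal illustration, not the code from this commit. It uses the standard-library `sqlite3` module rather than the SQLAlchemy ORM the changelog mentions, and the `segment` table, its columns, and the `search_segments` helper are hypothetical names chosen for the example.

```python
import sqlite3


def search_segments(conn, query, limit=20):
    """Return (media_id, start, text) rows whose text contains `query`."""
    # Escape LIKE wildcards so that user input is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    cur = conn.execute(
        "SELECT media_id, start, text FROM segment "
        "WHERE text LIKE ? ESCAPE '\\' "
        "ORDER BY media_id, start LIMIT ?",
        (pattern, limit),
    )
    return cur.fetchall()


# Hypothetical schema and sample rows, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segment (media_id INTEGER, start REAL, text TEXT)")
conn.executemany(
    "INSERT INTO segment VALUES (?, ?, ?)",
    [
        (1, 0.0, "Welcome to the show"),
        (1, 4.2, "Today we talk about Whisper"),
        (2, 1.5, "100% accuracy is unlikely"),
    ],
)
conn.commit()

for row in search_segments(conn, "whisper"):
    print(row)
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so a query for `whisper` also matches `Whisper`. A `LIKE '%...%'` pattern cannot use an ordinary index, which is acceptable for a small local database but is one reason the roadmap's semantic-search item may eventually replace it.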
diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2022 Abhay Kashyap
+Copyright (c) 2023 Abhay Kashyap
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/README.md b/README.md
@@ -1,29 +1,33 @@
-# Streamlit UI for OpenAI's Whisper transcription & analytics
-
-https://user-images.githubusercontent.com/6735526/196173369-27c5ceec-733a-4928-8acb-17cbc2e77a04.mp4
+# Streamlit UI for OpenAI's Whisper
 
 This is a simple [Streamlit UI](https://streamlit.io/) for [OpenAI's Whisper speech-to-text model](https://openai.com/blog/whisper/).
-It let's you automatically select media by YouTube URL or select local files & then runs Whisper on them.
-Following that, it will display some basic analytics on the transcription.
-Feel free to send a PR if you want to add any more analytics or features!
+It lets you download and transcribe media from YouTube videos, playlists, or local files.
+You can then browse, filter, and search through your saved audio files.
+Feel free to raise an issue for bugs or feature requests or send a PR.
+
+https://user-images.githubusercontent.com/6735526/216852681-53b6c3db-3e74-4c86-806f-6f6774a9003a.mp4
 
 ## Setup
-This was built & tested on Python 3.9 but should also work on Python 3.7+ as with the original [Whisper repo](https://github.com/openai/whisper)).
+This was built & tested on Python 3.11 but should also work on Python 3.7+, as with the original [Whisper repo](https://github.com/openai/whisper).
 You'll need to install `ffmpeg` on your system. Then, install the requirements with `pip`.
 
 ```
+sudo apt install ffmpeg
 pip install -r requirements.txt
 ```
 ## Usage
 
 Once you're set up, you can run the app with:
 
 ```
-streamlit run 01_Transcribe.py
+streamlit run app/01_🏠_Home.py
 ```
 
 This will open a new tab in your browser with the app. You can then select a YouTube URL or local file & click "Run Whisper" to run the model on the selected media.
 
+## Changelog
+All notable changes to this project, along with a potential feature roadmap, are documented [in this file](CHANGELOG.md).
+
 ## License
 Whisper is licensed under [MIT](https://github.com/openai/whisper/blob/main/LICENSE) while Streamlit is licensed under [Apache 2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE).
 Everything else is licensed under [MIT](https://github.com/hayabhay/whisper-ui/blob/main/LICENSE).