Conversation
Summary of ChangesHello @Gambitnl, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the stability and user experience of the audio processing pipeline. It addresses critical issues in both the PyAnnote speaker diarization and Ollama-based classification components, making them more resilient to common failure modes like memory exhaustion and API changes. Furthermore, it introduces a live transcription preview feature, providing users with immediate feedback and greater transparency during lengthy processing tasks. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces several significant improvements. It fixes a crash in PyAnnote embedding extraction by making the normalization more robust and adding error handling. It also addresses Ollama memory exhaustion issues by implementing a retry mechanism with low-VRAM settings and an optional fallback model, along with a preflight check to warn users about potential memory issues. Additionally, a new live transcription preview feature has been added to the UI, providing real-time feedback during processing. The changes are well-structured and include corresponding updates to documentation and tests. I've found one area for improvement regarding exception handling to make the code even more robust.
| def _estimate_total_memory_gb(self) -> Optional[float]: | ||
| bytes_per_gb = 1024 ** 3 | ||
| try: | ||
| import psutil # type: ignore | ||
|
|
||
| return psutil.virtual_memory().total / bytes_per_gb | ||
| except Exception: | ||
| pass | ||
|
|
||
| if hasattr(os, "sysconf"): | ||
| try: | ||
| page_size = os.sysconf("SC_PAGE_SIZE") | ||
| phys_pages = os.sysconf("SC_PHYS_PAGES") | ||
| if isinstance(page_size, int) and isinstance(phys_pages, int): | ||
| return (page_size * phys_pages) / bytes_per_gb | ||
| except (ValueError, OSError, AttributeError): | ||
| pass | ||
|
|
||
| if os.name == "nt": | ||
| try: | ||
| import ctypes | ||
|
|
||
| class MEMORYSTATUSEX(ctypes.Structure): | ||
| _fields_: List[tuple[str, Any]] = [ | ||
| ("dwLength", ctypes.c_ulong), | ||
| ("dwMemoryLoad", ctypes.c_ulong), | ||
| ("ullTotalPhys", ctypes.c_ulonglong), | ||
| ("ullAvailPhys", ctypes.c_ulonglong), | ||
| ("ullTotalPageFile", ctypes.c_ulonglong), | ||
| ("ullAvailPageFile", ctypes.c_ulonglong), | ||
| ("ullTotalVirtual", ctypes.c_ulonglong), | ||
| ("ullAvailVirtual", ctypes.c_ulonglong), | ||
| ("ullAvailExtendedVirtual", ctypes.c_ulonglong), | ||
| ] | ||
|
|
||
| status = MEMORYSTATUSEX() | ||
| status.dwLength = ctypes.sizeof(MEMORYSTATUSEX) | ||
| if ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)): | ||
| return float(status.ullTotalPhys) / bytes_per_gb | ||
| except Exception: | ||
| pass | ||
|
|
||
| return None |
There was a problem hiding this comment.
The try...except Exception: blocks used for psutil and ctypes are too broad and can hide unexpected errors. It's better to catch specific exceptions that you expect to handle. For example, psutil might fail to import, or virtual_memory() could raise an OSError. Catching Exception could swallow bugs and make debugging harder.
def _estimate_total_memory_gb(self) -> Optional[float]:
bytes_per_gb = 1024 ** 3
try:
import psutil # type: ignore
return psutil.virtual_memory().total / bytes_per_gb
except (ImportError, AttributeError, OSError):
pass
if hasattr(os, "sysconf"):
try:
page_size = os.sysconf("SC_PAGE_SIZE")
phys_pages = os.sysconf("SC_PHYS_PAGES")
if isinstance(page_size, int) and isinstance(phys_pages, int):
return (page_size * phys_pages) / bytes_per_gb
except (ValueError, OSError, AttributeError):
pass
if os.name == "nt":
try:
import ctypes
class MEMORYSTATUSEX(ctypes.Structure):
_fields_: List[tuple[str, Any]] = [
("dwLength", ctypes.c_ulong),
("dwMemoryLoad", ctypes.c_ulong),
("ullTotalPhys", ctypes.c_ulonglong),
("ullAvailPhys", ctypes.c_ulonglong),
("ullTotalPageFile", ctypes.c_ulonglong),
("ullAvailPageFile", ctypes.c_ulonglong),
("ullTotalVirtual", ctypes.c_ulonglong),
("ullAvailVirtual", ctypes.c_ulonglong),
("ullAvailExtendedVirtual", ctypes.c_ulonglong),
]
status = MEMORYSTATUSEX()
status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
if ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
return float(status.ullTotalPhys) / bytes_per_gb
except (ImportError, AttributeError, OSError):
pass
return NoneFixed issues identified by code review bot: 1. Corrected bug count from 25 to 23 bugs 2. Fixed BUG #12 location (line 120, not 155-161) and improved description 3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass) 4. Removed BUG #20 (invalid - code already implements suggested fix) 5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None) 6. Updated summary counts: Medium 7→8, Low 11→8 7. Renumbered remaining bugs after removals 8. Updated total effort estimate: 34-46h → 32-44h All bugs now verified against actual source code with correct line numbers and accurate descriptions.
* docs: Add comprehensive bug report with 25 identified issues Created docs/KNOWN_ISSUES.md documenting all bugs found during systematic codebase analysis. Bugs are categorized by severity: - Critical (3): Security vulnerabilities and data loss risks - High (4): Crashes and race conditions - Medium (7): Logic errors and resource leaks - Low (11): Edge cases and type inconsistencies Each bug includes: - File location and line numbers - Detailed description and impact assessment - Code examples and suggested fixes - Priority classification Also includes fix priority roadmap with estimated effort (34-46 hours total) and testing recommendations for each category. * fix: Correct inaccuracies in KNOWN_ISSUES.md based on code review Fixed issues identified by code review bot: 1. Corrected bug count from 25 to 23 bugs 2. Fixed BUG #12 location (line 120, not 155-161) and improved description 3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass) 4. Removed BUG #20 (invalid - code already implements suggested fix) 5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None) 6. Updated summary counts: Medium 7→8, Low 11→8 7. Renumbered remaining bugs after removals 8. Updated total effort estimate: 34-46h → 32-44h All bugs now verified against actual source code with correct line numbers and accurate descriptions. --------- Co-authored-by: Claude <noreply@anthropic.com>
No description provided.