Conversation

@Camier (Member) commented on Nov 14, 2025

Summary

  • add package metadata and pytest configuration helpers so src imports resolve reliably during testing (a sketch of such a helper appears after this list)
  • make Excel export more robust by adding an openpyxl dependency and a CSV fallback when the writer backend is unavailable
  • refactor the semantic search, research assistant, and TUI initialization paths so heavy dependencies are mocked or lazily loaded when credentials or network access are absent
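
For reference, a minimal sketch of the kind of conftest.py helper the first bullet describes; the directory layout and environment variable names here are assumptions for illustration, not the PR's actual code:

import os
import sys
from pathlib import Path

# Make the src/ layout importable without installing the package
# (the relative path is an assumption about the repository layout).
ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT / "src"))

# Provide placeholder Reddit credentials so modules that read them at import
# time do not fail during test collection (variable names are assumptions).
os.environ.setdefault("REDDIT_CLIENT_ID", "test-client-id")
os.environ.setdefault("REDDIT_CLIENT_SECRET", "test-client-secret")
os.environ.setdefault("REDDIT_USER_AGENT", "pytest-suite")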

Testing

  • pytest

Codex Task

Copilot AI review requested due to automatic review settings on November 14, 2025, 03:04

Copilot AI left a comment

Pull Request Overview

This PR improves the robustness and testability of the Reddit scraper application by implementing lazy initialization patterns, adding pytest configuration helpers, and providing graceful fallbacks for Excel export.

Key Changes:

  • Introduces dependency injection and lazy initialization for ResearchAssistant, EnhancedRetrievalSystem, and SemanticSearchEngine to support mocking and deferred heavy operations (see the sketch after this list)
  • Adds pytest configuration with automatic path setup and default environment variables for testing
  • Implements a CSV fallback mechanism when openpyxl is unavailable, with openpyxl added as an explicit dependency
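
As an illustration of the lazy-initialization pattern above, here is a minimal, hypothetical sketch with injectable factories; the parameter names match the review discussion below, but the body is an assumption rather than the PR's actual implementation:

import logging

logger = logging.getLogger(__name__)

class ResearchAssistant:
    # Sketch only: heavy dependencies are created in one place so tests can
    # skip them (auto_initialize=False) or inject fakes via the factories.
    def __init__(
        self,
        *,
        auto_initialize: bool = True,
        scraper_factory=None,            # e.g. AdvancedRedditScraper
        retrieval_system_factory=None,   # e.g. EnhancedRetrievalSystem
    ):
        self._scraper_factory = scraper_factory
        self._retrieval_system_factory = retrieval_system_factory
        self.scraper = None
        self.enhanced_system = None
        if auto_initialize:
            self._initialize_dependencies()

    def _initialize_dependencies(self):
        if self.scraper is None and self._scraper_factory is not None:
            try:
                self.scraper = self._scraper_factory()
            except Exception as exc:
                logger.error("Failed to initialize scraper: %s", exc)
        if self.enhanced_system is None and self._retrieval_system_factory is not None:
            try:
                # Pass the already-built scraper through, as one review comment below suggests.
                self.enhanced_system = self._retrieval_system_factory(scraper=self.scraper)
            except Exception as exc:
                logger.error("Failed to initialize EnhancedRetrievalSystem: %s", exc)

A test can then construct ResearchAssistant(auto_initialize=False) to avoid any network or model setup, or pass factories that return lightweight fakes.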

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

Summary per file:
  • tests/conftest.py: Adds pytest configuration to set up import paths and default Reddit API credentials
  • src/tui.py: Refactors TUI initialization to separate scraper and research assistant setup with better error isolation
  • src/research_assistant.py: Adds lazy initialization support via an auto_initialize parameter and a factory pattern for dependency injection
  • src/enhanced_retrieval.py: Introduces dependency injection for the scraper, semantic engine, and scorer components
  • src/advanced_scraper.py: Adds robust Excel export with a CSV fallback, includes the openpyxl dependency, and reorganizes imports (a hedged export sketch follows this list)
  • src/__init__.py: Creates a package-level __all__ declaration for module exports
  • requirements.txt: Adds the openpyxl dependency for Excel export support
  • pyproject.toml: Adds the openpyxl dependency to the Poetry configuration
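
A hedged sketch of the Excel-export fallback described for src/advanced_scraper.py; the function name and signature are assumptions, and note that the PR's actual code (quoted in a review comment further down) keeps the .xlsx extension, whereas this sketch switches to .csv as the review suggests:

import logging
import pandas as pd

logger = logging.getLogger(__name__)

def export_records(records: list, filename: str) -> None:
    # Sketch only: try the real Excel writer first, then fall back to CSV
    # when the openpyxl backend is not installed.
    df = pd.DataFrame(records)
    try:
        df.to_excel(filename, index=False, engine="openpyxl")
    except ImportError as exc:
        csv_filename = filename[:-5] + ".csv" if filename.lower().endswith(".xlsx") else filename + ".csv"
        logger.warning(
            "Excel export dependency missing (%s); writing CSV to %s instead.",
            exc,
            csv_filename,
        )
        df.to_csv(csv_filename, index=False)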


self,
encoder=None,
model_name: str = "all-MiniLM-L6-v2",
index_factory=faiss.IndexFlatIP,

Copilot AI Nov 14, 2025

[nitpick] The index_factory parameter lacks a type annotation. Consider adding a type hint:

def __init__(
    self,
    encoder=None,
    model_name: str = "all-MiniLM-L6-v2",
    index_factory: type = faiss.IndexFlatIP,
):

Or more precisely, using typing.Callable:

from typing import Callable
# ...
index_factory: Callable[[int], faiss.IndexFlatIP] = faiss.IndexFlatIP,

Comment on lines +44 to +45
scraper_factory=AdvancedRedditScraper,
retrieval_system_factory=EnhancedRetrievalSystem,

Copilot AI Nov 14, 2025

[nitpick] The scraper_factory and retrieval_system_factory parameters lack type annotations. Consider adding type hints for better code clarity:

from typing import Callable, Type

def __init__(
    self,
    obsidian_vault_path: Optional[str] = None,
    *,
    auto_initialize: bool = True,
    scraper_factory: Type[AdvancedRedditScraper] = AdvancedRedditScraper,
    retrieval_system_factory: Type[EnhancedRetrievalSystem] = EnhancedRetrievalSystem,
):

Or if they're meant to be callable factories:

scraper_factory: Callable[[], AdvancedRedditScraper] = AdvancedRedditScraper,
retrieval_system_factory: Callable[[], EnhancedRetrievalSystem] = EnhancedRetrievalSystem,


try:
    self.obsidian_exporter = ObsidianExporter(obsidian_vault_path)
except (FileNotFoundError, Exception):

Copilot AI Nov 14, 2025

[nitpick] Catching Exception after FileNotFoundError is redundant since FileNotFoundError is a subclass of Exception. Consider catching just Exception:

except Exception:
    # If obsidian vault is not available, disable the feature but keep other functionality
    logger.warning("Obsidian vault not available - export to Obsidian disabled")
    self.obsidian_exporter = None
Suggested change:
- except (FileNotFoundError, Exception):
+ except Exception:

import time
import json
import logging
import os

Copilot AI Nov 14, 2025

Duplicate import of the os module. It is imported at line 7 and again at line 12. Remove one of these duplicate imports.

Suggested change:
- import os

Comment on lines +273 to +280
logger.warning(
    "Excel export dependency missing (%s). Falling back to a CSV-formatted "
    "file with an .xlsx extension at %s. Install 'openpyxl' for true Excel "
    "output.",
    exc,
    filename,
)
self._write_tabular_fallback(data, filename)

Copilot AI Nov 14, 2025

Writing CSV-formatted data to a file with an .xlsx extension (as indicated in the warning message at lines 274-276) could be confusing for users and may cause issues with programs that expect valid Excel files. Consider either:

  1. Changing the file extension to .csv when falling back
  2. Or using pandas' to_csv() method with the filename changed to use a .csv extension

Example:

csv_filename = filename.replace('.xlsx', '.csv')
self._write_tabular_fallback(data, csv_filename)
logger.warning("Excel export dependency missing. Created CSV file at %s instead.", csv_filename)
Suggested change:
- logger.warning(
-     "Excel export dependency missing (%s). Falling back to a CSV-formatted "
-     "file with an .xlsx extension at %s. Install 'openpyxl' for true Excel "
-     "output.",
-     exc,
-     filename,
- )
- self._write_tabular_fallback(data, filename)
+ # Change the filename extension to .csv for fallback
+ csv_filename = filename
+ if csv_filename.lower().endswith('.xlsx'):
+     csv_filename = csv_filename[:-5] + '.csv'
+ logger.warning(
+     "Excel export dependency missing (%s). Created CSV file at %s instead. "
+     "Install 'openpyxl' for true Excel output.",
+     exc,
+     csv_filename,
+ )
+ self._write_tabular_fallback(data, csv_filename)


if self.enhanced_system is None:
    try:
        self.enhanced_system = self._retrieval_system_factory()

Copilot AI Nov 14, 2025

The lazy initialization strategy may not work as intended. When EnhancedRetrievalSystem() is instantiated at line 73, its __init__ method will attempt to create a new AdvancedRedditScraper() instance if scraper is None (see line 207 in enhanced_retrieval.py). This bypasses the exception handling in _initialize_dependencies().

To properly support lazy initialization, ResearchAssistant should pass the already-initialized scraper to EnhancedRetrievalSystem:

if self.enhanced_system is None:
    try:
        self.enhanced_system = self._retrieval_system_factory(scraper=self.scraper)
    except Exception as exc:
        logger.error("Failed to initialize EnhancedRetrievalSystem: %s", exc)
        self.enhanced_system = None
Suggested change:
- self.enhanced_system = self._retrieval_system_factory()
+ self.enhanced_system = self._retrieval_system_factory(scraper=self.scraper)


def __init__(
    self,
    encoder=None,

Copilot AI Nov 14, 2025

[nitpick] The encoder parameter lacks a type annotation. Consider adding a type hint for better code clarity and IDE support:

def __init__(
    self,
    encoder: Optional[sentence_transformers.SentenceTransformer] = None,
    model_name: str = "all-MiniLM-L6-v2",
    index_factory=faiss.IndexFlatIP,
):
Suggested change:
- encoder=None,
+ encoder: Optional[sentence_transformers.SentenceTransformer] = None,

import os
import sys
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

Copilot AI Nov 14, 2025

Import of 'Tuple' is not used.

Suggested change:
- from typing import Dict, List, Optional, Tuple
+ from typing import Dict, List, Optional
