Improve scraper initialization and test support #1

Camier · 2025-11-14T03:04:52Z

Summary

add package metadata and pytest configuration helpers so src imports resolve reliably during testing
make Excel export more robust by adding an openpyxl dependency and a CSV fallback when the writer backend is unavailable
refactor the semantic search, research assistant, and TUI initialization paths so heavy dependencies are mocked or lazily loaded when credentials or network access are absent

Testing

pytest

Copilot

Pull Request Overview

This PR improves the robustness and testability of the Reddit scraper application by implementing lazy initialization patterns, adding pytest configuration helpers, and providing graceful fallbacks for Excel export.

Key Changes:

Introduces dependency injection and lazy initialization for ResearchAssistant, EnhancedRetrievalSystem, and SemanticSearchEngine to support mocking and deferred heavy operations
Adds pytest configuration with automatic path setup and default environment variables for testing
Implements a CSV fallback mechanism when openpyxl is unavailable, with openpyxl added as an explicit dependency

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.


| File | Description |
| --- | --- |
| tests/conftest.py | Adds pytest configuration to set up import paths and default Reddit API credentials |
| src/tui.py | Refactors TUI initialization to separate scraper and research assistant setup with better error isolation |
| src/research_assistant.py | Adds lazy initialization support via `auto_initialize` parameter and factory pattern for dependency injection |
| src/enhanced_retrieval.py | Introduces dependency injection for scraper, semantic engine, and scorer components |
| src/advanced_scraper.py | Adds robust Excel export with CSV fallback and includes openpyxl dependency; reorganizes imports |
| src/`__init__.py` | Creates package-level `__all__` declaration for module exports |
| requirements.txt | Adds openpyxl dependency for Excel export support |
| pyproject.toml | Adds openpyxl dependency in Poetry configuration |
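
The tests/conftest.py file itself is not quoted in this thread, so the following is only a minimal sketch of the kind of helper the description implies: it puts the project root on `sys.path` and fills in credentials that are not already set. The environment variable names, placeholder values, and the `configure_test_environment` function are assumptions made for illustration.

```python
import os
import sys
from pathlib import Path

# Hypothetical placeholder credentials; the variable names the project
# actually reads are not shown in this review.
DEFAULT_TEST_ENV = {
    "REDDIT_CLIENT_ID": "test-client-id",
    "REDDIT_CLIENT_SECRET": "test-client-secret",
    "REDDIT_USER_AGENT": "test-user-agent",
}


def configure_test_environment(project_root: Path) -> None:
    """Make the project importable and fill in missing credentials."""
    root = str(project_root)
    if root not in sys.path:
        # Prepend so the checkout takes precedence over an installed copy.
        sys.path.insert(0, root)
    for name, value in DEFAULT_TEST_ENV.items():
        # setdefault leaves variables that are already set untouched.
        os.environ.setdefault(name, value)


# In a conftest.py this would typically run at import time, for example:
# configure_test_environment(Path(__file__).resolve().parents[1])
```

Using `os.environ.setdefault` rather than plain assignment means real credentials exported by a developer or CI job are not overwritten by the test defaults.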


Copilot · 2025-11-14T03:09:24Z

src/enhanced_retrieval.py

+        self,
+        encoder=None,
+        model_name: str = "all-MiniLM-L6-v2",
+        index_factory=faiss.IndexFlatIP,


[nitpick] The index_factory parameter lacks a type annotation. Consider adding a type hint:

    def __init__(
        self,
        encoder=None,
        model_name: str = "all-MiniLM-L6-v2",
        index_factory: type = faiss.IndexFlatIP,
    ):

Or more precisely, using typing.Callable:

    from typing import Callable
    # ...
    index_factory: Callable[[int], faiss.IndexFlatIP] = faiss.IndexFlatIP,
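
A possible variation on the `Callable` suggestion, sketched here with stand-in classes so that it runs without faiss: annotating the return type with a small `Protocol` instead of one concrete index class lets a test double satisfy the same annotation. `VectorIndex`, `FakeIndex`, and `SearchEngine` are hypothetical names, and the list-based `add` signature is a simplification rather than the faiss API.

```python
from typing import Callable, List, Protocol


class VectorIndex(Protocol):
    """Hypothetical structural type covering only what the engine calls."""

    def add(self, vectors: List[List[float]]) -> None: ...


class FakeIndex:
    """Test double standing in for a real index class."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.vectors: List[List[float]] = []

    def add(self, vectors: List[List[float]]) -> None:
        self.vectors.extend(vectors)


class SearchEngine:
    """Simplified stand-in for the constructor under review."""

    def __init__(
        self,
        index_factory: Callable[[int], VectorIndex],
        dimension: int,
    ) -> None:
        # The factory receives the embedding dimension and returns an index.
        self.index = index_factory(dimension)


engine = SearchEngine(index_factory=FakeIndex, dimension=3)
engine.index.add([[0.1, 0.2, 0.3]])
```

Whether the extra `Protocol` is worth it depends on how many index methods the engine relies on; for a single concrete class the plain `Callable` annotation above may be enough.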

Copilot · 2025-11-14T03:09:24Z

src/research_assistant.py

+        scraper_factory=AdvancedRedditScraper,
+        retrieval_system_factory=EnhancedRetrievalSystem,


[nitpick] The scraper_factory and retrieval_system_factory parameters lack type annotations. Consider adding type hints for better code clarity:

    from typing import Callable, Type

    def __init__(
        self,
        obsidian_vault_path: Optional[str] = None,
        *,
        auto_initialize: bool = True,
        scraper_factory: Type[AdvancedRedditScraper] = AdvancedRedditScraper,
        retrieval_system_factory: Type[EnhancedRetrievalSystem] = EnhancedRetrievalSystem,
    ):

Or if they're meant to be callable factories:

    scraper_factory: Callable[[], AdvancedRedditScraper] = AdvancedRedditScraper,
    retrieval_system_factory: Callable[[], EnhancedRetrievalSystem] = EnhancedRetrievalSystem,

Copilot · 2025-11-14T03:09:25Z

src/research_assistant.py

+
        try:
            self.obsidian_exporter = ObsidianExporter(obsidian_vault_path)
        except (FileNotFoundError, Exception):


[nitpick] Listing FileNotFoundError alongside Exception is redundant, since FileNotFoundError is a subclass of Exception and is already caught by it. Consider catching just Exception:

    except Exception:
        # If obsidian vault is not available, disable the feature but keep other functionality
        logger.warning("Obsidian vault not available - export to Obsidian disabled")
        self.obsidian_exporter = None

Suggested change

-        except (FileNotFoundError, Exception):
+        except Exception:
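
To illustrate why the two clauses behave identically, here is a small self-contained sketch. `load_exporter` and the two `init_*` functions are hypothetical stand-ins, not code from the PR.

```python
def load_exporter(path: str) -> object:
    """Hypothetical loader that fails the way a missing vault might."""
    raise FileNotFoundError(path)


def init_with_tuple() -> str:
    try:
        load_exporter("missing-vault")
    except (FileNotFoundError, Exception):
        # FileNotFoundError adds nothing here: Exception already matches it.
        return "disabled"
    return "enabled"


def init_with_broad() -> str:
    try:
        load_exporter("missing-vault")
    except Exception:
        return "disabled"
    return "enabled"


# FileNotFoundError derives from OSError, which derives from Exception.
print(issubclass(FileNotFoundError, Exception))
print(init_with_tuple() == init_with_broad())
```

If the intent is to treat a missing vault differently from other failures, a separate `except FileNotFoundError:` clause placed before `except Exception:` would express that more directly than either form.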

Copilot · 2025-11-14T03:09:25Z

src/advanced_scraper.py

-import time
+import json
 import logging
+import os


Duplicate import of os module. The module is imported at line 7 and again at line 12. Remove one of these duplicate imports.

Suggested change

-import os

Copilot · 2025-11-14T03:09:25Z

src/advanced_scraper.py

+            logger.warning(
+                "Excel export dependency missing (%s). Falling back to a CSV-formatted "
+                "file with an .xlsx extension at %s. Install 'openpyxl' for true Excel "
+                "output.",
+                exc,
+                filename,
+            )
+            self._write_tabular_fallback(data, filename)


Writing CSV-formatted data to a file with an .xlsx extension (as indicated in the warning message at lines 274-276) could be confusing for users and may cause issues with programs that expect valid Excel files. Consider either:

Changing the file extension to .csv when falling back

Or using pandas' to_csv() method with the filename changed to use a .csv extension

Example:

    csv_filename = filename.replace('.xlsx', '.csv')
    self._write_tabular_fallback(data, csv_filename)
    logger.warning("Excel export dependency missing. Created CSV file at %s instead.", csv_filename)

Suggested change

-            logger.warning(
-                "Excel export dependency missing (%s). Falling back to a CSV-formatted "
-                "file with an .xlsx extension at %s. Install 'openpyxl' for true Excel "
-                "output.",
-                exc,
-                filename,
-            )
-            self._write_tabular_fallback(data, filename)
+            # Change the filename extension to .csv for fallback
+            csv_filename = filename
+            if csv_filename.lower().endswith('.xlsx'):
+                csv_filename = csv_filename[:-5] + '.csv'
+            logger.warning(
+                "Excel export dependency missing (%s). Created CSV file at %s instead. "
+                "Install 'openpyxl' for true Excel output.",
+                exc,
+                csv_filename,
+            )
+            self._write_tabular_fallback(data, csv_filename)
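
As an alternative to slicing off the last five characters, the extension swap can be sketched with `pathlib`, which handles the suffix case-insensitively once it is lowercased for comparison. The body of `_write_tabular_fallback` is not shown in this thread, so `csv_fallback_path` and `write_rows_as_csv` below are hypothetical helpers, and the writer assumes every row shares the same keys.

```python
import csv
from pathlib import Path
from typing import Dict, List


def csv_fallback_path(filename: str) -> Path:
    """Return the path to write when Excel output is unavailable."""
    path = Path(filename)
    if path.suffix.lower() == ".xlsx":
        # Swap only the extension; directories and the stem are preserved.
        return path.with_suffix(".csv")
    return path


def write_rows_as_csv(rows: List[Dict[str, object]], filename: str) -> Path:
    """Write rows as CSV under a .csv name and return the path used."""
    target = csv_fallback_path(filename)
    # Assumes all rows have the same keys as the first row.
    fieldnames = list(rows[0]) if rows else []
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target
```

Returning the path that was actually written lets the caller log or report the real filename, which is the point of the review comment.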

Copilot · 2025-11-14T03:09:26Z

src/research_assistant.py

+
+        if self.enhanced_system is None:
+            try:
+                self.enhanced_system = self._retrieval_system_factory()


The lazy initialization strategy may not work as intended. When EnhancedRetrievalSystem() is instantiated at line 73, its __init__ method will attempt to create a new AdvancedRedditScraper() instance if scraper is None (see line 207 in enhanced_retrieval.py). This bypasses the exception handling in _initialize_dependencies().

To properly support lazy initialization, ResearchAssistant should pass the already-initialized scraper to EnhancedRetrievalSystem:

    if self.enhanced_system is None:
        try:
            self.enhanced_system = self._retrieval_system_factory(scraper=self.scraper)
        except Exception as exc:
            logger.error("Failed to initialize EnhancedRetrievalSystem: %s", exc)
            self.enhanced_system = None

Suggested change

-                self.enhanced_system = self._retrieval_system_factory()
+                self.enhanced_system = self._retrieval_system_factory(scraper=self.scraper)
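
The pattern this comment asks for can be sketched end to end with stub classes. Everything below is a simplified stand-in, assuming only what the thread states: the retrieval system builds its own scraper when none is passed, and the assistant exposes `auto_initialize` plus two factory parameters. `StubScraper`, `StubRetrievalSystem`, and `Assistant` are hypothetical names.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class StubScraper:
    """Hypothetical stand-in for AdvancedRedditScraper."""


class StubRetrievalSystem:
    """Hypothetical stand-in for EnhancedRetrievalSystem."""

    def __init__(self, scraper: Optional[StubScraper] = None) -> None:
        # As described in the review: build a scraper only if none is passed.
        self.scraper = scraper if scraper is not None else StubScraper()


class Assistant:
    """Simplified stand-in for ResearchAssistant."""

    def __init__(
        self,
        *,
        auto_initialize: bool = True,
        scraper_factory: Callable[[], Any] = StubScraper,
        retrieval_system_factory: Callable[..., Any] = StubRetrievalSystem,
    ) -> None:
        self._scraper_factory = scraper_factory
        self._retrieval_system_factory = retrieval_system_factory
        self.scraper: Optional[Any] = None
        self.enhanced_system: Optional[Any] = None
        if auto_initialize:
            self._initialize_dependencies()

    def _initialize_dependencies(self) -> None:
        if self.scraper is None:
            try:
                self.scraper = self._scraper_factory()
            except Exception as exc:
                logger.error("Failed to initialize scraper: %s", exc)
        if self.enhanced_system is None:
            try:
                # Reuse the scraper built above instead of creating a second one.
                self.enhanced_system = self._retrieval_system_factory(
                    scraper=self.scraper
                )
            except Exception as exc:
                logger.error("Failed to initialize retrieval system: %s", exc)
```

Note that the retrieval factory is annotated `Callable[..., Any]` here because it is called with a `scraper` keyword argument; a zero-argument `Callable[[], ...]` annotation would not describe that call.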

Copilot · 2025-11-14T03:09:26Z

src/enhanced_retrieval.py

+
+    def __init__(
+        self,
+        encoder=None,


[nitpick] The encoder parameter lacks a type annotation. Consider adding a type hint for better code clarity and IDE support:

    def __init__(
        self,
        encoder: Optional[sentence_transformers.SentenceTransformer] = None,
        model_name: str = "all-MiniLM-L6-v2",
        index_factory=faiss.IndexFlatIP,
    ):

Suggested change

-        encoder=None,
+        encoder: Optional[sentence_transformers.SentenceTransformer] = None,
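
One thing to weigh with this annotation: referring to `sentence_transformers.SentenceTransformer` in a signature normally requires importing the package at module load, which works against the PR's goal of deferring heavy dependencies. A common way to keep both, sketched below under that assumption, is a `TYPE_CHECKING` guard with a string annotation. `LazyEncoderHolder` is a hypothetical class name, not the one used in src/enhanced_retrieval.py.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated by type checkers only; nothing is imported at runtime.
    from sentence_transformers import SentenceTransformer


class LazyEncoderHolder:
    """Hypothetical class showing a typed, lazily loaded encoder."""

    def __init__(
        self,
        encoder: Optional["SentenceTransformer"] = None,
        model_name: str = "all-MiniLM-L6-v2",
    ) -> None:
        self._encoder = encoder
        self.model_name = model_name

    @property
    def encoder(self) -> "SentenceTransformer":
        if self._encoder is None:
            # Deferred import: the heavy dependency loads on first use only.
            from sentence_transformers import SentenceTransformer

            self._encoder = SentenceTransformer(self.model_name)
        return self._encoder
```

Tests can then pass any stand-in object as `encoder` and never trigger the deferred import.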

Copilot · 2025-11-14T03:09:26Z

src/enhanced_retrieval.py

+import os
+import sys
 from datetime import datetime, timezone
+from typing import Dict, List, Optional, Tuple


Import of 'Tuple' is not used.

Suggested change

-from typing import Dict, List, Optional, Tuple
+from typing import Dict, List, Optional

Improve scraper initialization and test support

a283012

Copilot AI review requested due to automatic review settings November 14, 2025 03:04

Camier added the codex label Nov 14, 2025 — with ChatGPT Codex Connector

Copilot started reviewing on behalf of Camier November 14, 2025 03:05 View session

Copilot finished reviewing on behalf of Camier November 14, 2025 03:08


2 participants
