@konard konard commented Oct 25, 2025

Summary

This PR adds integration tests verifying that the web-capture service can download real-world content from Habr.com (a Russian tech publication) with both the Puppeteer and Playwright browser engines.

What Was Tested

The tests verify that we can download the Habr article at https://habr.com/ru/articles/895896 in both supported formats:

  1. Markdown conversion - Verify HTML can be fetched and converted to markdown
  2. PNG screenshots - Verify screenshots can be captured with proper image format

Implementation Details

New Files

  • tests/integration/habr-article.test.js - Integration tests for Habr article downloads (5 test cases)
    • Puppeteer markdown download test
    • Puppeteer image screenshot test
    • Playwright markdown download test
    • Playwright image screenshot test
    • Engine comparison test

Modified Files

  • jest.config.mjs - Added **/tests/integration/**/*.test.js to testMatch patterns
  • src/browser.js - Fixed Playwright adapter to properly handle browser context creation and setUserAgent limitation
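
As a rough illustration of the jest.config.mjs change listed above, the sketch below shows a testMatch array that includes the integration pattern. Only the integration pattern comes from this PR; the first pattern is a hypothetical placeholder for whatever the config already contained, and in the real file the object would be the default export.

```javascript
// Sketch of the testMatch change; in jest.config.mjs this object
// would be the module's default export.
const config = {
  testMatch: [
    '**/tests/unit/**/*.test.js', // hypothetical pre-existing pattern (assumption)
    '**/tests/integration/**/*.test.js', // pattern added by this PR
  ],
};
```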

Test Results

All 5 new tests pass successfully:

PASS tests/integration/habr-article.test.js (76.873 s)
  Habr Article Download Tests
    Puppeteer Engine
      ✓ can download Habr article as markdown (11966 ms)
      ✓ can download Habr article as image screenshot (8658 ms)
    Playwright Engine
      ✓ can download Habr article as markdown (6918 ms)
      ✓ can download Habr article as image screenshot (11104 ms)
    Engine Comparison for Habr Article
      ✓ both engines can successfully download the same Habr article (11244 ms)

Technical Notes

Playwright User Agent Limitation

During implementation, I discovered that Playwright, unlike Puppeteer, doesn't support changing the user agent after page creation: it must be set when the browser context is created. I updated the browser abstraction layer to:

  1. Create a browser context during Playwright browser initialization
  2. Log a warning when setUserAgent() is called on a Playwright page, since the call has no effect

This maintains API compatibility with Puppeteer-style callers while documenting the limitation.

This is acceptable because:

  • Most use cases don't require custom user agents
  • When needed, the user agent can be set via browser context options
  • The warning helps developers understand the limitation
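
The approach described above might look roughly like the following. This is a minimal sketch, not the actual src/browser.js code: the helper name openPlaywrightPage, its options object, and the injectable warn function are all assumptions made for illustration. The only Playwright calls used are browser.newContext() with its userAgent option, context.newPage(), and standard page methods.

```javascript
// Hypothetical helper (not the code in src/browser.js): opens a Playwright
// page whose user agent is fixed when the browser context is created.
async function openPlaywrightPage(browser, { userAgent, warn = console.warn } = {}) {
  // Playwright accepts `userAgent` as an option to browser.newContext().
  const context = await browser.newContext(userAgent ? { userAgent } : {});
  const page = await context.newPage();

  return {
    goto: (url, options) => page.goto(url, options),
    content: () => page.content(),
    screenshot: (options) => page.screenshot(options),
    close: () => context.close(),
    // Kept only so Puppeteer-style callers do not break; it changes nothing.
    async setUserAgent(requested) {
      warn(
        `setUserAgent(${JSON.stringify(requested)}) has no effect on a Playwright page; ` +
          'pass userAgent when the context is created.'
      );
    },
  };
}
```

A caller that needs a custom user agent would pass it up front, for example `openPlaywrightPage(browser, { userAgent: 'MyBot/1.0' })`, instead of calling setUserAgent() later.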

Test Strategy

The tests wait for domcontentloaded rather than networkidle0, because complex pages that keep issuing background requests may not reach network idle before the timeout. The tests verify:

  • HTML content is fetched (> 1000 characters)
  • Markdown conversion produces valid output (> 100 characters with markdown syntax)
  • Screenshots are valid PNG images (correct signature and size)
  • Both engines produce similar results for the same URL
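
The content checks listed above could be expressed with helpers along these lines. This is a sketch under stated assumptions: the function names, the minimum screenshot size, and the regular expression used to detect markdown syntax are illustrative and not taken from the test file. The 100-character threshold comes from the list above, and the eight signature bytes are the standard PNG file signature.

```javascript
// Standard 8-byte PNG file signature: 89 50 4E 47 0D 0A 1A 0A.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: true when the buffer starts with the PNG signature
// and exceeds a minimum size (the 100-byte default is an assumption).
function isPng(buffer, minBytes = 100) {
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length > minBytes &&
    buffer.subarray(0, 8).equals(PNG_SIGNATURE)
  );
}

// Hypothetical helper: true when the text is longer than minLength and
// contains a heading, an inline link, or a bullet item (assumed heuristic).
function looksLikeMarkdown(text, minLength = 100) {
  const markdownSyntax = /^#{1,6}\s|\[[^\]]+\]\([^)]+\)|^[-*]\s/m;
  return typeof text === 'string' && text.length > minLength && markdownSyntax.test(text);
}
```

In a Jest test these would typically be wrapped in assertions such as `expect(isPng(screenshot)).toBe(true)`.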

Fixes

Fixes #7


🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: undefined
@konard konard self-assigned this Oct 25, 2025
…r and Playwright

This commit adds integration tests to verify that we can download the Habr article
(https://habr.com/ru/articles/895896) as both markdown and PNG screenshots using all
supported browser engines (Puppeteer and Playwright).

Changes:
- Added tests/integration/habr-article.test.js with 5 test cases:
  * Puppeteer markdown download test
  * Puppeteer image screenshot test
  * Playwright markdown download test
  * Playwright image screenshot test
  * Engine comparison test verifying both engines work correctly

- Updated jest.config.mjs to include integration tests directory

- Fixed browser.js Playwright adapter to properly handle context creation
  and setUserAgent limitation (Playwright requires user agent to be set
  during context creation, not after page creation)

All tests pass successfully, verifying that both engines can download
real-world content from Habr.com including Russian language articles.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard konard marked this pull request as ready for review October 25, 2025 08:22

konard commented Oct 25, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (275KB)
🔗 View complete solution draft log


The working session has now ended. Feel free to review the solution draft and add any feedback.


Development

Successfully merging this pull request may close these issues.

Add the test that we can actually download habr article (markdown and image): https://habr.com/ru/articles/895896 using all our support engines
