
feat: use manual proxy rotation for web scraping 🔁#765

Merged
ramiAbdou merged 2 commits into main from rami/scraping on Mar 14, 2025

ramiAbdou commented Mar 14, 2025

Description ✏️

This PR will help reduce web-scraping costs by using a set of dedicated proxies instead of the hosted browser instances that various SaaS companies offer. Now that we have a dedicated set of proxies, we need to use them with Puppeteer.

The key is to tell Puppeteer that we are using a proxy server and then authenticate with it, i.e.:

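```typescript
// `browser` is an already-launched Puppeteer Browser and `proxy` is one
// entry from OXYLABS_PROXIES; the credentials come from the environment.
const context = await browser.createBrowserContext({ proxyServer: proxy });

const page = await context.newPage();

// Puppeteer answers the proxy's authentication challenge with these credentials.
await page.authenticate({
  password: OXYLABS_PASSWORD,
  username: OXYLABS_USERNAME,
});
```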
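The description doesn't show how a proxy is chosen from OXYLABS_PROXIES for each browser context. Below is a minimal sketch of one possible approach, assuming the variable holds a comma-separated list of proxy URLs and that simple round-robin rotation is acceptable. `parseProxies` and `ProxyRotator` are hypothetical names, not code from this PR.

```typescript
// Hypothetical helper: assumes OXYLABS_PROXIES is a comma-separated list,
// e.g. "http://host1:7777,http://host2:7777". The real format may differ.
function parseProxies(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Hypothetical round-robin rotator: each call to next() returns the
// following proxy in the list and wraps around after the last one.
class ProxyRotator {
  private readonly proxies: readonly string[];
  private index = 0;

  constructor(proxies: readonly string[]) {
    if (proxies.length === 0) {
      throw new Error("ProxyRotator needs at least one proxy");
    }
    this.proxies = proxies;
  }

  next(): string {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}
```

Each scrape could then open its context with `browser.createBrowserContext({ proxyServer: rotator.next() })` before calling `page.authenticate(...)`. Round-robin spreads requests evenly across the proxies; picking a random index on each call would be an equally simple alternative.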

This PR:

  • Introduces the OXYLABS_USERNAME, OXYLABS_PASSWORD, and OXYLABS_PROXIES environment variables.
  • Removes the DEFAULT_TIMEOUT from the Puppeteer settings, since Puppeteer already defaults to 30 seconds.
  • Uses the user-agents package to generate more realistic, randomized User-Agent headers.
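Because the scraper now depends on three new environment variables, it can help to validate them once at startup so a missing value fails loudly instead of surfacing as a proxy error mid-scrape. A hedged sketch, assuming the same comma-separated format for OXYLABS_PROXIES; `loadProxyConfig` and `ProxyConfig` are hypothetical and not part of this PR.

```typescript
// Hypothetical shape of the scraper's proxy configuration.
interface ProxyConfig {
  username: string;
  password: string;
  proxies: string[];
}

// Reads the three variables introduced by the PR from an env-like record
// and throws, listing every missing variable, if any is absent or empty.
// Assumes OXYLABS_PROXIES is comma-separated (not confirmed by the PR).
function loadProxyConfig(env: Record<string, string | undefined>): ProxyConfig {
  const username = env.OXYLABS_USERNAME;
  const password = env.OXYLABS_PASSWORD;
  const proxies = (env.OXYLABS_PROXIES ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  if (!username || !password || proxies.length === 0) {
    const missing: string[] = [];
    if (!username) missing.push("OXYLABS_USERNAME");
    if (!password) missing.push("OXYLABS_PASSWORD");
    if (proxies.length === 0) missing.push("OXYLABS_PROXIES");
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // Control flow above narrows username and password to non-empty strings.
  return { username, password, proxies };
}
```

In a Node process this would be called as `loadProxyConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.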

Type of Change 🐞

  • Feature - A non-breaking change which adds functionality.
  • Fix - A non-breaking change which fixes an issue.
  • Refactor - A change that neither fixes a bug nor adds a feature.
  • Documentation - A change only to in-code or markdown documentation.
  • Tests - A change that adds missing unit/integration tests.
  • Chore - A change that is likely none of the above.

Checklist ✅

  • I have done a self-review of my code.
  • I have manually tested my code (if applicable).
  • I have added/updated any relevant documentation (if applicable).

ramiAbdou self-assigned this Mar 14, 2025
ramiAbdou marked this pull request as ready for review March 14, 2025 08:41
ramiAbdou merged commit 9dd5b8e into main Mar 14, 2025
2 checks passed
ramiAbdou deleted the rami/scraping branch March 14, 2025 08:41