@vlofgren vlofgren commented Jan 19, 2025

  • Migrate the crawler away from OkHttp and use Java's HttpClient instead
  • Roll our own sitemap parser instead of using Apache's implementation, as the latter was causing memory issues
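
For readers unfamiliar with the JDK client, here is a minimal sketch of what a fetch looks like with `java.net.http.HttpClient`. It is not the code from this PR: the class name `CrawlerFetchSketch`, the helper names, the timeouts, the redirect policy and the user agent string are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; names and settings are illustrative only.
class CrawlerFetchSketch {

    // Build a client with explicit timeouts and no automatic redirects
    // (assumed settings, not taken from the PR).
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
    }

    // Build a GET request carrying a User-Agent header and a per-request timeout.
    static HttpRequest newGet(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .GET()
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(30))
                .build();
    }

    // Send the request synchronously and return the body as a string.
    static String fetch(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds and prints the request; no network traffic is generated.
        HttpRequest request = newGet("https://example.com/", "example-crawler");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Calling `fetch(newClient(), newGet(url, agent))` would perform the actual request; a real crawler would likely stream the body rather than buffer it as a string.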

This change may leave us with connections stuck in TIME_WAIT, as HttpClient doesn't support setting SO_LINGER.
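
For context, this is roughly what the option looks like on a plain `java.net.Socket`, which is the level of control that `HttpClient` does not expose. The sketch assumes the goal is an abortive close (linger enabled with a timeout of 0, so the connection is reset on close instead of lingering in TIME_WAIT); the class and method names are hypothetical and it does not reflect how the crawler configured OkHttp.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical illustration of SO_LINGER on a raw socket.
class LingerSketch {

    // Create an unconnected socket with SO_LINGER enabled and a 0 second timeout.
    // Closing such a socket once connected sends a reset rather than a normal FIN.
    static Socket newAbortiveCloseSocket() throws IOException {
        Socket socket = new Socket();
        socket.setSoLinger(true, 0);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newAbortiveCloseSocket()) {
            // getSoLinger() returns the timeout in seconds, or -1 if disabled.
            System.out.println("SO_LINGER seconds: " + socket.getSoLinger());
        }
    }
}
```

An abortive close discards any unsent data, so it is a trade-off rather than a free fix.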

@vlofgren vlofgren changed the title from "Migrate away from using OkHttp in the crawler, use Java's HttpClient instead" to "Reduce the use of 3rd party code in the crawler" Jan 20, 2025
@vlofgren vlofgren merged commit 2c67f50 into master Jan 20, 2025