issue95: SSLExceptions Handled #99
Conversation
public class FetchedResultHandler implements HttpDownloader.Callback {

    private static final Logger logger = LoggerFactory.getLogger(FetchedResultHandler.class);

    private Storage targetStorage;
    private LinkStorage linkStorage;
    private final Set<String> sslExceptions = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
This data will be lost when the crawler is restarted.
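One way to address this would be to persist the set of hosts with SSL problems and reload it when the crawler starts. A minimal sketch of that idea, assuming the set can be flushed to a file on shutdown; the helper class and file location below are hypothetical and not part of the crawler:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: keeps the in-memory set, but can save/restore it across restarts.
final class SslExceptionStore {

    private final Path file = Paths.get("data/ssl-exceptions.txt"); // hypothetical location
    private final Set<String> hosts = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    // Reload previously recorded hosts when the crawler starts.
    void load() throws IOException {
        if (Files.exists(file)) {
            hosts.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    // Flush the current set to disk, e.g. from a shutdown hook.
    void save() throws IOException {
        Files.createDirectories(file.getParent());
        Files.write(file, hosts, StandardCharsets.UTF_8);
    }

    boolean add(String host) {
        return hosts.add(host);
    }

    boolean contains(String host) {
        return hosts.contains(host);
    }
}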
    public FetchedResultHandler(Storage targetStorage) {
    public FetchedResultHandler(Storage linkStorage, Storage targetStorage) {
        this.linkStorage = (LinkStorage) linkStorage;
We should not do this cast until we make some other changes in the crawler architecture, because it will break other parts of the crawler. Until then, all interactions with the storages need to be done via the method public Object insert(Object obj).
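A minimal sketch of that suggestion, assuming the generic Storage interface exposes insert(Object): keep the field typed as Storage and route links through insert() instead of casting to LinkStorage. The constructor and error handling shown here are illustrative, not the project's actual code.

public class FetchedResultHandler implements HttpDownloader.Callback {

    private static final Logger logger = LoggerFactory.getLogger(FetchedResultHandler.class);

    private Storage targetStorage;
    private Storage linkStorage; // keep the generic Storage type, no cast to LinkStorage

    public FetchedResultHandler(Storage linkStorage, Storage targetStorage) {
        this.linkStorage = linkStorage;
        this.targetStorage = targetStorage;
    }

    private void handleSSLExceptions(LinkRelevance link) {
        try {
            // Interact with the link storage only through the generic insert(Object) method.
            linkStorage.insert(link);
        } catch (Exception e) {
            logger.info("Could not insert link into link storage: {}", link.getURL().toString(), e);
        }
    }
}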
            logger.info("Failed to download URL: {}\n>Reason: {}", link.getURL().toString(), e.getMessage());
        }
    }

    private void handleSSLExceptions(LinkRelevance link) {
Instead of inserting a new URL into the link storage, fixing the Fetcher to work with HTTPS redirects would be a more robust solution.
Can you invest more time in finding a solution that involves changing only the fetcher, i.e., making sure that the fetcher is able to follow redirects to HTTPS links automatically without involving the other parts of the crawler?
That may involve correctly configuring the Apache HttpClient library inside the SimpleHttpFetcher class.
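A minimal sketch of that direction, assuming SimpleHttpFetcher builds its client with Apache HttpClient 4.x; the factory class below is illustrative and only shows how automatic redirect-following could be enabled on the client:

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.LaxRedirectStrategy;

public final class RedirectAwareHttpClientFactory {

    public static CloseableHttpClient create() {
        // Enable automatic redirect handling and bound the redirect chain length.
        RequestConfig config = RequestConfig.custom()
                .setRedirectsEnabled(true)
                .setMaxRedirects(5)
                .build();

        return HttpClientBuilder.create()
                // LaxRedirectStrategy also follows redirects for non-GET methods,
                // so http -> https redirects are resolved inside the fetcher itself.
                .setRedirectStrategy(new LaxRedirectStrategy())
                .setDefaultRequestConfig(config)
                .build();
    }
}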