
issue95: SSLExceptions Handled #99

Closed
maqzi wants to merge 3 commits into ViDA-NYU:master from maqzi:issue95

Conversation

@maqzi maqzi commented Jun 21, 2017

No description provided.

@coveralls

Coverage Status

Coverage increased (+1.4%) to 47.935% when pulling 498fdb0 on maqzi:issue95 into 760bf1c on ViDA-NYU:master.

@maqzi maqzi changed the title SSLExceptions Handled issue95: SSLExceptions Handled Jun 21, 2017
@coveralls

Coverage Status

Coverage increased (+0.1%) to 46.656% when pulling fb1ff0f on maqzi:issue95 into 760bf1c on ViDA-NYU:master.

public class FetchedResultHandler implements HttpDownloader.Callback {

    private static final Logger logger = LoggerFactory.getLogger(FetchedResultHandler.class);

    private Storage targetStorage;
    private LinkStorage linkStorage;
    private final Set<String> sslExceptions = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
Review comment (Member):

This data will be lost when the crawler is restarted.


-    public FetchedResultHandler(Storage targetStorage) {
+    public FetchedResultHandler(Storage linkStorage, Storage targetStorage) {
+        this.linkStorage = (LinkStorage) linkStorage;
Review comment (Member):

We should not do this cast until we make some other changes in the crawler architecture, because it will break other parts of the crawler. Until then, all interactions with the storages need to go through the method public Object insert(Object obj).
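To illustrate the reviewer's point, here is a minimal sketch of interacting with a storage only through insert(Object) instead of downcasting it. Storage and InMemoryLinkStorage below are hypothetical stand-ins written for this example; they are not the real ache classes, and only the insert(Object) signature comes from the review comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the crawler's storage contract described in the review.
interface Storage {
    Object insert(Object obj);
}

// Hypothetical in-memory implementation used only for this sketch.
class InMemoryLinkStorage implements Storage {
    private final List<Object> links = new ArrayList<>();

    @Override
    public Object insert(Object obj) {
        links.add(obj); // every interaction goes through insert()
        return obj;
    }

    int size() {
        return links.size();
    }
}

class StorageInsertExample {
    public static void main(String[] args) {
        Storage linkStorage = new InMemoryLinkStorage();
        // Callers use the generic insert(Object) contract and never
        // cast the storage to a concrete LinkStorage type.
        linkStorage.insert("https://example.com/");
        System.out.println("stored links: " + ((InMemoryLinkStorage) linkStorage).size());
    }
}
```

The cast in main exists only to print the count for the demo; crawler code would stay on the Storage interface.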

        logger.info("Failed to download URL: {}\n>Reason: {}", link.getURL().toString(), e.getMessage());
    }
}

private void handleSSLExceptions(LinkRelevance link) {
Review comment (Member):

Instead of inserting a new URL into the link storage, fixing the Fetcher to work with HTTPS redirects would be a more robust solution.

Can you invest more time in finding a solution that involves changing only the fetcher, i.e., making sure that the fetcher is able to follow redirects to https links automatically, without involving the other parts of the crawler?

That may involve correctly configuring the Apache HttpClient library inside the SimpleHttpFetcher class.
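As a rough sketch of what that configuration could look like: with Apache HttpClient 4.x, a RedirectStrategy controls which redirects are followed automatically. Whether this is the right place to hook into SimpleHttpFetcher is an assumption; the snippet only shows one way to build a redirect-following client with that library.

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.client.LaxRedirectStrategy;

// Hedged sketch, not the actual SimpleHttpFetcher code: build an
// Apache HttpClient 4.x instance that follows redirects automatically.
class RedirectAwareClientSketch {
    static CloseableHttpClient build() {
        return HttpClients.custom()
                // LaxRedirectStrategy also relaxes redirect handling for
                // POST, beyond the GET/HEAD redirects followed by default.
                .setRedirectStrategy(new LaxRedirectStrategy())
                .build();
    }
}
```

This keeps the change inside the fetcher, as the review suggests, rather than routing rediscovered https URLs back through the link storage.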

@maqzi maqzi closed this Jul 26, 2017