issue95: SSLExceptions Handled #99
Conversation
public class FetchedResultHandler implements HttpDownloader.Callback {

    private static final Logger logger = LoggerFactory.getLogger(FetchedResultHandler.class);

    private Storage targetStorage;
    private LinkStorage linkStorage;
    private final Set<String> sslExceptions = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
This data will be lost when the crawler is restarted.
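One way to address this would be to persist the set of hosts with SSL problems and reload it when the crawler starts. A minimal sketch of that idea, assuming the set can be flushed to a file on shutdown; the helper class and file location below are hypothetical and not part of the crawler:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: keeps the in-memory set, but can save/restore it across restarts.
final class SslExceptionStore {

    private final Path file = Paths.get("data/ssl-exceptions.txt"); // hypothetical location
    private final Set<String> hosts = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    // Reload previously recorded hosts when the crawler starts.
    void load() throws IOException {
        if (Files.exists(file)) {
            hosts.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    // Flush the current set to disk, e.g. from a shutdown hook.
    void save() throws IOException {
        Files.createDirectories(file.getParent());
        Files.write(file, hosts, StandardCharsets.UTF_8);
    }

    boolean add(String host) {
        return hosts.add(host);
    }

    boolean contains(String host) {
        return hosts.contains(host);
    }
}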
    public FetchedResultHandler(Storage targetStorage) {
    public FetchedResultHandler(Storage linkStorage, Storage targetStorage) {
        this.linkStorage = (LinkStorage) linkStorage;
We should not do this cast until we make some other changes in the crawler architecture, because it will break other parts of the crawler. Until then, all interactions with the storages need to be done via the method public Object insert(Object obj).
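A minimal sketch of that suggestion, assuming the generic Storage interface exposes insert(Object): keep the field typed as Storage and route links through insert() instead of casting to LinkStorage. The constructor and error handling shown here are illustrative, not the project's actual code.

public class FetchedResultHandler implements HttpDownloader.Callback {

    private static final Logger logger = LoggerFactory.getLogger(FetchedResultHandler.class);

    private Storage targetStorage;
    private Storage linkStorage; // keep the generic Storage type, no cast to LinkStorage

    public FetchedResultHandler(Storage linkStorage, Storage targetStorage) {
        this.linkStorage = linkStorage;
        this.targetStorage = targetStorage;
    }

    private void handleSSLExceptions(LinkRelevance link) {
        try {
            // Interact with the link storage only through the generic insert(Object) method.
            linkStorage.insert(link);
        } catch (Exception e) {
            logger.info("Could not insert link into link storage: {}", link.getURL().toString(), e);
        }
    }
}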
            logger.info("Failed to download URL: {}\n>Reason: {}", link.getURL().toString(), e.getMessage());
        }
    }

    private void handleSSLExceptions(LinkRelevance link) {
Instead of inserting a new URL into the link storage, fixing the Fetcher to work with HTTPS redirects would be a more robust solution.
Can you invest more time in finding a solution that involves changing only the fetcher, i.e., making sure that the fetcher is able to follow redirects to HTTPS links automatically without involving the other parts of the crawler?
That may involve correctly configuring the Apache HttpClient library inside the SimpleHttpFetcher class.
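A minimal sketch of that direction, assuming SimpleHttpFetcher builds its client with Apache HttpClient 4.x; the factory class below is illustrative and only shows how automatic redirect-following could be enabled on the client:

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.LaxRedirectStrategy;

public final class RedirectAwareHttpClientFactory {

    public static CloseableHttpClient create() {
        // Enable automatic redirect handling and bound the redirect chain length.
        RequestConfig config = RequestConfig.custom()
                .setRedirectsEnabled(true)
                .setMaxRedirects(5)
                .build();

        return HttpClientBuilder.create()
                // LaxRedirectStrategy also follows redirects for non-GET methods,
                // so http -> https redirects are resolved inside the fetcher itself.
                .setRedirectStrategy(new LaxRedirectStrategy())
                .setDefaultRequestConfig(config)
                .build();
    }
}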