The parser currently fails tuples when an exception is caught. This is probably not the right behaviour, as the same URL will be refetched and will fail again later (unless of course). Instead, we could use the status stream to mark the URL as failed and record the reason for the failure. Whichever component is in charge of persisting the URL status can then decide what to do with it.
This is related to the discussion in #42
The same logic could be applied to fetch failures as well. Instead of failing the tuples and letting the spout keep track of the number of errors, we'd send them to the status stream. The advantage is that the spout wouldn't have to update anything and would only read.
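
To make the proposal concrete, here is a rough, dependency-free sketch of the idea. It does not use the Storm API, and every name in it (`StatusStreamSketch`, `StatusUpdate`, `StatusStore`, `Status`, `maxErrors`, `shouldFetch`) is hypothetical and only there for illustration. The parser turns an exception into an error update carrying the reason, the component persisting the status counts the errors and decides what to do, and the spout side only reads.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: none of these names come from the project.
class StatusStreamSketch {

    enum Status { FETCHED, ERROR }

    /** One message sent on the (assumed) status stream. */
    static final class StatusUpdate {
        final String url;
        final Status status;
        final String reason; // why the URL failed; null on success

        StatusUpdate(String url, Status status, String reason) {
            this.url = url;
            this.status = status;
            this.reason = reason;
        }
    }

    /** Stand-in for whichever component persists the URL status. */
    static final class StatusStore {
        private final int maxErrors; // assumed policy: give up after this many errors
        private final Map<String, Integer> errorCounts = new HashMap<>();
        private final Map<String, String> lastReason = new HashMap<>();

        StatusStore(int maxErrors) {
            this.maxErrors = maxErrors;
        }

        /** Writes happen here, not in the spout. */
        void update(StatusUpdate u) {
            if (u.status == Status.ERROR) {
                errorCounts.merge(u.url, 1, Integer::sum);
                lastReason.put(u.url, u.reason);
            } else {
                errorCounts.remove(u.url);
                lastReason.remove(u.url);
            }
        }

        /** The spout only reads: is this URL still eligible for fetching? */
        boolean shouldFetch(String url) {
            return errorCounts.getOrDefault(url, 0) < maxErrors;
        }

        String reasonFor(String url) {
            return lastReason.get(url);
        }
    }

    /** Parser step: on exception, emit an ERROR update instead of failing the tuple. */
    static StatusUpdate parse(String url, String content) {
        try {
            if (content == null || content.isEmpty()) {
                throw new IllegalArgumentException("empty content");
            }
            return new StatusUpdate(url, Status.FETCHED, null);
        } catch (RuntimeException e) {
            return new StatusUpdate(url, Status.ERROR, "parse: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        StatusStore store = new StatusStore(2);
        String url = "http://example.com/broken";
        store.update(parse(url, ""));
        System.out.println(store.shouldFetch(url) + " " + store.reasonFor(url));
        store.update(parse(url, ""));
        System.out.println(store.shouldFetch(url));
    }
}
```

A fetch failure could go through the same path with a different reason prefix, so the error-counting policy would live in one place rather than in the spout.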