Retain all relevant header information for resources #68

nightsh · 2020-03-17T17:09:08Z

As we're not downloading resources, we currently have no way of knowing some basic information about them without hitting each URL.

However, downloading would be rather costly, both in time and disk space. We can probably fetch only the headers instead.
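A minimal sketch of the headers-only idea, assuming the servers answer HTTP `HEAD` requests (a `HEAD` response carries the headers but no body). It uses only the Python standard library rather than Scrapy, and the names `WANTED_HEADERS`, `build_head_request`, `pick_headers` and `fetch_resource_headers` are hypothetical, as is the choice of headers, since this issue does not list them:

```python
from urllib.request import Request, urlopen

# Hypothetical selection of headers; the issue does not say which ones are wanted.
WANTED_HEADERS = ("Content-Type", "Content-Length", "Last-Modified", "ETag")


def build_head_request(url):
    """Build a HEAD request, which asks the server for headers without the body."""
    return Request(url, method="HEAD")


def pick_headers(headers, wanted=WANTED_HEADERS):
    """Return the wanted headers that are present, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name.lower()] for name in wanted if name.lower() in lowered}


def fetch_resource_headers(url, timeout=10):
    """Send the HEAD request and keep only the wanted headers."""
    with urlopen(build_head_request(url), timeout=timeout) as response:
        return pick_headers(response.headers)
```

Inside a spider, the equivalent would presumably be a `scrapy.Request` with `method="HEAD"`. Some servers reject or mishandle `HEAD`, so a fallback may be needed.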

One idea is to use Scrapy's cache, if possible, but this needs investigation.

Examples of useful headers to fetch for each downloadable file:

Acceptance criteria:

nightsh · 2020-03-23T10:13:03Z

ETA: 3h

nightsh · 2020-03-30T07:29:05Z

Implemented in #87, pending review and merge.

nightsh self-assigned this Mar 17, 2020

nightsh mentioned this issue Mar 17, 2020

Gather data insights from all scrapers output #52

Closed

9 tasks

nightsh mentioned this issue Mar 30, 2020

Collect headers for resources #87

Merged

osahon-okungbowa closed this as completed in #87 Mar 30, 2020
