This repository has been archived by the owner on Dec 5, 2022. It is now read-only.

End to end integration tests for sources #3

Open
ryanblock opened this issue Apr 17, 2020 · 2 comments
@ryanblock (Contributor) commented Apr 17, 2020
All source scrapers (both crawl and scrape) should be subject to end-to-end integration tests that exercise them against the live cache or the internet.

Crawl: if it is a function, it should execute and return a valid URL, or an object containing { url, cookie }.

Scrape: should load from the live production cache and return a well-formed result.

If the cache misses, the integration test runner can invoke a crawl for that source and write the result to disk locally to complete the test.

@ryanblock ryanblock changed the title End to end source integration tests End to end integration tests for sources Apr 17, 2020
@jzohrab (Contributor) commented Apr 17, 2020

Adding some notes in a Google Doc to think through some test scenarios.

@jzohrab (Contributor) commented Jun 12, 2020

Every test scenario in that Google Doc has been added except this one:

Every date in the live prod cache should be scrapable without returning any errors. We will iterate through all dates in the cache (potentially even every date/time, each of which indicates a data set), and the corresponding scraper should return data for each. Some crawlers use an array of "subregion names" (e.g. county names) to build URLs, but that list changes over time; that would produce cache misses, which must never occur.

I still think this is necessary for successful regeneration and system/data stability.
