Support data source pagination #1

jzohrab · 2020-04-16T21:49:51Z

Description

In coronadatascraper, PR covidatlas/coronadatascraper#835 adds support for ArcGIS data pagination. Some JSON result sets are too big to return in a single response, so requests will need to handle that. Presumably, similar to the GitHub API, the response includes a "nextResultSet" token or similar, and clients can re-query with that token.

We'd need to handle that for both crawls and scrapes. Presumably this could be managed with lambdas, but the cache file naming convention will need to be page-aware, and reading from the cache will need to return all of a crawl's pages.

Describe the solution you'd like

One possibility: include the page number, indexed from zero, after the cache key (or name), e.g., <datetime>-<name>-<page>-<sha>.<ext>.gz. If there is only one page (which will be true in most cases), 'page' would be 0, there would be no other pages, and the content passed to the scraper would just be that single page's content.
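A minimal sketch of that naming proposal, assuming the datetime, name, and sha are computed elsewhere. The helpers `cacheFileName`, `pageOf`, and `orderPages` are hypothetical, not existing functions in this codebase. The page index is parsed from the right-hand end of the file name, since the datetime and source name may themselves contain dashes; this assumes the sha is lowercase hex and the extension contains no dots.

```javascript
// Hypothetical helpers for the proposed page-aware cache file names:
//   <datetime>-<name>-<page>-<sha>.<ext>.gz
// Pages are indexed from zero; a single-page crawl only ever has page 0.

function cacheFileName({ datetime, name, page = 0, sha, ext }) {
  if (!Number.isInteger(page) || page < 0) {
    throw new Error(`page must be a non-negative integer, got ${page}`);
  }
  return `${datetime}-${name}-${page}-${sha}.${ext}.gz`;
}

// Match "-<page>-<sha>.<ext>.gz" anchored at the end of the name, so dashes
// inside the datetime or name don't confuse the parse.
// Assumes a lowercase hex sha and an extension without dots.
const PAGE_SUFFIX = /-(\d+)-([0-9a-f]+)\.([^.]+)\.gz$/;

function pageOf(fileName) {
  const match = PAGE_SUFFIX.exec(fileName);
  if (!match) throw new Error(`not a page-aware cache file name: ${fileName}`);
  return Number(match[1]);
}

// Return every page of one crawl in numeric page order (so page 10 sorts
// after page 2), ready to hand to the scraper as one ordered set.
function orderPages(fileNames) {
  return [...fileNames].sort((a, b) => pageOf(a) - pageOf(b));
}
```

Sorting numerically on the parsed page, rather than lexically on the whole name, keeps page 10 after page 2 for large crawls.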

camjc · 2020-05-13T21:06:35Z

JP is a good example; we hit the 10,000-record limit there.

camjc · 2020-05-13T21:17:54Z

https://services8.arcgis.com/JdxivnCyd1rvJTrY/arcgis/rest/services/v2_covid19_list_csv/FeatureServer/0/query lets you query the dataset from a UI. Here are the main settings to get the JSON we use

camjc · 2020-05-13T21:22:47Z

Hoping someone can advise on how the pagination works; I don't know how it does.

ryanblock · 2020-05-13T22:02:22Z

Agreed. I know nothing about this system. I could really use:

1. A broken source that needs this
2. Clear instructions on how it's broken (e.g. steps to repro)
3. If available, any ideas on how to unbreak things and get the source humming

jzohrab · 2020-05-14T00:56:35Z

Perhaps relevant: covidatlas/coronadatascraper#839, "WIP: add scraper for Singapore ARCGIS". There are a few PRs for ArcGIS pagination open in CDS; it's hard to sort out what's what, though.

jzohrab · 2020-05-17T16:18:09Z

Actually relevant: the PA scraper currently fetches paginated data and handles the pagination itself, e.g.:

$ yarn start --location PA
...
1000 records from "... url ...&resultOffset=0&resultRecordCount=50000&f=json
...
✏️  coronadatascraper-cache/2020-5-17/55c884a3dc1f7fe60c5bb08af5371500.json written
1000 records from "... url ... &resultOffset=1000&resultRecordCount=50000&f=json
...
✏️  coronadatascraper-cache/2020-5-17/de03720e752a8e6478e066e3cb308ee2.json written
1000 records from "... url ... &resultOffset=2000&resultRecordCount=50000&f=json
...

Ref: src/shared/scrapers/PA/, method async function TEMPfetchArcGISJSON(obj, featureURL, date)
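The PA logs above show the pattern: each response holds 1000 records even though resultRecordCount=50000 is requested, so the server appears to cap the page size, and the client advances resultOffset by the number of records it actually received. A minimal sketch of that loop follows; it is not the PA scraper's TEMPfetchArcGISJSON. It assumes an ArcGIS FeatureServer query endpoint returning JSON with a `features` array and `exceededTransferLimit: true` while more records remain. `fetchAllFeatures`, `queryURL`, and the injected `fetchJSON` are hypothetical names, and the `where`/`outFields` values are placeholders rather than the settings any particular source uses.

```javascript
// Hypothetical sketch of offset-based ArcGIS pagination (not the PA scraper's code).
// Assumes fetchJSON(url) resolves to the parsed JSON body of a FeatureServer
// query: { features: [...], exceededTransferLimit?: boolean }.

function queryURL(featureURL, offset, pageSize) {
  const url = new URL(featureURL);
  url.searchParams.set('where', '1=1');   // placeholder filter: all rows
  url.searchParams.set('outFields', '*'); // placeholder: all columns
  url.searchParams.set('resultOffset', String(offset));
  url.searchParams.set('resultRecordCount', String(pageSize));
  url.searchParams.set('f', 'json');
  return url.toString();
}

async function fetchAllFeatures(featureURL, fetchJSON, { pageSize = 50000, maxPages = 1000 } = {}) {
  const pages = []; // one entry per response, e.g. to cache as page 0, 1, 2, ...
  let offset = 0;
  for (let page = 0; page < maxPages; page++) {
    const body = await fetchJSON(queryURL(featureURL, offset, pageSize));
    const features = body.features || [];
    pages.push(features);
    // Stop when the server reports nothing was truncated, or returns nothing
    // (the empty check guards against looping forever on a bad response).
    if (!body.exceededTransferLimit || features.length === 0) {
      return pages;
    }
    // Advance by what was actually received: the server may return fewer
    // records than the requested resultRecordCount.
    offset += features.length;
  }
  throw new Error(`gave up after ${maxPages} pages`);
}
```

Keeping each response as its own page, rather than merging immediately, is what would let a crawler write page-numbered cache files as proposed at the top of this issue; a scraper can then concatenate the pages.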

jzohrab · 2020-05-28T17:48:10Z

A WIP PR will soon close this: https://github.com/covidatlas/li/pull/193/files

jzohrab · 2020-05-31T19:34:07Z

New PR: #218.

jzohrab · 2020-06-12T15:27:52Z

PR merged 🎉

jzohrab added the "enhancement" (New feature or request) label Apr 16, 2020

ryanblock self-assigned this Apr 16, 2020

jzohrab self-assigned this May 28, 2020

jzohrab closed this as completed Jun 12, 2020