GOMNO: proxy external JSON and CSV ebola2014 outbreak data as OutbreakTSS API'd feeds #18

Open
JohnTigue opened this issue Nov 19, 2014 · 5 comments

Comments

@JohnTigue
Owner

There are very few JSON-formatted sources of Ebola outbreak data. Hopefully they can be unified via the Outbreak APIs. At the very least, they must be proxied and made available via the Outbreak API.

Relevant tool: Filtering JSON with pyjsonselect and jss

@JohnTigue
Owner Author

#14 is related

@JohnTigue
Owner Author

HOWTO read JSON wisely in node.js:
http://stackoverflow.com/questions/5726729/how-to-parse-json-using-node-js
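
A minimal sketch of defensive JSON parsing in node.js, for feeds that may arrive truncated or malformed. The helper name `parseJsonSafely` and the sample payloads are made up for illustration; nothing here is taken from outbreak_time_series_reader.

```javascript
// Hypothetical helper: parse a JSON string without letting a malformed
// payload throw into the caller. Returns a small result object instead.
function parseJsonSafely(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // JSON.parse throws a SyntaxError on invalid input.
    return { ok: false, error: err.message };
  }
}

// Made-up sample payloads, not real outbreak data.
const good = parseJsonSafely('{"country": "Liberia", "cases": 42}');
const bad = parseJsonSafely('{"country": "Liberia", cases: }');

console.log(good.ok, good.value.cases);
console.log(bad.ok);
```

Returning a result object rather than throwing lets a proxy skip one bad upstream feed and keep serving the others.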

@JohnTigue
Owner Author

Really should tackle #8 first

@JohnTigue changed the title from "Collect external JSON outbreak data, ingest, provide as Outbreak APIed" to "Proxy external JSON and CSV outbreak data as Outbreak Time Series API'd feeds" on Dec 6, 2014
@JohnTigue
Owner Author

Early on, the plan was to add this functionality to outbreak_time_series_reader, but that conflates two concerns. The plan now is to have a separate site/service that does the messy work and "proxies" the non-OTSS JSON data into OTSS feeds. Doing this will also provide experience and testing for outbreak_time_series_reader in node.js contexts.
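
One way the proxy idea could be structured is a small registry of per-source adapters that each map a foreign JSON layout onto one common time series shape. This is only a sketch: the source names, the input layouts, and the output fields (`date`, `location`, `cases`) are all invented for illustration and are not the real OTSS format or the real upstream feeds.

```javascript
// Hypothetical adapter registry. Every field name below is an assumption.
const adapters = new Map([
  // Imagined source that publishes one row per (date, country).
  ['rowsPerDay', (raw) =>
    raw.rows.map((r) => ({ date: r.date, location: r.country, cases: Number(r.cases) }))],
  // Imagined source keyed by country, then by date.
  ['nestedByCountry', (raw) =>
    Object.entries(raw).flatMap(([country, byDate]) =>
      Object.entries(byDate).map(([date, cases]) => ({ date, location: country, cases })))],
]);

// Convert one upstream payload into the common shape, sorted by date then location.
function toCommonSeries(sourceName, raw) {
  const adapt = adapters.get(sourceName);
  if (!adapt) throw new Error(`no adapter registered for source: ${sourceName}`);
  return adapt(raw).sort(
    (a, b) => a.date.localeCompare(b.date) || a.location.localeCompare(b.location));
}

// Made-up sample input, not real outbreak data.
console.log(toCommonSeries('nestedByCountry', { Liberia: { '2014-11-01': 3 } }));
```

Keeping the adapters separate from the reader is the point of the proxy: a new or changed upstream format means touching one adapter, not outbreak_time_series_reader itself.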

(Read the following for mini-TODOs)
From the early OTSS:
Nonetheless, in order to work with the existing data feeds (and to confirm that the Outbreak Time Series API would cover the most important use cases), outbreak_time_series_reader can read from the pre-existing JSON feeds.

outbreak_time_series_reader can currently read from a few non-Outbreak-API JSON feeds of Ebola2014 data that are on the Web. This was done simply to prove the reusability of the JavaScript object model and, hopefully, to encourage adoption of the Outbreak Time Series APIs by the three known cases:

  • Simon Johnson
  • HDX
  • ebolainliberia.org

Do not expect that to continue to work moving forward (read: this is deprecated and the functionality will be removed some time in 2015). For details of the above listed ebola data feeds, see [[JSON Ebola2014 Data Found on the Web]].

2014-11-20: until HDX have time series data via an API, still need to deal with keeping [[Outbreak Time Series Specification]] data for EbolaMapper fresh. See #9 and #18

outbreak_time_series_reader can read the CSV at:
https://github.com/cmrivers/ebola/
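
For reference, turning a simple CSV into records in node.js can be sketched as below. The column names in the sample are made up and are not the actual columns of the repository above. The parser is deliberately naive: it assumes no quoted fields and no embedded commas, so a real CSV library would be the safer choice for production use.

```javascript
// Naive CSV-to-records sketch: the first non-empty line is the header.
// Assumes no quoted fields or embedded commas.
function parseSimpleCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',');
    const record = {};
    // Missing trailing cells become empty strings.
    header.forEach((name, i) => { record[name] = (cells[i] || '').trim(); });
    return record;
  });
}

// Made-up sample, not real outbreak data.
console.log(parseSimpleCsv('date,country,cases\n2014-11-01,Liberia,3\n'));
```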

What about:
https://github.com/luiscape/hdx-datastorer-ebola-cases/blob/master/data/data.csv
That is not updated frequently, so it is perhaps not important. That's just one guy versus Caitlin Rivers' team of 27, which updates daily.

@JohnTigue changed the title from "Proxy external JSON and CSV outbreak data as Outbreak Time Series API'd feeds" to "GOMNO: proxy external JSON and CSV ebola2014 outbreak data as OTSS API'd feeds" on Dec 9, 2014
@JohnTigue
Owner Author

This is to be implemented with Apache Spark for two reasons.

  1. Spark 1.1 has very nice JSON reading machinery which can detect implicit schemas in JSON: JsonTable.fromRDD(sqlContext,,)
  2. Currently this is not a big data problem in terms of volume (though it will evolve into one), but it is in terms of variety (using IBM's 3 Vs: volume, velocity, and variety). In fact, this problem could eventually involve all 3 Vs.
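
To make "detecting an implicit schema" concrete, here is a plain node.js illustration of the idea. It is not Spark code and does not reflect how Spark implements it; `inferSchema` and the sample records are hypothetical. It scans a set of JSON records and reports, for each top-level field, the set of value types observed.

```javascript
// Name the JSON-level type of a value (typeof alone reports null and arrays as "object").
function typeName(v) {
  if (v === null) return 'null';
  if (Array.isArray(v)) return 'array';
  return typeof v;
}

// Hypothetical helper: union of observed types per top-level field.
function inferSchema(records) {
  const seen = new Map();
  for (const rec of records) {
    for (const [key, value] of Object.entries(rec)) {
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key).add(typeName(value));
    }
  }
  const schema = {};
  for (const key of [...seen.keys()].sort()) {
    schema[key] = [...seen.get(key)].sort();
  }
  return schema;
}

// Made-up records with inconsistent fields, the "variety" problem in miniature.
console.log(inferSchema([
  { country: 'Liberia', cases: 3 },
  { country: 'Guinea', cases: '5', deaths: null },
]));
```

A field that shows up with more than one type, or only in some records, is exactly the kind of inconsistency a proxy has to normalize before emitting a uniform feed.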

See:

@JohnTigue changed the title from "GOMNO: proxy external JSON and CSV ebola2014 outbreak data as OTSS API'd feeds" to "GOMNO: proxy external JSON and CSV ebola2014 outbreak data as OutbreakTSS API'd feeds" on Dec 14, 2014
@JohnTigue added this to the "Milestone 1: MVP for Web clients" milestone on Dec 14, 2014