Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process which produces a JSON cache of all public feeds #67

Open
chrisbartley opened this issue Mar 25, 2022 · 2 comments
Open

Add process which produces a JSON cache of all public feeds #67

chrisbartley opened this issue Mar 25, 2022 · 2 comments
Assignees

Comments

@chrisbartley
Copy link
Member

Sites like environmentaldata.org suffer from painfully slow load times as they try to load all of ESDR's public feeds. It might be nice to have a JSON cache of all public feeds which gets updated regularly (every 1-5 minutes?) and contains some essential, but minimal info about each feed.

Perhaps the following:

  • id
  • name
  • latitude
  • longitude
  • lastUpload / maxTimeSecs
  • exposure?

A query like this is a decent start:

select productId,
       id,
       name,
       latitude,
       longitude,
       UNIX_TIMESTAMP(lastUpload) as lastUploadSecs,
       maxTimeSecs,
       deviceId
from Feeds
where isPublic = 1
order by productId, deviceId, id desc;

Ideas to consider:

  • Store the JSON under ESDR's public directory, maybe in some subdirectory denoting it as a cache.
  • Multiple versions, with differing amounts of info.
  • Abbreviated field names in the interest of file size OR using some more compact format such as an array of arrays.
  • Group by product ID?
  • Sort by productId asc, deviceId asc, feedId desc...so that the most recent feed for a device comes first?
  • Also generate separate JSON files per product?

Possible JSON format:

{
   "version" : 1,
   "fields" : ["id", "name", "latitude", "longitude", "lastUploadSecs", "maxTimeSecs", "deviceId"],
   "feedsByProductId" : {
      "1" : [
         [26087, "West Mifflin ACHD", 40.363144, -79.864837, 1576762626, 1576686600, 26017],
         [59665, "Pittsburgh ACHD", 40.4656, -79.9611, 1648222891, 1648218600, 56652]
      ],
      "8" : [
         [4268, "CREATE Lab Test", 40.44382296127876, -79.94647309184074, 1635191877, 1635189738.36, 4260],
         [4277, "Red Room", 40.34107763959465, -80.069620013237, 1484140498, 1484084287, 4265]
      ],
      "9" : [
         [4245, "Wifi Speck 6", 40.443738, -79.946481, 1422565308, 1422565433, 4230],
         [4313, "Outdoors", 40.50156314996969, -80.06125688552856, 1432395167, 1431359910, 4291]
      ]
   }
}
@rsargent
Copy link
Member

rsargent commented Mar 25, 2022 via email

@chrisbartley
Copy link
Member Author

Important to enable gzip compression to speed loading;

Compression is on by default with all ESDR responses, but yes. I like the idea of pre-gzipping. We could also pre-gzip and configure things so that they get served by Apache without having to proxy to Node.

We should consider sharding it into 10-25 chunks to enable parallel download and incremental build of the map.

Suggestions for sharding strategies welcome and appreciated. My first thought was to shard by product ID (because some projects care about only one or a select few products), but it gets kinda dumb quickly...almost half of the products have fewer than 10 feeds, many with only 1. We currently have 113645 public feeds, but 90% of them are PurpleAir (but that will drop once the conversion to purpleair_v2 is done).

Maybe settle on a fixed number of shards--or a fixed max number of feeds per shard (5k? 10K?)--along with a "table of contents" JSON which tells you which shard(s) to fetch for which product ID. PurpleAir would be sharded into multiple, whereas some other single shard would contain everything other than the top three in the table below (a total of 4338 feeds).

+--------+---------+----------------------------------------+
|numFeeds|productId|name                                    |
+--------+---------+----------------------------------------+
|77320   |69       |PurpleAir                               |
|25486   |101      |purpleair_v2                            |
|6501    |11       |AirNow                                  |
|1076    |98       |airnow_reporting_area                   |
|797     |9        |speck_v1                                |
|659     |84       |airnow_aqi_highest_five_city            |
|241     |88       |sharp1014                               |
|241     |87       |muxed_bme280_and_sgp30                  |
|231     |22       |retigo_filestore_test4                  |
|205     |102      |TCEQ                                    |
|192     |34       |retigo_filestore_test5                  |
|110     |82       |voc_site                                |
|94      |78       |VOC                                     |
|88      |68       |RAMP                                    |
|65      |80       |AWBA_PurpleAir                          |
|59      |99       |airviz_pmvoc_a_v1                       |
|44      |35       |BAAQMD                                  |
|28      |58       |awair                                   |
|27      |8        |cattfish_v1                             |
|22      |1        |ACHD                                    |
|17      |81       |AWBA_Valero                             |
|15      |94       |RISE_smoke_recognition_v1               |
|15      |97       |RISE_smoke_recognition_v3               |
|14      |93       |AWBA_Chevron                            |
|11      |86       |AWBA_FencelineMartinez                  |
|11      |100      |clarity                                 |
|10      |75       |filtrete_smart_sensor                   |
|9       |36       |fenceline_org                           |
|8       |71       |Halmon                                  |
|5       |66       |noaa_nws_forecast_office                |
|5       |29       |cpb                                     |
|5       |85       |airnow_aqi_highest_five_ranking         |
|4       |44       |flamingo_v1                             |
|3       |39       |AcuRite                                 |
|3       |89       |smoke_recognition                       |
|2       |83       |AWBA_FencelineRodeo                     |
|2       |76       |Argos                                   |
|1       |62       |AlphaSense_OPCN2                        |
|1       |42       |shenango_channel_smoke_detection_product|
|1       |63       |plant_module                            |
|1       |14       |retigo_test_product                     |
|1       |79       |ArgosSpectrometer                       |
|1       |46       |EnviroDylosV1_1                         |
|1       |64       |hound                                   |
|1       |17       |retigo_test_product2                    |
|1       |47       |DylosDC1700                             |
|1       |95       |RISE_smoke_recognition_v2               |
|1       |26       |teom                                    |
|1       |56       |dylos_dc1100_pro_pc                     |
|1       |70       |HAL                                     |
|1       |55       |grimm_edm_180                           |
|1       |59       |air_quality_egg_pm                      |
|1       |10       |maap_5012                               |
|1       |54       |foobot                                  |
|1       |38       |avalon_wind                             |
|1       |57       |blueair_aware                           |
|1       |77       |adafruit_bme680                         |
+--------+---------+----------------------------------------+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants