
Create a new DisplayField Schema #264

Closed
vforgione opened this issue Mar 16, 2018 · 9 comments
vforgione (Member) commented Mar 16, 2018
We should add another schema that holds information about what fields should be displayed on search results.

For most cases, this is similar to the virtual fields and constraints: give the owner a list of the fields attached to the meta. Where this gets tricky is with jsonb fields. For these, we need to inspect unique paths within the JSON and provide them as well.

For example, if the data set has fields id, lat, lon, timestamp and observations, where observations is a JSON field with the following structure:

{
  "sensor1": {
    "temperature": "whatever"
  },
  "sensor2": {
    "temperature": "whatever",
    "humidity": "whatever"
  }
}

we should display:

  • id
  • lat
  • lon
  • timestamp
  • observations.sensor1.temperature
  • observations.sensor2.temperature
  • observations.sensor2.humidity
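The flattening described above is a recursive walk that emits a dot-separated path for every scalar leaf. A minimal sketch of that logic (in Python for brevity; the project itself is Elixir/Postgres, and the function name here is made up for illustration):

```python
def key_paths(value, prefix=""):
    """Recursively collect dot-separated paths to every scalar leaf."""
    paths = []
    for key, val in value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(val, dict):
            # Descend into nested objects, carrying the path prefix along.
            paths.extend(key_paths(val, path))
        else:
            # Scalar leaf: record the full path.
            paths.append(path)
    return sorted(paths)

row = {
    "id": 1, "lat": 41.8, "lon": -87.6, "timestamp": "2018-03-16T09:19:17",
    "observations": {
        "sensor1": {"temperature": "whatever"},
        "sensor2": {"temperature": "whatever", "humidity": "whatever"},
    },
}
print(key_paths(row))
# → ['id', 'lat', 'lon', 'observations.sensor1.temperature',
#    'observations.sensor2.humidity', 'observations.sensor2.temperature',
#    'timestamp']
```

The same walk works per-row, which matters later in the thread: since rows can have different shapes, paths have to be collected across rows, not derived from a single one.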
@vforgione vforgione added the core label Mar 16, 2018
@vforgione vforgione added this to the Data Set Fix Ups milestone Mar 16, 2018
HeyZoos (Collaborator) commented Mar 16, 2018

When you say display do you mean API results or frontend results? Would this schema be used to return json objects that look like:

{
    "observations.sensor1.temperature": 1,
    "observations.sensor2.temperature": 2,
    "observations.sensor2.humidity": 3
}

vforgione (Member, Author)

Front end results: search specifically. The way graphs are shown right now in the general explorer app is that I grab the first few numeric fields of a data set and graph them, regardless of whether they are significant to the actual data set. It's a very naive approach.

In the case of AoT, or any other data set that has embedded JSON, I use a dummy set of fields and values to draw the graphs.

What I want to be able to do is give data set owners the ability to highlight significant fields in their data sets to be used in displaying search results.

HeyZoos (Collaborator) commented Mar 16, 2018

So a DisplayField should give me the information I need to dive down into a json field and extract a value?

HeyZoos (Collaborator) commented Mar 16, 2018

I think it's a little overkill, especially if it's just for displaying search results. I could see providing those values through a simple query with json_object_keys and the json operators postgres provides.

vforgione (Member, Author) commented Mar 16, 2018

Kind of. I'm thinking of it like this:

fields = DisplayFieldActions.list(for_meta: meta)

query = """
select
  <%= for field <- fields do %>
  avg(<%= field %>),
  <% end %>
  ...
"""

or something less crappy. You get the idea though.

Edit: ideally, we would use an ecto query and the model for the data set from the registry.

vforgione (Member, Author)

The thing is, I have no way of limiting fields to display in search results without some arbitrary cut off. Especially when it comes to jsonb fields, doing a full introspection and then graphing those values not only looks like garbage on the front end, but can potentially be dangerously expensive to the DB and the web server.

This is a simple solution to that.

HeyZoos (Collaborator) commented Mar 16, 2018

Do you mean limiting the number of fields to display? I feel like the cutoff is still arbitrary with the new schema. I also don't see how it's expensive for the database. We assume that the json values all have the same "shape", so it's a matter of grabbing the first row, deriving the structure from that, and using that information in your subsequent queries.

In regards to the snippet you posted, you'd grab the fields value with a postgres query that maps the key structure of the first row. It's the same number of trips to the database but without a new schema that introduces 1:M (Meta to DisplayField) and 1:1 (DisplayField to DataSetField) relationships.

vforgione (Member, Author)

Here are some counterexamples. In the case of Beach Lab Culture Tests, we have the following fields:

Name                        Type
--------------------------  -----------
Beach                       text
Culture Note                text
Culture Reading Mean        float
Culture Sample 1 Reading    float
Culture Sample 1 Timestamp  timestamptz
Culture Sample 2 Reading    float
Culture Sample 2 Timestamp  timestamptz
Culture Sample Interval     integer
Culture Test ID             integer
DNA Reading Mean            float
DNA Sample 1 Reading        float
DNA Sample 2 Reading        float
DNA Sample Timestamp        timestamptz
DNA Test ID                 integer
Latitude                    float
Location                    text
Longitude                   float

These fields include DNA testing data because the source of the data didn't exclude it from the results for whatever reason. If we were to display every numeric field here, the result would 1) be crowded given the number of fields and 2) include fields that have little to no relevance to the core of the data set.

In the case of Array of Things data, not every observation sub-document contains identical keys. It's true that most nodes have a relatively stable set of sensors, but as the builds continue, boards will be added and removed. Additionally, on the back end of AoT they scrub certain malformed data (such as temperatures that read 98 bajillion degrees). If we were to naively grab the first row, we could not guarantee that it is an accurate representation of the data set. Additionally, jsonb_object_keys only gets the top level keys:

 node_id |       array_agg        
---------+------------------------
 034     | {BMP180,HTU21D,TSYS01}
 056     | {BMP180,HTU21D,TSYS01}
 032     | {BMP180,HTU21D,TSYS01}
 048     | {BMP180,HTU21D,TSYS01}
 08D     | {BMP180,HTU21D,TSYS01}
 091     | {BMP180,HTU21D,TSYS01}
 03D     | {HTU21D,TSYS01}
 05D     | {BMP180,HTU21D,TSYS01}
 053     | {BMP180,HTU21D,TSYS01}
 086     | {BMP180,HTU21D,TSYS01}
 06B     | {BMP180,HTU21D,TSYS01}
 037     | {BMP180,HTU21D,TSYS01}
 00D     | {BMP180,HTU21D,TSYS01}
...

Note that this is an aggregate of all the records. For most of these nodes, only two sensors were reporting in for a while; only recently did a third board start reporting in, and there are a dozen or so others still to come.

So, to me, having a predetermined, limited set of fields that we pull from to draw graphs for search results greatly simplifies mitigating these pitfalls.

vforgione (Member, Author) commented Apr 2, 2018

Idea to surface key paths from embedded JSONB fields:

create or replace function jsonb_key_paths(_value jsonb)
  returns text[]
  language sql as
$$
with recursive _struct (key, value) as (

  select
    t.key,
    t.value

  from
    jsonb_each(_value) as t

  union all

  select
    concat(_struct.key, '.', t.key),
    t.value

  from
    _struct,
    jsonb_each(
      case
        when jsonb_typeof(_struct.value) <> 'object'
        then '{}'::jsonb
        else _struct.value
      end
    ) as t

)

select array(
  select distinct key
  from _struct
  where jsonb_typeof(_struct.value) not in ('array', 'object')
  order by key
)
$$;

In the case of AoT Chicago, we have a data set table with an observations field that has non-deterministic depth and breadth data in JSON format. We want to be able to surface those embedded keys so that we can highlight them in graphs. Using a function like this one, we can show the data set owner a list of keys that exist in the data set that they can use to build graphs.

For example, the result of the function is

select node_id, timestamp, observations, jsonb_key_paths(observations) from "ds_no3g4DGcITlkTTYu" limit 10;

 node_id |       timestamp        |                                                              observations                                                              |                              jsonb_key_paths
---------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------
 02F     | 2018-03-16 09:19:17-05 | {"BMP180": {"temperature": -1.65}, "HTU21D": {"humidity": 71.94, "temperature": -0.78}, "TSYS01": {"temperature": -1.1}}               | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:19:40-05 | {"BMP180": {"temperature": -1.6}, "HTU21D": {"humidity": 72.52, "temperature": -0.72}, "TSYS01": {"temperature": -1.04}}               | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:20:03-05 | {"BMP180": {"temperature": -1.7000000000000002}, "HTU21D": {"humidity": 71.33, "temperature": -0.79}, "TSYS01": {"temperature": -1.1}} | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:20:29-05 | {"BMP180": {"temperature": -1.7000000000000002}, "HTU21D": {"humidity": 71.29, "temperature": -0.8}, "TSYS01": {"temperature": -1.15}} | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:20:52-05 | {"BMP180": {"temperature": -1.65}, "HTU21D": {"humidity": 72.51, "temperature": -0.72}, "TSYS01": {"temperature": -1.07}}              | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:21:15-05 | {"BMP180": {"temperature": -1.6}, "HTU21D": {"humidity": 72.33, "temperature": -0.75}, "TSYS01": {"temperature": -1.09}}               | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:21:39-05 | {"BMP180": {"temperature": -1.65}, "HTU21D": {"humidity": 72.3, "temperature": -0.74}, "TSYS01": {"temperature": -1.1}}                | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:22:02-05 | {"BMP180": {"temperature": -1.6}, "HTU21D": {"humidity": 72.8, "temperature": -0.72}, "TSYS01": {"temperature": -1.08}}                | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:22:26-05 | {"BMP180": {"temperature": -1.6}, "HTU21D": {"humidity": 73.07, "temperature": -0.67}, "TSYS01": {"temperature": -1.02}}               | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
 02F     | 2018-03-16 09:22:50-05 | {"BMP180": {"temperature": -1.6}, "HTU21D": {"humidity": 72.77, "temperature": -0.6900000000000001}, "TSYS01": {"temperature": -1.05}} | {BMP180.temperature,HTU21D.humidity,HTU21D.temperature,TSYS01.temperature}
(10 rows)

which we can further aggregate to get a master list of keys for a given data set.
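That aggregation step is just a distinct union over the per-row path arrays (in Postgres, roughly an unnest of jsonb_key_paths results fed into a distinct array_agg). A minimal Python stand-in for the idea, with hypothetical sample values:

```python
# Per-row outputs of jsonb_key_paths (sample values, not real AoT data).
rows = [
    ["BMP180.temperature", "HTU21D.humidity", "HTU21D.temperature", "TSYS01.temperature"],
    ["HTU21D.humidity", "HTU21D.temperature", "TSYS01.temperature"],  # node missing a board
]

# The "master list" is the distinct union of every path seen in any row.
master = sorted({path for row in rows for path in row})
print(master)
# → ['BMP180.temperature', 'HTU21D.humidity', 'HTU21D.temperature',
#    'TSYS01.temperature']
```

Because it unions across all rows, a path reported by even a single node still makes the list, which is exactly what the first-row approach discussed above cannot guarantee.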

Or, a simpler example:

plenario_dev=# SELECT jsonb_key_paths('{"foo":{"bar":"baz","boom":"bang","bleh":{"barf":"yack"}}, "top": "level"}');
           jsonb_key_paths            
--------------------------------------
 {foo.bar,foo.bleh.barf,foo.boom,top}
(1 row)

vforgione pushed a commit that referenced this issue Apr 4, 2018
Adding a new schema so that data set owners and admins can store
metadata about data sets to more easily generate charts/graphs.

- migration for schema
- migration that adds function to recursively name subkeys in jsonb
  fields
- schema file with function to surface fields
- test for function

Updates #264
vforgione pushed a commit that referenced this issue Apr 4, 2018
- added changesets, queries, actions
- wrote tests for actions

Updates #264
vforgione pushed a commit that referenced this issue Apr 4, 2018
give it:
- a chart
- a truncation span ("minutes", "hours", "days", etc)
- (optional) time span
- (optional) bbox

and you get aggregated data for the truncated timestamps

Updates #264
vforgione pushed a commit that referenced this issue Apr 5, 2018
can now get results aggregated to points

Updates #264
vforgione pushed a commit that referenced this issue Apr 5, 2018
the ranges came out uneven so i updated the query and made it work with
the series data rather than min/max of the data in the bucket. it's a
little heavier in run time, but it makes the output much cleaner and
easier to comprehend.

also added tests for limiting to time range and bbox.

updates #264
vforgione pushed a commit that referenced this issue Apr 5, 2018
- added web stuff for chart meta
  - add to meta on show page
- added basic display for bucket data

the bucket data seems to be kind of wonky: it doesn't vary at all and
shows only straight lines. i need to investigate this further.

updates #264
@vforgione vforgione self-assigned this Jun 28, 2018
vforgione pushed a commit that referenced this issue Jul 11, 2018
The way we're currently aggregating data and rendering charts is awful.
It needed to be completely rebuilt. The new schemas, controllers, etc
allow users to create default chart renderings for their data sets that
can then be used for display in search results.

Stuff Completed:

- new `Chart` schema for top level metadata
- new `ChartDataset` schema for aggregate metadata
- controllers, views, templates to CRUD the schemas
- special endpoint for rendering the graph serverside

Stuff To Do:

- use charts in explorer
- provide api access to aggregated data (json instead of rendered html)
- add in functionality for use with leaflet
- add text fields to views

Regarding _add text fields to views_, for some charts we could really
use being able to aggregate on text fields. Leaving them off the views
made sense at first, as we only wanted to use them to filter non-text
data. But now we've come to a point where it makes sense to have it in
there.

It will be beneficial to charting, but it will also allow us to start
adding trigram indexes so we can offer basic text search capabilities as
well -- both in the explorer and via the API.

Updates #264
vforgione pushed a commit that referenced this issue Jul 11, 2018
I added text fields to the data set views to enable aggregation around
text values in charts. I also added a new index for each text field so
that we can perform full text search later. For now, we're doing nothing
with it, but later we can start using it as the indexes will be there
and populated moving forward.

This will require another migration to enable the Postgres Trigram
extension. We will also have to do manual migrations to rebuild the
views:

```elixir
ids =
  MetaActions.list(ready_only: true)
  |> Enum.map(& &1.id)

ids
|> Enum.map(&MetaActions.get/1)
|> Enum.each(&DataSetActions.down!/1)

ids
|> Enum.map(&MetaActions.get/1)
|> Enum.each(&DataSetActions.up!/1)

ids
|> Enum.map(&MetaActions.get/1)
|> Enum.each(&PlenarioEtl.import_data_set_on_demand/1)
```

Updates #264
vforgione pushed a commit that referenced this issue Jul 12, 2018
I needed to add in functionality to limit queries to bboxes and time
ranges. It also made sense to apply aggregate granularity to the data
sets that are being aggregated by date trunc.

The unfortunate thing about Ecto is that it can be really, really dumb.
I mean part of its strength is that it's crazy versatile and you can
throw a lot at it, but the fragments don't like dynamic values, so I had
to build a billion different functions to handle granularity.

Other than that, this is pretty straight forward -- just a lot of code
to cover the cases where dynamic fragments would have been helpful.

Updates #264
vforgione pushed a commit that referenced this issue Jul 12, 2018
So the entire point of this was to render charts that didn't suck and
had a little meaning to them in the search results. We've got that now.

Additions:

- new `with_charts` option for getting metas
- totally wiped and redid the explorer controller
- made small changes here and there to glue it together

To Do:

- make the charts smaller
- clean up my god awful js
- add an api endpoint for the chart aggregates

Updates #264
vforgione pushed a commit that referenced this issue Jul 12, 2018
I added a new JS dependency to throttle the calls and cleaned up the
script. This should be good now.

Updates #264
vforgione pushed a commit that referenced this issue Jul 12, 2018
The results charts are now limited to the bounds given by the search in
the form of the explorer.

Updates #264