Create a new DisplayField Schema #264
When you say display, do you mean API results or frontend results? Would this schema be used to return JSON objects that look like: …
Frontend results: search, specifically. The way graphs are shown right now in the general explorer app, I grab the first few numeric fields of a data set and graph them, regardless of whether they are significant to the actual data set. It's a very naive approach. In the case of AoT, or any other data set that has embedded JSON, I use a dummy set of fields and values to draw the graphs. What I want is to give data set owners the ability to highlight significant fields in their data sets to be used when displaying search results.
So a …
I think it's a little overkill, especially if it's just for displaying search results. I could see providing those values through a simple query with …
Kind of. I'm thinking of it like this, …

or something less crappy. You get the idea, though. Edit: ideally, we would use an Ecto query and the model for the data set from the registry.
The thing is, I have no way of limiting the fields to display in search results without some arbitrary cutoff, especially when it comes to …. This is a simple solution to that.
Do you mean limiting the number of fields to display? I feel like the cutoff is still arbitrary with the new schema. I also don't see how it's expensive for the database: we assume that the JSON values all have the same "shape", so it's a matter of grabbing the first row, deriving the structure from that, and using that information in your subsequent queries. Regarding the snippet you posted, you'd grab the …
Here are some counterexamples. In the case of …
These rows include DNA testing data because the source of the data didn't exclude them from the results, for whatever reason. If we were to display every numeric field in here, it would 1) be crowded, given the number of fields, and 2) include fields that have little to no relevance to the core of the data set. In the case of Array of Things data, not every …
Note that this is an aggregate of all the records. For most of these nodes, only 2 sensors were reporting in for a while; only recently did a third board start reporting in, and there are a dozen others still to come. So, to me, having a predetermined, limited set of fields that we pull from to draw graphs for search results greatly simplifies mitigating these pitfalls.
Idea to surface key paths from embedded JSONB fields:
In the case of AoT Chicago, we have a data set table with an embedded JSONB column. For example, the result of the function is: …
which we can further aggregate to get a master list of keys for a given data set. Or, a simpler example: …
Adding a new schema so that data set owners and admins can store metadata about data sets to more easily generate charts/graphs.

- migration for schema
- migration that adds a function to recursively name subkeys in jsonb fields
- schema file with function to surface fields
- test for function

Updates #264
- added changesets, queries, actions
- wrote tests for actions

Updates #264
Give it:

- a chart
- a truncation span ("minutes", "hours", "days", etc.)
- (optional) a time span
- (optional) a bbox

and you get aggregated data for the truncated timestamps. Updates #264
Can now get results aggregated to points. Updates #264
The ranges came out uneven, so I updated the query and made it work with the series data rather than the min/max of the data in the bucket. It's a little heavier in run time, but it makes the output much cleaner and easier to comprehend. Also added tests for limiting to time range and bbox. Updates #264
- added web stuff for chart meta
- added it to the meta show page
- added basic display for bucket data

The bucket data seems to be kind of wonky: it doesn't vary at all and shows only straight lines. I need to investigate this further. Updates #264
The way we're currently aggregating data and rendering charts is awful. It needed to be completely rebuilt. The new schemas, controllers, etc. allow users to create default chart renderings for their data sets that can then be used for display in search results.

Stuff Completed:

- new `Chart` schema for top level metadata
- new `ChartDataset` schema for aggregate metadata
- controllers, views, templates to CRUD the schemas
- special endpoint for rendering the graph server-side

Stuff To Do:

- use charts in the explorer
- provide API access to aggregated data (JSON instead of rendered HTML)
- add in functionality for use with Leaflet
- add text fields to views

Regarding _add text fields to views_: for some charts we could really use being able to aggregate on text fields. Leaving them off the views made sense at first, as we only wanted to use them to filter non-text data. But now we've come to a point where it makes sense to have them in there. It will be beneficial to charting, but it will also allow us to start adding trigram indexes so we can offer basic text search capabilities as well -- both in the explorer and via the API. Updates #264
I added text fields to the data set views to enable aggregation around text values in charts. I also added a new index for each text field so that we can perform full text search later. For now, we're doing nothing with it, but later we can start using it, as the indexes will be there and populated moving forward. This will require another migration to enable the Postgres trigram extension. We will also have to do manual migrations to rebuild the views:

```elixir
ids =
  MetaActions.list(ready_only: true)
  |> Enum.map(& &1.id)

ids |> Enum.map(&MetaActions.get/1) |> Enum.each(&DataSetActions.down!/1)
ids |> Enum.map(&MetaActions.get/1) |> Enum.each(&DataSetActions.up!/1)
ids |> Enum.map(&MetaActions.get/1) |> Enum.each(&PlenarioEtl.import_data_set_on_demand/1)
```

Updates #264
I needed to add in functionality to limit queries to bboxes and time ranges. It also made sense to apply aggregate granularity to the data sets that are being aggregated by date trunc. The unfortunate thing about Ecto is that it can be really, really dumb. Part of its strength is that it's crazy versatile and you can throw a lot at it, but fragments don't like dynamic values, so I had to build a billion different functions to handle granularity. Other than that, this is pretty straightforward: just a lot of code to cover the cases where dynamic fragments would have been helpful.

Updates #264
So the entire point of this was to render charts in search results that didn't suck and had a little meaning to them. We've got that now.

Additions:

- new `with_charts` option for getting metas
- totally wiped and redid the explorer controller
- made small changes here and there to glue it together

To Do:

- make the charts smaller
- clean up my god-awful JS
- add an API endpoint for the chart aggregates

Updates #264
I added a new JS dependency to throttle the calls and cleaned up the script. This should be good now. Updates #264
The result charts are now limited to the bounds given by the search form in the explorer. Updates #264
We should add another schema that holds information about what fields should be displayed on search results.
For most cases, this is similar to the virtual fields and constraints: give the owner a list of the fields attached to the meta. Where this gets tricky is with `jsonb` fields. For these, we need to inspect unique paths within the JSON and provide them as well. For example, if the data set has fields `id`, `lat`, `lon`, `timestamp`, and `observations`, where `observations` is a JSON field with a nested structure, we should display the nested paths as choices too.