-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide drill across #245
Comments
This feature can be indeed added to the SQL query generator, however it has to be refactored first. See reasons in this post. As for the API, my suggestion would be just to extend the original interface with one more argument: My suggestion would be: How does that interface sound? As with joining, I'm not quite sure yet how difficult it would be to implement, but I'll just throw few pointers and notes here for future reference:
|
Few notes based on current work at branch sqlrefactor:
|
Thanks for looking into this. I think the interface sounds OK. One question I have is if there is some semantic or functional meaning in having a "primary" vs "secondary" cube in a drill-across query? The cube (in the path element) feels more "primary" relative to the ones in the "across" argument, and it feels like it might suggest that it is treating them somehow differently. So if there is semantic or functional meaning between these, I think that distinction is a benefit - if not, then I think it complicates the mental model a bit, since it suggests there might be. Also, to clarify - my suggestion was to use a route "/cubes" (plural) that only handles drill across queries, and perhaps other operations involving >1 cubes. This would be in addition to the "/cube" endpoint. Hence I don't think the /cube/foo+bar+baz/model issue would come up, since that would only be an available operation on /cube/*/model. |
As for "primary" vs "secondary" – I'm just following current interface which might also impose some limitations on the "secondary" cubes. This distinction is definitely not a benefit and as you say, it complicates the mental model a bit. I'm not in favor for those limitations, but I take it as an intermediate step. I was thinking briefly what would have to be changed to accommodate queries across multiple cubes. The current BrowserAs for Browser, we can either add We need to have an object which knows about all joins and tables. Currently called CellAs for Cell – there are some methods that were meant to be of "explorative" nature, such as drill-down. which are using the Cell-linked cube. We can either: A. make Cell dumb and use only string based references, move the explorative functions to some other utility object keeping a state (†). No cube reference in a cell. Cell would be "portable" to any other cube with the same dimensions. (A) feels lighter, but will require additional methods for checking validity of the dimensions in the cell. ModelAnother radical option would be to bring back concept of a "model" as a closed set of physically related cubes. There would be one This might reduce some complexity, but require more work as it will affect the SummaryWe have a design chain reaction here, for historical reasons. Besides SQL backend query generator lacking the ability to include multiple fact tables, there is a bit more in the library that was based on single-cube query assumption. This definitely needs to be redesigned. I will keep this in mind during this refactoring. I don't think that any of the above will happen in this round, if I want to deliver more understandable query generator, but it has to happen soon or later. For the time being we will have to stick with primary/secondary cubes in the drill-across. Anyway, any input – thoughts, pics, questions – on this matter is appreciated. (†) Off-topic note: Here I kind of regret calling Browser a Browser as it is in current implementation. It would be nice to have Browser as explorative class holding together cube and currently viewed cell. Useable from within environments such as iPython Notebook. The aggregation/query providing class would have a different name and functionality of current AggregationBrowser. |
@grugnog (forgot to respond) I see your point with |
Imo, this project is young enough that you should push class name and API specification changes sooner (in version 2) rather than later (= closer to never). |
@dtheodor good point. How to change the concept of Suggestion A
The usage would be: cube = workspace.cube("sales")
# or
cube = provider.cube("sales")
result = workspace.aggregate(cube.aggregate["value"])
# or as a string `cube.aggregate`
result = workspace.aggregate(["sales.value"])
# or from a store:
result = store.aggregate(cube.aggregate["value"])
All attribute objects would have to be back-linked to their owners (model objects) such as cubes or dimensions. HTTP APIWith this change the HTTP API will not have to be changed, cube-oriented API will stay as: To list all aggergates: Therefore instead of Suggestion BThe first suggestion has several issues. The main is, that we can't have multiple models using the same store (database). Therefore we need to have an object that will contain the original model together:
Side effects:
Not going to evolve this part deeper here, more should be discussed in #256. Cube can be requested in a similar way as in the suggestion A. Aggregation done through workspace or browser – similar principle as in A. HTTP API would be the same as in A. ConclusionRegardless of implementation, the best way to implement drill-across is to hide concept of cubes completely in the aggregation call and use just aggregate list. Cubes would be resolved in the aggregating objects and the aggregation would be computed or rejected based on the possibility of the given combination of aggregates and dimensions (same store, shared dimensions, other rules ...). Thoughts? |
See also #273 |
Right now to drill across, users need to be implement this logic as part of the analytics/reporting system, which is fine - but can be quite a bit of work in practice. It would be really convenient if there was a function that could assist with this.
In terms of architecture, it seems like this could perhaps be implemented as an additional layer on or beside the backend that could perform multiple queries and aggregate the results. This would make reuse easier (compared with having cubes viewer make multiple queries directly, for example).
Rough sketch of how this might work:
/cubes/<cube1>+<cube2>+.../<aggr-options>/<query>
The would potentially be any query that is accepted by the "cube" endpoint. The logic would probably first need to parse the query and validate that each cube has each of the dimensions used for that query (it would be nice if cubes didn't need to have identical dimensions, as in #169).
Then the backend would perform the queries on each cube and aggregate the results. Assuming there are more than one way to aggregate the results (beyond just extra columns for each cube) having argument might be useful to allow the user to tweak that.
I can see more complex stuff like cross-tables that wouldn't fit the above pattern, but am not sure if it makes sense to try and include them (and if it does, if they could sensibly use this format without making it more complex for simple stuff).
The text was updated successfully, but these errors were encountered: