Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ECharts BoxPlot chart #11199

Merged
merged 21 commits into from
Nov 12, 2020
Merged

Conversation

villebro
Copy link
Member

@villebro villebro commented Oct 8, 2020

SUMMARY

Replace the BoxPlot chart with an ECharts implementation (The bulk of work has been done in apache-superset/superset-ui#801 ). Features of the new viz plugin:

The two ‘hinges’ are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.

Also introduces a new dedicated post processing operation for boxplot to make it easier for viz plugin developers to tap into this functionality.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

image

World's Bank Data example dashboard after:
image

World's Bank Data example dashboard before:
image

Control panels:
image
image

TEST PLAN

CI + new tests

ADDITIONAL INFORMATION

@codecov-io
Copy link

codecov-io commented Oct 8, 2020

Codecov Report

Merging #11199 (e60d19d) into master (77dff0e) will increase coverage by 0.99%.
The diff coverage is 82.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11199      +/-   ##
==========================================
+ Coverage   65.61%   66.61%   +0.99%     
==========================================
  Files         873      874       +1     
  Lines       42315    42319       +4     
  Branches     3971     3972       +1     
==========================================
+ Hits        27765    28190     +425     
+ Misses      14433    14031     -402     
+ Partials      117       98      -19     
Flag Coverage Δ
cypress 55.24% <87.50%> (+5.60%) ⬆️
javascript 62.85% <77.77%> (+<0.01%) ⬆️
python 62.03% <82.19%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...-frontend/src/visualizations/presets/MainPreset.js 100.00% <ø> (ø)
superset/utils/dashboard_import_export.py 0.00% <ø> (-60.72%) ⬇️
superset/viz.py 59.33% <ø> (+1.60%) ⬆️
superset/cli.py 40.39% <12.50%> (+0.39%) ⬆️
superset/views/core.py 74.32% <66.66%> (+0.02%) ⬆️
superset/dashboards/commands/importers/v0.py 80.92% <80.92%> (ø)
superset/utils/pandas_postprocessing.py 78.60% <95.45%> (+3.12%) ⬆️
superset-frontend/src/components/Icon/index.tsx 100.00% <100.00%> (ø)
...tend/src/explore/components/ExploreChartHeader.jsx 83.78% <100.00%> (ø)
...rc/explore/components/controls/AnnotationLayer.jsx 51.17% <100.00%> (+9.85%) ⬆️
... and 80 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77dff0e...c39f18a. Read the comment docs.

Comment on lines 33 to 55
const { slices } = bootstrapData.dashboard_data;
// then define routes and create alias for each requests
slices.forEach(slice => {
const alias = `getJson_${slice.slice_id}`;
const formData = `{"slice_id":${slice.slice_id}}`;
cy.route('POST', `/superset/explore_json/?*${formData}*`).as(alias);
aliases.push(`@${alias}`);
});
dashboard = bootstrapData.dashboard_data;
});
});

it('should load dashboard', () => {
xit('should load dashboard', () => {
const { slices } = dashboard;

// then define routes and create alias for each requests
slices.forEach(slice => {
const vizType = slice.form_data.viz_type;
const isLegacy = isLegacyChart(vizType);
const alias = `getJson_${slice.slice_id}_${vizType}_${isLegacy}`;
const formData = `{"slice_id":${slice.slice_id}}`;
const route = isLegacy
? `/superset/explore_json/?*${formData}*`
: `/api/v1/chart/data?dashboard_id=${dashboard.id}`;
cy.route('POST', `${route}`).as(alias);
aliases.push(`@${alias}`);
});

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason this test started flaking out on the legacy plugins, despite those being unchanged. Although I xited this test, I updated it in a way which IMO should work.

@villebro villebro force-pushed the villebro/boxplot branch 2 times, most recently from fde5283 to a8e4a32 Compare October 22, 2020 11:45
@bkyryliuk bkyryliuk self-requested a review October 30, 2020 18:00
Copy link
Member

@bkyryliuk bkyryliuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall everything looks good! let me test it on staging once conflicts are resolved

if callable(operator):
aggfunc = operator
else:
func = NUMPY_FUNCTIONS.get(operator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice cleanup!

@@ -607,3 +617,103 @@ def test_prophet_incorrect_time_grain(self):
periods=10,
confidence_interval=0.8,
)

def test_boxplot_tukey(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
there was a bug with 3rd quantile, will test it out once conflicts are resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, let me take a look

Copy link
Member

@bkyryliuk bkyryliuk Oct 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant prior to your PR, will test now if that is still an issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is still a problem
image
Data for the bucket:

[f] 1000-2499 | 7251270
[f] 1000-2499 | 1820590.6666666667
[f] 1000-2499 | 1187242.142857143
[f] 1000-2499 | 1151020
e.g. look like outlier was not excluded for quartile calculations

Copy link
Member Author

@villebro villebro Oct 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to reproduce. Interesting edge case with so few observations, I'll replicate with some other commonly used stats package to see what it would give for a similar case.

@bkyryliuk
Copy link
Member

nit: Old naming seems a bit more intuitive and cleaner,

Old:
image
New:
image

@villebro
Copy link
Member Author

nit: Old naming seems a bit more intuitive and cleaner,

I'll make a follow-up PR to use the same wording as in the previous version.

@villebro
Copy link
Member Author

villebro commented Oct 31, 2020

For reference Bogdan's sample data in Superset and results for same dataset from two different online boxplot calculators:
Superset:
image
http://www.alcula.com/calculators/statistics/box-plot/
image
http://shiny.chemgrid.org/boxplotr/:
image

@villebro
Copy link
Member Author

villebro commented Oct 31, 2020

It seems most stats packages are using midpoint interpolation for quartiles. Results with new interpolation parameters seem to reproduce the BoxPlotR results (I'm guessing this is THE benchmark for boxplots):
image

@villebro villebro force-pushed the villebro/boxplot branch 3 times, most recently from 1f92ed7 to 6761bf8 Compare November 2, 2020 13:17
@villebro villebro force-pushed the villebro/boxplot branch 3 times, most recently from b3d02a6 to b011cd9 Compare November 10, 2020 13:32
cy.visitChartByName('Boys'); // a table chart
cy.verifySliceSuccess({ waitAlias: '@postJson' });
});

xit('Should not load mathjs when not needed', () => {
Copy link
Member Author

@villebro villebro Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud (pinging as I noticed you had introduced this test recently) I had to disable this test, as apparently the function annotation layer that was added to the ECharts Timeseries viz loads mathjs despite not being used here. The relevant code is here (is called by transformProps.ts): https://github.com/apache-superset/superset-ui/blob/master/plugins/plugin-chart-echarts/src/utils/annotation.ts . Is there something that can be done differently to avoid having mathjs load here?

Copy link
Member

@ktmud ktmud Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You chart plugin seems to be using a different version of mathjs, which is probably incompatible with what's used in AnnotationLayer control. Do you mind changing them to the same version and see if the test still fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good to know, I'll do that 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud I've now upgraded all mathjs deps to 8.0.1, and still failing the test. I propose disabling it for now so we can merge this.

Copy link
Member

@ktmud ktmud Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is failing because of this import chain: @superset-ui/plugin-chart-echarts -> Timeseries -> transformProps -> utils/annotation -> mathjs. Unlike loadChart, transformProps is not loaded asynchronously, so Webpack could not defer the loading of mathjs, which means it will be loaded for all first-visits on almost all pages, undoing part of the optimization we did in #10837 .

The test is there to make sure this optimization still works. It didn't break because it was flaky. It caught the breaking changes it is supposed to catch.

It's OK to merge this PR as is for now, but I'd recommend finding ways to move the mathjs logic to the chart renderer so that we could continue benefiting from code split and asynchronous loading.

In general I think transformProps should be as lightweight as possible and preferably only return scalars. It should only be about applying defaults and consolidating parameter names/shapes. Most complex logics should be dealt with either by the chart renderer itself or at another data-processing layer on top of the chart. This way it's easier to reuse your chart at other places outside of Superset charting plugins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. I'll move the mathjs (and any other similar) rendering logic out of transformProps and re-enable the test in a subsequent bump.

@villebro villebro merged commit 2718909 into apache:master Nov 12, 2020
@villebro villebro deleted the villebro/boxplot branch November 12, 2020 08:01
@junlincc junlincc self-requested a review November 12, 2020 16:32
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
* feat: add ECharts BoxPlot chart

* lint

* fix cypress tests

* lint

* remove viz.py shim

* bump plugin package

* skip non-legacy plugin cypress dashboard tests

* fix cypress tests

* disable cypress tests for non-leagcy charts

* fix bad rebase

* use midpoint interpolation for quartile calculation

* bump packages and add support for no groupby

* whitespace

* whitespace

* linting

* fix tests

* xit mathjs load test

* bump mathjs to 8.0.1

* disable cypress filter test for v1 charts
@junlincc
Copy link
Member

@villebro i think it make sense to expand box plot to include candlesticks chart and whisker chart two variations since they share the same set of control, just like you did for time series. what do you think for post 1.0 hardening?

@villebro
Copy link
Member Author

@junlincc hmm, not a bad idea, as candlesticks just use pick slightly different data points from within the group (when we do that it might make sense to rename the boxplot post transformation operation to something more generic). However, candlesticks tend to have a temporal x-axis, so we need to add proper support for that first.

@rusackas rusackas added the viz:charts:echarts Related to Echarts label Nov 25, 2020
@villebro villebro mentioned this pull request Dec 11, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL viz:charts:echarts Related to Echarts 🚢 1.0.0
Projects
None yet
7 participants