[Area/line chart] Zero imputation resample returns error when combined with Group By #19157

Closed
3 tasks done
EBoisseauSierra opened this issue Mar 15, 2022 · 6 comments
Labels
#bug Bug report explore:advanced-analysis Related to Advanced Analysis in Explore

Comments

@EBoisseauSierra
Contributor

I am getting an Unexpected error when trying to use the Advanced analytics/Resample/Fill Method = Zero imputation feature of the Time-series {Line, Area} Chart.

How to reproduce the bug

  1. Create a new time series line/area chart using:
    • Metrics: sum(numeric),
    • Group by: str (categorical variable, with or without NULL values),
    • Contribution mode, Filters, Series Limit, Sort by: n/a
  2. In Advanced analytics/Resample, select:
    • Rule: 1 month start frequency,
    • Fill Method: Zero imputation
  3. Click on Run

Note that removing the column from Group By allows the chart to be generated as expected.

Expected results

A chart is generated.

Actual results

The Explore Panel returns an Unexpected error:

[HTML source of Superset's error page, titled "500: Internal server error | Superset": "Internal server error. Sorry, something went wrong. We are fixing the mistake now. Try again later or go back to home."]

The following can be found in the logs:

2022-03-15 16:16:35,070:DEBUG:superset.stats_logger:[stats_logger] (incr) ChartRestApi.data.error
2022-03-15 16:16:35,070:ERROR:superset.views.base:cannot reindex from a duplicate axis
Traceback (most recent call last):
  File "/opt/superset/venv/lib64/python3.9/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/superset/venv/lib64/python3.9/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/superset/venv/lib64/python3.9/site-packages/flask_appbuilder/security/decorators.py", line 104, in wraps
    return f(self, *args, **kwargs)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/views/base_api.py", line 85, in wraps
    raise ex
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/views/base_api.py", line 82, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/utils/core.py", line 1468, in time_function
    response = func(*args, **kwargs)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/utils/log.py", line 242, in wrapper
    value = f(*args, **kwargs)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/charts/api.py", line 736, in data
    return self.get_data_response(command)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/charts/api.py", line 547, in get_data_response
    result = command.run(force_cached=force_cached)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/charts/commands/data.py", line 50, in run
    payload = self._query_context.get_payload(
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_context.py", line 305, in get_payload
    query_results = [
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_context.py", line 306, in <listcomp>
    get_query_results(
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_actions.py", line 186, in get_query_results
    return result_func(query_context, query_obj, force_cached)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_actions.py", line 148, in _get_results
    payload = _get_full(query_context, query_obj, force_cached)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_actions.py", line 98, in _get_full
    payload = query_context.get_df_payload(query_obj, force_cached=force_cached)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_context.py", line 468, in get_df_payload
    query_result = self.get_query_result(query_obj)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_context.py", line 274, in get_query_result
    df = query_object.exec_post_processing(df)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/common/query_object.py", line 422, in exec_post_processing
    df = getattr(pandas_postprocessing, operation)(df, **options)
  File "/opt/superset/venv/lib64/python3.9/site-packages/superset/utils/pandas_postprocessing.py", line 982, in resample
    df = df.resample(rule).asfreq(fill_value=fill_value)
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/resample.py", line 848, in asfreq
    return self._upsample("asfreq", fill_value=fill_value)
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/resample.py", line 1137, in _upsample
    result = obj.reindex(
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/util/_decorators.py", line 312, in wrapper
    return func(*args, **kwargs)
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/frame.py", line 4176, in reindex
    return super().reindex(**kwargs)
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/generic.py", line 4811, in reindex
    return self._reindex_axes(
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/frame.py", line 4022, in _reindex_axes
    frame = frame._reindex_index(
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/frame.py", line 4041, in _reindex_index
    return self._reindex_with_indexers(
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/generic.py", line 4877, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/internals/managers.py", line 1301, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/opt/superset/venv/lib64/python3.9/site-packages/pandas/core/indexes/base.py", line 3477, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
2022-03-15 16:16:35,072:INFO:werkzeug:127.0.0.1 - - [15/Mar/2022 16:16:35] "POST /api/v1/chart/data?form_data=%7B%22slice_id%22%3A2%7D HTTP/1.1" 500 -
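
The traceback ends in pandas' `reindex` rejecting a duplicate index. A minimal sketch of how that can arise outside Superset follows, assuming the grouped query result is a long-format frame with one row per (timestamp, category), so each timestamp appears once per category. The column names (`ts`, `category`, `value`) and the data are hypothetical. Pivoting to one column per category before resampling is one common way to avoid the duplicates; it is shown for illustration and is not necessarily what the Superset fix does.

```python
import pandas as pd

# Hypothetical long-format result: one row per (month, category).
records = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2022-01-01", "2022-01-01", "2022-03-01", "2022-03-01"]
        ),
        "category": ["a", "b", "a", "b"],
        "value": [1, 2, 3, 4],
    }
)

# Indexing by timestamp alone leaves each timestamp duplicated (once per category).
long_df = records.set_index("ts")

try:
    long_df.resample("MS").asfreq(fill_value=0)
    error = None
except ValueError as exc:
    # pandas cannot reindex a frame whose index has duplicate labels.
    error = exc

# Pivot to wide format: unique timestamps, one column per category.
wide_df = records.pivot(index="ts", columns="category", values="value")

# The missing month (February) is now added and filled with zeros.
filled = wide_df.resample("MS").asfreq(fill_value=0)
print(filled)
```

The exact wording of the `ValueError` message differs between pandas versions, so the sketch only checks the exception type.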

Screenshots

Screenshot from 2022-03-15 16-16-27

Environment

(please complete the following information):

  • browser type and version: Firefox Developer Edition 99.0b3
  • superset version: 1.4.1
  • python version: 3.9.10
  • node.js version: 14.19.0
  • any feature flags active: None

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

This seems linked to #18045.

@EBoisseauSierra EBoisseauSierra added the #bug Bug report label Mar 15, 2022
@zhaoyongjie
Member

@EBoisseauSierra I have posted a new PR for refactoring AA in this scenario.

@zhaoyongjie zhaoyongjie added the explore:advanced-analysis Related to Advanced Analysis in Explore label Mar 16, 2022
@zhaoyongjie
Member

@EBoisseauSierra I have merged that PR. Let me know if any follow-up is needed.

@EBoisseauSierra
Contributor Author

@zhaoyongjie thanks! I finally managed to have a go at it, and can confirm (5ae7e5499) that it works for SUM aggregation.

However, my particular example uses a cumulative sum, and the graph doesn't seem to resample:

Screenshot from 2022-03-29 12-21-38

(With cumsum, “zero filling” would mean a horizontal line from the last data point whenever a time bucket has no data.)


  • Time
    • time grain: month
  • Query
    • metric: SUM(metric)
    • group by: category
  • AA
    • rolling function: cumsum,
    • time shift: null,
    • calculation type: actual values,
    • resample rule: 1 month start frequency
    • fill method: backward values.
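
The expectation described above (a flat line across empty buckets) can be illustrated in plain pandas, independent of how Superset orders its post-processing steps. This is only a sketch on a hypothetical single series with made-up values: zero-filling the missing month and then taking the cumulative sum gives the same result as taking the cumulative sum first and forward-filling the gap.

```python
import pandas as pd

# Hypothetical monthly metric for one category; February has no data.
monthly = pd.Series(
    [5, 3],
    index=pd.to_datetime(["2022-01-01", "2022-03-01"]),
)

# Option 1: add the missing month as 0, then accumulate.
zero_then_cumsum = monthly.resample("MS").asfreq(fill_value=0).cumsum()

# Option 2: accumulate first, then carry the last total forward into the gap.
cumsum_then_ffill = monthly.cumsum().resample("MS").ffill()

print(zero_then_cumsum.tolist())
print(cumsum_then_ffill.tolist())
```

In both cases the February value equals the January running total, which is the horizontal segment described in the comment.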

@zhaoyongjie
Member

@EBoisseauSierra Could you share the dataset so that I can reproduce this case?

@rusackas
Member

@EBoisseauSierra it seems that this issue has gone cold... are you still seeing an issue here, and if so are you able to provide a reproduction case with sample data? Otherwise, we should probably close this one out.

@rusackas
Member

This is likely fixed by now, and is pretty out of date if not. If people are still encountering this in current versions (3.x) please open a new Issue or a PR to address the problem.
