
Conversation

Member

@ktmud ktmud commented Aug 25, 2021

SUMMARY

More explicitly handle Infinity values in query results and improve type inference related to infinity values.

Previously we treated Infinity as N/A since JSON cannot encode infinity values. This PR explicitly retains infinity values in Pandas DataFrames and adds two fields to the data response (infs and -infs) that tell the client side which nulls in the data records are actually infinities. The client side can then convert them back into JavaScript Infinity/-Infinity (not in the scope of this PR).

More unit tests to come.
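For illustration only, a minimal sketch of the idea described above, not the PR's actual implementation; the infs/-infs field names come from the summary, while the column iteration and index lists are assumptions:

# Hypothetical sketch: record where the infinities are before nulling them
# for JSON serialization, so the client can restore Infinity/-Infinity.
import numpy as np
import pandas as pd

df = pd.DataFrame({"ratio": [1.0, np.inf, -np.inf, 2.5]})

float_cols = [col for col in df.columns if df[col].dtype.kind == "f"]
infs = {col: df.index[np.isposinf(df[col])].tolist() for col in float_cols}
neg_infs = {col: df.index[np.isneginf(df[col])].tolist() for col in float_cols}

# Null the infinities so the JSON encoder can serialize the records; the
# infs / neg_infs maps tell the client which nulls were really infinities.
df.replace([np.inf, -np.inf], np.nan, inplace=True)
print(infs, neg_infs)  # {'ratio': [1]} {'ratio': [2]}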

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

df[col] = pd.to_numeric(
    df[col]
    .replace(INFINITY_LITERALS, np.inf)
    .replace(NEGATIVE_INFINITY_LITERALS, -np.inf),
Member Author

Some DBAPIs (e.g. Druid) return Infinity as the quoted string "Infinity", so we must manually replace these literals with NumPy infs.
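For context, a tiny runnable demo of the replacement above; the literal lists here are stand-ins, and the real INFINITY_LITERALS / NEGATIVE_INFINITY_LITERALS constants in the PR may differ:

# Illustrative only; the literal lists are assumptions, not the PR's constants.
import numpy as np
import pandas as pd

INFINITY_LITERALS = ["Infinity", "inf"]
NEGATIVE_INFINITY_LITERALS = ["-Infinity", "-inf"]

col = pd.Series(["1.5", "Infinity", "-Infinity"])
numeric = pd.to_numeric(
    col.replace(INFINITY_LITERALS, np.inf).replace(NEGATIVE_INFINITY_LITERALS, -np.inf)
)
print(numeric.tolist())  # [1.5, inf, -inf]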

Member

This logic should probably be moved into the Druid db_engine_spec to avoid running it on unaffected db engines. Something like

BaseEngineSpec.replace_literal_values(val: str) -> Any:
    return val

which then is implemented in the Druid spec with those specific literals.
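A rough sketch of how that hook might look; replace_literal_values is not an existing Superset API, and the Druid handling below is an assumption based on the comment above, not code from this PR:

# Hypothetical sketch of the reviewer's suggestion.
from typing import Any

import numpy as np


class BaseEngineSpec:
    @classmethod
    def replace_literal_values(cls, val: Any) -> Any:
        # Default: pass values through untouched, so unaffected engines pay no cost.
        return val


class DruidEngineSpec(BaseEngineSpec):
    @classmethod
    def replace_literal_values(cls, val: Any) -> Any:
        # Druid may return infinities as the quoted strings "Infinity"/"-Infinity".
        if val == "Infinity":
            return np.inf
        if val == "-Infinity":
            return -np.inf
        return val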

Member Author

Good idea. I'll update.

if self.enforce_numerical_metrics:
    self.df_metrics_to_num(df, query_object)

df.replace([np.inf, -np.inf], np.nan, inplace=True)
Member Author

Infs will be forced into nulls by the JSON encoder anyway.
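For reference, a small example of why: with simplejson's ignore_nan flag, out-of-range floats are dropped to null; whether Superset's encoder is configured exactly like this is an assumption:

# Illustrative only; assumes the simplejson package is available.
import numpy as np
import simplejson

# With ignore_nan=True, NaN/inf/-inf are all serialized as null.
print(simplejson.dumps([1.0, np.inf, -np.inf, np.nan], ignore_nan=True))
# [1.0, null, null, null]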


codecov bot commented Aug 25, 2021

Codecov Report

Merging #16450 (29816e0) into master (db11c3e) will decrease coverage by 0.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16450      +/-   ##
==========================================
- Coverage   76.63%   76.41%   -0.22%     
==========================================
  Files        1002     1002              
  Lines       53635    53641       +6     
  Branches     6851     6851              
==========================================
- Hits        41101    40992     -109     
- Misses      12295    12410     +115     
  Partials      239      239              
Flag       Coverage Δ
hive       ?
mysql      81.57% <100.00%> (+0.04%) ⬆️
postgres   81.59% <100.00%> (+<0.01%) ⬆️
presto     ?
python     81.68% <100.00%> (-0.42%) ⬇️
sqlite     81.20% <90.90%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                       Coverage Δ
superset/common/query_actions.py     93.42% <100.00%> (+0.36%) ⬆️
superset/common/query_context.py     90.74% <100.00%> (ø)
superset/constants.py                100.00% <100.00%> (ø)
superset/db_engines/hive.py          0.00% <0.00%> (-82.15%) ⬇️
superset/db_engine_specs/hive.py     69.80% <0.00%> (-16.87%) ⬇️
superset/db_engine_specs/presto.py   83.47% <0.00%> (-6.49%) ⬇️
superset/views/database/mixins.py    81.03% <0.00%> (-1.73%) ⬇️
superset/connectors/sqla/models.py   88.04% <0.00%> (-1.66%) ⬇️
superset/db_engine_specs/base.py     88.00% <0.00%> (-0.39%) ⬇️
superset/models/core.py              89.14% <0.00%> (-0.26%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db11c3e...29816e0. Read the comment docs.

Member

@villebro villebro left a comment

Should we start considering a binary format like msgpack for moving data over the wire, to both decrease the payload size and better support these types of values?
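Tangential, but as a quick illustration of the format question (not part of the PR, and assuming the msgpack package): msgpack stores floats as IEEE-754, so infinities round-trip, whereas strict JSON has no representation for them:

# Illustrative comparison only.
import json
import msgpack

value = float("inf")

# msgpack packs the value as a float64, so it round-trips intact.
assert msgpack.unpackb(msgpack.packb(value)) == value

# Strict JSON has no representation for infinity; the stdlib encoder either
# emits the non-standard literal Infinity or raises with allow_nan=False.
print(json.dumps(value))  # Infinity  (not valid JSON)
try:
    json.dumps(value, allow_nan=False)
except ValueError as exc:
    print(exc)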



stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 16, 2022
@stale stale bot closed this Apr 27, 2022
@john-bodley john-bodley deleted the infinity-data-type branch June 8, 2022 05:26

Labels

inactive (Inactive for >= 30 days), size/M
