fix: Bump prophet, re-enable tests, and remedy column eligibility logic #24129

Merged
1 commit merged into apache:master from john-bodley--bump-prophet on Jul 5, 2023

Conversation

@john-bodley (Member) commented May 19, 2023

SUMMARY

The prophet package (formerly fbprophet) has historically been "difficult" to work with from an installation perspective—atypical requirement definitions (pystan needed to be installed first), no pre-built wheels, etc.—though thankfully as of v1.1 there are pre-built binaries for macOS, Windows, and Linux.

This PR bumps the version of the package and defines it as a requirement for the testing environment, removing the need for pytest.importorskip("prophet")—likely present due to the problematic build. This ensures that the associated tests will now always run.
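For reference, the skip pattern being removed looks roughly like the following (a minimal sketch, not the exact test code):

import pytest

# pytest.importorskip skips the entire module at collection time if prophet
# cannot be imported, which is why these tests silently never ran whenever the
# package failed to build.
prophet = pytest.importorskip("prophet")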

The frozen requirements were updated by running:

pip-compile-multi --no-upgrade

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

@@ -107,8 +125,6 @@ pydata-google-auth==1.7.0
# via pandas-gbq
pyfakefs==5.2.2
# via -r requirements/testing.in
pyhive[presto]==0.6.5
Member Author

This is likely a legacy dependency resulting from not having run pip-compile-multi. The presto extra requirement is specified in requirements/development.in which is an input to requirements/testing.in and thus shouldn't be re-included here.
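For illustration, the layering described here looks roughly like this (a hypothetical excerpt; the real .in files contain many more entries):

# requirements/development.in (declares the presto extra, among others)
-e .[presto]

# requirements/testing.in (takes development.in as an input)
-r development.in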

@@ -177,7 +177,7 @@ def get_git_sha() -> str:
"postgres": ["psycopg2-binary==2.9.6"],
"presto": ["pyhive[presto]>=0.6.5"],
"trino": ["trino>=0.324.0"],
"prophet": ["prophet>=1.0.1, <1.1", "pystan<3.0"],
Member Author

pystan is no longer required as of prophet v1.1.

@john-bodley force-pushed the john-bodley--bump-prophet branch 3 times, most recently from 27e9420 to 1765c97 on May 19, 2023 18:07
@@ -444,11 +444,11 @@ def test_chart_data_dttm_filter(self):
else:
raise Exception("ds column not found")

@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
Member Author
@john-bodley May 19, 2023

I'm really not sure how this test worked before it was skipped, given it lacked the fixture which (hopefully) ensures that the tests are idempotent.

@@ -134,7 +135,11 @@ def prophet( # pylint: disable=too-many-arguments
raise InvalidPostProcessingError(_("DataFrame include at least one series"))

target_df = DataFrame()
for column in [column for column in df.columns if column != index]:
for column in [
Member Author
@john-bodley May 19, 2023

@villebro I would love your input on this as I'm somewhat perplexed as to how this was working. After re-enabling the tests, it seemed to be trying to fit non-numerical columns within the pd.DataFrame. I couldn't find any options where we explicitly define which columns to produce a forecast for.

The original logic I had for determining numeric columns was

df.select_dtypes(include=np.number).columns

however, the MySQL tests were failing because it seems some of the numeric columns were of type object, which is typically used to encode strings.
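For reference, a minimal sketch of the kind of eligibility check being discussed (illustrative only, not necessarily the exact logic that landed in this PR; column names are made up to mirror the test data):

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ds": pd.date_range("2023-01-01", periods=3),
        "sum__num": ["10", "20", "30"],  # numeric data that arrived with object dtype
    }
)

# dtype-based selection misses numeric values stored as object (as seen with MySQL)
print(df.select_dtypes(include=np.number).columns.tolist())  # []

# coercing the values instead of trusting dtypes also catches object-typed numeric columns
eligible = [
    column
    for column in df.columns
    if column != "ds" and pd.to_numeric(df[column], errors="coerce").notna().all()
]
print(eligible)  # ['sum__num']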

@@ -476,7 +476,7 @@ def test_chart_data_prophet(self):
self.assertIn("sum__num__yhat", row)
self.assertIn("sum__num__yhat_upper", row)
self.assertIn("sum__num__yhat_lower", row)
self.assertEqual(result["rowcount"], 47)
self.assertEqual(result["rowcount"], 103)
Member Author

I'm not sure if 47 or 103 is correct. I assume these tests have been disabled for a long time and thus the underlying data might have changed.

@codecov

codecov bot commented May 19, 2023

Codecov Report

Merging #24129 (fdf434b) into master (e20b695) will increase coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head fdf434b differs from pull request most recent head 13597d1. Consider uploading reports for the commit 13597d1 to get more accurate results

@@            Coverage Diff             @@
##           master   #24129      +/-   ##
==========================================
+ Coverage   69.06%   69.08%   +0.02%     
==========================================
  Files        1901     1901              
  Lines       74019    74020       +1     
  Branches     8116     8116              
==========================================
+ Hits        51121    51137      +16     
+ Misses      20787    20772      -15     
  Partials     2111     2111              
Flag Coverage Δ
hive 54.06% <100.00%> (+0.16%) ⬆️
mysql 79.50% <100.00%> (+0.08%) ⬆️
postgres 79.58% <100.00%> (+0.08%) ⬆️
presto 53.96% <100.00%> (+0.16%) ⬆️
python 83.53% <100.00%> (+0.04%) ⬆️
sqlite 78.09% <100.00%> (+0.08%) ⬆️
unit 54.70% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
superset/utils/pandas_postprocessing/prophet.py 81.03% <100.00%> (+26.64%) ⬆️


@john-bodley force-pushed the john-bodley--bump-prophet branch 5 times, most recently from 4c44e85 to 85245d1 on May 20, 2023 00:27
@john-bodley
Member Author

@villebro I might need some additional 👀 on this as I'm somewhat out of my league in terms of expectations and inconsistencies regarding prophet.

@@ -476,7 +476,7 @@ def test_chart_data_prophet(self):
self.assertIn("sum__num__yhat", row)
self.assertIn("sum__num__yhat_upper", row)
self.assertIn("sum__num__yhat_lower", row)
self.assertEqual(result["rowcount"], 47)
Member Author

I'm not sure why (on line #473) result["data"] is [] for MySQL only.

Member

I think we should update the query context so that it's not using the legacy properties (granularity, is_timeseries etc). I wonder if that may be causing issues.

@sebastianliebscher
Contributor

This PR will also fix #24406

@john-bodley
Member Author

@sebastianliebscher and @villebro I fixed the CI issue—there was an inconsistency with how the pandas DataFrame columns were being encoded—and thus this should be ready to review.

@@ -85,6 +101,8 @@ parameterized==0.9.0
# via -r requirements/testing.in
pathable==0.4.3
# via jsonschema-spec
prophet==1.1.3
Contributor

A new version of prophet is already available :)

Suggested change
prophet==1.1.3
prophet==1.1.4

Contributor

If I think about it: "prophet": ["prophet>=1.1.0, <2.0.0"], would always install latest prophet 1.x, but CI would only test the pinned release. Maybe we should pin prophet in setup.py?
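For clarity, the two options being weighed would look roughly like this in setup.py (illustrative only, not the actual diff):

from setuptools import setup

setup(
    name="example",  # placeholder; not Superset's real setup() call
    extras_require={
        # range: users may pick up a newer 1.x release than the version CI tests against
        "prophet": ["prophet>=1.1.0, <2.0.0"],
        # pinning instead (e.g. "prophet==1.1.3") would match CI exactly,
        # at the cost of flexibility for downstream installs
    },
)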

Member Author
@john-bodley Jun 24, 2023

@sebastianliebscher CI uses the pinned version in the frozen requirements/testing.txt file.

Contributor

I mean, with future releases of prophet, if a user enables the optional dependency with pip install -e '.[prophet]', they would potentially install a version that is not tested in CI.

Member Author

The issue you describe is true for any package (or sub-package) which is not explicitly pinned.

Note we (Airbnb) typically install Superset as a service by:

  1. Defining a requirements/production.in file which includes the lines

     -e .[prophet,…]
     ddtrace
     …

  2. Running pip-compile-multi to compile the dependencies—taking into account the prophet, ddtrace, etc. production dependencies.
  3. Installing Superset via

     python3 -m pip install -r requirements/production.txt
     python3 -m pip install -e .

This ensures we always install Superset in a deterministic manner, which is super critical (see the consolidated sketch below).
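Expressed as a single flow (a sketch of the steps above; the elided extras are intentionally left as-is):

# requirements/production.in contains, for example:
#   -e .[prophet,…]
#   ddtrace
pip-compile-multi                                      # compile the .in files into pinned .txt files
python3 -m pip install -r requirements/production.txt
python3 -m pip install -e .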

Why doesn’t Superset explicitly pin every package (and sub-package) in setup.py? Packages should generally avoid pinning to provide flexibility, and one could argue that pinning is valid for a service—given it typically isn’t used as a dependency—however we then run into the issue of creating an infeasible dependency space when installation-dependent requirements are added. By relaxing constraints (by way of removing explicit pins) we allow pip-compile-multi to solve this somewhat difficult problem on our behalf.

Contributor

A bit late, but I thank you very much for your detailed explanation! This helped a lot to understand how Superset manages dependencies.

Member
@villebro left a comment

Looks good, thanks for cleaning this up. After this is merged I can follow up by refactoring those tests so that they're using the generic x-axis syntax in the query context. I'll also make sure the rowcount is correct. At any rate, I think things are better after this is merged than before, so LGTM 🚢


@john-bodley changed the title from "chore: Bump prophet" to "fix: Bump prophet and remedy column eligibility logic" on Jul 5, 2023
@john-bodley changed the title from "fix: Bump prophet and remedy column eligibility logic" to "fix: Bump prophet, re-enable tests, and remedy column eligibility logic" on Jul 5, 2023
@michael-s-molina added the v3.0 label (added by the release manager to track PRs to be included in the 3.0 branch) on Jul 5, 2023
@john-bodley merged commit 383dac6 into apache:master on Jul 5, 2023
33 checks passed
michael-s-molina pushed a commit that referenced this pull request Jul 26, 2023
@mistercrunch added the 🍒 3.0.0, 🍒 3.0.1, 🍒 3.0.2, 🍒 3.0.3, 🍒 3.0.4, 🏷️ bot (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels), and 🚢 3.1.0 labels on Mar 8, 2024