Add Prophet Regressor #2242

Merged
merged 5 commits into from Aug 4, 2021

ParthivNaresh
Contributor

@ParthivNaresh ParthivNaresh commented May 9, 2021

Fixes #1499

After a conversation with @dsherry, @kmax12, and @tyler3991, we decided to include Prophet with the cmdstanpy backend. The reasoning is that because we aren't distributing Prophet code (we simply require the user to install it as a dependency), we are in the same position as Prophet is regarding its use of pystan 2 and its GPLv3 license. Because neither we nor Prophet distribute the underlying code, neither of us is required to be under the same copyleft license.

However, because we have to think downstream to Tempo and whether it will be installed on-premises (which would require packaging up dependencies and distributing them), we can't rely on pystan 2 as a backend. Because Prophet doesn't support pystan 3 yet (which uses a permissive license), we will have to rely on cmdstanpy.

For more information read here

This PR only deals with adding the estimator so we can get it into main. I'll be raising another PR to add it to AutoML alongside perf tests.

@codecov

codecov bot commented May 9, 2021

Codecov Report

Merging #2242 (879c147) into main (0c64266) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2242     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        293     295      +2     
  Lines      26894   27055    +161     
=======================================
+ Hits       26851   27012    +161     
  Misses        43      43             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/estimators/__init__.py 100.0% <ø> (ø)
...alml/tests/model_family_tests/test_model_family.py 100.0% <ø> (ø)
evalml/tests/utils_tests/test_dependencies.py 85.2% <ø> (ø)
evalml/utils/gen_utils.py 99.6% <ø> (ø)
evalml/model_family/model_family.py 100.0% <100.0%> (ø)
...lines/components/estimators/regressors/__init__.py 100.0% <100.0%> (ø)
...ponents/estimators/regressors/prophet_regressor.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c64266...879c147. Read the comment docs.

@ParthivNaresh ParthivNaresh self-assigned this May 17, 2021
@ParthivNaresh
Contributor Author

ParthivNaresh commented May 17, 2021

I'm currently running into an issue with the cmdstanpy backend while following these instructions.

After installing the cmdstanpy backend, a script from the installed package has to be run: install_cmdstan.py
It downloads CmdStan from GitHub and builds the CmdStan utilities. Once it finishes, a new directory named cmdstan-2.26.1 is created in the default $HOME location (or in whatever directory was specified when python install_cmdstan.py was run).

The path to this folder has to be assigned to the environment variable CMDSTAN. It can either be exported separately with
export CMDSTAN=<path/to/dir>
or as part of the final installation command for Prophet through:
CMDSTAN=<path/to/dir> STAN_BACKEND=CMDSTANPY pip install prophet
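
The two steps above (locating the built CmdStan directory and exposing it through CMDSTAN together with STAN_BACKEND) can be sketched in Python. This is only a minimal sketch and not part of the PR: find_cmdstan_dir and prophet_install_env are hypothetical helpers, and the base directory to search is an assumption the caller has to supply.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def find_cmdstan_dir(base: Path) -> Optional[Path]:
    """Return the lexicographically last cmdstan-* directory under base, if any.

    Hypothetical helper: the cmdstan-<version> naming comes from the comment
    above; where that directory lives on a given machine is an assumption.
    """
    candidates = sorted(p for p in base.glob("cmdstan-*") if p.is_dir())
    return candidates[-1] if candidates else None


def prophet_install_env(cmdstan_dir: Path) -> Dict[str, str]:
    """Copy the current environment and add the two variables named above."""
    env = dict(os.environ)
    env["CMDSTAN"] = str(cmdstan_dir)
    env["STAN_BACKEND"] = "CMDSTANPY"
    return env
```

The returned mapping could then be passed as env= to subprocess.run([sys.executable, "-m", "pip", "install", "prophet"], check=True); the sketch deliberately stops short of running pip. Note that a lexicographic sort is not a real version comparison, so with several CmdStan versions installed the choice may need to be made explicitly.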

Prophet installs pystan by default as part of its installation process. To use cmdstanpy instead, a parameter has to be passed when instantiating Prophet: stan_backend='CMDSTANPY'. If this parameter isn't specified, Prophet falls back to the pystan implementation.

m = prophet.forecaster.Prophet()
print(f'Using backend: {m.stan_backend.get_type()}')
>>> Using backend: PYSTAN

If the backend is specified as cmdstanpy, an error is thrown:

m = prophet.forecaster.Prophet(stan_backend='CMDSTANPY')
print(f'Using backend: {m.stan_backend.get_type()}')
>>> ValueError: no such file /Users/parthiv.naresh/miniconda3/envs/prophet/lib/python3.8/site-packages/prophet/stan_model/prophet_model.bin

It appears to be looking for a file, prophet_model.bin, which doesn't exist. With the default pystan backend, the file prophet_model.pkl is used instead, and that file does exist.
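
A quick way to confirm this diagnosis is to check which compiled model files are actually present. The sketch below is an illustration only: backend_model_files is a hypothetical helper, and the mapping from backend name to file name is taken from the error message and description above rather than from Prophet's source.

```python
from pathlib import Path
from typing import Dict

# File names as reported in the comment above; the mapping is illustrative.
MODEL_FILES = {
    "PYSTAN": "prophet_model.pkl",
    "CMDSTANPY": "prophet_model.bin",
}


def backend_model_files(stan_model_dir: Path) -> Dict[str, bool]:
    """Report, per backend, whether its compiled model file exists."""
    return {
        backend: (stan_model_dir / file_name).is_file()
        for backend, file_name in MODEL_FILES.items()
    }
```

With Prophet installed, the directory from the error message would be Path(prophet.__file__).parent / "stan_model". A result of {"PYSTAN": True, "CMDSTANPY": False} would match the behaviour described here.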

This has been filed with Prophet.

Contributor

@freddyaboulton freddyaboulton left a comment

@ParthivNaresh This is looking good! I have some broad comments on the installation process and how we can make that easier for users. Component logic looks good but I'll also take a closer look later!

windows-requirements.txt (outdated; resolved)
@@ -66,11 +66,16 @@ jobs:
           ! pip freeze | grep -E "xgboost|catboost|lightgbm|plotly|ipywidgets|category_encoders"
           exit $?
       - if: ${{ !matrix.core_dependencies }}
-        name: Installing Dependencies
+        name: Installing Dependencies and Prophet
Contributor

Right now we're installing our dependencies (which installs prophet since it's in our requirements file) and then we have to reinstall prophet with the right backend.

Although that works, I think we can simplify it a bit. I tried this out locally, and it seems to work (in the sense that the unit tests pass):

pip install cmdstanpy==0.9.68
<compile cmdstan>
export CMDSTAN=<path-to-cmdstan>
export STAN_BACKEND=CMDSTANPY
pip install -r dev-requirements.txt

I think this also makes it easier to communicate to users how to properly install prophet with the right backend.

This reminds me, we should add a section to the install page in the docs about prophet installation! I'm hoping it can be this:

pip install cmdstanpy==0.9.68
<compile cmdstan>
export CMDSTAN=<path-to-cmdstan>
export STAN_BACKEND=CMDSTANPY
pip install evalml

Contributor Author

Good call. Hopefully we can remove this once the cmdstan-ext wheel is up for an easy pip install.

Contributor Author

Update on this: we're not including Prophet in requirements because it's being installed as an extra-requirement in setup.py. Therefore we'll be including the entire installation process for Prophet for git-test-automl and git-test-other.
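
Because Prophet is an optional extra rather than a hard requirement, code that uses it typically needs to check for it at runtime. The following is a generic sketch of such a check using only the standard library; is_installed and require_optional are hypothetical names and not necessarily how EvalML handles optional imports.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name: str, hint: str) -> None:
    """Raise a readable ImportError when an optional dependency is missing."""
    if not is_installed(module_name):
        raise ImportError(f"{module_name} is not installed. {hint}")
```

A component could call require_optional("prophet", "<installation hint>") before using the library, so that users who skipped the extra get an actionable message instead of a bare import failure.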

evalml/utils/gen_utils.py (outdated; resolved)
holidays_prior_scale=10,
seasonality_mode="additive",
random_seed=0,
stan_backend="CMDSTANPY",
Contributor

This isn't meant to be changed, right? I thought the plan was to only support Prophet with the CMDSTANPY backend.

Contributor Author

We need this to be specified so we can unit test it. Unfortunately, the parameter doesn't get included in the attributes list unless it's passed into the Prophet component.

make installdeps-test
pip uninstall pystan -y
pip freeze
- if: ${{ !matrix.core_dependencies && (matrix.command == 'git-test-modelunderstanding' || matrix.command == 'git-test-dask') }}
Contributor Author

Don't install Prophet when running with the full (non-core) dependencies and the test command is git-test-modelunderstanding or git-test-dask, because those tests don't need it.
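
The workflow condition on this step can be read as a small truth table. Below is a Python mirror of that expression, purely for illustration: skips_prophet_install is a hypothetical name, and the two command names are copied from the condition shown in the diff.

```python
# Test commands that, per the workflow condition above, do not need Prophet.
NO_PROPHET_COMMANDS = {"git-test-modelunderstanding", "git-test-dask"}


def skips_prophet_install(core_dependencies: bool, command: str) -> bool:
    """Mirror of: !core_dependencies && (command == A || command == B)."""
    return (not core_dependencies) and command in NO_PROPHET_COMMANDS
```

For example, skips_prophet_install(False, "git-test-dask") is True, while skips_prophet_install(False, "git-test-automl") is False, which is consistent with Prophet being installed for git-test-automl and git-test-other.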

Collaborator

@chukarsten chukarsten left a comment

Nice work, Parthiv! I think addressing @freddyaboulton's concern about the tests taking 15s each is very important to keep CI time from bloating. If you can reduce those and ping me with the results, I'd love to follow up. Additionally, regarding the suppression of stdout, I went through the ref and it seemed like some people had issues crop up with "running out of file descriptors" or something of the like. For the sake of getting the Prophet regressor in, I'm willing to accept the risk of some weirdness here, but I'd like an issue filed to address this differently post-merge, as I think this has the potential to become tech debt with a high interest rate, à la the weirdness going on with the matplotlib PNGs.
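
For reference, one way to silence output at the file-descriptor level without leaking descriptors is a context manager that closes everything it opens. This is a sketch under assumptions, not the code used in this PR: suppress_stdout_stderr is an illustrative name, and the approach shown (duplicating and restoring descriptors 1 and 2) is one common technique among several.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def suppress_stdout_stderr():
    """Redirect file descriptors 1 and 2 to os.devnull for the duration of the block."""
    # Flush Python-level buffers so pending output is not swallowed or reordered.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull_fd = os.open(os.devnull, os.O_RDWR)
    saved_fds = [os.dup(1), os.dup(2)]  # copies of the original stdout and stderr
    try:
        os.dup2(devnull_fd, 1)
        os.dup2(devnull_fd, 2)
        yield
    finally:
        # Restore the originals, then close every descriptor opened above.
        os.dup2(saved_fds[0], 1)
        os.dup2(saved_fds[1], 2)
        for fd in (devnull_fd, *saved_fds):
            os.close(fd)
```

Wrapping a noisy call in "with suppress_stdout_stderr():" hides output written by compiled code as well as by Python. Closing all three descriptors in the finally block is what keeps repeated use from exhausting the process's descriptor limit.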

setup.py Outdated
from setuptools import find_packages, setup

with open("README.md", "r") as fh:
    long_description = fh.read()

extras_require = {
    'update_checker': ['alteryx-open-src-update-checker >= 2.0.0'],
    'prophet': ['cmdstan-builder == 0.0.3']
Contributor Author

This links to the cmdstan-builder pip package.
The link for it on GitHub is here, and the PyPI package is here.
To learn more about the thought process behind this, please take a look at this, this, this, this, this and this.

setup.py (outdated; resolved)
Contributor

@freddyaboulton freddyaboulton left a comment

Solid work @ParthivNaresh ! I think this looks great. I mainly have some questions about the install process that I want to resolve before merge.

README.md (resolved)
setup.py (outdated; resolved)
evalml/utils/gen_utils.py (outdated; resolved)
evalml/tests/utils_tests/test_dependencies.py (outdated; resolved)
@@ -1297,6 +1312,8 @@ def test_estimators_accept_all_kwargs(
     )
     if estimator_class.model_family == ModelFamily.ENSEMBLE:
         params = estimator.parameters
+    elif estimator_class.model_family == ModelFamily.PROPHET:
+        params = estimator.get_params()
Contributor

Would using estimator.parameters work for prophet?

Contributor Author

Yes that would work. I'm testing this in particular because Prophet has its own dictionary representation that covers a wide range of parameters being passed into the component.

Contributor

@bchen1116 bchen1116 left a comment

This looks great! Thanks for explaining how things work in the writeups/comments/calls! Glad we're able to add this to EvalML!

@ParthivNaresh ParthivNaresh merged commit 455210b into main Aug 4, 2021
@chukarsten chukarsten mentioned this pull request Aug 12, 2021
@freddyaboulton freddyaboulton deleted the Add-Prophet-Regressor branch May 13, 2022 15:34
Successfully merging this pull request may close these issues.

Time series regression: add Prophet estimator
5 participants