pythonPackages.mlflow: init at 1.4.0 #74091

Merged
merged 5 commits into from Feb 16, 2020

@tbenst
Contributor

tbenst commented Nov 25, 2019

Motivation for this change

Add mlflow, an open source platform for the machine learning lifecycle. Note that this package is only partially functional on NixOS, and is not intended to support features requiring conda.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
@tbenst
Contributor Author

tbenst commented Nov 25, 2019

Here's one issue I'm stuck on:

> nix-shell -I nixpkgs=. -p 'python3.buildEnv.override { extraLibs = [ python3Packages.mlflow ]; }'
$ mlflow server --host 0.0.0.0
Traceback (most recent call last):
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/bin/.gunicorn-wrapped", line 6, in <module>
    from gunicorn.app.wsgiapp import run
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 9, in <module>
    from gunicorn.app.base import Application
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/app/base.py", line 10, in <module>
    from gunicorn import util
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/util.py", line 26, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
Running the mlflow server failed. Please see the logs above for details.

This appears to be benoitc/gunicorn#1716. So I tried adding setuptools to propagatedBuildInputs, and also setuptools_scm to buildInputs. I tried adding these to both mlflow and gunicorn, but there was no change.

Also possibly related to #68314. Anyone know how to fix this?

@tbenst
Contributor Author

tbenst commented Nov 25, 2019

I originally was going to post this in mlflow/mlflow, but after tracing the error it appears to be a NixOS specific error that is caused by a bad PATH...not sure what's going on...

tl;dr why are python3.7 paths being appended to my PATH for some packages instead of the relevant python3.7.5 paths?

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): NixOS
  • MLflow installed from (source or binary): source
  • MLflow version (run mlflow --version): 1.4.0
  • Python version: 3.7.5
  • Exact command to reproduce: mlflow server

Describe the problem

Provide the exact sequence of commands / steps that you executed before running into the problem.
I am trying to package mlflow for NixOS. I get the following error:

$ mlflow server
Traceback (most recent call last):
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/bin/.gunicorn-wrapped", line 6, in <module>
    from gunicorn.app.wsgiapp import run
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 9, in <module>
    from gunicorn.app.base import Application
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/app/base.py", line 10, in <module>
    from gunicorn import util
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/util.py", line 26, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
Running the mlflow server failed. Please see the logs above for details.

At first I thought this was benoitc/gunicorn#1716, so I added setuptools as a dependency, but the issue remained.

Other info / logs

Since there's no stacktrace back to mlflow, I next tried following the code base by eye to see how the error occurs. mlflow server is handled by https://github.com/mlflow/mlflow/blob/3fc02ff20938ac5f6eb5681fac9cb693f55e1a19/mlflow/cli.py#L237
and calls https://github.com/mlflow/mlflow/blob/3fc02ff20938ac5f6eb5681fac9cb693f55e1a19/mlflow/server/__init__.py#L62
This constructs full_command of ['gunicorn', '-b', '127.0.0.1:5000', '-w', '4', 'mlflow.server:app'] and calls https://github.com/mlflow/mlflow/blob/3fc02ff20938ac5f6eb5681fac9cb693f55e1a19/mlflow/utils/process.py#L9
cmd_env equals the following: https://gist.githubusercontent.com/tbenst/0dbeecd11a5d91b57577a1f3919110f1/raw/d0f668530c0ec57c2d06e84447d34a192b093bdf/cmd_env. The error appears to come from https://github.com/mlflow/mlflow/blob/3fc02ff20938ac5f6eb5681fac9cb693f55e1a19/mlflow/utils/process.py#L35
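
The call chain described above can be approximated with a minimal sketch. It assumes, without checking against the linked mlflow revision, that the helper copies os.environ, overlays any extra variables, and hands the result to subprocess.Popen. The names run_with_env and DEMO_FLAG are hypothetical, and the child here is the current interpreter rather than gunicorn.

```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env=None, cwd=None):
    """Run cmd with a copy of the parent environment plus extra_env.

    Hypothetical stand-in for the helper discussed above, not mlflow's code.
    """
    cmd_env = os.environ.copy()
    if extra_env:
        cmd_env.update(extra_env)
    child = subprocess.Popen(
        cmd,
        env=cmd_env,
        cwd=cwd,
        universal_newlines=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    stdout, _ = child.communicate()
    return child.returncode, stdout


# Ask a child interpreter which file a bare "gunicorn" resolves to on the
# PATH it inherited (prints "None" if gunicorn is not installed).
probe = "import shutil; print(shutil.which('gunicorn'))"
exit_code, resolved = run_with_env([sys.executable, "-c", probe])
```

Because the child only sees what is in cmd_env, whatever PATH the parent process holds at that moment decides which gunicorn gets launched.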

I'd like to note that

$ gunicorn -b 127.0.0.1:5000 -w 4 mlflow.server:app 
[2019-11-25 10:57:05 -0800] [16341] [INFO] Starting gunicorn 19.9.0
[2019-11-25 10:57:05 -0800] [16341] [INFO] Listening at: http://127.0.0.1:5000 (16341)
[2019-11-25 10:57:05 -0800] [16341] [INFO] Using worker: sync
[2019-11-25 10:57:05 -0800] [16368] [INFO] Booting worker with pid: 16368
[2019-11-25 10:57:05 -0800] [16378] [INFO] Booting worker with pid: 16378
[2019-11-25 10:57:05 -0800] [16388] [INFO] Booting worker with pid: 16388
[2019-11-25 10:57:05 -0800] [16432] [INFO] Booting worker with pid: 16432

succeeds. Additionally, in a python shell, I can do the following without any issue:

>>> import os
>>> cmd_env = os.environ.copy()
>>> cwd = None
>>> import subprocess
>>> cmd = ['gunicorn', '-b', '127.0.0.1:5000', '-w', '4', 'mlflow.server:app']
>>> cwd = None
>>> child = subprocess.Popen(cmd, env=cmd_env, cwd=cwd, universal_newlines=True,
...                                  stdin=subprocess.PIPE)
>>> [2019-11-25 11:00:43 -0800] [2241] [INFO] Starting gunicorn 19.9.0
[2019-11-25 11:00:43 -0800] [2241] [INFO] Listening at: http://127.0.0.1:5000 (2241)
[2019-11-25 11:00:43 -0800] [2241] [INFO] Using worker: sync
[2019-11-25 11:00:43 -0800] [2297] [INFO] Booting worker with pid: 2297
[2019-11-25 11:00:43 -0800] [2304] [INFO] Booting worker with pid: 2304
[2019-11-25 11:00:43 -0800] [2309] [INFO] Booting worker with pid: 2309
[2019-11-25 11:00:43 -0800] [2321] [INFO] Booting worker with pid: 2321

However, if I copy mlflow_cmd_env (see gist), then I recreate the error:

>>> mlflow_cmd_env = { ....very long.... }
>>> child = subprocess.Popen(cmd, env=mlflow_cmd_env,  cwd=cwd, universal_newlines=True, stdin=subprocess.PIPE)
>>> Traceback (most recent call last):
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/bin/.gunicorn-wrapped", line 6, in <module>
    from gunicorn.app.wsgiapp import run
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 9, in <module>
    from gunicorn.app.base import Application
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/base.py", line 12, in <module>
    from gunicorn import util
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/util.py", line 12, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'

Next, look at the differences:

>>> for k,v in cmd_env.items():
...   if mlflow_cmd_env[k]!=v:
...     print(k)
... 
HOST_PATH
out
buildInputs
buildCommandPath
NIX_CFLAGS_COMPILE
NIX_LDFLAGS
PATH
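
The comparison loop above raises KeyError if a variable exists in one environment but not the other. A slightly more defensive sketch is shown below. The helper name diff_env and the sample dictionaries are made up for illustration and are not the environments from the gist.

```python
def diff_env(left, right):
    """Return sorted names whose values differ or that exist on one side only."""
    missing = object()  # sentinel so that "absent" never equals a real value
    names = set(left) | set(right)
    return sorted(
        name
        for name in names
        if left.get(name, missing) != right.get(name, missing)
    )


# Hypothetical sample data, only to show the shape of the result.
shell_env = {"PATH": "/a/bin", "HOME": "/home/u", "out": "/nix/store/x"}
mlflow_env = {"PATH": "/b/bin:/a/bin", "HOME": "/home/u", "NIX_LDFLAGS": "-L/x"}
changed = diff_env(shell_env, mlflow_env)
```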

Now, let's figure out which is responsible:

>>> new_cmd['HOST_PATH'] = mlflow_cmd_env['HOST_PATH']
>>> child = subprocess.Popen(cmd, env=new_cmd,  cwd=cwd, universal_newlines=True, stdin=subprocess.PIPE)
>>> [2019-11-25 11:35:39 -0800] [13682] [INFO] Starting gunicorn 19.9.0
[2019-11-25 11:35:39 -0800] [13682] [INFO] Listening at: http://127.0.0.1:5000 (13682)
[2019-11-25 11:35:39 -0800] [13682] [INFO] Using worker: sync
[2019-11-25 11:35:39 -0800] [13702] [INFO] Booting worker with pid: 13702
[2019-11-25 11:35:39 -0800] [13710] [INFO] Booting worker with pid: 13710
[2019-11-25 11:35:39 -0800] [13714] [INFO] Booting worker with pid: 13714
[2019-11-25 11:35:39 -0800] [13717] [INFO] Booting worker with pid: 13717

KeyboardInterrupt
>>> [2019-11-25 11:35:43 -0800] [13682] [INFO] Handling signal: int
[2019-11-25 11:35:43 -0800] [13714] [INFO] Worker exiting (pid: 13714)
[2019-11-25 11:35:43 -0800] [13702] [INFO] Worker exiting (pid: 13702)
[2019-11-25 11:35:43 -0800] [13717] [INFO] Worker exiting (pid: 13717)
[2019-11-25 11:35:43 -0800] [13710] [INFO] Worker exiting (pid: 13710)
[2019-11-25 11:35:43 -0800] [13682] [INFO] Shutting down: Master

KeyboardInterrupt
>>> new_cmd['PATH'] = mlflow_cmd_env['PATH']
>>> child = subprocess.Popen(cmd, env=new_cmd,  cwd=cwd, universal_newlines=True, stdin=subprocess.PIPE)
>>> Traceback (most recent call last):
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/bin/.gunicorn-wrapped", line 6, in <module>
    from gunicorn.app.wsgiapp import run
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 9, in <module>
    from gunicorn.app.base import Application
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/base.py", line 12, in <module>
    from gunicorn import util
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/util.py", line 12, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'

So the issue is that PATH changes.
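
The one-variable-at-a-time substitution used above can be written as a small loop. This is only a sketch. The name find_breaking_vars is hypothetical, and the works callback stands in for "launch gunicorn with this environment and check that it starts". The demo uses a fake predicate and made-up paths so that it runs anywhere.

```python
def find_breaking_vars(good_env, bad_env, works):
    """Swap in one differing variable at a time; return those that break works()."""
    culprits = []
    for name in sorted(set(good_env) | set(bad_env)):
        if good_env.get(name) == bad_env.get(name):
            continue
        trial = dict(good_env)
        if name in bad_env:
            trial[name] = bad_env[name]
        else:
            trial.pop(name, None)
        if not works(trial):
            culprits.append(name)
    return culprits


# Hypothetical environments and a fake check: "works" if the first PATH entry
# is an env directory. A real check would spawn the process instead.
good = {"PATH": "/nix/store/aaa-python3-env/bin:/usr/bin", "HOST_PATH": "/x"}
bad = {"PATH": "/nix/store/bbb-python3/bin:/nix/store/aaa-python3-env/bin", "HOST_PATH": "/y"}
culprits = find_breaking_vars(
    good, bad, lambda env: env["PATH"].split(":")[0].endswith("-env/bin")
)
```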

>>> for p in pyshell:
...   if p in mlflow:
...     continue
...   else:
...     print(p)
... 
/nix/store/9d4yqmgbc2a2bmh51h4bw4lbj223j4mz-python3-3.7.5-env/bin
>>> for p in mlflow:
...   if p in pyshell:
...     continue
...   else:
...     print(p)
... 
/nix/store/gpnm7i19lpj8p43mjrdw03d0hjalmskl-python3-3.7.5/bin
/nix/store/vip1apgf32s3ash2gmayjvsn6xw4slwi-python3.7-mlflow-1.4.0/bin
/nix/store/q9m23mv47gmaass0z259dbgg8xr862f9-python3.7-alembic-1.2.1/bin
/nix/store/jhcx4gb0f4l69xbc9kg16n0kzqqim98f-python3.7-Mako-1.1.0/bin
/nix/store/5088myssxnqwx0v0zi077cig4mxcdlz2-python3.7-setuptools-41.4.0/bin
/nix/store/a71ljrc2mi8f2hh28jw0gg7xbqn4zn3b-python3.7-chardet-3.0.4/bin
/nix/store/3d7qbv3gwaj6gjkahall8c50s0bq2csh-python3.7-Flask-1.1.1/bin
/nix/store/x4xgjqhpsgck4qqk9h0gyja4z2z1jm50-python3.7-numpy-1.17.3/bin
/nix/store/mnhf2sayqn6xhy3dis0k84d3jvw477lh-python3.7-xlrd-1.2.0/bin
/nix/store/bq1vk78w92n9kk3ycgrhdx7r8rma50zf-python3.7-tables-3.6.1/bin
/nix/store/397kn759lcxm2x379hh5nbyg081bcf4h-python3.7-pbr-5.4.3/bin
/nix/store/5q3905k786hg0k8q58fxzh4h3gn9v27a-python3.7-python-gflags-3.1.2/bin
/nix/store/vljgxs2rf8zczzrl9dr7v44r2vm3zc5d-python3.7-websocket_client-0.56.0/bin
/nix/store/acp1rmqsfkph4rs6llrdiabgxaicg0fi-python3.7-databricks-cli-0.9.1/bin
/nix/store/nm0phx7v93dxr23yy6ayclx96d1y6r0m-python3.7-tabulate-0.8.5/bin
/nix/store/fw3b63z1s3qz0d6hkbd5wqqng4c7d3ni-python3.7-sqlparse-0.3.0/bin
/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/bin
/nix/store/lkwx29nvav1gm17licakszhgyzrir0vl-python3-3.7.5-env/bin

Huh, that's weird, I'm in python3.7.5, why are all these python3.7 paths here? Sure enough, if I try

Traceback (most recent call last):
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/bin/.gunicorn-wrapped", line 6, in <module>
    from gunicorn.app.wsgiapp import run
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 9, in <module>
    from gunicorn.app.base import Application
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/base.py", line 12, in <module>
    from gunicorn import util
  File "/nix/store/mpvq0adhzjsm3nznya786mcv1198zjm8-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/util.py", line 12, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'

Whereas in shell:

$ which gunicorn
/nix/store/9d4yqmgbc2a2bmh51h4bw4lbj223j4mz-python3-3.7.5-env/bin/gunicorn
$ /nix/store/9d4yqmgbc2a2bmh51h4bw4lbj223j4mz-python3-3.7.5-env/bin/gunicorn
usage: gunicorn [OPTIONS] [APP_MODULE]
gunicorn: error: No application module specified.
@jonringer
Contributor

jonringer commented Nov 25, 2019

$ mlflow server --host 0.0.0.0
Traceback (most recent call last):
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/bin/.gunicorn-wrapped", line 6, in <module>
    from gunicorn.app.wsgiapp import run
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 9, in <module>
    from gunicorn.app.base import Application
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/app/base.py", line 10, in <module>
    from gunicorn import util
  File "/nix/store/vl5qa9893ckq3pic1lnm3z1wgvzni989-python3.7-gunicorn-20.0.2/lib/python3.7/site-packages/gunicorn/util.py", line 26, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
Running the mlflow server failed. Please see the logs above for details.

this is failing because gunicorn needs setuptools, not your package

@jonringer jonringer mentioned this pull request Nov 25, 2019
@tbenst
Contributor Author

tbenst commented Nov 25, 2019

@jonringer doh! I tried the same fix earlier but wrote propogatedBuildInputs facepalm.

Now I get a new error:

[nix-shell:~/code/nixpkgs]$ mlflow server
[2019-11-25 13:48:52 -0800] [19192] [INFO] Starting gunicorn 19.9.0
[2019-11-25 13:48:52 -0800] [19192] [INFO] Listening at: http://127.0.0.1:5000 (19192)
[2019-11-25 13:48:52 -0800] [19192] [INFO] Using worker: sync
[2019-11-25 13:48:52 -0800] [19195] [INFO] Booting worker with pid: 19195
[2019-11-25 13:48:52 -0800] [19195] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/workers/base.py", line 129, in init_process
    self.load_wsgi()
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
    return self.load_wsgiapp()
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/nix/store/057kqmjj9zixqlsgzzmbjvmh3wwinb0l-python3.7-gunicorn-19.9.0/lib/python3.7/site-packages/gunicorn/util.py", line 350, in import_app
    __import__(module)
ModuleNotFoundError: No module named 'mlflow'
[2019-11-25 13:48:52 -0800] [19195] [INFO] Worker exiting (pid: 19195)
[2019-11-25 13:48:52 -0800] [19192] [INFO] Shutting down: Master
[2019-11-25 13:48:52 -0800] [19192] [INFO] Reason: Worker failed to boot.
Running the mlflow server failed. Please see the logs above for details.

And yet this works:

[nix-shell:~/code/nixpkgs]$ python 
Python 3.7.5 (default, Oct 14 2019, 23:08:55) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mlflow
>>> 

[nix-shell:~/code/nixpkgs]$ gunicorn -b 127.0.0.1:5000 -w 4 mlflow.server:app
[2019-11-25 14:03:31 -0800] [10056] [INFO] Starting gunicorn 19.9.0
[2019-11-25 14:03:31 -0800] [10056] [INFO] Listening at: http://127.0.0.1:5000 (10056)
[2019-11-25 14:03:31 -0800] [10056] [INFO] Using worker: sync
[2019-11-25 14:03:31 -0800] [10059] [INFO] Booting worker with pid: 10059
[2019-11-25 14:03:31 -0800] [10060] [INFO] Booting worker with pid: 10060
[2019-11-25 14:03:31 -0800] [10061] [INFO] Booting worker with pid: 10061
[2019-11-25 14:03:31 -0800] [10062] [INFO] Booting worker with pid: 10062
@tbenst tbenst force-pushed the tbenst:mlflow branch from 5a8b007 to ef67a38 Nov 25, 2019
@jonringer
Contributor

jonringer commented Nov 25, 2019

the only thing i can think of, is that the new worker process isn't getting the same PYTHONPATH

@jonringer
Contributor

jonringer commented Nov 25, 2019

if mlflow is meant to be an "application" rather than a package, you could move it out of python-modules and use python.withPackages (...) to create an interpreter with the dependencies the server needs. Otherwise I'm not sure of a good way to allow the worker processes access to those packages.

@tbenst
Contributor Author

tbenst commented Nov 25, 2019

@jonringer I need to use mlflow both as a program (mlflow server) and as a library imported by another program. The problem seems to be that when I do which python in the nix-shell, I get /nix/store/8hmc3nfyqcbcc35khnpf54z7b8h0qrzi-python3-3.7.5-env/bin/python, but if I call which python from subprocess.Popen I get /nix/store/gpnm7i19lpj8p43mjrdw03d0hjalmskl-python3-3.7.5/bin/python, which is the system python and does not have mlflow installed.

@tbenst
Contributor Author

tbenst commented Dec 3, 2019

Had a useful conversation with @jtojnar on irc, so figured I'd copy here if only to refer back later:

<tbenst> anyone have experience with subprocesses and PATH? having an issue 
         where PATH is changing in python when using subprocess
<tbenst> https://github.com/NixOS/nixpkgs/pull/74091#issuecomment-558367416
<tbenst> I tried adding `shell=True` to the `Popen` call but no dice
<tbenst> I need the `Popen` call to use python where it can import `mlflow`
<jtojnar> tbenst the python from shebang is not part of PATH
<jtojnar> so it will not be available outside of nix-shell
<tbenst> jtojnar, good to know. However I get the same problem with
         `nix-build -I nixpkgs=. -A python3Packages.mlflow && result/bin/mlflow server`
<jtojnar> tbenst if you want to use that, you either need to wrap the script and
          set PATH in the wrapper, or use the python through absolute path
<jtojnar> maybe through something like sys.executable
<tbenst> jtojnar, the script has the correct PATH. it's just this subprocess call to gunicorn. If I call gunicorn from nix-shell all is good, but as soon as I call it through subprocess it no longer has correct PATH
<tbenst> interesting I'll take a look at sys.executable
<jtojnar> tbenst well, then it does not have the correct PATH
<jtojnar> tbenst sys.executable will not work if this comment is correct
          https://github.com/NixOS/nixpkgs/blob/d2b71c643a2ef49b57497987f847d4c366604c6f/pkgs/development/tools/pipenv/default.nix#L31-L35
<jtojnar> will need the direct path to interpreter
<tbenst> jtojnar, very cool, giving that a try, ty
<jtojnar> tbenst looking at https://github.com/NixOS/nixpkgs/pull/74091#issuecomment-558356275,
          what you want to do is have dependency pick up module from environment
<tbenst> jtojnar, sorry I didn't quite understand "dependency pick up module from
         environment", could you expand on that?
<tbenst> I think that means adding the PATH for mlflow to cmd_env["PATH"]
<jtojnar> tbenst mlflow provides a Python module and then runs its dependency
          gunicorn  and expects it to find the module
<jtojnar> tbenst for that you would need the subprocess to preserve PYTHONPATH
          env var (or possibly extend it with the value of `site.getsitepackages`)
<tbenst> jtojnar, I tried `cmd_env['PYTHONPATH'] = ':'.join(site.getsitepackages())`
         but didn't fix it
* jtojnar sent a long message:  < https://matrix.org/_matrix/media/r0/download/matrix.org/LHmQBUpAkMUrPcRzhujTGWOu >
<jtojnar> tbenst do you see it in gunicorn?
<tbenst> jtojnar, not sure I understand the question, or at least not sure how to
         answer it. Right now I'm trying to just call gunicorn directly by overriding the mlflow script, but finding it mighty challenging
<jtojnar> tbenst it would be nice to see if the env var/sitepackages are getting
          through the subprocess
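
One way to check whether the variable and the site-packages entries reach the subprocess is to have the child report its own view. The sketch below uses the current interpreter as the child. The helper name child_view and the PYTHONPATH value are hypothetical, and under a nixpkgs wrapper the child may still rewrite these values before Python starts.

```python
import json
import os
import subprocess
import sys


def child_view(env, names=("PYTHONPATH", "PATH")):
    """Spawn the current interpreter; return its view of env vars and sys.path."""
    probe = (
        "import json, os, sys;"
        "print(json.dumps({'env': {n: os.environ.get(n) for n in sys.argv[1:]},"
        " 'sys_path': sys.path}))"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe, *names],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return json.loads(result.stdout)


env = os.environ.copy()
env["PYTHONPATH"] = "/tmp/demo-site-packages"  # hypothetical value
view = child_view(env)
```

Comparing view["env"]["PYTHONPATH"] with the value that was set, and scanning view["sys_path"] for the expected directories, separates "the variable never arrived" from "the variable arrived but was ignored".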

I missed the long message at first, reposting here in case matrix deletes.

I did an experiment that confirms @jtojnar's suspicions that PYTHONPATH is somehow not being set in the subprocess:

patchPhase = ''
    substituteInPlace mlflow/utils/process.py --replace \
      "child = subprocess.Popen(cmd, env=cmd_env, cwd=cwd, universal_newlines=True," \
      "import site; py_path=':'.join(site.getsitepackages()); print('MAINPROC '+py_path); cmd_env['PYTHONPATH'] = py_path; child = subprocess.Popen(['echo', 'SUBPROCESS ', '$PYTHONPATH'], env=cmd_env, cwd=cwd, universal_newlines=True,"
  '';

Output: https://gist.github.com/tbenst/f13ad655e6ad4cc6dae31c49b9f3643a

A few things are very odd, namely:

❯ nix-shell -p python3
$ python
>>> import os
>>> import site
>>> py_path=':'.join(site.getsitepackages())
>>> cmd_env = os.environ.copy()
>>> cmd_env["PYTHONPATH"] = py_path
>>> import subprocess
>>> subprocess.Popen(['echo', 'SUBPROCESS ', '$PYTHONPATH'], env=cmd_env, cwd=None, universal_newlines=True, stdin=subprocess.PIPE)
<subprocess.Popen object at 0x7f14357cda90>
SUBPROCESS  $PYTHONPATH
>>> subprocess.Popen('echo SUBPROCESS $PYTHONPATH', env=cmd_env, cwd=None, universal_newlines=True, stdin=subprocess.PIPE, shell=True)
<subprocess.Popen object at 0x7f14357d2278>
SUBPROCESS /nix/store/s5f3vpmig33nk4zyk228q55wdydd3pc2-python3-3.7.3/lib/python3.7/site-packages
  • secondly, even though I set PYTHONPATH it appears to have no effect. Once again, this appears to only happen in the mlflow script. It works fine if I call python3:
>>> cmd_env["PYTHONPATH"] = "hello"
>>> subprocess.Popen('echo SUBPROCESS $PYTHONPATH', env=cmd_env, cwd=None, universal_newlines=True, stdin=subprocess.PIPE, shell=True)
<subprocess.Popen object at 0x7f14357cdf60>
SUBPROCESS hello
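
The literal $PYTHONPATH in the list-form call is standard subprocess behaviour rather than a Nix quirk. Without shell=True no shell runs, so nothing expands the variable reference. A minimal sketch, using a hypothetical variable DEMO_VAR and the current interpreter as the child, assuming a POSIX /bin/sh is available:

```python
import os
import subprocess
import sys

env = os.environ.copy()
env["DEMO_VAR"] = "hello"  # hypothetical variable, for illustration only

# List form, no shell: the child receives the literal string "$DEMO_VAR".
literal = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "$DEMO_VAR"],
    env=env, stdout=subprocess.PIPE, universal_newlines=True,
).stdout.strip()

# shell=True: /bin/sh expands the reference using the environment we passed.
expanded = subprocess.run(
    'echo "$DEMO_VAR"',
    shell=True, env=env, stdout=subprocess.PIPE, universal_newlines=True,
).stdout.strip()

# Without a shell, the child has to read the variable itself.
read_in_child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_VAR'])"],
    env=env, stdout=subprocess.PIPE, universal_newlines=True,
).stdout.strip()
```

So the echo probe in the patchPhase experiment needs either shell=True or a child that reads os.environ itself before its output says anything about PYTHONPATH.
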
@tbenst
Contributor Author

tbenst commented Dec 3, 2019

It appears that this issue is caused by #23676

@jonringer
Contributor

jonringer commented Dec 3, 2019

What you're looking for is buildPythonApplication. I would recommend adding mlflow = with python3Packages; toPythonApplication mlflow; to all-packages.nix, then just calling nix-shell -p mlflow.

❯ nix-shell -p python3
$ python
>>> import os
>>> import site
>>> py_path=':'.join(site.getsitepackages())
>>> cmd_env = os.environ.copy()
>>> cmd_env["PYTHONPATH"] = py_path
>>> import subprocess
>>> subprocess.Popen(['echo', 'SUBPROCESS ', '$PYTHONPATH'], env=cmd_env, cwd=None, universal_newlines=True, stdin=subprocess.PIPE)
<subprocess.Popen object at 0x7f14357cda90>
SUBPROCESS  $PYTHONPATH
>>> subprocess.Popen('echo SUBPROCESS $PYTHONPATH', env=cmd_env, cwd=None, universal_newlines=True, stdin=subprocess.PIPE, shell=True)
<subprocess.Popen object at 0x7f14357d2278>
SUBPROCESS /nix/store/s5f3vpmig33nk4zyk228q55wdydd3pc2-python3-3.7.3/lib/python3.7/site-packages

you just happened to construct the only path that matters. The python standard lib (site packages) gets put on PYTHONPATH because the interpreter is wrapped:

[nix-shell:~/projects/nixpkgs]$ python
Python 3.7.5 (default, Oct 14 2019, 23:08:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import subprocess
>>> subprocess.Popen(['echo $PYTHONPATH'], stdin=subprocess.PIPE, shell=True)
<subprocess.Popen object at 0x7f2331c81dd0>
>>> /nix/store/gpnm7i19lpj8p43mjrdw03d0hjalmskl-python3-3.7.5/lib/python3.7/site-packages:/nix/store/gpnm7i19lpj8p43mjrdw03d0hjalmskl-python3-3.7.5/lib/python3.7/site-packages
@jonringer
Contributor

jonringer commented Dec 3, 2019

while your package is in python-modules, you will need to patch the source code to allow for the packages to call commands or reference store paths. If you're just an application, then you're free to use wrapping mechanisms to your pleasure

@jonringer
Contributor

jonringer commented Dec 3, 2019

[11:12:46] jon@jon-workstation ~/projects/nixpkgs ((ef67a380caa...))
$ nix-shell -p "with import ./. {}; with python3Packages; toPythonApplication mlflow"

[nix-shell:~/projects/nixpkgs]$ mlflow --help
Usage: mlflow [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  artifacts    Upload, list, and download artifacts from an MLflow artifact...
  azureml      Serve models on Azure ML.
  db           Commands for managing an MLflow tracking database.
  experiments  Manage experiments.
  models       Deploy MLflow models locally.
  run          Run an MLflow project from the given URI.
  runs         Manage runs.
  sagemaker    Serve models on SageMaker.
  server       Run the MLflow tracking server.
  ui           Launch the MLflow tracking UI for local viewing of run...

[nix-shell:~/projects/nixpkgs]$ mlflow server
[2019-12-03 11:13:26 -0800] [20735] [INFO] Starting gunicorn 19.9.0
[2019-12-03 11:13:26 -0800] [20735] [INFO] Listening at: http://127.0.0.1:5000 (20735)
[2019-12-03 11:13:26 -0800] [20735] [INFO] Using worker: sync
[2019-12-03 11:13:26 -0800] [20738] [INFO] Booting worker with pid: 20738
[2019-12-03 11:13:26 -0800] [20739] [INFO] Booting worker with pid: 20739
[2019-12-03 11:13:26 -0800] [20740] [INFO] Booting worker with pid: 20740
[2019-12-03 11:13:26 -0800] [20741] [INFO] Booting worker with pid: 20741
@tbenst tbenst force-pushed the tbenst:mlflow branch from ef67a38 to 3cb9de5 Dec 3, 2019
@tbenst
Contributor Author

tbenst commented Dec 3, 2019

@jonringer ah, I didn't know about toPythonApplication, neat! I think this resolves the issue for now as I believe mlflow only uses subprocess when used as a command-line app, but I'm about to do more testing to verify. If I'm right, this should be good for final review & ready for merge after gunicorn is merged from staging-next.

@tbenst
Contributor Author

tbenst commented Dec 5, 2019

personally, i would move mlflow out of python-packages, and use buildPythonApplication instead. It would solve most of your issues.

I am using mlflow = with python3Packages; toPythonApplication mlflow; in all-packages.nix; would buildPythonApplication do something different? If so, I suppose I could make separate nix derivations for python-modules and for the application itself. To use mlflow, you need both A) a server running via mlflow server and B) to import mlflow in your python code.

Edit: no difference in behavior using buildPythonApplication. subprocess is still broken and cannot pass PYTHONPATH. I think the current solution is the only viable option given the current site-initialization approach in nixpkgs.

here's the branch for buildPythonApplication: https://github.com/tbenst/nixpkgs/tree/mlflow-app
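
For reference, a generic way to hand a child process the parent's import paths is to serialize sys.path into PYTHONPATH. This is a different technique from the site.getsitepackages approach tried in the IRC log, and it is not what this PR ended up doing. Given the reports above that PYTHONPATH was not respected inside the wrapped mlflow script, it may not help under nixpkgs either. The helper name is hypothetical.

```python
import os
import sys


def env_with_parent_sys_path(base_env=None):
    """Return an environment whose PYTHONPATH carries the parent's sys.path."""
    env = dict(os.environ if base_env is None else base_env)
    entries = [entry for entry in sys.path if entry]  # drop the empty "cwd" entry
    existing = env.get("PYTHONPATH")
    if existing:
        entries.append(existing)  # keep whatever was already configured, last
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env


child_env = env_with_parent_sys_path()
```

The resulting child_env would be passed as the env argument of subprocess.Popen in place of a plain os.environ.copy().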

@tbenst tbenst force-pushed the tbenst:mlflow branch from ffd7467 to 21bd378 Dec 5, 2019
@tbenst
Contributor Author

tbenst commented Dec 15, 2019

@jonringer just wanted to check on this. I've been using this package successfully in production for over a week now with no issues. Let me know if you think any of the above threads are not resolved.

@tbenst tbenst force-pushed the tbenst:mlflow branch 2 times, most recently from 3c913ac to c86f61a Jan 6, 2020
@tbenst
Contributor Author

tbenst commented Jan 7, 2020

@jonringer ok implemented #74091 (comment). I think this addresses final concern

@tbenst tbenst force-pushed the tbenst:mlflow branch from c86f61a to cf30415 Jan 7, 2020
@tbenst tbenst mentioned this pull request Jan 7, 2020
Contributor

tomberek left a comment

Tested on NixOS. Functions as expected when used with --no-conda.

Contributor

jonringer left a comment

mostly LGTM

a little concerned about the non-standard mlflow-server package, but it is a convenient way to make a gunicorn web server. @FRidh thoughts?

@tbenst tbenst force-pushed the tbenst:mlflow branch from cf30415 to 41db39a Feb 3, 2020
@tbenst
Contributor Author

tbenst commented Feb 3, 2020

@jonringer thanks for the excellent feedback and support on this issue! Learned a lot. This is ready for final review.

@worldofpeace @disassembler would love to get this into the 20.03 milestone if possible

@tbenst tbenst force-pushed the tbenst:mlflow branch from 41db39a to e91f7ee Feb 3, 2020
@tbenst tbenst force-pushed the tbenst:mlflow branch 2 times, most recently from af9b214 to 9fe16c4 Feb 4, 2020
@worldofpeace
Member

worldofpeace commented Feb 5, 2020

@tbenst Backporting new packages is fine even after the release.

@tbenst tbenst force-pushed the tbenst:mlflow branch from 9fe16c4 to 540255a Feb 16, 2020
@tbenst
Contributor Author

tbenst commented Feb 16, 2020

Made the requested changes!

Contributor

jonringer left a comment

LGTM

[17 built, 2 copied (0.4 MiB), 0.1 MiB DL]
https://github.com/NixOS/nixpkgs/pull/74091
11 package built:
mlflow-server python27Packages.databricks-cli python27Packages.gorilla python37Packages.databricks-cli python37Packages.gorilla python37Packages.mlflow python37Packages.querystring_parser python38Packages.databricks-cli python38Packages.gorilla python38Packages.mlflow python38Packages.querystring_parser
@jonringer jonringer merged commit a35a280 into NixOS:master Feb 16, 2020
@tbenst
Contributor Author

tbenst commented Feb 16, 2020

@worldofpeace possible to add this to 20.03 / to 20.03 backports? Let me know how I can help. Thank you!

@tbenst tbenst mentioned this pull request May 9, 2020