pythonPackages.mlflow: init at 1.4.0 #74091
Here's one issue I'm stuck on:
This appears to be benoitc/gunicorn#1716. So I tried adding
Also possibly related to #68314. Anyone know how to fix this?
I originally was going to post this in
tl;dr: why are python3.7 paths being appended to my PATH for some packages instead of the relevant python3.7.5 paths?
System information
Describe the problem
Provide the exact sequence of commands / steps that you executed before running into the problem.
At first I thought this was benoitc/gunicorn#1716, so I added
Other info / logs
Since there's no stacktrace back to mlflow, I next tried following the code base by eye to see how the error occurs. I'd like to note that
succeeds. Additionally, in a python shell, I can do the following without any issue:
However, if I copy
Next, look at the differences:
Now, let's figure out which is responsible:
So the issue is that
Huh, that's weird, I'm in python3.7.5, why are all these python3.7 paths here? Sure enough, if I try
Whereas in shell:
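The comparison described in the comment above (capture the environment where the import works and the one where it fails, then see which variables differ) can be sketched roughly as follows. This is only an illustration: the helper name `diff_env` and the sample store paths are hypothetical and not taken from the PR.

```python
import os


def diff_env(working, failing, path_vars=("PATH", "PYTHONPATH")):
    """Return the variables whose values differ between two environments.

    For os.pathsep-separated variables, report which entries are unique
    to each side instead of the raw strings.
    """
    report = {}
    for key in sorted(set(working) | set(failing)):
        a, b = working.get(key), failing.get(key)
        if a == b:
            continue
        if key in path_vars:
            a_entries = a.split(os.pathsep) if a else []
            b_entries = b.split(os.pathsep) if b else []
            report[key] = {
                "only_in_working": [p for p in a_entries if p not in b_entries],
                "only_in_failing": [p for p in b_entries if p not in a_entries],
            }
        else:
            report[key] = {"working": a, "failing": b}
    return report


# Hypothetical sample data; real values would come from os.environ
# captured in each context.
working = {"HOME": "/home/user", "PYTHONPATH": "/nix/store/aaa-python3.7.5/lib"}
failing = {"HOME": "/home/user", "PYTHONPATH": "/nix/store/bbb-python3.7/lib"}
result = diff_env(working, failing)
```

Variables that are identical on both sides are left out of `result`, so whatever remains is the short list of candidates to bisect.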
this is failing because gunicorn needs setuptools, not your package
@jonringer doh! I tried the same fix earlier but wrote propogatedBuildInputs facepalm. Now I get a new error:
And yet this works:
the only thing i can think of is that the new worker process isn't getting the same PYTHONPATH
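One way to test that hypothesis, sketched here under the assumption that the worker is an ordinary child process that only sees the environment it is handed: start a child interpreter and have it report its own `PYTHONPATH`. The helper name `child_pythonpath` and the example path are made up for illustration.

```python
import os
import subprocess
import sys


def child_pythonpath(env):
    """Spawn a child Python process with `env` and return the PYTHONPATH it sees."""
    code = "import os; print(os.environ.get('PYTHONPATH', '<unset>'))"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return out.stdout.strip()


# Child that is handed an explicit PYTHONPATH (hypothetical path).
with_path = dict(os.environ, PYTHONPATH="/tmp/example-site-packages")
# Child whose environment has had PYTHONPATH removed.
without_path = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}

seen_with = child_pythonpath(with_path)
seen_without = child_pythonpath(without_path)
```

If the value reported by the child differs from the one in the parent, the process that spawns the worker is building a different environment than the one the shell provides.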
if mlflow is meant to be an "application" rather than a package, you could move it out of python-modules and use
@jonringer I need to use mlflow both as a program
Had a useful conversation with @jtojnar on irc, so figured I'd copy here if only to refer back later:
I missed the long message at first, reposting here in case matrix deletes. I did an experiment that confirms @jtojnar's suspicions that PYTHONPATH is somehow not being set in the subprocess:
Output: https://gist.github.com/tbenst/f13ad655e6ad4cc6dae31c49b9f3643a A few things are very odd namely:
It appears that this issue is caused by #23676
What you're looking for is buildPythonApplication, I would recommend adding a
you just happened to construct the only path that matters, the python standard lib (site packages) gets put on PYTHONPATH due to the interpreter being wrapped:
while your package is in python-modules, you will need to patch the source code so that the package can call commands or reference store paths. If it's just an application, then you're free to use wrapping mechanisms as you please
@jonringer ah, I didn't know about
Definitely progress as
According to @FRidh's comment, I thought that setting NIX_PYTHONPATH would append to PYTHONPATH, but any value that I set for
I basically want the initialization that happens with
I suppose I could always run
if you want things set in that manner, you will want to use
@jonringer not sure I understand--are you suggesting I add
Edit: didn't mean to sound like I have a preferred solution in the last comment. I'd simply be happy with
the
@jonringer I think this is what you mean? Doesn't apply to subprocess :/
configuration.nix:
Edit: I suppose I could write my own
That should work.... as it will load all transitive dependencies into a different env derivation:
@jonringer any idea how to add mlflow from within the derivation?
Got it working! Trick was to manually replicate the basic
edit: might be a more elegant approach in https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/version-management/sourcehut/default.nix but not clear if that would work, so I'm content for now
5670584 to ffd7467
personally, i would move mlflow out of python-packages, and use
  ];

  checkPhase = "pytest tests";
  # tests folder is missing in PyPI
tbenst commented Dec 5, 2019 (Author, Contributor)
The tests require internet access: https://github.com/databricks/databricks-cli/blob/aad9f95bffbc607b02558b0a847041e9e8fe93c9/tests/runs/test_cli.py#L58. The package is needed in mlflow due to this import: https://github.com/mlflow/mlflow/blob/279669ff4d198b2c662e9ee60f8575e421b2ae91/mlflow/utils/databricks_utils.py#L7.
jonringer commented Dec 5, 2019 (Contributor)
do you have an example of how you would like to use mlflow? that would probably help me gauge what should be reflected in the expressions
tbenst commented Dec 5, 2019 (Author, Contributor)
definitely, here's a basic one: https://gist.github.com/tbenst/e793704190322915e9641c64351322a2
Essentially, I use mlflow to track model performance (similar to tensorboard if you're familiar) and store model parameters. I have a box on aws running mlflow server, and then use mlflow in local python scripts to log to the server.
Really appreciate all of your help!
tbenst commented Dec 5, 2019 (Author, Contributor)
should also mention that databricks-cli is only needed for using their SaaS offering of mlflow server. I don't use this, so at least as far as I'm concerned could just remove the import, but thought I'd leave it in in case another nix user wants it + only have goodwill towards Databricks for open-sourcing everything.
  patchPhase = ''
    substituteInPlace mlflow/utils/process.py --replace \
      "child = subprocess.Popen(cmd, env=cmd_env, cwd=cwd, universal_newlines=True," \
      "cmd[0]='$out/bin/gunicornMlflow'; child = subprocess.Popen(cmd, env=cmd_env, cwd=cwd, universal_newlines=True,"
  '';

  gunicornScript = writeText "gunicornMlflow"
  ''
    #!/usr/bin/env python
    import re
    import sys
    from gunicorn.app.wsgiapp import run
    if __name__ == '__main__':
      sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', ''', sys.argv[0])
      sys.exit(run())
  '';

  postInstall = ''
    gpath=$out/bin/gunicornMlflow
    cp ${gunicornScript} $gpath
    chmod 555 $gpath
  '';
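The effect of the `substituteInPlace` patch above is to overwrite the first element of the command list just before the process is spawned, so that the script installed at `$out/bin/gunicornMlflow` is run instead of whichever executable the command originally named. A rough Python sketch of that idea, using a hypothetical helper `run_with_wrapper` and the current interpreter standing in for the wrapper path:

```python
import subprocess
import sys


def run_with_wrapper(cmd, wrapper):
    """Run `cmd` with its executable (cmd[0]) replaced by `wrapper`."""
    cmd = list(cmd)   # copy; the patched mlflow line mutates cmd in place instead
    cmd[0] = wrapper  # same trick as the patched line: cmd[0] = '<wrapper path>'
    child = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, universal_newlines=True
    )
    stdout, _ = child.communicate()
    return cmd, child.returncode, stdout


# "gunicorn" is only a placeholder here; it is never executed because
# cmd[0] is swapped for sys.executable before spawning.
final_cmd, code, output = run_with_wrapper(
    ["gunicorn", "-c", "print('hello from the wrapper')"], sys.executable
)
```

The remaining arguments are passed through untouched, which is why the replacement script has to accept the same command line as the program it stands in for.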
tbenst commented Dec 5, 2019 (Author, Contributor)
Even if I move out of python-modules and use buildPythonApplication, this is still needed. Without it, gunicorn will not be able to import mlflow, which is necessary.
Note that this is a subtle hack--I'm leveraging
  ];

  checkPhase = "python querystring_parser/tests.py";
  # one test fails due to https://github.com/bernii/querystring-parser/issues/35
jonringer commented Dec 4, 2019 (Contributor)
if you just want to run unit tests, you should be able to exclude integration tests
tbenst commented Dec 5, 2019 (Author, Contributor)
There are no integration tests, only unit tests: https://github.com/bernii/querystring-parser/blob/master/querystring_parser/tests.py
I am using
Edit: no difference in behavior using
here's the branch for buildPythonApplication: https://github.com/tbenst/nixpkgs/tree/mlflow-app
tbenst commented Nov 25, 2019
Motivation for this change
Add mlflow, an open source platform for the machine learning lifecycle. Note that this package is only partially functional on NixOS, and is not intended to support features requiring conda.
Things done
sandbox in nix.conf on non-NixOS linux)
nix-shell -p nix-review --run "nix-review wip"
./result/bin/)
nix path-info -S before and after)