Skip to content
This repository has been archived by the owner on Aug 13, 2018. It is now read-only.

Resources generalisation in knit #101

Merged
merged 8 commits into from
Oct 23, 2017
Merged

Conversation

martindurant
Copy link
Member

@martindurant martindurant commented Oct 21, 2017

  • files is a list for uploading, and should include any zipped python env. Things
    ending in .zip will be treated as archives with a name like the filename minus the
    .zip. Existing resources in hdfs can also be passed (no upload).
  • environment (set of key/values) is passed as a dictionary.
  • for now, dask launch command is simply derived from the environment passed, as
    before.

Meantime:
changed application and client kill to atexit.

- files is a list for uploading, and should include any zipped python env. Things
ending in .zip will be treated as archives with a name like the filename minus the
.zip. Existing resources in hdfs can also be passed (no upload).
- environment is passed as a dictionary.
- for now, dask launch command is simply derived from the environment passed, as
before.

Meantime:
changed application and client kill to atexit.
@martindurant
Copy link
Member Author

Notes:

  • needs careful documentation
  • atexit method works in absolutely getting rid of hanging processes and applications, but often causes error messaging in the console at shutdown. This is presumably because everything it torn down at the same time; I don't know how to avoid it.

@martindurant
Copy link
Member Author

Fixes #100

Explicit environment variables should override what `lang` may set.
Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the interface put forth here looks good to me.

@@ -3,7 +3,7 @@

from .utils import *
from .core import *
from .env import CondaCreator
from .env import CondaCreator, zip_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this to the top-level namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a generally useful function, e.g., for what you are doing with folder zipping. The lines above tend to import everything.

knit/core.py Outdated
@@ -248,6 +263,7 @@ def preexec_func():
# preexec_fn not supported on Windows
proc = Popen(args, stdin=PIPE)
self.proc = proc
atexit.register(self.__del__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is what you want to do, as __del__ will be called twice for every object then. Instead the following might be better:

  • On __init__ add the knit object to a weakref.WeakSet
  • Add a atexit.register call to a function that will iterate over this set at close and remove any lingering instances.

The only reason there would be lingering objects is if there is a reference cycle, which leads to no __del__ in the cycle being called. These objects will still be in the weakset, allowing the exit handler to finish them up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be enough to set self.app_id = None, no? del does nothing after that happens.
The error messages you sometimes get come from py4j's watcher thread, which communicates with the java process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registering the method will hold on to a reference to the object, preventing actual cleanup until exit in all cases.

@@ -423,7 +444,7 @@ def logs(self, shell=False):
def print_logs(self, shell=False):
"""print out a more console-friendly version of logs()"""
for l, v in self.logs(shell).items():
print('Container ', l, ', id ', v.get('id', 'None'), '\n')
print('\n### Container ', l, ', id ', v.get('id', 'None'), ' ###\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes the printout easier to see, purely visual.


app_id = self.knit.start(command, env=self.env,
bn = os.path.basename(self.env)
pref = bn + '/' + bn[:-4] # like myenv.zip/myenv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this? It seems to me the environment zip file shouldn't unzip to a directory with .zip in the name. I'd expect unzip myenv.zip to result in a folder myenv, meaning pref should be myenv/bin/python. This seems to imply that unzip myenv.zip results in a folder myenv.zip with a nested folder named myenv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way you probably should be using os.path.join and os.path.splitext instead of + and [:-4].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zip files are always expanded inside a directory, so if you pass in my.zip made from directory my/, you will get my.zip/my/files. The top level could be named anything, but it seems pretty clear to me.
I wouldn't use path.join, since the containers will always run on a unix system, whereas it should be possible to launch applications from windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't expand a zip file in the worker root folder, or expand all of them in a folder and then cd into that folder? The behavior here is very unexpected (at least to me).

I wouldn't use path.join, since the containers will always run on a unix system, whereas it should be possible to launch applications from windows.

I'd argue that you should always write what's intended, even if the system it's running on will never need to be cross platform. os.path.join means "join some paths together", while x + '/' + x2... means concatenate some strings. Only in context can I see that the string concatenation is joining paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't expand a zip file in the worker root folder, or expand all of them in a folder and then cd into that folder?

No, all you do is give a name to YARN, which takes care of doing the unzipping.

if the system it's running on will never need to be cross platform.

I mean that we are testing linux/unix and everything is fine, but when this will be used on windows/unix, then path.join will break. In fact, yarn - the server - could be run on windows, I suppose, but probably no one does that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, ok. I found very little documentation about this. There's a stack-overflow post saying that spark-submit does the same thing, and there's the yarn docs which indicate how the local cwd is setup for files but not for zip files (first answer in FAQs, can't link to it directly). I still find this behavior surprising and non-intuitive.

I mean that we are testing linux/unix and everything is fine, but when this will be used on windows/unix, then path.join will break.

Since this is meant to be used on the edge node, I'd expect the resourcemanager and edge node to share the same OS. I've run into enough buggy code doing path processing using string operations instead of path operations that maybe I'm overly wary of this.

@martindurant
Copy link
Member Author

@jcrist - is this how it's done? I'm not experienced with weakref.

knit/core.py Outdated
@@ -87,6 +90,7 @@ class Knit(object):

JAR_FILE = "knit-1.0-SNAPSHOT.jar"
JAVA_APP = "io.continuum.knit.Client"
instances = weakref.WeakSet()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably move the instances set to a global, and the cleanup class method to a function, but that's just a style thing. If left on the class I think they both should be private (_ prefixed) at least. Otherwise this looks good.

@martindurant
Copy link
Member Author

I stuck with the class attribute and method as, in my mind, this is better for possible subclassing - not that I expect much of that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants