Unified wrapping method #991

remram44 · 2015-01-08T15:30:29Z

We have several libraries automatically-wrapped as VisTrails packages. Unfortunately, each one is using a different method, which can lead to problems (when using a method that is not optimal), code duplication (they all do basically the same thing), and lesser maintainability.

Packages can be created in different ways:

Statically, i.e. we generate Python code that is loaded normally by VisTrails
Dynamically, i.e. Python creates modules using type() without generating code

Packages can be created from different sources:

Either directly from introspecting the library (and for instance parsing docstrings)
Or from an intermediate representation (XML, JSON, ...)

Having an intermediate representation allows for manual corrections, and if the wrapping is dynamic, is the only way we can be sure the wrapped package is always exactly the same no matter the installed library version.

What we have:

VTK
- dynamic, no intermediate repr, introspects Python classes in vtk module
matplotlib
- static, intermediate XML with patch mechanism, parses docstrings
Java (WIP, Wrapping Java libraries in VisTrails #820)
- static, intermediate JSON, uses Java language introspection and source parser
scikit-learn (WIP, scikit-learn package #955, @amueller)
- dynamic, no intermediate repr, ???
numpy/scipy (WIP, @tacaswell)
- dynamic, intermediate JSON, parses docstrings

The text was updated successfully, but these errors were encountered:

amueller · 2015-01-08T15:36:45Z

Can we maybe discuss next Monday? We are just working on a patch that will improve parsing of docstrings. That will require the newest version then....

remram44 · 2015-01-08T15:40:53Z

For my opinion:

I'm not sure generating Python code is a good idea, since it's kind of hard (even with text templates, as done in matplotlib), doesn't really help understanding what's happening (since the code is usually hard to read, cf matplotlib's). I'm not sure about performance here (parsing Python code is not necessarily fast).

If emitting the modules dynamically, I think having an intermediate representation is a must-have, else the package can have different contents on different machines. I think we all agree on this since all wrappers work this way (except @amueller's?)

I don't really have a preference on the intermediate representation format. I do think having a way to patch (like the matplotlib package's) is a very nice feature.

dakoop · 2015-01-08T15:42:47Z

I created a branch vtk-new-package on the vistrails.org repo that worked to implement a static version similar to the matplotlib representation for VTK. One interesting item was that it seemed like the dynamic generation was faster to load than the static version because of some inefficiencies when parsing port specs. I haven't looked at this for a bit, but it should be included in the conversation.

I agree that an intermediate representation is desirable and patching should be supported.

remram44 · 2015-01-08T15:42:49Z

@amueller: yes definitely, I'm just gathering info here for the next meeting ;)

If we come up with a wonderful and unified system for wrappers (hard in itself), migrating everything is a lot of work, this is definitely not happening overnight.

tacaswell · 2015-01-08T15:43:16Z

My implementation is targeted at parsing numpydoc formatted docstrings + python introspection.

Currently only functions are handled, but getting classes in is on my near-term to-do list.

I am completely sold on the need for intermediate representations.

I live in Queens, would it be helpful for me to meet with you all in person?

amueller · 2015-01-08T16:05:57Z

For the intermediate representation:
That means that you either need to support only a specific version of a package or that you need to write code to make the wrappers compatible with several versions. How do you propose to do that?
I don't see how the package can have the same content using different versions of the wrapped package.

remram44 · 2015-01-08T16:14:09Z

Using an older version of the library would mean that some modules don't work. This is unavoidable, but IMHO better than having VisTrails report that these modules "don't exist".

But this means that users need to update that intermediate representation to use new features of the wrapped library. We can make this easy (provide a button inside VisTrails?).

I didn't think too much about the problem you describe (supporting multiple versions); I'm not sure what we can do if new library versions break older versions of the wrapper.

In any case, we can warn when the wrapper and wrapped versions don't match.

amueller · 2015-01-08T16:19:12Z

I guess I didn't think about the usecase when someone gets a file from someone else with a new module. Then it seems better to say "you need a newer version of the lib for this module to work".
Usually I would have thought the other way around: If you are just creating a new pipeline, I would rather not list modules that don't work anyhow.

Btw, the installation base for most python libraries is very diverse, as packages get regularly updated.
So you should assume that basically none of your users uses the same version as the one you wrapped. At least that is what you should assume for numpy, scipy, matplotlib and scikit-learn.
If you did not install exactly the same version of anaconda and did not upgrade anything, I daresay the chances that all these have the same version as the one you wrapped are basically zero.

tacaswell · 2015-01-08T16:26:12Z

The intermediate representations are relatively cheap (just a text file), it might be worth having versions of that for all of the versions of the library you want to support.

The init.py (which turns that intermediate representation into Modules) can intropsect the version of the wrapped library and select the correct intermediate file.

I don't fully have the full VT data model in my head yet, but could you also propagate this information into the VT version-tracking scheme to allow for smooth(ish) upgrades as well?

dakoop · 2015-01-08T16:34:00Z

I agree with @tacaswell here. Our goal is ensure reproducibility but still allow people to use whatever versions they when possible. I do think that multiple versions is the way to go. With the VTK wrapping, we have had cases where a vistrail that worked on one user's machine will not on another user's machine even though the VisTrails package version is the same because the dynamic wrapping picks up different classes. In addition, the patching should mostly work across multiple versions (it should not need to be updated for each version as these are the special cases that are not handled by the parser).

We already have an upgrade scheme in VisTrails that allows developers to specify the upgrade paths between package versions. It could use some improvement, but it allows mostly automatic upgrades in a way that records which version of the package is being used.

amueller · 2015-01-08T16:34:18Z

The problem is that if you "just" wrap the version that is available at the moment, it will error when someone uses an old version of the library and you load a pipeline that uses a newer version (ports might appear or disappear etc), which I think is the problem that @remram44 alluded to.

Maybe the easiest way would be to store the version with which a pipeline was created and if there is any error in the loading, give a message about version mismatch.

If you do this, however, I don't see the benefit of an intermediate representation.

dakoop · 2015-01-08T16:36:55Z

I think our messages crossed, but the intermediate representation is very useful for patching (when you need to work around an issue with automated generation). I would argue that hard-coding the exceptional cases into the dynamic generator makes updates more difficult.

amueller · 2015-01-08T16:36:56Z

On 01/08/2015 11:34 AM, dakoop wrote:

With the VTK wrapping, we have had cases where a vistrail that worked
on one user's machine will not on another user's machine even though
the VisTrails package version is the same because the dynamic wrapping
picks up different classes.
With the scikit-learn wrappers, that can not really happen.
Scikit-learn uses the same methods in our tests.

amueller · 2015-01-08T16:39:31Z

On 01/08/2015 11:36 AM, dakoop wrote:

I think our messages crossed, but the intermediate representation is
very useful for patching (when you need to work around an issue with
automated generation). I would argue that hard-coding the exceptional
cases into the dynamic generator makes updates more difficult.

My wrappers are functional without hard coded cases ;)
Adding the whole of scikit-learn is literally 359 lines (no
documentation and testing yet, though).

amueller · 2015-01-08T16:43:45Z

Anyhow, I don't want to argue too much if you are all of the opinion that intermediate formats are great. So what does dynamic with intermediate format mean? Does it mean the wrapper is created dynamically from the intermediate format or that the intermediate format is created dynamically?

remram44 · 2015-01-08T16:45:43Z

The former; "dynamic" means no Python source generation. Of course it's not "truly dynamic" if generating from the static intermediate files.

amueller · 2015-01-08T16:50:59Z

Fair enough. So I could achieve that by splitting all of my functions into two, one that writes the variables that I extracted into a json, and one that reads the json and generates a class from it. And reading would happen on load, while the writing would happen offline. Seems like a small-enough change.
Generating the jsons would require a script that goes through the last x releases, creates virtualenvs with the release, and runs the generation. A bit of hackery there but not too bad.
Then on dynamic generation you could just check the version string and match it with the json version.

If you do that

the only way we can be sure the wrapped package is always exactly the same no matter the installed library version.

is not true any more, though.

tacaswell · 2015-01-08T18:11:17Z

@amueller That is exactly what I did in terms of splitting the logic in two.

The code that extracts the meta-data ended up being independent of VT which makes it more generally useful (for example if you wanted to go through a library and add type-hinting). It is also in the back of my head to try to move the numpydoc scraping code into numpydoc or a stand-alone project.

remram44 · 2015-01-21T17:27:17Z

After the discussion in today's meeting:
@rexissimus will try and come up with a nice wrapping system, with intermediate representation (but w/o Python code generation?) and use it for VTK first.

tacaswell · 2015-01-21T17:34:55Z

Bah, sorry I missed today's meeting.

@rexissimus please have a look at wrap_lib.py and scrape.py in https://github.com/Nikea/VTTools/tree/master/vttools which I think are already a nice wrapping system with an intermediate representation.

rexissimus · 2015-01-21T21:40:45Z

Thanks, I will do that.

rexissimus · 2015-03-20T23:16:51Z

The general wrapper based on VTK is here: https://github.com/VisTrails/VisTrails/tree/python-wrapper

It provides a specification for wrapping python functions and classes that vistrails then knows how to execute.

remram44 added C-packages T-task S-wishlist labels Jan 8, 2015

remram44 added the A-question label Jan 14, 2015

This was referenced Apr 2, 2015

Package and Library Versions #542

Closed

Packages that wrap libraries should not dynamically wrap on init #795

Closed

remram44 added the C-upgrades label Apr 8, 2015

remram44 added this to the version 3.0 milestone Oct 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unified wrapping method #991

Unified wrapping method #991

remram44 commented Jan 8, 2015

amueller commented Jan 8, 2015

remram44 commented Jan 8, 2015

dakoop commented Jan 8, 2015

remram44 commented Jan 8, 2015

tacaswell commented Jan 8, 2015

amueller commented Jan 8, 2015

remram44 commented Jan 8, 2015

amueller commented Jan 8, 2015

tacaswell commented Jan 8, 2015

dakoop commented Jan 8, 2015

amueller commented Jan 8, 2015

dakoop commented Jan 8, 2015

amueller commented Jan 8, 2015

amueller commented Jan 8, 2015

amueller commented Jan 8, 2015

remram44 commented Jan 8, 2015

amueller commented Jan 8, 2015

tacaswell commented Jan 8, 2015

remram44 commented Jan 21, 2015

tacaswell commented Jan 21, 2015

rexissimus commented Jan 21, 2015

rexissimus commented Mar 20, 2015

Unified wrapping method #991

Unified wrapping method #991

Comments

remram44 commented Jan 8, 2015

amueller commented Jan 8, 2015

remram44 commented Jan 8, 2015

dakoop commented Jan 8, 2015

remram44 commented Jan 8, 2015

tacaswell commented Jan 8, 2015

amueller commented Jan 8, 2015

remram44 commented Jan 8, 2015

amueller commented Jan 8, 2015

tacaswell commented Jan 8, 2015

dakoop commented Jan 8, 2015

amueller commented Jan 8, 2015

dakoop commented Jan 8, 2015

amueller commented Jan 8, 2015

amueller commented Jan 8, 2015

amueller commented Jan 8, 2015

remram44 commented Jan 8, 2015

amueller commented Jan 8, 2015

tacaswell commented Jan 8, 2015

remram44 commented Jan 21, 2015

tacaswell commented Jan 21, 2015

rexissimus commented Jan 21, 2015

rexissimus commented Mar 20, 2015