Unified wrapping method #991
Comments
Can we maybe discuss next Monday? We are just working on a patch that will improve parsing of docstrings. That will require the newest version then.... |
In my opinion: I'm not sure generating Python code is a good idea. It's hard to do (even with text templates, as done in matplotlib) and doesn't really help in understanding what's happening, since the generated code is usually hard to read (cf. matplotlib's). I'm also not sure about performance (parsing Python code is not necessarily fast). If the modules are emitted dynamically, I think an intermediate representation is a must-have; otherwise the package can have different contents on different machines. I think we all agree on this, since all wrappers work this way (except @amueller's?). I don't really have a preference on the intermediate representation format. I do think having a way to patch (like the matplotlib package's) is a very nice feature. |
I created a branch. I agree that an intermediate representation is desirable and patching should be supported. |
@amueller: yes definitely, I'm just gathering info here for the next meeting ;) Even if we come up with a wonderful and unified system for wrappers (hard in itself), migrating everything is a lot of work; this is definitely not happening overnight. |
My implementation is targeted at parsing numpydoc formatted docstrings + python introspection. Currently only functions are handled, but getting classes in is on my near-term to-do list. I am completely sold on the need for intermediate representations. I live in Queens, would it be helpful for me to meet with you all in person? |
For the intermediate representation: |
Using an older version of the library would mean that some modules don't work. This is unavoidable, but IMHO better than having VisTrails report that these modules "don't exist". But this means that users need to update that intermediate representation to use new features of the wrapped library. We can make this easy (provide a button inside VisTrails?). I didn't think too much about the problem you describe (supporting multiple versions); I'm not sure what we can do if new library versions break older versions of the wrapper. In any case, we can warn when the wrapper and wrapped versions don't match. |
I guess I didn't think about the use case where someone gets a file from someone else with a new module. Then it seems better to say "you need a newer version of the lib for this module to work". Btw, the installation base for most Python libraries is very diverse, as packages get regularly updated. |
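To make the "warn when the wrapper and wrapped versions don't match" idea concrete, here is a minimal sketch. The names `parse_version` and `compare_versions` are hypothetical (nothing in VisTrails is assumed to be called this), and the version parsing is deliberately naive: it only understands dotted integers and ignores suffixes such as `rc1`.

```python
import warnings


def parse_version(text):
    """Naive parser: '1.4.0rc1' -> (1, 4). Stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Drop trailing zeros so that '1.4' and '1.4.0' compare equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def compare_versions(spec_version, installed_version):
    """Compare the version a spec was generated from with the installed library.

    Returns 'match', 'library_older' or 'library_newer', and warns on mismatch.
    """
    spec, installed = parse_version(spec_version), parse_version(installed_version)
    if spec == installed:
        return "match"
    if installed < spec:
        warnings.warn(
            "You need a newer version of the library for some modules to work "
            f"(spec: {spec_version}, installed: {installed_version})."
        )
        return "library_older"
    warnings.warn(
        "The installed library is newer than the wrapper spec; consider "
        f"regenerating it (spec: {spec_version}, installed: {installed_version})."
    )
    return "library_newer"


# Hypothetical version strings, for illustration only.
status = compare_versions("1.4.2", "1.3.0")
```

The same check could run when a pipeline is loaded, so that a failure is reported as a version mismatch rather than as a missing module.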
The intermediate representations are relatively cheap (just a text file), so it might be worth having one for each version of the library you want to support. I don't have the full VT data model in my head yet, but could you also propagate this information into the VT version-tracking scheme to allow for smooth(ish) upgrades as well? |
I agree with @tacaswell here. Our goal is to ensure reproducibility but still allow people to use whatever versions they have when possible. I do think that multiple versions is the way to go. With the VTK wrapping, we have had cases where a vistrail that worked on one user's machine would not work on another user's machine, even though the VisTrails package version was the same, because the dynamic wrapping picked up different classes. In addition, the patching should mostly work across multiple versions (it should not need to be updated for each version, as these are the special cases that are not handled by the parser). We already have an upgrade scheme in VisTrails that allows developers to specify the upgrade paths between package versions. It could use some improvement, but it allows mostly automatic upgrades in a way that records which version of the package is being used. |
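One way to picture "one intermediate file per supported library version" is a small lookup that chooses which stored spec to load. This is only a sketch: `pick_spec_version` is a hypothetical helper, and the policy it implements (use the newest spec that is not newer than the installed library) is an assumption, not something decided in this thread.

```python
def version_key(text):
    """Naive key for dotted integer versions, e.g. '1.3.1' -> (1, 3, 1)."""
    return tuple(int(piece) for piece in text.split("."))


def pick_spec_version(available, installed):
    """Return the newest available spec version that is <= the installed one.

    Returns None when every stored spec is newer than the installed library.
    """
    limit = version_key(installed)
    candidates = [v for v in available if version_key(v) <= limit]
    return max(candidates, key=version_key) if candidates else None


# Hypothetical spec versions shipped with a package, for illustration only.
shipped = ["1.2", "1.3", "1.4"]
chosen = pick_spec_version(shipped, "1.3.1")
too_old = pick_spec_version(shipped, "1.1")
```

Recording the chosen spec version alongside the pipeline would give the upgrade scheme something concrete to key its upgrade paths on.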
The problem is that if you "just" wrap the version that is available at the moment, it will error when someone uses an old version of the library and you load a pipeline that uses a newer version (ports might appear or disappear etc), which I think is the problem that @remram44 alluded to. Maybe the easiest way would be to store the version with which a pipeline was created and if there is any error in the loading, give a message about version mismatch. If you do this, however, I don't see the benefit of an intermediate representation. |
I think our messages crossed, but the intermediate representation is very useful for patching (when you need to work around an issue with automated generation). I would argue that hard-coding the exceptional cases into the dynamic generator makes updates more difficult. |
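As an illustration of why patching fits naturally on top of an intermediate representation, here is a minimal sketch that keeps the hand-written corrections separate from the generated spec. The spec layout (`name`, `inputs`), the patch operations (`set_port`, `remove_port`) and the function `apply_patches` are all hypothetical; none of the existing wrappers is assumed to use this format.

```python
import copy


def apply_patches(spec, patches):
    """Return a patched copy of a generated spec; the original is left untouched."""
    patched = copy.deepcopy(spec)
    ports = {port["name"]: port for port in patched["inputs"]}
    for patch in patches:
        if patch["op"] == "remove_port":
            ports.pop(patch["port"], None)
        elif patch["op"] == "set_port":
            # Create the port if the parser missed it, then override its fields.
            port = ports.setdefault(patch["port"], {"name": patch["port"]})
            port.update(patch["fields"])
        else:
            raise ValueError("unknown patch operation: %r" % patch["op"])
    patched["inputs"] = list(ports.values())
    return patched


# What an automated parser might emit (made-up example).
generated = {
    "name": "smooth",
    "inputs": [
        {"name": "data", "type": "unknown"},
        {"name": "legacy_flag", "type": "bool"},
    ],
}
# Hand-maintained special cases, stored separately from the generator.
patches = [
    {"op": "set_port", "port": "data", "fields": {"type": "array"}},
    {"op": "remove_port", "port": "legacy_flag"},
]
patched = apply_patches(generated, patches)
```

Because the patch list refers to ports by name rather than to generated code, the same list can often be reapplied after the spec is regenerated for a new library version.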
On 01/08/2015 11:34 AM, dakoop wrote:
|
On 01/08/2015 11:36 AM, dakoop wrote:
|
Anyhow, I don't want to argue too much if you are all of the opinion that intermediate formats are great. So what does dynamic with intermediate format mean? Does it mean the wrapper is created dynamically from the intermediate format or that the intermediate format is created dynamically? |
The former; "dynamic" means no Python source generation. Of course it's not "truly dynamic" if generating from the static intermediate files. |
Fair enough. So I could achieve that by splitting all of my functions into two, one that writes the variables that I extracted into a json, and one that reads the json and generates a class from it. And reading would happen on load, while the writing would happen offline. Seems like a small-enough change. If you do that
is not true any more, though. |
@amueller That is exactly what I did in terms of splitting the logic in two. The code that extracts the meta-data ended up being independent of VT which makes it more generally useful (for example if you wanted to go through a library and add type-hinting). It is also in the back of my head to try to move the numpydoc scraping code into numpydoc or a stand-alone project. |
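The two-step split described above can be sketched roughly as follows. This is not the code from either implementation: `extract_spec` and `build_wrapper` are hypothetical names, the spec fields are made up, and `statistics.mean` merely stands in for a wrapped library function. The extraction step uses plain introspection and does not parse numpydoc docstrings.

```python
import importlib
import inspect
import json
import statistics


def extract_spec(func):
    """Offline step: describe a function as plain, JSON-serializable data."""
    signature = inspect.signature(func)
    return {
        "module": func.__module__,
        "name": func.__name__,
        "doc": inspect.getdoc(func) or "",
        "inputs": [
            {"name": p.name, "required": p.default is inspect.Parameter.empty}
            for p in signature.parameters.values()
        ],
    }


def build_wrapper(spec):
    """Load-time step: build a class from the stored spec, with no source generation."""
    target = getattr(importlib.import_module(spec["module"]), spec["name"])

    def compute(self, **kwargs):
        # Forward the port values to the wrapped function as keyword arguments.
        return target(**kwargs)

    attributes = {
        "__doc__": spec["doc"],
        "input_ports": [port["name"] for port in spec["inputs"]],
        "compute": compute,
    }
    return type(spec["name"], (object,), attributes)


# Offline: write the intermediate representation (kept as a string here).
spec_text = json.dumps(extract_spec(statistics.mean), indent=2)

# On load: read it back and create the class dynamically.
Mean = build_wrapper(json.loads(spec_text))
result = Mean().compute(data=[1, 2, 3])
```

Note that `extract_spec` has no dependency on VisTrails at all, which matches the observation that the metadata extraction can be useful on its own.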
After the discussion in today's meeting: |
Bah, sorry I missed today's meeting. @rexissimus please have a look at wrap_lib.py and scrape.py in https://github.com/Nikea/VTTools/tree/master/vttools which I think are already a nice wrapping system with an intermediate representation. |
Thanks, I will do that. |
The general wrapper based on VTK is here: https://github.com/VisTrails/VisTrails/tree/python-wrapper It provides a specification for wrapping python functions and classes that vistrails then knows how to execute. |
We have several libraries automatically wrapped as VisTrails packages. Unfortunately, each one uses a different method, which can lead to problems (when a method is not optimal), code duplication (they all do basically the same thing), and reduced maintainability.
Packages can be created in different ways:
Packages can be created from different sources:
Having an intermediate representation allows for manual corrections and, if the wrapping is dynamic, is the only way we can be sure the wrapped package is always exactly the same no matter which library version is installed.
What we have:
vtk
module