Run ReproZip for part of script #358

appukuttan-shailesh · 2019-07-05T14:17:52Z

Would it be possible to run ReproZip for part of a Python script?

To elaborate a bit: we are developing a tool in which users specify multiple parameters, and different models and protocols are then used for the simulations depending on those parameters (i.e. it involves user interactivity). Naturally, the packages and files that are invoked also vary accordingly. The 'environment' I wish to save should exclude these initial parts and other housekeeping tasks, and focus solely on the loading and execution of the model.

With this in mind, is it possible to invoke ReproZip from within a Python script (as opposed to calling it from the terminal) so that I can track (and save) the files/packages that are required between, e.g., lines x and y of my script (i.e. to enable/disable ReproZip tracing inside a Python script)?

I suppose ReproZip wasn't intended to run in this fashion, but I am curious to know if I could employ certain sub-modules or methods to achieve this. I also took a look at the Jupyter plugin to see if some bits might be useful.

I intend to dig deeper, but felt it was much better to ask here to get a better idea of the lay of the land. Thanks in advance.

(apologies if a similar question has been answered previously elsewhere)

remram44 · 2019-07-05T14:28:39Z

Unfortunately ReproZip is meant to track a process, from its creation.

When you reach the section of interest to you, there is no way for you to find out which part of the already-loaded files are required for this new section. For example, numpy might already have been loaded because it's a requirement of your UI package, and you won't see it getting loaded when you load pandas at the start of your simulation code (because it's already been loaded). ReproZip can't automatically determine that you want numpy but not the UI package.
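To illustrate the point about already-loaded modules, here is a minimal sketch (not part of ReproZip) that uses the standard-library `json` module as a stand-in for numpy. The helper name `import_and_report` is hypothetical. Once a module is cached in `sys.modules`, a later import does not need to read its files again, so a tracer attached at that point would see no file access for it.

```python
import importlib
import sys


def import_and_report(module_name):
    """Import a module and report whether this call had to load it.

    Returns True if the module was not yet in sys.modules (Python had to
    locate and read its files), False if it was already cached.
    """
    already_loaded = module_name in sys.modules
    importlib.import_module(module_name)
    return not already_loaded


# "json" stands in for numpy here. The first call may or may not load the
# module, depending on what the interpreter has already imported; the second
# call is always served from the sys.modules cache.
first = import_and_report("json")
second = import_and_report("json")
print("loaded on first call:", first)
print("loaded on second call:", second)
```

This is why tracing only a late section of a script would miss dependencies that an earlier section happened to import first.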

Would it be possible for you to split this script into two separate scripts? You could have a first script set everything up through a UI, then call the simulation script, passing the simulation parameters on the command line or via a file. Then you can easily interpose ReproZip to trace this second process.
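A rough sketch of that split, seen from the first (UI) script, might look like the following. The file names `simulate.py` and `params.json` and all function names are hypothetical; the only ReproZip call assumed is `reprozip trace <command>`, and `run_traced` is defined but never called here.

```python
import json
import subprocess
import sys


def write_parameters(params, path):
    """Save the parameters chosen in the UI so the simulation script can read them."""
    with open(path, "w") as f:
        json.dump(params, f)


def build_trace_command(script, params_path):
    """Build the command line that runs the simulation script under ReproZip."""
    return ["reprozip", "trace", sys.executable, script, params_path]


def run_traced(script, params_path):
    """Launch the traced simulation as a separate process (requires reprozip)."""
    return subprocess.run(build_trace_command(script, params_path), check=True)


# Example: the UI script gathers parameters, writes them to a file, then
# hands over to the traced process with run_traced("simulate.py", "params.json").
command = build_trace_command("simulate.py", "params.json")
print(" ".join(command))
```

Because the simulation runs in its own process, ReproZip sees every file it loads from the start, including libraries that the UI script had already imported for its own purposes.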

appukuttan-shailesh · 2019-07-05T14:49:46Z

Thanks for the super quick response. Your suggestion about splitting into two scripts was my "plan B" and I intend to try it out soon. Will update you on this shortly.

p.s. Is there a provision for recording package versioning info (wherever possible)... like a pip freeze but limited to the specific packages that were loaded? This is with the intention of obtaining an environment snapshot that can be displayed to visitors (as a detailed requirements file).

remram44 · 2019-07-05T14:56:33Z

Yes! I'm hoping to add support for common interpreters (Python, R, Ruby) so that version information can be recorded. I completely agree that this information should be in the bundle.

appukuttan-shailesh · 2019-07-05T15:03:04Z

Out of curiosity, do you have an idea of when this feature might be available? For now, I have been planning to include certain parts of the "Sumatra" package to do this version tracking. If this is expected within ReproZip in the near future, I would be inclined to wait :-)

This might be useful for the Python implementation:
https://github.com/open-research/sumatra/blob/20821e8a62fff2869cbdbe74d39aa580c3a19d0a/sumatra/dependency_finder/python.py

remram44 · 2019-07-05T15:09:57Z

Unfortunately, ReproZip is not hooked into the experiment's Python interpreter, so I have to take a different approach. Probably simply reading the .dist-info folders.
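As a sketch of what "reading the .dist-info folders" could look like (this is an illustration only, not ReproZip's implementation), the function below scans a site-packages directory and reads the `Name` and `Version` headers from each `METADATA` file. The function name `read_dist_info_versions` is hypothetical.

```python
import os
from email.parser import Parser


def read_dist_info_versions(site_packages):
    """Return {package name: version} for every *.dist-info folder found.

    METADATA files use RFC 822 style headers, so the standard-library
    email parser can read the Name and Version fields.
    """
    versions = {}
    for entry in sorted(os.listdir(site_packages)):
        if not entry.endswith(".dist-info"):
            continue
        metadata_path = os.path.join(site_packages, entry, "METADATA")
        if not os.path.isfile(metadata_path):
            continue
        with open(metadata_path, encoding="utf-8") as f:
            headers = Parser().parse(f, headersonly=True)
        name = headers.get("Name")
        version = headers.get("Version")
        if name and version:
            versions[name] = version
    return versions
```

Restricting the result to the packages an experiment actually used would still require matching these folders against the files recorded in the trace, which this sketch does not attempt.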

remram44 · 2019-07-05T15:15:35Z

[edit: moved to #359]

appukuttan-shailesh · 2019-07-26T10:58:32Z

I have been attempting to split my script into two parts, one of which would be invoked through reprozip. The workflow seems to work in general, but I have the following concerns:

Is it possible to specify via CLI a target directory where the .reprozip and .reprozip-trace directories would be created? My situation is that the same (separated out) script may need to be run several times in parallel from within the same directory. In such a case, I would need to specify a distinct target location for each run. --continue and --overwrite don't suffice for me here.
Is it possible to edit the configuration file via CLI? For instance, I don't wish to store information such as the values of environment variables. It wouldn't be feasible for me to edit the file manually each time.
Do you have any tips for reducing the size of the .rpz file? Is there any group of packages that can be ignored (e.g. Miniconda)? (I realize that doing so will not guarantee reproducibility of the outputs.)

remram44 · 2019-07-26T14:53:20Z

Is it possible to specify via CLI a target directory where the .reprozip and .reprozip-trace directories would be created?

For .reprozip-trace, you can select its location using -d: reprozip trace -d .reprozip-trace-3 ./mycommand. The .reprozip directory is always in $HOME, but that shouldn't cause more issues than a combined log file in .reprozip/log.
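For several parallel runs launched from the same directory, the `-d` option above can be combined with a per-run directory name. The sketch below only builds the command lines; the script name, the run identifiers, and the choice to pass the run identifier as an argument to the script are all hypothetical.

```python
import sys


def build_parallel_trace_commands(script, run_ids):
    """Build one 'reprozip trace -d <dir> ...' command per run.

    Each run gets its own trace directory so that parallel traces started
    from the same working directory do not overwrite each other.
    """
    commands = []
    for run_id in run_ids:
        trace_dir = ".reprozip-trace-{}".format(run_id)
        commands.append(
            ["reprozip", "trace", "-d", trace_dir,
             sys.executable, script, str(run_id)]
        )
    return commands


for command in build_parallel_trace_commands("simulate.py", [1, 2, 3]):
    print(" ".join(command))
```

Each command can then be started with `subprocess.Popen` or a process pool, assuming `reprozip` is installed and on the PATH.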

Is it possible to edit the configuration file via CLI?

That's currently not possible, sorry. We would need a lot of different commands to support every use case. You can, however, change this file from Python using PyYAML. Note that changing the environment might cause the experiment not to run, since some variables are necessary for the reproduction (I'm thinking of PATH, HOME, XDG_*, LANG).
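A minimal sketch of such an edit is shown below. It assumes that the configuration file stores each run under a top-level `runs` list with an `environ` mapping; please check this against your own generated file before relying on it. The function names are hypothetical, and the list of variables to keep simply follows the ones named above.

```python
def strip_environment(config, keep=("PATH", "HOME", "LANG"), keep_prefixes=("XDG_",)):
    """Drop all environment variables except the ones needed to reproduce.

    Assumes the layout config["runs"][i]["environ"] (an assumption about
    the file structure, to be verified against a real config file).
    """
    for run in config.get("runs", []):
        environ = run.get("environ") or {}
        run["environ"] = {
            key: value
            for key, value in environ.items()
            if key in keep or key.startswith(keep_prefixes)
        }
    return config


def rewrite_config(path):
    """Load the YAML file, strip the environment, and write it back."""
    import yaml  # PyYAML, a third-party package

    with open(path) as f:
        config = yaml.safe_load(f)
    strip_environment(config)
    with open(path, "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False)
```

Note that rewriting the file with PyYAML discards any comments it contained, so it may be worth keeping a copy of the original.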

Do you have any tips for reducing the size of the .rpz file?

Some things, like fonts (#360), might not strictly be needed, but usually everything that gets packed is required for the experiment to run. You can omit your data if it's repeated between all the experiments you trace; there is no automated way to put the data back in, but running the upload command to put the data in before reproducing is straightforward.

appukuttan-shailesh · 2019-07-29T15:20:53Z

Thanks for the quick reply. I have implemented your suggestions and have a working prototype ready. I will start testing it and collecting feedback from other users, and will get back to you with any further developments.

remram44 · 2019-07-29T18:30:01Z

Glad I could help! I am very interested in your feedback and experience as you attempt this, so don't hesitate to share what you can.

Closing this ticket in favor of #359.

remram44 closed this as completed Jul 29, 2019
