Merge branch '1.0.x' into 'master'
remram44 committed Jan 13, 2021
2 parents 979ff8b + cd7c940 commit 1c4f329
Showing 19 changed files with 61 additions and 61 deletions.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -340,5 +340,5 @@


# Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {'https://docs.python.org/': None,
+intersphinx_mapping = {'https://docs.python.org/3/': None,
'http://rpaths.remram.fr/en/latest/': None}
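For comparison, Sphinx also accepts a named form of ``intersphinx_mapping``, which newer Sphinx releases prefer over the URL-keyed form used in this hunk. The sketch below is not part of the commit; the labels ``python`` and ``rpaths`` are chosen here purely for illustration.

```python
# Named intersphinx form: {label: (base_url, inventory_location)}.
# The labels "python" and "rpaths" are illustrative assumptions,
# not names taken from this commit. None means "use the default
# objects.inv location under the base URL".
intersphinx_mapping = {
    "python": ("https://docs.python.org/3/", None),
    "rpaths": ("http://rpaths.remram.fr/en/latest/", None),
}
```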
6 changes: 3 additions & 3 deletions docs/developerguide.rst
@@ -8,7 +8,7 @@ General Development Information

Development happens on `GitHub <https://github.com/ViDA-NYU/reprozip>`__; bug reports and feature requests are welcome. If you are interested in giving us a hand, please do not hesitate to submit a pull request there.

-Continuous testing is provided by `Travis CI <https://travis-ci.org/ViDA-NYU/reprozip>`__. Note that ReproZip supports both Python 2 and 3. Test coverage is not very high because there are a lot of operations that are difficult to cover on Travis (for instance, Vagrant VMs cannot be used over there).
+Continuous testing is provided by `GitHub Actions <https://github.com/VIDA-NYU/reprozip/actions>`__. Note that ReproZip still tries to support Python 2 as well as Python 3. Test coverage is not very high because there are a lot of operations that are difficult to cover on CI (for instance, Vagrant VMs cannot be used over there).

If you have any questions or need help with the development of an unpacker or plugin, please use our development mailing-list at `dev@reprozip.org <https://vgc.poly.edu/mailman/listinfo/reprozip-users>`__.

@@ -32,8 +32,8 @@ ReproZip is divided into two steps. The first is packing, which gives a generic

Currently, different unpackers are maintained: the defaults ones (``directory`` and ``chroot``), ``vagrant`` (distributed as `reprounzip-vagrant <https://pypi.org/project/reprounzip-vagrant/>`__) and ``docker`` (distributed as `reprounzip-docker <https://pypi.org/project/reprounzip-docker/>`__). However, the interface is such that new unpackers can be easily added. While taking a look at the "official" unpackers' source is probably a good idea, this page gives some useful information about how they work.

-ReproZip Pack Format (``.rpz``)
-'''''''''''''''''''''''''''''''
+ReproZip Bundle Format (``.rpz``)
+'''''''''''''''''''''''''''''''''

An ``.rpz`` file is a ``tar.gz`` archive that contains two directories: ``METADATA``, which contains meta-information from *reprozip*, and ``DATA``, which contains the actual files that were packed and that will be unpacked to the target directory for reproducing the experiment.
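
The layout described above can be checked with Python's standard ``tarfile`` module. This is a minimal sketch, not ReproZip's own code: it assumes the archive layout is exactly as stated in this paragraph, and the helper names and the demo file names (``config.yml``, ``input.txt``) are hypothetical.

```python
import io
import tarfile


def top_level_entries(rpz_path):
    """Return the sorted top-level names found inside a tar archive."""
    # "r:*" lets tarfile detect the compression (gzip here) on its own.
    with tarfile.open(rpz_path, "r:*") as tar:
        return sorted({name.split("/", 1)[0] for name in tar.getnames() if name})


def make_demo_bundle(path):
    """Write a tiny stand-in archive with METADATA/ and DATA/ entries.

    The member names below are made up for the demo; a real bundle
    is produced by ``reprozip pack``.
    """
    with tarfile.open(path, "w:gz") as tar:
        for name, payload in [
            ("METADATA/config.yml", b"# demo\n"),
            ("DATA/home/user/input.txt", b"hello\n"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
```

Calling ``top_level_entries`` on a file written by ``make_demo_bundle`` lists the two directories named in the text; pointing it at a real bundle would show whatever that bundle actually contains.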

6 changes: 3 additions & 3 deletions docs/glossary.rst
@@ -11,14 +11,14 @@ Glossary
distribution package
A software component installed by the Linux distribution's package manager. ReproZip tries to identify from which distribution package each file comes; this allows the reproducer to install the software from his distribution's package manager instead of extracting the files from the ``.rpz`` file.

-package (or pack)
+bundle (or pack)
A ``.rpz`` file generated by ``reprozip pack``, containing all the files and metadata required to reproduce the experiment on another machine. See :ref:`packing`.

run
-A single command line traced by ``reprozip trace [--continue]``. Multiple commands can be traced successively before creating the pack; the reproducer will be able to run them separately using ``reprounzip <unpacker> run <directory> <run-id>``.
+A single command line traced by ``reprozip trace [--continue]``. Multiple commands can be traced successively before creating the bundle; the reproducer will be able to run them separately using ``reprounzip <unpacker> run <directory> <run-id>``.

software package
The same as a distribution package.

unpacker
-A plugin for the `reprounzip` component that reproduces an experiment from a ``.rpz`` package. The unpackers `chroot`, `directory`, and `installpkgs` are distributed with `reprounzip`; others come in separate packages (`reprounzip-docker` and `reprounzip-vagrant`). See :ref:`unpack-unpackers`.
+A plugin for the `reprounzip` component that reproduces an experiment from a ``.rpz`` bundle. The unpackers `chroot`, `directory`, and `installpkgs` are distributed with `reprounzip`; others come in separate packages (`reprounzip-docker` and `reprounzip-vagrant`). See :ref:`unpack-unpackers`.
4 changes: 2 additions & 2 deletions docs/graph.rst
@@ -9,9 +9,9 @@ To generate a *provenance graph* related to the experiment execution, the ``repr

$ reprounzip graph graphfile.dot mypackfile.rpz

-where `graphfile.dot` corresponds to the graph, and `mypackfile.rpz` corresponds to the experiment package.
+where `graphfile.dot` corresponds to the graph, and `mypackfile.rpz` corresponds to the experiment bundle.

-Alternatively, you can generate the graph after running ``reprozip trace`` without creating a ``.rpz`` package::
+Alternatively, you can generate the graph after running ``reprozip trace`` without creating a ``.rpz`` bundle::

$ reprounzip graph [-d tracedirectory] graphfile.dot

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -3,7 +3,7 @@ ReproZip's Documentation

Welcome to ReproZip's documentation!

-`ReproZip <https://www.reprozip.org/>`__ is a tool aimed at simplifying the process of creating reproducible experiments from *command-line executions*. It tracks operating system calls and creates a package that contains all the binaries, files, and dependencies required to run a given command on the author's computational environment. A reviewer can then extract the experiment in his own environment to reproduce the results, even if the environment has a different operating system from the original one.
+`ReproZip <https://www.reprozip.org/>`__ is a tool aimed at simplifying the process of creating reproducible experiments from *command-line executions*. It tracks operating system calls and creates a bundle that contains all the binaries, files, and dependencies required to run a given command on the author's computational environment. A reviewer can then extract the experiment in his own environment to reproduce the results, even if the environment has a different operating system from the original one.

Currently, ReproZip can only pack experiments that originally run on Linux.

6 changes: 3 additions & 3 deletions docs/jupyter.rst
@@ -34,11 +34,11 @@ The notebook will execute from top-to-bottom and *reprozip-jupyter* traces that

.. image:: figures/rzj-running.png

-*reprozip-jupyter* will name the resulting ReproZip package (*.rpz*) as ``notebookname_datetime.rpz`` and save it to the same working directory the notebook is in:
+*reprozip-jupyter* will name the resulting ReproZip bundle (*.rpz*) as ``notebookname_datetime.rpz`` and save it to the same working directory the notebook is in:

.. image:: figures/rzj-pkg.png

-Note that the notebook file itself (``.ipynb``) is not included in the package, so you should share or archive both of those files. The reason is that a lot of services can render notebooks (GitHub, OSF...), and they wouldn't be able to if it was in the RPZ file.
+Note that the notebook file itself (``.ipynb``) is not included in the bundle, so you should share or archive both of those files. The reason is that a lot of services can render notebooks (GitHub, OSF...), and they wouldn't be able to if it was in the RPZ file.

Unpacking
=========
@@ -61,7 +61,7 @@ On the command line, you would:

1. Set up the experiment using *reprounzip-docker*::

-$ reprounzip docker setup <package.rpz> <directory>
+$ reprounzip docker setup <bundle.rpz> <directory>

2. Rerun the notebook using *reprozip-jupyter*::

26 changes: 13 additions & 13 deletions docs/packing.rst
@@ -22,7 +22,7 @@ If you run the command multiple times, *reprozip* might ask you if you want to c

$ reprozip trace --continue <command-line>

-Note that the final package will be able to reproduce any of the runs, and files shared by multiple runs are only stored once.
+Note that the final bundle will be able to reproduce any of the runs, and files shared by multiple runs are only stored once.

By default, if the operating system is based on Debian or RPM packages (e.g.: Ubuntu, CentOS, Fedora, ...), *reprozip* will also try to automatically identify the distribution packages from which the files come, using the available package manager of the system. This is useful to provide more detailed information about the dependencies, as well as to further help when reproducing the experiment. However, note that the ``trace`` command can take some time doing that after the experiment finishes, depending on the number of file dependencies that the experiment has. To disable this feature, users may use the flag ``--dont-identify-packages``::

@@ -35,9 +35,9 @@ The database, together with a *configuration file* (see below), are placed in a
Editing the Configuration File
==============================

-The configuration file, which can be found in ``.reprozip-trace/config.yml``, contains all the information necessary for creating the experiment package. This file is generated by the tracer and drives the packing step.
+The configuration file, which can be found in ``.reprozip-trace/config.yml``, contains all the information necessary for creating the experiment bundle. This file is generated by the tracer and drives the packing step.

-It is very likely that you won't need to modify this file, as the automatically-generated one should be sufficient to create a working package. However, in some cases, you may want to edit it prior to the creation of the package to add or remove files used by your experiment. This can be particularly useful, for instance, to remove big files that can be obtained elsewhere when reproducing the experiment, to keep the size of package small, and also to remove sensitive information that the experiment may use. The configuration file can also be used to edit the main command line, to add or remove environment variables, and to edit information regarding input/output files.
+It is very likely that you won't need to modify this file, as the automatically-generated one should be sufficient to create a working bundle. However, in some cases, you may want to edit it prior to the creation of the package to add or remove files used by your experiment. This can be particularly useful, for instance, to remove big files that can be obtained elsewhere when reproducing the experiment, to keep the size of package small, and also to remove sensitive information that the experiment may use. The configuration file can also be used to edit the main command line, to add or remove environment variables, and to edit information regarding input/output files.

.. _packing-config-general:

@@ -130,24 +130,24 @@ Note that users can always reset the configuration file to its initial state by

.. warning::

-When editing a configuration file, make sure your changes are as restrictive as possible, modifying only the necessary information. Removing important information and changing the structure of the file may cause issues while creating the package or unpacking the experiment.
+When editing a configuration file, make sure your changes are as restrictive as possible, modifying only the necessary information. Removing important information and changing the structure of the file may cause issues while creating the bundle or unpacking the experiment.

.. _packing-pack:

-Creating a Package
-==================
+Creating a Bundle
+=================

-After tracing all the runs from the experiment and optionally editing the configuration file, the experiment package can be created by using the following command::
+After tracing all the runs from the experiment and optionally editing the configuration file, the experiment bundle can be created by using the following command::

-$ reprozip pack <package-name>
+$ reprozip pack <bundle>

-where `<package-name>` is the name given to the package. This command generates a ``.rpz`` file in the current directory, which can then be sent to others so that the experiment can be reproduced. For more information regarding the unpacking step, please see :ref:`unpacking`.
+where `<bundle>` is the name given to the package. This command generates a ``.rpz`` file in the current directory, which can then be sent to others so that the experiment can be reproduced. For more information regarding the unpacking step, please see :ref:`unpacking`.

Note that, by using ``reprozip pack``, files will be copied from your environment to the package; as such, you should not change any file that the experiment used before packing it, otherwise the package will contain different files from the ones the experiment used when it was originally traced.

.. warning::

-Before sending your package to others, it is advisable to test it and ensure that the reproduction of the experiment works.
+Before sending your bundle to others, it is advisable to test it and ensure that the reproduction of the experiment works.

.. _packing-further:

@@ -157,11 +157,11 @@ Further Considerations
Packing Multiple Command Lines
++++++++++++++++++++++++++++++

-As mentioned before, ReproZip allows multiple runs (i.e., command lines) to be traced and included in the same package. Alternatively, users can create a simple **script** that runs all the command lines, and pass *that* to ``reprozip trace``. However, in this case, there will be no flexibility in choosing a single run to be reproduced, since the entire script will be re-executed.
+As mentioned before, ReproZip allows multiple runs (i.e., command lines) to be traced and included in the same bundle. Alternatively, users can create a simple **script** that runs all the command lines, and pass *that* to ``reprozip trace``. However, in this case, there will be no flexibility in choosing a single run to be reproduced, since the entire script will be re-executed.

Note that this flexibility has the caveat that users may reproduce the runs in a different order than the one originally used while tracing. If the order is important for the reproduction (e.g.: each run represents a step in a dataflow), please make sure to inform the correct reproduction order to whoever wants to replicate the experiment. This can also be obtained by running ``reprounzip graph``; please refer to :ref:`provenance-graph` for more information.

-ReproZip can also combine multiple traces into a single one, in order to create a single package, using the ``reprozip combine`` command. The runs of each subsequent trace are simply appended in order.
+ReproZip can also combine multiple traces into a single one, in order to create a single bundle, using the ``reprozip combine`` command. The runs of each subsequent trace are simply appended in order.

Packing GUI and Interactive Tools
+++++++++++++++++++++++++++++++++
@@ -207,7 +207,7 @@ Note the use of ``trap`` to avoid exiting the entire script when pressing ``Ctrl
Excluding Sensitive and Third-Party Information
+++++++++++++++++++++++++++++++++++++++++++++++

-ReproZip automatically tries to identify log and temporary files, removing them from the package, but the configuration file should be edited to remove any sensitive information that the experiment uses, or any third-party file/software that should not be distributed. Note that the ReproZip team is **not responsible** for personal and non-authorized files that may get distributed in a package; users should double-check the configuration file and their package before sending it to others.
+ReproZip automatically tries to identify log and temporary files, removing them from the bundle, but the configuration file should be edited to remove any sensitive information that the experiment uses, or any third-party file/software that should not be distributed. Note that the ReproZip team is **not responsible** for personal and non-authorized files that may get distributed in a package; users should double-check the configuration file and their package before sending it to others.

Identifying Output Files
++++++++++++++++++++++++
2 changes: 1 addition & 1 deletion docs/reprozip.rst
@@ -7,4 +7,4 @@ The truth is computational reproducibility can be very painful to achieve for a

For reviewers, even with a compendium in their hands, it may be hard to reproduce the results. There may be no instructions about how to execute the code and explore it further; the experiment may not run on his operating system; there may be missing libraries; library versions may be different; and several issues may arise while trying to install all the required dependencies, a problem colloquially known as `dependency hell <https://en.wikipedia.org/wiki/Dependency_hell>`__.

-ReproZip helps alleviate these problems by allowing the user to easily capture all the necessary components in a single, distributable package. Also, the tool makes it easier to reproduce an experiment by providing different unpacking methods and interfaces that avoids the need to install all the required dependencies and that makes it possible to run the experiment under different inputs.
+ReproZip helps alleviate these problems by allowing the user to easily capture all the necessary components in a single, distributable bundle. Also, the tool makes it easier to reproduce an experiment by providing different unpacking methods and interfaces that avoids the need to install all the required dependencies and that makes it possible to run the experiment under different inputs.
2 changes: 1 addition & 1 deletion docs/traceschema.rst
@@ -12,7 +12,7 @@ This table contains information about all the processes. A process is identified

Note that processes are different from programs, and there is no one-to-one relationship with executions. A process is created by `clone(2) <https://linux.die.net/man/2/clone>`__ or `fork(2) <https://linux.die.net/man/2/fork>`__ and not necessarily followed by `execve(2) <https://linux.die.net/man/2/execve>`__. By contrast, a program can change its image by calling execve(2) without creating new processes (i.e., without changing *pid*).

-Each entry in the ``processes`` table has the id of its parent, i.e. the process that created it by calling clone(2) or fork(2), except the original process that *reprozip* created, for which parent is NULL. There is thus exactly one process with a NULL parent per run stored in the pack.
+Each entry in the ``processes`` table has the id of its parent, i.e. the process that created it by calling clone(2) or fork(2), except the original process that *reprozip* created, for which parent is NULL. There is thus exactly one process with a NULL parent per run stored in the bundle.
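
The "one NULL parent per run" property can be checked with a short query. The sketch below uses Python's ``sqlite3`` module against a simplified in-memory stand-in table; the column names ``run_id`` and ``parent`` and both helper functions are assumptions made for illustration, and the real table has more columns than shown here.

```python
import sqlite3


def root_processes_per_run(conn):
    """Count processes whose parent is NULL, grouped by run."""
    rows = conn.execute(
        "SELECT run_id, COUNT(*) FROM processes "
        "WHERE parent IS NULL GROUP BY run_id ORDER BY run_id"
    )
    return dict(rows.fetchall())


def make_demo_db():
    """Build an in-memory table holding only the columns this check needs.

    This is a hypothetical, reduced schema, not the full trace schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processes("
        "id INTEGER PRIMARY KEY, run_id INTEGER NOT NULL, parent INTEGER)"
    )
    # Two runs: each has one root process (parent NULL) plus children.
    conn.executemany(
        "INSERT INTO processes(id, run_id, parent) VALUES (?, ?, ?)",
        [(1, 0, None), (2, 0, 1), (3, 1, None), (4, 1, 3), (5, 1, 3)],
    )
    return conn
```

On a well-formed trace, every run should map to a count of exactly one; any other value would suggest the database does not match the description above.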

::

4 changes: 2 additions & 2 deletions docs/troubleshooting.rst
@@ -143,8 +143,8 @@ Please feel free to contact us at users@reprozip.org if you encounter issues whi
2. If, while packing, the user chose not to include some packages, `reprounzip` will try to install the ones from the package manager, which may not be compatible.
3. If you are using ``reprounzip vagrant`` or ``reprounzip docker``, ReproZip may be failing to detect the closest base system for unpacking the experiment.
:Solution:
-1. Use the files inside the experiment package to ensure compatibility.
-2. Contact the author of the ReproZip package to ask for a new package with all software packages included.
+1. Use the files inside the experiment bundle to ensure compatibility.
+2. Contact the author of the ReproZip bundle to ask for a new package with all software packages included.
3. Try a different base system that you think it is closer to the original one by using the option ``--base-image`` when running these unpackers.

------------