Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a python-only installation #1861

Merged
merged 27 commits into from
Jan 19, 2024

Conversation

scarlehoff
Copy link
Member

@scarlehoff scarlehoff commented Nov 24, 2023

This pull requests implements a python-only installation defined in the pyproject.toml file.

While it might look like the PR is changing a lot of files, this is mostly a reordering, so let me describe all changes here:

Build system

I'm using poetry as the build system as we need some more advance features than what setuptools offers. For the user however nothing changes, you can use poetry for development but you can also utilize pip install . or pip install -e . just the same.

cmake

This PR doesn't remove cmake, but it has been modified so that it doesn't install the python stuff.

conda

conda will not be removed since it is needed for lhapdf and pandoc.
We also needed it for now to get some actual package for reportengine which at the moment is broken in pip (or the repo).

How to use

If you want to already start using this installation method, you can just go to the root of the repository and do:

pip install .

If you want to install all extras you can also do:

pip install .[docs,tests,qed]

This will install both the n3fit and validphys packages. The datafiles are no longer in ${CONDA_PREFIX}/share/NNPDF but rather they are part of the validphys package.

If you are under a conda installation with lhapdf and pandoc you are done. If you are not, you need to install lhapdf and pandoc by yourself. In another PR (#1799) I will also add a possibility to work without lhapdf, first as an experimental thing, later as something more useful.

Data and Theory locations

Up to now, the cmake installation would put everything (data, theories, etc) into a share/NNPDF folder in the conda environment. Since a pure python installation cannot go around copying data to arbitrary folders in the system, I've changed a bit this configuration:

Datafiles and theory.db

I believe that datafiles need to be synchronised with the status of the code. Therefore, they are now installed as data coming from the python package.
The same is true for theory.db.
This also brings another advantage, now you don't need to reinstall (or copy) when you are working on implementing new data, you can just do pip install -e . and also the data and theory will be editable, so development becomes much easier.
It also allows you to share the very heave share folder between NNPDF installations.
This will be all in validphys2/src/validphys/datafiles

Theories and results

Theories and results are downloaded from the server. They should be shared among every possible installation that you have in your system (no need to have 6 copies of theory_200).

WIP

nnprofile.yaml

WIP

Other considerations

If I've done everything correctly, nobody should see any difference upon using the code implemented in this PR. Nor developers or users. Everything will work the same as it always has. Even custom configurations like using a non-default NNPDF_PROFILE_PATH will work out of the box (as this takes precedence over the new location).

The only difference will be a practical one (you will notice that theories that you thought you already downloaded are being downloaded again).

Possible breaking changes

I cannot think of any situation in which this PR introduces a breaking change. Of course, if people is directly adding or removing stuff from the share/NNPDF folder, without using a custom nnprofile, now they have to move it to the new location... but this has always been discouraged.

Testing method

Beyond the tests, I'm running all runcards in the examples folder of validphys. Mainly to make sure that we are not using any of the files in the C++ folders that I'm willingly ignoring.

This is the script that I'm using for testing in a clean environment should anyone want to do the same:

With cmake (by-the-bookdocs installation):

ENVIR=condapdf_clean
conda create -n ${ENVIR} python==3.10 -y
conda activate ${ENVIR}
conda install nnpdf --only-deps -y
conda install pytest hypothesis pytest-mpl -y
cmake -S . -B build_folder  -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
cmake --build build_folder
cmake --install build_folder
pytest --pyargs --mpl validphys n3fit

Note that as per the docs you might need some extra stuff if you are using mac os or if you are missing some compilers in your system. And if you go that route you might also need to follow other instructions there.

I'll remove the python installation from cmake (with the appropriate changes to the documentation and with a big warning from cmake) in another PR.

Copy link
Member

@alecandido alecandido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the changes that are not related to imports or data are due to Black, correct?

Copy link
Member

@alecandido alecandido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, run poetry lock and commit the lock file as well

@scarlehoff
Copy link
Member Author

I've rearranged things here and there (e.g., things like theory.db is under data, but theories are now in a different place, so I needed to change theoryid anyway so I modified a bit that class). There are not that many changes due to black

@scarlehoff
Copy link
Member Author

scarlehoff commented Nov 24, 2023

Please, run poetry lock and commit the lock file as well

No, there will be no lock files (at least for the time being) because the installation will still be a conda thing. The (default) installation is done with --no-deps.

Eventually I hope to be able to get rid of conda as well, for now only lhapdf and pandoc are blocking me but this is for the future.

@scarlehoff
Copy link
Member Author

This PR is ready for review. Once I tested everything works (might need more includes of data files) next batch of changes will be another PR pointing to this one to remove a lot of libnnpdf/nnpdfcpp that is not used by buildmaster or evolven3fit.

@RoyStegeman
Copy link
Member

Thanks I'll have a look as soon as I have some time. I like the first comment since all questions I was about to ask were answered there ;)

CMakeLists.txt Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@RoyStegeman
Copy link
Member

Why did you put the datafiles folder under validphys and not just in the nnpdf root dir?

@scarlehoff
Copy link
Member Author

Why did you put the datafiles folder under validphys and not just in the nnpdf root dir?

Because I'm installing them as part of the validphys package.

@RoyStegeman
Copy link
Member

That's what I assumed but you should still be able to put it outside vp no? https://docs.python.org/3.3/distutils/setupscript.html#installing-additional-files

Maybe that only makes the structure more confusing though

@scarlehoff
Copy link
Member Author

I'm not sure, in the poetry documentation (which is what I'm using to build/install) there's no mention about supporting moving things from one location to another.

@alecandido
Copy link
Member

Poetry intentionally avoids implementing some features (like pre/post-install hooks) in order to avoid some variability (and keep everything simpler).

https://python-poetry.org/docs/pyproject/#include-and-exclude

True that the feature you're claiming is quite declarative, so they could. But I guess the idea is that it would be surprising to install a Python package (from PyPI or anywhere else) and it then modifies other paths outside the environment.
E.g. you might think that an environment is isolated, but then you and up modifying thins user-wide (not system-wide, just because you will be prevented by missing sudo).

If that's required, then you should have vp copying/moving the files from its folder to somewhere else at runtime.
But honestly, I'd avoid this (if possible).

@alecandido
Copy link
Member

The other option is vp-get data, that I would like :P

@scarlehoff
Copy link
Member Author

We can add later a link to the folder in the root of the repository to make things easier. But right now it's in nnpdfcpp/data which is not necessarily more intuitive so I'll leave it like it is.

Copy link
Member

@RoyStegeman RoyStegeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried following your isntructions but still got an error, will try to look at it later. For now, I needed to install some more in your test env (I actually don't know if all are needed, but obviously the cpp compiler and cmake are):

conda install pytest hypothesis pytest-mpl gxx_linux-64 pkg-config swig cmake sysroot_linux-64=2.17 -y

pyproject.toml Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
@scarlehoff
Copy link
Member Author

I tried following your isntructions but still got an error, will try to look at it later. For now, I needed to install some more in your test env (I actually don't know if all are needed, but obviously the cpp compiler and cmake are):

conda install pytest hypothesis pytest-mpl gxx_linux-64 pkg-config swig cmake sysroot_linux-64=2.17 -y

Ah, the C++ you still need in by-the-docs installation, but you can install it also from the system. So if you usually need it when you use conda, you would need it here as well.

What error do you get?

Copy link
Member

@Radonirinaunimi Radonirinaunimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this! Finally was able to test this carefully and everything worked fine (tested it both on my local computers and our Amsterdam cluster).

@scarlehoff scarlehoff changed the base branch from master to develop_20240119 January 19, 2024 12:04
@scarlehoff scarlehoff merged commit 7033541 into develop_20240119 Jan 19, 2024
6 checks passed
@scarlehoff scarlehoff deleted the change_installation_method branch January 19, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
destroyingc++ devtools Build, automation and workflow waiting for next tag PR which are now completed and are waiting for a next tag
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants