Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4846 - Allow importing Python package with C-extension e.g. numpy in the labs CLI #4868

Merged
merged 11 commits into from
Jul 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
message(STATUS "Using CMake ${CMAKE_VERSION}")
cmake_minimum_required(VERSION 3.19.0)
cmake_minimum_required(VERSION 3.20.0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using cmake_path which is new in 3.20

https://cmake.org/cmake/help/latest/command/cmake_path.html

cmake_policy(SET CMP0048 NEW)

set(CMAKE_CXX_STANDARD 20)
Expand Down
4 changes: 2 additions & 2 deletions python/SetupPython.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@ set(Python_USE_STATIC_LIBS OFF)
# find_package(Python) has the problem that on github actions in particular it'll pick up the most recent python (eg 3.9) from the tool cache
# even if you have used the setup-python action and set it to 3.8
if (PYTHON_VERSION)
find_package(Python ${PYTHON_VERSION} EXACT COMPONENTS Interpreter Development REQUIRED)
find_package(Python ${PYTHON_VERSION} EXACT REQUIRED COMPONENTS Interpreter Development OPTIONAL_COMPONENTS NumPy)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using NumPy to test loading a native gem because the FindPython (cmake) has a NumPy component so it's easy to test whether the user actually has it installed.

else()
find_package(Python COMPONENTS Interpreter Development REQUIRED)
find_package(Python REQUIRED COMPONENTS Interpreter Development OPTIONAL_COMPONENTS NumPy)
endif()

execute_process(COMMAND ${Python_EXECUTABLE} -m pytest --version
Expand Down
53 changes: 41 additions & 12 deletions python/engine/PythonEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,9 +30,12 @@ void addToPythonPath(const openstudio::path& includePath) {
if (!includePath.empty()) {
PyObject* sys = PyImport_ImportModule("sys");
PyObject* sysPath = PyObject_GetAttrString(sys, "path");
Py_DECREF(sys); // PyImport_ImportModule returns a new reference, decrement it
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_DECREF some stuff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how many instances there are where we miss these decrements. Or we have one that should not be. I bet there are a few remaining.


// fmt::print("Prepending '{}' to sys.path\n", includePath);
PyObject* unicodeIncludePath = PyUnicode_FromString(includePath.string().c_str());
PyList_Insert(sysPath, 0, unicodeIncludePath);
Py_DECREF(sysPath); // PyObject_GetAttrString returns a new reference, decrement it
}
}

Expand All @@ -41,30 +44,56 @@ void PythonEngine::pyimport(const std::string& importName, const std::string& in
PyImport_ImportModule(importName.c_str());
}

void PythonEngine::setupPythonPath(const std::vector<openstudio::path>& includeDirs, const openstudio::path& pythonHomeDir) {
for (const auto& includeDir : includeDirs) {
addToPythonPath(includeDir);
}
if (!pythonHomeDir.empty()) {
wchar_t* a = Py_DecodeLocale(pythonHomeDir.generic_string().c_str(), nullptr);
Py_SetPythonHome(a);
void PythonEngine::setupPythonPath(const std::vector<openstudio::path>& includeDirs) {

// Iterate in reverse order since addToPythonPath always inserts at pos 0
// --python_path path1 --python_path path2 => includeDirs = ["path1", "path2"]
// std::ranges::reverse_view needs modern compilers
for (auto it = includeDirs.rbegin(); it != includeDirs.rend(); it++) {
addToPythonPath(*it);
Comment on lines +47 to +53
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixup the order here, unrelated

}
}

PythonEngine::PythonEngine(int argc, char* argv[]) : ScriptEngine(argc, argv), program(Py_DecodeLocale(pythonProgramName, nullptr)) {
// TODO: modernize and use PyConfig (new in 3.8): https://docs.python.org/3/c-api/init_config.html
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for later


// this frozen flag tells Python that the package and library have been frozen for embedding, so it shouldn't warn about missing prefixes
Py_FrozenFlag = 1;

// Set the PYTHONPATH / PYTHONHOME to the E+ shipped standard library
// I think we need to set the python path before initializing the library
// Path to the E+ shipped standard library
auto pathToPythonPackages = getEnergyPlusDirectory() / "python_standard_lib";
wchar_t* a = Py_DecodeLocale(pathToPythonPackages.make_preferred().string().c_str(), nullptr);
Py_SetPath(a);
Py_SetPythonHome(a);

// The PYTHONPATH / PYTHONHOME should be set before initializing Python
// If this Py_SetPath is called before Py_Initialize, then Py_GetPath won't attempt to compute a default search path
// The default search path is affected by the Py_SetPythonHome
// * if the user passed --python_home, we use that as the Python Home, and do not use Py_SetPath. But later we add the E+ standard_lib anyways
// so it takes precedence (to limit incompatibility issues...)
// * If the user didn't pass it, we use Py_SetPath set to the E+ standard_lib

std::vector<std::string> args(argv, std::next(argv, static_cast<std::ptrdiff_t>(argc)));
bool pythonHomePassed = false;
auto it = std::find(args.cbegin(), args.cend(), "--python_home");
if (it != args.cend()) {
openstudio::path pythonHomeDir(*std::next(it));
wchar_t* h = Py_DecodeLocale(pythonHomeDir.make_preferred().string().c_str(), nullptr);
Py_SetPythonHome(h);
pythonHomePassed = true;
} else {
wchar_t* a = Py_DecodeLocale(pathToPythonPackages.make_preferred().string().c_str(), nullptr);
Py_SetPath(a);
}
Comment on lines +66 to +84
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deal with --python_home . Can't do it later, it has to be done before Py_Initialize... so grab it from the argc/argv...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If feels like a code smell that we are parsing argc / argv at this point instead of having everything parsed out by the CLI11 library. I think I am guilty of establishing this pattern in ScriptEngine and I think I did it because the ruby init functions (at least one of them) accept argc and argv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruby_sysinit does take argc, argv yes.

Plus, here this is an order thing, and specific to Python while the CLI only sees a ScriptEngineInstance, so adding an explicit PythonHome argument isn't great either.

If you see another cleaner way to do it, this can be addressed later. There are other things that probably need cleanup too (including simplifying the folder / Cmake structure probably).

Another thing: we need to process the argv data manually at the start of the CLI anyways since we do support (for legacy reasons) some usage like openstudio labs test.rb that is translated as being openstudio labs execute_ruby_script test.rb . cf https://github.com/NREL/OpenStudio/blob/2af6123b91304e12060af360d0afb24083336a4c/src/cli/main.cpp#L45C1-L89


Py_SetProgramName(program); // optional but recommended

Py_Initialize();

if (pythonHomePassed) {
addToPythonPath(pathToPythonPackages);
}
#if defined(__APPLE__) || defined(__linux___) || defined(__unix__)
addToPythonPath(pathToPythonPackages / "lib-dynload");
#endif
Comment on lines +90 to +95
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib-dynload is present in the E+ python_standard_lib, we should add it


PyObject* m = PyImport_AddModule("__main__");
if (m == nullptr) {
throw std::runtime_error("Unable to add module __main__ for python script execution");
Expand Down
2 changes: 1 addition & 1 deletion python/engine/PythonEngine.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ class PythonEngine final : public ScriptEngine
ScriptObject eval(std::string_view sv) override;
void exec(std::string_view sv) override;

virtual void setupPythonPath(const std::vector<openstudio::path>& includeDirs, const openstudio::path& pythonHomeDir) override;
virtual void setupPythonPath(const std::vector<openstudio::path>& includeDirs) override;

protected:
void* getAs_impl(ScriptObject& obj, const std::type_info&) override;
Expand Down
55 changes: 55 additions & 0 deletions src/cli/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,61 @@ if(BUILD_TESTING)

# ============ EndForward a Path properly no matter the slashes employed ============

# ============ Native Ruby Gems / Python Modules - C extensions =============
if (Python_NumPy_FOUND)
Comment on lines +340 to +341
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only enable the tests if numpy found


if(NOT EXISTS "${Python_ROOT_DIR}")
# Python_STDLIB: we expect it to be
# Unix: ~/.pyenv/versions/3.8.12/lib/python3.8 or
# or on CI: /usr/lib/python3.8/ ... with numpy if install via pip3 and not apt install python3-numpy in `/usr/local/lib/python3.8/dist-packages/`
# Windows C:\Python38\Lib
cmake_path(GET Python_STDLIB PARENT_PATH Python_ROOT_DIR)
if(UNIX)
cmake_path(GET Python_ROOT_DIR PARENT_PATH Python_ROOT_DIR)
endif()
endif()

if(UNIX)
if(EXISTS "${Python_SITELIB}")
set(PYTHON_PATH "${Python_SITELIB}" "${Python_STDLIB}/lib-dynload")
else()
set(PYTHON_PATH "${Python_STDLIB}/lib-dynload")
endif()

if(NOT APPLE)
set(EXTRA_LOCAL_DIST "/usr/local/lib/python3.8/dist-packages")
if (EXISTS "${EXTRA_LOCAL_DIST}")
list(APPEND PYTHON_PATH "${EXTRA_LOCAL_DIST}")
endif()
endif()
else()
set(PYTHON_PATH "$<SHELL_PATH:${Python_SITELIB}>")
endif()

message(DEBUG "PYTHON_PATH=${PYTHON_PATH}")

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.explicit_sys_path_insert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first this series of tests kind of seemed off to me, because they are testing the environment of the developer/build environment, but when I think more carefully, I guess that is the point. The dev environment is a proxy for whatever environment a user may have, and since you already have the scaffolding in place to find a random numpy installation it turns out to be a convenient way to formumulate a test. Seems nice to me.

The only thing this leaves on the table are custom extensions that we choose to ship by default, but I believe that will be addressed in a different issue.

COMMAND $<TARGET_FILE:openstudio> labs execute_python_script execute_python_script_with_numpy.py ${Python_STDLIB}
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.python_path
COMMAND $<TARGET_FILE:openstudio> labs
"$<$<BOOL:${PYTHON_PATH}>:--python_path;$<JOIN:${PYTHON_PATH},;--python_path;>>"
execute_python_script execute_python_script_with_numpy.py
COMMAND_EXPAND_LISTS
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.python_home
COMMAND $<TARGET_FILE:openstudio> labs
--python_home "$<SHELL_PATH:${Python_ROOT_DIR}>"
execute_python_script execute_python_script_with_numpy.py
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)
else()
message(AUTHOR_WARNING "Cannot run the python numpy test, as numpy isn't installed on your system python")
endif()

file(GLOB RUBY_TEST_SRC
# find all CLI test
Expand Down
12 changes: 10 additions & 2 deletions src/cli/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -150,28 +150,33 @@ int main(int argc, char* argv[]) {
experimentalApp
->add_option("-I,--include", includeDirs, "Add additional directory to add to front of Ruby $LOAD_PATH (may be used more than once)")
->option_text("DIR")
->check(CLI::ExistingDirectory)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some CLI11 validators to ensure the various args are valid directories to begin with and catch mistakes

->group(rubySpecificOptionsGroupName);

std::vector<openstudio::path> gemPathDirs;
experimentalApp
->add_option("--gem_path", gemPathDirs,
"Add additional directory to add to front of GEM_PATH environment variable (may be used more than once)")
->option_text("DIR")
->check(CLI::ExistingDirectory)
->group(rubySpecificOptionsGroupName);

openstudio::path gemHomeDir;
experimentalApp->add_option("--gem_home", gemHomeDir, "Set GEM_HOME environment variable")
->option_text("DIR")
->check(CLI::ExistingDirectory)
->group(rubySpecificOptionsGroupName);

openstudio::path bundleGemFilePath;
experimentalApp->add_option("--bundle", bundleGemFilePath, "Use bundler for GEMFILE")
->option_text("GEMFILE")
->check(CLI::ExistingFile)
->group(rubySpecificOptionsGroupName);

openstudio::path bundleGemDirPath;
experimentalApp->add_option("--bundle_path", bundleGemDirPath, "Use bundler installed gems in BUNDLE_PATH")
->option_text("BUNDLE_PATH")
->check(CLI::ExistingDirectory)
->group(rubySpecificOptionsGroupName);

// std::vector<std::string>
Expand Down Expand Up @@ -204,16 +209,19 @@ int main(int argc, char* argv[]) {
->add_option("--python_path", pythonPathDirs,
"Add additional directory to add to front of PYTHONPATH environment variable (may be used more than once)")
->option_text("DIR")
->check(CLI::ExistingDirectory)
->group(pythonSpecificOptionsGroupName);

openstudio::path pythonHomeDir;
experimentalApp->add_option("--python_home", pythonHomeDir, "Set PYTHONHOME environment variable")
->option_text("DIR")
->check(CLI::ExistingDirectory)
->group(pythonSpecificOptionsGroupName);

// This is a callback that's stored on the ScriptEngineInstance, triggered only the first time
std::function<void()> runSetupPythonPath = [&pythonEngine, &pythonPathDirs, &pythonHomeDir]() {
pythonEngine->setupPythonPath(pythonPathDirs, pythonHomeDir);
std::function<void()> runSetupPythonPath = [&pythonEngine, &pythonPathDirs]() {
// pythonHomeDir is retrieved from (argc, argv) actually, as Py_SetPythonHome has to be called before Py_Initialize
pythonEngine->setupPythonPath(pythonPathDirs);
pythonEngine->registerType<openstudio::measure::ModelMeasure*>("openstudio::measure::ModelMeasure *");
pythonEngine->registerType<openstudio::measure::EnergyPlusMeasure*>("openstudio::measure::EnergyPlusMeasure *");
pythonEngine->registerType<openstudio::measure::ReportingMeasure*>("openstudio::measure::ReportingMeasure *");
Expand Down
49 changes: 49 additions & 0 deletions src/cli/test/execute_python_script_with_numpy.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
"""Test for importing a python package with a C extension, numpy here."""

import sys
from pathlib import Path


def test_import_numpy():
"""Common test function."""
print(f"{sys.path=}")
import numpy as np

a = np.array([0, 1, 2, 3])
assert a.sum() == 6


def test_with_sys_path_explicit(std_lib: Path):
"""Explicitly insert site-packages and lib-dynload in sys.path."""
site_pack = std_lib / "site-packages"
if site_pack.exists():
sys.path.insert(0, str(site_pack))
# Only for unix
dynload = std_lib / "lib-dynload"
if dynload.exists():
sys.path.insert(0, str(dynload))
# Only for debian
dist_pack = std_lib.parent.parent / 'local/lib/python3.8/dist-packages'
if dist_pack.exists():
sys.path.insert(0, str(dist_pack))

test_import_numpy()


def test_with_pythonpath_pythonhome():
"""Rely on passing PYTHONPATH/PYTHONHOME."""
test_import_numpy()


if __name__ == "__main__":
if len(sys.argv) == 2:
std_lib = Path(sys.argv[1])
print("Passed {std_lib=}")

if std_lib.is_dir():
test_with_sys_path_explicit(std_lib)
else:
raise ValueError(f"Python std_lib doesn't exist at {std_lib}")

else:
test_with_pythonpath_pythonhome()
2 changes: 1 addition & 1 deletion src/scriptengine/ScriptEngine.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@ class ScriptEngine
const openstudio::path& gemHomeDir, const openstudio::path& bundleGemFilePath,
const openstudio::path& bundleGemDirPath, const std::string& bundleWithoutGroups){};

virtual void setupPythonPath(const std::vector<openstudio::path>& includeDirs, const openstudio::path& pythonHomeDir){};
virtual void setupPythonPath(const std::vector<openstudio::path>& includeDirs){};

template <typename T>
T getAs(ScriptObject& obj) {
Expand Down
13 changes: 11 additions & 2 deletions src/utilities/core/DynamicLibraryPOSIX.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,17 @@ struct DynamicLibrary
return symbol;
}

explicit DynamicLibrary(openstudio::path location)
: m_location{std::move(location)}, m_handle{dlopen(m_location.c_str(), RTLD_LAZY | RTLD_LOCAL), m_handle_deleter} {
explicit DynamicLibrary(openstudio::path location) : m_location{std::move(location)} {
int flags = RTLD_LAZY | RTLD_LOCAL; // NOLINT(misc-const-correctness, hicpp-signed-bitwise)

// This seems to work on Mac without RTLD_GLOBAL...
#ifdef __linux__
if (m_location.filename().generic_string().find("python") != std::string::npos) {
// https://stackoverflow.com/questions/67891197/ctypes-cpython-39-x86-64-linux-gnu-so-undefined-symbol-pyfloat-type-in-embedd
flags = RTLD_LAZY | RTLD_GLOBAL;
}
#endif
m_handle = {dlopen(m_location.c_str(), flags), m_handle_deleter};
Comment on lines +34 to +44
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bit that bothers me a bit @kbenne

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we actually need these flags ever? I'm not sure about that. I think there is a decent chance they got added unnessarily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which flag(s)? RTLD_LAZY?

if (!m_handle) {
throw std::runtime_error(fmt::format("Unable to load library '{}', reason: '{}'", m_location.string(), dlerror()));
}
Expand Down