Add Python to Codespaces & Test #19
This Stack Overflow question suggests that Python needs some coercion. I can't believe the underlying C does not do this already. I'll have to think a bit, because the goal of this project is to ensure that code using the environment does not have to change. https://stackoverflow.com/questions/235435/environment-variables-in-python-on-linux |
Reached out here because seems Python is not using |
Maybe it uses |
For anyone else looking at this, it's due to the fact that CPython uses Since this is being shipped as a layer, I think you have a couple of routes:
Footnotes:
|
I have it on good authority (a private project I am working on, which is commercial so I cannot share), that hooking into |
I took another look at the Python behavior and I think I've nailed down the discrepancy. Any of the This means while Python does link to the getenv() function, the only usage of it is getting configuration variables at the interpreter level. Anything arbitrary which was set by the user is coming in via
LTrace getenv: Python vs Ruby
vagrant@ubuntu-jammy:~$ ltrace -e getenv ruby -e "puts ENV['EXAMPLE']"
libruby-3.0.so.3.0->getenv("RUBY_THREAD_VM_STACK_SIZE") = nil
libruby-3.0.so.3.0->getenv("RUBY_THREAD_MACHINE_STACK_SIZE") = nil
libruby-3.0.so.3.0->getenv("RUBY_FIBER_VM_STACK_SIZE") = nil
libruby-3.0.so.3.0->getenv("RUBY_FIBER_MACHINE_STACK_SIZE") = nil
libruby-3.0.so.3.0->getenv("RUBY_SHARED_FIBER_POOL_FREE_STAC"...) = nil
libruby-3.0.so.3.0->getenv("RUBYOPT") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_FREE_SLOTS") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_INIT_SLOTS") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_GROWTH_FACTOR") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_GROWTH_MAX_SLOTS") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_FREE_SLOTS_MIN_RATI"...) = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_FREE_SLOTS_MAX_RATI"...) = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_FREE_SLOTS_GOAL_RAT"...) = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_HEAP_OLDOBJECT_LIMIT_FAC"...) = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_MALLOC_LIMIT") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_MALLOC_LIMIT_MAX") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_MALLOC_LIMIT_GROWTH_FACT"...) = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_OLDMALLOC_LIMIT") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_OLDMALLOC_LIMIT_MAX") = nil
libruby-3.0.so.3.0->getenv("RUBY_GC_OLDMALLOC_LIMIT_GROWTH_F"...) = nil
libruby-3.0.so.3.0->getenv("RUBYLIB") = nil
libruby-3.0.so.3.0->getenv("DEBIAN_RUBY_STANDALONE") = nil
libruby-3.0.so.3.0->getenv("DEBIAN_DISABLE_RUBYGEMS_INTEGRAT"...) = nil
libruby-3.0.so.3.0->getenv("GEM_SKIP") = nil
libruby-3.0.so.3.0->getenv("GEM_REQUIREMENT_DID_YOU_MEAN") = nil
libruby-3.0.so.3.0->getenv("GEM_HOME") = nil
libruby-3.0.so.3.0->getenv("GEM_PATH") = nil
libruby-3.0.so.3.0->getenv("HOME") = "/home/vagrant"
libruby-3.0.so.3.0->getenv("XDG_DATA_HOME") = nil
libruby-3.0.so.3.0->getenv("GEM_VENDOR") = nil
libruby-3.0.so.3.0->getenv("GEM_VENDOR") = nil
libruby-3.0.so.3.0->getenv("GEM_SPEC_CACHE") = nil
libruby-3.0.so.3.0->getenv("EXAMPLE") = nil
+++ exited (status 0) +++
vagrant@ubuntu-jammy:~$ ltrace -e getenv python3 -c "import os; print(os.environ.get('EXAMPLE', ''))"
+++ exited (status 0) +++ |
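The trace above is consistent with `os.environ` being a mapping that Python fills in once at startup rather than a live view backed by `getenv`. A minimal sketch of that gap, assuming a POSIX system where `ctypes.CDLL(None)` exposes the already-loaded C library; the variable name `CRYPTEIA_DEMO_VAR` is made up for illustration:

```python
import ctypes
import os

# Handle to the symbols already loaded into this process (POSIX only).
libc = ctypes.CDLL(None)
libc.setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p

NAME = "CRYPTEIA_DEMO_VAR"  # hypothetical name, assumed not to be set already

# Change the environment at the C level, behind Python's back.
libc.setenv(NAME.encode(), b"from-libc", 1)

from_environ = os.environ.get(NAME)     # Python's startup snapshot
from_libc = libc.getenv(NAME.encode())  # the C library's current view

print("os.environ:", from_environ)
print("libc getenv:", from_libc)
```

Anything that only surfaces through an intercepted `getenv`, which is how Crypteia works, would be invisible to the first lookup and visible to the second.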
I find that a |
Does wrapt require a package install to be present, or could it be vendored into a solution that gets installed next to Crypteia's shared object? The goal is for any code in the container to not take on any burden to change outside of adding Crypteia and LD_PRELOAD... or in Python's case... maybe a little more. So I'm curious to see how this would work. Python is the lowest on my programming skill set chain. |
The |
Did some work with @mpeteuil yesterday around the wip-python branch and it is looking really really good. One thing that stumped us was why ctypes was not working as we expected. So I did two things.
First, I added node and python to the base devcontainer. Seems the latest rust devcontainer lost node and was causing some errors on main where I wanted to figure this out. NOTE: This will cause a slight conflict with this branch but it is easy to resolve.
Second, I did a little experiment in the ctypes branch. See linked commit and screenshot below.
So here is a summary of what we co-learned. First, ctypes to me is Python's FFI interface. So neat. It did not occur to me that we would be bypassing a core feature of Crypteia's interface, which is nix process based. Every process is isolated, and if that process needs x-crypteia env vars via getenv it will kick off an isolated SSM request, create a tmp file, load the secrets into memory, and finally delete the tmp file. Leveraging an FFI interface to libc within a process does not stack the LD_PRELOAD; it calls getenv directly and as such does not kick things into motion. Michael had this really neat idea to reach directly into libcrypteia, and that works really well. You can see here how I keep it DRY by using the os environ for our LD_PRELOAD.
import ctypes, os
crypteia = os.environ.get('LD_PRELOAD')
getenv = ctypes.cdll.LoadLibrary(crypteia).getenv
getenv.restype = ctypes.c_char_p
getenv(b'SECRET')
The screenshot and branch further illustrate how our secure tmp file works. We should not need to solve for this, since the main Python process which is using ctypes would (I'm guessing, in theory) have the secrets loaded into memory via the Python package loaded up front via PYTHONPATH. So it seems we have a path (no pun) forward now. |
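The snippet in the comment above can be wrapped into a small helper. This is only a sketch, not the code from the branch: the names `load_getenv` and `native_getenv` are hypothetical, and the fallback to the process's own C library when `LD_PRELOAD` is unset or cannot be loaded is an assumption added so the example runs outside a Crypteia-enabled container.

```python
import ctypes
import os


def load_getenv():
    """Return a C-level getenv, preferring the library named in LD_PRELOAD."""
    preload = os.environ.get("LD_PRELOAD", "")
    # LD_PRELOAD may list several libraries separated by spaces or colons;
    # this sketch only tries the first entry.
    candidates = preload.replace(":", " ").split()
    try:
        lib = ctypes.CDLL(candidates[0]) if candidates else ctypes.CDLL(None)
    except OSError:
        lib = ctypes.CDLL(None)  # assumption: fall back to the process's libc
    fn = lib.getenv
    fn.argtypes = [ctypes.c_char_p]
    fn.restype = ctypes.c_char_p
    return fn


_c_getenv = load_getenv()


def native_getenv(name, default=None):
    """Look up name through the C getenv; return text, or default if unset."""
    if not isinstance(name, bytes):
        name = name.encode("utf-8")
    value = _c_getenv(name)
    return default if value is None else value.decode("utf-8")
```

With Crypteia preloaded, `native_getenv('SECRET')` would go through the library's `getenv` in the same way as the snippet above; without it, the helper simply reads the normal process environment.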
Directly calling into the |
So I got #24 done along with a followup commit (d2927df) to fix how the Docker in Docker tests work for ad-hoc tests. There is only one, Amazon Linux. Python will use that technique for Python 2.7. You may note that in the commit we are simply doing a setup (build) and high level libcrypteia tests. So the idea is that the DnD test for Python will set the TEST_LANG and invoke Python, just as Amazon Linux focuses on Node. Reminder: the DnD test for Python 2.7 will be much simpler than the Amazon Linux one, since it does not need to build two containers (build & runtime) but just a single one with Python 2.7 in it. We now have language shields too. |
I may rebase this branch to main and get things in order by cherry-picking a commit from the |
Thanks @metaskills for pairing with me a couple of times to clarify some things and unblock me when I got hung up on exactly how Crypteia worked. I still have a few open questions and todos.
Todos
Questions
|
6afe606 to 7dcb93b
There are two approaches to including
|
I feel option 1 has been the general consensus since we did work in the loader. |
Updated based on some offline conversation. If this looks good I'll:
|
535d890 to 9634087
In order to leverage the Crypteia hook into getenv via LD_PRELOAD, we need Python to call the C library's `getenv` function. Python does not inherently do this via any standard means of getting environment variables (`os.environ['VARIABLE']`, `os.environ.get("VARIABLE")`, `os.getenv("VARIABLE")`, `os.getenvb(b"VARIABLE")`). In order to actually wire up to the system `getenv` call, we need to use the ctypes library (or write a C extension, or the like).
Here we leverage ctypes to get a handle to getenv and then use wrapt to patch os.environ at interpreter boot time via usercustomize.py. The usercustomize.py file will be imported last in the Python boot sequence, which you can verify yourself by starting Python with -v or using PYTHONVERBOSE (i.e. python -v, PYTHONVERBOSE=1 python). This means that any use of os.environ after interpreter boot time will be making calls using the ctypes hook into the system `getenv` call.
We're explicitly trying to support Python 2.7+ at this time, so the code needs to work in both 2.7 and 3+.
For more information see:
https://github.com/GrahamDumpleton/wrapt/blob/develop/blog/13-ordering-issues-when-monkey-patching-in-python.md
https://docs.python.org/3/library/site.html
https://github.com/GrahamDumpleton/wrapt/blob/develop/src/wrapt/wrappers.py
* Don't check crypteia python for dependency conflicts
The reason we needed to check for dependency conflicts was that if the host application installed a package that the crypteia Python package depended on, then we wanted to avoid those collisions. In the case where we vendor dependencies under the crypteia Python package's umbrella (essentially treating it like a module within the crypteia package) there shouldn't be conflicts, because those would not be importable by users just running `import dep` style imports. The only dependency we have at the moment is wrapt, so we can import it from within crypteia by using `from crypteia.wrapt import x,y,z`.
That allows us to use the wrapt functionality while not worrying about host applications colliding with it.
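The os.environ patching described in the commit message above could look roughly like the following. This is a hedged, standard-library-only sketch and not the PR's code: it stands in a hand-written proxy for wrapt, resolves `getenv` from the process's own C library rather than from libcrypteia, and the names `NativeFirstEnviron` and `install` are hypothetical.

```python
import ctypes
import os

# Assumption: POSIX. The PR resolves getenv through libcrypteia instead.
_libc = ctypes.CDLL(None)
_libc.getenv.argtypes = [ctypes.c_char_p]
_libc.getenv.restype = ctypes.c_char_p


def _native(name):
    """Ask the C getenv for name; return text, or None if it is unset."""
    raw = _libc.getenv(name.encode("utf-8"))
    return None if raw is None else raw.decode("utf-8")


class NativeFirstEnviron(object):
    """Hypothetical proxy: reads try the C getenv first, the rest is delegated."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getitem__(self, key):
        value = _native(key)
        return self._wrapped[key] if value is None else value

    def get(self, key, default=None):
        value = _native(key)
        return self._wrapped.get(key, default) if value is None else value

    def __contains__(self, key):
        return _native(key) is not None or key in self._wrapped

    # Special methods are looked up on the type, so delegate them explicitly.
    def __setitem__(self, key, value):
        self._wrapped[key] = value

    def __delitem__(self, key):
        del self._wrapped[key]

    def __iter__(self):
        return iter(self._wrapped)

    def __len__(self):
        return len(self._wrapped)

    def __getattr__(self, attr):
        # Everything else (keys, items, copy, ...) goes to the real mapping.
        return getattr(self._wrapped, attr)


def install():
    """Swap os.environ for the proxy; meant to be called from usercustomize.py."""
    if not isinstance(os.environ, NativeFirstEnviron):
        os.environ = NativeFirstEnviron(os.environ)
```

Calling `install()` from a `usercustomize.py` on the path would mean that later `os.environ[...]`, `os.environ.get(...)`, and `os.getenv(...)` lookups consult the C-level `getenv` first, which is the behavior the commit message is after. Code that imported `environ` by name before the swap would still hold the original mapping, which is one reason the real change patches at interpreter boot.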
This validates that the crypteia binary can be built in a regular debian-based docker image. It also runs the tests to verify that Python 2.7 does indeed hook into system getenv when wired up correctly in a non-lambda context.
Gonna cut a 1.0 release soon... thank you SO MUCH everyone! |
Testing to make sure Python works as expected with libc. It seems it does not; this will fail on `test/libcrypteia/override-python.sh` for now, and this PR will sort out why. cc @bekiya