# Patching Python builtins (third-party library compatibility)

Not every Python library is implemented to accept pathlib-compatible objects like those implemented by cloudpathlib. Many libraries will only accept strings as filepaths. These libraries then may internally use `open`, functions from `os` and `os.path`, or other core library modules like `glob` to navigate paths and manipulate them.

This means that out of the box you can't just pass a `CloudPath` object to any method or function and have it work. For libraries implemented with `pathlib`, this will work. For anything else, the code will throw an exception at some point.

The long-term solution is to ask developers to implement their library to support either (1) pathlib-compatible objects for files and directories, or (2) file-like objects passed directly (e.g., so you could call `CloudPath.open` in your code and pass the file-like object to the library).
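As a rough illustration of what such support could look like on the library side, here is a minimal sketch. The function `load_text` is hypothetical and not taken from any real library; it simply accepts a file-like object, a pathlib-style object that has its own `open` method, or a plain string path.

```python
import io
import os
import tempfile
from pathlib import Path
from typing import IO, Union


def load_text(source: Union[str, os.PathLike, IO[str]]) -> str:
    """Hypothetical library function that accepts several kinds of input."""
    # (2) File-like object: read from it directly.
    if hasattr(source, "read"):
        return source.read()
    # (1) pathlib-compatible object: let the object open itself.
    if hasattr(source, "open"):
        with source.open("r") as f:
            return f.read()
    # Plain string path: fall back to the built-in open.
    with open(source, "r") as f:
        return f.read()


# Works with a file-like object ...
from_buffer = load_text(io.StringIO("hello"))

# ... and with a pathlib-style object or a string path.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    path.write_text("hello")
    from_path = load_text(path)
    from_str = load_text(str(path))
```

A library written this way never needs to know whether the object points at local disk or cloud storage, because it delegates opening to the object itself.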

The short-term workaround that will be compatible with some libraries is to patch the builtins to make `open`, `os`, `os.path`, and `glob` work with `CloudPath` objects. Because this overrides default Python functionality, it is not on by default. When patched, these functions will use the `CloudPath` version if they are passed a `CloudPath` and will fall back to their normal implementations otherwise.
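The dispatch-and-fall-back idea can be sketched in a few lines. This is not cloudpathlib's actual implementation: `FakeCloudPath`, `_dispatching_open`, and `patched_open` are hypothetical names used only to illustrate the technique with an in-memory stand-in.

```python
import builtins
import io
from contextlib import contextmanager


class FakeCloudPath:
    """Hypothetical stand-in for a CloudPath; holds its content in memory."""

    def __init__(self, uri: str, content: str):
        self.uri = uri
        self._content = content

    def open(self, mode: str = "r", **kwargs):
        return io.StringIO(self._content)


_original_open = builtins.open


def _dispatching_open(file, *args, **kwargs):
    # Use the path object's own open for our stand-in type ...
    if isinstance(file, FakeCloudPath):
        return file.open(*args, **kwargs)
    # ... and fall back to the normal implementation otherwise.
    return _original_open(file, *args, **kwargs)


@contextmanager
def patched_open():
    """Temporarily replace builtins.open, restoring it on exit."""
    builtins.open = _dispatching_open
    try:
        yield
    finally:
        builtins.open = _original_open


with patched_open():
    with open(FakeCloudPath("s3://bucket/file.txt", "hello from the cloud")) as f:
        text = f.read()

# Outside the `with` block the original builtin is back in place.
restored = builtins.open is _original_open
```

Restoring the original in a `finally` clause keeps the override from leaking if the body raises, which is one reason to scope a patch like this with a context manager.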

These patches can be enabled by setting the following environment variables:
 - `CLOUDPATHLIB_PATCH_ALL=1` - patch all the builtins we implement: `open`, `os` functions, and `glob`
 - `CLOUDPATHLIB_PATCH_OPEN=1` - patch the builtin `open` function
 - `CLOUDPATHLIB_PATCH_OS_FUNCTIONS=1` - patch the `os` functions
 - `CLOUDPATHLIB_PATCH_GLOB=1` - patch the `glob` module

You can set environment variables in many ways, but it is common either to pass them on the command line with something like `CLOUDPATHLIB_PATCH_ALL=1 python my_script.py` or to set them in your Python script with `os.environ['CLOUDPATHLIB_PATCH_ALL'] = '1'` (the value must be a string). Note that these _must_ be set before anything from `cloudpathlib` is imported.
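For the in-script approach, a minimal sketch of the ordering looks like the following. It assumes the variable is spelled `CLOUDPATHLIB_PATCH_ALL`, and the `cloudpathlib` import is left as a comment so the snippet runs without the library installed.

```python
import os

# os.environ only accepts string values; assigning the integer 1 raises TypeError.
try:
    os.environ["CLOUDPATHLIB_PATCH_ALL"] = 1
    int_rejected = False
except TypeError:
    int_rejected = True

# Set the flag as a string instead.
os.environ["CLOUDPATHLIB_PATCH_ALL"] = "1"

# Only import cloudpathlib after the variable is set, for example:
# from cloudpathlib import CloudPath
```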

Alternatively, you can call the patch functions directly.

```python
from cloudpathlib import patch_open, patch_os_functions, patch_glob

# patch builtins
patch_open()
patch_os_functions()
patch_glob()
```

These patch methods are all context managers, so if you want to control where the patch is active, you can use them in a `with` statement. For example:

In [1]:
from glob import glob

from cloudpathlib import patch_glob, CloudPath

try:
    glob(CloudPath("s3://cloudpathlib-test-bucket/manual-tests/**/*dir*/**"))
except Exception as e:
    print("Unpatched version fails:")
    print(e)


with patch_glob():
    print("Patched succeeds:")
    print(glob(CloudPath("s3://cloudpathlib-test-bucket/manual-tests/**/*dir*/**/*")))

    # or equivalently
    print("`glob` module now is equivalent to `CloudPath.glob`")
    print(glob("**/*dir*/**/*", root_dir=CloudPath("s3://cloudpathlib-test-bucket/manual-tests/")))

Unpatched version fails:
'S3Path' object is not subscriptable
Patched succeeds:
[S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirB/fileB'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/fileC'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/nested-dir/test.file'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirB/fileB'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/fileC'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD/fileD')]
`glob` module now is equivalent to `CloudPath.glob`
[S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirB/fileB'), S3Path('

We can see a similar result for patching the functions in the `os` module.

In [13]:
import os

from cloudpathlib import patch_os_functions, CloudPath

print(os.path.isdir(CloudPath("s3://cloudpathlib-test-bucket/manual-tests/")))


# try:
#     os.path.isdir("s3://cloudpathlib-test-bucket/manual-tests/")
# except Exception as e:
#     print("Unpatched version fails:")
#     print(e)


with patch_os_functions():
    result = os.path.isdir(CloudPath("s3://cloudpathlib-test-bucket/manual-tests/"))
    print("Patched version of `os.path.isdir` returns: ", result)

False
Patched version of `os.path.isdir` returns:  None


## Patching `open`

Sometimes code uses the Python built-in `open` to open files and operate on them. The built-in only knows how to open paths on the local filesystem, so passing it a `CloudPath` does not open the cloud object. Unfortunately, that breaks usage with cloudpathlib.

Instead, we can patch the built-in `open` to handle all the normal circumstances, and—if the argument is a `CloudPath`—use cloudpathlib to do the opening.

### Patching `open` in Jupyter notebooks

Jupyter notebooks require one extra step because they have their own version of `open` that is injected into the global namespace of the notebook. This means that you must _additionally_ replace that version of `open` with the patched version if you want to use `open` in a notebook. This can be done with the `patch_open` function by adding the following to the top of the notebook.

```python
from cloudpathlib import patch_open

# replace jupyter's `open` with one that works with CloudPath
open = patch_open()
```
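To see why rebinding the name matters, the sketch below mimics the situation with a plain dictionary acting as the notebook's global namespace. Nothing here is Jupyter or cloudpathlib code: `stand_in_patched_open` and `notebook_globals` are hypothetical names used only to show how a global `open` shadows the builtin.

```python
import builtins


def stand_in_patched_open(*args, **kwargs):
    """Hypothetical replacement for builtins.open, used only for this demo."""
    return "patched open"


# A namespace with its own global `open`, mimicking a notebook.
notebook_globals = {"open": lambda *args, **kwargs: "notebook open"}

original_open = builtins.open
builtins.open = stand_in_patched_open
try:
    # The global name shadows the builtin, so the patch is not picked up.
    before = eval("open('file.txt')", notebook_globals)

    # Rebinding the global name makes the namespace use the patched function.
    notebook_globals["open"] = builtins.open
    after = eval("open('file.txt')", notebook_globals)
finally:
    builtins.open = original_open
```

Python resolves a name in the module globals before it looks in builtins, so a patch applied only to builtins is invisible to any namespace that defines its own `open`.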

Here's an example that doesn't work right now (for example, if you depend on a third-party library that calls `open`).

In [16]:
from cloudpathlib import CloudPath, patch_open


# example of a function within a third-party library
def library_function(filepath: str):
    with open(filepath, "r") as f:
        print(f.read())


# create file to read
cp = CloudPath("s3://cloudpathlib-test-bucket/patching_builtins/file.txt")

# fails if passed a CloudPath (a FileNotFoundError in the run shown here)
try:
    library_function(cp)
except Exception as e:
    print(e)

[Errno 2] No such file or directory: '/var/folders/sz/c8j64tx91mj0jb0vd1s4wj700000gn/T/tmpvnzs5qnd/cloudpathlib-test-bucket/patching_builtins/file.txt'


In [4]:
from cloudpathlib import CloudPath, patch_open

# jupyter patch
# open = patch_open()

with patch_open():
    # example of a function within a third-party library
    def library_function(filepath: str):
        with open(filepath, "r") as f:
            print(f.read())


    # create file to read
    cp = CloudPath("s3://cloudpathlib-test-bucket/patching_builtins/file.txt")

    library_function(cp)

TypeError: ContextDecorator.__call__() takes 2 positional arguments but 3 were given

In [3]:
%debug

> /var/folders/sz/c8j64tx91mj0jb0vd1s4wj700000gn/T/ipykernel_34335/3906426398.py(9)library_function()
      7 # example of a function within a third-party library
      8 def library_function(filepath: str):
----> 9     with open(filepath, "r") as f:
     10         print(f.read())
     11 

<contextlib._GeneratorContextManager object at 0x1113b3ce0>
*** TypeError: ContextDecorator.__call__() missing 1 required positional argument: 'func'

