A Python filesystem abstraction of Google Cloud Storage (GCS) implemented as a PyFilesystem2 extension.
With GCSFS, you can interact with Google Cloud Storage as if it were a regular filesystem.
Apart from the nicer interface, this decouples your code from the underlying storage mechanism: exchanging the storage backend for an in-memory filesystem for testing, or for any other filesystem such as S3FS, becomes as easy as replacing the filesystem object (or FS URL) that your code opens.
For a full reference on all the PyFilesystem possibilities, take a look at the PyFilesystem Docs!
Install the latest GCSFS version by running:
$ pip install fs-gcsfs
Or in case you are using conda:
$ conda install -c conda-forge fs-gcsfs
Instantiating a filesystem on Google Cloud Storage (for a full reference visit the Documentation):
from fs_gcsfs import GCSFS
gcsfs = GCSFS(bucket_name="mybucket")
Alternatively you can use a FS URL to open up a filesystem:
from fs import open_fs
gcsfs = open_fs("gs://mybucket/root_path?strict=False")
You can use GCSFS like your local filesystem:
>>> from fs_gcsfs import GCSFS
>>> gcsfs = GCSFS(bucket_name="mybucket")
>>> gcsfs.tree()
├── foo
│   ├── bar
│   │   ├── file1.txt
│   │   └── file2.csv
│   └── baz
│       └── file3.txt
└── file4.json
>>> gcsfs.listdir("foo")
["bar", "baz"]
>>> gcsfs.isdir("foo/bar")
True
Uploading a file is as easy as:
from fs_gcsfs import GCSFS
gcsfs = GCSFS(bucket_name="mybucket")
with open("local/path/image.jpg", "rb") as local_file:
    with gcsfs.open("path/on/bucket/image.jpg", "wb") as gcs_file:
        gcs_file.write(local_file.read())
You can even copy an entire bucket to your local filesystem by using PyFilesystem's utility methods:
from fs_gcsfs import GCSFS
from fs.osfs import OSFS
from fs.copy import copy_fs
gcsfs = GCSFS(bucket_name="mybucket")
local_fs = OSFS("local/path")
copy_fs(gcsfs, local_fs)
For exploring all the possibilities of GCSFS and other filesystems implementing the PyFilesystem interface, we recommend visiting the official PyFilesystem Docs!
To develop on this project, make sure you have pipenv installed and run the following from the root directory of the project:
$ pipenv install --dev --three
This will create a virtualenv with all packages and dev-packages installed.
Expose your bucket name as the environment variable $TEST_BUCKET and run the tests via:
$ pipenv run pytest
Note that the tests spend most of their time waiting for I/O, so running them in parallel with pytest-xdist (via its -n option) can shorten the test run considerably.
Credits go to S3FS, which was the main source of inspiration and shares a lot of code with GCSFS.