libChEBIpy: a Python API for accessing the ChEBI database
Details of the API are available: http://libchebi.github.io/libChEBI%20API.pdf
To customize the root of the download directory, set the following environment variable:
export LIBCHEBIPY_DOWNLOAD_DIR=/path/to/folder
If not set, the library defaults to the folder libChEBI in the user's home directory, e.g., /home/<username>/libChEBI.
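The lookup described above can be sketched as follows. This is only an illustration of the documented behaviour, not the library's actual code: the function name `resolve_download_dir` is hypothetical, and treating an empty value as unset is an assumption.

```python
import os


def resolve_download_dir(environ=None):
    """Return the download root: the environment variable if set, else ~/libChEBI."""
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), "libChEBI")
    # Assumption: an empty value is treated the same as an unset variable.
    return environ.get("LIBCHEBIPY_DOWNLOAD_DIR") or default


print(resolve_download_dir())
```

Passing a plain dictionary instead of `os.environ` makes it easy to try both cases without changing your shell environment.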
The library has a set of parsers, which include filesystem and googlestorage.
The filesystem parser is the default, and (as stated above) you can customize the download directory either by exporting LIBCHEBIPY_DOWNLOAD_DIR to the environment, or by instantiating a ChebiEntity with it as follows:
from libchebipy._chebi_entity import ChebiEntity
chebi_entity = ChebiEntity(15903, download_dir="/path/to/directory")
The above is equivalent to selecting the filesystem parser (the default):
chebi_entity = ChebiEntity(15903, download_dir="/path/to/directory", parser="filesystem")
If you don't want to use a filesystem cache, or otherwise want to use a Google
Storage cache, then you can initialize your ChebiEntity to use a googlestorage
parser:
from libchebipy import ChebiEntity
chebi_entity = ChebiEntity("15903", parser="googlestorage")
You need a few extra Python modules installed for the Google Storage client and gs-chunked-io:
pip install gs-chunked-io
pip install google-cloud-storage
You are required to have GOOGLE_APPLICATION_CREDENTIALS (the path to the JSON file with your credentials) exported to the environment, along with the name of the Google Storage bucket. To be conservative, the bucket must already exist. You can create a bucket in the Google Cloud console and then export your variables as follows:
export GOOGLE_APPLICATION_CREDENTIALS=$PWD/auth/credentials.json
export GOOGLE_STORAGE_BUCKET=libchebi-testing
If you want a storage prefix (akin to a "folder" in your bucket) you can export:
export STORAGE_PREFIX=libchebi-cache
We would then import ChebiEntity, which finds the parsers already in the namespace and does not re-import them (so it does not override any custom function you have defined):
from libchebipy import ChebiEntity
Then initialize a ChebiEntity, but set the parser to be googlestorage.
entity = ChebiEntity('CHEBI:15365', parser="googlestorage")
If your environment variables aren't defined for the bucket or credentials, you'll get an error. Note that the Google Storage parser still requires write access to a temporary directory to read the files from.
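To catch a missing setting before constructing a ChebiEntity, a small preflight check along these lines may help. It is a sketch only: the variable names are the ones exported above, the function name `missing_google_storage_settings` is hypothetical, and the error the library itself raises may look different.

```python
import os

# The two variables this README says must be exported.
REQUIRED_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_STORAGE_BUCKET")


def missing_google_storage_settings(environ=None):
    """Return a list describing settings that are unset or unusable."""
    environ = os.environ if environ is None else environ
    problems = [name for name in REQUIRED_VARS if not environ.get(name)]
    credentials = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    # The credentials variable must point at an existing JSON file.
    if credentials and not os.path.isfile(credentials):
        problems.append(
            "GOOGLE_APPLICATION_CREDENTIALS (file not found: %s)" % credentials
        )
    return problems


problems = missing_google_storage_settings()
if problems:
    print("Missing or invalid settings: " + ", ".join(problems))
```

The check does not verify that the bucket exists or that the credentials grant access to it; that would require a call to the Google Storage client.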
It's possible to extract content directly into Google Storage; that would look like this:
import gzip
import zipfile

import gs_chunked_io as gscio

# Fragment from inside a parser method: blob, filepath, and self come from
# the surrounding code.
# If the blob is a zip file, extract each member into storage
if filepath.endswith('.zip'):
    with gscio.AsyncReader(blob) as fh:
        zfile = zipfile.ZipFile(fh)
        filepath = zfile.namelist()[0]
        for contentfilename in zfile.namelist():
            contentfile = zfile.read(contentfilename)
            nested_blob = self.bucket.blob(self.bucket_name + "/" + contentfilename)
            nested_blob.upload_from_string(contentfile)
elif filepath.endswith('.gz'):
    # Strip the .gz suffix to get the name of the decompressed blob
    unzipped_filepath = filepath[:-len('.gz')]
    filepath = unzipped_filepath
    # Checks if exists, and timestamp
    if not self._is_current(unzipped_filepath):
        unzipped_blob = self.bucket.blob(unzipped_filepath)
        with gscio.AsyncReader(blob) as fh:
            gzip_reader = gzip.GzipFile(fileobj=fh)
            unzipped_blob.upload_from_string(gzip_reader.read())
However, we would ultimately still need to write these files to a temporary location to allow for custom reading with different encodings, e.g., something like this:
with io.open(filename, "r", encoding="cp1252") as textfile:
    next(textfile)  # skip the first line
So if you can reproduce that while reading the content directly from Google Storage, the Google Storage client would not need to write anything to a temporary file. Please open a pull request if you can contribute this change!
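One possible starting point, offered as a sketch rather than a tested replacement for the parser code: once the blob's content is held in memory as bytes (for example from `blob.download_as_bytes()` in google-cloud-storage), `io.TextIOWrapper` can decode it with a chosen encoding without a temporary file. The function name `iter_text_lines` is hypothetical, and holding the whole file in memory is an assumption that may not suit large files.

```python
import gzip
import io


def iter_text_lines(data, encoding="cp1252", skip_first=True):
    """Yield decoded lines from raw bytes without touching the filesystem."""
    # Decompress in memory if the content is gzipped (gzip starts with 1f 8b).
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    textfile = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    if skip_first:
        # Mirrors next(textfile) in the snippet above; safe on empty input.
        next(textfile, None)
    for line in textfile:
        yield line.rstrip("\r\n")


# Stand-in for bytes that would come from a Google Storage read.
raw = "ID\tNAME\n1\tcaf\xe9\n".encode("cp1252")
lines = list(iter_text_lines(gzip.compress(raw)))
print(lines)
```

The same wrapper works on uncompressed bytes, so a caller would not need to know in advance whether the blob was gzipped.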