
libChEBIpy

libChEBIpy: a Python API for accessing the ChEBI database

Details of the API are available at http://libchebi.github.io/libChEBI%20API.pdf

Environment

To customize the root of the download directory, set the following environment variable:

export LIBCHEBIPY_DOWNLOAD_DIR=/path/to/folder

If this variable is not set, the library defaults to a folder named libChEBI in the user's home directory, e.g., /home/<username>/libChEBI.
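
The same setting can also be applied from within Python; this is a minimal sketch, assuming the variable is set before the library first downloads anything:

import os

# Hypothetical cache location; set this before libchebipy downloads any files
os.environ["LIBCHEBIPY_DOWNLOAD_DIR"] = "/path/to/folder"

from libchebipy import ChebiEntity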

Custom Storage

The library supports the following parsers:

Filesystem

This is the default parser. As stated above, you can customize the download directory either by exporting LIBCHEBIPY_DOWNLOAD_DIR to the environment or by instantiating a ChebiEntity with it as follows:

from libchebipy._chebi_entity import ChebiEntity
chebi_entity = ChebiEntity("15903", download_dir="/path/to/directory")

The above is equivalent to selecting a filesystem parser (the default):

chebi_entity = ChebiEntity("15903", download_dir="/path/to/directory", parser="filesystem")
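
Once the entity is created, it can be queried as usual. As a small usage sketch, assuming the standard libchebipy accessors such as get_name() and get_formula():

from libchebipy import ChebiEntity

# Hypothetical cache location; any writable directory will do
chebi_entity = ChebiEntity("15903", download_dir="/path/to/directory")
print(chebi_entity.get_name())     # name of CHEBI:15903
print(chebi_entity.get_formula())  # molecular formula of CHEBI:15903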

Google Storage

If you don't want to use a filesystem cache and would rather cache files in Google Storage, you can initialize your ChebiEntity with the googlestorage parser:

from libchebipy import ChebiEntity
chebi_entity = ChebiEntity("15903", parser="googlestorage")

You will need a few extra Python modules installed, namely gs-chunked-io and the Google Storage client:

pip install gs-chunked-io
pip install google-cloud-storage

You are required to have GOOGLE_APPLICATION_CREDENTIALS (the path to the JSON file with your credentials) exported to the environment, along with the name of the Google Storage bucket. To err on the side of caution, the bucket must already exist. You can create a bucket in the Google Cloud console and then export your variables as follows:

export GOOGLE_APPLICATION_CREDENTIALS=$PWD/auth/credentials.json
export GOOGLE_STORAGE_BUCKET=libchebi-testing
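
If you prefer to create or check the bucket programmatically rather than in the console, a short sketch with the google-cloud-storage client might look like the following (the bucket name is the one exported above):

from google.cloud import storage

# The client picks up GOOGLE_APPLICATION_CREDENTIALS from the environment
client = storage.Client()

bucket_name = "libchebi-testing"
bucket = client.lookup_bucket(bucket_name)  # returns None if the bucket does not exist
if bucket is None:
    bucket = client.create_bucket(bucket_name)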

If you want a storage prefix (akin to a "folder" in your bucket), you can export:

export STORAGE_PREFIX=libchebi-cache

We would then import ChebiEntity, which will find the parsers already in the namespace and will not re-import them (and thus will not override your custom configuration):

from libchebipy import ChebiEntity

Then initialize a ChebiEntity, setting the parser to googlestorage:

entity = ChebiEntity('CHEBI:15365', parser="googlestorage")

If the environment variables for the bucket or credentials aren't defined, you'll get an error. Note that the Google Storage parser still requires write access to a temporary directory, as files are written there before being read.
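
To check that the cache is being populated, you can list the blobs in your bucket. This is only a sketch, since the exact blob names depend on the parser and on whether you exported STORAGE_PREFIX:

import os
from google.cloud import storage

client = storage.Client()
prefix = os.environ.get("STORAGE_PREFIX")  # None if you did not export a prefix

for blob in client.list_blobs(os.environ["GOOGLE_STORAGE_BUCKET"], prefix=prefix):
    print(blob.name)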

Improvements

It's possible to extract content directly into Google Storage; that would look something like this:

import gzip
import zipfile

import gs_chunked_io as gscio

# If the blob is a zip archive, extract its members directly into storage
if filepath.endswith('.zip'):
    with gscio.AsyncReader(blob) as fh:
        zfile = zipfile.ZipFile(fh)
        # Treat the first archive member as the file of interest
        filepath = zfile.namelist()[0]
        for contentfilename in zfile.namelist():
            contentfile = zfile.read(contentfilename)
            nested_blob = self.bucket.blob(self.bucket_name + "/" + contentfilename)
            nested_blob.upload_from_string(contentfile)

# If the blob is gzip-compressed, decompress it into a sibling blob
elif filepath.endswith('.gz'):
    unzipped_filepath = filepath[:-len('.gz')]
    filepath = unzipped_filepath

    # Only re-upload if the decompressed blob is missing or out of date
    if not self._is_current(unzipped_filepath):
        unzipped_blob = self.bucket.blob(unzipped_filepath)
        with gscio.AsyncReader(blob) as fh:
            gzip_reader = gzip.GzipFile(fileobj=fh)
            unzipped_blob.upload_from_string(gzip_reader.read())

However, we would ultimately still need to write these files to a temporary location to allow for custom reading with different encodings, e.g., something like this:

import io

with io.open(filename, "r", encoding="cp1252") as textfile:
    next(textfile)  # e.g. skip the header line

So if you are able to reproduce that behaviour while getting content directly from a Google Storage read, the Google Storage client would not need to write anything to a temporary file. Please open a pull request if you can contribute this change!
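
One possible direction, sketched here rather than implemented in the library, is to download the blob into an in-memory text stream so that the cp1252 decoding happens without touching the filesystem; the bucket and blob names below are only placeholders:

import io
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("libchebi-testing")          # bucket name exported above
blob = bucket.blob("libchebi-cache/compounds.tsv")  # hypothetical cached flat file

# Decode in memory with the same encoding used for the local files
textfile = io.StringIO(blob.download_as_bytes().decode("cp1252"))
next(textfile)  # skip the header line, as in the filesystem parser
for line in textfile:
    pass  # parse each record as the existing parsers do

For very large files a streaming reader would be preferable to holding the whole download in memory, but the idea is the same.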