# Alluxio API in AI/ML Demo

Alluxio supports three main APIs for AI/ML workloads:
- POSIX API. Access a dataset as if it were a folder on the local file system.
- S3 API. Access a dataset as if it were stored in S3.
- Python API. Access a dataset through the Alluxio Python library.

This demo shows how to use each of the three APIs to access data that originally resides in S3 and is cached by Alluxio.

In [1]:
# Set dataset location
dataset = "s3://ai-ref-arch/demo/api"

## Alluxio POSIX API

The Alluxio POSIX API exposes an S3 dataset as a local folder, with data locality provided by the Alluxio cache.

Set the POSIX API parameters:

In [2]:
mount_point = "/Users/alluxio/mnt/fuse"
alluxio_fuse_dir = '/Users/alluxio/alluxioFolder/alluxio/dora/integration/fuse/bin'

Mount the S3 dataset at the local mount point:

In [3]:
import os
os.environ['PATH'] = f"{alluxio_fuse_dir}:{os.environ['PATH']}"
! alluxio-fuse mount $dataset $mount_point

Mounting s3://ai-ref-arch/demo/api to /Users/alluxio/mnt/fuse
Successfully mounted s3://ai-ref-arch/demo/api to /Users/alluxio/mnt/fuse
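
Before reading through the mount, it can help to confirm that the mount point is actually active. The following is a minimal sketch, assuming the FUSE mount is visible to the operating system as a regular mount point. `is_mounted` is a hypothetical helper name, not part of Alluxio.

```python
import os


def is_mounted(path):
    """Return True if `path` exists and is a mount point (hypothetical helper)."""
    # os.path.ismount returns False for paths that do not exist
    return os.path.ismount(path)


# The demo's mount point; this prints False on machines where nothing is mounted there
print(is_mounted("/Users/alluxio/mnt/fuse"))
```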


In [4]:
# List all files in the specified directory
list_result = os.listdir(mount_point)
print(list_result)

['Alluxio', 'Hello']


In [5]:
# Loop through each file and read its contents
for file_name in list_result:
    file_path = os.path.join(mount_point, file_name)
    with open(file_path, 'r') as file:
        print(f"Contents of {file_name}:")
        for line in file:
            print(line.strip())
        print("\n")

Contents of Alluxio:
Welcome to Alluxio!


Contents of Hello:
Hello Alluxio!
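
The loop above assumes that every entry under the mount point is a text file. For datasets that also contain subdirectories, a minimal sketch using only the standard library could look like this. `read_text_files` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the data.

```python
import os


def read_text_files(root):
    """Yield (relative_path, text) for every regular file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            # Assumption: files are UTF-8 text; binary data needs mode "rb"
            with open(full_path, "r", encoding="utf-8") as f:
                yield os.path.relpath(full_path, root), f.read()
```

It can be used in place of the loop above, for example `for rel_path, text in read_text_files(mount_point): print(rel_path, text)`.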




## Alluxio Python API

The Alluxio Python API is built on the Alluxio RESTful API and provides efficient data listing and reading.

In [6]:
etcd_host="localhost"

In [7]:
from alluxio import AlluxioFileSystem
alluxio = AlluxioFileSystem(etcd_host=etcd_host)

In [8]:
# List all files in the specified directory
list_result = alluxio.listdir(dataset)
print(list_result)

[{'mType': 'file', 'mName': 'Alluxio', 'mLength': 20}, {'mType': 'file', 'mName': 'Hello', 'mLength': 15}]
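
Each entry in the listing is a dictionary. The following minimal sketch pulls out the file names and the total size, assuming entries keep the `mType`, `mName`, and `mLength` keys shown in the output above. `summarize_listing` is a hypothetical helper, not part of the Alluxio library.

```python
def summarize_listing(entries):
    """Return (file_names, total_bytes) for the file entries in a listing."""
    # Keep only entries marked as files; other entry types are skipped
    files = [e for e in entries if e.get("mType") == "file"]
    names = [e["mName"] for e in files]
    total_bytes = sum(e["mLength"] for e in files)
    return names, total_bytes


# Sample entries copied from the listing output above
sample = [
    {"mType": "file", "mName": "Alluxio", "mLength": 20},
    {"mType": "file", "mName": "Hello", "mLength": 15},
]
print(summarize_listing(sample))  # (['Alluxio', 'Hello'], 35)
```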


In [9]:
# Loop through each file and read its contents
for file in list_result:
    file_path = f"{dataset}/{file['mName']}"
    print(alluxio.read(file_path))

b'Welcome to Alluxio!\n'
b'Hello Alluxio!\n'
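
As the output shows, the reads return raw bytes. A minimal sketch for turning them into text, assuming the files are UTF-8 encoded (`to_text` is a hypothetical helper name):

```python
def to_text(data, encoding="utf-8"):
    """Decode bytes into a string and drop trailing newlines."""
    return data.decode(encoding).rstrip("\n")


# Sample value copied from the output above
print(to_text(b"Welcome to Alluxio!\n"))  # Welcome to Alluxio!
```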


## Alluxio S3 API

In [10]:
# Configure worker addresses and Alluxio underlying storage address to use Alluxio S3 API
worker_host="localhost"
alluxio_under_storage="s3://ai-ref-arch/"

In [11]:
import boto3

alluxios3 = boto3.client(
    service_name="s3",
    aws_access_key_id="alluxio",  # Alluxio user name
    aws_secret_access_key="SK...",  # dummy value
    endpoint_url=f"http://{worker_host}:29998",
)

In [12]:
def subtract_path(path, parent_path):
    """Return `path` relative to `parent_path`."""
    if "://" in path and "://" in parent_path:
        # URI paths: remove the parent_path prefix from path
        relative_path = path[len(parent_path):]
    else:
        # Local paths: compute the relative path
        relative_path = os.path.relpath(path, start=parent_path)
    return relative_path

def get_bucket_path(full_path):
    """Split a path under the Alluxio root into (bucket, key)."""
    alluxio_path = subtract_path(full_path, alluxio_under_storage)
    parts = alluxio_path.split("/", 1)
    if not parts[0]:
        # No directory under the root, so there is no bucket name to use
        raise ValueError(
            "Alluxio S3 API can only execute under a directory under "
            "dora root. This directory will be used as S3 bucket name"
        )
    if len(parts) == 1:
        return parts[0], None
    return parts[0], parts[1]
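
To make the mapping concrete: the first path component under the Alluxio root becomes the S3 bucket, and the remainder becomes the key. Below is a compact standalone sketch of the same split for checking sample paths. `split_bucket_key` is a hypothetical name, and the default root is copied from the configuration cell above.

```python
def split_bucket_key(full_path, root="s3://ai-ref-arch/"):
    """Split `full_path` under `root` into (bucket, key); key is None if absent."""
    if not full_path.startswith(root):
        raise ValueError(f"{full_path!r} is not under {root!r}")
    # Everything up to the first "/" is the bucket, the rest is the key
    bucket, _sep, key = full_path[len(root):].partition("/")
    if not bucket:
        raise ValueError("path must name a directory under the root")
    return bucket, (key or None)


print(split_bucket_key("s3://ai-ref-arch/demo/api"))          # ('demo', 'api')
print(split_bucket_key("s3://ai-ref-arch/demo/api/Alluxio"))  # ('demo', 'api/Alluxio')
print(split_bucket_key("s3://ai-ref-arch/demo"))              # ('demo', None)
```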

In [13]:
# List all files in the specified directory
bucket, path = get_bucket_path(dataset)
list_result = alluxios3.list_objects_v2(Bucket=bucket, Prefix=path)["Contents"]
print(list_result)

[{'Key': 'api/', 'LastModified': datetime.datetime(1969, 12, 31, 16, 0, tzinfo=tzutc()), 'Size': 0}, {'Key': 'api/Alluxio', 'LastModified': datetime.datetime(2023, 9, 5, 16, 18, 30, tzinfo=tzutc()), 'Size': 20}, {'Key': 'api/Hello', 'LastModified': datetime.datetime(2023, 9, 5, 16, 18, 46, tzinfo=tzutc()), 'Size': 15}]
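
Indexing the response with `["Contents"]` raises `KeyError` when no object matches the prefix, because S3-style `list_objects_v2` responses omit that key for empty results. The following minimal sketch tolerates an empty listing and skips directory markers such as `api/`. `file_keys` is a hypothetical helper, and it is an assumption that the Alluxio endpoint mirrors this S3 behavior.

```python
def file_keys(response):
    """Return the keys of non-directory objects in a list_objects_v2 response."""
    # "Contents" is absent when the listing is empty
    contents = response.get("Contents", [])
    return [obj["Key"] for obj in contents if not obj["Key"].endswith("/")]


# Sample response trimmed from the output above
sample_response = {
    "Contents": [
        {"Key": "api/", "Size": 0},
        {"Key": "api/Alluxio", "Size": 20},
        {"Key": "api/Hello", "Size": 15},
    ]
}
print(file_keys(sample_response))  # ['api/Alluxio', 'api/Hello']
print(file_keys({}))               # []
```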


In [14]:
# Loop through each file and read its contents
# The listed keys are already relative to the bucket, so they can be passed to get_object directly
keys = [obj["Key"] for obj in list_result if not obj["Key"].endswith('/')]
for key in keys:
    print(alluxios3.get_object(Bucket=bucket, Key=key)["Body"].read())

b'Welcome to Alluxio!\n'
b'Hello Alluxio!\n'
