# Alluxio API in AI/ML Demo

Alluxio exposes three APIs that are useful for AI/ML workloads:
- POSIX API. Access a dataset as if it were a folder on the local file system.
- S3 API. Access a dataset as if it were stored in S3.
- Python API. Access a dataset through the Alluxio Python library.

This demo shows how to use each of these three APIs to access data that originally resides in S3 and is cached by Alluxio.

In [None]:
# Set dataset location
dataset = "s3://ai-ref-arch/demo/api"

## Alluxio POSIX API

The Alluxio POSIX API exposes an S3 dataset as a local folder, with data locality.

Set the POSIX API parameters

In [None]:
mount_point = "/tmp/mnt/alluxio/fuse"

Mount the S3 dataset at the local mount point

In [None]:
! alluxio-fuse mount $dataset $mount_point
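
Before listing or reading, it can help to confirm that the mount point is actually active. The following is a minimal sketch using only the Python standard library; `wait_for_mount` is a hypothetical helper name, and it assumes the FUSE mount is visible to `os.path.ismount` in the same way as other mount points.

```python
import os
import time


def wait_for_mount(path, timeout_s=10.0, poll_s=0.5):
    """Return True once `path` is a mount point, or False after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.ismount(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Not mounted yet: wait briefly before checking again
        time.sleep(poll_s)
```

Calling `wait_for_mount(mount_point)` right after the mount command gives an early, explicit signal if the mount did not come up, rather than a silently empty directory listing.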

List the directory and read files just as you would local data

In [None]:
import os

# List all files in the specified directory
list_result = os.listdir(mount_point)
print(list_result)

In [None]:
# Loop through each file and read its contents
for file_name in list_result:
    file_path = os.path.join(mount_point, file_name)
    with open(file_path, 'r') as file:
        print(f"Contents of {file_name}:")
        for line in file:
            print(line.strip())
        print("\n")
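
The loop above reads only the top level of the mount point and opens every entry as text. Datasets are often nested, so a recursive walk may be more useful. The sketch below uses only the standard library; `iter_files` is a hypothetical helper name, not part of Alluxio.

```python
import os


def iter_files(root):
    """Yield the path of every regular file under `root`, in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place controls the order in which os.walk descends
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

With the mount in place, `for path in iter_files(mount_point): ...` visits each file once. Opening the files with mode `"rb"` avoids decode errors when the dataset contains binary data such as images.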

## Alluxio Python API

The Alluxio Python API is built on the Alluxio RESTful API and provides efficient data listing and reading.

In [None]:
etcd_host="localhost"

In [None]:
from alluxio import AlluxioFileSystem
alluxio = AlluxioFileSystem(etcd_host=etcd_host)

In [1]:
# List all files in the specified directory
list_result = alluxio.listdir(dataset)
print(list_result)

In [None]:
# Loop through each file and read its contents
for file in list_result:
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    file_path = f"{dataset}/{file['mName']}"
    print(alluxio.read(file_path))
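
The read loop builds each file path by joining the dataset URI with the `mName` field of a listing entry. A small helper can make that join robust to a trailing slash on the dataset URI. This is only a sketch: `entry_paths` is a hypothetical name, and it assumes, as in the cell above, that each listing entry is a dict with an `mName` key.

```python
def entry_paths(dataset_uri, entries):
    """Build a full path for each listing entry under `dataset_uri`."""
    # Drop any trailing slash so the join never produces "//"
    base = dataset_uri.rstrip("/")
    return [f"{base}/{entry['mName']}" for entry in entries]
```

For example, `entry_paths(dataset, list_result)` would return the paths that can then be passed to `alluxio.read` one at a time.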

## Alluxio S3 API

In [None]:
# Configure worker addresses and Alluxio underlying storage address to use Alluxio S3 API
worker_host="localhost"
alluxio_under_storage="s3://ai-ref-arch/"

In [None]:
import boto3

# Create an S3 client that talks to the Alluxio worker instead of AWS
alluxios3 = boto3.client(
    service_name="s3",
    aws_access_key_id="alluxio",  # alluxio user name
    aws_secret_access_key="SK...",  # dummy value
    endpoint_url=f"http://{worker_host}:29998",
)

In [None]:
import os

def subtract_path(path, parent_path):
    """Return `path` relative to `parent_path`."""
    if "://" in path and "://" in parent_path:
        # URI paths: remove the parent_path prefix from path
        relative_path = path[len(parent_path):]
    else:
        # Local paths: compute the relative path
        relative_path = os.path.relpath(path, start=parent_path)
    return relative_path

def get_bucket_path(full_path):
    """Split a full dataset path into (bucket, key) for the Alluxio S3 API."""
    alluxio_path = subtract_path(full_path, alluxio_under_storage)
    parts = alluxio_path.strip("/").split("/", 1)
    if not parts[0]:
        # str.split never returns an empty list, so check for an empty bucket name instead
        raise ValueError(
            "Alluxio S3 API can only execute under a directory under "
            "dora root. This directory will be used as S3 bucket name"
        )
    elif len(parts) == 1:
        return parts[0], None
    else:
        return parts[0], parts[1]
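
To make the mapping concrete: the first directory below the under-storage root is used as the S3 bucket name, and the rest of the path becomes the object key or prefix. The standalone sketch below illustrates this for URI paths only; `split_bucket_and_key` is a hypothetical name, and the prefix check is an added assumption rather than something the helpers above do.

```python
def split_bucket_and_key(full_path, under_storage):
    """Split `full_path` into (bucket, key) relative to `under_storage`."""
    if not full_path.startswith(under_storage):
        raise ValueError(f"{full_path!r} is not under {under_storage!r}")
    relative = full_path[len(under_storage):].strip("/")
    if not relative:
        raise ValueError("the path must include a top-level directory to use as the bucket")
    # The first path component is the bucket; anything after it is the key
    bucket, _, key = relative.partition("/")
    return bucket, key or None


print(split_bucket_and_key("s3://ai-ref-arch/demo/api", "s3://ai-ref-arch/"))
```

With the values used in this demo, the dataset `s3://ai-ref-arch/demo/api` maps to bucket `demo` and key prefix `api`.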

In [None]:
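# List all files in the specified directory
bucket, path = get_bucket_path(dataset)
# Prefix must be a string, so fall back to "" when the dataset is the bucket root
list_result = alluxios3.list_objects_v2(Bucket=bucket, Prefix=path or "")
print(list_result)

# list_objects_v2 returns a dict; the objects are under "Contents" (absent when nothing matches)
files = [path_info.get("Key") for path_info in list_result.get("Contents", [])]

# Loop through each file and read its contents
for key in files:
    # In the S3 API each "Key" is relative to the bucket, so pass it to get_object directly
    response = alluxios3.get_object(Bucket=bucket, Key=key)
    print(response["Body"].read())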
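
In the S3 API, a single `list_objects_v2` call returns at most 1000 keys, so larger datasets need several calls chained with continuation tokens. The sketch below follows the standard boto3 response fields (`Contents`, `IsTruncated`, `NextContinuationToken`). `iter_object_keys` is a hypothetical helper name, and whether the Alluxio S3 endpoint honors continuation tokens exactly as AWS S3 does is an assumption to verify against your deployment.

```python
def iter_object_keys(client, bucket, prefix=""):
    """Yield every object key under `prefix`, following pagination tokens."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when the page holds no objects
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page on the following call
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

For example, `for key in iter_object_keys(alluxios3, bucket, path): ...` visits every key without holding the whole listing in memory.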