# Eagle Exporter Testing

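This notebook demonstrates how to use the Eagle Exporter library: exporting an Eagle library's metadata to Parquet, exporting to a Hugging Face dataset with images, and building a DataFrame directly with the core API.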

In [None]:
from eagle_exporter.cli import export_metadata

# Basic export to Parquet
folder = r"D:\Andrew\45k_filter.library"
s5cmd = None
dest = r"out.parquet"
hf_public = False
include_images = False

export_metadata(folder, s5cmd, dest, hf_public, include_images)

## Exporting with Images

The following example shows how to export to a Hugging Face dataset with images included.

In [None]:
from eagle_exporter.cli import export_metadata

# Example for exporting to Hugging Face with images
folder = r"E:\Datasets\eagle_quick_rate_novelai\eagle_template.library"
s5cmd = None
dest = r"datatmp/nai-distill_01_batch03_eagle.library"  # Replace with your Hugging Face username/repo
hf_public = True
include_images = True

# Note: this uploads to Hugging Face and requires valid credentials.

# Uploads can fail on transient network errors, so try up to 5 times.
max_attempts = 5
for attempt in range(1, max_attempts + 1):
    try:
        export_metadata(folder, s5cmd, dest, hf_public, include_images)
        break  # stop after the first successful export
    except Exception as e:
        print(f"Export attempt {attempt}/{max_attempts} failed, retrying... {e}")

2025-03-02 23:21:03 [INFO] ls: Listing contents of E:\Datasets\eagle_quick_rate_novelai\eagle_template.library\images


Listing local files: 0files [00:00, ?files/s]

Loading concurrent:   0%|          | 0/1000 [00:00<?, ?it/s]

Loading images: 100%|██████████| 1000/1000 [00:00<00:00, 2262.18it/s]

2025-03-02 23:21:07 [INFO] saves: Saving to hf://datatmp/nai-distill_01_batch03_eagle.library





Uploading the dataset shards:   0%|          | 0/3 [00:00<?, ?it/s]

Creating parquet from Arrow format:   0%|          | 0/4 [00:00<?, ?ba/s]

'(MaxRetryError("HTTPSConnectionPool(host='hf-hub-lfs-us-east-1.s3-accelerate.amazonaws.com', port=443): Max retries exceeded with url: /repos/d6/3f/d63f244601c9de2e80617ed6ba476d45fb3d590cf087e267762d314016aef1aa/3a34bd475e36b69baf8371bd6d707680a001df77f38d1a9ad91d24dbda766972?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIA2JU7TKAQLC2QXPN7%2F20250303%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250303T042127Z&X-Amz-Expires=86400&X-Amz-Signature=004452171a6a8307c17097fab5175b80d7977e5be75e2af7ef144efe941a53b7&X-Amz-SignedHeaders=host&partNumber=1&uploadId=rpWBIZdzVxiKlGrNWdonw.Iv7fMH_kLZTNOoJXKbyQ1uRjsFYKZMjMfGb5ayv1rtto9d6Qbq.jn8Fb2SbHNa96vu0xBGBeqrI1920cb.WBS228OUhpyj4u5nAoIAJ2W2&x-id=UploadPart (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2384)')))"), '(Request ID: d766ecfe-eec3-46dc-9dce-6a1605a66edb)')' thrown while requesting PUT https://hf-hub-lfs-us-east-1.s3-accelerate.amazonaws.com/repos/d6/3f/d

Creating parquet from Arrow format:   0%|          | 0/4 [00:00<?, ?ba/s]

Creating parquet from Arrow format:   0%|          | 0/4 [00:00<?, ?ba/s]

README.md:   0%|          | 0.00/1.60k [00:00<?, ?B/s]

README.md:   0%|          | 0.00/1.60k [00:00<?, ?B/s]
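
The loop above retries immediately after each failure, which may not help when the error is a dropped connection like the `SSLEOFError` in the output. One possible alternative is to wait a little longer between attempts. The sketch below is a minimal, generic helper; `retry_with_backoff` and its parameters are hypothetical names introduced here and are not part of `eagle_exporter`.

```python
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts and
    re-raises the last exception if every attempt fails.
    (Hypothetical helper, not part of eagle_exporter.)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt}/{max_attempts} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

In this notebook it could be called as `retry_with_backoff(lambda: export_metadata(folder, s5cmd, dest, hf_public, include_images))`. The `sleep` parameter exists only so the wait can be swapped out when testing.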

## Using the Core API

For more control, you can use the core API directly:

In [None]:
from eagle_exporter.core import build_dataframe

# Build DataFrame without loading images
folder = r"D:\Andrew\45k_filter.library"
df = build_dataframe(folder, include_images=False)

print(f"DataFrame shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

DataFrame shape: (123, 10)
Columns: ['filename', 'folders', 'tags', 'annotation', 'url', 'height', 'width', 'palette_color', 'palette_ratio', 'image_path']
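
Once the metadata is in a DataFrame, a common next step is to summarize it, for example by counting how often each tag occurs. The sketch below works on plain row dictionaries, such as those produced by `df.to_dict("records")`. It assumes each row's `tags` value is a list of strings (or `None`), which the output above does not confirm; `count_tags` and the sample rows are hypothetical.

```python
from collections import Counter


def count_tags(rows):
    """Count how often each tag appears across rows.

    Each row is a dict; its 'tags' value is assumed to be an iterable
    of strings, or None / missing when the item has no tags.
    """
    counts = Counter()
    for row in rows:
        tags = row.get("tags")
        if tags is None:
            continue  # skip rows without tags
        counts.update(tags)
    return counts


# Hypothetical sample rows standing in for df.to_dict("records").
sample_rows = [
    {"filename": "a.png", "tags": ["cat", "outdoor"]},
    {"filename": "b.png", "tags": ["cat"]},
    {"filename": "c.png", "tags": None},
]
tag_counts = count_tags(sample_rows)
print(tag_counts.most_common(2))
```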


In [None]:
# Build DataFrame with images
df_with_images = build_dataframe(folder, include_images=True)

print(f"DataFrame with images shape: {df_with_images.shape}")
print(f"Columns: {list(df_with_images.columns)}")

# Check the first image
first_image = df_with_images['image'].iloc[0]
print(f"First image is None: {first_image is None}")
if first_image is not None:
    print(f"Image type: {type(first_image)}")
    print(f"Image size: {len(first_image)} bytes")

Loading images: 100%|██████████| 123/123 [00:05<00:00, 21.34it/s]


DataFrame with images shape: (123, 11)
Columns: ['filename', 'folders', 'tags', 'annotation', 'url', 'height', 'width', 'palette_color', 'palette_ratio', 'image_path', 'image']
First image is None: False
Image type: <class 'bytes'>
Image size: 245678 bytes
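
The output above shows that the `image` column holds raw `bytes`. A quick way to sanity-check those bytes without decoding the whole image is to look at the leading magic bytes. The helper below is a minimal sketch covering a few common formats; `sniff_image_format` is a hypothetical name and not part of `eagle_exporter`.

```python
def sniff_image_format(data):
    """Guess an image format from its leading magic bytes.

    Returns "png", "jpeg", "gif", "webp", or None if unrecognized.
    (Hypothetical helper; covers only a few common formats.)
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

With the DataFrame from the cell above, this could be applied as `sniff_image_format(first_image)`.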
