# Eagle Exporter Testing

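This notebook demonstrates how to use the Eagle Exporter library to export the metadata (and optionally the images) of an Eagle library, either to a Parquet file or to a Hugging Face dataset.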
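For context, an Eagle library is a folder on disk. As far as we know, each item lives in its own `images/<ID>.info/` subfolder next to a `metadata.json` file; the exporter's log output further down also lists the library's `images` folder. The sketch below assumes that layout and simply yields each item's parsed metadata. `iter_eagle_metadata` is a hypothetical helper for illustration, not part of `eagle_exporter`, and is not necessarily how the library reads the folder.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_eagle_metadata(library: str) -> Iterator[dict]:
    """Yield the parsed metadata.json of each item in an Eagle library.

    Hypothetical helper (not part of eagle_exporter). Assumes the layout
    <library>/images/<ID>.info/metadata.json.
    """
    images_dir = Path(library) / "images"
    if not images_dir.is_dir():
        return  # nothing to read; the generator simply yields no items
    # Sorted so repeated runs visit items in the same order.
    for meta_path in sorted(images_dir.glob("*.info/metadata.json")):
        with meta_path.open(encoding="utf-8") as f:
            yield json.load(f)
```

For example, `[m.get("tags") for m in iter_eagle_metadata(folder)]` would collect every item's tag list, assuming the metadata files use a `tags` key.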

In [None]:
from eagle_exporter.cli import export_metadata

# Basic export to Parquet
folder = r"D:\Andrew\45k_filter.library"
s5cmd = None
dest = r"out.parquet"
hf_public = False
include_images = False

export_metadata(folder, s5cmd, dest, hf_public, include_images)
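The cell above prints nothing, so a quick way to confirm that it actually wrote a Parquet file is to check the format's magic bytes: a Parquet file starts and ends with the four bytes `PAR1`. `looks_like_parquet` below is a hypothetical helper, not part of `eagle_exporter`; it only checks that envelope and does not validate the footer metadata or the data.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_parquet(path: str) -> bool:
    """Cheap check that a file has Parquet's header and footer magic bytes.

    Hypothetical helper: it does not validate the footer metadata or the data.
    """
    p = Path(path)
    # Minimum envelope: 4-byte header magic + 4-byte footer length + 4-byte footer magic.
    if not p.is_file() or p.stat().st_size < 12:
        return False
    with p.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

After running the export, `looks_like_parquet("out.parquet")` should return `True`. Reading the contents back needs a Parquet engine, for example `pandas.read_parquet("out.parquet")` with pyarrow installed.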

## Exporting with Images

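The following example exports an Eagle library to a Hugging Face dataset with images included. In the run recorded below, the export fails with a CastError because the table's column names do not match the dataset features.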

In [3]:
from eagle_exporter.cli import export_metadata

# Example for exporting to Hugging Face with images
folder = r"E:\Datasets\eagle_quick_rate_novelai\eagle_template.library"
s5cmd = None
dest = r"datatmp/nai-distill_01_batch01_eagle2.library"  # Replace with your Hugging Face username/repo
hf_public = False
include_images = True

# Note: this uploads to Hugging Face only if Hugging Face credentials are set up
export_metadata(folder, s5cmd, dest, hf_public, include_images)

2025-03-02 15:22:40 [INFO] ls: Listing contents of E:\Datasets\eagle_quick_rate_novelai\eagle_template.library\images


Listing local files: 0files [00:00, ?files/s]

Loading concurrent:   0%|          | 0/1000 [00:00<?, ?it/s]

Loading images: 100%|██████████| 1000/1000 [00:00<00:00, 2359.42it/s]


CastError: Couldn't cast
filename: string
size: int64
tags: list<item: string>
  child 0, item: string
folders: list<item: null>
  child 0, item: null
isDeleted: bool
url: string
annotation: string
star: int64
height: int64
width: int64
palette_color: string
palette_ratio: int64
image: struct<bytes: binary>
  child 0, bytes: binary
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 1733
to
{'image': Image(mode=None, decode=True, id=None)}
because column names don't match
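The error shows the table (with `filename`, `size`, `tags`, ..., `image`) being cast to a feature spec that declares only `image`. A pre-flight comparison of the two column sets makes that kind of mismatch visible before any upload. `column_mismatch` is a hypothetical helper that mirrors the condition named in the message ("column names don't match"); it is not the exact check performed inside the `datasets` library.

```python
from typing import Dict, Iterable, List


def column_mismatch(table_columns: Iterable[str], feature_names: Iterable[str]) -> Dict[str, List[str]]:
    """Report columns present on only one side of a table/features pair.

    Hypothetical pre-flight check; it mirrors the condition in the error
    message ("column names don't match"), not the exact logic of `datasets`.
    """
    table_set, feature_set = set(table_columns), set(feature_names)
    return {
        # In the table but not declared in the features (e.g. 'tags', 'width').
        "missing_from_features": sorted(table_set - feature_set),
        # Declared in the features but absent from the table.
        "missing_from_table": sorted(feature_set - table_set),
    }
```

Fed the column list from the error output and the single feature name `image`, it would report every metadata column as missing from the features. That suggests, as an inference from the message rather than from the exporter's source, that either the features must declare those columns too or the extra columns must be dropped before casting.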

## Using the Core API

For more control, you can use the core API directly:

In [None]:
from eagle_exporter.core import build_dataframe

# Build DataFrame without loading images
folder = r"D:\Andrew\45k_filter.library"
df = build_dataframe(folder, include_images=False)

print(f"DataFrame shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

DataFrame shape: (123, 10)
Columns: ['filename', 'folders', 'tags', 'annotation', 'url', 'height', 'width', 'palette_color', 'palette_ratio', 'image_path']
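Because `build_dataframe` returns an ordinary DataFrame, its metadata columns can be analyzed with standard tools. As one example, the sketch below builds a tag histogram. It assumes each `tags` cell holds a list of strings (the CastError output types `tags` as `list<item: string>`, though that schema comes from the export path, not necessarily from `build_dataframe`). `tag_counts` is a hypothetical helper, not part of `eagle_exporter`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tag_counts(tag_lists: Iterable, top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most frequent tags across all rows."""
    counts: Counter = Counter()
    for tags in tag_lists:
        # Missing values may show up as None (or NaN, which is a float); skip them.
        if tags is None or isinstance(tags, float):
            continue
        counts.update(tags)  # each tag in the row adds 1 to its count
    return counts.most_common(top)
```

Iterating a pandas Series yields its values, so `tag_counts(df["tags"])` would work directly on the DataFrame built above, under the list-of-strings assumption.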


In [None]:
# Build DataFrame with images
df_with_images = build_dataframe(folder, include_images=True)

print(f"DataFrame with images shape: {df_with_images.shape}")
print(f"Columns: {list(df_with_images.columns)}")

# Check the first image
first_image = df_with_images['image'].iloc[0]
print(f"First image is None: {first_image is None}")
if first_image is not None:
    print(f"Image type: {type(first_image)}")
    print(f"Image size: {len(first_image)} bytes")

Loading images: 100%|██████████| 123/123 [00:05<00:00, 21.34it/s]


DataFrame with images shape: (123, 11)
Columns: ['filename', 'folders', 'tags', 'annotation', 'url', 'height', 'width', 'palette_color', 'palette_ratio', 'image_path', 'image']
First image is None: False
Image type: <class 'bytes'>
Image size: 245678 bytes
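The `image` column holds `bytes`, presumably the encoded file contents rather than decoded pixels. A stdlib-only way to see what those bytes are is to read the file header: the sketch below guesses the container from its magic bytes and, for PNG, reads width and height from the `IHDR` chunk so they can be compared with the `width` and `height` columns. `sniff_format` and `png_size` are hypothetical helpers; for other formats a library such as Pillow (`PIL.Image.open(io.BytesIO(data)).size`) is the usual route.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_format(data: bytes) -> str:
    """Guess an image container from its leading magic bytes."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return "unknown"


def png_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and
    # height as big-endian unsigned 32-bit integers (bytes 16..23).
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

With the cell above, `sniff_format(first_image)` names the container, and for a PNG `png_size(first_image)` can be checked against that row's `width` and `height`.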
