# Wine AI Dataset - Quick Start Guide

This notebook demonstrates how to use the Wine AI Dataset for machine learning projects.

## 1. Load the Dataset

The easiest way to get started is to call `WineDataset.load_latest()`, which returns the dataset already divided into train, validation, and test splits:

In [None]:
from wine_ai.data.loaders import WineDataset, WineImageLoader

# Load the dataset with train/validation/test splits
dataset = WineDataset.load_latest()

print(f"Train set: {len(dataset.train)} samples")
print(f"Validation set: {len(dataset.validation)} samples")
print(f"Test set: {len(dataset.test)} samples")
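
A quick sanity check after loading is to confirm that the splits have roughly the proportions you expect. The sketch below is a minimal helper, not part of `wine_ai`: `split_fractions` is a hypothetical name, and the sizes passed to it are made-up stand-ins for `len(dataset.train)`, `len(dataset.validation)`, and `len(dataset.test)`.

```python
def split_fractions(sizes):
    """Return each split's share of the total, given a {name: count} mapping."""
    total = sum(sizes.values())
    if total == 0:
        raise ValueError("all splits are empty")
    return {name: count / total for name, count in sizes.items()}


# Illustrative sizes only; in the notebook, pass len(dataset.train) etc.
example_sizes = {"train": 800, "validation": 100, "test": 100}
for name, fraction in split_fractions(example_sizes).items():
    print(f"{name}: {fraction:.1%}")
```

If one split turns out far smaller than intended, it is worth checking before training rather than after.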

## 2. Explore the Data

In [None]:
# List the columns and inspect the first sample
print("Dataset columns:", list(dataset.train.columns))
print("\nFirst sample:")
print(dataset.train.iloc[0])

In [None]:
# Check the distribution of wine categories
print("Wine Category Distribution:")
print(dataset.train['wine_category'].value_counts())
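
Raw counts can hide how imbalanced the classes are. Assuming `dataset.train` is a pandas DataFrame (the `.columns` and `.iloc` calls above suggest so), relative frequencies are available through `value_counts(normalize=True)`. The rows below are invented stand-in data so the snippet runs without the dataset; only the `wine_category` column name comes from this notebook.

```python
import pandas as pd

# Stand-in rows; replace `frame` with dataset.train in the notebook.
frame = pd.DataFrame(
    {"wine_category": ["red", "red", "white", "sparkling", "red"]}
)

# normalize=True returns the share of each category instead of raw counts.
category_share = frame["wine_category"].value_counts(normalize=True)
print(category_share.round(3))
```

A heavily skewed distribution may call for stratified sampling or class weights when training a classifier.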

## 3. Work with Images

In [None]:
from PIL import Image
import matplotlib.pyplot as plt

# Create an image loader
image_loader = WineImageLoader()

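# Find a sample with an existing image; fail clearly if there is none
has_image = dataset.train['image_filename'].apply(image_loader.exists)
if not has_image.any():
    raise FileNotFoundError("No training samples have an image on disk")
sample_with_image = dataset.train[has_image].iloc[0]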
print(f"Wine: {sample_with_image['name']}")
print(f"Category: {sample_with_image['wine_category']}")
print(f"Price: ${sample_with_image['price']}")

# Load and display the image
image_path = image_loader.path_for(sample_with_image['image_filename'])
image = Image.open(image_path)

plt.figure(figsize=(6, 8))
plt.imshow(image)
plt.axis('off')
plt.title(sample_with_image['name'])
plt.show()
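
Not every row necessarily has its image on disk, so it helps to handle the "none found" case explicitly. The following is a standalone sketch of that check using only `pathlib`. `first_existing` and the flat directory layout are hypothetical and say nothing about how `WineImageLoader` actually resolves paths.

```python
from pathlib import Path
import tempfile


def first_existing(filenames, image_dir):
    """Return the path of the first filename present in image_dir, or None."""
    for filename in filenames:
        candidate = Path(image_dir) / filename
        if candidate.is_file():
            return candidate
    return None


# Demonstration with a temporary directory holding one placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.jpg").write_bytes(b"")
    found = first_existing(["a.jpg", "b.jpg"], tmp)
    missing = first_existing(["c.jpg"], tmp)
    print(found.name if found else "no image found")
    print(missing)
```

Returning `None` instead of raising lets the caller decide whether a missing image is an error or just a row to skip.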

## 4. Generate Text Data for Training

In [None]:
# Use built-in iterators for common tasks
print("Sample wine names for language modeling:")
for i, name in enumerate(dataset.iter_name_sequences()):
    if i >= 5:  # Show first 5
        break
    print(f"  {name}")

print("\nSample name-description pairs for conditional generation:")
for i, (name, description) in enumerate(dataset.iter_description_pairs()):
    if i >= 3:  # Show first 3
        break
    print(f"  Name: {name}")
    print(f"  Description: {description[:100]}...")
    print()
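
For fine-tuning, name-description pairs are often written out as JSON Lines. Here is a minimal sketch, assuming the iterator yields `(name, description)` string tuples as in the loop above. `pairs_to_jsonl` and the `prompt`/`completion` field names are illustrative choices, not something the `wine_ai` package defines.

```python
import json


def pairs_to_jsonl(pairs, limit=None):
    """Serialize (name, description) pairs as JSON Lines text."""
    lines = []
    for index, (name, description) in enumerate(pairs):
        if limit is not None and index >= limit:
            break
        record = {"prompt": name, "completion": description}
        # ensure_ascii=False keeps accented wine names readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Invented pairs; in the notebook, pass dataset.iter_description_pairs().
sample_pairs = [
    ("Example Estate Red", "Dark fruit with a long finish."),
    ("Example Estate White", "Crisp and citrusy."),
]
jsonl_text = pairs_to_jsonl(sample_pairs, limit=1)
print(jsonl_text)
```

Writing one JSON object per line keeps the file streamable, so large splits do not need to fit in memory when they are read back.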

## 5. Convert to Hugging Face Dataset

In [None]:
from wine_ai.data.loaders import to_hf_dataset

# Convert training split to Hugging Face dataset
hf_dataset = to_hf_dataset(dataset.train)
print(f"Hugging Face dataset: {hf_dataset}")
print(f"Features: {hf_dataset.features}")

## Next Steps

- See `notebooks/00_data_exploration.ipynb` for a detailed data analysis
- Use the `wine-train` CLI command to train models
- Use the `wine-validate` CLI command to check data quality
- Explore the `src/wine_ai/` package for model implementations