# Jupyter File Drag & Drop Widget - Demo

This notebook demonstrates how to use the drag-and-drop file upload widget for JupyterLab.

**Supported file formats:**
- CSV (`.csv`)
- Excel (`.xlsx`, `.xlsm`, `.xls`) - with multi-sheet support
- Feather (`.feather`)
- Parquet (`.parquet`)
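
As a rough illustration of how a file extension can select a loader, the sketch below maps each supported suffix to the corresponding pandas reader. It is not ipyfiledrop's implementation: `READERS` and `read_any` are hypothetical names, and the Excel, Feather, and Parquet readers need optional dependencies (such as openpyxl or pyarrow) when they are actually called.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical lookup table: file suffix -> pandas reader function.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xlsm": pd.read_excel,
    ".xls": pd.read_excel,
    ".feather": pd.read_feather,
    ".parquet": pd.read_parquet,
}


def read_any(path):
    """Load a file with the reader registered for its (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return reader(path)


# Demo: write a small CSV to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("a,b\n1,2\n3,4\n")
    demo = read_any(csv_path)

print(demo.shape)
```

Unsupported suffixes raise `ValueError` before any file is opened, which keeps the error message independent of the reader that would otherwise fail.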

## 1. FileDrop: Quick Start

The `FileDrop` class provides a simple one-line API for drag-and-drop file uploads.

In [None]:
import sys
sys.path.insert(0, '..')  # Add parent directory to find the module

from ipyfiledrop import FileDrop

# One-line creation with named drop zones
fd = FileDrop("Dataset A", "Dataset B")
fd.display()

In [None]:
# Access loaded DataFrames
print("Loaded datasets:", list(fd.datasets.keys()))

# Access individual DataFrame (returns selected sheet for Excel files)
df_a = fd["Dataset A"]  # Returns DataFrame or None
if df_a is not None:
    print(f"Dataset A shape: {df_a.shape}")
    display(df_a.head())

## 2. Multi-Sheet Excel Support

When you drop an Excel file with multiple sheets, a **dropdown selector** appears to switch between sheets.

**Try it:** Drop an Excel file with multiple sheets to see the dropdown appear.
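
For orientation, pandas itself can return every sheet of a workbook as a dictionary when `sheet_name=None` is passed to `pd.read_excel`. The sketch below pairs that call with a small selection helper. `load_all_sheets` and `pick_sheet` are hypothetical names and only approximate what a sheet selector might do. The demo uses in-memory DataFrames so that it runs without an Excel engine installed.

```python
from typing import Dict, Optional

import pandas as pd


def load_all_sheets(path: str) -> Dict[str, pd.DataFrame]:
    """Read every sheet of a workbook, keyed by sheet name."""
    # sheet_name=None asks pandas for all sheets instead of only the first.
    return pd.read_excel(path, sheet_name=None)


def pick_sheet(sheets: Dict[str, pd.DataFrame], name: Optional[str] = None) -> pd.DataFrame:
    """Return the named sheet, or the first sheet when no name is given."""
    if not sheets:
        raise ValueError("no sheets loaded")
    if name is None:
        return next(iter(sheets.values()))
    if name not in sheets:
        raise KeyError(f"unknown sheet: {name!r}")
    return sheets[name]


# Stand-in for the result of load_all_sheets("workbook.xlsx").
sheets = {
    "Summary": pd.DataFrame({"total": [3]}),
    "Raw": pd.DataFrame({"x": [1, 2]}),
}
first = pick_sheet(sheets)
raw_sheet = pick_sheet(sheets, "Raw")
print(first.shape, raw_sheet.shape)
```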

In [None]:
# Create a widget for Excel files
fd_excel = FileDrop("Excel Data")
fd_excel.display()

In [None]:
fd_excel['Excel Data']

In [None]:
# After dropping a multi-sheet Excel file:

# Get the currently selected DataFrame
df = fd_excel["Excel Data"]
if df is not None:
    print(f"Selected sheet shape: {df.shape}")

# Get ALL sheets as a dictionary
all_sheets = fd_excel.get_all_sheets("Excel Data")
if all_sheets:
    print(f"\nAvailable sheets: {list(all_sheets.keys())}")
    for name, sheet_df in all_sheets.items():
        print(f"  - {name}: {sheet_df.shape[0]} rows × {sheet_df.shape[1]} columns")

In [None]:
# Programmatically select a different sheet
# (This also updates the dropdown in the widget)

all_sheets = fd_excel.get_all_sheets("Excel Data")
if all_sheets:
    sheets = list(all_sheets.keys())
    if len(sheets) > 1:
        fd_excel.select_sheet("Excel Data", sheets[1])  # Select second sheet
        print(f"Selected sheet: {sheets[1]}")

## 3. All Supported File Formats

The widget supports CSV, Excel (xlsx/xlsm/xls), Feather, and Parquet files.

In [None]:
# Create drop zones for different file types
fd_formats = FileDrop("CSV", "Excel", "Feather", "Parquet")
fd_formats.display()

In [None]:
# Check what's loaded
for label in ["CSV", "Excel", "Feather", "Parquet"]:
    df = fd_formats[label]
    if df is not None:
        print(f"{label}: {df.shape[0]} rows × {df.shape[1]} columns")
    else:
        print(f"{label}: (no file loaded)")

## 4. Dynamic Drop Zone Management

Add or remove drop zones dynamically.
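
Method chaining of this kind usually works because each mutating method returns the object itself. The class below is a minimal, hypothetical stand-in (`ZoneRegistry` is not part of ipyfiledrop) that shows the pattern. Having `remove` return `self` as well is an assumption of this sketch.

```python
class ZoneRegistry:
    """Hypothetical stand-in that tracks drop-zone labels in insertion order."""

    def __init__(self, *labels):
        self._labels = []
        for label in labels:
            self.add(label)

    def add(self, label):
        """Register a label and return self so calls can be chained."""
        if label in self._labels:
            raise ValueError(f"duplicate label: {label!r}")
        self._labels.append(label)
        return self

    def remove(self, label):
        """Drop a label (ValueError if absent) and return self."""
        self._labels.remove(label)
        return self

    def __repr__(self):
        return f"ZoneRegistry({', '.join(map(repr, self._labels))})"


zones = ZoneRegistry("Initial").add("Added 1").add("Added 2").remove("Added 1")
print(zones)
```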

In [None]:
fd_dynamic = FileDrop("Initial")
fd_dynamic.display()

In [None]:
# Add new drop zones (method chaining supported)
fd_dynamic.add("Added 1").add("Added 2")
print(fd_dynamic)

In [None]:
# Remove a drop zone
fd_dynamic.remove("Added 1")
print(fd_dynamic)

## 5. Embedding in ipywidgets Containers

Use the `.ui` property to embed a `FileDrop` in ipywidgets containers such as `Accordion`, `Tab`, or `VBox`.

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Embedding in Accordion
fd_acc1 = FileDrop("Training", "Validation")
fd_acc2 = FileDrop("Test", retain_data=True)

accordion = widgets.Accordion(children=[fd_acc1.ui, fd_acc2.ui])
accordion.set_title(0, "Train/Val Data")
accordion.set_title(1, "Test Data")

display(accordion)

In [None]:
# Embedding in Tab
fd_tab1 = FileDrop("CSV Files")
fd_tab2 = FileDrop("Excel Files")

tab = widgets.Tab(children=[fd_tab1.ui, fd_tab2.ui])
tab.set_title(0, "CSV")
tab.set_title(1, "Excel")

display(tab)

In [None]:
# Embedding with a button and output
fd_btn = FileDrop("Upload")
btn = widgets.Button(description="Process Data", button_style="primary")
output = widgets.Output()

def on_click(b):
    with output:
        output.clear_output()
        df = fd_btn["Upload"]
        if df is not None:
            print(f"Processing {df.shape[0]} rows...")
            display(df.describe())
        else:
            print("No file uploaded yet!")

btn.on_click(on_click)

display(widgets.VBox([fd_btn.ui, btn, output]))

## 6. IFrameDropWidget: Low-Level API

For more control, use `IFrameDropWidget` directly with the `on_data_ready` callback.

In [None]:
from ipyfiledrop import IFrameDropWidget

# Install global listener (FileDrop does this automatically)
IFrameDropWidget.install_global_listener()

In [None]:
# Define the callback: it receives the filename and a Dict[str, DataFrame]
def on_data_ready(filename, data):
    """Called when a file is loaded.
    
    Args:
        filename: Name of the uploaded file
        data: Dict[str, DataFrame] - keys are sheet names for Excel, 'data' for others
    """
    print(f"\nLoaded: {filename}")
    print(f"Sheets/Keys: {list(data.keys())}")
    for name, df in data.items():
        print(f"  {name}: {df.shape[0]} rows × {df.shape[1]} columns")

# Create widget with callback
widget = IFrameDropWidget(on_data_ready=on_data_ready)
widget.display()

In [None]:
# Access data via properties
if widget.data:
    print(f"Available sheets: {widget.sheet_names}")
    print(f"Currently selected: {widget.selected_key}")
    print("\nSelected DataFrame:")
    display(widget.selected_dataframe.head())
else:
    print("No data loaded yet. Drop a file in the widget above!")

## 7. Multi-File Drop & Archive Support

Use `retain_data=True` to accumulate multiple files instead of replacing the previous one. Archives (`.zip`, `.tar.gz`) are automatically extracted.
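
To make the archive behaviour concrete, here is a standard-library sketch that walks a `.zip` payload, parses the CSV members, and records everything else as a failed import. It is only an approximation of the idea: `read_zip_members` is a hypothetical helper, it handles CSV only, and the `filename`/`error` keys simply mirror the failed-import records printed later in this section.

```python
import csv
import io
import zipfile


def read_zip_members(payload: bytes):
    """Return (loaded, failed) for the members of an in-memory zip archive."""
    loaded, failed = {}, []
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename
            if not name.lower().endswith(".csv"):
                # This sketch only understands CSV; anything else is reported.
                failed.append({"filename": name, "error": "unsupported file type"})
                continue
            try:
                text = archive.read(info).decode("utf-8")
                loaded[name] = list(csv.DictReader(io.StringIO(text)))
            except (UnicodeDecodeError, csv.Error) as exc:
                failed.append({"filename": name, "error": str(exc)})
    return loaded, failed


# Build a small archive in memory: one CSV and one unsupported file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("batch1.csv", "id,value\n1,10\n2,20\n")
    archive.writestr("notes.txt", "not tabular")

loaded, failed = read_zip_members(buffer.getvalue())
print(list(loaded), failed)
```

Collecting failures in a list rather than raising on the first bad member lets the good files in an archive load even when some members are unusable.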

In [None]:
# Accumulate mode - files stack up instead of replacing
fd_multi = FileDrop("Multi-File", retain_data=True)
fd_multi.display()
print("Drop multiple files - they will accumulate!\n")
print("You can also drop .zip or .tar.gz archives.")

In [None]:
# View accumulated data
all_data = fd_multi.get_all_data("Multi-File")
print(f"Loaded files: {list(all_data.keys())}")

# Check for failed imports (e.g. unsupported files in an archive)
failed = fd_multi.get_failed_imports("Multi-File")
if failed:
    print("\nFailed imports:")
    for f in failed:
        print(f"  - {f['filename']}: {f['error']}")

# Clear accumulated data programmatically
# fd_multi.clear("Multi-File")

## 8. Data Import Pipeline

The pipeline automatically extracts core data from messy files, cleans it, and can combine multiple files.

**Features:**
- `extract_core=True`: Extract core data table from messy files with headers/footers
- `clean="standard"`: Apply cleaning preset (normalize columns, drop empty rows, etc.)
- `fd.combine()`: Combine multiple DataFrames into one
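
One simple way to think about core extraction is as a search for the first densely populated row, which is then treated as the header. The sketch below implements that heuristic with plain pandas. It is an illustrative assumption, not the algorithm `extract_core_data` uses, and `find_header_row`, `extract_table`, and the `min_fill` threshold are hypothetical.

```python
import io

import pandas as pd


def find_header_row(raw: pd.DataFrame, min_fill: float = 0.8) -> int:
    """Return the position of the first row that is at least `min_fill` populated.

    Assumes `raw` has the default RangeIndex (as produced by header=None).
    """
    fill = raw.notna().mean(axis=1)  # fraction of non-null cells per row
    candidates = fill[fill >= min_fill]
    if candidates.empty:
        raise ValueError("no sufficiently populated row found")
    return int(candidates.index[0])


def extract_table(raw: pd.DataFrame, min_fill: float = 0.8) -> pd.DataFrame:
    """Use the detected header row as column names and keep well-filled rows after it."""
    header_row = find_header_row(raw, min_fill)
    body = raw.iloc[header_row + 1:]
    body = body[body.notna().mean(axis=1) >= min_fill].copy()
    body.columns = raw.iloc[header_row].tolist()
    return body.reset_index(drop=True)


# Made-up messy input: title lines, a blank row, the table, then a footer.
messy = io.StringIO(
    "Instrument log,,\n"
    "Operator: hypothetical,,\n"
    ",,\n"
    "sample_id,test_type,value\n"
    "S1,A,10.5\n"
    "S2,B,20.3\n"
    ",,\n"
    "End of report,,\n"
)
raw = pd.read_csv(messy, header=None)
header_row = find_header_row(raw)
core = extract_table(raw)
print(header_row, core.shape)
```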

In [None]:
# Test the pipeline with sample messy data
import pandas as pd
from ipyfiledrop import extract_core_data, clean_dataframe, combine_dataframes

# Load the messy sample_log.csv directly (simulating what happens on drop)
raw = pd.read_csv('data/sparse_messy/sample_log.csv', header=None)
print(f"Raw data shape: {raw.shape}")
print("\nRaw data preview (first 8 rows):")
display(raw.head(8))

In [None]:
# Extract core data from the messy file
result = extract_core_data(raw)

print("Extraction Results:")
print(f"  Core shape: {result.core.shape}")
print(f"  Header row detected: {result.header_row}")
print(f"  Data range: rows {result.data_range[0]}-{result.data_range[1]}")
print(f"  Confidence: {result.confidence:.2f}")
print(f"\nExtracted metadata: {result.metadata}")
print(f"Extracted footer: {result.footer}")
print("\nCore data preview:")
display(result.core.head())

In [None]:
# Apply cleaning to normalize column names and remove empty rows
cleaned = clean_dataframe(result.core, preset='standard')

print(f"Cleaned columns: {list(cleaned.columns)}")
print(f"Cleaned shape: {cleaned.shape}")
display(cleaned.head())

In [None]:
# Combine multiple DataFrames with source tracking
data = {
    'batch1.csv': cleaned.head(5),
    'batch2.csv': cleaned.tail(5),
}
combined = combine_dataframes(data, add_source=True)

print(f"Combined shape: {combined.shape}")
print(f"Sources: {combined['_source'].unique().tolist()}")
display(combined)

## 9. FileDrop with Pipeline Integration

Pass `extract_core` and `clean` directly to `FileDrop` to apply the pipeline to dropped files.

In [None]:
# Create FileDrop with pipeline enabled
# - extract_core: Automatically extract core data from messy files
# - clean: Apply 'standard' cleaning preset
# - retain_data: Accumulate multiple files

fd_pipeline = FileDrop(
    "Messy Data",
    extract_core=True,
    clean="standard",
    retain_data=True
)
fd_pipeline.display()

print("Drop data/sparse_messy/sample_log.csv to test the pipeline!")

In [None]:
# After dropping a file, access extracted data
try:
    extracted = fd_pipeline.extract("Messy Data")
    print(f"Core shape: {extracted.core.shape}")
    print(f"Metadata: {extracted.metadata}")
    print(f"Footer: {extracted.footer}")
    print(f"Confidence: {extracted.confidence:.2f}")
except ValueError as e:
    print(f"No data yet: {e}")

In [None]:
# Combine all dropped files into one DataFrame
try:
    combined = fd_pipeline.combine("Messy Data", add_source=True)
    print(f"Combined shape: {combined.shape}")
    display(combined.head(10))
except ValueError as e:
    print(f"No data yet: {e}")

## 10. Cleaning Presets

Available presets: `'none'`, `'minimal'`, `'standard'`, `'aggressive'`
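
Conceptually, a preset can be viewed as a named list of DataFrame-to-DataFrame functions applied in order. The sketch below shows that composition with `functools.reduce`. The cleaner functions and the contents of `HYPOTHETICAL_PRESETS` are invented for illustration and do not reflect what the real presets contain.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

import pandas as pd

Cleaner = Callable[[pd.DataFrame], pd.DataFrame]


def lowercase_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercased column names."""
    return df.rename(columns=str.lower)


def drop_blank_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows in which every cell is missing."""
    return df.dropna(how="all").reset_index(drop=True)


def apply_cleaners(df: pd.DataFrame, cleaners: Sequence[Cleaner]) -> pd.DataFrame:
    """Feed the output of each cleaner into the next one, left to right."""
    return reduce(lambda acc, cleaner: cleaner(acc), cleaners, df)


# Hypothetical preset table; the real presets may contain different cleaners.
HYPOTHETICAL_PRESETS: Dict[str, List[Cleaner]] = {
    "none": [],
    "standard": [lowercase_columns, drop_blank_rows],
}

df_raw = pd.DataFrame({"Name": ["a", None], "Value": [1.0, None]})
cleaned_demo = apply_cleaners(df_raw, HYPOTHETICAL_PRESETS["standard"])
untouched = apply_cleaners(df_raw, HYPOTHETICAL_PRESETS["none"])
print(list(cleaned_demo.columns), len(cleaned_demo))
```

Because every cleaner shares the same signature, a custom chain is just another list, which is the shape the `cleaners=[...]` argument takes in the cells below.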

In [None]:
from ipyfiledrop import CLEANING_PRESETS

for name, cleaners in CLEANING_PRESETS.items():
    cleaner_names = [c.__name__ for c in cleaners]
    print(f"{name}: {cleaner_names if cleaner_names else '(no cleaning)'}")

In [None]:
# Example: Use individual cleaners for custom pipeline
from ipyfiledrop import normalize_columns, strip_whitespace, drop_empty_rows

# Custom cleaner chain
fd_custom = FileDrop(
    "Custom Clean",
    cleaners=[normalize_columns, strip_whitespace, drop_empty_rows]
)
fd_custom.display()

In [None]:
fd_custom["Custom Clean"]

### Normalize Column Options

The `normalize_columns` cleaner accepts three optional flags, each of which preserves part of the original column name:
- `preserve_case=True`: Keep original case (default: lowercase)
- `preserve_dashes=True`: Keep dashes `-` (default: replace with `_`)
- `preserve_dots=True`: Keep dots `.` (default: replace with `_`)
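
The rules above can be approximated for a single column name with a small regular-expression helper. This is a sketch of the described behaviour, not the library's code: `normalize_name` is a hypothetical function, and details such as collapsing runs of special characters and trimming leading or trailing underscores are assumptions.

```python
import re


def normalize_name(name, *, preserve_case=False, preserve_dashes=False, preserve_dots=False):
    """Replace runs of disallowed characters with '_' and optionally lowercase."""
    keep = "A-Za-z0-9_"
    if preserve_dashes:
        keep += r"\-"
    if preserve_dots:
        keep += r"\."
    cleaned = re.sub(f"[^{keep}]+", "_", name.strip())
    cleaned = cleaned.strip("_")  # assumption: no leading/trailing underscores
    return cleaned if preserve_case else cleaned.lower()


for column in ["Sample-ID", "Test.Type", "Result (Value)", "Version v1.2"]:
    print(
        column,
        "->",
        normalize_name(column),
        "|",
        normalize_name(column, preserve_case=True, preserve_dashes=True),
    )
```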

In [None]:
import pandas as pd
from ipyfiledrop import normalize_columns, make_normalize_columns

# Sample DataFrame with various special characters
df = pd.DataFrame({
    'Sample-ID': [1, 2],
    'Test.Type': ['A', 'B'],
    'Result (Value)': [10.5, 20.3],
    'Version v1.2': ['x', 'y']
})
print("Original DataFrame:")
display(df)

In [None]:
# Compare different normalization options
print("Default (lowercase, all special chars -> _):")
display(normalize_columns(df))

print("\npreserve_dashes=True (keeps dashes for IDs like SAMP-001):")
display(normalize_columns(df, preserve_dashes=True))

print("\npreserve_dots=True (keeps dots for versions like v1.2):")
display(normalize_columns(df, preserve_dots=True))

print("\npreserve_case=True, preserve_dashes=True:")
display(normalize_columns(df, preserve_case=True, preserve_dashes=True))

In [None]:
# Use make_normalize_columns() factory for FileDrop cleaner chains
from ipyfiledrop import make_normalize_columns, strip_whitespace, drop_empty_rows

fd_preserve = FileDrop(
    "Preserve Special Chars",
    cleaners=[
        make_normalize_columns(preserve_case=True, preserve_dashes=True),
        strip_whitespace,
        drop_empty_rows
    ]
)
fd_preserve.display()
print("Columns will preserve case and dashes: Sample-ID, Test_Type, etc.")

In [None]:
fd_preserve["Preserve Special Chars"]

### Strip Whitespace Options

The `strip_whitespace` cleaner removes leading/trailing whitespace. Use `normalize_inner=True` to also collapse multiple inner spaces to a single space.
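
The difference between the two modes is easy to see on a single string. In the sketch below, `strip_text` is a hypothetical helper, not part of the library. Note that `str.split()` with no arguments collapses any whitespace (tabs and newlines included), which may be broader than what `normalize_inner=True` does.

```python
def strip_text(value, normalize_inner=False):
    """Trim edge whitespace; optionally collapse inner whitespace runs to one space."""
    if not isinstance(value, str):
        return value  # leave numbers, None, etc. untouched
    if normalize_inner:
        return " ".join(value.split())
    return value.strip()


edges_only = strip_text("  John   Doe  ")
collapsed = strip_text("  John   Doe  ", normalize_inner=True)
print(repr(edges_only), repr(collapsed))
```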

In [None]:
import pandas as pd
from ipyfiledrop import strip_whitespace, make_strip_whitespace

# Sample DataFrame with various whitespace issues
df = pd.DataFrame({
    'Name': ['  John   Doe  ', '  Jane    Smith  '],
    'Address': ['  123   Main   St  ', '  456   Oak   Ave  ']
})
print("Original DataFrame:")
display(df)

In [None]:
# Default: only strip edges, preserve inner whitespace
print("strip_whitespace() - edges only:")
display(strip_whitespace(df))

print("\nstrip_whitespace(normalize_inner=True) - collapse inner spaces:")
display(strip_whitespace(df, normalize_inner=True))

In [None]:
# Use make_strip_whitespace() factory for FileDrop cleaner chains
from ipyfiledrop import make_strip_whitespace, normalize_columns, drop_empty_rows

fd_normalize_ws = FileDrop(
    "Normalize Whitespace",
    cleaners=[
        normalize_columns,
        make_strip_whitespace(normalize_inner=True),
        drop_empty_rows
    ]
)
fd_normalize_ws.display()
print("Inner whitespace will be collapsed: '  John   Doe  ' -> 'John Doe'")

In [None]:
# View the result after dropping a file
fd_normalize_ws["Normalize Whitespace"]

## 11. The Full `datasets` Property

The `datasets` property returns all loaded data with full metadata.

In [None]:
# Create FileDrop and load some files
fd_meta = FileDrop("Data 1", "Data 2")
fd_meta.display()

In [None]:
# Inspect the datasets property after loading files
for label, info in fd_meta.datasets.items():
    print(f"\n{label}:")
    print(f"  Filename: {info['filename']}")
    print(f"  Selected: {info['selected']}")
    print(f"  Available sheets: {list(info['data'].keys())}")