# Test Case: PDF Table Extraction and Interactive Widget

**Use Case:** Extract tables from PDF documents and create interactive visualizations

**Dataset:** PDF file with tabular data

**Goal:** 
- Test the PDF extractor and data profile extraction
- Create an interactive table widget from PDF data
- Demonstrate PDF support in vibe-widgets


In [None]:
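# Setup
import os
import sys
from pathlib import Path

import numpy as np
import pandas as pd

# Add the current working directory to the import path so that
# the vibe_widget package can be imported from there
sys.path.insert(0, str(Path.cwd()))

import vibe_widget as vw
from vibe_widget.data_parser.preprocessor import preprocess_data

# Read the API key from the environment (None if the variable is unset)
API_KEY = os.getenv("ANTHROPIC_API_KEY")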

print("Setup complete!")


Setup complete!
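
`os.getenv` returns `None` when the variable is unset, so a missing key may only surface later, inside the `preprocess_data` or `vw.create` calls. A minimal sketch of a fail-fast check is given here; `require_env` and the `DEMO_KEY` variable are hypothetical names used for illustration, not part of vibe-widgets.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo variable (hypothetical) so the sketch runs anywhere
os.environ["DEMO_KEY"] = "example-value"
print(require_env("DEMO_KEY"))
```

In the setup cell this could be used as `API_KEY = require_env("ANTHROPIC_API_KEY")`.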


## Step 1: Check PDF File


In [3]:
!pip install 'camelot-py[base]'


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [4]:
# Check if PDF file exists
pdf_path = Path("testdata/foo.pdf")

print(f"PDF file: {pdf_path}")
print(f"File exists: {pdf_path.exists()}")
if pdf_path.exists():
    print(f"File size: {pdf_path.stat().st_size / 1024:.2f} KB")


PDF file: testdata/foo.pdf
File exists: True
File size: 82.19 KB


## Step 2: Extract Data Profile from PDF


In [5]:
# Extract profile from PDF
print("Extracting data profile from PDF...")
profile = preprocess_data(
    pdf_path,
    api_key=API_KEY,
    context={"description": "PDF document with tabular data"},
    augment=True
)

print("\nProfile extracted!")
print(f"Source type: {profile.source_type}")
print(f"Shape: {profile.shape}")
print(f"Columns: {len(profile.columns)}")
print("\nColumn names:")
for col in profile.columns:
    print(f"  - {col.name} ({col.dtype})")


Extracting data profile from PDF...


  from cryptography.hazmat.primitives.ciphers.algorithms import AES, ARC4
  parsed = pd.to_datetime(sample, errors='coerce')



Profile extracted!
Source type: pdf
Shape: (6, 7)
Columns: 7

Column names:
  - Cycle 
Name (text)
  - KI 
(1/km) (text)
  - Distance 
(mi) (text)
  - Percent Fuel Savings (text)
  - Column_4 (text)
  - Column_5 (text)
  - Column_6 (text)
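
The extracted header cells keep the line breaks from the PDF layout (for example `Cycle \nName`), which is why the listing above wraps across lines. A minimal sketch of one way to tidy such headers before display follows; `clean_headers` is a hypothetical helper, not part of vibe-widgets, and the positional `Column_N` fallback is an assumption that mirrors the placeholder names seen in the profile output.

```python
import re


def clean_headers(raw_headers):
    """Collapse whitespace in header cells and name blank ones by position."""
    cleaned = []
    for index, header in enumerate(raw_headers):
        # Any run of whitespace (including newlines) becomes a single space
        name = re.sub(r"\s+", " ", str(header)).strip()
        # Blank header cells get a positional placeholder (assumed naming scheme)
        cleaned.append(name or f"Column_{index}")
    return cleaned


# Header cells as they appear in this notebook's output
raw = ["Cycle \nName", "KI \n(1/km)", "Distance \n(mi)", "Percent Fuel Savings", "", "", ""]
for name in clean_headers(raw):
    print(f"  - {name}")
```

On a pandas DataFrame, the result could be assigned back with `df.columns = clean_headers(df.columns)`.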


## Step 3: Preview Extracted Data


## Step 4: Create Interactive Table Widget from PDF


In [7]:
# Create an interactive table widget from the PDF
print("Creating interactive table widget from PDF...")

widget = vw.create(
    "create an interactive table with sorting, filtering, and search capabilities",
    pdf_path,
    api_key=API_KEY,
    context=profile,  # Use the pre-computed profile
    use_preprocessor=True
)

print("\nWidget created successfully!")
widget


Creating interactive table widget from PDF...
create an interactive table with sorting, filtering, and search capabilities

Data Profile: # Dataset Profile: pdf
**Source:** `testdata/foo.pdf`

## Overview
- **Shape:** 6 × 7
- **Completeness:** 100.0%
- **Domain:** Transportation and Fuel Efficiency / Vehicle Fleet Management
- **Purpose:** This dataset analyzes fuel savings potential across different driving cycles by comparing various driving behavior modifications. It appears to be used for evaluating how changes in speed, acceleration, stopping patterns, and idle time can impact fuel consumption in vehicle operations, likely for fleet optimization or eco-driving research.

## Fields

### `Cycle 
Name` (text)
- Count: 6

### `KI 
(1/km)` (text)
- Count: 6

### `Distance 
(mi)` (text)
- Count: 6

### `Percent Fuel Savings` (text)
- Count: 6

### `Column_4` (text)
- Count: 6

### `Column_5` (text)
- Count: 6

### `Column_6` (text)
- Count: 6

## Sample Data
```json
[
  {
    "Cycle \nN
```

<vibe_widget.core.VibeWidget object at 0x1682a5dd0>


Widget created successfully!


<vibe_widget.core.VibeWidget object at 0x1682a5dd0>

## Step 5: Alternative - Create Widget Directly from a DataFrame

If you've already extracted the DataFrame, you can also pass it directly:


In [8]:
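# Alternative approach: extract the DataFrame first, then create the widget
try:
    import camelot

    # Try the "lattice" flavor first (tables with ruled cell borders);
    # fall back to "stream" (whitespace-separated columns) if nothing is found
    tables = camelot.read_pdf(str(pdf_path), pages='all', flavor='lattice')
    if len(tables) == 0:
        tables = camelot.read_pdf(str(pdf_path), pages='all', flavor='stream')

    if len(tables) > 0:
        # Use the first detected table as a pandas DataFrame
        df_extracted = tables[0].df
        if len(df_extracted) > 0:
            # Promote the first row to the header, then drop it from the data
            df_extracted.columns = df_extracted.iloc[0]
            df_extracted = df_extracted[1:]
            df_extracted = df_extracted.reset_index(drop=True)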
        
        # Create widget directly from DataFrame
        widget2 = vw.create(
            "create an interactive data table with column sorting and row filtering",
            df_extracted,
            api_key=API_KEY,
            use_preprocessor=True
        )
        
        print("Widget created from DataFrame!")
        widget2
    else:
        print("No tables found in PDF")
except Exception as e:
    print(f"Error: {e}")


  "sample": df.head(3).to_dict(orient="records"),


create an interactive data table with column sorting and row filtering


 CONTEXT::DATA_INFO:

 {'columns': ['Cycle \nName', 'KI \n(1/km)', 'Distance \n(mi)', 'Percent Fuel Savings', '', '', ''], 'dtypes': {'Cycle \nName': 'object', 'KI \n(1/km)': 'object', 'Distance \n(mi)': 'object', 'Percent Fuel Savings': 'object', '': 'object'}, 'shape': (6, 7), 'sample': [{'Cycle \nName': '', 'KI \n(1/km)': '', 'Distance \n(mi)': '', 'Percent Fuel Savings': 'Improved \nSpeed', '': 'Decreased \nIdle'}, {'Cycle \nName': '2012_2', 'KI \n(1/km)': '3.30', 'Distance \n(mi)': '1.3', 'Percent Fuel Savings': '5.9%', '': '17.4%'}, {'Cycle \nName': '2145_1', 'KI \n(1/km)': '0.68', 'Distance \n(mi)': '11.2', 'Percent Fuel Savings': '2.4%', '': '2.7%'}]}
import * as d3 from "https://esm.sh/d3@7";

function render({ model, el }) {
  const data = model.get("data");
  
  el.innerHTML = '';
  
  const style = document.createElement('style');
  style.textContent = `
    @import url('https://fonts.googleapis.com/cs

  data_json = df.to_dict(orient="records")


<vibe_widget.core.VibeWidget object at 0x13da6a390>

Widget created from DataFrame!
