[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/shared-tables.ipynb)&nbsp;&nbsp;
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/shared-tables.ipynb)

# Working with Shared Tables

Pixeltable's shared tables feature allows you to distribute and collaborate on data by publishing tables to the cloud and cloning them locally. This enables data sharing, offline work, and collaborative data science workflows.

## What are Shared Tables?

Shared tables in Pixeltable allow you to:
- Replicate tables from the cloud and work with them locally
- Publish local tables to the cloud for sharing
- Keep local replicas synchronized with cloud versions
- Enable data distribution and collaborative work

## Key Functions

**For publishers (sharing your data):**
- `pxt.publish(local_table, cloud_uri)` - Publish local table (source) to cloud (destination)
- `my_table.push()` - Push changes from your table to the cloud (method on your table object)

**For consumers (accessing shared data):**
- `pxt.replicate(cloud_uri, local_path)` - Replicate cloud table (source) to local (destination)
- `shared_table.pull()` - Update your table with latest cloud changes (method on your table object)

**Note**: The two global functions take their arguments in the order `function(source, destination)`. The table methods are called on a table you already have, so they need no source or destination.
- **Global functions**: `pxt.publish()`, `pxt.replicate()` - take a source and a destination
- **Table methods**: `my_table.push()`, `shared_table.pull()` - called on a table handle you already hold

**General:**
- `pxt.get_table(path)` - Gets a handle to an existing table
- `pxt.list_tables()` - Lists all available tables


In [None]:
%pip install -qU pixeltable


import pixeltable as pxt
from pixeltable import exceptions as excs

# Create a clean environment for the demo
pxt.drop_dir('shared_demo', force=True, if_not_exists='ignore')  # Remove leftovers from earlier runs; no error on the first run
pxt.create_dir('shared_demo')


In [None]:
# First, let's create a sample table to work with
schema = {
    'id': pxt.Int,
    'name': pxt.String,
    'value': pxt.Float,
}
sample_table = pxt.create_table('shared_demo.sample_data', schema)

# Add some sample data
sample_table.insert([
    {'id': 1, 'name': 'Alice', 'value': 42.5},
    {'id': 2, 'name': 'Bob', 'value': 37.8},
    {'id': 3, 'name': 'Charlie', 'value': 91.2}
])

print("Created sample table with data:")
sample_table.show()


## For Publishers: Sharing Your Data

If you have a table you want to share with others, you can publish it to the cloud.


In [None]:
# Define the cloud URI where you want to publish the table
cloud_uri = "pxt://org:my_db/shared.sample_data"

print(f"Publishing table to: {cloud_uri}")
print("Source: shared_demo.sample_data (local table)")
print(f"Destination: {cloud_uri} (cloud URI)")

# Note: This will fail with the placeholder URI; replace it with a real URI to test
try:
    published_uri = pxt.publish('shared_demo.sample_data', cloud_uri)  # source, destination
    print("✅ Successfully published table!")
    print(f"Published URI: {published_uri}")
except excs.Error as e:
    print(f"❌ Error publishing table: {e}")
    print("Note: This example uses a placeholder URI. Replace with a real Pixeltable cloud URI to test.")


## For Consumers: Accessing Shared Data

If you want to work with a table that someone else has published, you have two options:

### Option 1: Replicate (Recommended)
Create a read-only replica that stays connected to the upstream cloud table:
- ✅ Can pull updates from the cloud whenever the upstream data changes
- ✅ Reflects the latest published version as of your most recent `pull()`
- ❌ Read-only - to add your own columns or computations, create a view on top of the replica
- Use: `pxt.replicate(cloud_uri, local_path)` - source (cloud), destination (local)

### Option 2: Create Table
Create a regular table from cloud data with no upstream connection:
- ✅ Can work with data directly (insert, update, delete)
- ✅ No upstream dependency
- ❌ Cannot pull updates - the copy goes stale as the published table changes
- Use: `pxt.create_table(local_path, source=cloud_uri)`
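
The practical difference between the two options can be sketched with a toy, pure-Python model. `ToyUpstream`, `ToyReplica`, and `ToyCopy` are hypothetical names invented for this illustration, not Pixeltable classes, and the model assumes only what is described above: a replica rejects writes but can `pull()` newer upstream data, while a copy accepts writes but never hears from the upstream again.

```python
from copy import deepcopy


class ToyUpstream:
    """Hypothetical stand-in for a published cloud table (not a Pixeltable class)."""

    def __init__(self, rows):
        self.published_rows = [dict(r) for r in rows]


class ToyReplica:
    """Option 1 (toy): read-only local rows that can be refreshed from the upstream."""

    def __init__(self, upstream):
        self._upstream = upstream
        self._rows = deepcopy(upstream.published_rows)

    def rows(self):
        return deepcopy(self._rows)

    def insert(self, row):
        # Replicas reject direct writes; derived data belongs in a view instead.
        raise PermissionError('replica is read-only; create a view to add columns')

    def pull(self):
        # Re-sync the local rows with whatever the upstream holds now.
        self._rows = deepcopy(self._upstream.published_rows)


class ToyCopy:
    """Option 2 (toy): an independent, writable copy with no link to the upstream."""

    def __init__(self, upstream):
        self._rows = deepcopy(upstream.published_rows)

    def rows(self):
        return deepcopy(self._rows)

    def insert(self, row):
        self._rows.append(dict(row))


cloud = ToyUpstream([{'id': 1, 'name': 'Alice', 'value': 42.5}])
replica, copy_ = ToyReplica(cloud), ToyCopy(cloud)

# The publisher adds a row upstream.
cloud.published_rows.append({'id': 2, 'name': 'Bob', 'value': 37.8})

replica.pull()                                          # the replica picks up Bob
copy_.insert({'id': 5, 'name': 'Eve', 'value': 88.9})   # the copy diverges instead

# A toy "view" over the replica: derived values computed from read-only rows.
view = [{**r, 'computed_value': r['value'] * 2} for r in replica.rows()]

print([r['name'] for r in replica.rows()])  # ['Alice', 'Bob']
print([r['name'] for r in copy_.rows()])    # ['Alice', 'Eve']
```

The replica never sees Eve and the copy never sees Bob, which is the trade-off between the two options in miniature.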


In [None]:
# Option 1: Replicate (recommended for most use cases)
cloud_uri = "pxt://org:my_db/shared.sample_data"
local_replica_path = "shared_demo.replicated_data"

print("=== Option 1: Replicate (Read-only with upstream connection) ===")
print(f"Source: {cloud_uri} (cloud table)")
print(f"Destination: {local_replica_path} (local path)")

try:
    replica_table = pxt.replicate(cloud_uri, local_replica_path)  # source, destination
    print(f"✅ Successfully replicated table!")
    print(f"Replica path: {replica_table._path()}")
    print(f"Row count: {replica_table.count()}")
    
    # Show the replicated data
    print("\nReplicated data:")
    replica_table.show()
    
    # Replicas are read-only, so build on the data with a view instead of modifying it:
    print("\n⚠️  Replica tables are read-only. To work with data, create a view:")
    try:
        view = pxt.create_view("shared_demo.replica_view", replica_table, 
                              additional_columns={'computed_value': replica_table.value * 2})
        print("✅ Created view on replica table:")
        view.show()
    except excs.Error as e:
        print(f"Note: {e}")
    
except excs.Error as e:
    print(f"❌ Error replicating table: {e}")
    print("Note: This example uses a placeholder URI. Replace with a real Pixeltable cloud URI to test.")

print("\n" + "="*60)

# Option 2: Create Table (for direct data manipulation)
print("=== Option 2: Create Table (Direct data manipulation) ===")
local_table_path = "shared_demo.local_copy"

print(f"Creating regular table from: {cloud_uri}")
print(f"Local table path: {local_table_path}")

try:
    # Note: This would be pxt.create_table(local_table_path, source=cloud_uri)
    # For demo purposes, we'll create from our existing sample table
    regular_table = pxt.create_table(local_table_path, {'id': pxt.Int, 'name': pxt.String, 'value': pxt.Float})
    regular_table.insert(sample_table.select().collect())
    
    print(f"✅ Successfully created regular table!")
    print(f"Table path: {regular_table._path()}")
    print(f"Row count: {regular_table.count()}")
    
    # Show that you can modify it directly
    print("\nRegular table (can be modified directly):")
    regular_table.show()
    
    print("\n✅ Can insert/update/delete directly (no upstream connection):")
    regular_table.insert([{'id': 5, 'name': 'Eve', 'value': 88.9}])
    print("After inserting new row:")
    regular_table.show()
    
except excs.Error as e:
    print(f"❌ Error creating table: {e}")


## Keeping Tables Synchronized

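Once you have published a table (or replicated one), you can keep the cloud and local copies in sync: publishers push new changes up to the cloud, and consumers pull them down into their replicas.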


In [None]:
# Publishers: Push updates to the cloud
print("=== Publisher Workflow ===")
sample_table.insert([{'id': 4, 'name': 'Diana', 'value': 65.3}])
print("Added new row to original table:")
sample_table.show()

print(f"\nPushing updates from table: {sample_table._path()}")
print("Note: my_table.push() calls the push() method on YOUR table object")
try:
    sample_table.push()  # Method on your table object (my_table.push())
    print("✅ Successfully pushed updates to cloud!")
except excs.Error as e:
    print(f"❌ Error pushing updates: {e}")
    print("Note: This will fail if the table wasn't successfully published.")

print("\n" + "="*50)

# Consumers: Pull updates from the cloud
print("=== Consumer Workflow ===")
print("Pulling updates for table...")
print("Note: shared_table.pull() calls the pull() method on YOUR table object")
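try:
    # Get a handle to the replica; this raises excs.Error if replication failed above
    replica_table = pxt.get_table('shared_demo.replicated_data')
    replica_table.pull()  # Method on your table object (shared_table.pull())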
    print("✅ Successfully pulled updates!")
    print("Updated data:")
    replica_table.show()
except excs.Error as e:
    print(f"❌ Error pulling updates: {e}")
    print("Note: This will fail if we don't have a successfully replicated table.")
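
One way to picture what `push()` and `pull()` do is a toy model in which every push appends a new snapshot (version) to the cloud table, and a pull copies the newest snapshot down only when the replica is behind. This is an illustrative assumption, not a description of how Pixeltable transfers data internally; `ToyCloudTable`, `ToyLocalTable`, `toy_push`, `toy_replicate`, and `toy_pull` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCloudTable:
    """Hypothetical cloud table: an append-only list of row snapshots."""
    versions: list = field(default_factory=list)

    def latest(self):
        return len(self.versions) - 1, self.versions[-1]


@dataclass
class ToyLocalTable:
    rows: list
    synced_version: int = -1  # last cloud version this table matches


def toy_push(local: ToyLocalTable, cloud: ToyCloudTable) -> int:
    """Publisher side: upload the current rows as a new cloud version."""
    cloud.versions.append([dict(r) for r in local.rows])
    local.synced_version = len(cloud.versions) - 1
    return local.synced_version


def toy_replicate(cloud: ToyCloudTable) -> ToyLocalTable:
    """Consumer side: copy the newest snapshot and remember its version."""
    version, rows = cloud.latest()
    return ToyLocalTable([dict(r) for r in rows], version)


def toy_pull(replica: ToyLocalTable, cloud: ToyCloudTable) -> bool:
    """Consumer side: refresh only if the cloud has moved ahead."""
    version, rows = cloud.latest()
    if version == replica.synced_version:
        return False  # already up to date; nothing to transfer
    replica.rows = [dict(r) for r in rows]
    replica.synced_version = version
    return True


# The first toy_push plays the role of publishing the table.
cloud = ToyCloudTable()
publisher = ToyLocalTable([
    {'id': 1, 'name': 'Alice', 'value': 42.5},
    {'id': 2, 'name': 'Bob', 'value': 37.8},
    {'id': 3, 'name': 'Charlie', 'value': 91.2},
])
toy_push(publisher, cloud)
consumer = toy_replicate(cloud)

# The publisher adds Diana and pushes; the consumer is now one version behind.
publisher.rows.append({'id': 4, 'name': 'Diana', 'value': 65.3})
toy_push(publisher, cloud)

print(len(consumer.rows))                           # 3 (stale until pulled)
print(toy_pull(consumer, cloud))                    # True
print(len(consumer.rows), consumer.synced_version)  # 4 1
print(toy_pull(consumer, cloud))                    # False
```

The second `toy_pull` returns `False` because the replica already matches the newest version, which is why pulling repeatedly is harmless in this model.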


## Cloud URI Format

Pixeltable uses a specific URI format for cloud tables: `pxt://org:db_name/directory.table_name`


In [None]:
# Examples of valid cloud URIs:
examples = [
    "pxt://acme:prod/data_science.sales_data",
    "pxt://research:dev/experiments.results",
    "pxt://team:staging/models.predictions"
]

print("Examples of valid cloud URIs:")
for uri in examples:
    print(f"  - {uri}")

print("\nKey points:")
print("- Within the path, use dots (.) not slashes (/) to separate the directory from the table name")
print("- The format is: pxt://organization:database/directory.table_name")
print("- Both directory and table_name are optional")
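
The format rules above can be captured in a small validator. `parse_pxt_uri` and `ToyPxtUri` are hypothetical helpers written for this guide, not part of Pixeltable's API, and they encode only the rules stated here: an `org:database` part, then an optional dot-separated directory/table path containing no slashes. Pixeltable itself may accept or reject URIs that this sketch does not anticipate.

```python
import re
from dataclasses import dataclass

# org and database may not contain ':' or '/'; the optional path may not contain '/'.
_PXT_URI = re.compile(r'^pxt://(?P<org>[^:/]+):(?P<db>[^:/]+)(?:/(?P<path>[^/]*))?$')


@dataclass(frozen=True)
class ToyPxtUri:
    org: str
    db: str
    path: tuple  # dot-separated components, e.g. ('data_science', 'sales_data')


def parse_pxt_uri(uri: str) -> ToyPxtUri:
    """Split a pxt://org:db/dir.table URI into its parts (hypothetical helper)."""
    match = _PXT_URI.match(uri)
    if match is None:
        raise ValueError(f'expected pxt://org:database/directory.table_name, got {uri!r}')
    path = match.group('path') or ''
    parts = tuple(path.split('.')) if path else ()
    if '' in parts:
        raise ValueError(f'empty path component in {uri!r}')
    return ToyPxtUri(match.group('org'), match.group('db'), parts)


for uri in ['pxt://acme:prod/data_science.sales_data',
            'pxt://org:my_db/shared.sample_data']:
    print(parse_pxt_uri(uri))

try:
    parse_pxt_uri('pxt://org:my_db/shared/sample_data')  # slash inside the path
except ValueError as e:
    print('rejected:', e)
```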


## Summary

Shared tables in Pixeltable enable collaborative data science workflows with two distinct user roles:

### For Publishers (Data Owners)
- **Publish**: Share your local tables with others using `pxt.publish(local_table, cloud_uri)` (source, destination)
- **Update**: Push local changes to the cloud using `my_table.push()` (method on your table object)

### For Consumers (Data Users)  
**Two approaches available:**

1. **Replicate** (Recommended): `pxt.replicate(cloud_uri, local_path)` (source, destination) → Read-only replica with upstream connection
   - ✅ Can pull updates with `shared_table.pull()` (method on your table object)
   - ❌ Read-only - create views to work with data
   
2. **Create Table**: `pxt.create_table(local_path, source=cloud_uri)` → Regular table with no upstream connection
   - ✅ Can modify data directly
   - ❌ Cannot pull updates - the copy goes stale as the published table changes

### Argument Order Pattern
**Global functions always take their arguments as** `function(source, destination)`:
- `pxt.publish(local_table, cloud_uri)` - local table (source) → cloud (destination)
- `pxt.replicate(cloud_uri, local_path)` - cloud table (source) → local (destination)

### Key Points
- Choose **Replicate** if you want to stay synchronized with upstream changes
- Choose **Create Table** if you want to modify the data independently
- Wrap cloud operations in `try`/`except excs.Error` blocks, as the examples above do, so that failures such as an invalid URI are handled gracefully
- Use the cloud URI format: `pxt://org:db_name/directory.table_name`

### Important: Global Functions vs. Table Methods
- **Global functions** (`pxt.publish()`, `pxt.replicate()`) take a source and a destination
- **Table methods** (`my_table.push()`, `shared_table.pull()`) are called on a table handle you already hold, such as one returned by `pxt.get_table()`, `pxt.create_table()`, or `pxt.replicate()`; they are not global functions
