[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/shared-tables.ipynb)&nbsp;&nbsp;
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/shared-tables.ipynb)

# Working with Shared Tables

Pixeltable's shared tables feature allows you to distribute data by publishing tables to the cloud and accessing them locally. This enables data sharing and distributed data science workflows.

## What are Shared Tables?

Shared tables in Pixeltable allow you to:
- Publish local tables to the cloud so others can use them
- Push later changes from a published table to the cloud
- Create read-only local replicas of cloud tables
- Pull cloud changes to keep those replicas up to date

## Key Functions

**Sharing your data:**
- `pxt.publish(local_table, cloud_uri, access='public'|'private')` - Publish local table (source) to cloud (destination)
- `my_table.push()` - Push changes from your local table to the cloud

**Accessing shared data:**
- `pxt.replicate(cloud_uri, local_path)` - Create read-only replica from cloud table (source) to local (destination)
- `shared_table.pull()` - Update your local read-only replica table with latest cloud changes

**Note**:
- **Module-level functions**: `pxt.publish()` and `pxt.replicate()` take a source and a destination
- **Table methods**: `my_table.push()` and `shared_table.pull()` are called on a table object you already have

**General:**
- `pxt.get_table(path)` - Gets a handle to an existing table
- `pxt.list_tables()` - Lists all available tables


In [None]:
%pip install -qU pixeltable

import pixeltable as pxt
from pixeltable import exceptions as excs


## Sharing Your Data

If you have a table you want to share with others, you can publish it to the cloud.

### API Key Configuration

Before you can share data with Pixeltable Cloud, you need to configure your API key. You can do this in two ways:

#### Option 1: Environment Variable (Recommended)
```bash
export PIXELTABLE_API_KEY="your_api_key_here"
```

#### Option 2: Configuration File
Add the following to your Pixeltable `config.toml` file:
```toml
[pixeltable]
api_key = "your_api_key_here"
```

> **Note**: The config file is `config.toml` in your Pixeltable home directory, which defaults to `~/.pixeltable` and can be changed with the `PIXELTABLE_HOME` environment variable.


### Creating a Sample Table

Let's create a simple table to demonstrate publishing:


In [None]:
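# Create a clean environment for this example
# (if_not_exists='ignore' avoids an error on the first run, when the directory doesn't exist yet)
pxt.drop_dir('shared_demo', force=True, if_not_exists='ignore')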
pxt.create_dir('shared_demo')

# Create a sample table to work with
schema = {
    'id': pxt.Int,
    'name': pxt.String,
    'value': pxt.Float,
}
sample_table = pxt.create_table('shared_demo.sample_data', schema)

# Add some sample data
sample_table.insert([
    {'id': 1, 'name': 'Alice', 'value': 42.5},
    {'id': 2, 'name': 'Bob', 'value': 37.8},
    {'id': 3, 'name': 'Charlie', 'value': 91.2}
])

print("Created sample table with data:")
sample_table.show()


### Cloud URI Format

Pixeltable uses a specific URI format for cloud tables: `pxt://username/table_name`

**Examples:**
- `pxt://alice/sales_data`
- `pxt://bob/experiments.results`
- `pxt://team/shared.dataset_name`

**Note**: Dots organize tables into directories (for example, `shared.dataset_name` is the table `dataset_name` in the directory `shared`). Directories are optional.
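To make the format concrete, here is a hypothetical parser that splits a URI into its username, directories, and table name. `parse_cloud_uri` and `CloudTableURI` are illustrative names, not Pixeltable APIs, and the parser assumes exactly the rules stated above: a `pxt://` scheme, one username segment, then a dot-separated table path with no slashes.

```python
from typing import NamedTuple


class CloudTableURI(NamedTuple):
    """Parts of a cloud table URI (hypothetical helper, not part of Pixeltable)."""
    username: str
    directories: tuple[str, ...]
    table_name: str


def parse_cloud_uri(uri: str) -> CloudTableURI:
    """Split 'pxt://username/dir.subdir.table' into its parts."""
    prefix = 'pxt://'
    if not uri.startswith(prefix):
        raise ValueError(f'expected a URI starting with {prefix!r}: {uri!r}')
    username, sep, table_path = uri[len(prefix):].partition('/')
    if not sep or not username or not table_path:
        raise ValueError(f'expected pxt://username/table_name: {uri!r}')
    if '/' in table_path:
        raise ValueError(f'use dots, not slashes, for directories: {uri!r}')
    # Everything before the last dot is a directory; the last part is the table
    *directories, table_name = table_path.split('.')
    if not table_name or any(not d for d in directories):
        raise ValueError(f'empty path segment in {uri!r}')
    return CloudTableURI(username, tuple(directories), table_name)


for uri in ['pxt://alice/sales_data', 'pxt://bob/experiments.results']:
    print(parse_cloud_uri(uri))
```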


### Access Control Options

When publishing tables, you can control who can access them:

- **`access='private'` (default)**: Only you can access the shared table
- **`access='public'`**: Anyone with the table URI can access and create read-only replicas of the table

```python
# Private table (default)
pxt.publish('my_table', 'pxt://alice/sales_data')

# Public table
pxt.publish('my_table', 'pxt://alice/sales_data', access='public')
```


In [None]:
# Define the cloud URI where you want to publish the table
cloud_uri = "pxt://alice/shared.sample_data"

# Publish local table to cloud (requires API key)
try:
    published_uri = pxt.publish('shared_demo.sample_data', cloud_uri, access='public')
    print(f"Published: {published_uri}")
except excs.Error as e:
    print(f"Error: {e}")
    print("Note: Publishing requires an API key and a URI you're allowed to publish to ('alice' is a placeholder)")


## Updating Your Shared Table

After you've published a table, you can continue to make changes locally and push those updates to the cloud.

**Note**: Pushing updates also requires a Pixeltable API key.


In [None]:
# Make changes to your local table
sample_table.insert([{'id': 4, 'name': 'Diana', 'value': 65.3}])
sample_table.show()

# Push updates from local table to cloud (requires API key)
try:
    sample_table.push()  # my_table.push() - pushes from local to cloud
    print("Updated cloud table")
except excs.Error as e:
    print(f"Error: {e}")


## Accessing Shared Data

If you want to work with a table that someone else has published, you have two options.

**Note**: Accessing a public shared table does not require an API key; you can create read-only replicas of it and work with them directly.

### Option 1: Read-Only Replica
Create a read-only replica that stays connected to the upstream cloud table:
- ✅ Can pull updates from the cloud when the data changes (pulls are manual)
- ❌ Read-only: to add computed columns, create a view on the replica
- Use: `pxt.replicate(cloud_uri, local_path)` - source (cloud), destination (local)

**Note**: You can copy a replica's data into an independent table with `pxt.create_table()` at any point. The copy can be modified, but it does not receive updates from the upstream cloud table.


In [None]:
# Option 1: Read-Only Replica
cloud_uri = "pxt://alice/shared.sample_data"
local_replica_path = "shared_demo.replicated_data"
replica_table = None  # stays None if replication fails

try:
    replica_table = pxt.replicate(cloud_uri, local_replica_path)
    print(f"Replicated to: {local_replica_path}")
    replica_table.show()

    # Replicas are read-only, so add derived columns through a view
    view = pxt.create_view(
        "shared_demo.replica_view",
        replica_table,
        additional_columns={'computed_value': replica_table.value * 2},
    )
    print("Created view on replica:")
    view.show()
except excs.Error as e:
    print(f"Error: {e}")

# Copy the replica's data into an independent table
print("\n" + "=" * 40)
print("Converting replica to independent table:")
if replica_table is None:
    print("Skipped: no replica available")
else:
    try:
        independent_table = pxt.create_table(
            "shared_demo.independent_copy",
            {'id': pxt.Int, 'name': pxt.String, 'value': pxt.Float},
        )
        independent_table.insert(replica_table.select().collect())
        print("Created independent table: shared_demo.independent_copy")
        print("Note: This table can be modified but won't receive upstream updates")
        independent_table.show()
    except excs.Error as e:
        print(f"Error: {e}")


### Option 2: Create Table
Create a regular table from cloud data with no upstream connection:
- ✅ Can work with the data directly (insert, update, delete)
- ✅ No upstream dependency
- ❌ No way to pull updates: the table is a snapshot and won't reflect later changes to the cloud table
- Use: `pxt.create_table(local_path, source=cloud_uri)`

In [None]:
# Option 2: Create Table
local_table_path = "shared_demo.local_copy"

try:
    # For this demo, the local sample table stands in for the cloud data
    regular_table = pxt.create_table(local_table_path, {'id': pxt.Int, 'name': pxt.String, 'value': pxt.Float})
    regular_table.insert(sample_table.select().collect())

    print(f"Created table: {local_table_path}")
    regular_table.show()
    
    # Can modify directly
    regular_table.insert([{'id': 5, 'name': 'Eve', 'value': 88.9}])
    print("After inserting new row:")
    regular_table.show()
    
except excs.Error as e:
    print(f"Error: {e}")


## Keeping Tables Synchronized

Once you have published a table or created a replica, you can keep it in sync: `push()` sends your local changes to the cloud, and `pull()` fetches the latest cloud changes into a replica.
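The direction of each sync step is easy to mix up, so here is a toy in-memory model of it. `CloudTable`, `LocalTable`, and the version counter are hypothetical stand-ins for illustration only; they do not reflect how Pixeltable implements `push()` or `pull()`. The model captures the rules described in this guide: the publisher pushes, a replica pulls, and a replica rejects direct inserts.

```python
from dataclasses import dataclass, field


@dataclass
class CloudTable:
    """Toy stand-in for a published table: rows plus a version counter."""
    rows: list[dict] = field(default_factory=list)
    version: int = 0


@dataclass
class LocalTable:
    """Toy local table that remembers the cloud version it last synced with."""
    cloud: CloudTable
    read_only: bool
    rows: list[dict] = field(default_factory=list)
    synced_version: int = 0

    def insert(self, new_rows: list[dict]) -> None:
        if self.read_only:
            raise PermissionError('replicas are read-only; create a view instead')
        self.rows.extend(new_rows)

    def push(self) -> None:
        # Publisher side: copy local rows to the cloud and bump its version
        if self.read_only:
            raise PermissionError('cannot push from a read-only replica')
        self.cloud.rows = list(self.rows)
        self.cloud.version += 1
        self.synced_version = self.cloud.version

    def pull(self) -> bool:
        # Replica side: fetch cloud rows only if the cloud has moved ahead
        if self.cloud.version == self.synced_version:
            return False
        self.rows = list(self.cloud.rows)
        self.synced_version = self.cloud.version
        return True


cloud = CloudTable()
source = LocalTable(cloud, read_only=False)
source.insert([{'id': 1, 'name': 'Alice'}])
source.push()

replica = LocalTable(cloud, read_only=True)
replica.pull()

source.insert([{'id': 2, 'name': 'Bob'}])
source.push()
print(len(replica.rows))  # the replica still has 1 row until it pulls
replica.pull()
print(len(replica.rows))  # now 2
```

Note that nothing reaches the replica until it calls `pull()`, which mirrors the "manual pull required" behavior described above.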


In [None]:
# Push updates to the cloud
sample_table.insert([{'id': 5, 'name': 'Eve', 'value': 88.9}])
sample_table.show()

try:
    sample_table.push()  # my_table.push() - pushes from local to cloud
    print("Updated cloud table")
except excs.Error as e:
    print(f"Error: {e}")

print("\n" + "="*30)

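# Pull updates from cloud to local replica (skipped if replication failed earlier)
replica_table = globals().get('replica_table')
if replica_table is None:
    print("Skipped pull: no local replica available")
else:
    try:
        replica_table.pull()  # shared_table.pull() - updates local replica from cloud
        print("Updated local replica")
        replica_table.show()
    except excs.Error as e:
        print(f"Error: {e}")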


## Cloud URI Format (Advanced)

Pixeltable uses a specific URI format for cloud tables: `pxt://username/table_name`

For organization, you can use dot notation: `pxt://username/directory.table_name`


In [None]:
# Examples of valid cloud URIs:
examples = [
    "pxt://alice/sales_data",
    "pxt://bob/experiments.results", 
    "pxt://team/shared.dataset_name"
]

print("Examples of valid cloud URIs:")
for uri in examples:
    print(f"  - {uri}")

print("\nKey points:")
print("- Use dots (.) not slashes (/) for organization")
print("- The format is: pxt://username/table_name")
print("- Directory organization (like 'shared.dataset') is optional")


## Summary

Shared tables in Pixeltable enable collaborative data science workflows:

### Sharing Your Data
- **Publish**: Share your local tables with others using `pxt.publish(local_table, cloud_uri)` (source, destination)
- **Update**: Push local changes to the cloud using `my_table.push()` (method on your table object)

### Accessing Shared Data
**Two approaches available:**

1. **Read-Only Replica**: `pxt.replicate(cloud_uri, local_path)` (source, destination) → Read-only replica with upstream connection
   - ✅ Can pull updates with `shared_table.pull()` (method on your table object)
   - ❌ Read-only: create views to add computed columns
   
2. **Create Table**: `pxt.create_table(local_path, source=cloud_uri)` → Regular table with no upstream connection
   - ✅ Can modify data directly
   - ❌ No way to pull updates: the data is a snapshot

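### Argument Order Pattern
For the module-level sharing functions, the **source comes first** and the **destination second**:
- `pxt.publish(local_table, cloud_uri)` - local table (source) → cloud (destination)
- `pxt.replicate(cloud_uri, local_path)` - cloud table (source) → local (destination)

`pxt.create_table()` is the exception: it takes the destination path first and the cloud URI as the `source=` keyword argument.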

### Key Points
- Choose **Read-Only Replica** if you want to stay synchronized with upstream changes
- Choose **Create Table** if you want to modify the data independently
- Wrap cloud operations (`publish`, `push`, `replicate`, `pull`) in `try`/`except` blocks; they fail if credentials are missing or the URI doesn't exist
- Use the cloud URI format: `pxt://username/table_name`

### Important: Module-Level Functions vs Table Methods
- **Module-level functions** (`pxt.publish()`, `pxt.replicate()`) take source and destination paths or URIs
- **Table methods** (`my_table.push()`, `shared_table.pull()`) are called on Pixeltable table objects you already hold, not on the `pxt` module
