# SDK Reference Table - `table()` - write

Ocean Data Platform offers both API and Python SDK interfaces. This notebook highlights the Python SDK.

| Interface | API | SDK | 
| ---------- | ---------- | ---------- |
| Catalog | STAC and OGC | Coming (Python and R) |
| Files | OGC-Read only | Python, Coming (R) |
| Table | OGC-Read only | Python, Coming (R) |
| Grid | Coming (Python and R) | Coming (Python and R) |

## Installation

```bash
pip install -U odp-sdk
```

## Client Initialization

In [1]:
from odp.client import Client

In [2]:
import pyarrow as pa

In [3]:
# Automatic authentication: opens a browser to perform the authentication process (not in our Workspaces)
client = Client()

In [5]:
# API key authentication (no browser needed).
# You can generate an API key in the Ocean Data Platform web interface, under your user profile.
client = Client(api_key="your-api-key")
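
To keep the key out of the notebook itself, one option is to read it from an environment variable. This is a minimal sketch: `ODP_API_KEY` is a hypothetical variable name chosen for illustration, and `load_api_key` is a helper defined here, not part of the SDK.

```python
import os


def load_api_key(var_name: str = "ODP_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    `ODP_API_KEY` is a hypothetical name used for illustration only.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage with the client shown above (requires odp-sdk to be installed):
# client = Client(api_key=load_api_key())
```

Failing early with a clear message is usually easier to debug than passing an empty key to the client.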

## Dataset Access

With an initialized `Client`, you can access a dataset by its UUID (click API on the page of the dataset you have created).

## Get Dataset

In [4]:
# Get dataset
dataset = client.dataset("7458cf09-bcf4-4a08-8cd0-f935b41ede58") # Replace this UUID with your own

The `dataset` from this UUID will be used in the examples below.

## Create and work with schema and table

### schema()

A key concept of the table is the schema. 

A PyArrow schema is:
- A special PyArrow object (`pyarrow.lib.Schema`)
- Created from a list of PyArrow `Field` objects
- List-like in behavior (indexing, iteration)
- Immutable (it can't be modified after creation; `alter()` handles this by creating a new schema and re-ingesting the data, see below)

There is a comprehensive set of data types in PyArrow: https://arrow.apache.org/docs/python/api/datatypes.html

In [17]:
# Create a schema
schema = pa.schema([
    ('id', pa.int64()),
    ('name', pa.string()),
    ('size', pa.int64()),
    ('sensitive', pa.bool_())
])

### create()

To create a table, you need a dataset. A dataset can hold only one table, so if you try to create a table when one already exists, you will get an error.

In [18]:
dataset.table.create(schema)

In [19]:
print(dataset.table.schema())

id: int64
name: string
size: int64
sensitive: bool


In [68]:
dataset.table.stats()

{"num_rows": 0, "size": 0}
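
Because a dataset holds only one table, rerunning the notebook fails at `create()` if the table is already there. A small guard can avoid that. This sketch assumes `dataset.table.schema()` returns `None` when no table exists, which is what the `drop()` section of this notebook prints after dropping the table; `create_table_if_missing` is a hypothetical helper, not part of the SDK.

```python
def create_table_if_missing(dataset, schema) -> bool:
    """Create the table only when the dataset has none yet.

    Assumes `dataset.table.schema()` returns None when no table exists.
    Returns True if a table was created, False if one was already there.
    """
    if dataset.table.schema() is None:
        dataset.table.create(schema)
        return True
    return False


# Usage with the objects defined above:
# created = create_table_if_missing(dataset, schema)
```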

### alter()

You can alter the schema in several ways:
- Add a column
- Drop a column
- Rename a column
- Change the type of a column

Every change is made by creating a new schema.

In [11]:
# Get current schema to reference
old_schema = dataset.table.schema()

# Build entirely new schema from scratch
new_schema = pa.schema([
    # Copy existing fields
    old_schema.field("id"),
    
    # Add new field
    pa.field("place", pa.string()),
    
    # Copy more fields
    old_schema.field("name"),
    
    # Modify a field (change type)
    pa.field("size", pa.float64()),  # Changed from int64

    # Drop a field (not adding the field 'sensitive' to the new schema)
    
])

In [12]:
# Apply new schema
dataset.table.alter(schema=new_schema)

{}

In [13]:
print(dataset.table.schema())

id: int64
place: string
name: string
size: double


### drop()

This drops the table data and schema. It is **irreversible**.

In [14]:
dataset.table.drop()

In [15]:
print(dataset.table.schema())

None


## Working with rows in table

You insert or modify data using transactions. Key points:

- Use `with` statement - automatically handles commit/rollback
- `tx.insert(list_of_dicts)` - insert rows as list of dictionaries
- `tx.insert_batch(record_batch)` - insert PyArrow RecordBatch
- Auto-commit - on successful exit from `with` block
- Auto-rollback - if exception occurs
- All or nothing - entire transaction succeeds or fails together

Summary: Always use transactions (`with dataset.table as tx:`) to insert rows. This ensures data integrity and automatic rollback on errors!
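
The commit and rollback behavior described above follows the standard Python context-manager pattern. The toy class below is not the SDK's implementation; it is an in-memory stand-in (all names hypothetical) that only illustrates what "all or nothing" means for a `with` block.

```python
class ToyTable:
    """In-memory stand-in for a transactional table (not the ODP SDK)."""

    def __init__(self):
        self.rows = []         # committed rows
        self._pending = None   # rows staged in the open transaction

    def __enter__(self):
        self._pending = []
        return self

    def insert(self, rows):
        self._pending.extend(rows)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.rows.extend(self._pending)   # commit on clean exit
        self._pending = None                  # staged rows are cleared either way
        return False                          # never swallow the exception


table = ToyTable()

with table as tx:
    tx.insert([{"id": 1}, {"id": 2}])
print(len(table.rows))   # 2: both rows committed

try:
    with table as tx:
        tx.insert([{"id": 3}])
        raise ValueError("something went wrong")
except ValueError:
    pass
print(len(table.rows))   # still 2: the failed transaction left no trace
```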

### tx.insert()

In [20]:
# Insert three rows; the keys match columns of the schema created above
with dataset.table as tx:
    tx.insert([
        {"id": 123, "name": "Kattegat", "size": 30000},
        {"id": 456, "name": "Skagerakk", "size": 32000},
        {"id": 789, "name": "Baltic", "size": 400000}
    ])

In [21]:
df = dataset.table.select().all().dataframe()

In [22]:
df.head()

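| | id | name | size | sensitive |
| ---------- | ---------- | ---------- | ---------- | ---------- |
| 0 | 123 | Kattegat | 30000 | |
| 1 | 456 | Skagerakk | 32000 | |
| 2 | 789 | Baltic | 400000 | |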


### tx.replace()

How the replace method works:
- Finds rows matching your query
- Removes them from the table
- Returns them to you (as an iterator)
- You modify them
- You re-insert them (or skip to delete)

**Note that the filter can match multiple rows, and the replacement applies to every matching row.**

In [23]:
# replace a row

with dataset.table as tx:
    for row in tx.replace(filter="name == 'Baltic'").rows():
        # Row is removed from table at this point
        
        # Option 1: Modify and re-insert (UPDATE)
        row['size'] = 415000
        tx.insert([row])
        
        # Option 2: Don't re-insert (DELETE)
        # Just skip it
        
        # Option 3: Insert multiple rows
        # tx.insert([row, modified_row, another_row])

In [26]:
df = dataset.table.select().all().dataframe()
df.head()

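| | id | name | size | sensitive |
| ---------- | ---------- | ---------- | ---------- | ---------- |
| 0 | 789 | Baltic | 415000 | |
| 1 | 123 | Kattegat | 30000 | |
| 2 | 456 | Skagerakk | 32000 | |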


Since the changes apply to every row the filter returns, use `replace()` with care.
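
One way to reduce the risk is to wrap the update pattern in a helper that refuses an empty filter. This is a sketch, not an SDK feature: `update_matching` is a hypothetical name, and it relies only on the `tx.replace(filter=...).rows()` and `tx.insert([...])` calls shown in this section, assuming rows support item assignment as in the example.

```python
def update_matching(tx, row_filter: str, changes: dict) -> int:
    """Apply `changes` to every row matching `row_filter`; return the row count.

    Refuses an empty filter so a typo cannot rewrite the whole table.
    """
    if not row_filter or not row_filter.strip():
        raise ValueError("Refusing to run replace() with an empty filter")
    updated = 0
    for row in tx.replace(filter=row_filter).rows():
        for column, value in changes.items():
            row[column] = value      # modify the removed row
        tx.insert([row])             # re-insert it (UPDATE)
        updated += 1
    return updated


# Usage with a real dataset:
# with dataset.table as tx:
#     n = update_matching(tx, "name == 'Baltic'", {"size": 415000})
#     print(f"{n} row(s) updated")
```

Returning the count gives a quick sanity check: if it is much larger than expected, raising an exception inside the `with` block rolls the transaction back.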

Tips

- `replace()` returns an iterator - you can loop through the matching rows
- The filter affects ALL matches - if 100 rows match, 100 rows are updated
- Check the count first - use `select()` to preview
- Use specific filters - avoid accidentally updating too many rows
- Batch large updates - process them in smaller transactions
- Empty filter = ALL rows - be very careful!
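
For the tip about batching, a plain-Python chunking helper is enough to split a large list of rows into smaller transactions. `chunked` is a hypothetical helper defined here, and the chunk size of 1000 in the usage comment is an arbitrary example, not a platform limit.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


print([len(c) for c in chunked(range(10), 4)])   # [4, 4, 2]

# Usage with a real dataset: one transaction per chunk, so a failure
# rolls back only the chunk in progress.
# for chunk in chunked(all_rows, 1000):
#     with dataset.table as tx:
#         tx.insert(chunk)
```

The trade-off is that the all-or-nothing guarantee then holds per chunk rather than for the whole load, so chunks committed before a failure stay in the table.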