# ScyllaDB - Getting Started

This notebook shows how to connect to ScyllaDB from Python and run basic operations against it.

## What You'll Learn
- Connect to a ScyllaDB cluster
- Create a keyspace
- Create a table
- Perform basic CRUD operations

## Prerequisites
- ScyllaDB running locally (use `make up` or `make up-cluster`)
- Python `cassandra-driver` package installed (`pip install cassandra-driver`)

In [1]:
# Import the driver's Cluster class, the entry point for all connections
from cassandra.cluster import Cluster

## 1. Connect to ScyllaDB

First, let's establish a connection to ScyllaDB. By default, ScyllaDB accepts CQL client connections on port 9042.
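
A freshly started node may not accept connections right away, so the first attempt can fail even though `make up` has returned. The sketch below retries a connection callable with a fixed delay. `connect_with_retry` and its parameters are hypothetical names, not part of the driver, and the defaults are arbitrary.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call `connect()` until it succeeds or `attempts` is exhausted.

    `connect` is any zero-argument callable that returns a session and
    raises on failure (hypothetical helper, not a driver API).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # the driver raises NoHostAvailable when no node answers
            last_error = exc
            print(f"Attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                sleep(delay_seconds)  # wait before the next attempt
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

In this notebook the callable could be `lambda: Cluster([SCYLLA_HOST], port=SCYLLA_PORT).connect()`.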

In [2]:
# Configuration
SCYLLA_HOST = "127.0.0.1"
SCYLLA_PORT = 9042

print(f"Connecting to ScyllaDB at {SCYLLA_HOST}:{SCYLLA_PORT}...")

try:
    cluster = Cluster([SCYLLA_HOST], port=SCYLLA_PORT)
    session = cluster.connect()
    print("[OK] Connected successfully!")
    
    # Get cluster information
    metadata = cluster.metadata
    print(f"\nCluster name: {metadata.cluster_name}")
    print(f"Nodes in cluster: {len(metadata.all_hosts())}")
    
    for host in metadata.all_hosts():
        print(f"  - {host.address} (Datacenter: {host.datacenter}, Rack: {host.rack})")
        
except Exception as e:
    print(f"[FAILED] Connection failed: {e}")
    print("\nMake sure ScyllaDB is running:")
    print("  make up          # Start single node")
    print("  make up-cluster  # Start cluster")
    raise  # Stop here: the following cells need a working `session`

Connecting to ScyllaDB at 127.0.0.1:9042...
[OK] Connected successfully!

Cluster name: 
Nodes in cluster: 1
  - 127.0.0.1 (Datacenter: datacenter1, Rack: rack1)


## 2. Create a Keyspace

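A keyspace in ScyllaDB is roughly equivalent to a database in a relational system: it groups tables and defines how their data is replicated across nodes. `SimpleStrategy` with a replication factor of 1 is enough for a local single-node demo; multi-datacenter clusters normally use `NetworkTopologyStrategy`.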
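
The replication settings are the part of the `CREATE KEYSPACE` statement that changes between a local demo and a larger cluster. The sketch below builds the statement for either `SimpleStrategy` or `NetworkTopologyStrategy` with per-datacenter replica counts. `create_keyspace_cql` is a hypothetical helper, and its name check is deliberately conservative rather than a full statement of the CQL identifier rules.

```python
import re

# Conservative identifier check (an assumption for this sketch).
_IDENTIFIER = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{0,47}$")


def create_keyspace_cql(keyspace, replication_factor=1, datacenters=None):
    """Build a CREATE KEYSPACE statement.

    With `datacenters` (a dict of datacenter name -> replica count) the
    statement uses NetworkTopologyStrategy; otherwise SimpleStrategy.
    """
    # Keyspace names cannot be bound as query parameters, so validate
    # the identifier before formatting it into the statement.
    if not _IDENTIFIER.match(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    if datacenters:
        # Datacenter names are assumed to come from trusted configuration.
        options = {"class": "NetworkTopologyStrategy", **datacenters}
    else:
        options = {"class": "SimpleStrategy", "replication_factor": replication_factor}
    # Quote string values, leave numbers bare.
    rendered = ", ".join(
        f"'{key}': {value!r}" if isinstance(value, str) else f"'{key}': {value}"
        for key, value in options.items()
    )
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{rendered}}}"


print(create_keyspace_cql("demo_keyspace"))
```

For example, `create_keyspace_cql("demo_keyspace", datacenters={"datacenter1": 3})` requests three replicas in the `datacenter1` datacenter reported by the connection cell.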

In [3]:
# Create a keyspace
KEYSPACE = "demo_keyspace"

create_keyspace_query = f"""
    CREATE KEYSPACE IF NOT EXISTS {KEYSPACE}
    WITH replication = {{
        'class': 'SimpleStrategy',
        'replication_factor': 1
    }}
"""

session.execute(create_keyspace_query)
print(f"[OK] Keyspace '{KEYSPACE}' created successfully")

# Set the keyspace for this session
session.set_keyspace(KEYSPACE)
print(f"[OK] Using keyspace '{KEYSPACE}'")

[OK] Keyspace 'demo_keyspace' created successfully
[OK] Using keyspace 'demo_keyspace'


## 3. Create a Table

Let's create a simple users table with a few columns.
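
Each column type in this table has a natural Python counterpart when values are bound through the driver: `uuid` pairs with `uuid.UUID`, `text` with `str`, `int` with `int` (a 32-bit signed integer in CQL), and `timestamp` with `datetime`. The sketch below checks a record against that mapping before it is sent. `USER_COLUMNS` and `validate_user` are hypothetical names, and the check is stricter than the driver itself, which also accepts some other representations.

```python
import uuid
from datetime import datetime

# Python types this sketch expects for each column of the `users` table.
USER_COLUMNS = {
    "user_id": uuid.UUID,
    "username": str,
    "email": str,
    "age": int,
    "created_at": datetime,
}
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def validate_user(record):
    """Return a list of problems found in `record` (empty if it is valid)."""
    problems = []
    for column, value in record.items():
        expected = USER_COLUMNS.get(column)
        if expected is None:
            problems.append(f"unknown column: {column}")
        elif value is not None and (isinstance(value, bool) or not isinstance(value, expected)):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"{column}: expected {expected.__name__}, got {type(value).__name__}")
    age = record.get("age")
    if isinstance(age, int) and not isinstance(age, bool) and not INT32_MIN <= age <= INT32_MAX:
        problems.append("age: out of range for a 32-bit CQL int")
    if record.get("user_id") is None:
        problems.append("user_id: primary key must be set")
    return problems
```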

In [4]:
# Create a table
create_table_query = """
    CREATE TABLE IF NOT EXISTS users (
        user_id uuid PRIMARY KEY,
        username text,
        email text,
        age int,
        created_at timestamp
    )
"""

session.execute(create_table_query)
print("[OK] Table 'users' created successfully")

# Show table schema
result = session.execute(f"DESCRIBE TABLE {KEYSPACE}.users")
print("\nTable Schema:")
for row in result:
    print(row)

[OK] Table 'users' created successfully

Table Schema:
Row(keyspace_name='demo_keyspace', type='table', name='users', create_statement="CREATE TABLE demo_keyspace.users (\n    user_id uuid,\n    age int,\n    created_at timestamp,\n    email text,\n    username text,\n    PRIMARY KEY (user_id)\n) WITH bloom_filter_fp_chance = 0.01\n    AND caching = {'keys': 'ALL','rows_per_partition': 'ALL'}\n    AND comment = ''\n    AND compaction = {'class': 'SizeTieredCompactionStrategy'}\n    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}\n    AND crc_check_chance = 1\n    AND dclocal_read_repair_chance = 0\n    AND default_time_to_live = 0\n    AND gc_grace_seconds = 864000\n    AND max_index_interval = 2048\n    AND memtable_flush_period_in_ms = 0\n    AND min_index_interval = 128\n    AND read_repair_chance = 0\n    AND speculative_retry = '99.0PERCENTILE';\n")


## 4. Insert Data (CREATE)

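Let's insert some sample user records. The insert uses a prepared statement: the server parses the query once, and the driver reuses it for every row with different bound values.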
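
The cell below generates a random `uuid4` for each user, so every run of the cell adds four new rows. A CQL `INSERT` overwrites an existing row that has the same primary key, so deriving the ID from a stable attribute makes the cell safe to re-run. The sketch uses `uuid5` for that. `DEMO_NAMESPACE` and `stable_user_id` are hypothetical names, and the approach assumes each email address belongs to exactly one user.

```python
import uuid

# Arbitrary namespace for this demo (an assumption; any fixed UUID works).
DEMO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def stable_user_id(email):
    """Derive the same UUID for the same email, ignoring case and outer whitespace."""
    return uuid.uuid5(DEMO_NAMESPACE, email.strip().lower())


# The same input always yields the same ID, so a repeated insert
# would target the same row instead of creating a new one.
print(stable_user_id("alice@example.com") == stable_user_id("alice@example.com"))
```

Swapping `uuid.uuid4()` for `stable_user_id(user["email"])` in the insert loop would be one way to apply this.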

In [5]:
import uuid
from datetime import datetime

# Insert sample users
users_data = [
    {"username": "alice", "email": "alice@example.com", "age": 28},
    {"username": "bob", "email": "bob@example.com", "age": 35},
    {"username": "charlie", "email": "charlie@example.com", "age": 42},
    {"username": "diana", "email": "diana@example.com", "age": 31},
]

insert_query = """
    INSERT INTO users (user_id, username, email, age, created_at)
    VALUES (?, ?, ?, ?, toTimestamp(now()))
"""

prepared_stmt = session.prepare(insert_query)

user_ids = []
for user in users_data:
    user_id = uuid.uuid4()
    user_ids.append((user_id, user["username"]))
    
    session.execute(prepared_stmt, (
        user_id,
        user["username"],
        user["email"],
        user["age"]
    ))
    
    print(f"[OK] Inserted user: {user['username']} (ID: {user_id})")

print(f"\n[OK] {len(users_data)} users inserted successfully")

[OK] Inserted user: alice (ID: 3a9a767a-a630-4eea-be66-1193a249b4c3)
[OK] Inserted user: bob (ID: 824442b7-9325-4d11-b68d-6833fc5f8f9a)
[OK] Inserted user: charlie (ID: 8a626bbc-a814-4fda-92f3-22576292333b)
[OK] Inserted user: diana (ID: 07d5de82-aa1e-4629-af76-7b8de76f8059)

[OK] 4 users inserted successfully


## 5. Read Data (READ)

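Query the data we just inserted. If the insert cell has been run more than once, rows from the earlier runs appear as well, because each run generates new UUIDs.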

In [6]:
# Query all users
query = "SELECT user_id, username, email, age, created_at FROM users"
rows = session.execute(query)

print("All Users:")
print("-" * 80)
print(f"{'Username':<15} {'Email':<25} {'Age':<5} {'User ID':<36}")
print("-" * 80)

for row in rows:
    print(f"{row.username:<15} {row.email:<25} {row.age:<5} {str(row.user_id):<36}")

print("-" * 80)

All Users:
--------------------------------------------------------------------------------
Username        Email                     Age   User ID                             
--------------------------------------------------------------------------------
charlie         charlie@example.com       42    b0d514dd-99d0-440f-a130-6acf28c2f833
bob             bob@example.com           35    824442b7-9325-4d11-b68d-6833fc5f8f9a
alice           alice@example.com         30    ae65754d-6851-47f7-98ac-ceab6badbbe5
bob             bob@example.com           35    45c66e65-0fc5-44cb-9a1a-77b4d93001a3
charlie         charlie@example.com       42    8a626bbc-a814-4fda-92f3-22576292333b
alice           alice@example.com         28    3a9a767a-a630-4eea-be66-1193a249b4c3
diana           diana@example.com         31    07d5de82-aa1e-4629-af76-7b8de76f8059
--------------------------------------------------------------------------------


## 6. Read Specific Record

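Query a specific user by their ID. Because `user_id` is the table's primary key, this lookup goes straight to the partition that holds the row.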
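
With the driver's default row factory, each returned row is a named tuple, which is why the cells in this notebook can read `row.username` and `row.age`. When plain dictionaries are more convenient, for example for JSON output, `_asdict()` converts each row. The sketch uses a `namedtuple` as a stand-in for a driver row so that it runs without a cluster. `rows_to_dicts` is a hypothetical helper and the sample row is not real query output.

```python
from collections import namedtuple


def rows_to_dicts(rows):
    """Convert named-tuple rows into plain dictionaries."""
    return [row._asdict() for row in rows]


# Stand-in for rows returned by session.execute(...); not real query output.
Row = namedtuple("Row", ["username", "email", "age"])
sample_rows = [Row("alice", "alice@example.com", 28)]

print(rows_to_dicts(sample_rows))
```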

In [7]:
# Query a specific user by ID
if user_ids:
    user_id, username = user_ids[0]
    
    query = "SELECT * FROM users WHERE user_id = ?"
    prepared = session.prepare(query)
    result = session.execute(prepared, (user_id,))
    
    print(f"Querying user: {username} (ID: {user_id})")
    print("-" * 80)
    
    for row in result:
        print(f"Username:   {row.username}")
        print(f"Email:      {row.email}")
        print(f"Age:        {row.age}")
        print(f"Created:    {row.created_at}")
    
    print("-" * 80)

Querying user: alice (ID: 3a9a767a-a630-4eea-be66-1193a249b4c3)
--------------------------------------------------------------------------------
Username:   alice
Email:      alice@example.com
Age:        28
Created:    2026-01-02 19:50:33.864000
--------------------------------------------------------------------------------


## 7. Update Data (UPDATE)

Update an existing user's information.
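
In CQL, `UPDATE` behaves as an upsert: if no row has the given `user_id`, the statement creates one instead of failing. Adding `IF EXISTS` turns the statement into a lightweight transaction that only touches existing rows, at the cost of extra latency. The sketch below assumes a `session` like the one created earlier. `update_age_if_exists` is a hypothetical helper; `was_applied` is the driver's result-set property for lightweight transactions.

```python
UPDATE_IF_EXISTS = "UPDATE users SET age = ? WHERE user_id = ? IF EXISTS"


def update_age_if_exists(session, user_id, new_age):
    """Update a user's age only if the row exists; return True if applied."""
    # For brevity the statement is prepared on every call; real code
    # would prepare it once and reuse the prepared statement.
    prepared = session.prepare(UPDATE_IF_EXISTS)
    result = session.execute(prepared, (new_age, user_id))
    return bool(result.was_applied)
```

A `False` return value means that no row matched and nothing was written.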

In [8]:
user_ids

[(UUID('3a9a767a-a630-4eea-be66-1193a249b4c3'), 'alice'),
 (UUID('824442b7-9325-4d11-b68d-6833fc5f8f9a'), 'bob'),
 (UUID('8a626bbc-a814-4fda-92f3-22576292333b'), 'charlie'),
 (UUID('07d5de82-aa1e-4629-af76-7b8de76f8059'), 'diana')]

In [9]:
# Update a user's age
if user_ids:
    user_id, username = user_ids[0]
    new_age = 30

    print(f"new_age = {new_age}, type(new_age) = {type(new_age)}")
    print(f"user_id = {user_id}, type(user_id) = {type(user_id)}")

    update_query = "UPDATE users SET age = ? WHERE user_id = ?"
    prepared_update_query = session.prepare(update_query)
    session.execute(prepared_update_query, (new_age, user_id))
    
    print(f"[OK] Updated {username}'s age to {new_age}")
    
    # Verify the update
    query = "SELECT username, age FROM users WHERE user_id = ?"
    prepared = session.prepare(query)
    result = session.execute(prepared, (user_id,))
    for row in result:
        print(f"  Verified: {row.username} is now {row.age} years old")

new_age = 30, type(new_age) = <class 'int'>
user_id = 3a9a767a-a630-4eea-be66-1193a249b4c3, type(user_id) = <class 'uuid.UUID'>
[OK] Updated alice's age to 30
  Verified: alice is now 30 years old


## 8. Delete Data (DELETE)

Delete a user record.

In [10]:
# Delete a user
if len(user_ids) > 1:
    user_id, username = user_ids[-1]
    
    delete_query = "DELETE FROM users WHERE user_id = ?"
    prepared = session.prepare(delete_query)
    session.execute(prepared, (user_id,))
    
    print(f"[OK] Deleted user: {username}")
    
    # Verify the deletion
    count = session.execute("SELECT COUNT(*) FROM users").one()[0]
    print(f"  Remaining users: {count}")

[OK] Deleted user: diana
  Remaining users: 6


## 9. Cleanup (Optional)

Uncomment the `DROP KEYSPACE` lines in the following cell to remove the test data. The cell closes the connection either way.

In [11]:
# Uncomment to drop the keyspace
# session.execute(f"DROP KEYSPACE IF EXISTS {KEYSPACE}")
# print(f"[OK] Keyspace '{KEYSPACE}' dropped")

# Close the connection
cluster.shutdown()
print("[OK] Connection closed")

[OK] Connection closed


## Summary

In this notebook, you learned:
- [OK] How to connect to ScyllaDB
- [OK] How to create keyspaces and tables
- [OK] How to perform CRUD operations (Create, Read, Update, Delete)
- [OK] How to use prepared statements for better performance

## Next Steps

Check out the other notebooks:
- `02-advanced-queries.ipynb` - Learn about advanced queries and indexing
- `03-data-modeling.ipynb` - Best practices for data modeling in ScyllaDB