e6data Python Connector

Introduction

The e6data Connector for Python provides an interface for writing Python applications that can connect to e6data and perform operations. It includes automatic support for blue-green deployments, ensuring seamless failover during server updates without query interruption.

Dependencies

Make sure to install the dependencies below, along with wheel, before installing e6data-python-connector.

# Amazon Linux / CentOS dependencies
yum install python3-devel gcc-c++ -y

# Ubuntu/Debian dependencies
apt install python3-dev g++ -y


# Pip dependencies
pip install wheel

To install the Python package, use the command below:

pip install --no-cache-dir e6data-python-connector

Prerequisites

  • Open Inbound Port 80 in the Engine Cluster.
  • Limit access to Port 80 according to your organizational security policy. Public access is not encouraged.
  • Access Token generated in the e6data console.

Create a Connection

Use your e6data Email ID as the username and your access token as the password.

from e6data_python_connector import Connection

username = '<username>'  # Your e6data Email ID.
password = '<password>'  # Access Token generated in the e6data console.

host = '<host>'  # IP address or hostname of the cluster to be used.
database = '<database>'  # Database to perform the query on.
port = 80  # Port of the e6data engine.
catalog_name = '<catalog_name>'

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password
)

Connection Parameters

The Connection class supports the following parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| host | str | Yes | - | IP address or hostname of the e6data cluster |
| port | int | Yes | - | Port of the e6data engine (typically 80) |
| username | str | Yes | - | Your e6data Email ID |
| password | str | Yes | - | Access Token generated in the e6data console |
| database | str | No | None | Database to perform queries on |
| catalog | str | No | None | Catalog name |
| cluster_name | str | No | None | Name of the cluster for cluster-specific operations |
| secure | bool | No | False | Enable SSL/TLS for secure connections |
| auto_resume | bool | No | True | Automatically resume cluster if suspended |
| grpc_options | dict | No | None | Additional gRPC configuration options |
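
For example, several of the optional parameters from the table can be combined in one call. A minimal sketch; the values are placeholders:

conn = Connection(
    host=host,
    port=port,
    username=username,
    password=password,
    database=database,
    catalog='<catalog_name>',  # Optional: catalog to query against.
    auto_resume=True           # Optional: resume the cluster if it is suspended.
)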

Secure Connection Example

To establish a secure connection using SSL/TLS:

conn = Connection(
    host=host,
    port=443,  # Typically 443 for secure connections
    username=username,
    password=password,
    database=database,
    cluster_name='production-cluster',
    secure=True  # Enable SSL/TLS
)

Cluster-Specific Connection

When working with multiple clusters, specify the cluster name:

conn = Connection(
    host=host,
    port=port,
    username=username,
    password=password,
    database=database,
    cluster_name='analytics-cluster-01',  # Specify cluster name
    secure=True
)

Perform Queries & Get Results

query = 'SELECT * FROM <TABLE_NAME>'  # Replace with the query.

cursor = conn.cursor(catalog_name=catalog_name)
query_id = cursor.execute(query)  # The execute function returns a unique query ID, which can be used to abort the query.
all_records = cursor.fetchall()
for row in all_records:
   print(row)

To fetch all the records:

records = cursor.fetchall()

To fetch one record:

record = cursor.fetchone()

To fetch limited records:

limit = 500
records = cursor.fetchmany(limit)
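
fetchmany can also be called in a loop to page through a large result set. A minimal sketch, assuming fetchmany returns an empty list once the result set is exhausted (the standard DB-API behaviour):

batch_size = 500
while True:
    batch = cursor.fetchmany(batch_size)
    if not batch:
        break  # No more rows to fetch.
    for row in batch:
        print(row)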

To fetch all records through a buffered generator and reduce memory consumption:

records_iterator = cursor.fetchall_buffer()  # Returns generator
for item in records_iterator:
    print(item)

To get the execution plan after query execution:

import json
explain_response = cursor.explain_analyse()
query_planner = json.loads(explain_response.get('planner'))

To abort a running query:

query_id = '<query_id>'  # query id from execute function response.
cursor.cancel(query_id)

Switch database in an existing connection:

database = '<new_database_name>'  # Replace with the new database.
cursor = conn.cursor(database, catalog_name)

Get Query Time Metrics

import json
query = 'SELECT * FROM <TABLE_NAME>'

cursor = conn.cursor(catalog_name)
query_id = cursor.execute(query)  # The execute function returns a query ID, which can be used to abort the query.
all_records = cursor.fetchall()
explain_response = cursor.explain_analyse()
query_planner = json.loads(explain_response.get('planner'))

execution_time = query_planner.get("total_query_time")  # In milliseconds
queue_time = query_planner.get("executionQueueingTime")  # In milliseconds
parsing_time = query_planner.get("parsingTime")  # In milliseconds
row_count = cursor.rowcount

Get Schema - a list of Databases, Tables or Columns

The following code lists the databases, tables and columns available on the cluster currently in use. Call get_schema_names without a database name to get the list of all databases.

databases = conn.get_schema_names()  # To get list of databases.
print(databases)

database = '<database_name>'  # Replace with actual database name.
tables = conn.get_tables(database=database)  # To get list of tables from a database.
print(tables)

table_name = '<table_name>'  # Replace with actual table name.
columns = conn.get_columns(database=database, table=table_name)  # To get the list of columns from a table.
columns_with_type = list()
# Collect each column's name and type.
for column in columns:
    columns_with_type.append(dict(column_name=column.fieldName, column_type=column.fieldType))
print(columns_with_type)

Code Hygiene

As a best practice, clear the cursor, close the cursor and close the connection after running a query. This frees old result data from memory and improves performance.

cursor.clear() # Not needed when aborting a query
cursor.close()
conn.close()
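
To guarantee this cleanup even when a query raises an exception, the same calls can go in a try/finally block. A minimal sketch using the methods above:

cursor = conn.cursor(catalog_name=catalog_name)
try:
    query_id = cursor.execute('SELECT * FROM <TABLE_NAME>')
    records = cursor.fetchall()
finally:
    cursor.clear()
    cursor.close()
    conn.close()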

Code Example

The following example combines several of the functions described above.

from e6data_python_connector import Connection
import json

username = '<username>'  # Your e6data Email ID.
password = '<password>'  # Access Token generated in the e6data console.

host = '<host>'  # IP address or hostname of the cluster to be used.
database = '<database>'  # Database to perform the query on.
port = 80  # Port of the e6data engine.

sql_query = 'SELECT * FROM <TABLE_NAME>'  # Replace with the actual query.

catalog_name = '<catalog_name>'  # Replace with the actual catalog name.

conn = Connection(
    host=host,
    port=port,
    username=username,
    database=database,
    password=password
)

cursor = conn.cursor(db_name=database, catalog_name=catalog_name)
query_id = cursor.execute(sql_query)
all_records = cursor.fetchall()
explain_response = cursor.explain_analyse()
planner_result = json.loads(explain_response.get('planner'))
execution_time = planner_result.get("total_query_time") / 1000  # Converting into seconds.
row_count = cursor.rowcount
columns = [col[0] for col in cursor.description]  # Get the column names and merge them with the results.
results = []
for row in all_records:
   row = dict(zip(columns, row))
   results.append(row)
   print(row)
print('Total row count {}, Execution Time (seconds): {}'.format(row_count, execution_time))
cursor.clear()
cursor.close()
conn.close()

Zero Downtime Deployment

πŸš€ Zero Downtime Features

The e6data Python Connector provides automatic zero downtime deployment support through intelligent blue-green deployment strategy management:

βœ… No Code Changes Required

Your existing applications automatically benefit from zero downtime deployment without any modifications:

# Your existing code works exactly the same
from e6data_python_connector import Connection

conn = Connection(
    host='your-host',
    port=80,
    username='your-email',
    password='your-token',
    database='your-database'
)

cursor = conn.cursor()
cursor.execute("SELECT * FROM your_table")
results = cursor.fetchall()

πŸ”„ Automatic Strategy Detection

  • Detects active deployment strategy (blue/green) on connection
  • Caches strategy information for optimal performance
  • Automatically switches strategies when deployments occur

πŸ›‘οΈ Seamless Query Protection

  • Running queries continue uninterrupted during deployments
  • New queries automatically use the new deployment strategy
  • Graceful transitions ensure no query loss or failures

⚑ Performance Optimized

  • < 100ms additional latency on first connection (one-time cost)
  • 0ms overhead for 95% of queries (cached strategy)
  • < 1KB additional memory usage per connection

πŸ”§ Thread & Process Safe

  • Full support for multi-threaded applications
  • Process-safe shared memory management
  • Concurrent query execution without conflicts
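
As an illustration of multi-threaded use, each thread can run its own cursor against a shared connection. A sketch that relies on the thread-safety guarantees above, with the connection set up as in the earlier examples:

import threading

def run_query(sql):
    cursor = conn.cursor(catalog_name=catalog_name)  # One cursor per thread.
    cursor.execute(sql)
    print('{} rows'.format(len(cursor.fetchall())))
    cursor.close()

threads = [
    threading.Thread(target=run_query, args=('SELECT * FROM <TABLE_NAME>',))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()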

Advanced Configuration (Optional)

For enhanced monitoring and performance tuning:

# Enhanced gRPC configuration for zero downtime
grpc_options = {
    'keepalive_timeout_ms': 60000,      # 1 minute keepalive timeout
    'keepalive_time_ms': 30000,         # 30 seconds keepalive interval
    'max_receive_message_length': 100 * 1024 * 1024,  # 100MB
    'max_send_message_length': 100 * 1024 * 1024,     # 100MB
}

conn = Connection(
    host='your-host',
    port=80,
    username='your-email',
    password='your-token',
    database='your-database',
    grpc_options=grpc_options
)

Environment Configuration

Configure zero downtime features using environment variables:

# Strategy cache timeout (default: 300 seconds)
export E6DATA_STRATEGY_CACHE_TIMEOUT=300

# Maximum retry attempts (default: 5)
export E6DATA_MAX_RETRY_ATTEMPTS=5

# Enable debug logging for strategy operations
export E6DATA_STRATEGY_LOG_LEVEL=INFO
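
The same variables can also be set from Python before the connection is created, assuming the connector reads them at connection time:

import os

os.environ['E6DATA_STRATEGY_CACHE_TIMEOUT'] = '300'  # Strategy cache timeout in seconds.
os.environ['E6DATA_MAX_RETRY_ATTEMPTS'] = '5'        # Maximum retry attempts.
os.environ['E6DATA_STRATEGY_LOG_LEVEL'] = 'INFO'     # Log level for strategy operations.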

Testing Zero Downtime

Use the included mock server for testing and development:

# Terminal 1: Start mock server
python mock_grpc_server.py

# Terminal 2: Run test client
python test_mock_server.py

# Or use the convenience script
./run_mock_test.sh

πŸ“š Comprehensive Documentation

Explore the detailed documentation in the docs/zero-downtime/ directory.

Key Benefits

| Feature | Benefit |
|---|---|
| Zero Downtime | Applications continue running during e6data deployments |
| Automatic | No code changes or manual intervention required |
| Reliable | Robust error handling and automatic recovery |
| Fast | Minimal performance impact with intelligent caching |
| Safe | Thread-safe and process-safe operation |
| Monitored | Comprehensive logging and monitoring capabilities |

Migration

Existing applications automatically benefit from zero downtime deployment:

  1. Update connector: pip install --upgrade e6data-python-connector
  2. No code changes: Your existing code works without modifications
  3. Monitor: Use enhanced logging to monitor strategy transitions
  4. Validate: Test with your existing applications

For detailed migration instructions, see the Migration Guide.

Performance Optimization

Memory Efficiency

  • Use fetchall_buffer() for memory-efficient large result sets
  • Automatic cleanup of query-strategy mappings
  • Bounded memory usage with TTL-based caching

Network Performance

  • Configure gRPC options for optimal network performance
  • Intelligent keepalive settings for connection stability
  • Message size optimization for large queries

Connection Management

  • Enable connection pooling for better resource utilization
  • Automatic connection health monitoring
  • Graceful connection recovery and retry logic
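
The connector does not document a pooling API in this README, so pooling happens at the application level. A minimal sketch built on queue.Queue; the pool size and connection arguments are placeholder assumptions:

import queue

from e6data_python_connector import Connection

class ConnectionPool:
    """A tiny fixed-size pool of e6data connections."""

    def __init__(self, size, **conn_kwargs):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(Connection(**conn_kwargs))

    def acquire(self):
        return self._pool.get()  # Blocks until a connection is free.

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(4, host=host, port=port, username=username,
                      password=password, database=database)
conn = pool.acquire()
try:
    cursor = conn.cursor(catalog_name=catalog_name)
    cursor.execute('SELECT 1')
finally:
    cursor.close()
    pool.release(conn)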

See TECH_DOC.md for detailed technical documentation.
