# Upstream SDK CKAN Integration Demo

This notebook demonstrates the CKAN integration capabilities of the Upstream SDK for publishing environmental monitoring data to CKAN data portals.

## Overview

The Upstream SDK provides seamless integration with CKAN (Comprehensive Knowledge Archive Network) data portals for:
- 📊 **Dataset Publishing**: Automatically create CKAN datasets from campaign data
- 📁 **Resource Management**: Upload sensor configurations and measurement data as resources
- 🏢 **Organization Support**: Publish data under specific CKAN organizations
- 🔄 **Update Management**: Update existing datasets with new data
- 🏷️ **Metadata Integration**: Rich metadata tagging and categorization

## Features Demonstrated

- CKAN client setup and configuration
- Campaign data export and preparation
- Dataset creation with comprehensive metadata
- Resource management (sensors and measurements)
- Organization and permission handling
- Error handling and validation

## Prerequisites

- Valid Upstream account credentials
- Access to a CKAN portal with API credentials
- Existing campaign data (or run UpstreamSDK_Core_Demo.ipynb first)
- Python 3.7+ environment with required packages

## Related Notebooks

- **UpstreamSDK_Core_Demo.ipynb**: Core SDK functionality and campaign creation

## Installation and Setup

In [None]:
# Install required packages
!pip install upstream-sdk
# From a source checkout, this installs the local package in editable mode
!pip install -e .

# Import required libraries
import os
import json
import getpass
from pathlib import Path
from datetime import datetime
from typing import Dict, Any, Optional, List
from io import BytesIO

# Import Upstream SDK modules
from upstream.client import UpstreamClient
from upstream.ckan import CKANIntegration

## 1. Configuration and Authentication

First, let's set up authentication for both Upstream and CKAN platforms.

**Configuration Options:**
- **Upstream API**: Username/password authentication
- **CKAN Portal**: API key or access token authentication
- **Organization**: CKAN organization for dataset publishing
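
The options above can also be read from environment variables rather than edited in the notebook. The following is a minimal sketch, not part of the SDK: the function `load_settings` and the variable names `UPSTREAM_BASE_URL`, `CKAN_URL`, `CKAN_ORGANIZATION`, and `CKAN_API_KEY` are assumptions chosen for illustration, and the defaults simply repeat the values used in this notebook.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings from environment variables.

    The variable names are hypothetical; nothing in the SDK requires them.
    """
    env = os.environ if env is None else env
    settings = {
        "upstream_base_url": env.get(
            "UPSTREAM_BASE_URL", "https://upstream-dso.tacc.utexas.edu/dev"
        ),
        "ckan_url": env.get("CKAN_URL", "https://ckan.tacc.utexas.edu"),
        "ckan_organization": env.get("CKAN_ORGANIZATION", "setx-uifl"),
        "ckan_config": {"timeout": 30},
    }
    api_key = env.get("CKAN_API_KEY", "").strip()
    if api_key:
        # Add the key only when one is provided, so a missing key means read-only use.
        settings["ckan_config"]["api_key"] = api_key
    return settings


# Passing an explicit mapping keeps the example independent of the real environment.
settings = load_settings({"CKAN_URL": "http://localhost:5000"})
print(settings["ckan_url"])
```

Keeping credentials out of the notebook source this way avoids committing them by accident; the interactive `getpass` prompts used below serve the same purpose.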

In [None]:
# Configuration
UPSTREAM_BASE_URL = "https://upstream-dso.tacc.utexas.edu/dev"
# For local development, uncomment the line below:
# UPSTREAM_BASE_URL = 'http://localhost:8000'

# CKAN Configuration - Update these for your CKAN portal
CKAN_URL = "https://ckan.tacc.utexas.edu"   # Replace with your CKAN portal URL
CKAN_ORGANIZATION = "setx-uifl"      # Replace with your organization name

# For local development, uncomment the lines below:
# CKAN_URL = 'http://ckan.tacc.cloud:5000'
# CKAN_ORGANIZATION = 'org'

print("🔧 Configuration Settings:")
print(f"   Upstream API: {UPSTREAM_BASE_URL}")
print(f"   CKAN Portal: {CKAN_URL}")
print(f"   CKAN Organization: {CKAN_ORGANIZATION}")

In [None]:
# Get Upstream credentials
print("🔐 Please enter your TACC credentials:")
upstream_username = input("TACC Username: ")
upstream_password = getpass.getpass("TACC Password: ")

# Get CKAN credentials (optional; without an API key, CKAN runs in read-only mode)
print("\n🔑 CKAN API credentials (optional for demo):")
ckan_api_key = getpass.getpass("CKAN API Key (press Enter to skip): ")

# Prepare CKAN configuration
ckan_config = {
    "timeout": 30
}

if ckan_api_key:
    ckan_config["api_key"] = ckan_api_key
    print("✅ CKAN API key configured")
else:
    print("ℹ️  Running in read-only CKAN mode")

In [None]:
# Initialize Upstream client with CKAN integration
try:
    client = UpstreamClient(
        username=upstream_username,
        password=upstream_password,
        base_url=UPSTREAM_BASE_URL,
        ckan_url=CKAN_URL,
        ckan_organization=CKAN_ORGANIZATION,
        **ckan_config
    )
    print('✅ Upstream client initialized')

    # Test Upstream authentication
    if client.authenticate():
        print("✅ Upstream authentication successful!")
        print(f"🔗 Connected to: {UPSTREAM_BASE_URL}")

        # Check CKAN integration
        if client.ckan:
            print("✅ CKAN integration enabled!")
            print(f"🔗 CKAN Portal: {CKAN_URL}")
        else:
            print("⚠️  CKAN integration not configured")
    else:
        print("❌ Upstream authentication failed!")
        raise Exception("Upstream authentication failed")

except Exception as e:
    print(f"❌ Setup error: {e}")
    raise

## 2. Campaign Selection and Data Preparation

Let's select an existing campaign with data to publish to CKAN. If you don't have existing data, run the core demo notebook first.
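
The cells below always take the first campaign and the first station. If a specific one is wanted, a small helper can pick it by ID. This is a sketch under assumptions: `choose_item` is a hypothetical helper, and the `SimpleNamespace` objects stand in for the objects returned by `client.list_campaigns()`, of which only the `id` and `name` attributes used in this notebook are assumed.

```python
from types import SimpleNamespace
from typing import Optional, Sequence


def choose_item(items: Sequence, wanted_id: Optional[int] = None):
    """Return the item whose .id matches wanted_id, or the first item if no id is given."""
    if not items:
        raise ValueError("No items to choose from")
    if wanted_id is None:
        return items[0]
    for item in items:
        if item.id == wanted_id:
            return item
    raise LookupError(f"No item with id {wanted_id}")


# Stand-in objects; real campaign objects would come from client.list_campaigns().
demo = [SimpleNamespace(id=1, name="Coastal"), SimpleNamespace(id=7, name="Inland")]
print(choose_item(demo, wanted_id=7).name)
```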

In [None]:
# List available campaigns
print("📋 Available campaigns for CKAN publishing:")
try:
    campaigns = client.list_campaigns(limit=10)

    if campaigns.total == 0:
        print("❌ No campaigns found. Please run UpstreamSDK_Core_Demo.ipynb first to create sample data.")
        raise Exception("No campaigns available")

    print(f"Found {campaigns.total} campaigns:")
    for i, campaign in enumerate(campaigns.items[:5]):
        print(f"  {i+1}. ID: {campaign.id} - {campaign.name}")
        print(f"     Description: {(campaign.description or 'No description')[:80]}...")
        print(f"     Contact: {campaign.contact_name} ({campaign.contact_email})")
        print()

    # Select campaign (use the first one or let user choose)
    selected_campaign = campaigns.items[0]
    campaign_id = selected_campaign.id

    print(f"📊 Selected campaign for CKAN publishing:")
    print(f"   ID: {campaign_id}")
    print(f"   Name: {selected_campaign.name}")

except Exception as e:
    print(f"❌ Error listing campaigns: {e}")
    raise

In [None]:
# Get stations for the selected campaign
print(f"📍 Finding stations in campaign {campaign_id}...")
try:
    stations = client.list_stations(campaign_id=str(campaign_id))

    if stations.total == 0:
        print("❌ No stations found in this campaign. Please create stations and upload data first.")
        raise Exception("No stations available")

    print(f"Found {stations.total} stations:")
    for station in stations.items:
        print(f"  • ID: {station.id} - {station.name}")
        print(f"    Description: {(station.description or 'No description')[:80]}...")
        print()

    # Select the first station
    selected_station = stations.items[0]
    station_id = selected_station.id

    print(f"📡 Selected station for CKAN publishing:")
    print(f"   ID: {station_id}")
    print(f"   Name: {selected_station.name}")

except Exception as e:
    print(f"❌ Error listing stations: {e}")
    raise

In [None]:
# Check for existing data in the station
print(f"🔍 Checking data availability for station {station_id}...")
try:
    # List sensors to verify data exists
    sensors = client.sensors.list(campaign_id=campaign_id, station_id=station_id)

    if not sensors.items:
        print("❌ No sensors found in this station. Please upload sensor data first.")
        raise Exception("No sensor data available")
    # Sum the measurement counts reported in each sensor's statistics
    total_measurements = 0
    for sensor in sensors.items:
        if sensor.statistics:
            total_measurements += sensor.statistics.count

    print(f"✅ Data validation successful:")
    print(f"   • Sensors: {len(sensors.items)}")
    print(f"   • Total measurements: {total_measurements}")
    print(f"   • Sensor types: {', '.join([s.variablename for s in sensors.items[:3]])}{'...' if len(sensors.items) > 3 else ''}")

    if total_measurements == 0:
        print("⚠️  Warning: No measurement data found. CKAN publishing will include sensor configuration only.")
    else:
        print("✅ Ready for CKAN publishing with full dataset!")

except Exception as e:
    print(f"❌ Error checking data availability: {e}")
    raise

## 3. CKAN Portal Exploration

Before publishing, let's explore the CKAN portal to understand its structure and existing datasets.
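
CKAN portals expose an HTTP Action API, and `package_search` is one of its documented actions. The sketch below only builds the request URL and sends nothing. Whether `ckan.list_datasets` uses this endpoint internally is not confirmed by this notebook, and `package_search_url` is a hypothetical helper.

```python
from typing import Optional, Sequence
from urllib.parse import parse_qs, urlencode, urlsplit


def package_search_url(
    base_url: str,
    tags: Sequence[str] = (),
    organization: Optional[str] = None,
    rows: int = 10,
) -> str:
    """Build a CKAN Action API package_search URL (no request is sent)."""
    # A leading "+" marks each clause as required in Solr query syntax.
    filters = [f'+tags:"{tag}"' for tag in tags]
    if organization:
        filters.append(f'+organization:"{organization}"')
    params = {"rows": rows}
    if filters:
        params["fq"] = " ".join(filters)
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


url = package_search_url("https://ckan.tacc.utexas.edu/", tags=["upstream"], rows=5)
print(url)
# Decode the query string again to show the filter that would be sent.
print(parse_qs(urlsplit(url).query)["fq"][0])
```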

In [None]:
# Initialize standalone CKAN client for exploration
if client.ckan:
    ckan = client.ckan
else:
    # Create standalone CKAN client for exploration
    ckan = CKANIntegration(ckan_url=CKAN_URL, config=ckan_config)

print(f"🌐 Exploring CKAN portal: {CKAN_URL}")

In [None]:
# List existing organizations
print("🏢 Available CKAN organizations:")
try:
    organizations = ckan.list_organizations()

    if organizations:
        print(f"Found {len(organizations)} organizations:")
        for org in organizations[:5]:  # Show first 5
            print(f"  • {org['name']}: {org['title']}")
            print(f"    Description: {(org.get('description') or 'No description')[:60]}...")
            print(f"    Packages: {org.get('package_count', 0)}")
            print()

        # Check if our target organization exists
        org_names = [org['name'] for org in organizations]
        if CKAN_ORGANIZATION in org_names:
            print(f"✅ Target organization '{CKAN_ORGANIZATION}' found!")
        else:
            print(f"⚠️  Target organization '{CKAN_ORGANIZATION}' not found.")
            print("   Publishing will use test dataset mode.")
    else:
        print("No organizations found or access restricted.")

except Exception as e:
    print(f"⚠️  Could not list organizations: {e}")
    print("Continuing with dataset publishing...")

In [None]:
# Search for existing Upstream datasets
print("🔍 Searching for existing Upstream datasets in CKAN:")
try:
    upstream_datasets = ckan.list_datasets(
        tags=["upstream", "environmental"],
        limit=10
    )

    if upstream_datasets:
        print(f"Found {len(upstream_datasets)} Upstream-related datasets:")
        for dataset in upstream_datasets[:3]:  # Show first 3
            print(f"  • {dataset['name']}: {dataset['title']}")
            print(f"    Notes: {(dataset.get('notes') or 'No description')[:80]}...")
            print(f"    Resources: {len(dataset.get('resources', []))}")
            print(f"    Tags: {', '.join([tag['name'] for tag in dataset.get('tags', [])])}")
            print()
    else:
        print("No existing Upstream datasets found.")
        print("This will be the first Upstream dataset in this portal!")

except Exception as e:
    print(f"⚠️  Could not search datasets: {e}")
    print("Proceeding with dataset creation...")

## 4. Data Export and Preparation

Before publishing to CKAN, let's export the campaign data and examine its structure.
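
To examine the structure of an export, the in-memory CSV can be parsed with the standard `csv` module. This is a sketch that assumes the export is a `BytesIO`-like object holding UTF-8 CSV text, which the notebook suggests but does not guarantee. The helper `summarize_csv` and the sample columns are made up for illustration.

```python
import csv
import io
from io import BytesIO


def summarize_csv(buffer: BytesIO, encoding: str = "utf-8") -> dict:
    """Return the header, data-row count, and byte size of an in-memory CSV."""
    raw = buffer.getvalue()  # does not move the buffer's read position
    reader = csv.reader(io.StringIO(raw.decode(encoding), newline=""))
    header = next(reader, [])
    rows = sum(1 for _ in reader)
    return {"columns": header, "rows": rows, "bytes": len(raw)}


# Hypothetical sample; a real export would come from client.stations.export_station_sensors().
sample = BytesIO(b"alias,variablename,units\ntemp_1,Air Temperature,degC\nrh_1,Relative Humidity,%\n")
summary = summarize_csv(sample)
print(summary["columns"], summary["rows"])
```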

In [None]:
# Get detailed campaign information
print(f"📊 Retrieving detailed campaign information...")
try:
    campaign_details = client.get_campaign(str(campaign_id))

    print(f"✅ Campaign Details Retrieved:")
    print(f"   Name: {campaign_details.name}")
    print(f"   Description: {campaign_details.description}")
    print(f"   Contact: {campaign_details.contact_name} ({campaign_details.contact_email})")
    print(f"   Allocation: {campaign_details.allocation}")
    print(f"   Start Date: {campaign_details.start_date}")
    print(f"   End Date: {campaign_details.end_date}")

    # Check campaign summary if available
    if hasattr(campaign_details, 'summary') and campaign_details.summary:
        summary = campaign_details.summary
        print(f"\n📈 Campaign Summary:")
        if hasattr(summary, 'total_stations'):
            print(f"   • Total Stations: {summary.total_stations}")
        if hasattr(summary, 'total_sensors'):
            print(f"   • Total Sensors: {summary.total_sensors}")
        if hasattr(summary, 'total_measurements'):
            print(f"   • Total Measurements: {summary.total_measurements}")
        if hasattr(summary, 'sensor_types'):
            print(f"   • Sensor Types: {', '.join(summary.sensor_types)}")

except Exception as e:
    print(f"❌ Error retrieving campaign details: {e}")
    raise

In [None]:
# Export station data for CKAN publishing
print(f"📤 Exporting station data for CKAN publishing...")
try:
    # Export sensor configuration
    print("   Exporting sensor configuration...")
    station_sensors_data = client.stations.export_station_sensors(
        station_id=str(station_id),
        campaign_id=str(campaign_id)
    )

    # Export measurement data
    print("   Exporting measurement data...")
    station_measurements_data = client.stations.export_station_measurements(
        station_id=str(station_id),
        campaign_id=str(campaign_id)
    )

    # Check exported data sizes
    sensors_size = len(station_sensors_data.getvalue()) if hasattr(station_sensors_data, 'getvalue') else 0
    measurements_size = len(station_measurements_data.getvalue()) if hasattr(station_measurements_data, 'getvalue') else 0

    print(f"✅ Data export completed:")
    print(f"   • Sensors data: {sensors_size:,} bytes")
    print(f"   • Measurements data: {measurements_size:,} bytes")
    print(f"   • Total data size: {(sensors_size + measurements_size):,} bytes")

    if sensors_size == 0:
        print("⚠️  Warning: Sensors data is empty")
    if measurements_size == 0:
        print("⚠️  Warning: Measurements data is empty")

    print("✅ Ready for CKAN publication!")

except Exception as e:
    print(f"❌ Error exporting station data: {e}")
    raise

## 5. CKAN Dataset Creation and Publishing

Now let's publish the campaign data to CKAN using the integrated publishing functionality.
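
CKAN stores free-form dataset metadata in `extras`, a list of `{"key": ..., "value": ...}` pairs. The sketch below converts a flat dictionary into that form and skips missing values. `to_ckan_extras` is a hypothetical helper, and the sample fields are illustrative only.

```python
from typing import Any, Dict, List, Mapping


def to_ckan_extras(fields: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Convert a flat mapping to CKAN's key/value 'extras' list, skipping None values."""
    return [
        {"key": str(key), "value": str(value)}
        for key, value in fields.items()
        if value is not None
    ]


extras = to_ckan_extras({"campaign_id": 12, "source": "Upstream Platform", "allocation": None})
print(extras)
```

Converting values to strings up front and dropping `None` keeps optional fields, such as a missing allocation, out of the published metadata.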

In [None]:
# Prepare dataset metadata (shown for illustration; the publish_to_ckan call in the next cell is made without it)
dataset_name = f"upstream-campaign-{campaign_id}"
print(f"🏷️  Preparing dataset metadata for: {dataset_name}")

# Create comprehensive metadata
dataset_metadata = {
    "name": dataset_name,
    "title": campaign_details.name,
    "notes": f"""{campaign_details.description}

This dataset contains environmental sensor data collected through the Upstream platform.

**Campaign Information:**
- Campaign ID: {campaign_id}
- Contact: {campaign_details.contact_name} ({campaign_details.contact_email})
- Allocation: {campaign_details.allocation}
- Duration: {campaign_details.start_date} to {campaign_details.end_date}

**Data Structure:**
- Sensors Configuration: Contains sensor metadata, units, and processing information
- Measurement Data: Time-series environmental measurements with geographic coordinates

**Access and Usage:**
Data is provided in CSV format for easy analysis and integration with various tools.""",
    "tags": ["environmental", "sensors", "upstream", "monitoring", "time-series"],
    "extras": [
        {"key": "campaign_id", "value": str(campaign_id)},
        {"key": "station_id", "value": str(station_id)},
        {"key": "source", "value": "Upstream Platform"},
        {"key": "data_type", "value": "environmental_sensor_data"},
        {"key": "contact_email", "value": campaign_details.contact_email},
        {"key": "allocation", "value": campaign_details.allocation},
        {"key": "export_date", "value": datetime.now().isoformat()}
    ],
    "license_id": "cc-by",  # Creative Commons Attribution
}

print(f"📋 Dataset Metadata Prepared:")
print(f"   • Name: {dataset_metadata['name']}")
print(f"   • Title: {dataset_metadata['title']}")
print(f"   • Tags: {', '.join(dataset_metadata['tags'])}")
print(f"   • License: {dataset_metadata['license_id']}")
print(f"   • Extra fields: {len(dataset_metadata['extras'])}")

In [None]:
# Publish campaign data to CKAN using integrated method
print(f"📤 Publishing campaign data to CKAN...")
station_name = client.stations.get(station_id=station_id, campaign_id=campaign_id).name

try:
    # Use the integrated CKAN publishing method
    publication_result = client.publish_to_ckan(
        campaign_id=str(campaign_id),
        station_id=str(station_id),
    )

    print(f"✅ CKAN Publication Successful!")
    print(f"\n📊 Publication Summary:")
    print(f"   • Success: {publication_result['success']}")
    print(f"   • Dataset Name: {publication_result['dataset']['name']}")
    print(f"   • Dataset ID: {publication_result['dataset']['id']}")
    print(f"   • Resources Created: {len(publication_result['resources'])}")
    print(f"   • CKAN URL: {publication_result['ckan_url']}")
    print(f"   • Message: {publication_result['message']}")

    # Store results for further operations
    published_dataset = publication_result['dataset']
    published_resources = publication_result['resources']
    ckan_dataset_url = publication_result['ckan_url']

    print(f"\n🎉 Your data is now publicly available at:")
    print(f"   {ckan_dataset_url}")

except Exception as e:
    print(f"❌ CKAN publication failed: {e}")
    print("\nTroubleshooting tips:")
    print("   • Check CKAN API credentials")
    print("   • Verify organization permissions")
    print("   • Ensure CKAN portal is accessible")
    print("   • Check dataset name uniqueness")
    raise

## 6. Dataset Verification and Exploration

Let's verify the published dataset and explore its contents in CKAN.
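
Verification can go beyond printing fields by comparing the retrieved dataset with what was expected. The following sketch works on the plain dictionary that CKAN returns for a dataset. `find_dataset_problems` is a hypothetical helper, and the sample dictionary is fabricated for the example.

```python
from typing import Iterable, List, Mapping


def find_dataset_problems(
    dataset: Mapping, required_tags: Iterable[str], min_resources: int = 1
) -> List[str]:
    """Return human-readable problems found in a CKAN dataset dictionary."""
    problems = []
    if dataset.get("state") != "active":
        problems.append(f"state is {dataset.get('state')!r}, expected 'active'")
    present = {tag["name"] for tag in dataset.get("tags", [])}
    missing = sorted(set(required_tags) - present)
    if missing:
        problems.append(f"missing tags: {', '.join(missing)}")
    count = len(dataset.get("resources", []))
    if count < min_resources:
        problems.append(f"only {count} resource(s), expected at least {min_resources}")
    return problems


# Fabricated sample; a real dictionary would come from ckan.get_dataset(...).
sample_dataset = {"state": "active", "tags": [{"name": "upstream"}], "resources": [{}]}
print(find_dataset_problems(sample_dataset, ["upstream", "environmental"], min_resources=2))
```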

In [None]:
# Verify the published dataset
print(f"🔍 Verifying published dataset in CKAN...")

try:
    # Retrieve the dataset from CKAN to verify it was created correctly
    verified_dataset = ckan.get_dataset(published_dataset['name'])

    print(f"✅ Dataset verification successful!")
    print(f"\n📋 Dataset Information:")
    print(f"   • Name: {verified_dataset['name']}")
    print(f"   • Title: {verified_dataset['title']}")
    print(f"   • State: {verified_dataset['state']}")
    print(f"   • Private: {verified_dataset.get('private', 'Unknown')}")
    print(f"   • License: {verified_dataset.get('license_title', 'Not specified')}")
    print(f"   • Created: {verified_dataset.get('metadata_created', 'Unknown')}")
    print(f"   • Modified: {verified_dataset.get('metadata_modified', 'Unknown')}")

    # Show organization info if available
    if verified_dataset.get('organization'):
        org = verified_dataset['organization']
        print(f"   • Organization: {org.get('title', org.get('name', 'Unknown'))}")

    # Show tags
    if verified_dataset.get('tags'):
        tags = [tag['name'] for tag in verified_dataset['tags']]
        print(f"   • Tags: {', '.join(tags)}")

    # Show extras
    if verified_dataset.get('extras'):
        print(f"   • Extra metadata fields: {len(verified_dataset['extras'])}")
        for extra in verified_dataset['extras'][:3]:  # Show first 3
            print(f"     - {extra['key']}: {extra['value']}")

except Exception as e:
    print(f"❌ Dataset verification failed: {e}")

In [None]:
# Examine the published resources
print(f"📁 Examining published resources...")

try:
    resources = verified_dataset.get('resources', [])

    if resources:
        print(f"Found {len(resources)} resources:")

        for i, resource in enumerate(resources, 1):
            print(f"\n   📄 Resource {i}: {resource['name']}")
            print(f"      • ID: {resource['id']}")
            print(f"      • Format: {resource.get('format', 'Unknown')}")
            print(f"      • Size: {resource.get('size', 'Unknown')} bytes")
            print(f"      • Description: {resource.get('description', 'No description')}")
            print(f"      • Created: {resource.get('created', 'Unknown')}")
            print(f"      • URL: {resource.get('url', 'Not available')}")

            # Show download information
            if resource.get('url'):
                download_url = resource['url']
                if not download_url.startswith('http'):
                    download_url = f"{CKAN_URL}{download_url}"
                print(f"      • Download: {download_url}")

        print(f"\n✅ All resources published successfully!")

    else:
        print("⚠️  No resources found in the dataset")

except Exception as e:
    print(f"❌ Error examining resources: {e}")

## 7. Dataset Management Operations

Let's demonstrate additional CKAN management operations like updating datasets and managing resources.
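
When an update cell is run more than once, appending tags to the current list can produce duplicates. One way to keep the operation repeatable is to merge the lists while preserving order. `merge_tags` is a hypothetical helper, sketched here with plain strings.

```python
from typing import Iterable, List


def merge_tags(existing: Iterable[str], additions: Iterable[str]) -> List[str]:
    """Combine two tag lists, keeping first-seen order and dropping duplicates."""
    # dict.fromkeys preserves insertion order and keeps only the first occurrence.
    return list(dict.fromkeys([*existing, *additions]))


first_run = merge_tags(["environmental", "upstream"], ["demo", "notebook-generated"])
second_run = merge_tags(first_run, ["demo", "notebook-generated"])
print(second_run)
```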

In [None]:
# Update dataset with additional metadata
print(f"🔄 Demonstrating dataset update operations...")

try:
    # Add update timestamp and additional tags
    current_tags = [tag['name'] for tag in verified_dataset.get('tags', [])]
    updated_tags = current_tags + ["demo", "notebook-generated"]

    # Update the dataset
    updated_dataset = ckan.update_dataset(
        dataset_id=published_dataset['name'],
        tags=updated_tags,
        notes=f"{verified_dataset.get('notes', '')}\n\n**Last Updated:** {datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC')} (via Upstream SDK Demo)"
    )

    print(f"✅ Dataset updated successfully!")
    print(f"   • New tags added: demo, notebook-generated")
    print(f"   • Description updated with timestamp")
    print(f"   • Total tags: {len(updated_dataset.get('tags', []))}")

except Exception as e:
    print(f"⚠️  Dataset update failed: {e}")
    print("This may be due to insufficient permissions or CKAN configuration.")

In [None]:
# Demonstrate resource management
print(f"📎 Demonstrating resource management...")

try:
    # Create a metadata resource with campaign summary
    metadata_content = {
        "campaign_info": {
            "id": str(campaign_id),
            "name": campaign_details.name,
            "description": campaign_details.description,
            "contact": {
                "name": campaign_details.contact_name,
                "email": campaign_details.contact_email
            },
            "allocation": campaign_details.allocation,
            "dates": {
                "start": str(campaign_details.start_date),
                "end": str(campaign_details.end_date)
            }
        },
        "station_info": {
            "id": str(station_id),
            "name": selected_station.name,
            "description": selected_station.description
        },
        "export_info": {
            "timestamp": datetime.now().isoformat(),
            "sdk_version": "1.0.0",
            "format_version": "1.0"
        }
    }

    # Create a JSON metadata file
    metadata_json = json.dumps(metadata_content, indent=2)
    metadata_file = BytesIO(metadata_json.encode('utf-8'))
    metadata_file.name = "campaign_metadata.json"

    # Add as a resource
    metadata_resource = ckan.create_resource(
        dataset_id=published_dataset['id'],
        name="Campaign Metadata",
        file_obj=metadata_file,
        format="JSON",
        description="Comprehensive metadata about the campaign, station, and export process",
        resource_type="metadata"
    )

    print(f"✅ Metadata resource created successfully!")
    print(f"   • Resource ID: {metadata_resource['id']}")
    print(f"   • Name: {metadata_resource['name']}")
    print(f"   • Format: {metadata_resource['format']}")
    print(f"   • Size: {len(metadata_json.encode('utf-8'))} bytes")

except Exception as e:
    print(f"⚠️  Resource creation failed: {e}")
    print("This may be due to insufficient permissions or CKAN configuration.")

## 8. Data Discovery and Search

Let's demonstrate how published data can be discovered and searched in CKAN.

In [None]:
# Search for datasets using various criteria
print(f"🔍 Demonstrating CKAN data discovery capabilities...")

# Search by tags
print(f"\n1. 📌 Search by tags ('environmental', 'upstream'):")
try:
    tag_results = ckan.list_datasets(
        tags=["environmental", "upstream"],
        limit=5
    )

    if tag_results:
        print(f"   Found {len(tag_results)} datasets with environmental/upstream tags:")
        for dataset in tag_results:
            print(f"   • {dataset['name']}: {dataset['title']}")
            tags = [tag['name'] for tag in dataset.get('tags', [])]
            print(f"     Tags: {', '.join(tags)}")
    else:
        print("   No datasets found with these tags")

except Exception as e:
    print(f"   ❌ Tag search failed: {e}")

In [None]:
# Search by organization (if configured)
if CKAN_ORGANIZATION:
    print(f"\n2. 🏢 Search by organization ('{CKAN_ORGANIZATION}'):")
    try:
        org_results = ckan.list_datasets(
            organization=CKAN_ORGANIZATION,
            limit=5
        )

        if org_results:
            print(f"   Found {len(org_results)} datasets in organization:")
            for dataset in org_results:
                print(f"   • {dataset['name']}: {dataset['title']}")
                if dataset.get('organization'):
                    org = dataset['organization']
                    print(f"     Organization: {org.get('title', org.get('name'))}")
        else:
            print(f"   No datasets found in organization '{CKAN_ORGANIZATION}'")

    except Exception as e:
        print(f"   ❌ Organization search failed: {e}")
else:
    print(f"\n2. 🏢 Organization search skipped (no organization configured)")

In [None]:
# General dataset search
print(f"\n3. 📊 General dataset search:")
try:
    general_results = ckan.list_datasets(limit=10)

    if general_results:
        print(f"   Retrieved {len(general_results)} datasets (showing up to 5):")
        for i, dataset in enumerate(general_results[:5], 1):
            print(f"   {i}. {dataset['name']}")
            print(f"      Title: {dataset['title']}")
            print(f"      Resources: {len(dataset.get('resources', []))}")
            if dataset.get('organization'):
                org = dataset['organization']
                print(f"      Organization: {org.get('title', org.get('name'))}")
            print()

        if len(general_results) > 5:
            print(f"   ... and {len(general_results) - 5} more datasets")
    else:
        print("   No datasets found")

except Exception as e:
    print(f"   ❌ General search failed: {e}")

## 9. Best Practices and Advanced Features

Let's explore best practices for CKAN integration and advanced features.
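
CKAN restricts dataset names to lowercase ASCII letters, digits, `-`, and `_`, with a length between 2 and 100 characters. A naming convention can be enforced with a small helper. This is a sketch; `ckan_dataset_name` is a hypothetical function rather than an SDK call.

```python
import re


def ckan_dataset_name(*parts: object, max_length: int = 100) -> str:
    """Join parts into a URL-friendly CKAN dataset name."""
    raw = "-".join(str(part) for part in parts).lower()
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", raw)
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    slug = slug[:max_length].rstrip("-")
    if len(slug) < 2:
        raise ValueError("CKAN dataset names need at least 2 characters")
    return slug


print(ckan_dataset_name("Upstream Campaign", 42, "v2"))
```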

In [None]:
# Summarize best practices for CKAN integration
print(f"💡 CKAN Integration Best Practices:")

print(f"\n1. 📋 Dataset Naming Conventions:")
print(f"   • Use consistent prefixes (e.g., 'upstream-campaign-{campaign_id}')")
print(f"   • Include version information for updated datasets")
print(f"   • Use lowercase and hyphens for URL-friendly names")
print(f"   • Example: upstream-campaign-{campaign_id}-v2")

print(f"\n2. 🏷️  Metadata Best Practices:")
print(f"   • Use comprehensive descriptions with context")
print(f"   • Include contact information and data lineage")
print(f"   • Add standardized tags for discoverability")
print(f"   • Use extras for machine-readable metadata")
print(f"   • Specify appropriate licenses")

print(f"\n3. 📁 Resource Organization:")
print(f"   • Separate data files by type (sensors, measurements, metadata)")
print(f"   • Use descriptive resource names and descriptions")
print(f"   • Include format specifications (CSV headers, units)")
print(f"   • Provide data dictionaries for complex datasets")

print(f"\n4. 🔄 Update Management:")
print(f"   • Version datasets when structure changes")
print(f"   • Update modification timestamps")
print(f"   • Maintain backward compatibility when possible")
print(f"   • Document changes in dataset descriptions")

In [None]:
# Performance and monitoring considerations
print(f"\n⚡ Performance and Monitoring:")

# Check dataset and resource sizes
total_resources = len(verified_dataset.get('resources', []))
total_size = sum(int(r.get('size', 0)) for r in verified_dataset.get('resources', []) if r.get('size'))

print(f"\n📊 Current Dataset Metrics:")
print(f"   • Total Resources: {total_resources}")
print(f"   • Total Size: {total_size:,} bytes ({total_size/1024/1024:.2f} MB)")
print(f"   • Average Resource Size: {(total_size/total_resources)/1024:.1f} KB" if total_resources > 0 else "   • No resources with size information")

print(f"\n💡 Optimization Recommendations:")
if total_size > 50 * 1024 * 1024:  # 50 MB
    print(f"   ⚠️  Large dataset detected ({total_size/1024/1024:.1f} MB)")
    print(f"   • Consider data compression")
    print(f"   • Split into smaller time-based chunks")
    print(f"   • Use streaming for large file processing")
else:
    print(f"   ✅ Dataset size is reasonable ({total_size/1024/1024:.1f} MB)")

if total_resources > 10:
    print(f"   ⚠️  Many resources ({total_resources})")
    print(f"   • Consider consolidating related resources")
    print(f"   • Use clear naming conventions")
else:
    print(f"   ✅ Resource count is manageable ({total_resources})")

print(f"\n🔍 Monitoring Recommendations:")
print(f"   • Monitor dataset access patterns")
print(f"   • Track resource download statistics")
print(f"   • Set up automated data freshness checks")
print(f"   • Implement data quality validation pipelines")

## 10. Integration Workflows

Let's demonstrate automated workflows for continuous data publishing.
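
Unattended publishing jobs usually need to tolerate transient portal or network failures. One common approach is to retry with exponential backoff. The sketch below is generic and not part of the SDK: `with_retries` and `flaky_publish` are hypothetical, and in practice the wrapped callable might be a lambda around `client.publish_to_ckan(...)`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on any exception with a doubling delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for a publishing call that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky_publish() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("portal unavailable")
    return "published"


# delays.append replaces time.sleep so the example runs instantly.
result = with_retries(flaky_publish, attempts=3, sleep=delays.append)
print(result, delays)
```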

In [None]:
# Demonstrate automated publishing workflow
print(f"🔄 Automated CKAN Publishing Workflow:")

def automated_campaign_publisher(client, campaign_id, station_id=None, update_existing=True):
    """
    Automated workflow for publishing campaign data to CKAN.

    This function demonstrates a complete workflow that could be
    automated for regular data publishing.
    """
    workflow_steps = []

    try:
        # Step 1: Validate campaign
        workflow_steps.append("Validating campaign data...")
        print(f"   1️⃣  Validating campaign {campaign_id}...")
        campaign = client.get_campaign(str(campaign_id))

        # Step 2: Get stations
        workflow_steps.append("Retrieving station information...")
        print(f"   2️⃣  Retrieving stations...")
        stations = client.list_stations(campaign_id=str(campaign_id))

        if not stations.items:
            raise Exception("No stations found in campaign")

        target_station = stations.items[0] if not station_id else next(
            (s for s in stations.items if s.id == station_id), None
        )

        if not target_station:
            raise Exception(f"Station {station_id} not found")

        # Step 3: Check for existing dataset
        workflow_steps.append("Checking for existing CKAN dataset...")
        print(f"   3️⃣  Checking existing datasets...")
        dataset_name = f"upstream-campaign-{campaign_id}"

        dataset_exists = False
        try:
            existing_dataset = client.ckan.get_dataset(dataset_name)
            dataset_exists = True
            print(f"       Found existing dataset: {dataset_name}")
        except Exception:
            print(f"       No existing dataset found")

        # Step 4: Publish or update
        if dataset_exists and update_existing:
            workflow_steps.append("Updating existing dataset...")
            print(f"   4️⃣  Updating existing dataset...")
        else:
            workflow_steps.append("Creating new dataset...")
            print(f"   4️⃣  Creating new dataset...")

        # Step 5: Publish data
        workflow_steps.append("Publishing data to CKAN...")
        print(f"   5️⃣  Publishing campaign data...")
        result = client.publish_to_ckan(
            campaign_id=str(campaign_id),
            station_id=str(target_station.id)
        )

        # Step 6: Validation
        workflow_steps.append("Validating published dataset...")
        print(f"   6️⃣  Validating publication...")

        return {
            "success": True,
            "dataset_name": dataset_name,
            "ckan_url": result['ckan_url'],
            "steps_completed": len(workflow_steps),
            "workflow_steps": workflow_steps
        }

    except Exception as e:
        # Each step is recorded before it runs, so the last recorded step is the one that failed
        return {
            "success": False,
            "error": str(e),
            "steps_completed": max(len(workflow_steps) - 1, 0),
            "workflow_steps": workflow_steps,
            "failed_at_step": len(workflow_steps)
        }

# Run the workflow demonstration
print(f"\n🚀 Running automated workflow for campaign {campaign_id}...")
workflow_result = automated_campaign_publisher(
    client=client,
    campaign_id=campaign_id,
    station_id=station_id,
    update_existing=True
)

print(f"\n📋 Workflow Results:")
print(f"   • Success: {workflow_result['success']}")
print(f"   • Steps Completed: {workflow_result['steps_completed']}")

if workflow_result['success']:
    print(f"   • Dataset: {workflow_result['dataset_name']}")
    print(f"   • URL: {workflow_result['ckan_url']}")
    print(f"   ✅ Automated publishing workflow completed successfully!")
else:
    print(f"   • Error: {workflow_result['error']}")
    print(f"   • Failed at step: {workflow_result['failed_at_step']}")
    print(f"   ❌ Workflow failed - see error details above")
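
Step 6 of the workflow above only records that validation is taking place. One way to make that step concrete is to inspect the response of CKAN's `package_show` action for the published dataset. The following is a minimal sketch under stated assumptions: the helper name `check_package_show` and the sample payload are hypothetical, and only the `success`, `result`, `name`, `state`, and `resources` fields come from the CKAN action API. It works on already-decoded JSON, so it runs without network access.

```python
from typing import Any, Dict, List


def check_package_show(response: Dict[str, Any],
                       expected_name: str,
                       min_resources: int = 1) -> List[str]:
    """Return the problems found in a decoded CKAN `package_show` response.

    Hypothetical helper for illustration; an empty list means all checks passed.
    """
    problems: List[str] = []

    # Every CKAN action API response carries a top-level "success" flag.
    if not response.get("success"):
        problems.append("CKAN reported success=false")
        return problems

    dataset = response.get("result") or {}
    if dataset.get("name") != expected_name:
        problems.append(f"unexpected dataset name: {dataset.get('name')!r}")
    if dataset.get("state") != "active":
        problems.append(f"dataset state is {dataset.get('state')!r}, not 'active'")

    resources = dataset.get("resources") or []
    if len(resources) < min_resources:
        problems.append(
            f"expected at least {min_resources} resource(s), found {len(resources)}"
        )
    return problems


# Sample payload shaped like a `package_show` response (all values are made up).
sample = {
    "success": True,
    "result": {
        "name": "demo-campaign-dataset",
        "state": "active",
        "resources": [{"name": "sensors.csv"}, {"name": "measurements.csv"}],
    },
}
print(check_package_show(sample, "demo-campaign-dataset", min_resources=2))
```

In a real run, the response would come from a `GET` request to `<ckan_url>/api/3/action/package_show?id=<dataset_name>`, and a non-empty problem list could be raised as an exception so that the workflow's `except` branch reports the failure.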

## 11. Cleanup and Resource Management

This section reviews the options for managing the published dataset, then closes open file handles and logs out of Upstream.
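
Of the management options listed in the next cell, "make dataset private" maps directly onto the CKAN action API: the `package_patch` action updates only the fields it is given, and `private` is a standard dataset field. The sketch below builds such a request without sending it. It talks to the CKAN API directly rather than through the Upstream SDK, and the helper name, portal URL, dataset name, and token are all hypothetical placeholders.

```python
import json
import urllib.request


def build_make_private_request(ckan_url: str,
                               dataset_id: str,
                               api_token: str) -> urllib.request.Request:
    """Build (but do not send) a CKAN `package_patch` request setting private=True.

    Hypothetical helper for illustration only.
    """
    payload = json.dumps({"id": dataset_id, "private": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ckan_url.rstrip('/')}/api/3/action/package_patch",
        data=payload,
        headers={
            # CKAN reads the API key or token from the Authorization header.
            "Authorization": api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own portal URL, dataset name, and token.
request = build_make_private_request(
    "https://ckan.example.org", "demo-campaign-dataset", "PLACEHOLDER-TOKEN"
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
# Sending it would be: urllib.request.urlopen(request)  (deliberately not done here)
```

Building the request separately from sending it makes it easy to inspect what would be changed before touching a production portal.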

In [None]:
# Dataset management options
print(f"🧹 Dataset Management and Cleanup Options:")

print(f"\n📊 Current Dataset Status:")
print(f"   • Dataset Name: {published_dataset['name']}")
print(f"   • Dataset ID: {published_dataset['id']}")
print(f"   • CKAN URL: {ckan_dataset_url}")
print(f"   • Resources: {len(published_resources)}")

print(f"\n🔧 Management Options:")
print(f"   1. Keep dataset active (recommended for production)")
print(f"   2. Make dataset private (hide from public)")
print(f"   3. Archive dataset (mark as deprecated)")
print(f"   4. Delete dataset (only for test data)")

# For demo purposes, we'll show how to manage the dataset
print(f"\n💡 For this demo, we'll keep the dataset active.")
print(f"   Your published data will remain available at:")
print(f"   {ckan_dataset_url}")

# To delete the demo dataset, remove the triple quotes that wrap the block below
"""
# CAUTION: Enable this block only to clean up test datasets
print(f"\n⚠️  Demo dataset cleanup:")
try:
    # Delete the demo dataset (only for demo purposes)
    deletion_result = ckan.delete_dataset(published_dataset['name'])
    if deletion_result:
        print(f"   ✅ Demo dataset deleted successfully")
    else:
        print(f"   ❌ Dataset deletion failed")
except Exception as e:
    print(f"   ⚠️  Could not delete dataset: {e}")
    print(f"   This may be due to insufficient permissions or CKAN configuration.")
"""

print(f"\n🔄 Resource Cleanup:")
try:
    # Close any open file handles
    if 'station_sensors_data' in locals():
        station_sensors_data.close()
    if 'station_measurements_data' in locals():
        station_measurements_data.close()
    if 'metadata_file' in locals():
        metadata_file.close()

    print(f"   ✅ File handles closed")
except Exception as e:
    print(f"   ⚠️  Error closing file handles: {e}")
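
The `locals()` checks above work in a notebook, but they depend on remembering every handle by name. A minimal sketch of an alternative is to register each buffer with `contextlib.ExitStack`, which closes everything when the block exits, even if an upload raises. The variable names mirror those used in this notebook, and the buffer contents are made-up placeholders.

```python
from contextlib import ExitStack
from io import BytesIO

with ExitStack() as stack:
    # enter_context registers each buffer so that it is closed when the block
    # exits, whether normally or because of an exception.
    station_sensors_data = stack.enter_context(BytesIO(b"placeholder sensor rows\n"))
    station_measurements_data = stack.enter_context(
        BytesIO(b"placeholder measurement rows\n")
    )

    # ... upload the buffers to CKAN here ...
    first_line = station_sensors_data.readline()

# Both buffers are closed once the with-block has finished.
print(station_sensors_data.closed, station_measurements_data.closed)
```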

In [None]:
# Logout and final cleanup
print(f"👋 Session cleanup and logout...")

try:
    # Logout from Upstream
    client.logout()
    print(f"   ✅ Logged out from Upstream successfully")
except Exception as e:
    print(f"   ❌ Logout error: {e}")

print(f"\n🎉 CKAN Integration Demo Completed Successfully!")

print(f"\n📚 Summary of What We Accomplished:")
print(f"   ✅ Connected to both Upstream and CKAN platforms")
print(f"   ✅ Selected and validated campaign data")
print(f"   ✅ Exported sensor and measurement data")
print(f"   ✅ Created comprehensive CKAN dataset with metadata")
print(f"   ✅ Published resources (sensors, measurements, metadata)")
print(f"   ✅ Demonstrated dataset management operations")
print(f"   ✅ Explored data discovery and search capabilities")
print(f"   ✅ Showed automated publishing workflows")

print(f"\n🌐 Your Data is Now Publicly Available:")
print(f"   📊 Dataset: {published_dataset['name']}")
print(f"   🔗 URL: {ckan_dataset_url}")
print(f"   📁 Resources: {len(published_resources)} files available for download")

print(f"\n📖 Next Steps:")
print(f"   • Explore your published data in the CKAN web interface")
print(f"   • Set up automated publishing workflows for production")
print(f"   • Configure organization permissions and access controls")
print(f"   • Integrate CKAN APIs with other data analysis tools")
print(f"   • Monitor dataset usage and access patterns")

## Summary

This notebook demonstrated the comprehensive CKAN integration capabilities of the Upstream SDK:

✅ **Authentication & Setup** - Configured both Upstream and CKAN credentials  
✅ **Data Export** - Retrieved campaign data and prepared it for publishing  
✅ **Dataset Creation** - Created CKAN datasets with rich metadata  
✅ **Resource Management** - Published multiple data resources (sensors, measurements, metadata)  
✅ **Portal Exploration** - Discovered existing datasets and organizations  
✅ **Update Operations** - Demonstrated dataset and resource updates  
✅ **Search & Discovery** - Showed data findability through tags and organization  
✅ **Automation Workflows** - Built reusable publishing processes  
✅ **Best Practices** - Covered naming, metadata, and performance considerations  

## Key Features

- **Seamless Integration**: Direct connection between Upstream campaigns and CKAN datasets
- **Rich Metadata**: Automatic generation of comprehensive dataset descriptions and tags
- **Multi-Resource Support**: Separate resources for sensors, measurements, and metadata
- **Update Management**: Smart handling of dataset updates and versioning
- **Error Handling**: Robust error handling and validation throughout the process
- **Automation Ready**: Workflow patterns suitable for production automation

## Production Considerations

- **Authentication**: Use environment variables or configuration files for credentials
- **Monitoring**: Implement logging and monitoring for automated publishing workflows
- **Permissions**: Configure appropriate CKAN organization permissions and access controls
- **Validation**: Add comprehensive data validation before publishing
- **Backup**: Maintain backup copies of datasets before updates
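
As an illustration of the authentication point above, the following is a minimal sketch of loading credentials from environment variables and failing fast when any are missing. The variable names (`UPSTREAM_USERNAME`, `UPSTREAM_PASSWORD`, `CKAN_URL`, `CKAN_API_KEY`) and the helper `load_credentials` are assumptions made for this example, not names defined by the Upstream SDK.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ("UPSTREAM_USERNAME", "UPSTREAM_PASSWORD", "CKAN_URL", "CKAN_API_KEY")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the required settings, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {
    "UPSTREAM_USERNAME": "demo-user",
    "UPSTREAM_PASSWORD": "not-a-real-password",
    "CKAN_URL": "https://ckan.example.org",
    "CKAN_API_KEY": "PLACEHOLDER-KEY",
}
credentials = load_credentials(demo_env)
print(sorted(credentials))
```

Passing the environment as a parameter keeps the function easy to test and avoids reading real secrets in examples. In production, calling `load_credentials()` with no arguments reads from `os.environ`.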

## Related Documentation

- [Upstream SDK Documentation](https://upstream-sdk.readthedocs.io/)
- [CKAN API Documentation](https://docs.ckan.org/en/latest/api/)
- [Environmental Data Publishing Best Practices](https://www.example.com/best-practices)

---

*This notebook demonstrates CKAN integration for the Upstream SDK. For core platform functionality, see UpstreamSDK_Core_Demo.ipynb.*