# Upstream Data Upload Guide

## Overview

This guide demonstrates how to authenticate with the Upstream API and upload sensor data using CSV files for environmental monitoring campaigns.

## What You Can Do

The Upstream API allows you to:
- Authenticate and obtain access tokens
- Upload sensor definitions and measurement data
- Manage environmental monitoring campaigns
- Query and retrieve measurement data

## Prerequisites

- Valid Upstream account credentials
- Python 3.7+ with the `requests` and `tapipy` libraries installed
- CSV files with sensor and measurement data formatted correctly

## Installation

```bash
pip install requests tapipy
```

## Quick Start

1. **Authenticate** with the API to get your access token
2. **Prepare your CSV files** following the required format
3. **Upload your data** using the provided functions
4. **Monitor the results** and verify successful upload

In [35]:
! pip install tapipy
import requests
import json
import getpass
import os
from tapipy.tapis import Tapis
from typing import Dict, Any, Optional, List



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## 1. Authentication

First, we need to authenticate with the Upstream API to obtain an access token.


In [36]:

credentials = {
    "username": input("Username: "),
    "password": getpass.getpass("Password: "),
}


def authenticate_upstream(base_url: str = "https://upstream-dso.tacc.utexas.edu/dev") -> str:
    """
    Authenticate with Upstream API and return access token.
    Args:
        base_url: Base URL for the Upstream API (dev or prod)
    Returns:
        Access token string
    Raises:
        Exception: If authentication fails
    """
    auth_url = f"{base_url}/api/v1/token"
    try:
        response = requests.post(auth_url, data=credentials)
        response.raise_for_status()    
        token = response.json().get("access_token")
        if not token:
            raise Exception("No access token in response")
        print("✅ Authentication successful!")
        return token
    except requests.exceptions.RequestException as e:
        raise Exception(f"Authentication failed: {e}")

# Get authentication token
token = authenticate_upstream()
# Create python Tapis client for user
t = Tapis(base_url= "https://portals.tapis.io",
          username=credentials['username'],
          password=credentials['password'])

# Call to Tokens API to get access token
t.get_tokens()
tapis_token = t.access_token

✅ Authentication successful!


In [None]:
def make_authenticated_request(
    method: str,
    url: str,
    token: str,
    json: Optional[Dict] = None,
    files: Optional[Dict] = None,
    params: Optional[Dict] = None
) -> requests.Response:
    """
    Make an authenticated HTTP request to the Upstream API.
    
    Args:
        method: HTTP method (GET, POST, PUT, DELETE, etc.)
        url: Full URL for the request
        token: Authentication token
        json: JSON data for the request body
        files: Files for multipart upload
        params: URL parameters
        
    Returns:
        Response object from the request
        
    Raises:
        requests.exceptions.HTTPError: If the request fails
    """
    headers = {
        "Authorization": f"Bearer {token}",
    }
    
    # Don't set Content-Type for file uploads (requests will set it automatically)
    if files is None:
        headers["Content-Type"] = "application/json"
    try:
        response = requests.request(
            method=method.upper(),
            url=url,
            headers=headers,
            json=json,
            files=files,
            params=params,
            timeout=300  # 5 minute timeout for large file uploads
        )
        
        # Raise an exception for bad status codes
        response.raise_for_status()
        return response
        
    except requests.exceptions.HTTPError as e:
        print(f"❌ HTTP Error: {e}")
        print(f"Response content: {response.text}")
        raise
    except requests.exceptions.RequestException as e:
        print(f"❌ Request Error: {e}")
        raise

## 2. Helper Functions for API Requests


In [None]:
def create_campaign(
    campaign_data: Dict[str, Any],
    token: str,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Create a new campaign.
    
    Args:
        campaign_data: Campaign fields as a dictionary (name, description,
            and, where applicable, a TACC allocation identifier)
        token: Authentication token
        base_url: Base URL for the API
        
    Returns:
        Dictionary containing the created campaign data with ID
    """
    url = f"{base_url}/api/v1/campaigns"
    response = make_authenticated_request(
        method="POST",
        url=url,
        token=token,
        json=campaign_data
    )
    result = response.json()
    print("✅ Campaign created successfully!")
    return result



### Creating Campaigns

Before uploading CSV data, you need to create a campaign to organize your data collection project. A campaign serves as the top-level container for all related monitoring activities.

#### Campaign Requirements

**Required Fields:**
- `name`: Descriptive name for your data collection project
- `description`: Detailed description of the campaign's purpose and scope

#### Campaign Best Practices

🎯 **Naming Conventions:**
- Use descriptive, unique names that clearly identify the project
- Include dates, locations, or project codes for easy identification
- Examples: "Austin Air Quality 2024", "Hurricane Harvey Recovery Monitoring"

📝 **Descriptions:**
- Provide detailed context about the campaign's objectives
- Include information about duration, scope, and expected outcomes
- Mention any relevant research or operational goals
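
As a rough illustration of the requirements above, the sketch below writes a minimal campaign configuration file and checks it for the two required fields. The optional `metadata` block with `project_lead` and `institution` mirrors the keys that the loader in this notebook prints. The field values and the helper names `write_campaign_config` and `missing_campaign_fields` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

REQUIRED_CAMPAIGN_FIELDS = ["name", "description"]


def write_campaign_config(path: Path, config: Dict[str, Any]) -> Path:
    """Write a campaign configuration dictionary to a JSON file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


def missing_campaign_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent from the configuration."""
    return [field for field in REQUIRED_CAMPAIGN_FIELDS if field not in config]


# Example values are placeholders, not real project data.
example_campaign = {
    "name": "Austin Air Quality 2024",
    "description": "Hypothetical campaign monitoring air quality across Austin during 2024.",
    "metadata": {"project_lead": "Jane Doe", "institution": "Example University"},
}

# A temporary directory stands in for the notebook's campaigns/ folder.
config_file = write_campaign_config(
    Path(tempfile.mkdtemp()) / "campaigns" / "campaign.json", example_campaign
)
loaded = json.loads(config_file.read_text(encoding="utf-8"))
print(missing_campaign_fields(loaded))
```

An empty list means both required fields are present. Running a check like this before calling the API can surface a malformed file without spending a request.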

In [None]:
def load_and_create_campaign(
    config_path: str = "campaigns/campaign.json",
    token: str = None,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Load campaign configuration from JSON and create the campaign.
    
    Args:
        config_path: Path to the campaign configuration JSON file
        token: Authentication token
        base_url: Base URL for the API
        
    Returns:
        Dictionary containing the created campaign data with ID
    """
    # Load configuration
    with open(config_path, 'r', encoding='utf-8') as config_file:
        campaign_json = json.load(config_file)

    # Validate required fields
    required_fields = ["name", "description"]
    for field in required_fields:
        if field not in campaign_json:
            raise ValueError(f"Missing required field '{field}' in campaign config")    
    # Display configuration summary
    print(f"📋 Campaign Configuration Summary:")
    print(f"  Name: {campaign_json['name']}")
    print(f"  Description: {campaign_json['description'][:100]}...")
    if "metadata" in campaign_json:
        metadata = campaign_json["metadata"]
        print(f"  Project Lead: {metadata.get('project_lead', 'N/A')}")
        print(f"  Institution: {metadata.get('institution', 'N/A')}")
    
    # Create the campaign
    campaign = create_campaign(
        campaign_data=campaign_json,
        token=token,
        base_url=base_url
    )
    return campaign

try:
    campaign = load_and_create_campaign(
        config_path="campaigns/campaign.json",
        token=token
    )    
    campaign_id = campaign['id']
except FileNotFoundError as e:
    print(f"❌ Configuration file error: {e}")
    print("💡 Please create a campaigns/campaign.json file with your campaign details")
except ValueError as e:
    print(f"❌ Configuration error: {e}")
except Exception as e:
    print(f"❌ Campaign creation failed: {e}")


=== Creating Campaign from Configuration ===
📋 Campaign Configuration Summary:
  Name: Beaumont Stream Gauge
  Description: Beaumont Stream Gauge Campaign...
✅ Campaign created successfully!
Campaign ID: 12

🎉 Campaign setup complete!
Campaign ID: 12


### Creating Stations

Once you have a campaign, you need to create stations within it. Stations represent specific monitoring locations where sensors collect data.

#### Station Requirements

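**Required Fields:**
- `campaign_id`: ID of the parent campaign (must exist)
- `name`: Unique name for the monitoring station
- `description`: Details about the station location and purpose
- `latitude`: Decimal degrees (e.g., 30.2672)
- `longitude`: Decimal degrees (e.g., -97.7431)

The `load_and_create_station` helper defined later in this notebook validates a different set of keys in the station JSON file: `name`, `projectid`, `description`, `contact_name`, `contact_email`, `active`, and `start_date`.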

#### Station Best Practices

📍 **Location Data:**
- Ensure coordinates are in decimal degrees format
- Use WGS84 coordinate system (standard GPS coordinates)
- Verify coordinates are accurate for your monitoring location
- Test coordinates in mapping software before creating stations
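
The coordinate checks above can be partly automated. The sketch below is a minimal range check for decimal-degree coordinates. The function name `validate_coordinates` is hypothetical and is not part of the Upstream API. It cannot confirm that a point is in the right place, only that the values are plausible.

```python
from typing import Tuple


def validate_coordinates(latitude: float, longitude: float) -> Tuple[float, float]:
    """Check that a latitude/longitude pair is valid decimal degrees."""
    lat = float(latitude)
    lon = float(longitude)
    # Comparisons with NaN are always False, so NaN values are rejected too.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"Latitude {lat} is outside the range -90 to 90")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"Longitude {lon} is outside the range -180 to 180")
    return lat, lon


# Example values from the field list above.
print(validate_coordinates(30.2672, -97.7431))
```

Swapping the two example values raises a `ValueError`, because -97.7431 is not a valid latitude.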

🏷️ **Station Naming:**
- Use descriptive names that indicate location or purpose
- Include geographic references or landmarks
- Examples: "River Bridge Station", "Industrial District Monitor"

📝 **Station Descriptions:**
- Describe the physical location and surroundings
- Note any special characteristics or constraints
- Include installation details or access information

#### Alternative: Web Interface for Stations

If you prefer using the web interface:

1. **Navigate to Campaign:**
   - Go to your created campaign in the web portal
   - Access the campaign details page

2. **Create Station:**
   - Go to the "Stations" section within the campaign
   - Click "Add Station"
   - Provide station details and coordinates
   - Save to get your Station ID

3. **Note the Station ID:**
   - Copy the Station ID for use in data uploads


💡 **Pro Tip:** Save your campaign and station IDs in a configuration file or notebook cell for easy reuse across multiple data uploads.
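
One possible way to follow this tip is to keep the IDs in a small JSON file. The sketch below assumes a file layout and helper names (`save_ids`, `load_ids`) that are purely illustrative. The ID values are placeholders.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_ids(path: Path, campaign_id: int, station_id: int) -> None:
    """Store campaign and station IDs in a JSON file for later reuse."""
    payload = {"campaign_id": campaign_id, "station_id": station_id}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_ids(path: Path) -> Dict[str, int]:
    """Read the IDs back from the JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


# A temporary file stands in for a project-level config file.
ids_file = Path(tempfile.mkdtemp()) / "upstream_ids.json"
save_ids(ids_file, campaign_id=12, station_id=39)  # placeholder IDs
ids = load_ids(ids_file)
print(ids["campaign_id"], ids["station_id"])
```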

In [None]:
def create_station(
    station_data: Dict[str, Any],
    campaign_id: int,
    token: str,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Create a new station within a campaign.
    
    Args:
        station_data: Dictionary containing station information
        campaign_id: ID of the parent campaign
        token: Authentication token
        base_url: Base URL for the API
        
    Returns:
        Dictionary containing the created station data with ID
    """
    url = f"{base_url}/api/v1/campaigns/{campaign_id}/stations"    
    response = make_authenticated_request(
        method="POST",
        url=url,
        token=token,
        json=station_data
    )
    result = response.json()
    print(f"✅ Station created successfully!")
    print(f"Station ID: {result.get('id')}")
    print(f"Station Name: {station_data.get('name')}")
    print(f"Project ID: {station_data.get('projectid')}")
    print(f"Contact: {station_data.get('contact_name')}")
    
    return result

def load_station_config(config_path: str = "stations/station.json") -> Dict[str, Any]:
    """
    Load station configuration from JSON file.
    
    Args:
        config_path: Path to the station configuration JSON file    
    Returns:
        Dictionary containing station configuration data
    """
    try:
        with open(config_path, 'r', encoding='utf-8') as file:
            config = json.load(file)
            return config
    except FileNotFoundError:
        raise FileNotFoundError(f"Station config file not found: {config_path}")
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON in station config file: {e}")

def load_and_create_station(
    campaign_id: int,
    config_path: str = "stations/station.json",
    token: str = None,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Load station configuration from JSON and create the station.
    
    Args:
        campaign_id: ID of the parent campaign
        config_path: Path to the station configuration JSON file
        token: Authentication token
        base_url: Base URL for the API
        
    Returns:
        Dictionary containing the created station data with ID
    """
    # Load configuration
    station_config = load_station_config(config_path)
    # Validate required fields
    required_fields = ["name", "projectid", "description", "contact_name", "contact_email", "active", "start_date"]
    for field in required_fields:
        if field not in station_config:
            raise ValueError(f"Missing required field '{field}' in station config")
    # Display configuration summary
    print(f"📋 Station Configuration Summary:")
    print(f"  Name: {station_config['name']}")
    print(f"  Project ID: {station_config['projectid']}")
    print(f"  Description: {station_config['description'][:100]}...")
    print(f"  Contact: {station_config['contact_name']}")
    print(f"  Active: {station_config['active']}")
    print(f"  Start Date: {station_config['start_date']}")
    # Create the station
    station = create_station(
        station_data=station_config,
        campaign_id=campaign_id,
        token=token,
        base_url=base_url
    )
    return station

def load_and_create_multiple_stations(
    campaign_id: int,
    config_path: str = "stations/stations.json",
    token: str = None,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> List[Dict[str, Any]]:
    """
    Load multiple station configurations from JSON and create all stations.
    Args:
        campaign_id: ID of the parent campaign
        config_path: Path to the stations configuration JSON file
        token: Authentication token
        base_url: Base URL for the API
        
    Returns:
        List of dictionaries containing the created station data
    """
    # Load configuration
    with open(config_path, 'r', encoding='utf-8') as file:
        stations_config = json.load(file)
    created_stations = []
    # Handle both single station and multiple stations format
    if "stations" in stations_config:
        station_list = stations_config["stations"]
    else:
        station_list = [stations_config]  # Single station format

    print(f"📋 Creating {len(station_list)} station(s)...")
    
    for i, station_config in enumerate(station_list, 1):
        print(f"\n--- Creating Station {i}/{len(station_list)} ---")        
        try:
            station = create_station(
                station_data=station_config,
                campaign_id=campaign_id,
                token=token,
                base_url=base_url
            )
            created_stations.append(station)
            
        except Exception as e:
            print(f"❌ Failed to create station '{station_config.get('name', 'Unknown')}': {e}")
            continue
    
    return created_stations

## 📡 Registering Environmental Monitoring Stations to CKAN
This section shows how to automate the registration of environmental monitoring stations in a CKAN data portal. The code below covers:

- 🔐 Authenticating with CKAN using a JWT token
- 🏷️ Creating datasets that represent sensor stations
- 📎 Uploading metadata and resources such as sensor types, campaign info, and contact details
- 📁 Organizing data for discoverability and reuse within research communities



In [None]:
def create_ckan_dataset(
    jwt_token: str,
    dataset_name: str,
    title: str,
    description: str,
    tags: list = None,
    owner_org: str = None,
    ckan_url: str = "https://ckan.tacc.utexas.edu"
) -> Dict[str, Any]:
    """
    Create a dataset (package) in CKAN to represent a station.    
    Args:
        jwt_token: JWT authentication token
        dataset_name: Unique dataset identifier (lowercase, no spaces)
        title: Human-readable title
        description: Dataset description
        tags: List of tag names
        owner_org: owner_org name/id
        ckan_url: CKAN instance URL
        
    Returns:
        CKAN API response
    """
    
    # Prepare dataset metadata
    dataset_data = {
        "name": dataset_name,
        "title": title,
        "notes": description,
        "tags": [{"name": tag} for tag in (tags or [])],
        "private": False,
        "type": "dataset"
    }
    
    # Only send owner_org when one was provided
    if owner_org:
        dataset_data["owner_org"] = owner_org
    # CKAN API endpoint
    api_url = f"{ckan_url}/api/3/action/package_create"
    # Headers with JWT token (use the token passed in, not a notebook global)
    headers = {
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/json"
    }
    try:
        response = requests.post(
            api_url,
            headers=headers,
            json=dataset_data
        )        
        response.raise_for_status()
        result = response.json()
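        if result.get("success"):
            dataset_id = result["result"]["id"]
            dataset_url = f"{ckan_url}/dataset/{dataset_name}"
            print("✅ Dataset created successfully!")
            print(f"   Dataset ID: {dataset_id}")
            print(f"   Dataset URL: {dataset_url}")
            return result["result"]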
        else:
            print(f"❌ CKAN API returned error: {result}")
            raise Exception(f"CKAN API error: {result}")
            
    except requests.exceptions.RequestException as e:
        print(f"❌ HTTP request failed: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"   Response: {e.response.text}")
        raise
    except Exception as e:
        print(f"❌ Dataset creation failed: {e}")
        raise

def register_station_to_ckan(
    jwt_token:str,
    station_name: str,
    station_title: str,
    station_description: str,
    campaign_name: str = None,
    sensor_types: list = None,
    author:str=None,
    author_email:str=None,
    owner_org: str = None,
    ckan_url: str = "https://ckan.tacc.utexas.edu"
) -> Dict[str, Any]:
    """
    Complete workflow to register a station in CKAN.
    Args:
        jwt_token: JWT authentication token
        station_name: Unique station identifier
        station_title: Human-readable station title
        station_description: Station description
        campaign_name: Associated campaign name
        sensor_types: List of sensor types at this station
        author: Station contact name (accepted but not currently sent to CKAN)
        author_email: Station contact email (accepted but not currently sent to CKAN)
        owner_org: CKAN owner_org
        ckan_url: CKAN instance URL
        
    Returns:
        CKAN dataset information
    """
    tags = []
    if sensor_types:
        tags.extend(sensor_types)
    if campaign_name:
        tags.append(f"campaign-{campaign_name}")
    tags.extend(["sensor-station", "environmental-data", "upstream"])
    # Enhanced description
    enhanced_description = station_description
    if campaign_name:
        enhanced_description += f"\nCampaign: {campaign_name}"
    if sensor_types:
        enhanced_description += f"\nSensor Types: {', '.join(sensor_types)}"
    # Step 3: Create CKAN dataset
    print("3️⃣  Creating CKAN dataset...")
    dataset = create_ckan_dataset(
        jwt_token=jwt_token,
        dataset_name=station_name,
        title=station_title,
        description=enhanced_description,
        tags=tags,
        owner_org=owner_org,
        ckan_url=ckan_url
    )
    print("✅ Station registration completed!")
    return dataset

def add_resources_to_station(
    jwt_token: str,
    dataset_id: str,
    resources: list,
    ckan_url: str = "https://ckan.tacc.utexas.edu"
) -> list:
    """
    Add data resources (files/URLs) to a station dataset.
    Args:
        jwt_token: JWT authentication token
        dataset_id: CKAN dataset ID
        resources: List of resource dictionaries
        ckan_url: CKAN instance URL
        
    Returns:
        List of created resources
    """
    
    api_url = f"{ckan_url}/api/3/action/resource_create"
    headers = {
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/json"
    }

    created_resources = []
    for resource in resources:
        resource_data = {
            "package_id": dataset_id,
            **resource
        }
        print(f"📎 Adding resource: {resource.get('name', 'Unnamed')}")
        try:
            response = requests.post(
                api_url,
                headers=headers,
                json=resource_data
            )
            response.raise_for_status()
            result = response.json()
            if result.get("success"):
                created_resources.append(result["result"])
                print(f"   ✅ Resource added: {result['result']['id']}")
            else:
                print(f"   ❌ Failed to add resource: {result}")
        except Exception as e:
            print(f"   ❌ Error adding resource: {e}")
    return created_resources

# Load station metadata from JSON file
def load_station_metadata(json_file_path: str = "stations/station.json") -> Dict[str, Any]:
    """
    Load station metadata from JSON file.
    
    Args:
        json_file_path: Path to the station JSON file
        
    Returns:
        Station metadata dictionary
    """
    try:
        with open(json_file_path, 'r') as f:
            station_data = json.load(f)        
        print(f"📋 Loaded station metadata from {json_file_path}")
        print(f"   Station: {station_data.get('name', 'Unknown')}")
        print(f"   Project: {station_data.get('projectid', 'Unknown')}")
        print(f"   Active: {station_data.get('active', 'Unknown')}")
        return station_data
        
    except FileNotFoundError:
        print(f"❌ Station metadata file not found: {json_file_path}")
        raise
    except json.JSONDecodeError as e:
        print(f"❌ Invalid JSON in station file: {e}")
        raise
    except Exception as e:
        print(f"❌ Error loading station metadata: {e}")
        raise

def convert_station_metadata_for_ckan(station_data: Dict[str, Any]) -> Dict[str, Any]:
    """
    Convert station metadata to CKAN-compatible format.
    
    Args:
        station_data: Raw station metadata from JSON
        
    Returns:
        CKAN-compatible station information
    """
    
    # Create CKAN-compatible dataset name (lowercase, no spaces, no special chars)
    station_name = station_data.get('name', 'unknown-station')
    ckan_name = station_name.lower().replace(' ', '-').replace('/', '-').replace('_', '-')
    # Remove any remaining special characters
    import re
    ckan_name = re.sub(r'[^a-z0-9\-]', '', ckan_name)
    # Build enhanced description
    description_parts = [station_data.get('description', 'Environmental monitoring station')]
    if station_data.get('projectid'):
        description_parts.append(f"Project: {station_data['projectid']}")
    if station_data.get('contact_name'):
        description_parts.append(f"Contact: {station_data['contact_name']}")
        if station_data.get('contact_email'):
            description_parts.append(f"Email: {station_data['contact_email']}")
    if station_data.get('start_date'):
        description_parts.append(f"Start Date: {station_data['start_date']}")
    if station_data.get('active') is not None:
        status = "Active" if station_data['active'] else "Inactive"
        description_parts.append(f"Status: {status}")
    enhanced_description = "\n\n".join(description_parts)
    # Create tags from project and other metadata
    tags = ["environmental-monitoring", "upstream", "sensor-station"]
    if station_data.get('projectid'):
        # Clean project ID for tag
        project_tag = station_data['projectid'].lower().replace(' ', '-').replace('_', '-')
        project_tag = re.sub(r'[^a-z0-9\-]', '', project_tag)
        tags.append(f"project-{project_tag}")
    return {
        "station_name": ckan_name,
        "station_title": station_data.get('name', 'Unknown Station'),
        "station_description": enhanced_description,
        "campaign_name": station_data.get('projectid'),
        "owner_org": "setx-uifl",  # Hard-coded CKAN organization for this example
        "author": station_data.get('contact_name'),
        "author_email": station_data.get('contact_email'),
        "sensor_types": ["water-level", "stream-gauge"],  # Hard-coded for this example, not derived from the metadata
        "raw_metadata": station_data  # Keep original data for reference
    }

## ⚙️ Running the Station Registration Workflow

This section provides **two options** for creating monitoring stations in Upstream from your configuration files. Registering the resulting stations in CKAN is a separate step, covered later in this guide.

### 🧪 Create a Single Station

If you're working with **one station at a time**, this block reads a single configuration file (`stations/station.json`) and walks through the entire process:

- Loads the station configuration  
- Validates the required fields and prints a summary  
- Creates the station in the campaign through the Upstream API  
- Stores the new station ID in `station_id` upon success

💡 *Useful when you're testing or onboarding new stations one by one.*



In [0]:
try:
    station = load_and_create_station(
        campaign_id=campaign_id,
        config_path="stations/station.json",
        token=token
    )    
    station_id = station['id']
except FileNotFoundError as e:
    print(f"❌ Configuration file error: {e}")
    print("💡 Please create a stations/station.json file with your station details")
except ValueError as e:
    print(f"❌ Configuration error: {e}")
except Exception as e:
    print(f"❌ Station creation failed: {e}")

=== Creating Single Station from Configuration ===
📄 Loaded station config from: stations/station.json
📋 Station Configuration Summary:
  Name: Cow Bayou near Mauriceville
  Project ID: SETx-UIFL Beaumont
  Description: Beaumont Run stream gauge at Cow Bayou...
  Contact: Nick Brake
  Active: True
  Start Date: 2025-06-02T14:42:00+0000
✅ Station created successfully!
Station ID: 39
Station Name: Cow Bayou near Mauriceville
Project ID: SETx-UIFL Beaumont
Contact: Nick Brake

🎉 Station setup complete!
Station ID: 39

=== Creating Multiple Stations from Configuration ===
📋 Creating 2 station(s)...

--- Creating Station 1/2 ---
✅ Station created successfully!
Station ID: 40
Station Name: Cow Bayou near Mauriceville
Project ID: SETx-UIFL Beaumont
Contact: Nick Brake

--- Creating Station 2/2 ---
✅ Station created successfully!
Station ID: 41
Station Name: Pine Island Bayou near Sour Lake
Project ID: SETx-UIFL Beaumont
Contact: Nick Brake

🎉 Created 2 station(s) successfully!
  • Unknown (ID

### 🧩 Create Multiple Stations

Need to register **several stations at once**? This block processes a configuration file (`stations/stations.json`) containing a list of station definitions. It will:

- Loop through each station entry  
- Run the registration process for each  
- Report success or failure for individual stations

💡 *Great for batch imports or syncing an entire sensor network in one go.*

Both workflows include helpful print statements and error handling to guide you through common issues — such as missing files or malformed configs.
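
The batch loader in this notebook accepts either a file with a top-level `stations` list or a single station object. The sketch below isolates that normalization step so the two layouts can be compared side by side. The function name `normalize_station_list` is hypothetical, and the station entries are trimmed to a single field for brevity.

```python
from typing import Any, Dict, List


def normalize_station_list(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a list of station configs for both supported file layouts."""
    if "stations" in config:
        # Multi-station layout: {"stations": [{...}, {...}]}
        return list(config["stations"])
    # Single-station layout: the object itself is the station
    return [config]


multi_layout = {
    "stations": [
        {"name": "Cow Bayou near Mauriceville"},
        {"name": "Pine Island Bayou near Sour Lake"},
    ]
}
single_layout = {"name": "Cow Bayou near Mauriceville"}

print(len(normalize_station_list(multi_layout)))
print(len(normalize_station_list(single_layout)))
```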


In [0]:
try:
    stations = load_and_create_multiple_stations(
        campaign_id=campaign_id,
        config_path="stations/stations.json",
        token=token
    )    
except FileNotFoundError as e:
    print(f"❌ Configuration file error: {e}")
    print("💡 Please create a stations/stations.json file with your station details")
except Exception as e:
    print(f"❌ Multiple stations creation failed: {e}")

# 🛰️ Station Registration & Resource Publishing Guide

This document guides you through the registration of a station and the publication of its associated metadata and resources into a CKAN data portal.

---

## 1️⃣ Load Station Metadata

Begin by loading your station's configuration from a local JSON file.

- **File:** `./stations/station.json`
- **Expected Fields:**
  - `name`
  - `projectid`
  - `contact_name`
  - `contact_email`
  - `start_date`
  - ...and other relevant metadata
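
For illustration, the sketch below builds a station configuration with the fields that `load_and_create_station` validates, then checks it. Most values are taken from the sample output earlier in this guide. The email address is a placeholder, and the helper names are hypothetical. The date parsing assumes `start_date` always follows the format seen in that sample output, which the source does not guarantee.

```python
from datetime import datetime
from typing import Any, Dict, List

REQUIRED_STATION_FIELDS = [
    "name", "projectid", "description",
    "contact_name", "contact_email", "active", "start_date",
]


def missing_station_fields(config: Dict[str, Any]) -> List[str]:
    """Return the required station fields that are absent."""
    return [field for field in REQUIRED_STATION_FIELDS if field not in config]


def parse_start_date(value: str) -> datetime:
    """Parse a timestamp such as 2025-06-02T14:42:00+0000 (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


example_station = {
    "name": "Cow Bayou near Mauriceville",
    "projectid": "SETx-UIFL Beaumont",
    "description": "Beaumont Run stream gauge at Cow Bayou",
    "contact_name": "Nick Brake",
    "contact_email": "contact@example.org",  # placeholder address
    "active": True,
    "start_date": "2025-06-02T14:42:00+0000",
}

print(missing_station_fields(example_station))
print(parse_start_date(example_station["start_date"]).isoformat())
```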

---

## 2️⃣ Convert Metadata to CKAN Format

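Transform the raw station metadata into the arguments that the CKAN registration helper expects. `convert_station_metadata_for_ckan` returns:

- `station_name`: A machine-readable slug (e.g., `lake-travis-buoy`)
- `station_title`: A human-readable title
- `station_description`: The description extended with project, contact, start date, and status
- `campaign_name`: Associated research campaign (taken from `projectid`)
- `owner_org`, `author`, `author_email`, and `sensor_types`
- `raw_metadata`: The original station data, kept for reference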
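
The slug step can be looked at in isolation. The sketch below mirrors the name-cleaning logic used in `convert_station_metadata_for_ckan`: lowercase the name, turn spaces, slashes, and underscores into hyphens, then drop anything else that is not a lowercase letter, digit, or hyphen. The standalone function name `to_ckan_name` is hypothetical.

```python
import re


def to_ckan_name(station_name: str) -> str:
    """Convert a station name into a lowercase, hyphen-separated slug."""
    slug = station_name.lower()
    # Spaces, slashes, and underscores become hyphens
    slug = re.sub(r"[ /_]", "-", slug)
    # Remove any remaining characters outside a-z, 0-9, and hyphen
    return re.sub(r"[^a-z0-9\-]", "", slug)


print(to_ckan_name("Cow Bayou near Mauriceville"))
print(to_ckan_name("Lake Travis Buoy"))
```

Two different station names can map to the same slug (for example, names that differ only in punctuation), so it may be worth checking for collisions before registering a batch.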

---

## 3️⃣ Register Station in CKAN

Use your Tapis JWT token to register the station with CKAN. A successful registration reports:

- ✅ **Dataset ID**
- ✅ **Dataset Name**
- ✅ **CKAN URL**  
  Format: `https://ckan.tacc.utexas.edu/dataset/<dataset-name>`

---

## 4️⃣ Add Station Resources

Add data endpoints and visualizations as resources to enrich the dataset.

### 🔗 Base Resources

- **Station Information**  
  > Full metadata & configuration for this station  
  `JSON` - `/api/v1/campaigns/<campaign_id>/stations/<station_id>`

- **All Station Sensors**  
  > List of all sensors deployed at the station  
  `JSON` - `/api/v1/campaigns/<campaign_id>/stations/<station_id>/sensors`

- **All Sensors and Visualizations**  
  > Frontend dashboard for sensors and charts  
  `Website` - `https://dso-tacc.netlify.app/campaigns/<campaign_id>/stations/<station_id>`

- **Aggregated Statistics**  
  > Time-aggregated measurements with statistical analysis  
  `JSON` - `/api/v1/campaigns/<campaign_id>/stations/<station_id>/measurements/aggregated`

---

### 🧾 Optional: Contact Information

If `contact_name` or `contact_email` is provided in the JSON, a text-based resource named **Contact Information** is added.



In [None]:
# Load station metadata from JSON file
raw_station_data = load_station_metadata("./stations/station.json")

# Convert to CKAN format
station_info = convert_station_metadata_for_ckan(raw_station_data)

# Register the station
dataset = register_station_to_ckan(
    jwt_token=tapis_token.access_token,
    **{k: v for k, v in station_info.items() if k != 'raw_metadata'}
)

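# Reuse the Tapis JWT obtained during authentication. This does not refresh an
# expired token; re-run the authentication cell first if the token has expired.
jwt_token = tapis_token.access_token
station_name = raw_station_data.get('name', 'this station')
base_url = "https://upstream-dso.tacc.utexas.edu/dev"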

# Base station resources
resources = [
        {
            "name": "Station Information",
            "description": f"Complete station metadata and configuration for {station_name}",
            "format": "JSON",
            "url": f"{base_url}/api/v1/campaigns/{campaign_id}/stations/{station_id}"
        },
        {
            "name": "All Station Sensors",
            "description": f"Complete list of sensors deployed at {station_name}",
            "format": "JSON",
            "url": f"{base_url}/api/v1/campaigns/{campaign_id}/stations/{station_id}/sensors"
        },
        {
            "name": "All Sensors and Visualizations",
            "description": f"All Sensors and Visualizations from {station_name} (paginated)",
            "format": "Website",
            "url": f"https://dso-tacc.netlify.app/campaigns/{campaign_id}/stations/{station_id}"
        },
        {
            "name": "Aggregated Statistics",
            "description": f"Time-aggregated measurements with statistical analysis from {station_name}",
            "format": "JSON",
            "url": f"{base_url}/api/v1/campaigns/{campaign_id}/stations/{station_id}/measurements/aggregated"
        }
    ]
# Add contact information as a resource if available
if raw_station_data.get('contact_name') or raw_station_data.get('contact_email'):
    contact_text = "Station Contact Information\n"
    contact_text += f"Station: {raw_station_data.get('name', 'Unknown')}\n"
    contact_text += f"Project: {raw_station_data.get('projectid', 'Unknown')}\n"
    if raw_station_data.get('contact_name'):
        contact_text += f"Contact Name: {raw_station_data['contact_name']}\n"
    if raw_station_data.get('contact_email'):
        contact_text += f"Contact Email: {raw_station_data['contact_email']}\n"
    if raw_station_data.get('start_date'):
        contact_text += f"Start Date: {raw_station_data['start_date']}\n"
    # No file or URL is attached in this example, so the contact text is
    # carried in the resource description (you might want to upload a file instead)
    contact_info = {
        "name": "Contact Information",
        "description": contact_text,
        "format": "TEXT"
    }
    resources.append(contact_info)

created_resources = add_resources_to_station(
    jwt_token=jwt_token,
    dataset_id=dataset['id'],
    resources=resources
)

1️⃣  Loading station metadata from JSON...
📋 Loaded station metadata from ./stations/station.json
   Station: Cow Bayou near Mauriceville
   Project: SETx-UIFL Beaumont
   Active: True
2️⃣  Converting metadata for CKAN...
   CKAN Dataset Name: cow-bayou-near-mauriceville
   Title: Cow Bayou near Mauriceville
   Campaign: SETx-UIFL Beaumont
3️⃣  Registering station to CKAN...
🚀 REGISTERING STATION TO CKAN
3️⃣  Creating CKAN dataset...
🏗️  Creating CKAN dataset: cow-bayou-near-mauriceville
   Title: Cow Bayou near Mauriceville
   URL: https://ckan.tacc.utexas.edu/api/3/action/package_create
✅ Dataset created successfully!
   Dataset ID: e6c2acb7-99a4-44ad-9935-d220874dd10c
   Dataset URL: https://ckan.tacc.utexas.edu/dataset/cow-bayou-near-mauriceville
✅ Station registration completed!

🎉 Station registered successfully!
Dataset ID: e6c2acb7-99a4-44ad-9935-d220874dd10c
Dataset Name: cow-bayou-near-mauriceville
Dataset URL: https://ckan.tacc.utexas.edu/dataset/cow-bayou-near-mauriceville


## CSV Data Upload Function Documentation

### Overview

The `upload_csv_data` function uploads sensor and measurement data to the Upstream platform from CSV files in a local data directory. It checks that both files exist, sends them in a single authenticated request, and prints the statistics returned by the API.


### Parameters

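| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `campaign_id` | `int` | ✅ | Unique identifier for the target campaign |
| `station_id` | `int` | ✅ | Unique identifier for the target station within the campaign |
| `token` | `str` | ✅ | Authentication token for API access |
| `data_dir` | `str` | ❌ | Directory containing the CSV files (defaults to `"./data/"`) |
| `author` | `str` | ❌ | Author name; accepted by the signature but not used in the function body |
| `author_email` | `str` | ❌ | Author email; accepted by the signature but not used in the function body |
| `sensors_filename` | `str` | ❌ | Name of the sensors CSV file inside `data_dir` (defaults to `"sensors.csv"`) |
| `measurements_filename` | `str` | ❌ | Name of the measurements CSV file inside `data_dir` (defaults to `"measurements.csv"`) |
| `base_url` | `str` | ❌ | Base URL for the Upstream API (defaults to dev environment) |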

### Return Value

Returns a `Dict[str, Any]` containing the upload response data with statistics including:
- Total sensors processed
- Total measurements added to database
- Data processing time
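
As a minimal sketch, the statistics can be picked out of the response dictionary as follows. The key strings are the ones `upload_csv_data` looks up in this guide; the helper name `summarize_upload` and the short output keys are illustrative assumptions, not part of the Upstream API.

```python
from typing import Any, Dict

# Response keys as looked up by upload_csv_data in this guide.
# The short names on the left are illustrative, not part of the API.
_STAT_KEYS = {
    "sensors": "Total sensors processed",
    "measurements": "Total measurements added to database",
    "processing_time": "Data Processing time",
}

def summarize_upload(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the upload statistics out of a response dict, using 'N/A' as fallback."""
    return {short: result.get(key, "N/A") for short, key in _STAT_KEYS.items()}

example = {"Total sensors processed": 3, "Total measurements added to database": 26881}
print(summarize_upload(example))
```

The `'N/A'` fallback mirrors what the function itself prints when a key is absent from the response.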

### Features

#### 🔍 **File Validation**
- Automatically checks if both CSV files exist before attempting upload
- Raises `FileNotFoundError` with descriptive messages for missing files

#### 📊 **Progress Tracking**
- Displays upload parameters for verification
- Prints status messages before and after the upload, with emoji indicators
- Provides detailed statistics upon completion

#### 🔐 **Secure Upload**
- Uses authenticated requests via the `make_authenticated_request` helper
- Properly formats files for multipart form data upload

#### 🎯 **Error Handling**
- Pre-upload file existence validation
- Clear error messages for troubleshooting

### API Endpoint

The function uploads to the following endpoint:
```
POST {base_url}/api/v1/uploadfile_csv/campaign/{campaign_id}/station/{station_id}/sensor
```
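
The endpoint pattern can be composed with a small helper, sketched here. `build_upload_url` is a hypothetical name; stripping a trailing slash from `base_url` is a defensive choice of this sketch rather than something the API requires.

```python
def build_upload_url(base_url: str, campaign_id: int, station_id: int) -> str:
    """Compose the CSV upload URL from the endpoint pattern in this guide."""
    return (
        f"{base_url.rstrip('/')}/api/v1/uploadfile_csv"
        f"/campaign/{campaign_id}/station/{station_id}/sensor"
    )

print(build_upload_url("https://upstream-dso.tacc.utexas.edu/dev", 12, 39))
```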

### File Format Requirements

#### Sensors CSV
- Must contain sensor definition data
- Uploaded as `upload_file_sensors` form field

#### Measurements CSV  
- Must contain measurement data corresponding to the sensors
- Uploaded as `upload_file_measurements` form field

### Console Output Example

```
=== Uploading CSV Data ===
Campaign ID: 12
Station ID: 39
Data Directory: ./data/
Sensors file: ./data/sensors.csv
Measurements file: ./data/measurements.csv
📁 File Information:
  • Sensors file size: 173 bytes
  • Measurements file size: 906,949 bytes
📤 Uploading files...
✅ Upload completed successfully!
📊 Upload Statistics:
  • Sensors processed: 3
  • Measurements added: 26881
  • Processing time: 9.2 seconds.
```

### Dependencies

- `os` - For file existence checking
- `make_authenticated_request` - Custom function for authenticated API calls
- `Dict`, `Any` from `typing` - For type hints

### Error Scenarios

| Error Type | Cause | Solution |
|------------|-------|----------|
| `FileNotFoundError` | CSV file doesn't exist at specified path | Verify file paths are correct |
| Authentication errors | Invalid or expired token | Refresh authentication token |
| API errors | Server issues or invalid parameters | Check campaign/station IDs and API status |

### Best Practices

1. **Validate Data First**: Ensure your CSV files are properly formatted before upload
2. **Check Permissions**: Verify you have write access to the specified campaign/station
3. **Monitor Output**: Pay attention to the upload statistics to confirm expected data volumes
4. **Handle Errors**: Always wrap calls in `try`/`except` blocks for production use
5. **Use Absolute Paths**: Prefer absolute file paths to avoid path resolution issues
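
The absolute-path and file-existence advice can be combined into a pre-flight step, sketched below with `pathlib`. `resolve_csv_paths` is a hypothetical helper; its default filenames follow the ones used in this guide.

```python
from pathlib import Path
from typing import Tuple

def resolve_csv_paths(
    data_dir: str,
    sensors_filename: str = "sensors.csv",
    measurements_filename: str = "measurements.csv",
) -> Tuple[Path, Path]:
    """Return absolute paths for both CSV files, failing early if either is missing."""
    base = Path(data_dir).expanduser().resolve()
    sensors = base / sensors_filename
    measurements = base / measurements_filename
    for label, path in (("Sensors", sensors), ("Measurements", measurements)):
        if not path.is_file():
            raise FileNotFoundError(f"{label} file not found: {path}")
    return sensors, measurements
```

The parent directory of the returned paths can then be passed as `data_dir`, so the upload no longer depends on the notebook's working directory.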


In [None]:
import glob
import os
from pathlib import Path

def upload_csv_data(
    campaign_id: int,
    station_id: int,
    token: str,
    data_dir: str = "./data/",
    author: Optional[str] = None,
    author_email: Optional[str] = None,
    sensors_filename: str = "sensors.csv",
    measurements_filename: str = "measurements.csv",
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Upload sensor and measurement CSV files to Upstream from data directory.
    
    Args:
        campaign_id: ID of the target campaign
        station_id: ID of the target station
        token: Access token
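        data_dir: Directory containing CSV files (default: "./data/")
        author: Optional author name (accepted but not used in this function)
        author_email: Optional author email (accepted but not used in this function)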
        sensors_filename: Name of sensors CSV file (default: "sensors.csv")
        measurements_filename: Name of measurements CSV file (default: "measurements.csv")
        base_url: Base URL for the API
        
    Returns:
        Upload response data
    """
    # Construct file paths
    sensors_file_path = os.path.join(data_dir, sensors_filename)
    measurements_file_path = os.path.join(data_dir, measurements_filename)    
    upload_url = f"{base_url}/api/v1/uploadfile_csv/campaign/{campaign_id}/station/{station_id}/sensor"
    print(f"=== Uploading CSV Data ===")
    print(f"Campaign ID: {campaign_id}")
    print(f"Station ID: {station_id}")
    print(f"Data Directory: {data_dir}")
    print(f"Sensors file: {sensors_file_path}")
    print(f"Measurements file: {measurements_file_path}")
    # Verify files exist
    if not os.path.exists(sensors_file_path):
        raise FileNotFoundError(f"Sensors file not found: {sensors_file_path}")
    if not os.path.exists(measurements_file_path):
        raise FileNotFoundError(f"Measurements file not found: {measurements_file_path}")
    # Display file information
    sensors_size = os.path.getsize(sensors_file_path)
    measurements_size = os.path.getsize(measurements_file_path)
    print(f"📁 File Information:")
    print(f"  • Sensors file size: {sensors_size:,} bytes")
    print(f"  • Measurements file size: {measurements_size:,} bytes")
    
    # Prepare files for upload
    with open(sensors_file_path, 'rb') as sensors_file, \
            open(measurements_file_path, 'rb') as measurements_file:
        files = {
            'upload_file_sensors': (sensors_filename, sensors_file, 'text/csv'),
            'upload_file_measurements': (measurements_filename, measurements_file, 'text/csv')
        }
        print("📤 Uploading files...")
        response = make_authenticated_request(
            method="POST",
            url=upload_url,
            token=token,
            files=files
        )
        result = response.json()
        print("✅ Upload completed successfully!")
        # Display upload statistics
        print(f"📊 Upload Statistics:")
        print(f"  • Sensors processed: {result.get('Total sensors processed', 'N/A')}")
        print(f"  • Measurements added: {result.get('Total measurements added to database', 'N/A')}")
        print(f"  • Processing time: {result.get('Data Processing time', 'N/A')}")
        return result

def list_data_files(data_dir: str = "./data/") -> Dict[str, list]:
    """
    List all CSV files in the data directory.
    
    Args:
        data_dir: Directory to search for CSV files
        
    Returns:
        Dictionary with lists of found files
    """
    if not os.path.exists(data_dir):
        print(f"❌ Data directory not found: {data_dir}")
        return {"csv_files": [], "sensors_files": [], "measurements_files": []}
    
    # Find all CSV files
    csv_pattern = os.path.join(data_dir, "*.csv")
    csv_files = sorted(glob.glob(csv_pattern))  # sorted so that auto-detection is deterministic
    
    # Categorize files
    sensors_files = [f for f in csv_files if 'sensor' in os.path.basename(f).lower()]
    measurements_files = [f for f in csv_files if 'measurement' in os.path.basename(f).lower()]
    
    print(f"📁 Files found in {data_dir}:")
    print(f"  • Total CSV files: {len(csv_files)}")
    print(f"  • Sensor files: {len(sensors_files)}")
    print(f"  • Measurement files: {len(measurements_files)}")
    
    if csv_files:
        print(f"📄 All CSV files:")
        for file in csv_files:
            size = os.path.getsize(file)
            print(f"    - {os.path.basename(file)} ({size:,} bytes)")
    
    return {
        "csv_files": csv_files,
        "sensors_files": sensors_files,
        "measurements_files": measurements_files
    }

def upload_data_with_auto_detection(
    campaign_id: int,
    station_id: int,
    token: str,
    data_dir: str = "./data/",
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Upload CSV data with automatic file detection.
    
    Args:
        campaign_id: ID of the target campaign
        station_id: ID of the target station
        token: Access token
        data_dir: Directory containing CSV files
        base_url: Base URL for the API
        
    Returns:
        Upload response data
    """
    print("=== Auto-detecting Data Files ===")
    files_info = list_data_files(data_dir)
    # Try to find sensors and measurements files
    sensors_file = None
    measurements_file = None
    # Look for standard filenames first
    standard_sensors = os.path.join(data_dir, "sensors.csv")
    standard_measurements = os.path.join(data_dir, "measurements.csv")
    if os.path.exists(standard_sensors):
        sensors_file = "sensors.csv"
    elif files_info["sensors_files"]:
        sensors_file = os.path.basename(files_info["sensors_files"][0])
        print(f"🔍 Using detected sensors file: {sensors_file}")
    if os.path.exists(standard_measurements):
        measurements_file = "measurements.csv"
    elif files_info["measurements_files"]:
        measurements_file = os.path.basename(files_info["measurements_files"][0])
        print(f"🔍 Using detected measurements file: {measurements_file}")
    
    if not sensors_file or not measurements_file:
        raise FileNotFoundError(
            f"Could not find required files. "
            f"Sensors: {sensors_file}, Measurements: {measurements_file}"
        )
    
    # Upload the files
    return upload_csv_data(
        campaign_id=campaign_id,
        station_id=station_id,
        token=token,
        data_dir=data_dir,
        sensors_filename=sensors_file,
        measurements_filename=measurements_file,
        base_url=base_url
    )

# Usage examples
data_files = list_data_files("./data/")
try:
    # Upload using standard filenames
    result = upload_csv_data(
        campaign_id=campaign_id,
        station_id=station_id,
        token=token,
        data_dir="./data/",
        sensors_filename="sensors.csv",
        measurements_filename="measurements.csv"
    )
except FileNotFoundError as e:
    print(f"❌ File error: {e}")
    print("💡 Make sure your CSV files are in the ./data/ directory")
except Exception as e:
    print(f"❌ Upload failed: {e}")
try:
    # Upload with automatic file detection
    result = upload_data_with_auto_detection(
        campaign_id=campaign_id,
        station_id=station_id,
        token=token,
        data_dir="./data/"
    )    
except FileNotFoundError as e:
    print(f"❌ File detection error: {e}")
    print("💡 Make sure your CSV files are in the ./data/ directory with 'sensor' and 'measurement' in their names")
except Exception as e:
    print(f"❌ Auto-upload failed: {e}")


=== Listing Available Data Files ===
📁 Files found in ./data/:
  • Total CSV files: 2
  • Sensor files: 1
  • Measurement files: 1
📄 All CSV files:
    - measurements.csv (906,949 bytes)
    - sensors.csv (173 bytes)

=== Uploading CSV Data (Standard Method) ===
=== Uploading CSV Data ===
Campaign ID: 12
Station ID: 39
Data Directory: ./data/
Sensors file: ./data/sensors.csv
Measurements file: ./data/measurements.csv
📁 File Information:
  • Sensors file size: 173 bytes
  • Measurements file size: 906,949 bytes
📤 Uploading files...
✅ Upload completed successfully!
📊 Upload Statistics:
  • Sensors processed: 3
  • Measurements added: 26881
  • Processing time: 9.2 seconds.

🎉 Data upload complete!

=== Uploading CSV Data (Auto-Detection Method) ===
=== Auto-detecting Data Files ===
📁 Files found in ./data/:
  • Total CSV files: 2
  • Sensor files: 1
  • Measurement files: 1
📄 All CSV files:
    - measurements.csv (906,949 bytes)
    - sensors.csv (173 bytes)
=== Uploading CSV Data ===

### CSV File Format Examples

#### Sensors CSV Format

Your `sensors.csv` file defines the sensor metadata and should follow this structure:

```csv
alias,variablename,units,postprocess,postprocessscript
temp_sensor_01,Air Temperature,°C,,
humidity_01,Relative Humidity,%,,
pressure_01,Atmospheric Pressure,hPa,,
wind_speed_01,Wind Speed,m/s,true,wind_correction_script
```

**Column Descriptions:**
- `alias`: Unique identifier for the sensor (used as column header in measurements)
- `variablename`: Human-readable description of what the sensor measures
- `units`: Measurement units (e.g., °C, %, hPa, m/s)
- `postprocess`: Boolean flag indicating if post-processing is required
- `postprocessscript`: Name of the post-processing script (if applicable)
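
Before uploading, the header and aliases can be checked locally. The sketch below uses only the standard `csv` module; `check_sensors_csv` is a hypothetical helper. Treating all five columns as mandatory and requiring unique aliases are assumptions drawn from the example and column descriptions above, not confirmed server-side rules.

```python
import csv
import io
from typing import List

# Column names taken from the sensors.csv example in this guide
EXPECTED_SENSOR_COLUMNS = ["alias", "variablename", "units", "postprocess", "postprocessscript"]

def check_sensors_csv(text: str) -> List[str]:
    """Return a list of problems found in sensors.csv content (empty list = none found)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_SENSOR_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        alias = (row["alias"] or "").strip()
        if not alias:
            problems.append(f"line {line_no}: empty alias")
        elif alias in seen:
            problems.append(f"line {line_no}: duplicate alias {alias!r}")
        seen.add(alias)
    return problems

sample = (
    "alias,variablename,units,postprocess,postprocessscript\n"
    "temp_sensor_01,Air Temperature,°C,,\n"
    "temp_sensor_01,Air Temperature,°C,,\n"
)
print(check_sensors_csv(sample))
```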

#### Measurements CSV Format

Your `measurements.csv` file contains the actual sensor data and should follow this structure:

```csv
collectiontime,Lat_deg,Lon_deg,temp_sensor_01,humidity_01,pressure_01,wind_speed_01
2024-01-15T10:30:00,30.2672,-97.7431,23.5,65.2,1013.25,2.3
2024-01-15T10:31:00,30.2673,-97.7432,23.7,64.8,1013.20,2.1
2024-01-15T10:32:00,30.2674,-97.7433,23.9,64.5,1013.15,1.8
2024-01-15T10:33:00,30.2675,-97.7434,,64.2,1013.10,1.9
```

**Required Columns:**
- `collectiontime`: Timestamp in ISO 8601 format (YYYY-MM-DDTHH:MM:SS)
- `Lat_deg`: Latitude in decimal degrees
- `Lon_deg`: Longitude in decimal degrees
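
To produce `collectiontime` values in the format shown, timestamps can be normalized with the standard library. This is a sketch: `to_utc_iso` is a hypothetical helper, and treating a timestamp without an offset as already being in UTC is an assumption of this example.

```python
from datetime import datetime, timezone

def to_utc_iso(timestamp: str) -> str:
    """Parse an ISO 8601 timestamp and return it in UTC as YYYY-MM-DDTHH:MM:SS.

    Timestamps without an offset are assumed to be in UTC already.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")

print(to_utc_iso("2024-01-15T10:30:00"))        # 2024-01-15T10:30:00
print(to_utc_iso("2024-01-15T04:30:00-06:00"))  # 2024-01-15T10:30:00
```

`datetime.fromisoformat` raises `ValueError` on strings it cannot parse, which makes it usable as a format check as well.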

**Sensor Data Columns:**
- Each sensor `alias` from sensors.csv becomes a column header
- Column names must exactly match the sensor aliases
- Empty values are automatically handled (see row 4 in example)
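
The alias-to-column rule can be checked locally before uploading. The following is a minimal sketch with the standard `csv` module. `compare_columns` is a hypothetical helper that reports mismatches in both directions; how the server itself reacts to a mismatch is not covered here.

```python
import csv
import io
from typing import Dict, List

REQUIRED_MEASUREMENT_COLUMNS = ["collectiontime", "Lat_deg", "Lon_deg"]

def compare_columns(sensors_text: str, measurements_text: str) -> Dict[str, List[str]]:
    """Compare sensor aliases with the measurements header and report mismatches."""
    aliases = [
        row["alias"]
        for row in csv.DictReader(io.StringIO(sensors_text))
        if row.get("alias")
    ]
    header = next(csv.reader(io.StringIO(measurements_text)), [])
    data_columns = [c for c in header if c not in REQUIRED_MEASUREMENT_COLUMNS]
    return {
        "missing_required": [c for c in REQUIRED_MEASUREMENT_COLUMNS if c not in header],
        "aliases_without_column": [a for a in aliases if a not in header],
        "columns_without_alias": [c for c in data_columns if c not in aliases],
    }

sensors_text = (
    "alias,variablename,units,postprocess,postprocessscript\n"
    "temp_sensor_01,Air Temperature,°C,,\n"
    "humidity_01,Relative Humidity,%,,\n"
)
measurements_text = (
    "collectiontime,Lat_deg,Lon_deg,temp_sensor_01,wind\n"
    "2024-01-15T10:30:00,30.2672,-97.7431,23.5,2.3\n"
)
print(compare_columns(sensors_text, measurements_text))
```

All three lists are empty when the two files agree.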

#### Important File Format Notes

⚠️ **Critical Requirements:**
- Each sensor `alias` from sensors.csv becomes a column in measurements.csv
- `collectiontime`, `Lat_deg`, and `Lon_deg` are required columns in measurements.csv
- Empty values are handled automatically by the system
- Maximum file size is **500 MB per file**
- Use UTF-8 encoding for both files
- Timestamps should be in UTC or include timezone information
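
The size and encoding requirements can be verified locally, as sketched below. `check_upload_file` is a hypothetical helper, and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption; the guide does not say whether the limit is binary or decimal.

```python
import os

# 500 MB per file as stated in this guide; the binary interpretation is an assumption
MAX_UPLOAD_BYTES = 500 * 1024 * 1024

def check_upload_file(path: str) -> None:
    """Raise ValueError if the file exceeds the size limit or is not valid UTF-8."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size:,} bytes; the limit is {MAX_UPLOAD_BYTES:,}")
    try:
        with open(path, "r", encoding="utf-8") as handle:
            for _ in handle:  # decode line by line instead of loading the whole file
                pass
    except UnicodeDecodeError as exc:
        raise ValueError(f"{path} is not valid UTF-8: {exc}") from exc
```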

📝 **Best Practices:**
- Keep sensor aliases short but descriptive
- Use consistent naming conventions (e.g., `sensor_type_number`)
- Ensure measurement values match the units specified in sensors.csv
- Include all sensors in measurements.csv even if some readings are missing


#### Helper Function Features

🔍 **Campaign Discovery:**
- List all campaigns you have access to
- View campaign metadata and descriptions
- Identify the correct campaign ID for your data

🏗️ **Station Management:**
- List all stations within a campaign
- View station details and locations
- Find the appropriate station ID for your sensors

💡 **Integration Tip:**
Use these helper functions before uploading data to ensure you're targeting the correct campaign and station IDs.


In [None]:
import glob
import os
import pandas as pd
import tempfile
import shutil
from pathlib import Path
from typing import Dict, Any, List, Optional
import time
import math

def get_file_info(file_path: str) -> Dict[str, Any]:
    """Get detailed information about a CSV file."""
    if not os.path.exists(file_path):
        return {}
    
    file_size = os.path.getsize(file_path)    
    # Count rows efficiently
    with open(file_path, 'r', encoding='utf-8') as f:
        row_count = sum(1 for line in f) - 1  # Subtract header row
    return {
        'size_bytes': file_size,
        'size_mb': file_size / (1024 * 1024),
        'row_count': row_count,
        'estimated_chunk_count': lambda chunk_size: math.ceil(row_count / chunk_size)
    }

def create_csv_chunks(
    file_path: str,
    chunk_size: int = 10000,
    output_dir: Optional[str] = None,
    max_file_size_mb: int = 50
) -> List[str]:
    """
    Split a large CSV file into smaller chunks.
    
    Args:
        file_path: Path to the large CSV file
        chunk_size: Number of rows per chunk
        output_dir: Directory to store chunk files (temp dir if None)
        max_file_size_mb: Maximum file size per chunk in MB
        
    Returns:
        List of chunk file paths
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    
    # Create output directory
    if output_dir is None:
        output_dir = tempfile.mkdtemp(prefix="csv_chunks_")
    else:
        os.makedirs(output_dir, exist_ok=True)
    
    file_info = get_file_info(file_path)
    filename = os.path.basename(file_path)
    name, ext = os.path.splitext(filename)
    
    print(f"📦 Chunking {filename}:")
    print(f"  • Total rows: {file_info['row_count']:,}")
    print(f"  • File size: {file_info['size_mb']:.2f} MB")
    print(f"  • Chunk size: {chunk_size:,} rows")
    print(f"  • Estimated chunks: {file_info['estimated_chunk_count'](chunk_size)}")
    
    chunk_files = []
    try:
        # Read and chunk the CSV file
        chunk_num = 0
        for chunk_df in pd.read_csv(file_path, chunksize=chunk_size):
            chunk_num += 1
            chunk_filename = f"{name}_chunk_{chunk_num:03d}{ext}"
            chunk_path = os.path.join(output_dir, chunk_filename)
            # Save chunk
            chunk_df.to_csv(chunk_path, index=False)
            # Check file size
            chunk_size_mb = os.path.getsize(chunk_path) / (1024 * 1024)
            if chunk_size_mb > max_file_size_mb:
                print(f"⚠️  Warning: Chunk {chunk_num} is {chunk_size_mb:.2f} MB (exceeds {max_file_size_mb} MB limit)")
            chunk_files.append(chunk_path)
            print(f"  ✓ Created chunk {chunk_num}: {len(chunk_df)} rows, {chunk_size_mb:.2f} MB")
    
    except Exception as e:
        # Clean up on error
        for chunk_file in chunk_files:
            if os.path.exists(chunk_file):
                os.remove(chunk_file)
        raise  # re-raise the original exception with its traceback intact
    
    print(f"📦 Created {len(chunk_files)} chunks in {output_dir}")
    return chunk_files

def upload_csv_data_chunked(
    campaign_id: int,
    station_id: int,
    token: str,
    data_dir: str = "./data/",
    sensors_filename: str = "sensors.csv",
    measurements_filename: str = "measurements.csv",
    chunk_size: int = 10000,
    max_file_size_mb: int = 50,
    cleanup_chunks: bool = True,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Upload large CSV files using chunking strategy.
    
    Args:
        campaign_id: ID of the target campaign
        station_id: ID of the target station
        token: Access token
        data_dir: Directory containing CSV files
        sensors_filename: Name of sensors CSV file
        measurements_filename: Name of measurements CSV file
        chunk_size: Number of rows per chunk
        max_file_size_mb: Maximum file size per chunk in MB
        cleanup_chunks: Whether to delete chunk files after upload
        base_url: Base URL for the API
        
    Returns:
        Aggregated upload response data
    """
    print(f"=== Chunked CSV Data Upload ===")
    print(f"Campaign ID: {campaign_id}")
    print(f"Station ID: {station_id}")
    print(f"Chunk size: {chunk_size:,} rows")
    print(f"Max chunk file size: {max_file_size_mb} MB")
    
    # Construct file paths
    sensors_file_path = os.path.join(data_dir, sensors_filename)
    measurements_file_path = os.path.join(data_dir, measurements_filename)
    
    # Verify files exist
    if not os.path.exists(sensors_file_path):
        raise FileNotFoundError(f"Sensors file not found: {sensors_file_path}")
    if not os.path.exists(measurements_file_path):
        raise FileNotFoundError(f"Measurements file not found: {measurements_file_path}")
    
    # Get file information
    sensors_info = get_file_info(sensors_file_path)
    measurements_info = get_file_info(measurements_file_path)
    print(f"\n📁 File Analysis:")
    print(f"  • Sensors: {sensors_info['row_count']:,} rows, {sensors_info['size_mb']:.2f} MB")
    print(f"  • Measurements: {measurements_info['row_count']:,} rows, {measurements_info['size_mb']:.2f} MB")
    # Create temporary directory for chunks
    chunk_dir = tempfile.mkdtemp(prefix="upload_chunks_")
    try:
        # Create chunks
        print("\n--- Chunking Sensors File ---")
        sensors_chunks = create_csv_chunks(
            sensors_file_path, 
            chunk_size=chunk_size,
            output_dir=os.path.join(chunk_dir, "sensors"),
            max_file_size_mb=max_file_size_mb
        )
        
        print("\n--- Chunking Measurements File ---")
        measurements_chunks = create_csv_chunks(
            measurements_file_path,
            chunk_size=chunk_size, 
            output_dir=os.path.join(chunk_dir, "measurements"),
            max_file_size_mb=max_file_size_mb
        )
        
        # Upload chunks
        total_chunks = max(len(sensors_chunks), len(measurements_chunks))
        successful_uploads = 0
        failed_uploads = 0
        aggregated_results = {
            'total_sensors_processed': 0,
            'total_measurements_added': 0,
            'total_processing_time': 0,
            'chunk_results': []
        }
        print(f"\n📤 Uploading {total_chunks} chunk pairs...")
        
        for i in range(total_chunks):
            chunk_num = i + 1
            print(f"\n--- Uploading Chunk {chunk_num}/{total_chunks} ---")   
            try:
                # Get chunk files (use last chunk if one file has fewer chunks)
                sensors_chunk = sensors_chunks[min(i, len(sensors_chunks) - 1)]
                measurements_chunk = measurements_chunks[min(i, len(measurements_chunks) - 1)]
                # Upload chunk pair
                start_time = time.time()
                with open(sensors_chunk, 'rb') as sf, open(measurements_chunk, 'rb') as mf:
                    files = {
                        'upload_file_sensors': (os.path.basename(sensors_chunk), sf, 'text/csv'),
                        'upload_file_measurements': (os.path.basename(measurements_chunk), mf, 'text/csv')
                    }
                    upload_url = f"{base_url}/api/v1/uploadfile_csv/campaign/{campaign_id}/station/{station_id}/sensor"
                    response = make_authenticated_request(
                        method="POST",
                        url=upload_url,
                        token=token,
                        files=files
                    )
                
                upload_time = time.time() - start_time
                result = response.json()
                
                # Aggregate results
                aggregated_results['total_sensors_processed'] += result.get('Total sensors processed', 0)
                aggregated_results['total_measurements_added'] += result.get('Total measurements added to database', 0)
                aggregated_results['total_processing_time'] += upload_time
                aggregated_results['chunk_results'].append({
                    'chunk': chunk_num,
                    'sensors_processed': result.get('Total sensors processed', 0),
                    'measurements_added': result.get('Total measurements added to database', 0),
                    'upload_time': upload_time
                })
                
                successful_uploads += 1
                print(f"  ✅ Chunk {chunk_num} uploaded successfully")
                print(f"     • Sensors: {result.get('Total sensors processed', 0)}")
                print(f"     • Measurements: {result.get('Total measurements added to database', 0)}")
                print(f"     • Time: {upload_time:.2f}s")
                
            except Exception as e:
                failed_uploads += 1
                print(f"  ❌ Chunk {chunk_num} failed: {e}")
                aggregated_results['chunk_results'].append({
                    'chunk': chunk_num,
                    'error': str(e)
                })
        
        # Final results
        print(f"\n📊 Chunked Upload Summary:")
        print(f"  • Total chunks: {total_chunks}")
        print(f"  • Successful: {successful_uploads}")
        print(f"  • Failed: {failed_uploads}")
        print(f"  • Total sensors processed: {aggregated_results['total_sensors_processed']:,}")
        print(f"  • Total measurements added: {aggregated_results['total_measurements_added']:,}")
        print(f"  • Total processing time: {aggregated_results['total_processing_time']:.2f}s")
        
        if failed_uploads > 0:
            print(f"⚠️  {failed_uploads} chunks failed to upload")
        return aggregated_results
    finally:
        # Cleanup chunks if requested
        if cleanup_chunks and os.path.exists(chunk_dir):
            print(f"🧹 Cleaning up chunk files from {chunk_dir}")
            shutil.rmtree(chunk_dir)

def list_data_files(data_dir: str = "./data/") -> Dict[str, list]:
    """List all CSV files in the data directory.
    Args:
        data_dir: Directory to search for CSV files
        
    Returns:
        Dictionary with lists of found files
    """
    if not os.path.exists(data_dir):
        print(f"❌ Data directory not found: {data_dir}")
        return {"csv_files": [], "sensors_files": [], "measurements_files": []}
    
    # Find all CSV files
    csv_pattern = os.path.join(data_dir, "*.csv")
    csv_files = sorted(glob.glob(csv_pattern))  # sorted so that auto-detection is deterministic
    
    # Categorize files
    sensors_files = [f for f in csv_files if 'sensor' in os.path.basename(f).lower()]
    measurements_files = [f for f in csv_files if 'measurement' in os.path.basename(f).lower()]
    
    print(f"📁 Files found in {data_dir}:")
    print(f"  • Total CSV files: {len(csv_files)}")
    print(f"  • Sensor files: {len(sensors_files)}")
    print(f"  • Measurement files: {len(measurements_files)}")
    if csv_files:
        print(f"📄 All CSV files:")
        for file in csv_files:
            size = os.path.getsize(file)
            print(f"    - {os.path.basename(file)} ({size:,} bytes)")
    return {
        "csv_files": csv_files,
        "sensors_files": sensors_files,
        "measurements_files": measurements_files
    }

def upload_data_with_auto_detection(
    campaign_id: int,
    station_id: int,
    token: str,
    data_dir: str = "./data/",
    use_chunking: bool = False,
    chunk_size: int = 10000,
    max_file_size_mb: int = 50,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """
    Upload CSV data with automatic file detection.
    
    Args:
        campaign_id: ID of the target campaign
        station_id: ID of the target station
        token: Access token
        data_dir: Directory containing CSV files
        use_chunking: Whether to use chunked upload
        chunk_size: Number of rows per chunk (if chunking)
        max_file_size_mb: Maximum file size per chunk in MB (if chunking)
        base_url: Base URL for the API
        
    Returns:
        Upload response data
    """
    print("=== Auto-detecting Data Files ===")
    files_info = list_data_files(data_dir)
    
    # Try to find sensors and measurements files
    sensors_file = None
    measurements_file = None
    
    # Look for standard filenames first
    standard_sensors = os.path.join(data_dir, "sensors.csv")
    standard_measurements = os.path.join(data_dir, "measurements.csv")
    
    if os.path.exists(standard_sensors):
        sensors_file = "sensors.csv"
    elif files_info["sensors_files"]:
        sensors_file = os.path.basename(files_info["sensors_files"][0])
        print(f"🔍 Using detected sensors file: {sensors_file}")
    
    if os.path.exists(standard_measurements):
        measurements_file = "measurements.csv"
    elif files_info["measurements_files"]:
        measurements_file = os.path.basename(files_info["measurements_files"][0])
        print(f"🔍 Using detected measurements file: {measurements_file}")
    
    if not sensors_file or not measurements_file:
        raise FileNotFoundError(
            f"Could not find required files. "
            f"Sensors: {sensors_file}, Measurements: {measurements_file}"
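        )

    # Upload the files, either in chunks or in a single request
    if use_chunking:
        return upload_csv_data_chunked(
            campaign_id=campaign_id,
            station_id=station_id,
            token=token,
            data_dir=data_dir,
            sensors_filename=sensors_file,
            measurements_filename=measurements_file,
            chunk_size=chunk_size,
            max_file_size_mb=max_file_size_mb,
            base_url=base_url
        )
    return upload_csv_data(
        campaign_id=campaign_id,
        station_id=station_id,
        token=token,
        data_dir=data_dir,
        sensors_filename=sensors_file,
        measurements_filename=measurements_file,
        base_url=base_url
    )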

In [None]:
def get_campaigns(token: str, base_url: str = "https://upstream-dso.tacc.utexas.edu/dev") -> Dict[str, Any]:
    """Get list of available campaigns."""
    url = f"{base_url}/api/v1/campaigns"
    response = make_authenticated_request("GET", url, token)
    return response.json()

def get_stations(campaign_id: int, token: str, base_url: str = "https://upstream-dso.tacc.utexas.edu/dev") -> Dict[str, Any]:
    """Get list of stations for a campaign."""
    url = f"{base_url}/api/v1/campaigns/{campaign_id}/stations"
    response = make_authenticated_request("GET", url, token)
    return response.json()

# Example usage (uncomment to run):
# print("=== Available Campaigns ===")
# campaigns = get_campaigns(token)
# print(json.dumps(campaigns, indent=2))
#
# print("=== Available Stations ===")
# stations = get_stations(CAMPAIGN_ID, token)
# print(json.dumps(stations, indent=2))

In [None]:
# List available files
files_info = list_data_files("./data/")
# Analyze file sizes
sensors_path = "./data/sensors.csv"
measurements_path = "./data/measurements.csv"
if os.path.exists(sensors_path) and os.path.exists(measurements_path):
    sensors_info = get_file_info(sensors_path)
    measurements_info = get_file_info(measurements_path)
    total_size_mb = sensors_info['size_mb'] + measurements_info['size_mb']
    total_rows = sensors_info['row_count'] + measurements_info['row_count']
    # Start upload with progress tracking
    start_time = time.time()
    try:
        result = upload_csv_data_chunked(
            campaign_id=campaign_id,
            station_id=station_id,
            token=token,
            data_dir="./data/",
            sensors_filename="sensors.csv",
            measurements_filename="measurements.csv",
            chunk_size=6000,
            max_file_size_mb=30
        )
        total_time = time.time() - start_time
    except Exception as e:
        print(f"❌ Chunked upload failed: {e}")

📁 Files found in ./data/:
  • Total CSV files: 2
  • Sensor files: 1
  • Measurement files: 1
📄 All CSV files:
    - measurements.csv (906,949 bytes)
    - sensors.csv (173 bytes)
📈 Upload Progress Estimation:
  • Total data: 0.87 MB, 8,964 rows
  • Estimated upload time: 1.7 seconds
=== Chunked CSV Data Upload ===
Campaign ID: 12
Station ID: 39
Chunk size: 6,000 rows
Max chunk file size: 30 MB

📁 File Analysis:
  • Sensors: 3 rows, 0.00 MB
  • Measurements: 8,961 rows, 0.86 MB

--- Chunking Sensors File ---
📦 Chunking sensors.csv:
  • Total rows: 3
  • File size: 0.00 MB
  • Chunk size: 6,000 rows
  • Estimated chunks: 1
  ✓ Created chunk 1: 3 rows, 0.00 MB
📦 Created 1 chunks in /var/folders/ps/dx2yrk_1117grf32kqlw9qyh0000gq/T/upload_chunks_6eegcnam/sensors

--- Chunking Measurements File ---
📦 Chunking measurements.csv:
  • Total rows: 8,961
  • File size: 0.86 MB
  • Chunk size: 6,000 rows
  • Estimated chunks: 2
  ✓ Created chunk 1: 6000 rows, 0.58 MB
  ✓ Created chunk 2: 2961 rows

## Create Measurement
The `create_measurement` function posts a single measurement to the Upstream API for a specific sensor within a campaign and station.

In [33]:
def create_measurement(
    campaign_id: int,
    station_id: int, 
    sensor_id: int,
    measurement_data: Dict[str, Any],
    token: str,
    base_url: str = "https://upstream-dso.tacc.utexas.edu/dev"
) -> Dict[str, Any]:
    """Create a single measurement for a sensor."""
    url = f"{base_url}/api/v1/campaigns/{campaign_id}/stations/{station_id}/sensors/{sensor_id}/measurements"
    response = make_authenticated_request("POST", url, token, json=measurement_data)
    return response.json()



In [42]:

campaign_id = 12
station_id = 39  
sensor_id = 9664

# Measurement data
measurement_data = {
    "variablename": "Rain Increment",
    "collectiontime": "2024-01-15T10:37:00",
    "variabletype": "float",
    "description": "Rain Increment measurement",
    "measurementvalue": 25.3,
    "geometry": "POINT(10.12345 20.54321)"
}
result = create_measurement(
            campaign_id=campaign_id,
            station_id=station_id,
            sensor_id=sensor_id,
            measurement_data=measurement_data,
            token=token
        )
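
When many single measurements are posted, a small builder keeps the payload fields consistent. This is a sketch: `build_measurement_payload` is a hypothetical helper that reproduces the fields of the example above. The `geometry` value is a WKT point, which lists x before y; whether the API reads these as longitude then latitude is an assumption to verify against the API documentation.

```python
from typing import Any, Dict

def build_measurement_payload(
    variablename: str,
    collectiontime: str,
    value: float,
    x: float,
    y: float,
    description: str = "",
    variabletype: str = "float",
) -> Dict[str, Any]:
    """Assemble a measurement dict with the same fields as the example in this guide."""
    return {
        "variablename": variablename,
        "collectiontime": collectiontime,
        "variabletype": variabletype,
        "description": description,
        "measurementvalue": value,
        # WKT point: x first, then y (assumed here to mean longitude, latitude)
        "geometry": f"POINT({x} {y})",
    }

payload = build_measurement_payload(
    "Rain Increment", "2024-01-15T10:37:00", 25.3, 10.12345, 20.54321,
    description="Rain Increment measurement",
)
print(payload)
```

The resulting dictionary can be passed as `measurement_data` to `create_measurement`.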

## 7. Best Practices

1. **File Preparation:**
   - Validate your CSV files before upload
   - Ensure sensor aliases match between files
   - Use consistent timestamp formats

2. **Error Handling:**
   - Always wrap API calls in `try`/`except` blocks
   - Check file existence before upload
   - Validate response status codes

3. **Security:**
   - Never hardcode credentials in notebooks
   - Store tokens securely
   - Use environment variables for sensitive data

4. **Performance:**
   - Keep files under 500 MB for optimal performance
   - Use batch uploads for large datasets
   - Monitor upload progress and statistics
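
For the security points above, credentials can be read from environment variables rather than typed into a cell. This is a sketch under assumptions: `UPSTREAM_USERNAME` and `UPSTREAM_PASSWORD` are hypothetical variable names, and the returned dictionary uses the same `username` and `password` keys as the `credentials` dictionary in the authentication section.

```python
import os
from typing import Dict, Mapping

def credentials_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read credentials from environment variables instead of hardcoding them.

    UPSTREAM_USERNAME and UPSTREAM_PASSWORD are hypothetical variable names.
    """
    try:
        return {
            "username": env["UPSTREAM_USERNAME"],
            "password": env["UPSTREAM_PASSWORD"],
        }
    except KeyError as exc:
        raise RuntimeError(f"Environment variable {exc.args[0]} is not set") from None
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.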