# Example: Accessing Climate Data from OSN

This notebook demonstrates how to access climate data stored on the Open Storage Network (OSN) from the LEAP-Pangeo JupyterHub.

**Learning Objectives:**
- Connect to OSN storage
- List available datasets
- Load climate data with xarray
- Perform basic exploratory analysis

**Prerequisites:**
- Run `leap_startup.ipynb` first to install dependencies

In [None]:
# Import required packages
import s3fs
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

print("✅ Imports successful!")

## 1. Configure OSN Access

Set up a connection to the Open Storage Network (OSN), where the LEAP hackathon data is hosted.

In [None]:
# OSN Configuration
OSN_ENDPOINT_URL = "https://ncsa.osn.xsede.org"
OSN_BUCKET = "Pangeo"
HACKATHON_PREFIX = "leap-persistent/hackathon-2024"

# Initialize S3 filesystem (anonymous access for public data)
fs = s3fs.S3FileSystem(
    anon=True,
    client_kwargs={'endpoint_url': OSN_ENDPOINT_URL}
)

print(f"📍 OSN Endpoint: {OSN_ENDPOINT_URL}")
print(f"📍 Bucket: {OSN_BUCKET}")
print(f"📍 Prefix: {HACKATHON_PREFIX}")
print("\n✅ S3 filesystem initialized!")
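
The same connection settings can also be packaged as an fsspec-style `storage_options` dictionary, which xarray readers such as `xr.open_zarr` accept directly. The following is a minimal sketch that assumes anonymous access; the helper name `osn_storage_options` is hypothetical and is not part of s3fs or xarray.

```python
def osn_storage_options(endpoint_url: str, anon: bool = True) -> dict:
    """Build s3fs keyword arguments for an S3-compatible endpoint (hypothetical helper)."""
    return {
        "anon": anon,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


opts = osn_storage_options("https://ncsa.osn.xsede.org")
print(opts)

# The same dictionary can be reused in both places, for example:
# fs = s3fs.S3FileSystem(**opts)
# ds = xr.open_zarr("s3://<bucket>/<path>.zarr", storage_options=opts)
```

Keeping the options in one dictionary avoids repeating the endpoint URL whenever a new filesystem or dataset handle is created.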

## 2. List Available Data

Explore what datasets are available on OSN.

In [None]:
# List available files in the hackathon directory
try:
    files = fs.ls(f"{OSN_BUCKET}/{HACKATHON_PREFIX}")
    print(f"Found {len(files)} files/directories:\n")
    for i, f in enumerate(files[:10], 1):
        file_name = f.split('/')[-1]
        print(f"{i:2d}. {file_name}")
    
    if len(files) > 10:
        print(f"\n... and {len(files) - 10} more files")
except Exception as e:
    print(f"⚠️ Error listing files: {e}")
    print("\nThis is expected if the hackathon data isn't uploaded yet.")
    print("Replace with your actual data path when available.")
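
When object sizes and types matter, `fs.ls(path, detail=True)` returns one dictionary per entry with at least the `name`, `type`, and `size` keys. The sketch below shows one way to turn such a listing into a compact summary. The helper `summarize_listing` and the two entries are made up for illustration and do not describe real OSN contents.

```python
def summarize_listing(entries):
    """Return (name, type, size_in_MB) rows from fsspec-style listing dicts (hypothetical helper)."""
    rows = []
    for entry in entries:
        # Keep only the last path component, as in the listing cell above
        name = entry["name"].rstrip("/").split("/")[-1]
        size_mb = (entry.get("size") or 0) / 1e6
        rows.append((name, entry["type"], round(size_mb, 2)))
    # Directories (for example Zarr stores) first, then files, each sorted by name
    return sorted(rows, key=lambda row: (row[1] != "directory", row[0]))


# Made-up entries in the shape returned by fs.ls(..., detail=True)
example_entries = [
    {"name": "Pangeo/leap-persistent/hackathon-2024/readme.txt", "type": "file", "size": 2_500_000},
    {"name": "Pangeo/leap-persistent/hackathon-2024/example.zarr", "type": "directory", "size": 0},
]

rows = summarize_listing(example_entries)
for name, kind, size_mb in rows:
    print(f"{name:<20} {kind:<10} {size_mb:>8.2f} MB")
```

With a live connection, the call would be `summarize_listing(fs.ls(f"{OSN_BUCKET}/{HACKATHON_PREFIX}", detail=True))`.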

## 3. Load Climate Data (Example)

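Load a Zarr or NetCDF dataset from OSN with xarray. The cell below shows the Zarr call (commented out) and builds a synthetic stand-in dataset until real data is available; replace the placeholder path with your actual dataset.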

In [None]:
# Example: Loading a Zarr dataset
# Replace 'your-dataset.zarr' with actual dataset name

# dataset_path = f"s3://{OSN_BUCKET}/{HACKATHON_PREFIX}/your-dataset.zarr"
# ds = xr.open_zarr(fs.get_mapper(dataset_path))

# For now, create example data
print("Creating example dataset (replace with actual data loading):\n")

# Create example climate data
time = pd.date_range('2020-01-01', periods=365, freq='D')
lat = np.linspace(-90, 90, 180)
lon = np.linspace(-180, 180, 360)

temperature = 15 + 10 * np.random.randn(len(time), len(lat), len(lon))

ds = xr.Dataset(
    {
        'temperature': (['time', 'lat', 'lon'], temperature),
    },
    coords={
        'time': time,
        'lat': lat,
        'lon': lon,
    },
    attrs={
        'description': 'Example climate dataset',
        'units': 'Celsius'
    }
)

print(ds)
print("\n✅ Example dataset created!")
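
Once real data is available, the loader can be chosen from the file suffix. The sketch below is one possible approach, not the hackathon's official loader: `detect_format` and `open_osn_dataset` are hypothetical helpers, the paths are placeholders, and the NetCDF branch assumes that the `h5netcdf` engine is installed and that the file is NetCDF4/HDF5.

```python
def detect_format(path: str) -> str:
    """Guess the dataset format from the path suffix (hypothetical helper)."""
    name = path.rstrip("/").lower()
    if name.endswith(".zarr"):
        return "zarr"
    if name.endswith((".nc", ".nc4")):
        return "netcdf"
    raise ValueError(f"Unrecognized dataset format: {path}")


def open_osn_dataset(fs, path: str):
    """Open a Zarr store or NetCDF file through an fsspec filesystem such as s3fs."""
    import xarray as xr  # imported here so detect_format can be used without xarray

    if detect_format(path) == "zarr":
        return xr.open_zarr(fs.get_mapper(path))
    # Assumption: h5netcdf is installed; it can read from file-like objects
    return xr.open_dataset(fs.open(path), engine="h5netcdf")


# Placeholder paths, for illustration only
print(detect_format("s3://bucket/prefix/your-dataset.zarr"))
print(detect_format("s3://bucket/prefix/your-dataset.nc"))
```

With the filesystem from Section 1, a call would look like `ds = open_osn_dataset(fs, dataset_path)`.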

## 4. Basic Data Exploration

Explore the structure and contents of the dataset.