# Reading and writing data on cloud object storage
Reading from and writing to cloud object storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Storage) differs from working with a regular filesystem: data is stored as objects in buckets and accessed over HTTP, often with credentials. Here we read from public buckets and write to an S3-API-compatible Pangeo@EOSC MinIO bucket. We use `fsspec`, which makes many types of data storage (including S3) look like filesystems.
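
To illustrate the "looks like a filesystem" idea without needing a network connection or credentials, here is a minimal sketch that uses fsspec's built-in in-memory filesystem as a stand-in for a bucket. The function name `write_and_list` and the path `/demo-bucket/testing` are made up for this illustration; the same `open` and `ls` calls are the ones used on S3 later in this notebook.

```python
import fsspec


def write_and_list(fs, directory):
    """Write a tiny CSV through the generic fsspec API, then list the directory."""
    path = f"{directory}/hello.csv"
    with fs.open(path, mode="wt") as f:
        f.write("a,b\n1,2\n")
    return fs.ls(directory, detail=False)


# The in-memory filesystem stands in for a bucket (assumption for this demo only)
mem_fs = fsspec.filesystem("memory")
listing = write_and_list(mem_fs, "/demo-bucket/testing")
text = mem_fs.cat("/demo-bucket/testing/hello.csv").decode()

print(listing)
print(text)
```

Because `write_and_list` only relies on the generic filesystem interface, it should behave the same when passed an S3 filesystem object and a bucket path.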

In [None]:
import fsspec
import pandas as pd
import os
import xarray as xr

In [None]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
from zarr.errors import UnstableSpecificationWarning
warnings.filterwarnings("ignore", category=UnstableSpecificationWarning)

List files on a public read bucket

In [None]:
fs = fsspec.filesystem('s3', anon=True)

In [None]:
fs.ls('anaconda-public-datasets')

Reading CSV from a public read bucket

In [None]:
df = pd.read_csv(fs.open("s3://anaconda-public-datasets/iris/iris.csv"))
df

Write CSV to an S3 bucket

In [None]:
from dotenv import load_dotenv
_ = load_dotenv('/home/jovyan/dotenv/school_2025.env')  # sets the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env vars
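
Since the return value of `load_dotenv` is discarded here, a missing or incomplete env file can go unnoticed until the first write fails. The following is a small sketch of a check that the two variables named above are actually set. The helper `missing_credentials` and the example mapping are hypothetical and not part of the original notebook.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"AWS_ACCESS_KEY_ID": "example-key"}
print(missing_credentials(example_env))
```

Calling `missing_credentials()` with no argument checks the real environment; an empty list means both variables are present.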

In [None]:
username = os.environ['JUPYTERHUB_USER']
print(username)

In [None]:
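# anon=False so that the credentials loaded above are used; anonymous access cannot write to a restricted bucket
fs = fsspec.filesystem('s3', anon=False, skip_instance_cache=True, use_listings_cache=False,
                       endpoint_url='https://pangeo-eosc-minioapi.vm.fedcloud.eu')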

In [None]:
bucket = 's3://protocoast-school-2025'

In [None]:
fs.ls(bucket)

In [None]:
# `bucket` already includes the s3:// scheme, so it is not repeated here
outfile = fs.open(f"{bucket}/{username}/testing/iris.csv", mode='wt')

with outfile as f:
    df.to_csv(f)
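
Object paths in this notebook are built by joining `bucket`, `username`, and a file name. Since `bucket` already includes the `s3://` scheme, adding the scheme a second time would produce an invalid URL. Below is a minimal sketch of a helper that adds the scheme only when it is absent. The function `object_url` and the user name `alice` are hypothetical and used only for illustration.

```python
def object_url(bucket, *parts):
    """Join path segments onto a bucket name or URL, adding s3:// only if absent."""
    base = bucket if bucket.startswith("s3://") else f"s3://{bucket}"
    # Drop empty segments and stray slashes so the result has single separators
    segments = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join([base.rstrip("/"), *segments])


# Both forms of the bucket name give the same URL
url_a = object_url("s3://protocoast-school-2025", "alice", "testing", "iris.csv")
url_b = object_url("protocoast-school-2025", "alice", "testing/iris.csv")
print(url_a)
print(url_b)
```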

List files on restricted S3 bucket

In [None]:
fs.ls(f'{bucket}/{username}/testing/')

In [None]:
df = pd.read_csv(fs.open(f"{bucket}/{username}/testing/iris.csv"))
df

The rest of the examples use xarray, which follows the NetCDF data model.

Read NetCDF data from a THREDDS OPeNDAP service

In [None]:
ds = xr.open_dataset('http://thredds.socib.es/thredds/dodsC/mooring/temperature_recorder/station_andratx-scb_temprec002/L1/dep0001_station-andratx_scb-temprec002_L1_latest.nc',
               engine='pydap')

In [None]:
ds

Visualization interlude: plot a time range of data with hvplot

In [None]:
import hvplot.xarray

ds['WTR_TEM'].sel(time=slice('2025-10-01','2025-10-07')).hvplot(grid=True)

Write Xarray Dataset to NetCDF, then upload to Cloud bucket

In [None]:
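# Remove the OPeNDAP-specific 'DODS' attribute before writing (presumably because it cannot be serialized to NetCDF)
del ds['station_name'].attrs['DODS']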

In [None]:
local_file = 'socib_andratx.nc'
ds.to_netcdf(local_file, mode='w')

In [None]:
s3_url = f'{bucket}/{username}/{local_file}'  # `bucket` already includes the s3:// scheme
_ = fs.upload(local_file, s3_url)

Read NetCDF data from the S3 bucket

In [None]:
xr.open_dataset(fs.open(s3_url))

Write Xarray Dataset directly to Cloud bucket in Zarr format

In [None]:
ds.to_zarr(fs.get_mapper(f'{bucket}/{username}/socib_andratx.zarr'), mode='w')

In [None]:
xr.open_dataset(fs.get_mapper(f'{bucket}/{username}/socib_andratx.zarr'), engine='zarr')