# Climatological product

> After reading this chapter, you should be able to produce climatological products (spatial maps and wind rose diagrams) for a provincial service area that are informative and easy to understand.

## Intro

In the previous chapters we worked with multidimensional data using sample datasets, from basic operations through to visualising data on spatial maps. This chapter combines those chapters to build spatial, area, and point climatological products. There are three steps:
- load data: inacawo, inawaves, and inaflows. For this training, inawaves and inaflows data cover 2016-2020, and inacawo data cover 2024 to the present.
- preprocess data: aggregation (computing the climatology).
- plot data: spatial maps and wind roses.
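
The aggregation in the preprocess step can be illustrated without the real datasets. The sketch below is only an illustration: it assumes a synthetic daily series for 2016-2020 and a hypothetical helper `monthly_climatology`, and it does not reproduce how `klimtool` aggregates internally.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_climatology(samples):
    """Average all values that fall in the same calendar month.

    `samples` is an iterable of (date, value) pairs. The result maps the
    month number (1-12) to the mean over every year in the record.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in samples:
        sums[day.month] += value
        counts[day.month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


# Synthetic daily record for 2016-2020: the value is the month number plus
# a year-dependent offset (-2 .. 2) that cancels out over the five years.
samples = []
day = date(2016, 1, 1)
while day <= date(2020, 12, 31):
    samples.append((day, day.month + (day.year - 2018)))
    day += timedelta(days=1)

clim = monthly_climatology(samples)
print(clim[1], clim[7])  # 1.0 7.0
```

Because the yearly offsets cancel, each monthly mean falls back to the month number, which makes the result easy to check by eye.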

Here we use a template script saved as a module (`klimtool.py`) to simplify the processing.
`klimtool.py` contains several classes, but only the `klimtool` class is called in this notebook. The `klimtool` class has methods that act as data loader, preprocessor, and plotter. Each step that calls the module comes with a short explanation so that users can follow the data-processing flow.

## Loading the module and datasets

| Method | Description | Mandatory arguments |
| :- | :- | :- |
| `klimtool.open_inawaves` | Opens the inawaves dataset | `tstart:datetime`, `tend:datetime`, `timefreq:str`, `latlon:list[float]` |
| `klimtool.open_inaflows` | Opens the inaflows dataset | `tstart:datetime`, `tend:datetime`, `timefreq:str`, `latlon:list[float]` |
| `klimtool.open_inacawo` | Opens the inacawo dataset | `tstart:datetime`, `tend:datetime`, `timefreq:str`, `latlon:list[float]` |

Here `tstart` and `tend` are the start and end of the time range, and `timefreq` is the time frequency (`1H`: hourly, `3H`: 3-hourly, `1D`: daily, `MS`: monthly, `YS`: yearly). ⚠️ Note that the hourly frequency (`1H`) is available only for the inacawo dataset ⚠️.
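
The frequency rule can be checked before a loader is called. The following is a minimal sketch under stated assumptions: `check_request` is a hypothetical helper that is not part of `klimtool`, and the requirement that `tstart` precede `tend` is an assumption of this example.

```python
from datetime import datetime

# Frequency codes listed in the text; hourly data exists only for inacawo.
VALID_FREQS = {"1H", "3H", "1D", "MS", "YS"}
HOURLY_MODELS = {"inacawo"}


def check_request(model, tstart, tend, timefreq):
    """Raise ValueError when a request contradicts the documented rules."""
    if timefreq not in VALID_FREQS:
        raise ValueError(f"unknown timefreq {timefreq!r}")
    if timefreq == "1H" and model not in HOURLY_MODELS:
        raise ValueError(f"1H is not available for {model}")
    if tstart >= tend:  # assumption: the range must be non-empty
        raise ValueError("tstart must be earlier than tend")
    return True


print(check_request("inacawo", datetime(2024, 1, 1), datetime(2024, 2, 1), "1H"))
```

Calling it with `model="inawaves"` and `timefreq="1H"` raises a `ValueError` instead of failing later inside the loader.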

In [1]:
from klimtool import klimtool

In [2]:
%%time
klimtool().open_inawaves()

FileNotFoundError: No such file or directory: '/data/local/ofs/inawaves_combined.zarr'
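
The error above means the store path does not exist on the machine running the notebook. A small guard can report this more clearly before any dataset is opened. This is a sketch only: `require_store` is a hypothetical helper, and the hint about a mounted data volume is a guess at the likely cause.

```python
from pathlib import Path


def require_store(path):
    """Return `path` as a Path, or raise a descriptive error if it is missing."""
    store = Path(path)
    if not store.exists():
        raise FileNotFoundError(
            f"Dataset store not found: {store}. "
            "Check that the data volume is mounted on this machine."
        )
    return store


try:
    require_store("/data/local/ofs/inawaves_combined.zarr")
except FileNotFoundError as err:
    print(err)
```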

In [3]:
import xarray as xr

xr.open_zarr("/data/local/InaCAWO/cawozarr/")

## Spatial climatology

| Method | Description | Mandatory arguments |
| :- | :- | :- |
| `klimtool.run_plot` | Starts plotting | `model:str`, `ds:xarray.Dataset`, `timefreq:str`, `var:str`, `area_type:str`, `area_name:str`, `map_title:str`, `out_dir:str` |

## Area and point climatology
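
For a point product such as the wind rose mentioned in the introduction, the core computation is counting how often the direction falls in each compass sector. The sketch below assumes eight sectors and directions in degrees measured clockwise from north. Both function names are hypothetical, and the sketch does not describe how `klimtool` builds its wind rose.

```python
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_of(direction_deg):
    """Map a direction in degrees (0 = north, clockwise) to one of 8 sectors."""
    width = 360 / len(SECTORS)
    # Shift by half a sector so that N covers 337.5-22.5 degrees.
    index = int(((direction_deg % 360) + width / 2) // width) % len(SECTORS)
    return SECTORS[index]


def wind_rose_frequencies(directions_deg):
    """Return the percentage of samples that fall in each sector."""
    counts = dict.fromkeys(SECTORS, 0)
    for direction in directions_deg:
        counts[sector_of(direction)] += 1
    total = len(directions_deg)
    return {sector: 100 * count / total for sector, count in counts.items()}


freq = wind_rose_frequencies([0, 10, 350, 90])
print(freq["N"], freq["E"])  # 75.0 25.0
```

A full wind rose would also split each sector by speed class. That step is left out here to keep the binning logic visible.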

In [7]:
%%time
from minio import Minio

client = Minio("172.17.0.1:9990",
    access_key="moqs1u5xKGk6tRx3GbyJ",
    secret_key="OLGsgagMmgPWE6aRaBGdiQUC7U4WLFKQXIGiN821",
    secure=False
)
client.list_buckets()

CPU times: user 236 ms, sys: 23.4 ms, total: 260 ms
Wall time: 430 ms


[Bucket('bucket-test'), Bucket('zarr-data')]
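
Connection settings like these can also be read from the environment instead of being written into the notebook. The following is a minimal sketch: the variable names `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, and `MINIO_SECRET_KEY` and the helper `minio_settings` are assumptions made for this example, not conventions defined by MinIO or by this course.

```python
import os


def minio_settings(env=os.environ):
    """Collect MinIO connection settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    required = ("MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "endpoint": env.get("MINIO_ENDPOINT", "172.17.0.1:9990"),
        "access_key": env["MINIO_ACCESS_KEY"],
        "secret_key": env["MINIO_SECRET_KEY"],
        "secure": False,
    }


# Demonstration with an explicit mapping instead of the real environment.
settings = minio_settings({"MINIO_ACCESS_KEY": "demo", "MINIO_SECRET_KEY": "demo-secret"})
print(settings["endpoint"])  # 172.17.0.1:9990
```

With the variables set, the client could then be created with `Minio(**minio_settings())`.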

In [16]:
for obj in client.list_objects("zarr-data"):
    print(obj.object_name)

demo-dataset.zarr/
mydata.zarr/


In [4]:
import zarr
import fsspec
import numpy as np
from zarr.storage import FSStore

# Configure the connection to MinIO
fs = fsspec.filesystem(
    "s3",
    key="moqs1u5xKGk6tRx3GbyJ",
    secret="OLGsgagMmgPWE6aRaBGdiQUC7U4WLFKQXIGiN821",
    endpoint_url="http://172.17.0.1:9990",
    skip_instance_cache=True,
)

# Bucket name and path for storing the data
bucket_name = "zarr-data"
store_path = f"{bucket_name}/mydata.zarr"

# Make sure the bucket exists
if not fs.exists(bucket_name):
    fs.mkdir(bucket_name)

# Create the store with FSStore
store = FSStore(url=store_path, fs=fs)

# Create dummy data
data = np.random.rand(500, 500)

# Save the data to Zarr on MinIO
z = zarr.create(shape=data.shape, chunks=(100, 100), dtype="f4", store=store)
z[:] = data

print(f"Zarr array saved to MinIO at {store_path}")

Zarr array saved to MinIO at zarr-data/mydata.zarr
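
The chunk shape determines how many chunks the array is split into, and therefore how many objects are written to the bucket. A short calculation for the values used in this cell follows. `chunk_layout` is a hypothetical helper, and the byte count is the uncompressed size, so the stored objects will usually be smaller.

```python
import math


def chunk_layout(shape, chunks, itemsize):
    """Return (chunks per axis, total chunks, uncompressed bytes per full chunk)."""
    grid = tuple(math.ceil(n / c) for n, c in zip(shape, chunks))
    total = math.prod(grid)
    chunk_bytes = math.prod(chunks) * itemsize
    return grid, total, chunk_bytes


# Values from the cell: 500 x 500 array, 100 x 100 chunks, "f4" = 4 bytes.
grid, total, chunk_bytes = chunk_layout((500, 500), (100, 100), 4)
print(grid, total, chunk_bytes)  # (5, 5) 25 40000
```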


In [3]:
print(zarr.__version__)

2.18.4


In [15]:
import xarray as xr
import numpy as np
import s3fs
from zarr.storage import FSStore

# Create a dummy xarray Dataset
ds = xr.Dataset(
    {
        "temperature": (("lat", "lon"), np.random.rand(50, 100)),
        "salinity": (("lat", "lon"), np.random.rand(50, 100))
    },
    coords={
        "lat": np.linspace(-90, 90, 50),
        "lon": np.linspace(0, 359, 100)
    }
)

# Set up the connection to MinIO via s3fs
fs = s3fs.S3FileSystem(
    key="moqs1u5xKGk6tRx3GbyJ",
    secret="OLGsgagMmgPWE6aRaBGdiQUC7U4WLFKQXIGiN821",
    client_kwargs={"endpoint_url": "http://172.17.0.1:9990"}
)

# Storage path on MinIO
store = FSStore(
    url="zarr-data/demo-dataset.zarr", 
    fs=fs
)

# Write to Zarr (overwrite if it already exists)
ds.to_zarr(store=store, mode="w")

print("✅ Dataset saved to MinIO as Zarr.")


✅ Dataset saved to MinIO as Zarr.


In [18]:
import xarray as xr
import s3fs
from zarr.storage import FSStore

# MinIO connection
fs = s3fs.S3FileSystem(
    key="moqs1u5xKGk6tRx3GbyJ",
    secret="OLGsgagMmgPWE6aRaBGdiQUC7U4WLFKQXIGiN821",
    client_kwargs={"endpoint_url": "http://172.17.0.1:9990"}
)

# Path of the Zarr dataset on MinIO
store = FSStore("zarr-data/demo-dataset.zarr", fs=fs)

# Open the Zarr store
ds_reload = xr.open_zarr(store=store, consolidated=True)

print("✅ Dataset loaded:")
ds_reload

✅ Dataset loaded:


Dask array summary, identical for each data variable:

| | Array | Chunk |
| :- | :- | :- |
| Bytes | 39.06 kiB | 39.06 kiB |
| Shape | (50, 100) | (50, 100) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [21]:
ds_reload.nbytes/1000

81.2
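
The value can be reproduced by hand from the shapes of the dummy dataset, assuming every variable and coordinate is stored as 8-byte `float64`:

```python
# Sizes taken from the dummy dataset: 50 latitudes x 100 longitudes.
n_lat, n_lon = 50, 100
float64_bytes = 8

data_var_bytes = 2 * n_lat * n_lon * float64_bytes  # temperature + salinity
coord_bytes = (n_lat + n_lon) * float64_bytes       # lat + lon
total_bytes = data_var_bytes + coord_bytes

print(total_bytes / 1000)  # 81.2
```

Each data variable alone takes 40,000 bytes, which is the 39.06 kiB reported per array.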

In [2]:
import xarray as xr
import numpy as np
import s3fs
import zarr
from zarr.storage import FSStore

def update_zarr_forecast(ds_new, zarr_path, fs_kwargs, time_dim="time"):
    """Replace overlapping time steps and append new ones in a Zarr store.

    Assumes `time_dim` is the leading dimension of every data variable.
    """
    fs = s3fs.S3FileSystem(**fs_kwargs)
    store = FSStore(zarr_path, fs=fs)

    if not fs.exists(zarr_path):
        print("Zarr store not found, initialising...")
        ds_new.to_zarr(store=store, mode="w")
        return

    print("Opening existing Zarr store...")
    # Read the time axis through xarray so it is decoded to datetime64 and
    # comparable with ds_new; the raw Zarr array holds encoded numbers.
    t_existing = xr.open_zarr(store, consolidated=False)[time_dim].values
    t_new = ds_new[time_dim].values

    overlap_mask = np.isin(t_new, t_existing)
    new_mask = ~overlap_mask
    print(f"  Replace: {overlap_mask.sum()} | Append: {new_mask.sum()}")

    # Overwrite the time steps that already exist, one index at a time.
    # Values are written raw, so this assumes no scale/offset encoding.
    root = zarr.open_group(store, mode="a")
    for var in ds_new.data_vars:
        zarr_array = root[var]
        data_new = ds_new[var].values
        for i in np.where(overlap_mask)[0]:
            idx_existing = np.where(t_existing == t_new[i])[0][0]
            zarr_array[idx_existing] = data_new[i]

    # Append the remaining steps through xarray, which resizes every array,
    # encodes the time values, and keeps the store metadata consistent.
    if new_mask.any():
        ds_append = ds_new.isel({time_dim: np.where(new_mask)[0]})
        ds_append.to_zarr(store=store, append_dim=time_dim)

    print("✅ Zarr update finished.")
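
The split between steps to replace and steps to append can be tried on its own, without a store. The sketch below uses hypothetical time axes. It also swaps the per-step `np.where` lookup for `np.searchsorted`, which assumes the existing time axis is sorted.

```python
import numpy as np

# Hypothetical time axes: the store holds 1-4 January,
# the new forecast covers 3-6 January.
t_existing = np.arange("2024-01-01", "2024-01-05", dtype="datetime64[D]")
t_new = np.arange("2024-01-03", "2024-01-07", dtype="datetime64[D]")

overlap_mask = np.isin(t_new, t_existing)  # steps already in the store
new_mask = ~overlap_mask                   # steps that must be appended

# Position in the store of each overlapping step (requires sorted t_existing).
replace_idx = np.searchsorted(t_existing, t_new[overlap_mask])

print(overlap_mask.tolist())                 # [True, True, False, False]
print(replace_idx.tolist())                  # [2, 3]
print(t_new[new_mask].astype(str).tolist())  # ['2024-01-05', '2024-01-06']
```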

In [7]:
fs_options = {
    "key": "moqs1u5xKGk6tRx3GbyJ",
    "secret": "OLGsgagMmgPWE6aRaBGdiQUC7U4WLFKQXIGiN821",
    "client_kwargs": {
        # endpoint_url must be a full URL, including the scheme
        "endpoint_url": "http://172.17.0.1:9990"
    }
}

update_zarr_forecast(ds_new, "zarr-data/model-archive.zarr", fs_options)
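
The two clients used in this chapter expect the endpoint in different forms. The `Minio` client is given a bare `host:port`, whereas the `endpoint_url` in the s3fs `client_kwargs` is passed on to botocore, which expects a full URL including the scheme. A small helper can normalise the value. `with_scheme` is a hypothetical name, and the default of plain `http` is an assumption that matches the `secure=False` setting used earlier.

```python
def with_scheme(endpoint, default_scheme="http"):
    """Prefix `endpoint` with a scheme when it does not already have one."""
    if "://" in endpoint:
        return endpoint
    return f"{default_scheme}://{endpoint}"


print(with_scheme("172.17.0.1:9990"))         # http://172.17.0.1:9990
print(with_scheme("http://172.17.0.1:9990"))  # http://172.17.0.1:9990
```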
