# NO2 Data Crawling

This notebook downloads and processes NO2 (nitrogen dioxide) data from the Sentinel-5P satellite through the openEO API.

Import the libraries and connect to the Copernicus Data Space Ecosystem.

In [1]:
import openeo
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import xarray as xr

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()
print("Connected to Copernicus Data Space Ecosystem")

Authenticated using refresh token.
Connected to Copernicus Data Space Ecosystem


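Define the Area of Interest (AOI) for the Sampang area and the time range (2021 to 2025). openEO treats the end of a temporal extent as exclusive, so the last day loaded is 2025-10-18.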

In [2]:
aoi = {
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "coordinates": [
          [
            [
              113.21596949524894,
              -7.164956617756133
            ],
            [
              113.21596949524894,
              -7.212778923646425
            ],
            [
              113.29495729344416,
              -7.212778923646425
            ],
            [
              113.29495729344416,
              -7.164956617756133
            ],
            [
              113.21596949524894,
              -7.164956617756133
            ]
          ]
        ],
        "type": "Polygon"
      }
    }
  ]
}

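# Bounding box of the AOI polygon above, used as the openEO spatial extent
# (the polygon is an axis-aligned rectangle, so the box covers exactly the same area)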
spatial_extent = {
    "west": 113.21596949524894,
    "south": -7.212778923646425,
    "east": 113.29495729344416,
    "north": -7.164956617756133
}

start_date = "2021-01-01"
end_date = "2025-10-19"

print(f"AOI defined for coordinates: {spatial_extent}")
print(f"Time range: {start_date} to {end_date}")
print("Setup completed successfully")

AOI defined for coordinates: {'west': 113.21596949524894, 'south': -7.212778923646425, 'east': 113.29495729344416, 'north': -7.164956617756133}
Time range: 2021-01-01 to 2025-10-19
Setup completed successfully
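The `aoi` FeatureCollection is defined above, but only its corner coordinates are actually used, copied by hand into `spatial_extent`. A minimal sketch, assuming the AOI stays a set of `Polygon` features like the one above, that derives the bounding box from the GeoJSON so the two definitions cannot drift apart. The helper name `bbox_from_feature_collection` is hypothetical, not part of openEO.

```python
def bbox_from_feature_collection(fc: dict) -> dict:
    """Return an openEO-style bounding box covering every Polygon ring in a FeatureCollection."""
    lons, lats = [], []
    for feature in fc["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            raise ValueError(f"unsupported geometry type: {geometry['type']}")
        for ring in geometry["coordinates"]:
            for lon, lat in ring:  # GeoJSON positions are [longitude, latitude]
                lons.append(lon)
                lats.append(lat)
    return {"west": min(lons), "south": min(lats), "east": max(lons), "north": max(lats)}


# Same polygon as the `aoi` defined above, written compactly
aoi = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [113.21596949524894, -7.164956617756133],
                [113.21596949524894, -7.212778923646425],
                [113.29495729344416, -7.212778923646425],
                [113.29495729344416, -7.164956617756133],
                [113.21596949524894, -7.164956617756133],
            ]],
        },
    }],
}

spatial_extent = bbox_from_feature_collection(aoi)
print(spatial_extent)
```

The result matches the hand-written `spatial_extent` printed above, so it could replace that dictionary if the AOI is ever edited.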


Load the Sentinel-5P NO2 data and aggregate it to daily means.

In [3]:
print("Loading Sentinel-5P NO2 data...")

s5p_no2 = connection.load_collection(
    "SENTINEL_5P_L2",
    temporal_extent=[start_date, end_date],
    spatial_extent=spatial_extent,
    bands=["NO2"],
)

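# Average all observations within each calendar day, per pixel.
# Note: despite its name, `s5p_monthly` holds daily means (period="day").
s5p_monthly = s5p_no2.aggregate_temporal_period(
    period="day",
    reducer="mean"
)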

print("Data collection and aggregation configured successfully")

Loading Sentinel-5P NO2 data...
Data collection and aggregation configured successfully
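`aggregate_temporal_period(period="day", reducer="mean")` groups every observation that falls on the same calendar day and averages it per pixel, and the `temporal_extent` filter excludes its end date. The plain-Python sketch below mimics those two rules for a single pixel so the behaviour can be checked locally. It is an illustration under those assumptions, not how the backend implements it; the function name `daily_means` and the sample values are hypothetical, and skipping NaN is assumed to correspond to openEO's default no-data handling in `mean`.

```python
import math
from collections import defaultdict
from datetime import date, datetime


def daily_means(observations, start, end):
    """Average (timestamp, value) pairs per calendar day inside the half-open range [start, end).

    NaN values are skipped, and days without any valid value are left out.
    """
    buckets = defaultdict(list)
    for ts, value in observations:
        day = ts.date()
        if not (start <= day < end):  # end date excluded, as in openEO's temporal_extent
            continue
        if math.isnan(value):  # treat NaN as no-data
            continue
        buckets[day].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Made-up sample for one pixel, in arbitrary units: two observations on
# 2025-10-17, only no-data on 2025-10-18, and one value on the excluded end date.
observations = [
    (datetime(2025, 10, 17, 5, 30), 1.0),
    (datetime(2025, 10, 17, 7, 10), 3.0),
    (datetime(2025, 10, 18, 6, 20), float("nan")),
    (datetime(2025, 10, 19, 6, 0), 9.0),
]
result = daily_means(observations, start=date(2025, 10, 17), end=date(2025, 10, 19))
print(result)  # {datetime.date(2025, 10, 17): 2.0}
```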


Run the batch job and export the result to a NetCDF file (the openEO Python client infers the output format from the `.nc` extension of `outputfile`).

In [4]:
print("Starting data processing job...")

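import os

# Save into dataset/, which is where the conversion step below reads the file from
os.makedirs("dataset", exist_ok=True)

job = s5p_monthly.execute_batch(
    title="NO2 Averages 2021-2025",
    outputfile="dataset/no2_sampang_4years.nc",
)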

Starting data processing job...
0:00:00 Job 'j-25102801365948468b6de79899098335': send 'start'
0:00:13 Job 'j-25102801365948468b6de79899098335': queued (progress 0%)
0:00:19 Job 'j-25102801365948468b6de79899098335': queued (progress 0%)
0:00:25 Job 'j-25102801365948468b6de79899098335': queued (progress 0%)
0:00:34 Job 'j-25102801365948468b6de79899098335': queued (progress 0%)
0:00:44 Job 'j-25102801365948468b6de79899098335': queued (progress 0%)
0:00:56 Job 'j-25102801365948468b6de79899098335': queued (progress 0%)
0:01:12 Job 'j-25102801365948468b6de79899098335': running (progress N/A)
0:01:32 Job 'j-25102801365948468b6de79899098335': running (progress N/A)
0:01:56 Job 'j-25102801365948468b6de79899098335': running (progress N/A)
0:02:26 Job 'j-25102801365948468b6de79899098335': running (progress N/A)
0:03:04 Job 'j-25102801365948468b6de79899098335': running (progress N/A)
0:03:51 Job 'j-25102801365948468b6de79899098335': running (progress N/A)
0:04:49 Job 'j-25102801365948468b6de79899

Read the NetCDF file and convert it to CSV.

In [8]:
import xarray as xr
import pandas as pd
import numpy as np

print("Reading no2_sampang_4years.nc...")
ds = xr.open_dataset("./dataset/no2_sampang_4years.nc")

print("Dataset structure:")
print(ds)
print("\nAvailable variables:")
print(list(ds.data_vars))
print("\nDimensions:")
print(ds.dims)

print("\nConverting to DataFrame...")

# Flatten the (t, y, x) cube into one row per pixel per day. Rows come out in
# the same t -> y -> x order as a nested loop over the three dimensions. Only
# the timestamp and the NO2 value are kept; the pixel coordinates are dropped.
df = (
    ds["NO2"]
    .to_dataframe()  # MultiIndex (t, y, x) with an "NO2" column
    .reset_index()[["t", "NO2"]]
)

print(f"\nTotal rows: {len(df)}")
print("\nData preview:")
print(df.head(5))
print(f"\nTime range: {df['t'].min()} to {df['t'].max()}")

output_file = "dataset/NO2_sampang.csv"
df.to_csv(output_file, index=False)
print(f"\nData saved to {output_file}")

Reading no2_sampang_4years.nc...
Dataset structure:
<xarray.Dataset> Size: 42kB
Dimensions:  (t: 1729, x: 2, y: 2)
Coordinates:
  * t        (t) datetime64[ns] 14kB 2021-01-01 2021-01-02 ... 2025-10-18
  * x        (x) float64 16B 113.2 113.3
  * y        (y) float64 16B -7.182 -7.217
Data variables:
    crs      |S1 1B ...
    NO2      (t, y, x) float32 28kB ...
Attributes:
    Conventions:  CF-1.9
    institution:  Copernicus Data Space Ecosystem openEO API - 0.68.0a10.dev2...
    description:  
    title:        

Available variables:
['crs', 'NO2']

Dimensions:

Converting to DataFrame...

Total rows: 6916

Data preview:
           t  NO2
0 2021-01-01  NaN
1 2021-01-01  NaN
2 2021-01-01  NaN
3 2021-01-01  NaN
4 2021-01-02  NaN

Time range: 2021-01-01 00:00:00 to 2025-10-18 00:00:00

Data saved to dataset/NO2_sampang.csv
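The CSV holds four rows per day, one for each pixel of the 2 x 2 grid, and the NetCDF file has 1729 daily time steps, fewer than the 1752 calendar days from 2021-01-01 to 2025-10-18. A minimal sketch, assuming the `t,NO2` layout written above, that collapses the pixel rows into one AOI-wide mean per day and counts the valid pixels behind each value. The function name `aoi_daily_mean` and the sample rows are hypothetical; empty fields and `nan` are both treated as missing because either may appear depending on how the CSV was written.

```python
import csv
import io
import math


def aoi_daily_mean(csv_file):
    """Collapse per-pixel rows (columns t, NO2) into one AOI-wide mean per day.

    Returns {day: (mean or None, number of valid pixels)}.
    """
    values_by_day = {}
    for row in csv.DictReader(csv_file):
        valid = values_by_day.setdefault(row["t"], [])
        raw = row["NO2"].strip()
        if not raw:
            continue  # empty field: missing value
        value = float(raw)  # float() also parses 'nan'
        if not math.isnan(value):
            valid.append(value)
    return {
        day: (sum(vals) / len(vals) if vals else None, len(vals))
        for day, vals in values_by_day.items()
    }


# Made-up two-day sample in the same layout as dataset/NO2_sampang.csv
sample = io.StringIO(
    "t,NO2\n"
    "2021-01-01,nan\n"
    "2021-01-01,nan\n"
    "2021-01-01,\n"
    "2021-01-01,nan\n"
    "2021-01-02,1.0\n"
    "2021-01-02,3.0\n"
    "2021-01-02,nan\n"
    "2021-01-02,\n"
)
daily = aoi_daily_mean(sample)
print(daily)  # {'2021-01-01': (None, 0), '2021-01-02': (2.0, 2)}
```

To run it on the real export, pass `open("dataset/NO2_sampang.csv", newline="")` instead of the sample; `len(daily)` should then equal the 1729 time steps, and counting the entries with at least one valid pixel shows how many days actually carry usable NO2 values.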
