# 🦆 Session 2: DuckDB Fundamentals

**Duration:** 90 minutes  
**Dataset:** RUP Paket Penyedia 2025

## 🎯 Learning Objectives
After this session, you will be able to:
1. Run SQL queries with DuckDB
2. Use SELECT, WHERE, and ORDER BY
3. Use aggregate functions (SUM, COUNT, AVG)
4. JOIN tables
5. Export query results

## 1️⃣ Set Up DuckDB

In [None]:
import duckdb
import pandas as pd
from pathlib import Path

# Initialize DuckDB (in-memory)
conn = duckdb.connect(':memory:')
print(f"✅ DuckDB version: {duckdb.__version__}")

In [None]:
# Load the data and register it with DuckDB
data_path = Path('../../../datasets/rup/RUP-PaketPenyedia-Terumumkan-2025.parquet')
df = pd.read_parquet(data_path)

# Register the DataFrame as a table named 'rup'
conn.register('rup', df)

print(f"✅ Data registered: {len(df):,} rows")

## 2️⃣ Basic Queries: SELECT & WHERE

In [None]:
# Query 1: Show 5 packages (without ORDER BY, which 5 rows come back is not guaranteed)
query = """
SELECT nama_paket, pagu, metode_pengadaan
FROM rup
LIMIT 5
"""

conn.execute(query).df()

In [None]:
# Query 2: Packages with pagu (budget ceiling) above 1 billion, largest first
query = """
SELECT nama_paket, 
       pagu / 1e9 AS pagu_miliar,
       metode_pengadaan,
       nama_satker
FROM rup
WHERE pagu > 1000000000
ORDER BY pagu DESC
LIMIT 10
"""

conn.execute(query).df()

In [None]:
# Query 3: Combine two filter conditions with AND
query = """
SELECT COUNT(*) AS jumlah_paket,
       SUM(pagu) / 1e9 AS total_pagu_miliar
FROM rup
WHERE metode_pengadaan = 'Tender'
  AND pagu > 1000000000
"""

conn.execute(query).df()

## 3️⃣ Aggregate Functions

In [None]:
# Pagu statistics per procurement method
query = """
SELECT metode_pengadaan,
       COUNT(*) AS jumlah_paket,
       SUM(pagu) / 1e9 AS total_pagu_miliar,
       AVG(pagu) / 1e6 AS rata_pagu_juta,
       MIN(pagu) AS pagu_min,
       MAX(pagu) / 1e9 AS pagu_max_miliar
FROM rup
GROUP BY metode_pengadaan
ORDER BY total_pagu_miliar DESC
"""

conn.execute(query).df()

In [None]:
# Top 10 satker (work units) by total pagu
query = """
SELECT nama_satker,
       COUNT(*) AS jumlah_paket,
       SUM(pagu) / 1e9 AS total_pagu_miliar,
       ROUND(AVG(pagu) / 1e6, 2) AS rata_pagu_juta
FROM rup
GROUP BY nama_satker
ORDER BY total_pagu_miliar DESC
LIMIT 10
"""

conn.execute(query).df()

## 4️⃣ Filtering with HAVING

In [None]:
# Satker whose total pagu exceeds 10 billion
query = """
SELECT nama_satker,
       COUNT(*) AS jumlah_paket,
       SUM(pagu) / 1e9 AS total_pagu_miliar
FROM rup
GROUP BY nama_satker
HAVING SUM(pagu) > 10000000000
ORDER BY total_pagu_miliar DESC
"""

conn.execute(query).df()

## 5️⃣ CASE Statement

In [None]:
# Categorize packages by pagu value (the ELSE branch covers pagu >= 1 billion)
query = """
SELECT
    CASE 
        WHEN pagu < 10000000 THEN 'Small (< 10 million)'
        WHEN pagu < 100000000 THEN 'Medium (10-100 million)'
        WHEN pagu < 1000000000 THEN 'Large (100 million - 1 billion)'
        ELSE 'Very large (>= 1 billion)'
    END AS kategori_pagu,
    COUNT(*) AS jumlah_paket,
    SUM(pagu) / 1e9 AS total_pagu_miliar
FROM rup
GROUP BY kategori_pagu
ORDER BY total_pagu_miliar DESC
"""

conn.execute(query).df()

## 6️⃣ String Functions

In [None]:
# Find packages whose name contains the word "belanja" (case-insensitive)
query = """
SELECT nama_paket,
       pagu / 1e6 AS pagu_juta,
       metode_pengadaan
FROM rup
WHERE LOWER(nama_paket) LIKE '%belanja%'
LIMIT 10
"""

conn.execute(query).df()

## 7️⃣ Export Results

In [None]:
# Export to CSV
query = """
COPY (
    SELECT metode_pengadaan,
           COUNT(*) AS jumlah_paket,
           SUM(pagu) / 1e9 AS total_pagu_miliar
    FROM rup
    GROUP BY metode_pengadaan
) TO 'summary_metode.csv' (HEADER, DELIMITER ',')
"""

conn.execute(query)
print("✅ Data exported to summary_metode.csv")

## 📊 Complex Query: Multi-Level Aggregation

In [None]:
# Analysis by procurement method and procurement type
query = """
SELECT metode_pengadaan,
       jenis_pengadaan,
       COUNT(*) AS jumlah_paket,
       ROUND(SUM(pagu) / 1e9, 2) AS total_pagu_miliar,
       ROUND(AVG(pagu) / 1e6, 2) AS rata_pagu_juta
FROM rup
WHERE jenis_pengadaan IS NOT NULL
GROUP BY metode_pengadaan, jenis_pengadaan
HAVING COUNT(*) > 10
ORDER BY metode_pengadaan, total_pagu_miliar DESC
"""

result = conn.execute(query).df()
result.head(15)

## 🎯 Independent Exercises

1. Count the number of packages for each procurement type (`jenis_pengadaan`)
2. Find the 5 packages with the longest names
3. Compute the percentage of packages per procurement method
4. Write a query that finds the satker with the highest average pagu (minimum 10 packages)

In [None]:
# Space for exercises

In [None]:
# Close the connection
conn.close()
print("✅ Connection closed")