# 01 - Data Extraction
## Nusantara Food Watch - Extract Data from Database

**Purpose:** Pull data from the PostgreSQL/Supabase database

**Input:** The `harga_pangan` table in the database

**Output:** CSV files in `data/interim/`

---

## Setup

In [None]:
# Add project root to Python path
import sys
from pathlib import Path

project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

print(f"📁 Project root: {project_root}")
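`Path.cwd().parent` only resolves correctly when the notebook is launched from a directory one level below the project root. A minimal sketch of a more tolerant lookup follows; it assumes the root is the nearest ancestor that contains a `src/` directory, and `find_project_root` is a hypothetical helper, not part of the project's utilities.

```python
from pathlib import Path


def find_project_root(start=None, marker="src"):
    """Walk upward from `start` until a directory containing `marker` is found.

    Assumption: the project root is the closest ancestor holding a `src/` folder.
    """
    current = Path(start) if start is not None else Path.cwd()
    current = current.resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {current} contains '{marker}/'")
```

With this helper, `project_root = find_project_root()` could stand in for `Path.cwd().parent` if the notebook is moved to a deeper folder.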

In [None]:
# Standard imports
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Our custom utilities
from src.data_analysis.utils import (
    DataLoader, 
    DataSaver, 
    load_data, 
    save_csv,
    setup_plot_style,
    check_missing_values
)

from src.data_analysis.config import (
    INTERIM_DIR, 
    PROCESSED_DIR, 
    FIGURES_DIR,
    DEFAULT_COMMODITIES,
    MARKET_TYPES
)

# Setup plotting style
setup_plot_style()
%matplotlib inline

print("✅ Imports complete!")
print("\n📁 Output directories:")
print(f"   Interim: {INTERIM_DIR}")
print(f"   Processed: {PROCESSED_DIR}")
print(f"   Figures: {FIGURES_DIR}")

## Configuration

In [None]:
# Date range for extraction
START_DATE = '2017-01-01'
END_DATE = '2025-11-28'

# Commodities to extract (or None for all)
COMMODITIES = None  # None = all commodities, or ['cat_1', 'cat_2', ...]

# Market types to include
MARKET_TYPE_IDS = [1, 2, 3, 4]  # 1=Traditional, 2=Modern, 3=Wholesale, 4=Producer

print(f"📅 Date range: {START_DATE} to {END_DATE}")
print(f"🛒 Commodities: {'All' if COMMODITIES is None else len(COMMODITIES)}")
print(f"🏪 Market types: {len(MARKET_TYPE_IDS)}")
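The configuration values above can be turned into a filtered query instead of pulling the whole table. The sketch below builds a parameterized SQL string in psycopg2's named-placeholder style. The column names `tanggal`, `commodity_id`, and `market_type_id` are assumptions about the `harga_pangan` schema, and `build_extraction_query` is a hypothetical helper; whether `load_data` accepts a parameter dictionary is not known from this notebook.

```python
TABLE_NAME = "harga_pangan"  # table named in this notebook


def build_extraction_query(start_date, end_date, commodities=None, market_type_ids=None):
    """Return (sql, params) for a filtered extraction.

    Column names below are hypothetical; adjust them to the real schema.
    """
    clauses = ["tanggal BETWEEN %(start_date)s AND %(end_date)s"]
    params = {"start_date": start_date, "end_date": end_date}

    # None means "no filter", matching the COMMODITIES convention above.
    if commodities is not None:
        clauses.append("commodity_id = ANY(%(commodities)s)")
        params["commodities"] = list(commodities)

    if market_type_ids is not None:
        clauses.append("market_type_id = ANY(%(market_type_ids)s)")
        params["market_type_ids"] = list(market_type_ids)

    sql = "SELECT * FROM " + TABLE_NAME + " WHERE " + " AND ".join(clauses)
    return sql, params
```

Calling `build_extraction_query(START_DATE, END_DATE, COMMODITIES, MARKET_TYPE_IDS)` yields a query and a dictionary that can be passed to a driver such as psycopg2 or to `pandas.read_sql` via its `params` argument. Keeping values out of the SQL string avoids quoting mistakes and injection issues.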

---
## Your Analysis Here

Use the cells below for your data extraction logic.

In [None]:
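# Example: load the full harga_pangan table from the database.
# Note: this query does not apply START_DATE, END_DATE, COMMODITIES, or MARKET_TYPE_IDS.
df = load_data("SELECT * FROM harga_pangan")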
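Before saving, it can help to check that the extraction returned what was expected. The following is a minimal sketch of such a check using plain pandas; the date column name `tanggal` is an assumption about the table, and `summarize_extraction` is a hypothetical helper rather than one of the project's utilities.

```python
import pandas as pd


def summarize_extraction(df, date_col="tanggal"):
    """Return basic sanity metrics for an extracted DataFrame.

    `date_col` is a hypothetical column name; pass the real one if it differs.
    """
    # Unparseable dates become NaT instead of raising an error.
    dates = pd.to_datetime(df[date_col], errors="coerce")
    return {
        "rows": len(df),
        "first_date": dates.min(),
        "last_date": dates.max(),
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
        "duplicate_rows": int(df.duplicated().sum()),
    }
```

Comparing `first_date` and `last_date` against `START_DATE` and `END_DATE` gives a quick signal that the date filter behaved as intended.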

---
## Save Results

In [None]:
# Example: save to data/interim/ (uncomment to run)
# save_csv(df, 'extracted_data.csv', processed=False)
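If the project's `save_csv` helper is unavailable, the same step can be sketched with plain pandas. This is not the implementation of `save_csv`; `save_interim_csv` is a hypothetical stand-in, and the default `data/interim` path is taken from the output location stated at the top of this notebook.

```python
from pathlib import Path

import pandas as pd


def save_interim_csv(df, filename, interim_dir="data/interim"):
    """Write `df` to `interim_dir/filename` and return the output path."""
    out_dir = Path(interim_dir)
    # Create the directory on first use so the write does not fail.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    # index=False keeps the pandas row index out of the CSV.
    df.to_csv(out_path, index=False)
    return out_path
```

For example, `save_interim_csv(df, 'extracted_data.csv', INTERIM_DIR)` would write the extracted frame next to the other interim files.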