# 03 - Exploratory Data Analysis (EDA)
## Nusantara Food Watch - Explore and Understand Data

**Purpose:** Explore patterns, distributions, and correlations in the cleaned data

**Input:** Cleaned CSV from `data/processed/`

**Output:** 
- Summary statistics
- Exploratory charts in `reports/figures/`
- Analysis insights

---

## Setup

In [None]:
# Add project root to Python path
# (assumes the notebook is run from a folder one level below the project root)
import sys
from pathlib import Path

project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

print(f"📁 Project root: {project_root}")

In [None]:
# Standard imports
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Our custom utilities
from src.data_analysis.utils import (
    DataSaver,
    setup_plot_style
)

from src.data_analysis.config import (
    INTERIM_DIR,
    PROCESSED_DIR, 
    FIGURES_DIR
)

# Setup plotting style
setup_plot_style()
%matplotlib inline

# For better display
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print("✅ Imports complete!")
print("\n📁 Working directories:")
print(f"   Input (Processed): {PROCESSED_DIR}")
print(f"   Output (Figures): {FIGURES_DIR}")

## Configuration

In [None]:
# Input file from processed folder
INPUT_FILE = 'cleaned_data.csv'  # Change this to your processed file

# Figure settings
FIGURE_FORMAT = 'png'
FIGURE_DPI = 300
SAVE_FIGURES = True

print(f"📥 Input: {INPUT_FILE}")
print(f"📊 Figure format: {FIGURE_FORMAT}")
print(f"📊 Figure DPI: {FIGURE_DPI}")
print(f"💾 Save figures: {SAVE_FIGURES}")

## Load Data

In [None]:
# Load from processed folder
df = pd.read_csv(PROCESSED_DIR / INPUT_FILE)

# Convert date column if exists
if 'tanggal' in df.columns:
    df['tanggal'] = pd.to_datetime(df['tanggal'])

print(f"✅ Loaded {len(df):,} records")
print(f"📊 Shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
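
If the date column might contain malformed entries, a more defensive load could look like the sketch below. It reads a small in-memory CSV with made-up rows instead of the real file. The `tanggal` and `harga` column names come from this notebook; the sample values and the `load_prices` helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the processed CSV (values are made up).
SAMPLE_CSV = """tanggal,harga
2024-01-01,12000
2024-01-02,12500
not-a-date,13000
"""


def load_prices(source):
    """Read a CSV and parse `tanggal`, turning unparsable dates into NaT."""
    frame = pd.read_csv(source)
    if "tanggal" in frame.columns:
        # errors="coerce" replaces bad values with NaT instead of raising
        frame["tanggal"] = pd.to_datetime(frame["tanggal"], errors="coerce")
    return frame


sample_df = load_prices(io.StringIO(SAMPLE_CSV))

# Count rows whose date could not be parsed so they can be inspected later
n_bad_dates = int(sample_df["tanggal"].isna().sum())
print(f"Loaded {len(sample_df):,} records, {n_bad_dates} with unparsable dates")
```

For the real data, the same helper could be pointed at `PROCESSED_DIR / INPUT_FILE`. Whether bad dates should be coerced or should stop the notebook is a judgment call that depends on how the cleaning step was done.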

In [None]:
# Preview
df.head()

In [None]:
# Basic info
df.info()

---
## Your Analysis Here

Use the cells below for your exploratory analysis.

In [None]:
# Example: Summary statistics
# df.describe()
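
Beyond `df.describe()`, it often helps to check missing values and compare groups. The following is a minimal sketch on made-up data: `harga` is the price column used elsewhere in this notebook, while the `komoditas` column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; one price is deliberately missing.
demo = pd.DataFrame({
    "komoditas": ["beras", "beras", "cabai", "cabai", "cabai"],
    "harga": [12000.0, 12500.0, 40000.0, np.nan, 42000.0],
})

# Overall numeric summary (count, mean, std, min, quartiles, max)
overall = demo["harga"].describe()

# Number of missing values per column
missing = demo.isna().sum()

# Per-group summary; NaN prices are skipped by count, mean and median
by_group = demo.groupby("komoditas")["harga"].agg(["count", "mean", "median"])

print(overall)
print(missing)
print(by_group)
```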

In [None]:
# Example: Distribution plot
# fig, ax = plt.subplots(figsize=(12, 6))
# df['harga'].hist(bins=50, ax=ax)
# ax.set_title('Price Distribution')
# ax.set_xlabel('Price (Rp)')
# ax.set_ylabel('Frequency')
# plt.show()
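
Since the stated purpose of this notebook includes correlations, a pairwise correlation table is a natural next step after looking at single-column distributions. The sketch below uses made-up numbers; `harga` is the price column from this notebook, and the `pasokan` (supply) and `wilayah` (region) columns are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
demo = pd.DataFrame({
    "harga": [10000, 11000, 12000, 13000, 14000],
    "pasokan": [500, 450, 400, 350, 300],  # hypothetical supply column
    "wilayah": ["A", "A", "B", "B", "C"],  # hypothetical non-numeric column
})

# Keep only numeric columns so text columns do not interfere
numeric = demo.select_dtypes(include="number")

# Pearson correlation is the pandas default for DataFrame.corr()
corr = numeric.corr()
print(corr)
```

With seaborn imported as `sns`, the table can be rendered as a heatmap via `sns.heatmap(corr, annot=True)`. Correlation only captures linear association, so it is worth pairing with a scatter plot before drawing conclusions.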

---
## Save Figures

In [None]:
# Example: Save figure
# if SAVE_FIGURES:
#     saver = DataSaver()
#     saver.save_figure(fig, f'eda_distribution.{FIGURE_FORMAT}', dpi=FIGURE_DPI)
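
If the project's `DataSaver` is not available, figures can also be written with plain matplotlib. The sketch below is one possible approach, not the project's own method: the `save_figure` helper is hypothetical, the plotted values are made up, and a temporary directory stands in for `FIGURES_DIR`.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# Same settings as in the Configuration section
FIGURE_FORMAT = "png"
FIGURE_DPI = 300


def save_figure(fig, stem, out_dir, fmt=FIGURE_FORMAT, dpi=FIGURE_DPI):
    """Hypothetical helper: write `fig` to out_dir/stem.fmt and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}.{fmt}"
    # bbox_inches="tight" trims surrounding whitespace in the saved file
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


# Small made-up histogram so there is something to save
fig, ax = plt.subplots(figsize=(6, 3))
ax.hist([12000, 12500, 13000, 40000, 42000], bins=5)
ax.set_title("Price Distribution")
ax.set_xlabel("Price (Rp)")
ax.set_ylabel("Frequency")

# A temporary directory stands in for FIGURES_DIR in this sketch
out_dir = Path(tempfile.mkdtemp())
saved_path = save_figure(fig, "eda_distribution", out_dir)
plt.close(fig)
print(f"Saved figure to {saved_path}")
```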