## Step 1: Setup and Imports

Import required libraries for data loading, model training, and visualization.

In [None]:
# Plotting
import matplotlib.pyplot as plt

# FoodSpec API
from foodspec.apps.oils import run_oil_authentication_quickstart
from foodspec.chemometrics.pca import run_pca
from foodspec.data.loader import load_example_oils
from foodspec.viz.classification import plot_confusion_matrix
from foodspec.viz.pca import plot_pca_scores

print("✓ All imports successful")

## Step 2: Load Example Oils Dataset

FoodSpec includes a built-in synthetic oils dataset for demonstration. It contains Raman spectra of four different oils.

In [None]:
# Load the example oils dataset
print("Loading example oils dataset...")
fs = load_example_oils()

# Explore the dataset structure
print("\nDataset Summary:")
print(f"  Shape: {len(fs.data)} samples × {fs.data.x.shape[1]} wavenumbers")
print(f"  Modality: {fs.data.modality.upper()}")
print(f"  Wavenumber range: {fs.data.wavenumbers[0]:.0f} - {fs.data.wavenumbers[-1]:.0f} cm⁻¹")

# Class distribution
oil_counts = fs.data.metadata['oil_type'].value_counts()
print("\nOil Type Distribution:")
print(oil_counts)
print(f"\nClasses: {sorted(fs.data.metadata['oil_type'].unique())}")

## Step 3: Train Classifier with Cross-Validation

Run the oil authentication workflow, which:
1. Preprocesses spectra (baseline correction, smoothing, normalization)
2. Extracts features (peak heights and ratios)
3. Trains a classifier with K-fold cross-validation
4. Reports metrics (accuracy, precision, recall, F1)
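
The quickstart hides how the folds are built, so the following is only a sketch of a stratified K-fold split in plain Python. The helper `stratified_kfold_indices` and the oil labels are hypothetical, and the real workflow may shuffle or split differently.

```python
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=3):
    """Yield (train_idx, test_idx) pairs with each class spread across folds."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's indices round-robin into the folds (no shuffling).
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        for position, idx in enumerate(by_class[label]):
            folds[position % n_splits].append(idx)

    # Each fold serves exactly once as the held-out test set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 4 oil types with 3 spectra each.
labels = ["olive"] * 3 + ["sunflower"] * 3 + ["canola"] * 3 + ["coconut"] * 3
splits = list(stratified_kfold_indices(labels, n_splits=3))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

Stratifying keeps every oil type represented in each test fold, which matters when classes are small. For real work, `sklearn.model_selection.StratifiedKFold` offers the same idea with shuffling and seeding.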

In [None]:
print("Training classifier with cross-validation...\n")
result = run_oil_authentication_quickstart(fs, label_column="oil_type")

print("\n" + "="*60)
print("CROSS-VALIDATION METRICS")
print("="*60)
print(result.cv_metrics)
print("\nInterpretation:")
print("  - Each row is one fold (1, 2, 3) + mean and std")
print("  - Mean accuracy across folds estimates performance on unseen samples")
print("  - Low std indicates stable performance across folds")
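
The mean and std rows can be reproduced by hand from per-fold scores. The sketch below uses made-up accuracies, not output from the FoodSpec run, and assumes the sample standard deviation (n - 1); whether `result.cv_metrics` uses the sample or the population form is not shown here.

```python
from statistics import mean, stdev

# Hypothetical per-fold accuracies, not results from the FoodSpec run.
fold_accuracy = [0.90, 0.95, 0.85]

summary = {
    "mean": mean(fold_accuracy),
    "std": stdev(fold_accuracy),  # sample standard deviation (n - 1)
}
print(f"accuracy: {summary['mean']:.3f} ± {summary['std']:.3f}")
```

With only a handful of folds the std is itself a rough estimate, so small differences between models should be read with caution.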

## Step 4: Visualize Confusion Matrix

The confusion matrix shows how many samples from each oil class are correctly classified vs. misclassified.
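
To make the layout concrete, here is a minimal sketch that builds a confusion matrix from true and predicted labels in plain Python. The labels and predictions are invented for illustration, and the row = true class, column = predicted class orientation is an assumption (it matches the scikit-learn convention); `result.confusion_matrix` may be organized differently.

```python
def confusion_matrix(y_true, y_pred, class_labels):
    """Return a matrix where row = true class and column = predicted class."""
    index = {label: i for i, label in enumerate(class_labels)}
    matrix = [[0] * len(class_labels) for _ in class_labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


# Hypothetical labels and predictions for illustration only.
class_labels = ["canola", "olive", "sunflower"]
y_true = ["olive", "olive", "olive", "canola", "canola", "sunflower", "sunflower"]
y_pred = ["olive", "olive", "canola", "canola", "canola", "sunflower", "olive"]

cm = confusion_matrix(y_true, y_pred, class_labels)

# Per-class recall: diagonal count divided by the row total.
recall = {
    label: cm[i][i] / sum(cm[i])
    for i, label in enumerate(class_labels)
}
# Overall accuracy: sum of the diagonal divided by the number of samples.
accuracy = sum(cm[i][i] for i in range(len(class_labels))) / len(y_true)

for label, row in zip(class_labels, cm):
    print(f"{label:>10}: {row}")
print(f"accuracy = {accuracy:.2f}")
```

Reading along a row shows where samples of one oil end up; reading down a column shows which oils get mistaken for that class.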

In [None]:
# Create confusion matrix figure
fig, ax = plt.subplots(figsize=(8, 6))
ax_cm = plot_confusion_matrix(result.confusion_matrix, result.class_labels)
plt.title("Oil Classification: Confusion Matrix", fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("  - Diagonal elements = correct classifications")
print("  - Off-diagonal = misclassifications between oil types")
print("  - High diagonal values indicate good model discrimination")

## Step 5: Explore Data Structure with PCA

PCA (Principal Component Analysis) reduces dimensionality so that we can visualize how the oils cluster in a lower-dimensional space.
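
As a small illustration of what "variance explained" means, the sketch below runs PCA by hand on two variables, where the eigenvalues of the covariance matrix have a closed form. The data points and the helper `pca_2d_explained_variance` are hypothetical; real spectra have hundreds of wavenumbers and would go through `run_pca` or a linear-algebra library instead.

```python
import math


def pca_2d_explained_variance(points):
    """Return explained-variance ratios (PC1, PC2) for 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Sample covariance matrix [[a, b], [b, c]] of the mean-centered data.
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = (half_trace + radius, half_trace - radius)

    total = a + c  # total variance equals the trace of the covariance matrix
    return tuple(value / total for value in eigenvalues)


# Hypothetical intensities at two Raman bands for five spectra.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]
pc1, pc2 = pca_2d_explained_variance(points)
print(f"PC1: {pc1 * 100:.1f}%  PC2: {pc2 * 100:.1f}%")
```

Because the two bands rise and fall almost together, nearly all of the variance lies along the first component. The ratios always sum to 1 when every component is kept.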

In [None]:
# Run PCA on raw spectra
print("Running PCA on spectral data...")
pca, res = run_pca(fs.x, n_components=2)

# Create PCA plot
fig, ax = plt.subplots(figsize=(8, 6))
ax_pca = plot_pca_scores(res.scores, labels=fs.data.metadata["oil_type"])
plt.title("Oil Spectra: PCA Scores Plot (First 2 Components)", fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nPCA Variance Explained:")
for i, var in enumerate(pca.explained_variance_ratio_[:2]):
    print(f"  PC{i+1}: {var*100:.1f}%")
print(f"  Total: {sum(pca.explained_variance_ratio_[:2])*100:.1f}%")
print("\nInterpretation:")
print("  - Well-separated clusters → oils have distinct spectral signatures")
print("  - Overlapping clusters → oils have similar spectral features")
print("  - Variance explained indicates how much info is in first 2 PCs")

## Step 6: Key Takeaways

### What We Learned:

1. **Data Loading:** FoodSpec loads spectra together with their metadata (here, the `oil_type` label) in a single call
2. **Model Training:** The quickstart API handles preprocessing + training + cross-validation
3. **Performance Metrics:** Cross-validation estimates how well the model generalizes by scoring it on several held-out folds rather than on a single split
4. **Visualization:** Confusion matrices and PCA plots are essential for model interpretation
5. **Classification:** Well-separated PCA clusters suggest that the oils have distinct spectral fingerprints, which tends to make them easier to classify

### Real-World Applications:

- **Food Authenticity:** Detect adulteration or mislabeling in olive oils
- **Quality Control:** Monitor oils during storage or processing
- **Rapid Testing:** Spectral analysis is typically faster than chemical reference methods
- **Traceability:** Link oils to origin or production method

### Next Steps:

- Explore other examples: heating stability, mixture analysis, hyperspectral mapping
- Try the full protocol-based workflow for customizable analyses
- Build your own dataset: load your Raman/NIR/MIR spectra in CSV format
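
As a starting point for your own data, the sketch below parses a wide-format CSV with the standard library only. The layout (a label column followed by one intensity column per wavenumber) and the values are assumptions made for illustration; FoodSpec's own CSV loaders and the layout they expect are not covered here.

```python
import csv
import io

# Hypothetical wide-format CSV: one row per spectrum, label column first,
# then one intensity column per wavenumber (cm^-1).
csv_text = """oil_type,1000,1100,1200
olive,0.12,0.30,0.08
sunflower,0.10,0.42,0.05
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)

# Column names after the label column are parsed as wavenumbers.
wavenumbers = [float(name) for name in header[1:]]

labels = []
spectra = []
for row in reader:
    if not row:
        continue  # skip blank lines
    labels.append(row[0])
    spectra.append([float(value) for value in row[1:]])

print(f"{len(spectra)} spectra × {len(wavenumbers)} wavenumbers")
print(f"labels: {labels}")
```

To read from disk, replace `io.StringIO(csv_text)` with a file opened via `open(path, newline="")`.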