# PyRamEx Tutorial: ML/DL-Friendly Raman Spectroscopy Analysis

This notebook demonstrates how to use PyRamEx for Raman spectroscopic data analysis with machine learning and deep learning integration.

## Installation

```bash
pip install pyramex
```

## Quick Start

The cell below loads a directory of spectra and reports its dimensions. Preprocessing and model training follow in the later sections.

In [None]:
# Import PyRamEx
from pyramex import Ramanome, load_spectra
import numpy as np

# Load data
data = load_spectra('path/to/spectra/')
print(f"Loaded {data.n_samples} spectra with {data.n_wavenumbers} wavenumber points")

## 1. Data Loading

PyRamEx supports multiple Raman spectroscopy file formats from major manufacturers.

In [None]:
# Option 1: Load single file
data = load_spectra('single_spectrum.txt')

# Option 2: Load directory with multiple files
data = load_spectra('spectra_directory/')

# Option 3: Create from NumPy arrays
spectra = np.random.randn(100, 1000)  # 100 samples, 1000 points
wavenumbers = np.linspace(200, 4000, 1000)
metadata = {'sample_id': [f'S{i}' for i in range(100)]}
data = Ramanome(spectra, wavenumbers, metadata)
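
To make the array layout in Option 3 concrete, here is a minimal NumPy-only sketch that parses a two-column text export and stacks spectra into a `(n_samples, n_points)` matrix. The file layout (wavenumber and intensity separated by whitespace) and the helper name `read_two_column_spectrum` are assumptions for illustration; they are not taken from PyRamEx, whose supported formats may differ.

```python
import io
import numpy as np

# Hypothetical two-column export: wavenumber (cm^-1) and intensity on each line.
raw_text = """\
# wavenumber intensity
500.0 12.1
501.5 12.9
503.0 14.2
504.5 13.7
"""

def read_two_column_spectrum(handle):
    """Return (wavenumbers, intensities) from a whitespace-delimited text stream."""
    table = np.loadtxt(handle, comments="#")
    return table[:, 0], table[:, 1]

wavenumbers, intensities = read_two_column_spectrum(io.StringIO(raw_text))

# Stack single spectra row-wise to get the (n_samples, n_points) layout.
spectra = np.vstack([intensities, intensities * 1.1])
print(spectra.shape, wavenumbers.shape)  # (2, 4) (4,)
```

Arrays shaped like this can then be passed to `Ramanome(spectra, wavenumbers, metadata)` as in Option 3.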

## 2. Preprocessing

Apply common preprocessing steps with method chaining:

In [None]:
# Preprocess with method chaining
data_processed = (
    data.smooth(window_size=5, polyorder=2)
    .remove_baseline(method='polyfit', degree=3)
    .normalize(method='minmax')
    .cutoff(wavenumber_range=(500, 3500))
)

print(f"Applied {len(data_processed.processed)} preprocessing steps")
print(data_processed.processed)
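
For readers who want to see what three of these steps compute, the sketch below reimplements polynomial baseline removal, min-max normalization, and a wavenumber cutoff in plain NumPy on synthetic data. It is an illustration under assumptions, not PyRamEx's implementation: the function names are hypothetical, smoothing is left out, and the library's actual algorithms and defaults may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
wavenumbers = np.linspace(200, 4000, 1000)

# Synthetic data: one Gaussian peak on a sloping baseline, plus noise.
peak = np.exp(-0.5 * ((wavenumbers - 1450) / 15) ** 2)
baseline = 1e-4 * wavenumbers + 0.2
spectra = peak + baseline + 0.01 * rng.standard_normal((5, wavenumbers.size))

def remove_polynomial_baseline(spectra, wavenumbers, degree=3):
    """Fit one polynomial per spectrum and subtract it."""
    # Rescale x to [-1, 1] so the least-squares fit is well conditioned.
    half_span = (wavenumbers.max() - wavenumbers.min()) / 2
    x = (wavenumbers - wavenumbers.mean()) / half_span
    coeffs = np.polyfit(x, spectra.T, degree)    # (degree + 1, n_samples)
    fitted = np.vander(x, degree + 1) @ coeffs   # (n_points, n_samples)
    return spectra - fitted.T

def minmax_normalize(spectra):
    """Scale each spectrum to the range [0, 1]."""
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    return (spectra - lo) / (hi - lo)

def cutoff(spectra, wavenumbers, wavenumber_range):
    """Keep only the points inside the inclusive wavenumber range."""
    low, high = wavenumber_range
    keep = (wavenumbers >= low) & (wavenumbers <= high)
    return spectra[:, keep], wavenumbers[keep]

corrected = remove_polynomial_baseline(spectra, wavenumbers, degree=3)
normalized = minmax_normalize(corrected)
trimmed, trimmed_wn = cutoff(normalized, wavenumbers, (500, 3500))
print(trimmed.shape, trimmed_wn.min(), trimmed_wn.max())
```

Fitting the polynomial to the whole spectrum, peaks included, is the simplest variant. It slightly overestimates the baseline under strong peaks, which is why iterative or peak-masked fits are common in practice.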

## 3. Quality Control

Remove low-quality samples:

In [None]:
# Apply quality control
qc_result = data_processed.quality_control(method='icod', threshold=0.05)
print(qc_result)

# Filter bad samples
data_clean = data_processed[qc_result.good_samples]
print(f"Retained {data_clean.n_samples}/{data_processed.n_samples} samples")
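
The filtering step above relies on indexing with a per-sample boolean mask. The sketch below shows that pattern with a simple stand-in check: spectra are flagged when their distance to the median spectrum is a robust outlier. This is explicitly not the `icod` method, and the function name and threshold are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
spectra = rng.normal(loc=1.0, scale=0.05, size=(20, 200))
spectra[3] += 2.0    # inject two obviously abnormal spectra
spectra[11] *= 4.0

def median_distance_qc(spectra, z_threshold=3.5):
    """Return a boolean mask that is True for spectra considered acceptable."""
    median_spectrum = np.median(spectra, axis=0)
    distances = np.linalg.norm(spectra - median_spectrum, axis=1)
    center = np.median(distances)
    mad = np.median(np.abs(distances - center))
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    robust_z = 0.6745 * (distances - center) / mad
    return robust_z <= z_threshold

good_samples = median_distance_qc(spectra)
clean = spectra[good_samples]   # same pattern as data_processed[qc_result.good_samples]
print(f"Retained {clean.shape[0]}/{spectra.shape[0]} samples")
```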

## 4. Dimensionality Reduction

Reduce dimensions for visualization and feature extraction:

In [None]:
# Apply PCA
data_clean.reduce(method='pca', n_components=2)

# Plot reduction
data_clean.plot_reduction(method='pca', color_by='label')
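
As a reference for what a PCA reduction produces, here is a small NumPy sketch that computes scores, components, and explained-variance ratios through a singular value decomposition of the mean-centered spectra. The helper `pca_scores` is hypothetical. Whether `reduce(method='pca')` uses SVD internally, or how it stores the result, is not stated in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(2)
spectra = rng.standard_normal((30, 100))   # 30 spectra, 100 wavenumber points

def pca_scores(spectra, n_components=2):
    """Project mean-centered spectra onto their leading principal components."""
    centered = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    explained = s ** 2 / np.sum(s ** 2)                # variance fraction per component
    return scores, vt[:n_components], explained[:n_components]

scores, components, explained_ratio = pca_scores(spectra, n_components=2)
print(scores.shape, components.shape)  # (30, 2) (2, 100)
```

The two columns of `scores` are what a 2D reduction plot would place on its axes, with one point per spectrum.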

## 5. Machine Learning Integration

### 5.1 Scikit-Learn Integration

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Convert to sklearn format
X_train, X_test, y_train, y_test = data_clean.to_sklearn_format(test_size=0.2)

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")
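
The four arrays returned above follow the usual hold-out convention. The sketch below shows one way such a split can be produced with NumPy, assuming a plain shuffled split without stratification. The helper `holdout_split` is hypothetical, and `to_sklearn_format` may shuffle, stratify, or seed differently.

```python
import numpy as np

def holdout_split(X, y, test_size=0.2, seed=0):
    """Shuffle sample indices, then reserve a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_size * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(100 * 4, dtype=float).reshape(100, 4)   # 100 samples, 4 features
y = np.repeat([0, 1], 50)                              # two balanced classes
X_train, X_test, y_train, y_test = holdout_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```

With small or imbalanced Raman datasets, a stratified split that preserves class proportions in both partitions is usually the safer choice.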

### 5.2 PyTorch Integration

In [None]:
import torch
from torch.utils.data import DataLoader

# Create PyTorch dataset
dataset = data_clean.to_torch_dataset()
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

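# Create CNN model
# NOTE: create_cnn_model is not imported anywhere in this notebook. Import it
# from the PyRamEx module that provides it, or define your own torch.nn.Module,
# before running this cell. y_train comes from the scikit-learn cell in 5.1.
model = create_cnn_model(
    input_length=data_clean.n_wavenumbers,
    n_classes=len(np.unique(y_train)),
    dropout=0.3
)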

# Training loop (simplified)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(10):
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

### 5.3 TensorFlow Integration

In [None]:
import tensorflow as tf

# Create TensorFlow dataset
dataset = data_clean.to_tf_dataset(batch_size=32, shuffle=True)

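# Build simple model
# Conv1D expects each sample as (steps, channels): n_wavenumbers steps, 1 channel.
# This assumes to_tf_dataset yields batches shaped (batch, n_wavenumbers, 1).
# y_train (used for the output layer below) comes from the cell in section 5.1.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 5, activation='relu', input_shape=(data_clean.n_wavenumbers, 1)),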
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 5, activation='relu'),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(len(np.unique(y_train)), activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
model.fit(dataset, epochs=10)
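
Both deep learning frameworks need an explicit channel axis for 1D convolutions, but they place it differently: Keras `Conv1D` (default `channels_last`) takes `(batch, steps, channels)`, while PyTorch `nn.Conv1d` takes `(batch, channels, length)`. The NumPy sketch below shows both layouts and traces the sequence length through the convolution and pooling stack used above, assuming stride 1 and no padding. The helper names are illustrative only.

```python
import numpy as np

X = np.zeros((8, 1000), dtype=np.float32)   # 8 spectra, 1000 wavenumber points

X_keras = X[..., np.newaxis]     # (batch, steps, channels) for Keras Conv1D
X_torch = X[:, np.newaxis, :]    # (batch, channels, length) for PyTorch Conv1d
print(X_keras.shape, X_torch.shape)  # (8, 1000, 1) (8, 1, 1000)

def conv_output_length(length, kernel_size):
    """Output length of an unpadded, stride-1 1D convolution."""
    return length - kernel_size + 1

def pool_output_length(length, pool_size):
    """Output length of non-overlapping max pooling."""
    return length // pool_size

length = X.shape[1]
for kernel_size, pool_size in [(5, 2), (5, 2)]:
    length = pool_output_length(conv_output_length(length, kernel_size), pool_size)
print(length)  # 247
```

A sequence shorter than the kernel gives a non-positive output length (for example `conv_output_length(1, 5)` is `-3`), which is the shape error to expect when the steps and channels axes are swapped.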

## 6. Feature Engineering

Extract specific spectral features:

In [None]:
from pyramex.features import extract_band_intensity, calculate_cdr

# Extract band intensities
bands = [(2000, 2250), (2750, 3050), 1450, 1665]
features = extract_band_intensity(data_clean, bands)
print(f"Extracted {features.shape[1]} band features")

# Calculate the CDR from the two bands
cdr = calculate_cdr(data_clean, band1=(2000, 2250), band2=(2750, 3050))
print(f"CDR range: {cdr.min():.3f} to {cdr.max():.3f}")
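
The band list above mixes `(low, high)` ranges with single wavenumbers. A minimal NumPy sketch of how such a list can be turned into features follows, under these assumptions: a range is summarized by its mean intensity, a single wavenumber by the nearest sampled point, and the CDR is computed as `band1 / (band1 + band2)`. The helper `band_intensity` is hypothetical, and the definitions used by `extract_band_intensity` and `calculate_cdr` may differ.

```python
import numpy as np

def band_intensity(spectra, wavenumbers, band):
    """Mean intensity over a (low, high) range, or the nearest point for a scalar."""
    if np.isscalar(band):
        idx = np.argmin(np.abs(wavenumbers - band))
        return spectra[:, idx]
    low, high = band
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return spectra[:, mask].mean(axis=1)

wavenumbers = np.linspace(200, 4000, 1000)
spectra = np.ones((3, wavenumbers.size))
# Make the 2000-2250 region twice as intense as the rest of the spectrum.
spectra[:, (wavenumbers >= 2000) & (wavenumbers <= 2250)] = 2.0

bands = [(2000, 2250), (2750, 3050), 1450, 1665]
features = np.column_stack([band_intensity(spectra, wavenumbers, b) for b in bands])

band1 = band_intensity(spectra, wavenumbers, (2000, 2250))
band2 = band_intensity(spectra, wavenumbers, (2750, 3050))
cdr = band1 / (band1 + band2)   # assumed definition, see the note above
print(features.shape, cdr)
```

On this synthetic input every spectrum gives a ratio of 2 / (2 + 1), so all three CDR values equal 2/3.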

## 7. Visualization

Interactive and static visualizations:

In [None]:
# Plot spectra
data_clean.plot(n_samples=10)

# Plot preprocessing effects
data_clean.plot_preprocessing_steps()

# Plot quality control results
data_clean.plot_quality_control(method='icod')

# Interactive plot (requires plotly)
data_clean.interactive_plot()

## 8. Complete Workflow Example

End-to-end pipeline from raw data to trained model:

In [None]:
# 1. Load data
data = load_spectra('path/to/data/')

# 2. Preprocess
data = data.smooth().remove_baseline().normalize()

# 3. Quality control
qc = data.quality_control(method='icod')
data = data[qc.good_samples]

# 4. Dimensionality reduction (optional)
data.reduce(method='pca', n_components=50)

# 5. Convert to ML format
X_train, X_test, y_train, y_test = data.to_sklearn_format()

# 6. Train model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

# 7. Evaluate
print(f"Test accuracy: {model.score(X_test, y_test):.2%}")

## Summary

PyRamEx provides:
- ✅ Easy data loading from multiple formats
- ✅ Method chaining for preprocessing
- ✅ Multiple QC methods
- ✅ Seamless ML/DL integration
- ✅ Interactive visualizations
- ✅ GPU acceleration support (optional)

For more information, see:
- GitHub: https://github.com/openclaw/pyramex
- Original RamEx: https://github.com/qibebt-bioinfo/RamEx