# Argon Jupyter Integration Example

This notebook demonstrates how to use Argon's Jupyter integration to track experiments, manage data, and create reproducible ML workflows.

## Setup

First, let's load the Argon extension and initialize our project.

In [None]:
# Load the Argon extension
%load_ext argon.integrations.jupyter_magic

In [None]:
# Initialize Argon for this notebook
%argon_init ml-notebook-demo

In [None]:
# Set our working branch
%argon_branch random-forest-experiment

## Data Preparation

Let's create some sample data and save it to our Argon branch.

In [None]:
%%argon_track
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Create sample data
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    n_clusters_per_class=1,
    random_state=42
)

# Convert to DataFrame for easier handling
feature_names = [f'feature_{i}' for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

print(f"Dataset shape: {df.shape}")
print(f"Target distribution: {df['target'].value_counts().to_dict()}")
df.head()

In [None]:
# Save the dataset to our Argon branch
from argon.integrations.jupyter import get_argon_integration

integration = get_argon_integration()
dataset_path = integration.save_dataset(
    df, 
    "classification_dataset", 
    "Synthetic classification dataset with 1000 samples and 20 features"
)

print(f"Dataset saved to: {dataset_path}")

## Experiment Configuration

Let's set up our experiment parameters.

In [None]:
# Log experiment parameters
%argon_params n_estimators=100 max_depth=10 random_state=42 test_size=0.2
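
The values logged above are typed again by hand in the training cells, so the log and the code can drift apart. One way to avoid that is to keep a single dictionary and derive both the magic's argument string and the model arguments from it. This is a minimal sketch, assuming `%argon_params` accepts space-separated `key=value` pairs exactly as in the cell above; the helper `to_magic_args` is hypothetical and not part of Argon.

```python
# Single source of truth for the experiment configuration.
params = {"n_estimators": 100, "max_depth": 10, "random_state": 42, "test_size": 0.2}


def to_magic_args(values):
    """Render a dict as space-separated key=value pairs (hypothetical helper)."""
    return " ".join(f"{key}={value}" for key, value in values.items())


magic_line = to_magic_args(params)
print(magic_line)

# Inside a notebook, the string could be handed to the magic through IPython's
# standard API (assumes the Argon extension is already loaded):
# get_ipython().run_line_magic("argon_params", magic_line)

# test_size belongs to the train/test split, so it is excluded from the
# keyword arguments meant for RandomForestClassifier(**model_kwargs).
model_kwargs = {k: v for k, v in params.items() if k != "test_size"}
print(model_kwargs)
```

Building the model from `model_kwargs` means a change to the dictionary is reflected in both the logged parameters and the trained model.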

In [None]:
# Create a checkpoint after data preparation
%argon_checkpoint data_prepared --description "Dataset created and parameters set"

## Model Training

Now let's train our Random Forest model.

In [None]:
%%argon_track
# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('target', axis=1), 
    df['target'], 
    test_size=0.2, 
    random_state=42
)

print(f"Training set shape: {X_train.shape}")
print(f"Test set shape: {X_test.shape}")

In [None]:
%%argon_track
# Train the model
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42
)

model.fit(X_train, y_train)
print("Model training completed!")

In [None]:
# Save the trained model
model_path = integration.save_model(
    model, 
    "random_forest_v1", 
    "Random Forest classifier with 100 estimators",
    metadata={
        "algorithm": "RandomForest",
        "framework": "scikit-learn",
        "training_samples": len(X_train)
    }
)

print(f"Model saved to: {model_path}")

## Model Evaluation

Let's evaluate our model and log the metrics.

In [None]:
%%argon_track
# Make predictions
y_pred = model.predict(X_test)

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
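
The `average='weighted'` option used above averages the per-class scores, weighting each class by its number of true samples. The sketch below reproduces that calculation in plain Python on a small hand-made example; the function name `weighted_scores` and the toy labels are illustrative only and are not taken from the experiment.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall)."""
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = 0.0
    recall = 0.0
    for label, count in support.items():
        weight = count / n
        # A class that is never predicted contributes a precision of 0.
        class_precision = correct[label] / predicted[label] if predicted[label] else 0.0
        class_recall = correct[label] / count
        precision += weight * class_precision
        recall += weight * class_recall

    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall


# Toy labels: 4 samples of class 0 and 6 samples of class 1.
toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

toy_accuracy, toy_precision, toy_recall = weighted_scores(toy_true, toy_pred)
print(f"Accuracy: {toy_accuracy:.2f}")
print(f"Precision: {toy_precision:.2f}")
print(f"Recall: {toy_recall:.2f}")
```

On the toy labels this gives an accuracy of 0.80, a weighted precision of 0.85, and a weighted recall of 0.80. Support-weighted recall always equals accuracy, because the class weights cancel the per-class denominators, so only the precision carries extra information here.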

In [None]:
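# Log metrics to Argon.
# The values below are illustrative placeholders; replace them with the
# values printed by the previous cell so the log matches the actual run.
%argon_metrics accuracy=0.95 precision=0.94 recall=0.95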

In [None]:
# Create a checkpoint after model training
%argon_checkpoint model_trained --description "Random Forest model trained and evaluated"

## Visualization

Let's create some visualizations to understand our model better.

In [None]:
%%argon_track
# Plot feature importance
feature_importance = pd.DataFrame({
    'feature': feature_names,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

plt.figure(figsize=(10, 6))
top_features = feature_importance.head(10)
plt.barh(top_features['feature'], top_features['importance'])
plt.gca().invert_yaxis()  # Show the most important feature at the top
plt.xlabel('Feature Importance')
plt.title('Top 10 Most Important Features')
plt.tight_layout()
plt.show()

# Save feature importance data
feature_importance_path = integration.save_dataset(
    feature_importance, 
    "feature_importance", 
    "Feature importance scores from Random Forest model"
)
print(f"Feature importance saved to: {feature_importance_path}")

## Hyperparameter Tuning Experiment

Let's create a new branch and retrain with different hyperparameters (200 estimators, maximum depth 15) so the two runs can be compared.

In [None]:
# Create a new branch for hyperparameter tuning
%argon_branch hyperparameter_tuning

In [None]:
# Set different parameters for this experiment
%argon_params n_estimators=200 max_depth=15 random_state=42 test_size=0.2
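
The cell above sets one alternative configuration by hand. A tuning run usually covers several combinations, which can be enumerated from a small grid. The sketch below uses only the standard library; the helper `expand_grid` and the chosen grid values are assumptions for illustration, picked so that both configurations used in this notebook appear in the result.

```python
from itertools import product

# Values to vary, and values held constant across every run.
grid = {"n_estimators": [100, 200], "max_depth": [10, 15]}
fixed = {"random_state": 42, "test_size": 0.2}


def expand_grid(grid, fixed):
    """Yield one parameter dict per combination of grid values (hypothetical helper)."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield {**dict(zip(names, combo)), **fixed}


configs = list(expand_grid(grid, fixed))
for config in configs:
    print(config)
```

Each dictionary could then be logged and trained in its own branch, following the same steps this notebook uses for the two manual runs.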

In [None]:
%%argon_track
# Load the dataset from the previous branch
df_loaded = integration.load_dataset("classification_dataset")

# Train model with different parameters
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(
    df_loaded.drop('target', axis=1), 
    df_loaded['target'], 
    test_size=0.2, 
    random_state=42
)

model_2 = RandomForestClassifier(
    n_estimators=200,
    max_depth=15,
    random_state=42
)

model_2.fit(X_train_2, y_train_2)
y_pred_2 = model_2.predict(X_test_2)

# Calculate metrics
accuracy_2 = accuracy_score(y_test_2, y_pred_2)
precision_2 = precision_score(y_test_2, y_pred_2, average='weighted')
recall_2 = recall_score(y_test_2, y_pred_2, average='weighted')

print(f"Accuracy: {accuracy_2:.4f}")
print(f"Precision: {precision_2:.4f}")
print(f"Recall: {recall_2:.4f}")

In [None]:
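# Log metrics for the tuned model.
# The values below are illustrative placeholders; replace them with the
# values printed by the previous cell so the log matches the actual run.
%argon_metrics accuracy=0.96 precision=0.95 recall=0.96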

In [None]:
# Save the tuned model
model_2_path = integration.save_model(
    model_2, 
    "random_forest_v2_tuned", 
    "Random Forest classifier with 200 estimators - hyperparameter tuned",
    metadata={
        "algorithm": "RandomForest",
        "framework": "scikit-learn",
        "training_samples": len(X_train_2),
        "tuned": True
    }
)

print(f"Tuned model saved to: {model_2_path}")

## Experiment Comparison

Let's compare our two experiments.

In [None]:
# Compare experiments across branches
%argon_compare random-forest-experiment hyperparameter_tuning
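
The output format of `%argon_compare` is not shown in this notebook. As a rough illustration of what a comparison involves, the sketch below computes per-metric differences between the two runs, using the values passed to `%argon_metrics` in the cells above. The function `metric_deltas` is hypothetical and does not reproduce Argon's own comparison logic.

```python
# Metric values as logged for each branch earlier in this notebook.
baseline = {"accuracy": 0.95, "precision": 0.94, "recall": 0.95}
tuned = {"accuracy": 0.96, "precision": 0.95, "recall": 0.96}


def metric_deltas(first, second):
    """Return second minus first for every metric present in both runs."""
    return {name: round(second[name] - first[name], 4) for name in first if name in second}


deltas = metric_deltas(baseline, tuned)
for name, delta in deltas.items():
    print(f"{name}: {baseline[name]:.2f} -> {tuned[name]:.2f} ({delta:+.2f})")
```

With `test_size=0.2` on 1000 samples, the test set holds 200 rows, so a difference of 0.01 corresponds to two predictions. Differences of that size are best treated with caution.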

## Status and Export

Let's check our current status and export our results.

In [None]:
# Check current status
%argon_status

In [None]:
# Export notebook results
integration.export_notebook_results("notebook_results.json")
print("Notebook results exported to notebook_results.json")

## Summary

This notebook demonstrated:

1. **Initialization**: Setting up Argon for notebook use
2. **Branch Management**: Creating and switching between experiment branches
3. **Data Tracking**: Saving and loading datasets with version control
4. **Parameter Logging**: Tracking experiment parameters
5. **Model Management**: Saving trained models with descriptive metadata
6. **Metrics Tracking**: Logging and comparing experiment metrics
7. **Checkpoints**: Creating snapshots of experiment state
8. **Cell Tracking**: Monitoring individual cell executions
9. **Experiment Comparison**: Comparing results across different branches
10. **Export**: Exporting experiment results for analysis

### Key Benefits:
- **Reproducibility**: All experiments are tracked and can be reproduced
- **Version Control**: Data and models are versioned alongside code
- **Collaboration**: Teams can share and compare experiments easily
- **Organization**: Experiments are organized in logical branches
- **Integration**: Seamless integration with existing ML workflows

### Next Steps:
- Try integrating with MLflow, DVC, or W&B for additional tracking
- Experiment with different models and parameters
- Share your experiments with team members
- Use the exported data for further analysis