# 🤖 Model Integration

<div style="background-color: #e3f2fd; padding: 15px; border-radius: 5px; border-left: 5px solid #2196F3;">
<b>üìì Information</b><br>
<b>Level:</b> Basic<br>
<b>Time:</b> 15 minutes<br>
<b>Dataset:</b> Digits (sklearn)
</div>

## 🎯 Objectives
- ✅ Integrate a trained model with DBDataset
- ✅ Generate predictions automatically
- ✅ Use different model types

```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from deepbridge import DBDataset

# Load the digits dataset: 64 pixel features per 8x8 image, labels 0-9
digits = load_digits()
df = pd.DataFrame(digits.data)
df['target'] = digits.target

print(f"Dataset: {df.shape}")
print(f"Classes: {df['target'].nunique()}")
```
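Each row of `digits.data` is an 8×8 grayscale image flattened into 64 pixel intensities in the range 0–16. As a quick sanity check, the sketch below reshapes one row back into its image and compares it with the `images` array that `load_digits` also returns.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 image flattened into 64 pixel intensities (0-16)
first_row = digits.data[0]
image = first_row.reshape(8, 8)

print(digits.data.shape)   # (1797, 64)
print(image.shape)         # (8, 8)
print(digits.target[0])    # 0

# The flattened row and the stored 8x8 image contain the same pixels
print(np.array_equal(image, digits.images[0]))  # True
```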

## Train Model

```python
# Separate features and target
X = df.drop('target', axis=1)
y = df['target']
# Same test_size and random_state as the DBDataset created below
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest classifier
clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

print("✅ Model trained!")
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```
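For comparison, this is roughly the step that passing `model=` to DBDataset saves you from: calling `predict_proba` yourself and wrapping the result in a table with one column per class. This is a plain scikit-learn sketch; the column names (`prob_class_0`, ...) are hypothetical and may not match the names DBDataset uses.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

# Manual step: one probability column per class, ordered like clf.classes_
proba = clf.predict_proba(X_test)
manual_preds = pd.DataFrame(proba, columns=[f"prob_class_{c}" for c in clf.classes_])

print(manual_preds.shape)  # (360, 10)
```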

## DBDataset with Model

```python
# Create a DBDataset and pass it the trained model
dataset = DBDataset(
    data=df,
    target_column='target',
    model=clf,  # ← integrated model
    test_size=0.2,
    random_state=42
)

print("✅ DBDataset with model created!")
print("\n🎉 Predictions were generated automatically!")
print(f"Train predictions available: {dataset.train_predictions is not None}")
print(f"Test predictions available: {dataset.test_predictions is not None}")
```
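The model was trained on the notebook's own `train_test_split`, while DBDataset splits `df` again internally. The test predictions are only out-of-sample if both splits select the same rows. Assuming DBDataset splits with scikit-learn's `train_test_split` and no stratification (this notebook does not confirm that), matching `test_size` and `random_state` is what makes the two splits line up: the shuffle depends only on the number of rows and the seed. The sketch shows that the same seed reproduces the split and a different seed does not.

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(1797)  # one index per row of the digits dataset

# Two independent calls with identical test_size and seed
_, test_a = train_test_split(indices, test_size=0.2, random_state=42)
_, test_b = train_test_split(indices, test_size=0.2, random_state=42)

# A different seed selects a different test set
_, test_c = train_test_split(indices, test_size=0.2, random_state=0)

print(np.array_equal(test_a, test_b))                    # True
print(np.array_equal(np.sort(test_a), np.sort(test_c)))  # False
```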

## Access Predictions

```python
from IPython.display import display  # available by default in Jupyter; imported for other environments

# Test predictions (class probabilities)
test_preds = dataset.test_predictions

print("📊 Predictions (probabilities):")
print(f"Shape: {test_preds.shape}")
print(f"Columns: {test_preds.columns.tolist()[:5]}...")  # first 5 columns
print("\nFirst 3 predictions:")
display(test_preds.head(3))
```
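Two properties make a probability table easy to check: each row sums to 1, and the column with the highest probability, mapped through the model's `classes_`, gives the predicted label. The sketch verifies both with plain scikit-learn; it does not assume anything about DBDataset's column names or ordering.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

proba = model.predict_proba(X_test)

# Every row is a probability distribution over the 10 classes
row_sums = proba.sum(axis=1)
print(np.allclose(row_sums, 1.0))  # True

# The most probable column, mapped through classes_, is the predicted label
labels_from_proba = model.classes_[proba.argmax(axis=1)]
print(np.array_equal(labels_from_proba, model.predict(X_test)))  # True
```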

## Different Model Types

```python
# Compare several sklearn classifiers with DBDataset
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

models = {
    'LogisticRegression': LogisticRegression(max_iter=1000, random_state=42),
    'DecisionTree': DecisionTreeClassifier(random_state=42),
    'RandomForest': RandomForestClassifier(n_estimators=50, random_state=42)
}

print("🤖 Testing different models:\n")
for name, model in models.items():
    # Fit on the training split, then let DBDataset generate the predictions
    model.fit(X_train, y_train)
    ds = DBDataset(data=df, target_column='target', model=model, test_size=0.2, random_state=42)
    print(f"{name}: ✅ Predictions generated, shape {ds.test_predictions.shape}")

print("\n💡 DBDataset works with any sklearn-compatible model!")
```
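Because this notebook reads the predictions as class probabilities, "sklearn-compatible" here presumably means an estimator that exposes `predict_proba` in addition to `fit` and `predict`. Whether DBDataset actually requires `predict_proba` is an assumption, not something shown above. Not every scikit-learn classifier provides it: `SVC`, for example, only does when created with `probability=True`. The helper `has_probability_output` below is hypothetical, written only for this check.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def has_probability_output(model) -> bool:
    """Hypothetical helper: True if the estimator exposes a callable predict_proba."""
    # getattr returns None when the attribute access raises AttributeError
    return callable(getattr(model, "predict_proba", None))


print(has_probability_output(LogisticRegression()))   # True
print(has_probability_output(SVC()))                  # False (probability=False by default)
print(has_probability_output(SVC(probability=True)))  # True
```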

## Visualize Probabilities

```python
import matplotlib.pyplot as plt

# Predicted probability in the first column (class 0) for each test sample
probs_class_0 = test_preds.iloc[:, 0]

plt.figure(figsize=(10, 4))
plt.hist(probs_class_0, bins=30, edgecolor='black', alpha=0.7)
plt.xlabel('Predicted probability (class 0)')
plt.ylabel('Number of test samples')
plt.title('Distribution of Predicted Probabilities for Class 0')
plt.grid(axis='y', alpha=0.3)
plt.show()
```
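The histogram shows a single class column. A complementary view is the model's confidence per sample, that is, the highest probability in each row; low values point to digits the model is unsure about. The sketch uses plain scikit-learn, and the 0.5 cutoff is an arbitrary illustrative value, not a recommendation.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)
proba = pd.DataFrame(clf.predict_proba(X_test), columns=clf.classes_)

# Confidence = probability of the most likely class for each test sample
confidence = proba.max(axis=1)

# Arbitrary illustrative cutoff for "uncertain" predictions
LOW_CONFIDENCE = 0.5
uncertain = confidence < LOW_CONFIDENCE

print(f"Mean confidence: {confidence.mean():.3f}")
print(f"Low-confidence samples: {uncertain.sum()} of {len(confidence)}")
```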

## 🎉 Conclusion

### What you learned:
- ✅ Integrate a model with DBDataset (`model=` parameter)
- ✅ Predictions are generated automatically
- ✅ Easy access via `.train_predictions` and `.test_predictions`
- ✅ Works with any sklearn-compatible model

### Main Benefits:
1. 🚀 **Automation** - No need to generate predictions manually
2. 📊 **Consistency** - Predictions are always available for both the train and test splits
3. 🔄 **Reusability** - Predictions are cached, so they are not recalculated
4. ✅ **Validation** - The dataset is ready for validation tests!
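The caching benefit can be illustrated with a small, hypothetical container that computes predictions on first access and reuses them afterwards. This is NOT DeepBridge's implementation; `CachedPredictions` and `CountingModel` are illustrative names, and `functools.cached_property` is simply one standard-library way to get compute-once behavior. The counter confirms that `predict_proba` runs only once.

```python
from functools import cached_property

import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class CountingModel:
    """Hypothetical wrapper that counts predict_proba calls."""

    def __init__(self, model):
        self.model = model
        self.calls = 0

    def predict_proba(self, X):
        self.calls += 1
        return self.model.predict_proba(X)


class CachedPredictions:
    """Hypothetical container, not DeepBridge's implementation."""

    def __init__(self, model, X_test):
        self.model = model
        self.X_test = X_test

    @cached_property
    def test_predictions(self) -> pd.DataFrame:
        # Computed on first access, then stored on the instance and reused
        return pd.DataFrame(self.model.predict_proba(self.X_test))


digits = load_digits()
X_train, X_test, y_train, _ = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
model = CountingModel(DecisionTreeClassifier(random_state=42).fit(X_train, y_train))
holder = CachedPredictions(model, X_test)

first = holder.test_predictions
second = holder.test_predictions
print(model.calls)      # 1
print(first is second)  # True
```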

**Next:** Explore validation tests in the `03_validation_tests/` notebooks