# SharkPy Tutorial Notebook 🦈

Welcome to the SharkPy tutorial! SharkPy is a fun, user-friendly machine learning package that simplifies model training, prediction, evaluation, explanation, and model comparison. In this notebook, we'll walk through its key features using example datasets.

This notebook is designed to be run in Jupyter or Colab. Let's dive in! 🌊

## Installation

First, install SharkPy if you haven't already:

```python
!pip install sharkpy
```

## Import Required Libraries

Import SharkPy and the other libraries needed for data handling.

```python
from sharkpy import Shark
import pandas as pd
import numpy as np
```

## Basic Usage: Regression Example
Let's start with a simple regression task using synthetic data.

### Create Synthetic Regression Data

```python
# Generate sample regression data
np.random.seed(42)
data = pd.DataFrame({
    'feature1': np.random.normal(0, 1, 100),
    'feature2': np.random.normal(5, 2, 100),
    # Note: the target is sampled independently of both features
    'target': np.random.normal(10, 3, 100) + np.random.normal(0, 1, 100)
})

print(data.head())
```
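
In the data above, `target` is sampled independently of `feature1` and `feature2`, so a fitted model has no real signal to recover. A minimal sketch of a dataset where the target does depend on the features follows. The coefficients, the intercept, and the name `signal_data` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Illustrative alternative: build a target that is a known linear function
# of the features plus noise, so a fit can be checked against the truth.
rng = np.random.default_rng(42)
n_rows = 100
feature1 = rng.normal(0, 1, n_rows)
feature2 = rng.normal(5, 2, n_rows)
noise = rng.normal(0, 1, n_rows)

signal_data = pd.DataFrame({
    "feature1": feature1,
    "feature2": feature2,
    # Assumed coefficients: 2.0 for feature1, -1.5 for feature2, intercept 10
    "target": 10 + 2.0 * feature1 - 1.5 * feature2 + noise,
})

# The target should now correlate clearly with both features
print(signal_data.corr()["target"])
```

Passing a frame like `signal_data` to `shark.learn` in place of `data` should give a linear model something meaningful to fit.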

### Initialize Shark and Train a Model

```python
shark = Shark()

# Train a linear regression model
shark.learn(
    data=data,
    target='target',
    project_name='Regression Demo',
    model_choice='linear_regression'
)
```

### Make Predictions

```python
# Predict on the training data
predictions = shark.predict()
print("First 5 predictions:", predictions[:5])

# Predict on new data
new_data = pd.DataFrame({
    'feature1': [0.5, -0.5],
    'feature2': [4.0, 6.0]
})
new_predictions = shark.predict(new_data)
print("New predictions:", new_predictions)
```

### Report Model Performance

```python
cv_results, train_metrics = shark.report(cv_folds=5)
print("CV Results:", cv_results)
print("Train Metrics:", train_metrics)
```
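
The `cv_folds=5` argument suggests k-fold cross-validation. How SharkPy computes its report is not shown here, so the sketch below uses plain scikit-learn to illustrate the general idea: the data is split into five folds and the model is scored on each held-out fold in turn. The synthetic data and the R² metric are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data with a known linear relationship (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Five shuffled folds; each fold is held out once for scoring
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print("R^2 per fold:", fold_scores.round(3))
print("Mean R^2:", fold_scores.mean().round(3))
```

Comparing the per-fold scores with the training metrics is a quick way to spot overfitting: a large gap means the model does worse on data it has not seen.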

### Explain the Model

```python
shark.explain(depth='simple', format='txt', export_path='regression_explanation.txt')
```

### Plot Model Insights

```python
from sharkpy.plotting import plot_model

# Plot predictions
plot_model(shark.model, shark.features, shark.target, kind="prediction", show=True)

# Plot residuals
plot_model(shark.model, shark.features, shark.target, kind="residuals", show=True)
```
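
A residual plot shows the gap between each observed value and the model's prediction. As a rough illustration independent of SharkPy's plotting code, the sketch below fits a least-squares line with NumPy and computes the residuals by hand. All names and data are made up for the example.

```python
import numpy as np

# Synthetic data: y = 4 + 2.5 * x plus a little noise (illustrative only)
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 4.0 + 2.5 * x + rng.normal(scale=0.3, size=50)

# Design matrix with an intercept column, solved by ordinary least squares
design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ coef
residuals = y - y_hat

# With an intercept in the model, least-squares residuals average to zero
print("Intercept, slope:", coef.round(2))
print("Mean residual:", residuals.mean())
```

In a well-specified model the residuals scatter around zero with no visible trend. A curve or funnel shape in the plot usually points to a missing feature or non-constant noise.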

### Save and Load Model

```python
# Save the model
save_path = shark.save_model(name='regression_model')

# Load the model in a new Shark instance
new_shark = Shark()
loaded_model = new_shark.load_model(save_path)
print("Loaded model:", loaded_model)
```
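
The file format that `save_model` writes is not documented in this notebook. For comparison, a common way to persist a fitted scikit-learn estimator is `joblib`. The sketch below shows that generic pattern, not SharkPy's implementation, and the file name is arbitrary.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a tiny model on exact data: y = 1 + 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Write the estimator to disk and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "regression_model.joblib"
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored estimator should reproduce the original predictions
same = np.allclose(model.predict(X), restored.predict(X))
print("Predictions match after reload:", same)
```

As with any pickle-based format, only load model files from sources you trust.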

## Classification Example
Now, let's try classification with the Iris dataset.

### Load Iris Data

```python
iris_url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
iris_data = pd.read_csv(iris_url, header=None)
iris_data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

print(iris_data.head())
```
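
The cell above passes `header=None` and assigns column names afterwards, which implies the raw file has no header row. The same result is available in one step through the `names` parameter of `pandas.read_csv`. The sketch below uses a few inline rows instead of the URL so that it runs offline; the rows and their label spelling are assumed to match the layout of the remote file.

```python
import io

import pandas as pd

# Three sample rows in a headerless layout (assumed to mirror the Iris CSV)
raw_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Passing names= on a headerless file labels the columns at read time
sample = pd.read_csv(raw_csv, header=None, names=columns)

print(sample)
print(sample["species"].value_counts())
```

Checking `value_counts()` on the target column before training is a cheap way to confirm the classes are present and roughly balanced.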

### Train a Classification Model

```python
shark_class = Shark()

shark_class.learn(
    data=iris_data,
    target='species',
    project_name='Iris Classification',
    model_choice='logistic_regression'
)
```

### Make Predictions

```python
# Predict on training data
class_predictions = shark_class.predict()
print("First 5 predictions:", class_predictions[:5])

# Predict on new data (example)
new_iris = pd.DataFrame({
    'sepal_length': [5.1, 6.2],
    'sepal_width': [3.5, 2.8],
    'petal_length': [1.4, 4.3],
    'petal_width': [0.2, 1.3]
})
new_class_preds = shark_class.predict(new_iris)
print("New predictions:", new_class_preds)
```

### Confusion Matrix and ROC Curve

```python
# Plot confusion matrix
plot_model(shark_class.model, shark_class.features, shark_class.target, kind="confusion_matrix", show=True)

# Plot ROC curve (for binary or multiclass)
plot_model(shark_class.model, shark_class.features, shark_class.target, kind="roc", show=True)
```
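
A confusion matrix is a table of actual versus predicted labels. The sketch below builds one by hand with `pandas.crosstab` on a small made-up set of labels, just to show how the counts are read. It does not use SharkPy.

```python
import pandas as pd

# Made-up labels: one versicolor sample is misclassified as virginica
actual = pd.Series(
    ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    name="actual",
)
predicted = pd.Series(
    ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"],
    name="predicted",
)

# Rows are true classes, columns are predicted classes
matrix = pd.crosstab(actual, predicted)
print(matrix)

# Accuracy is the share of predictions that match the true label
accuracy = (actual == predicted).mean()
print("Accuracy:", round(accuracy, 3))
```

Off-diagonal cells show which classes get confused with each other, which a single accuracy number hides.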

## Model Battle: Compare Multiple Models
Let's battle models on the Iris dataset!

```python
battle_result = shark_class.battle(
    data=iris_data,
    target='species',
    models=['logistic_regression', 'random_forest', 'xgboost'],
    metric='accuracy',
    n_trials=10  # Fewer trials for demo
)

print("Champion model:", battle_result['champion'])
print("Score:", battle_result['score'])

# Display comparison plot
battle_result['comparison']
```
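
How `battle` scores its candidates is not shown in this notebook. One plausible reading is a cross-validated comparison. The sketch below does that with scikit-learn alone, using the copy of Iris bundled with scikit-learn. The candidate list, the fold count, and the omission of XGBoost and hyperparameter tuning are simplifications for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Hypothetical candidate set, keyed by the names used in the battle call
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold accuracy for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# The champion is the candidate with the highest mean score
champion = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Champion:", champion)
```

On a dataset as small as Iris, differences of a point or two between models are usually within fold-to-fold noise, so treat the champion as a starting point, not a verdict.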

## Advanced: Shapash Integration
If you have Shapash installed, you can create an interactive dashboard.

```python
!pip install shapash  # Install if needed

shark_class.explain_with_shapash(title_story='Iris Model Explanation', display=True)
```

## Custom Model and Optimization
Use a custom model or optimized boosting models.

```python
from sklearn.ensemble import RandomForestClassifier

custom_shark = Shark()
custom_shark.learn(
    data=iris_data,
    target='species',
    model=RandomForestClassifier(n_estimators=50)  # Custom model
)

# Or use optimized XGBoost
xg_shark = Shark()
xg_shark.learn(
    data=iris_data,
    target='species',
    model_choice='xgboost',
    n_trials=20
)
```

## Available Models
Check out all available models in SharkPy.

```python
available = shark.available_models()
print(available)
```

## Conclusion
You've now explored the core features of SharkPy! Experiment with your own datasets, try different models, and use the battle feature to find the best one. If you encounter issues or have questions, check the documentation or reach out to the community.

**Happy modeling! 🦈**