# C60.ai Quickstart

This notebook provides a quick introduction to using the C60.ai AutoML framework.

## Installation

First, install the required packages (skip this step if they are already installed):

```bash
pip install -r requirements.txt
```

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris, load_diabetes
from sklearn.model_selection import train_test_split

# Import C60
from c60 import AutoML, Pipeline
from c60.core.generator import PipelineGenerator
from c60.core.evaluator import Evaluator

## Load and Prepare Data

Let's load a sample dataset and split it into training and testing sets.

In [None]:
# Load dataset
# For classification
X, y = load_iris(return_X_y=True, as_frame=True)

# For regression (uncomment to use)
# X, y = load_diabetes(return_X_y=True, as_frame=True)

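# Split data (stratify only for classification; a continuous regression target cannot be stratified)
task = 'classification'  # set to 'regression' if you loaded the diabetes data above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y if task == 'classification' else None
)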

print(f"Training data shape: {X_train.shape}")
print(f"Testing data shape: {X_test.shape}")
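
To confirm that a stratified split preserved the class balance, it can help to compare class proportions in the two sets. The following is a minimal sketch using only pandas and scikit-learn; the names `train_share` and `test_share` are illustrative and not part of C60.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Reload the data so this cell runs on its own
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Share of each class in the training and testing sets
train_share = y_train.value_counts(normalize=True).sort_index()
test_share = y_test.value_counts(normalize=True).sort_index()

print("Class share (train):", train_share.round(3).to_dict())
print("Class share (test):", test_share.round(3).to_dict())
```

If the two sets of proportions match closely, the split is stratified as intended.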

## Basic Usage

Let's create a simple AutoML instance and run it on our data.

In [None]:
# Initialize AutoML
automl = AutoML(
    task='classification',  # or 'regression'
    time_budget=300,  # 5 minutes
    metric='accuracy',  # or another metric supported by C60
    n_jobs=-1,  # use all available cores
    random_state=42
)

# Fit the model
automl.fit(X_train, y_train)

# Make predictions
y_pred = automl.predict(X_test)

# Evaluate
from sklearn.metrics import accuracy_score
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
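
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier`, which always predicts the most frequent training class. It is a plain scikit-learn comparison point, not a C60 feature, and `baseline_accuracy` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Reload the data so this cell runs on its own
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
```

Because Iris has three equally sized classes, this baseline scores about one third; any useful model should score well above that.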

## Advanced Usage

Let's create a more advanced pipeline with custom components.

In [None]:
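# scikit-learn's Pipeline is aliased so it does not clash with c60's Pipeline
from sklearn.pipeline import Pipeline as SklearnPipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier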

# Create a custom pipeline
pipeline = Pipeline()

# Add preprocessing steps
pipeline.add_step(
    name="preprocessing",
    estimator=SklearnPipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())
    ])
)

# Add a model
pipeline.add_step(
    name="model",
    estimator=RandomForestClassifier(n_estimators=100, random_state=42)
)

# Fit the pipeline
pipeline.fit(X_train, y_train)

# Evaluate
score = pipeline.score(X_test, y_test)
print(f"Pipeline accuracy: {score:.4f}")
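
For comparison, the same steps (median imputation, standard scaling, random forest) can be expressed as a plain scikit-learn pipeline. This is a sketch of the equivalent scikit-learn construct only; it makes no claim about how C60's `Pipeline` works internally, and `sk_pipeline` and `sk_score` are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline as SklearnPipeline
from sklearn.preprocessing import StandardScaler

# Reload the data so this cell runs on its own
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing and model chained in a single scikit-learn pipeline
sk_pipeline = SklearnPipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])

sk_pipeline.fit(X_train, y_train)
sk_score = sk_pipeline.score(X_test, y_test)  # mean accuracy for classifiers
print(f"scikit-learn pipeline accuracy: {sk_score:.4f}")
```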

## Hyperparameter Tuning

Let's use the built-in optimizer to tune hyperparameters.

In [None]:
from c60.core.optimizer import Optimizer

# Initialize the optimizer
optimizer = Optimizer(
    task='classification',
    metric='accuracy',
    n_trials=50,
    random_state=42
)

# Define parameter search space
param_distributions = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create a base model
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)

# Run optimization
result = optimizer.optimize(
    estimator=model,
    X=X_train,
    y=y_train,
    param_distributions=param_distributions
)

print("Best parameters:", result['best_params'])
print("Best score:", result['best_value'])
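
To get a feel for what a search over this parameter space involves, the sketch below runs a small random search with scikit-learn's `RandomizedSearchCV`. It is a stand-in for illustration: the search strategy that C60's `Optimizer` uses is not described here, and `n_iter=5` with `cv=3` are arbitrary choices that keep the example fast.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Reload the data so this cell runs on its own
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same search space as in the Optimizer example
param_distributions = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Sample 5 random configurations and score each with 3-fold cross-validation
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="accuracy",
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.4f}")
```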

## Pipeline Generation

Let's use the PipelineGenerator to automatically generate and evaluate multiple pipelines.

In [None]:
# Initialize the pipeline generator
generator = PipelineGenerator(
    task='classification',
    preprocessing=True,
    feature_selection=True,
    random_state=42
)

# Generate initial population of pipelines
pipelines = generator.generate_initial_population(
    n_pipelines=5,
    numerical_features=X_train.columns.tolist()
)

# Initialize evaluator
evaluator = Evaluator(task='classification')

# Evaluate each pipeline (the loop variables are named so they do not
# overwrite the earlier `pipeline` and `score`)
for i, candidate in enumerate(pipelines):
    scores = evaluator.evaluate(candidate, X_train, y_train)
    print(f"Pipeline {i+1} - Accuracy: {scores['accuracy']:.4f}")
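
The idea of building several candidate pipelines and scoring each one can also be sketched with scikit-learn alone. The candidates below are hand-picked for illustration and are an assumption; they are not the pipelines `PipelineGenerator` produces. Five-fold cross-validation stands in for whatever scoring `Evaluator` performs.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Reload the data so this cell runs on its own
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hand-picked candidate pipelines (illustrative only)
candidates = {
    "standard_scaler + logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "minmax_scaler + knn": make_pipeline(
        MinMaxScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "random_forest": make_pipeline(
        RandomForestClassifier(n_estimators=100, random_state=42)
    ),
}

# Score every candidate with 5-fold cross-validation on the training set
cv_scores = {}
for name, candidate in candidates.items():
    fold_scores = cross_val_score(candidate, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = fold_scores.mean()
    print(f"{name}: mean CV accuracy = {cv_scores[name]:.4f}")

best_name = max(cv_scores, key=cv_scores.get)
print("Best candidate:", best_name)
```

Scoring with cross-validation on the training set keeps the test set untouched until a final pipeline has been chosen.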

## Conclusion

This quickstart demonstrated the basic and advanced usage of the C60.ai AutoML framework. You can now:

1. Use the high-level `AutoML` class for automated model selection and tuning
2. Create custom pipelines with the `Pipeline` class
3. Tune hyperparameters with the `Optimizer` class
4. Generate and evaluate multiple pipelines with the `PipelineGenerator`

For more advanced usage, refer to the documentation and examples in the `examples` directory.