# 🚀 Kolosal AutoML Tutorial: Optimize Your Configuration

This section imports the two `kolosal_automl` classes that the rest of the tutorial builds on.

### Components
- **`MLTrainingEngine`**: Manages model selection, training, evaluation, and reporting.
- **`DeviceOptimizer`**: Detects available hardware (CPU/GPU) and suggests the best configuration for performance.

### Outcome
All necessary classes are imported and ready for instantiation.
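
If the import cell fails, a common cause is that the package is missing from the active environment. A minimal pre-flight check, using only the standard library, might look like the sketch below. The helper name `is_installed` is hypothetical and is not part of `kolosal_automl`.

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # For a top-level name, find_spec returns None when nothing is found.
    return find_spec(package) is not None


# Report the status of the packages this tutorial relies on.
for name in ("kolosal_automl", "sklearn"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Checking only the top-level name keeps the helper simple: passing a dotted submodule path would import the parent package first and raise an error if the parent is missing.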


In [1]:
from kolosal_automl.modules.engine.train_engine import MLTrainingEngine
from kolosal_automl.modules.device_optimizer import DeviceOptimizer



# ⚙️ Training Engine Configuration

We now configure the AutoML pipeline for a regression task, using settings tuned to your machine’s hardware.

### Steps
- **Device Optimization**: Automatically detect and apply the best hardware configuration using `DeviceOptimizer`.
- **Task Type**: Set to `regression` to predict continuous numerical values.
- **Engine Initialization**: Instantiate `MLTrainingEngine` with the optimal config.

### Outcome
An `MLTrainingEngine` object is ready to run regression training using an optimized setup.
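
To make the device-optimization step less of a black box, here is a rough sketch of the kind of host inspection such a tool could perform, using only the standard library. This is not `DeviceOptimizer`'s implementation: `describe_host`, `suggest_n_jobs`, and the rule of reserving two logical cores are all assumptions made for illustration.

```python
import os
import platform


def describe_host() -> dict:
    """Collect a few basic facts about the current machine."""
    return {
        "system": platform.system(),            # e.g. "Windows" or "Linux"
        "machine": platform.machine(),          # e.g. "AMD64" or "x86_64"
        "logical_cores": os.cpu_count() or 1,   # cpu_count() may return None
    }


def suggest_n_jobs(logical_cores: int, reserve: int = 2) -> int:
    """Leave `reserve` cores free for the OS, but always use at least one."""
    return max(1, logical_cores - reserve)


host = describe_host()
print(host)
print("suggested n_jobs:", suggest_n_jobs(host["logical_cores"]))
```

With 20 logical cores this rule gives 18, which happens to match the `n_jobs` value in the configuration logged during training in this tutorial. The library's actual rule is not shown in the source.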


In [None]:
optimizer = DeviceOptimizer(optimization_mode="performance")
training_config = optimizer.get_optimal_training_engine_config()
training_config.task_type = "regression"
training_engine = MLTrainingEngine(training_config)

2025-05-20 14:45:23,759 - INFO - cpu_device_optimizer - System Overview: Laptop-Evint (Windows 10)
2025-05-20 14:45:23,759 - INFO - cpu_device_optimizer - Environment: cloud
2025-05-20 14:45:23,759 - INFO - cpu_device_optimizer - Optimization Mode: performance
2025-05-20 14:45:23,759 - INFO - cpu_device_optimizer - --------------------------------------------------
2025-05-20 14:45:23,759 - INFO - cpu_device_optimizer - CPU: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
2025-05-20 14:45:23,771 - INFO - cpu_device_optimizer - CPU Cores: 14 physical, 20 logical
2025-05-20 14:45:23,771 - INFO - cpu_device_optimizer - CPU Freq (MHz): Current=2300, Min=0, Max=2300
2025-05-20 14:45:23,771 - INFO - cpu_device_optimizer - CPU Features: AVX=False, AVX2=False, AVX512=False, SSE4=True, FMA=True, NEON=False
2025-05-20 14:45:23,771 - INFO - cpu_device_optimizer - Memory: 63.67 GB total, 57.30 GB usable, 43.67 GB available
2025-05-20 14:45:23,771 - INFO - cpu_device_optimizer - Swap Memory: 4.

2025-05-20 14:45:34,122 - kolosal_automl.modules.engine.inference_engine.InferenceEngine - INFO - CPU usage (71.0%) below threshold, disabling throttling
2025-05-20 14:45:34,122 - INFO - kolosal_automl.modules.engine.inference_engine.InferenceEngine - CPU usage (71.0%) below threshold, disabling throttling


# 🧪 Synthetic Dataset Generation

To test our AutoML pipeline, we create a clean, synthetic dataset for regression modeling.

### Details
- **Method**: `make_regression` from scikit-learn.
- **Size**: 1,000 samples × 20 features.
- **Noise**: Low (Gaussian noise with standard deviation 0.1 added to the target), so the underlying signal stays easy to learn.

### Outcome
- `X`: Feature matrix (1000 × 20)
- `y`: Target vector (1000,)
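
To see what these settings mean in practice, the sketch below generates a separate demo dataset and inspects it. It assumes scikit-learn and NumPy are installed. The `_demo` names, `random_state=0`, and `coef=True` are choices made for this illustration and are not used by the tutorial's own cell.

```python
import numpy as np
from sklearn.datasets import make_regression

# random_state fixes the draw so the checks are repeatable;
# coef=True additionally returns the true linear coefficients.
X_demo, y_demo, true_coef = make_regression(
    n_samples=1000, n_features=20, noise=0.1, coef=True, random_state=0
)

# Feature matrix and target vector shapes.
print(X_demo.shape, y_demo.shape)

# Only some features carry signal (n_informative defaults to 10).
print("informative features:", np.count_nonzero(true_coef))

# Removing the linear signal leaves only the injected Gaussian noise,
# so the residual spread should be close to the noise setting of 0.1.
residual = y_demo - X_demo @ true_coef
print("residual std:", residual.std())
```

Because only half of the 20 features are informative by default, a trained model should assign most of its feature importance to a subset of the columns.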


In [3]:
from sklearn.datasets import make_regression

# Generate 1,000 samples with 20 features and low Gaussian noise on the target
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1)

# 🚀 Model Training Execution

With the dataset and engine ready, we now execute the automated training workflow.

### What Happens Under the Hood
- **Preprocessing**: Scales, transforms, or encodes features if needed.
- **Model Selection**: Tries candidate regression models and tunes their hyperparameters.
- **Training**: Fits the best candidate with the optimized configuration.
- **Evaluation**: Computes metrics (e.g., MAE, RMSE) internally.

### Outcome
An optimized regression model is trained and stored in `training_engine`, ready for evaluation and prediction.
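
The evaluation step mentions MAE and RMSE, and the definitions are short enough to write out. The sketch below implements the standard formulas in plain Python. It is not the engine's internal code, and the helper names are illustrative. R² is included because it is the default `score` for scikit-learn regressors; whether the engine's reported `score` is R² is an assumption.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Small hand-checkable example.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R^2 :", r2(y_true, y_pred))
```

MAE and RMSE are in the units of the target, so lower is better. R² is unitless, with 1.0 meaning a perfect fit.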


In [4]:
training_engine.train_model(X, y)

2025-05-20 14:45:24,465 - INFO - MLTrainingEngine - Initialized random_forest model for regression
2025-05-20 14:45:24,469 - INFO - Experiment_1747727123 - Started experiment 1747727123
2025-05-20 14:45:24,469 - INFO - Experiment_1747727123 - Configuration:
{
  "task_type": "regression",
  "random_state": 42,
  "n_jobs": 18,
  "verbose": 0,
  "cv_folds": 5,
  "test_size": 0.2,
  "stratify": true,
  "optimization_strategy": "hyper_optimization_x",
  "optimization_iterations": 75,
  "optimization_timeout": null,
  "early_stopping": true,
  "early_stopping_rounds": 10,
  "early_stopping_metric": null,
  "feature_selection": false,
  "feature_selection_method": "mutual_info",
  "feature_selection_k": null,
  "feature_importance_threshold": 0.01,
  "model_path": "model_registry\\training_models",
  "model_registry_url": null,
  "auto_version_models": true,
  "experiment_tracking": true,
  "experiment_tracking_platform": "mlflow",
  "experiment_tracking_config": {},
  "use_intel_optimization

{'model_name': 'random_forest_1747727124',
 'model': RandomForestRegressor(),
 'params': None,
 'metrics': {'prediction_time': 0.0, 'score': 0.8021514516313261},
 'feature_importance': array([0.00668669, 0.00770637, 0.237793  , 0.0071763 , 0.01317969,
        0.00825957, 0.00663323, 0.00886302, 0.00755073, 0.00761813,
        0.01149559, 0.12530995, 0.00723957, 0.00793029, 0.07526502,
        0.23970139, 0.20028526, 0.00663688, 0.00716595, 0.00750336]),
 'training_time': 55.826399087905884}
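
The `feature_importance` array is easier to read once it is ranked. The sketch below copies the values from the output above and lists the strongest features. The 0.01 cutoff is borrowed from `feature_importance_threshold` in the logged configuration purely for illustration. Since `feature_selection` is `false` in that configuration, applying the cutoff here is an assumption about how it would be used, not something the engine did in this run.

```python
# Values copied from the 'feature_importance' array in the training output.
importances = [
    0.00668669, 0.00770637, 0.237793, 0.0071763, 0.01317969,
    0.00825957, 0.00663323, 0.00886302, 0.00755073, 0.00761813,
    0.01149559, 0.12530995, 0.00723957, 0.00793029, 0.07526502,
    0.23970139, 0.20028526, 0.00663688, 0.00716595, 0.00750336,
]
threshold = 0.01  # illustrative cutoff, see the note above

# Feature indices sorted from most to least important.
ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
top5 = ranked[:5]
top5_share = sum(importances[i] for i in top5)

# Features whose importance meets the cutoff, in column order.
above_threshold = [i for i, v in enumerate(importances) if v >= threshold]

print("top 5 features:", top5)
print("share of total importance:", round(top5_share, 3))
print("features at or above threshold:", above_threshold)
```

Five features account for roughly 88% of the total importance, which fits a dataset where only some columns carry signal.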