This page describes how to optimize the pipeline for different problem types and with different models.
This example covers all scikit-learn models as well as CatBoost, LightGBM and XGBoost for a regression problem.
>>> from ai4water.datasets import busan_beach
>>> from autotab import OptimizePipeline
>>> data = busan_beach()
>>> input_features = data.columns.tolist()[0:-1]
>>> output_features = data.columns.tolist()[-1:]
>>> pl = OptimizePipeline(
... inputs_to_transform=input_features,
... outputs_to_transform=output_features,
... models=["LinearRegression",
... "LassoLars",
... "Lasso",
... "RandomForestRegressor",
... "HistGradientBoostingRegressor",
... "CatBoostRegressor",
... "XGBRegressor",
... "LGBMRegressor",
... "GradientBoostingRegressor",
... "ExtraTreeRegressor",
... "ExtraTreesRegressor"
... ],
... parent_iterations=30,
... child_iterations=12,
... parent_algorithm='bayes',
... child_algorithm='bayes',
... eval_metric='mse',
... monitor=['r2', 'nse'],
... input_features=input_features,
... output_features=output_features,
... split_random=True,
... )
>>> pl.fit(data=data)
>>> pl.post_fit(data=data)
This example covers all scikit-learn models as well as CatBoost, LightGBM and XGBoost for a classification problem.
>>> from ai4water.datasets import MtropicsLaos
>>> from autotab import OptimizePipeline
>>> data = MtropicsLaos().make_classification(lookback_steps=1)
>>> input_features = data.columns.tolist()[0:-1]
>>> output_features = data.columns.tolist()[-1:]
>>> pl = OptimizePipeline(
... mode="classification",
... eval_metric="accuracy",
... inputs_to_transform=input_features,
... outputs_to_transform=output_features,
... models=["ExtraTreeClassifier",
... "RandomForestClassifier",
... "XGBClassifier",
... "CatBoostClassifier",
... "LGBMClassifier",
... "GradientBoostingClassifier",
... "HistGradientBoostingClassifier",
... "ExtraTreesClassifier",
... "RidgeClassifier",
... "SVC",
... "KNeighborsClassifier",
... ],
... parent_iterations=30,
... child_iterations=12,
... parent_algorithm='bayes',
... child_algorithm='bayes',
... monitor=['accuracy'],
... input_features=input_features,
... output_features=output_features,
... split_random=True,
... )
>>> pl.fit(data=data)
>>> pl.post_fit(data=data)
This example covers MLP, LSTM, CNN, CNNLSTM, TFT, TCN and LSTMAutoEncoder for regression.
Each model can consist of stacks of layers; for example, an MLP can consist of a
stack of Dense layers. The number of layers is also optimized. When using
deep learning models, also set the value of epochs, because the default
value is 14, which is too small for a deep learning model. Also consider
setting values for batch_size and lr.
>>> from ai4water.datasets import busan_beach
>>> from autotab import OptimizePipeline
>>> data = busan_beach()
>>> input_features = data.columns.tolist()[0:-1]
>>> output_features = data.columns.tolist()[-1:]
>>> pl = OptimizePipeline(
... inputs_to_transform=input_features,
... outputs_to_transform=output_features,
... models=["MLP", "LSTM", "CNN", "CNNLSTM", "TFT", "TCN", "LSTMAutoEncoder"],
... parent_iterations=30,
... child_iterations=12,
... parent_algorithm='bayes',
... child_algorithm='bayes',
... eval_metric='mse',
... monitor=['r2', 'nse'],
... input_features=input_features,
... output_features=output_features,
... split_random=True,
... epochs=100,
... )
>>> pl.fit(data=data)
>>> pl.post_fit(data=data)
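As suggested in the note above, batch_size and lr can be set alongside epochs. Assuming OptimizePipeline forwards these as model keyword arguments (the exact values below are illustrative, not recommendations), the call might look like this:

>>> pl = OptimizePipeline(
...     models=["MLP"],
...     input_features=input_features,
...     output_features=output_features,
...     epochs=300,       # the default of 14 is too small for deep learning
...     batch_size=32,    # assumed to be forwarded to the underlying model
...     lr=0.001,         # assumed to be forwarded to the underlying model
... )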
Neural networks such as MLP, LSTM, CNN, CNNLSTM, TFT, TCN and LSTMAutoEncoder can also be used for a classification problem (the example below searches only MLP and CNN). Each model can consist of stacks of layers; for example, an MLP can consist of a stack of Dense layers. The number of layers is also optimized.
>>> from ai4water.datasets import MtropicsLaos
>>> from autotab import OptimizePipeline
>>> data = MtropicsLaos().make_classification(lookback_steps=1,)
>>> input_features = data.columns.tolist()[0:-1]
>>> output_features = data.columns.tolist()[-1:]
>>> pl = OptimizePipeline(
... mode="classification",
... eval_metric="accuracy",
... inputs_to_transform=input_features,
... outputs_to_transform=output_features,
... models=["MLP", "CNN"],
... parent_iterations=30,
... child_iterations=12,
... parent_algorithm='bayes',
... child_algorithm='bayes',
... monitor=['f1_score'],
... input_features=input_features,
... output_features=output_features,
... split_random=True,
... epochs=100,
... )
>>> pl.fit(data=data)
>>> pl.post_fit(data=data)
For multi-class classification with neural networks, the num_classes argument must be set to a value greater than 2.
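For instance, assuming a dataset with three target classes, the setup might look like the following sketch; num_classes is the only change with respect to the binary classification example above:

>>> pl = OptimizePipeline(
...     mode="classification",
...     eval_metric="accuracy",
...     models=["MLP", "CNN"],
...     num_classes=3,    # required for multi-class problems with neural networks
...     input_features=input_features,
...     output_features=output_features,
...     epochs=100,
... )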