# **AutoML:** FLAML
FLAML is a Python AutoML library powered by a cost-effective hyperparameter optimization and learner selection method developed by Microsoft Research.

### **Installation:**
For regular use: **pip install flaml** <br>
For Jupyter notebooks: **pip install "flaml[notebook]"** (the quotes keep shells such as zsh from interpreting the brackets)

## **CLASSIFICATION EXAMPLE**

### Import Libraries

In [53]:
import numpy as np
import pandas as pd
import seaborn as sns
from flaml import AutoML
from sklearn.model_selection import train_test_split

### Load Dataset

In [54]:
df = sns.load_dataset("iris")
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


### Data Splitting

In [58]:
X = df.iloc[:, :4]  # Features: the four measurement columns
y = df.iloc[:, -1]  # Target: species

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(105, 4) (45, 4) (105,) (45,)
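
The shapes printed above follow from how `train_test_split` rounds a fractional `test_size`: the test count is rounded up and the remaining rows go to training. A minimal sketch of that arithmetic, where `split_sizes` is a hypothetical helper written for illustration (it is not part of scikit-learn):

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumes only test_size is given (no train_size), in which case the
    test count is rounded up and the rest of the rows are used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# 150 rows, as in the iris dataset used above
print(split_sizes(150, 0.3))
```

With 150 rows and `test_size=0.3` this gives 105 training rows and 45 test rows, matching the printed shapes.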


### Building a Classification Model

In [64]:
automl_settings = {
    "time_budget": 60,  # Seconds
    "metric": 'accuracy', # Evaluation Metric
    "task": 'classification' # Supervised ML Task
}
autoML = AutoML()
autoML.fit(X_train, y_train, **automl_settings)

print(f"BEST MODEL:\n{autoML.model.estimator}")
print(f"ACCURACY SCORE: {autoML.score(X_test, y_test)}")

BEST MODEL:
ExtraTreesClassifier(criterion='entropy', max_features=0.7148468554889633,
                     max_leaf_nodes=7, n_estimators=5, n_jobs=-1,
                     random_state=12032022)
ACCURACY SCORE: 0.9555555555555556
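
For a classification task, the score shown above is the fraction of test samples whose predicted label matches the true label. The sketch below computes that fraction by hand on made-up labels; the `accuracy` helper and the toy label lists are illustrative assumptions, not part of FLAML.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: 3 of the 4 predictions are correct
toy_true = ["setosa", "versicolor", "virginica", "virginica"]
toy_pred = ["setosa", "versicolor", "versicolor", "virginica"]
print(accuracy(toy_true, toy_pred))
```

By the same arithmetic, a score of about 0.956 on 45 test samples is consistent with 43 correct predictions (43 / 45).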


---
## **REGRESSION EXAMPLE**

### Import Libraries

In [65]:
import numpy as np
import pandas as pd
import seaborn as sns
from flaml import AutoML
from sklearn.model_selection import train_test_split

### Load Dataset

In [67]:
df = sns.load_dataset("taxis")
df.head()

| | pickup | dropoff | passengers | distance | fare | tip | tolls | total | color | payment | pickup_zone | dropoff_zone | pickup_borough | dropoff_borough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-03-23 20:21:09 | 2019-03-23 20:27:24 | 1 | 1.6 | 7.0 | 2.15 | 0.0 | 12.95 | yellow | credit card | Lenox Hill West | UN/Turtle Bay South | Manhattan | Manhattan |
| 1 | 2019-03-04 16:11:55 | 2019-03-04 16:19:00 | 1 | 0.79 | 5.0 | 0.0 | 0.0 | 9.3 | yellow | cash | Upper West Side South | Upper West Side South | Manhattan | Manhattan |
| 2 | 2019-03-27 17:53:01 | 2019-03-27 18:00:25 | 1 | 1.37 | 7.5 | 2.36 | 0.0 | 14.16 | yellow | credit card | Alphabet City | West Village | Manhattan | Manhattan |
| 3 | 2019-03-10 01:23:59 | 2019-03-10 01:49:51 | 1 | 7.7 | 27.0 | 6.15 | 0.0 | 36.95 | yellow | credit card | Hudson Sq | Yorkville West | Manhattan | Manhattan |
| 4 | 2019-03-30 13:27:42 | 2019-03-30 13:37:14 | 3 | 2.16 | 9.0 | 1.1 | 0.0 | 13.4 | yellow | credit card | Midtown East | Yorkville West | Manhattan | Manhattan |


### Data Splitting

In [68]:
X = df.iloc[:, 2:7]  # Features: passengers, distance, fare, tip, tolls
y = df.iloc[:, 7]  # Target: total

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(4503, 5) (1930, 5) (4503,) (1930,)


### Building a Regression Model

In [71]:
automl_settings = {
    "time_budget": 60,  # Seconds
    "metric": 'r2', # Evaluation Metric
    "task": 'regression' # Supervised ML Task
}
autoML = AutoML()
autoML.fit(X_train, y_train, **automl_settings)

print(f"BEST MODEL:\n{autoML.model.estimator}")
print(f"R2 SCORE: {autoML.score(X_test, y_test)}")

BEST MODEL:
LGBMRegressor(colsample_bytree=0.763007791741338,
              learning_rate=0.16645809713264254, max_bin=1023,
              min_child_samples=6, n_estimators=70, num_leaves=14,
              reg_alpha=0.0009765625, reg_lambda=0.10626868868028042,
              verbose=-1)
R2 SCORE: 0.9903735592952132
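
The `r2` metric used above is the coefficient of determination: one minus the ratio of the squared prediction error to the squared deviation of the targets from their mean. The sketch below works through it on made-up numbers; the `r2` helper and the toy values are assumptions for illustration and assume the targets are not all identical.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # squared error of the model
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # squared deviation from the mean
    return 1 - ss_res / ss_tot


# Made-up fares and predictions: SS_res = 1.0, SS_tot = 20.0
toy_true = [10.0, 12.0, 14.0, 16.0]
toy_pred = [10.5, 11.5, 14.5, 15.5]
print(round(r2(toy_true, toy_pred), 4))
```

A perfect model scores 1, and a model that always predicts the mean of the targets scores 0, so a value close to 1 means most of the variance in the target is explained.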


---
## **MULTI-OUTPUT EXAMPLE**

In [None]:
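from flaml import AutoML
# MultiOutputRegressor is the counterpart for regression targets
from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor

automl_settings = {
    "time_budget": 1,  # Seconds, applied to each target column separately
    "metric": 'accuracy', # Evaluation Metric
    "estimator_list": ["extra_tree", "rf"], # Learners to search over
    "task": 'classification' # Supervised ML Task
}
# X_train_encoded and y_train_encoded are not defined in this notebook; they are
# assumed to be already-encoded features and a target with one column per output.
# MultiOutputClassifier fits one AutoML instance per target column and forwards
# the keyword settings to each AutoML.fit call.
autoML = MultiOutputClassifier(AutoML())
autoML.fit(X_train_encoded.values, y_train_encoded, **automl_settings)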
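
Scikit-learn's multi-output wrappers fit one copy of the base estimator per target column and forward extra keyword arguments from `fit` to each copy, so a `time_budget` passed this way applies to each target separately. The sketch below illustrates that strategy in plain Python; `PerTargetClassifier` and `MajorityClassifier` are hypothetical stand-ins written for this example, not scikit-learn or FLAML classes.

```python
import copy
from collections import Counter


class MajorityClassifier:
    """Hypothetical base learner: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class PerTargetClassifier:
    """Simplified illustration of the one-model-per-target-column strategy."""

    def __init__(self, base):
        self.base = base

    def fit(self, X, Y, **fit_params):
        self.estimators_ = []
        for j in range(len(Y[0])):
            column = [row[j] for row in Y]  # labels of the j-th target
            est = copy.deepcopy(self.base)  # independent copy per target
            est.fit(X, column, **fit_params)  # extra settings reach every copy
            self.estimators_.append(est)
        return self

    def predict(self, X):
        per_target = [est.predict(X) for est in self.estimators_]
        return [list(row) for row in zip(*per_target)]  # one row per sample


# Made-up data with two target columns
X_toy = [[0], [1], [2]]
Y_toy = [["a", "x"], ["a", "y"], ["b", "y"]]
model = PerTargetClassifier(MajorityClassifier()).fit(X_toy, Y_toy)
print(model.predict([[5], [6]]))
```

Each target gets its own fitted model, so the predictions for a sample are assembled column by column.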

---
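## **CUSTOM ESTIMATOR**

The `estimator_list` setting restricts the search to the learners you name. Built-in estimator names include: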
- **'lgbm'**: LGBMEstimator
- **'xgboost'**: XGBoostSkLearnEstimator
- **'xgb_limitdepth'**: XGBoostLimitDepthEstimator
- **'rf'**: RandomForestEstimator
- **'extra_tree'**: ExtraTreesEstimator
- **'lrl1'**: LRL1Classifier (sklearn.LogisticRegression with L1 regularization)
- **'lrl2'**: LRL2Classifier (sklearn.LogisticRegression with L2 regularization)
- **'catboost'**: CatBoostEstimator 
- **'kneighbor'**: KNeighborsEstimator
- **'prophet'**: Prophet for Time-Series Forecasting
- **'arima'**: ARIMA for Time-Series Forecasting
- **'sarimax'**: SARIMAX for Time-Series Forecasting
- **'holt-winters'**: Holt-Winters (triple exponential smoothing) model for Time-Series Forecasting
- **'transformer'**: Hugging Face transformer models for the tasks "seq-classification", "seq-regression", "multichoice-classification", "token-classification", and "summarization"
- **'temporal_fusion_transformer'**: TemporalFusionTransformerEstimator for Time-Series Forecasting

In [None]:
automl_settings = {
    "time_budget": 60,  # Seconds
    "metric": 'r2', # Evaluation Metric
    "estimator_list": ["extra_tree", "rf"], # Search only these learners
    "task": 'regression' # Supervised ML Task
}
autoML = AutoML()
autoML.fit(X_train, y_train, **automl_settings)

print(f"BEST MODEL:\n{autoML.model.estimator}")
print(f"R2 SCORE: {autoML.score(X_test, y_test)}")