<a href="https://colab.research.google.com/github/Rohith616/Client-Project-1/blob/main/Flaml.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install flaml[notebook,ts_forecast]

Collecting flaml[notebook,ts_forecast]
  Downloading FLAML-0.9.7-py3-none-any.whl (143 kB)
Collecting lightgbm>=2.3.1
  Downloading lightgbm-3.3.2-py3-none-manylinux1_x86_64.whl (2.0 MB)
Collecting rgf-python
  Downloading rgf_python-3.12.0-py3-none-manylinux1_x86_64.whl (757 kB)
Collecting openml==0.10.2
  Downloading openml-0.10.2.tar.gz (158 kB)
Collecting catboost>=0.26
  Downloading catboost-1.0.4-cp37-none-manylinux1_x86_64.whl (76.1 MB)
Collecting prophet>=1.0.1
  Downloading prophet-1.0.1.tar.gz (65 kB)
Collecting statsmodels>=0.12.2
  Downloading statsmodels-0.13.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux20

In [None]:
import pandas as pd
from flaml import AutoML
from flaml.ml import sklearn_metric_loss_score


def train_model(
    data: pd.DataFrame, future_data: pd.DataFrame, parameters: dict
) -> list:
    def preprocessing(data, future_data):
        def encode(frame):
            # Work on a copy so the caller's DataFrame is not modified.
            frame = frame.copy()
            frame["event"] = frame["event"].fillna("no event")
            # "休日なし" is Japanese for "no holiday".
            frame["holiday"] = frame["holiday"].fillna("休日なし")
            frame["ds"] = pd.to_datetime(frame["ds"])

            # One-hot encode every text column except target_address,
            # then drop all of the original text columns.
            text_cols = frame.select_dtypes(include="object").columns
            encode_cols = text_cols.drop("target_address")
            dummies = [
                pd.get_dummies(frame[col], drop_first=True) for col in encode_cols
            ]
            frame = pd.concat([frame.drop(columns=text_cols), *dummies], axis="columns")
            # "Unnamed: 0" is presumably an index column saved by an earlier CSV export.
            return frame.drop(columns="Unnamed: 0")

        # Apply the same encoding to the historical and the future data.
        return encode(data), encode(future_data)

    data, future_data = preprocessing(data, future_data)

    num_samples = data.shape[0]
    test_size = round(num_samples / 5)
    time_horizon = test_size
    split_idx = num_samples - time_horizon
    train = data[:split_idx]
    X_test = data[split_idx:].drop("y", axis=1)
    y_test = data[split_idx:]["y"]

    automl = AutoML()
    feed_dict = parameters["FLAML"]

    # fit() is called for its effect on `automl`, which holds the trained model.
    automl.fit(dataframe=train, label="y", period=time_horizon, **feed_dict)
    print("Best ML learner:", automl.best_estimator)
    print("Best hyperparameter config:", automl.best_config)
    print(f"Best mape on validation data: {automl.best_loss}")
    print(f"Training duration of best run: {automl.best_config_train_time}s")

    # Score the held-out tail of the series.
    pred = automl.predict(X_test)
    print("mape", "=", sklearn_metric_loss_score("mape", y_predict=pred, y_true=y_test))

    future_pred = automl.predict(future_data)

    return [automl, pred, future_pred]
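
The split inside `train_model` holds out the last fifth of the rows as a test set, without shuffling, so the model is always evaluated on dates later than the ones it was trained on. The sketch below isolates that step on a made-up ten-day series. The helper name `chronological_split`, its `test_fraction` parameter, and the toy data are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd


def chronological_split(frame: pd.DataFrame, label: str = "y", test_fraction: float = 0.2):
    """Hold out the last `test_fraction` of rows, keeping time order intact."""
    horizon = round(len(frame) * test_fraction)
    split_idx = len(frame) - horizon
    train = frame.iloc[:split_idx]
    X_test = frame.iloc[split_idx:].drop(columns=label)
    y_test = frame.iloc[split_idx:][label]
    return train, X_test, y_test, horizon


# Toy daily series standing in for the real data (values are made up).
toy = pd.DataFrame(
    {"ds": pd.date_range("2020-01-01", periods=10, freq="D"), "y": range(10)}
)
train, X_test, y_test, horizon = chronological_split(toy)
print(len(train), len(X_test), horizon)
```

The returned `horizon` plays the role of `time_horizon` above: it is both the size of the test set and the forecast period passed to `automl.fit`.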

In [None]:
if __name__ == "__main__":
    data = pd.read_csv("/content/preprocessed_data.csv")
    future_data = data.drop("y", axis=1)
    # future_data = future_data.iloc[:1000]
    parameters = {
        "FLAML": {
            "time_budget": 180,  # total running time in seconds
            "metric": "mape",  # primary metric for validation: 'mape' is generally used for forecast tasks
            "task": "ts_forecast",  # task type
            "eval_method": "holdout",  # validation method can be chosen from ['auto', 'holdout', 'cv']
            "seed": 42,
        }
    }
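    # train_model returns three items: the model, test predictions, future predictions.
    automl, pred, future_pred = train_model(data, future_data, parameters)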

[flaml.automl: 02-21 06:12:55] {2055} INFO - task = ts_forecast
[flaml.automl: 02-21 06:12:55] {2057} INFO - Data split method: time
[flaml.automl: 02-21 06:12:55] {2061} INFO - Evaluation method: holdout
[flaml.automl: 02-21 06:12:55] {2142} INFO - Minimizing error metric: mape
[flaml.automl: 02-21 06:12:55] {2200} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'prophet', 'arima', 'sarimax']
[flaml.automl: 02-21 06:12:55] {2453} INFO - iteration 0, current learner lgbm
[flaml.automl: 02-21 06:12:55] {2569} INFO - Estimated sufficient time budget=612s. Estimated necessary time budget=1s.
[flaml.automl: 02-21 06:12:55] {2621} INFO -  at 0.2s,	estimator lgbm's best error=0.4767,	best estimator lgbm's best error=0.4767
[flaml.automl: 02-21 06:12:55] {2453} INFO - iteration 1, current learner lgbm
[flaml.automl: 02-21 06:12:55] {2621} INFO -  at 0.7s,	estimator lgbm's best error=0.4767,	best estimator lgbm's best error=0.4767
[flaml.auto

2015-12-18 00:00:00 2017-11-17 00:00:00 (701, 31)


[flaml.automl: 02-21 06:13:44] {2621} INFO -  at 49.8s,	estimator sarimax's best error=0.5618,	best estimator prophet's best error=0.3776
[flaml.automl: 02-21 06:13:44] {2453} INFO - iteration 9, current learner extra_tree


2015-12-18 00:00:00 2017-11-17 00:00:00 (701, 31)


[flaml.automl: 02-21 06:13:45] {2621} INFO -  at 50.1s,	estimator extra_tree's best error=0.4083,	best estimator prophet's best error=0.3776
[flaml.automl: 02-21 06:13:45] {2453} INFO - iteration 10, current learner lgbm
[flaml.automl: 02-21 06:13:45] {2621} INFO -  at 50.1s,	estimator lgbm's best error=0.4627,	best estimator prophet's best error=0.3776
[flaml.automl: 02-21 06:13:45] {2453} INFO - iteration 11, current learner xgboost
[flaml.automl: 02-21 06:13:45] {2621} INFO -  at 50.1s,	estimator xgboost's best error=0.7963,	best estimator prophet's best error=0.3776
[flaml.automl: 02-21 06:13:45] {2453} INFO - iteration 12, current learner rf
[flaml.automl: 02-21 06:13:45] {2621} INFO -  at 50.4s,	estimator rf's best error=0.4121,	best estimator prophet's best error=0.3776
[flaml.automl: 02-21 06:13:45] {2453} INFO - iteration 13, current learner rf
[flaml.automl: 02-21 06:13:45] {2621} INFO -  at 50.6s,	estimator rf's best error=0.4065,	best estimator prophet's best error=0.3776
[

2015-12-18 00:00:00 2017-11-17 00:00:00 (701, 31)


[flaml.automl: 02-21 06:14:11] {2621} INFO -  at 76.4s,	estimator prophet's best error=0.3772,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:11] {2453} INFO - iteration 41, current learner lgbm
[flaml.automl: 02-21 06:14:11] {2621} INFO -  at 76.4s,	estimator lgbm's best error=0.4315,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:11] {2453} INFO - iteration 42, current learner lgbm
[flaml.automl: 02-21 06:14:11] {2621} INFO -  at 76.4s,	estimator lgbm's best error=0.4315,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:11] {2453} INFO - iteration 43, current learner lgbm
[flaml.automl: 02-21 06:14:11] {2621} INFO -  at 76.5s,	estimator lgbm's best error=0.4187,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:11] {2453} INFO - iteration 44, current learner lgbm
[flaml.automl: 02-21 06:14:11] {2621} INFO -  at 76.5s,	estimator lgbm's best error=0.4187,	best estimator prophet's best error=0.3772
[f

2015-12-18 00:00:00 2017-11-17 00:00:00 (701, 31)


[flaml.automl: 02-21 06:14:27] {2621} INFO -  at 92.1s,	estimator rf's best error=0.4065,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:27] {2453} INFO - iteration 50, current learner lgbm
[flaml.automl: 02-21 06:14:27] {2621} INFO -  at 92.2s,	estimator lgbm's best error=0.4187,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:27] {2453} INFO - iteration 51, current learner xgboost
[flaml.automl: 02-21 06:14:27] {2621} INFO -  at 92.2s,	estimator xgboost's best error=0.4022,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:27] {2453} INFO - iteration 52, current learner lgbm
[flaml.automl: 02-21 06:14:27] {2621} INFO -  at 92.3s,	estimator lgbm's best error=0.4131,	best estimator prophet's best error=0.3772
[flaml.automl: 02-21 06:14:27] {2453} INFO - iteration 53, current learner lgbm
[flaml.automl: 02-21 06:14:27] {2621} INFO -  at 92.3s,	estimator lgbm's best error=0.4131,	best estimator prophet's best error=0.3772
[

Best ML learner: prophet
Best hyperparameter config: {'changepoint_prior_scale': 0.021150654016310305, 'seasonality_prior_scale': 10.0, 'holidays_prior_scale': 1.887740615941918, 'seasonality_mode': 'multiplicative'}
Best mape on validation data: 0.37456214680140343
Training duration of best run: 3.282740354537964s
mape = 0.0627778488381375
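
To help read the two MAPE figures above, here is a minimal sketch of the metric written out by hand. It assumes the usual definition, the mean of |actual - forecast| / |actual| expressed as a fraction, which is how scikit-learn defines its MAPE. Under that convention a value of about 0.063 corresponds to roughly 6.3%. The function and the sample numbers are illustrative only, and this simplified version does not guard against zeros in the actual values.

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Simplified: assumes no zeros in y_true.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


# Made-up values: errors of 10%, 10% and 0% average to about 6.7%.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(mape(actual, forecast))
```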
