<a href="https://colab.research.google.com/github/XinyaoWa/aidk-integration/blob/main/Democratize_the_customized_models_with_AIDK.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **To be deleted**

The cells below will be deleted once hydro.ai can be installed with pip.

In [1]:
from google.colab import drive
import os
drive.mount('/content/drive/')
aidk_path = '/content/drive/MyDrive/integration/frameworks.bigdata.bluewhale-main'
os.chdir(aidk_path)

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [2]:
from run_hydroai2 import try_democratize

/content/drive/MyDrive/integration/frameworks.bigdata.bluewhale-main


In [3]:
!pip install -r requirements.txt

Collecting sigopt==7.5.0
  Downloading sigopt-7.5.0-py2.py3-none-any.whl (38 kB)
Collecting PyYAML>=5.4.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
     |████████████████████████████████| 596 kB 8.0 MB/s
Collecting pypng>=0.0.20
  Downloading pypng-0.0.21-py3-none-any.whl (48 kB)
     |████████████████████████████████| 48 kB 4.6 MB/s
Collecting GitPython>=2.0.0
  Downloading GitPython-3.1.27-py3-none-any.whl (181 kB)
     |████████████████████████████████| 181 kB 44.2 MB/s
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.7 MB/s
Collecting smmap<6,>=3.0.1
  Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Installing collected packages: smmap, gitdb, PyYAML, pypng, GitPython, sigopt
  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Succ

## **Overview**

### **AIDK Introduction**
AIDK (AI Democratization Toolkit) is an end-to-end AI toolkit that efficiently delivers high-performance, lightweight models on commodity hardware.

#### **AIDK Key Features**

With AIDK, you can:

- Get direct access to democratized models for a wide range of areas (recommendation systems, computer vision, speech recognition, natural language processing, ...), which can bring a 100x speedup over stock models.
- Automatically optimize and accelerate candidate models with AIDK's democratization tools (SDA, SDNN, ...) while keeping a comparable score.
- Plug individual democratization modules (RecDP, SDA, SDNN, ...) into your own AI pipeline for your specific use case.

### **Notebook Content**

Building on the AIDK quick start, this notebook shows a further use case: if the built-in models in the model zoo don't meet your requirements and you have a customized model as the baseline for your project, AIDK can still optimize it with democratization algorithms such as SDA, SDNN and NAS, which search for a more suitable network structure or parameters to improve performance.

For simplicity, we use XGBoost as the example model.
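To make "searching for more suitable parameters" concrete, here is a minimal random-search sketch that draws candidate XGBoost configurations around the baseline defined below. It is only a stand-in for illustration, not how SDA, SDNN or NAS work internally; `SEARCH_SPACE`, its value ranges, and `sample_candidates` are hypothetical.

```python
import random

# Hypothetical discrete search space around this notebook's baseline
# (learning_rate=0.6, max_depth=8, n_estimators=6); the ranges are assumptions.
SEARCH_SPACE = {
    "learning_rate": [0.2, 0.4, 0.6, 0.8],
    "max_depth": [2, 4, 6, 8],
    "n_estimators": [2, 4, 6, 8],
}


def sample_candidates(space, n, seed=0):
    """Draw up to n distinct parameter dicts uniformly at random from a grid."""
    rng = random.Random(seed)
    keys = sorted(space)
    total = 1
    for key in keys:
        total *= len(space[key])
    n = min(n, total)  # cannot draw more distinct configs than the grid holds
    seen = set()
    candidates = []
    while len(candidates) < n:
        combo = tuple(rng.choice(space[key]) for key in keys)
        if combo not in seen:  # reject duplicates
            seen.add(combo)
            candidates.append(dict(zip(keys, combo)))
    return candidates


for params in sample_candidates(SEARCH_SPACE, 3):
    print(params)
```

Each sampled dict could be applied to a copy of the baseline with the scikit-learn-style `set_params(**params)`, which `XGBClassifier` supports, and then scored with the training function.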

## **AIDK Install**

Install AIDK with pip.

In [None]:
# !pip install aidk

In [None]:
# import aidk
# aidk_path = aidk.__path__[0]
aidk_path

'/content/drive/MyDrive/integration/frameworks.bigdata.bluewhale-main'

## **Democratize the customized model with AIDK**

### **Define the customized model**

Take XGBoost as an example and define the baseline model.

In [3]:
import xgboost

# Baseline hyperparameters of the customized model
params = {'learning_rate': 0.6,
          'max_depth': 8,
          'n_estimators': 6,
          'use_label_encoder': False}

model = xgboost.XGBClassifier(**params, eval_metric='mlogloss')

### **Define the customized training function**

Provide a training function that takes the *model* as input and returns a measurable *score* (here, accuracy on a held-out split) as the optimization target.

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_wine

def train(model):
  # Prepare the dataset; here we use the scikit-learn Wine dataset as an example
  x, y = load_wine(return_X_y=True, as_frame=True)
  x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=2)

  # Train the model
  model.fit(x_train, y_train)

  # Get the prediction score
  y_pred = model.predict(x_test)
  accuracy = accuracy_score(y_test, y_pred)

  return accuracy

print(train(model))
print(model)

0.9027777777777778
XGBClassifier(eval_metric='mlogloss', learning_rate=0.6, max_depth=8,
              n_estimators=6, objective='multi:softprob',
              use_label_encoder=False)


### **Run AIDK to democratize the customized model**

Just provide your model and training function, and *try_democratize()* automatically generates the optimized models.

*try_democratize()*:

- Input args:

  - train: the training function

  - model: the customized model

  - other args: any additional input arguments for the training function

- Output: a list of `(model, score)` pairs, sorted by score from best to worst
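The input/output contract above can be illustrated with a minimal, hypothetical stand-in. `toy_democratize` is not AIDK's implementation: it scores the baseline, then tries each parameter dict on a deep copy of the model via the scikit-learn-style `set_params` (which `XGBClassifier` supports), forwards any extra arguments to `train`, and returns `(model, score)` pairs sorted best-first. The `candidates` keyword, `TinyModel`, and `tiny_train` are assumptions made so the sketch runs without XGBoost.

```python
import copy


def toy_democratize(train, model, *args, candidates=()):
    """Hypothetical sketch: return (model, score) pairs, best score first."""
    results = [(model, train(model, *args))]  # keep the baseline in the ranking
    for params in candidates:
        trial = copy.deepcopy(model)  # never mutate the caller's model
        trial.set_params(**params)
        results.append((trial, train(trial, *args)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Tiny stand-in model exposing a set_params method like scikit-learn estimators.
class TinyModel:
    def __init__(self, depth=8):
        self.depth = depth

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def __repr__(self):
        return f"TinyModel(depth={self.depth})"


def tiny_train(model, target):
    # Made-up score: the closer depth is to `target` (an extra arg), the better.
    return 1.0 / (1 + abs(model.depth - target))


ranked = toy_democratize(tiny_train, TinyModel(), 3,
                         candidates=[{"depth": 2}, {"depth": 3}])
for candidate, score in ranked:
    print("model: ", candidate)
    print("result: ", score)
```

Keeping the baseline in the ranking means the caller can always tell whether any candidate actually beat the original model.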


In [5]:
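# train() takes no extra arguments, so only the function and the baseline model are passed
models_opt = try_democratize(train, model)

# Show the top three candidates (or fewer, if fewer were returned)
for candidate, score in models_opt[:3]:
  print("model: ", candidate)
  print("result: ", score)
  print()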

model:  XGBClassifier(eval_metric='mlogloss', max_depth=8, n_estimators=6,
              use_label_encoder=False)
result:  0.9583333333333334

model:  XGBClassifier(eval_metric='mlogloss', max_depth=2, n_estimators=2,
              use_label_encoder=False)
result:  0.9444444444444444

model:  XGBClassifier(eval_metric='mlogloss', learning_rate=0.4, max_depth=2,
              n_estimators=2, use_label_encoder=False)
result:  0.9305555555555556
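To see what changed in each candidate, you can diff its hyperparameters against the baseline. For scikit-learn-style estimators such as `XGBClassifier`, `get_params()` returns a plain dict, so a dict comparison is enough. The helper `param_diff` is hypothetical, and the example dicts are abridged from the baseline and third-ranked reprs printed above.

```python
def _same(a, b):
    # Treat two NaNs as equal (XGBoost's default `missing` value is NaN).
    return a == b or (a != a and b != b)


def param_diff(baseline, candidate):
    """Return {name: (baseline_value, candidate_value)} for differing params."""
    names = baseline.keys() | candidate.keys()
    return {
        name: (baseline.get(name), candidate.get(name))
        for name in sorted(names)
        if not _same(baseline.get(name), candidate.get(name))
    }


# Abridged from the reprs above: the baseline and the third-ranked candidate.
baseline = {"learning_rate": 0.6, "max_depth": 8, "n_estimators": 6}
third = {"learning_rate": 0.4, "max_depth": 2, "n_estimators": 2}
print(param_diff(baseline, third))
# With real models: param_diff(model.get_params(), models_opt[2][0].get_params())
```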



Select the best model as the final one.

In [7]:
best_model, best_score = models_opt[0]  # the list is sorted best-first
print("The democratized model is:")
print("model: ", best_model)
print("result: ", best_score)

The democratized model is:
model:  XGBClassifier(eval_metric='mlogloss', max_depth=8, n_estimators=6,
              use_label_encoder=False)
result:  0.9583333333333334
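Taking `models_opt[0]` relies on the list being sorted best-first. If you prefer a selection that does not depend on ordering, and want the lighter model when scores tie (in the spirit of AIDK's lightweight-model goal), a small helper can do both. `pick_best` is hypothetical, model size is approximated here by `n_estimators * max_depth` (an assumption, not an AIDK metric), and the scores below are made up purely to demonstrate a tie.

```python
def pick_best(ranked, size=lambda model: 0):
    """Return the (model, score) pair with the highest score.

    Ties are broken in favour of the smaller model according to `size`.
    """
    if not ranked:
        raise ValueError("no candidate models to choose from")
    return max(ranked, key=lambda pair: (pair[1], -size(pair[0])))


# Stand-in candidates as plain param dicts with made-up scores. With XGBoost
# models you could pass, e.g.,
# size=lambda m: m.get_params()["n_estimators"] * m.get_params()["max_depth"]
candidates = [
    ({"max_depth": 8, "n_estimators": 6}, 0.95),
    ({"max_depth": 2, "n_estimators": 2}, 0.95),
    ({"max_depth": 2, "n_estimators": 2, "learning_rate": 0.4}, 0.93),
]
best = pick_best(candidates, size=lambda p: p["n_estimators"] * p["max_depth"])
print("model: ", best[0])
print("result: ", best[1])
```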
