
Work for the next phase of [BQML Feature Engineering - v2.ipynb](./BQML%20Feature%20Engineering%20-%20v2.ipynb)

---
# TRANSFORM Only Models

BigQuery ML (BQML) can now create a transform-only model: a model that holds only the preprocessing logic defined in its `TRANSFORM` clause. This enables feature preprocessing in BigQuery with the `ML.TRANSFORM` function. It also makes that preprocessing portable, by [registering in Vertex AI Model Registry](https://cloud.google.com/bigquery/docs/create_vertex) or [exporting to Cloud Storage](https://cloud.google.com/bigquery/docs/exporting-models).
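Exporting a BQML model to Cloud Storage uses BigQuery's `EXPORT MODEL` statement. The sketch below only builds the SQL string. In this notebook it would be submitted the same way as the other queries, for example `bq.query(query = sql).result()`.

The following are hypothetical placeholders, not values from this project:
- the helper name `export_model_sql`
- the model ID
- the `gs://` path

```python
def export_model_sql(model_ref: str, gcs_uri: str) -> str:
    """Build an EXPORT MODEL statement that writes a BQML model to Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    return f"EXPORT MODEL `{model_ref}`\nOPTIONS (URI = '{gcs_uri}')"


# Hypothetical model ID and bucket path, for illustration only.
sql = export_model_sql(
    "my-project.my_dataset.my_transform_only_model",
    "gs://my-bucket/models/transform_only",
)
print(sql)
```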

## Create Transform Model

Use BigQuery ML to create a transform-only model:
- [TRANSFORM Only Model]() with BigQuery ML (BQML)

Feature preprocessing with the `TRANSFORM` clause:
- [TRANSFORM](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#transform)
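The model below uses four BQML scaling functions. As a rough mental model, each is a column-wise formula whose statistics are computed over the training rows:
- min-max scaling: `(x - min) / (max - min)`
- standard scaling: `(x - mean) / stddev`
- max-abs scaling: `x / max(|x|)`
- robust scaling: `(x - median) / IQR`

The plain-Python sketch below approximates these under stated assumptions: population standard deviation, and exact quartiles via `statistics.quantiles`. BigQuery's exact estimators and edge-case handling (for example zero-range columns, or values outside the training range) are not modeled.

```python
import statistics


def min_max_scale(values):
    """(x - min) / (max - min): maps the training range onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def standard_scale(values):
    """(x - mean) / stddev. Assumption: population stddev (pstdev)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


def max_abs_scale(values):
    """x / max(|x|): keeps the sign and maps the largest magnitude to 1."""
    peak = max(abs(x) for x in values)
    return [x / peak for x in values]


def robust_scale(values):
    """(x - median) / IQR. Assumption: exact 'inclusive' quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in values]


print(min_max_scale([0, 5, 10]))         # [0.0, 0.5, 1.0]
print(max_abs_scale([-6, 3, 12]))        # [-0.5, 0.25, 1.0]
print(robust_scale([8, 9, 10, 11, 12]))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(standard_scale([1, 2, 3]))
```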

In [147]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
RUN_NAME = f'run-{TIMESTAMP}'
BQ_MODEL_TRANSFORM_ONLY = f'{SERIES}_{EXPERIMENT}_transform_only'

In [148]:
query = f"""
CREATE OR REPLACE MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL_TRANSFORM_ONLY}`
TRANSFORM (
    JUDGE_A,

    ML.MIN_MAX_SCALER(flourAmt) OVER() as scale_flourAmt, 
    ML.ROBUST_SCALER(saltAmt) OVER() as scale_saltAmt,
    ML.MAX_ABS_SCALER(yeastAmt) OVER() as scale_yeastAmt,
    ML.STANDARD_SCALER(water1Amt) OVER() as scale_water1Amt,
    ML.STANDARD_SCALER(water2Amt) OVER() as scale_water2Amt,

    ML.STANDARD_SCALER(waterTemp) OVER() as scale_waterTemp,
    ML.ROBUST_SCALER(bakeTemp) OVER() as scale_bakeTemp,
    ML.MIN_MAX_SCALER(ambTemp) OVER() as scale_ambTemp,
    ML.MAX_ABS_SCALER(ambHumidity) OVER() as scale_ambHumidity,

    ML.ROBUST_SCALER(mix1Time) OVER() as scale_mix1Time,
    ML.ROBUST_SCALER(mix2Time) OVER() as scale_mix2Time,
    ML.ROBUST_SCALER(mix1Speed) OVER() as scale_mix1Speed,
    ML.ROBUST_SCALER(mix2Speed) OVER() as scale_mix2Speed,
    ML.STANDARD_SCALER(proveTime) OVER() as scale_proveTime,
    ML.MAX_ABS_SCALER(restTime) OVER() as scale_restTime,
    ML.MAX_ABS_SCALER(bakeTime) OVER() as scale_bakeTime
)
OPTIONS (
        model_type = 'TRANSFORM_ONLY'
    ) AS
SELECT * EXCEPT(Recipe, JUDGE_B)
FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
"""
print(query)



In [149]:
job = bq.query(query = query)
job.result()
(job.ended-job.started).total_seconds()

4.153

In [150]:
job.total_bytes_processed

272000
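The `CREATE MODEL` statement above lists each scaler by hand. The sketch below instead keeps the column-to-scaler choices in a Python dictionary and generates the same kind of statement from it.

It rests on a few assumptions:
- `SCALERS`, `build_transform_clause`, and `build_transform_only_sql` are hypothetical names.
- The mapping covers only a subset of the columns.
- The project, dataset, and table IDs are placeholders.

```python
# Hypothetical subset of the column -> scaler choices used in the query above.
SCALERS = {
    "flourAmt": "ML.MIN_MAX_SCALER",
    "saltAmt": "ML.ROBUST_SCALER",
    "yeastAmt": "ML.MAX_ABS_SCALER",
    "water1Amt": "ML.STANDARD_SCALER",
}


def build_transform_clause(passthrough, scalers):
    """Pass-through columns first, then one scale_<col> output per scaled column."""
    items = list(passthrough)
    items += [f"{fn}({col}) OVER() AS scale_{col}" for col, fn in scalers.items()]
    return "TRANSFORM (\n    " + ",\n    ".join(items) + "\n)"


def build_transform_only_sql(model_ref, table_ref, passthrough, scalers, exclude):
    """Assemble a CREATE MODEL ... model_type = 'TRANSFORM_ONLY' statement."""
    return (
        f"CREATE OR REPLACE MODEL `{model_ref}`\n"
        f"{build_transform_clause(passthrough, scalers)}\n"
        "OPTIONS (model_type = 'TRANSFORM_ONLY') AS\n"
        f"SELECT * EXCEPT({', '.join(exclude)})\n"
        f"FROM `{table_ref}`"
    )


# Placeholder IDs; in the notebook these would come from BQ_PROJECT, BQ_DATASET, etc.
sql = build_transform_only_sql(
    "my-project.my_dataset.my_transform_only_model",
    "my-project.my_dataset.bread",
    passthrough=["JUDGE_A"],
    scalers=SCALERS,
    exclude=["Recipe", "JUDGE_B"],
)
print(sql)
```

Keeping the mapping in one dictionary makes it easy to try a different scaler for a column without editing SQL text by hand.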

Add labels to the model in BigQuery:

In [151]:
bqml_model_transform_only = bq.get_model(f'{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL_TRANSFORM_ONLY}')
bqml_model_transform_only.labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}'}
bqml_model_transform_only = bq.update_model(bqml_model_transform_only, ['labels'])
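BigQuery rejects labels that break Google Cloud's labeling rules. This check assumes the standard requirements:
- keys start with a lowercase letter;
- keys and values contain only lowercase letters, digits, underscores, and dashes;
- each key or value is at most 63 characters, and values may be empty.

A minimal ASCII-only sketch of a pre-check before calling `bq.update_model` follows. It ignores the international characters the rules also allow. The helper name `invalid_labels` and the example values are hypothetical.

```python
import re

# Assumed ASCII-only subset of the Google Cloud label rules (international characters not handled).
_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_VALUE = re.compile(r"[a-z0-9_-]{0,63}")


def invalid_labels(labels):
    """Return the (key, value) pairs that fail the assumed label rules."""
    return [
        (key, value)
        for key, value in labels.items()
        if not (_KEY.fullmatch(key) and _VALUE.fullmatch(value))
    ]


# Hypothetical example values.
good = {"series": "03", "experiment": "feature_engineering"}
bad = {"Series": "03", "experiment": "Feature Engineering"}
print(invalid_labels(good))  # []
print(invalid_labels(bad))   # [('Series', '03'), ('experiment', 'Feature Engineering')]
```

In the notebook, a guard such as `assert not invalid_labels(bqml_model_transform_only.labels)` placed before `bq.update_model` would surface a bad label locally rather than as an API error.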

### Check out this model in the BigQuery Console:
- Make sure the selected project is the one used in this notebook
- Under Explorer, expand this project and dataset
- Expand Models and select the model created here

In [152]:
print(f'Direct link to the model in BigQuery:\nhttps://console.cloud.google.com/bigquery?project={PROJECT_ID}&ws=!1m5!1m4!5m3!1s{PROJECT_ID}!2s{BQ_DATASET}!3s{BQ_MODEL_TRANSFORM_ONLY}')

Direct link to the model in BigQuery:
https://console.cloud.google.com/bigquery?project=statmike-mlops-349915&ws=!1m5!1m4!5m3!1sstatmike-mlops-349915!2sfeature_engineering!3s03_feature_engineering_transform_only


---
## Transformations

### Transformations With BigQuery ML (BQML)

Create a pandas dataframe with the transformed features for the first ten recipes (`Recipe <= 10`) in the table using `ML.TRANSFORM`:

In [153]:
query = f"""
    SELECT *
    FROM ML.TRANSFORM (MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL_TRANSFORM_ONLY}`,(
        SELECT *
        FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
        WHERE Recipe <= 10)
    )
    ORDER BY Recipe
"""
print(query)


    SELECT *
    FROM ML.TRANSFORM (MODEL `statmike-mlops-349915.feature_engineering.03_feature_engineering_transform_only`,(
        SELECT *
        FROM `statmike-mlops-349915.feature_engineering.bread`
        WHERE Recipe <= 10)
    )
    ORDER BY Recipe



In [154]:
bq.query(query = query).to_dataframe()

|  | JUDGE_A | scale_flourAmt | scale_saltAmt | scale_yeastAmt | scale_water1Amt | scale_water2Amt | scale_waterTemp | scale_bakeTemp | scale_ambTemp | scale_ambHumidity | ... | water1Amt | water2Amt | waterTemp | proveTime | restTime | bakeTime | bakeTemp | ambTemp | ambHumidity | JUDGE_B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67.0 | 0.450956 | 0.0 | 0.666667 | -0.479432 | 2.270311 | 1.38279 | -0.698112 | 0.51649 | 0.270187 | ... | 320.903454 | 154.760677 | 49 | 97.656789 | 36 | 28 | 409.798183 | 61.812613 | 24.599715 | 54.0 |
| 1 | 81.0 | 0.469039 | 0.5 | 0.833333 | -1.793494 | -0.27714 | -0.312637 | -0.635637 | 0.267677 | 0.40759 | ... | 294.907489 | 104.764913 | 44 | 98.06125 | 36 | 29 | 411.079681 | 45.124131 | 37.109838 | 72.0 |
| 2 | 63.0 | 0.472714 | 0.0 | 0.833333 | 1.433432 | 0.679561 | -1.329893 | 1.180338 | 0.647006 | 0.562394 | ... | 358.745436 | 123.540921 | 41 | 101.036741 | 38 | 29 | 448.329332 | 70.566656 | 51.204248 | 48.0 |
| 3 | 76.0 | 0.382826 | -0.5 | 0.75 | 0.886299 | 0.21227 | 1.38279 | -0.684108 | 0.623914 | 0.4448 | ... | 347.921558 | 114.369975 | 49 | 99.773276 | 36 | 26 | 410.085443 | 69.0178 | 40.497711 | 65.0 |
| 4 | 57.0 | 0.822516 | -0.5 | 0.916667 | -0.058314 | -0.902542 | -0.990807 | -2.004645 | 0.468709 | 0.691146 | ... | 329.234385 | 92.490886 | 42 | 100.580615 | 42 | 27 | 382.998319 | 58.60786 | 62.926785 | 40.0 |
| 5 | 80.0 | 0.494199 | 0.0 | 0.916667 | -0.49992 | -0.166363 | -0.990807 | 0.66384 | 0.5519 | 0.453831 | ... | 320.498138 | 106.938983 | 42 | 95.026082 | 36 | 22 | 437.73481 | 64.187647 | 41.319957 | 71.0 |
| 6 | 71.0 | 0.39149 | 0.0 | 0.916667 | 0.001426 | -0.454966 | -0.312637 | -1.291078 | 0.440716 | 0.679314 | ... | 330.416212 | 101.274937 | 44 | 97.712527 | 37 | 29 | 397.635143 | 56.730291 | 61.849543 | 59.0 |
| 7 | 85.0 | 0.345462 | -0.5 | 0.75 | 2.0262 | -0.071103 | -1.329893 | 0.139319 | 0.380666 | 0.526596 | ... | 370.47211 | 108.808548 | 41 | 92.016182 | 45 | 25 | 426.975739 | 52.702558 | 47.94496 | 78.0 |
| 8 | 77.0 | 0.195772 | 1.0 | 0.833333 | 0.093704 | 0.808929 | 0.365534 | -0.256844 | 0.409846 | 0.607245 | ... | 332.24173 | 126.079875 | 46 | 110.620973 | 40 | 22 | 418.849566 | 54.659764 | 55.287793 | 66.0 |
| 9 | 86.0 | 0.527988 | 0.0 | 0.833333 | 0.890606 | -0.837022 | 0.365534 | 0.315464 | 0.270112 | 0.589693 | ... | 348.006773 | 93.776765 | 46 | 88.75728 | 36 | 25 | 430.588852 | 45.287481 | 53.689779 | 78.0 |


#### Compare to non-transformed data:

In [155]:
query = f"""
    SELECT *
    FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
    WHERE Recipe <= 10
    ORDER BY Recipe
"""
bq.query(query = query).to_dataframe()

|  | Recipe | flourAmt | saltAmt | yeastAmt | mix1Time | mix1Speed | mix2Time | mix2Speed | water1Amt | water2Amt | waterTemp | proveTime | restTime | bakeTime | bakeTemp | ambTemp | ambHumidity | JUDGE_A | JUDGE_B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 497.653667 | 10 | 8 | 6 | 3 | 6 | 4 | 320.903454 | 154.760677 | 49 | 97.656789 | 36 | 28 | 409.798183 | 61.812613 | 24.599715 | 67.0 | 54.0 |
| 1 | 2 | 498.896134 | 11 | 10 | 5 | 4 | 5 | 6 | 294.907489 | 104.764913 | 44 | 98.06125 | 36 | 29 | 411.079681 | 45.124131 | 37.109838 | 81.0 | 72.0 |
| 2 | 3 | 499.148669 | 10 | 10 | 4 | 4 | 7 | 5 | 358.745436 | 123.540921 | 41 | 101.036741 | 38 | 29 | 448.329332 | 70.566656 | 51.204248 | 63.0 | 48.0 |
| 3 | 4 | 492.972374 | 9 | 9 | 6 | 4 | 4 | 6 | 347.921558 | 114.369975 | 49 | 99.773276 | 36 | 26 | 410.085443 | 69.0178 | 40.497711 | 76.0 | 65.0 |
| 4 | 5 | 523.183916 | 9 | 11 | 4 | 4 | 5 | 5 | 329.234385 | 92.490886 | 42 | 100.580615 | 42 | 27 | 382.998319 | 58.60786 | 62.926785 | 57.0 | 40.0 |
| 5 | 6 | 500.624903 | 10 | 11 | 6 | 3 | 7 | 5 | 320.498138 | 106.938983 | 42 | 95.026082 | 36 | 22 | 437.73481 | 64.187647 | 41.319957 | 80.0 | 71.0 |
| 6 | 7 | 493.567697 | 10 | 11 | 6 | 2 | 5 | 4 | 330.416212 | 101.274937 | 44 | 97.712527 | 37 | 29 | 397.635143 | 56.730291 | 61.849543 | 71.0 | 59.0 |
| 7 | 8 | 490.405017 | 9 | 9 | 3 | 3 | 7 | 4 | 370.47211 | 108.808548 | 41 | 92.016182 | 45 | 25 | 426.975739 | 52.702558 | 47.94496 | 85.0 | 78.0 |
| 8 | 9 | 480.11966 | 12 | 10 | 4 | 4 | 5 | 6 | 332.24173 | 126.079875 | 46 | 110.620973 | 40 | 22 | 418.849566 | 54.659764 | 55.287793 | 77.0 | 66.0 |
| 9 | 10 | 502.946563 | 10 | 10 | 6 | 2 | 7 | 4 | 348.006773 | 93.776765 | 46 | 88.75728 | 36 | 25 | 430.588852 | 45.287481 | 53.689779 | 86.0 | 78.0 |
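Comparing the transformed and untransformed tables gives a quick check of two of the scalers. The sketch below uses values copied from those outputs for recipes 1 to 10. It tests two relationships:
- `scale_yeastAmt` equals `yeastAmt / 12`
- `scale_saltAmt` equals `(saltAmt - 10) / 2`

The constants are inferred from these outputs, not read from the model, so treat them as assumptions about the full training table:
- 12 is the maximum absolute `yeastAmt`;
- 10 and 2 are the median and interquartile range of `saltAmt`.

```python
# Values copied from the outputs above (recipes 1-10).
yeast_raw = [8, 10, 10, 9, 11, 11, 11, 9, 10, 10]
yeast_scaled = [0.666667, 0.833333, 0.833333, 0.75, 0.916667,
                0.916667, 0.916667, 0.75, 0.833333, 0.833333]
salt_raw = [10, 11, 10, 9, 9, 10, 10, 9, 12, 10]
salt_scaled = [0.0, 0.5, 0.0, -0.5, -0.5, 0.0, 0.0, -0.5, 1.0, 0.0]

YEAST_MAX_ABS = 12            # inferred: ML.MAX_ABS_SCALER divisor
SALT_MEDIAN, SALT_IQR = 10, 2  # inferred: ML.ROBUST_SCALER center and scale


def matches(raw, scaled, formula, tol=1e-6):
    """True if formula(raw) reproduces the 6-decimal scaled values within tol."""
    return all(abs(formula(x) - s) <= tol for x, s in zip(raw, scaled))


print(matches(yeast_raw, yeast_scaled, lambda x: x / YEAST_MAX_ABS))                 # True
print(matches(salt_raw, salt_scaled, lambda x: (x - SALT_MEDIAN) / SALT_IQR))        # True
```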


### Transformations From a BigQuery ML (BQML) Model's Inline TRANSFORM:

Earlier in this notebook, two BQML models were created, and the second of these included a `TRANSFORM` clause. `ML.TRANSFORM` can also apply just the preprocessing defined in that clause, without running the model's prediction step.

Create a pandas dataframe with this model's transformed features for the first ten recipes (`Recipe <= 10`) in the table using `ML.TRANSFORM`:

In [156]:
query = f"""
    SELECT *
    FROM ML.TRANSFORM (MODEL `{BQ_PROJECT}.{BQ_DATASET}.{BQ_MODEL_TRANSFORM}`,(
        SELECT *
        FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
        WHERE Recipe <= 10)
    )
    ORDER BY Recipe
"""
print(query)


    SELECT *
    FROM ML.TRANSFORM (MODEL `statmike-mlops-349915.feature_engineering.03_feature_engineering_transform`,(
        SELECT *
        FROM `statmike-mlops-349915.feature_engineering.bread`
        WHERE Recipe <= 10)
    )
    ORDER BY Recipe
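The two `ML.TRANSFORM` queries in this section differ only in the model they reference. The sketch below shows a helper that builds either one. `ml_transform_sql` is a hypothetical name. The IDs are placeholders standing in for the notebook's `BQ_PROJECT`, `BQ_DATASET`, `BQ_TABLE`, and model-name variables.

```python
def ml_transform_sql(model_ref, table_ref, where=None, order_by=None):
    """Build a query that applies a BQML model's TRANSFORM clause to rows of a table."""
    source = f"SELECT * FROM `{table_ref}`"
    if where:
        source += f" WHERE {where}"
    sql = f"SELECT *\nFROM ML.TRANSFORM(MODEL `{model_ref}`, ({source}))"
    if order_by:
        sql += f"\nORDER BY {order_by}"
    return sql


# Placeholder IDs for a transform-only model and a model with an inline TRANSFORM.
table = "my-project.my_dataset.bread"
for model in ["my-project.my_dataset.my_transform_only_model",
              "my-project.my_dataset.my_transform_model"]:
    print(ml_transform_sql(model, table, where="Recipe <= 10", order_by="Recipe"))
    print()
```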



In [157]:
bq.query(query = query).to_dataframe()

|  | scale_flourAmt | scale_saltAmt | scale_yeastAmt | scale_water1Amt | scale_water2Amt | scale_waterTemp | scale_bakeTemp | scale_ambTemp | scale_ambHumidity | scale_mix1Time | ... | water2Amt | waterTemp | proveTime | restTime | bakeTime | bakeTemp | ambTemp | ambHumidity | JUDGE_A | JUDGE_B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.450956 | 0.0 | 0.666667 | -0.49882 | 2.299439 | 1.385941 | -0.681722 | 0.51649 | 0.270187 | 0.5 | ... | 154.760677 | 49 | 97.656789 | 36 | 28 | 409.798183 | 61.812613 | 24.599715 | 67.0 | 54.0 |
| 1 | 0.469039 | 0.5 | 0.833333 | -1.826547 | -0.276879 | -0.309192 | -0.618948 | 0.267677 | 0.40759 | 0.0 | ... | 104.764913 | 44 | 98.06125 | 36 | 29 | 411.079681 | 45.124131 | 37.109838 | 81.0 | 72.0 |
| 2 | 0.472714 | 0.0 | 0.833333 | 1.433935 | 0.690662 | -1.326273 | 1.205746 | 0.647006 | 0.562394 | -0.5 | ... | 123.540921 | 41 | 101.036741 | 38 | 29 | 448.329332 | 70.566656 | 51.204248 | 63.0 | 48.0 |
| 3 | 0.382826 | -0.5 | 0.75 | 0.881112 | 0.218077 | 1.385941 | -0.667651 | 0.623914 | 0.4448 | 0.5 | ... | 114.369975 | 49 | 99.773276 | 36 | 26 | 410.085443 | 69.0178 | 40.497711 | 76.0 | 65.0 |
| 4 | 0.822516 | -0.5 | 0.916667 | -0.073323 | -0.909368 | -0.987246 | -1.994527 | 0.468709 | 0.691146 | -0.5 | ... | 92.490886 | 42 | 100.580615 | 42 | 27 | 382.998319 | 58.60786 | 62.926785 | 57.0 | 40.0 |
| 5 | 0.494199 | 0.0 | 0.916667 | -0.519521 | -0.164848 | -0.987246 | 0.686768 | 0.5519 | 0.453831 | 0.5 | ... | 106.938983 | 42 | 95.026082 | 36 | 22 | 437.73481 | 64.187647 | 41.319957 | 80.0 | 71.0 |
| 6 | 0.39149 | 0.0 | 0.916667 | -0.012962 | -0.45672 | -0.309192 | -1.277535 | 0.440716 | 0.679314 | 0.5 | ... | 101.274937 | 44 | 97.712527 | 37 | 29 | 397.635143 | 56.730291 | 61.849543 | 71.0 | 59.0 |
| 7 | 0.345462 | -0.5 | 0.75 | 2.032867 | -0.068508 | -1.326273 | 0.159729 | 0.380666 | 0.526596 | -1.0 | ... | 108.808548 | 41 | 92.016182 | 45 | 25 | 426.975739 | 52.702558 | 47.94496 | 85.0 | 78.0 |
| 8 | 0.195772 | 1.0 | 0.833333 | 0.080275 | 0.821496 | 0.368861 | -0.238336 | 0.409846 | 0.607245 | -0.5 | ... | 126.079875 | 46 | 110.620973 | 40 | 22 | 418.849566 | 54.659764 | 55.287793 | 77.0 | 66.0 |
| 9 | 0.527988 | 0.0 | 0.833333 | 0.885464 | -0.843106 | 0.368861 | 0.336719 | 0.270112 | 0.589693 | 0.5 | ... | 93.776765 | 46 | 88.75728 | 36 | 25 | 430.588852 | 45.287481 | 53.689779 | 86.0 | 78.0 |
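Both models apply `ML.STANDARD_SCALER` to `water1Amt`, yet the scaled values from the two `ML.TRANSFORM` outputs differ slightly. The scaler is an affine map `z = (x - mean) / std`, so two (raw, scaled) pairs are enough to recover the mean and standard deviation each model applied. The remaining pairs then check the fit.

The sketch below uses values copied from the outputs above for recipes 1 to 4. They are rounded to six decimals, so the recovered parameters are approximate. The different means suggest the two models computed their statistics over different training rows; that is an inference, not something this notebook states.

```python
# Values copied from the two ML.TRANSFORM outputs above (recipes 1-4, rounded to 6 decimals).
water1_raw = [320.903454, 294.907489, 358.745436, 347.921558]
scaled = {
    "transform_only": [-0.479432, -1.793494, 1.433432, 0.886299],
    "inline_transform": [-0.49882, -1.826547, 1.433935, 0.881112],
}


def fit_standard_scaler(raw, z):
    """Solve z = (x - mean) / std exactly from the first two (x, z) pairs."""
    std = (raw[0] - raw[1]) / (z[0] - z[1])
    mean = raw[0] - z[0] * std
    return mean, std


def max_residual(raw, z, mean, std):
    """Largest gap between the observed z and (x - mean) / std over all pairs."""
    return max(abs((x - mean) / std - zi) for x, zi in zip(raw, z))


params = {name: fit_standard_scaler(water1_raw, z) for name, z in scaled.items()}
for name, (mean, std) in params.items():
    resid = max_residual(water1_raw, scaled[name], mean, std)
    print(f"{name}: mean={mean:.3f}, std={std:.3f}, max residual={resid:.1e}")
```

A small residual on the held-back rows confirms the affine form. The gap between the two recovered means is what produces the different `scale_water1Amt` columns.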
