# Running BigQuery ML with Transformations

### Importing Libraries

```python
import numpy as np
import pandas as pd
import seaborn as sb
sb.set()
```

### The Data

```sql
%%bigquery earnings_data
SELECT *
FROM `crazy-hippo-01.clv.earnings_per_year`
```

The query returns 32,461 rows, stored in the `earnings_data` DataFrame.


```python
earnings_data.head()
```

|   | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | Private | 297847 | 9th | 5 | Married-civ-spouse | Other-service | Wife | Black | Female | 3411 | 0 | 34 | United-States | <=50K |
| 1 | 72 | Private | 74141 | 9th | 5 | Married-civ-spouse | Exec-managerial | Wife | Asian-Pac-Islander | Female | 0 | 0 | 48 | United-States | >50K |
| 2 | 45 | Private | 178215 | 9th | 5 | Married-civ-spouse | Machine-op-inspct | Wife | White | Female | 0 | 0 | 40 | United-States | >50K |
| 3 | 31 | Private | 86958 | 9th | 5 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 4 | 55 | Private | 176012 | 9th | 5 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 23 | United-States | <=50K |


### Training a model in BQML

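#### Using logistic regression and adding a transformation to the data

The `TRANSFORM` clause crosses `marital_status` with `relationship` into a single feature, `cross_relationship`, and passes every other input column through unchanged except `fnlwgt`, which is dropped.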

```sql
%%bigquery
CREATE OR REPLACE MODEL clv.earnings_model
TRANSFORM(
  ML.FEATURE_CROSS(STRUCT(marital_status, relationship)) AS cross_relationship,
  * EXCEPT(fnlwgt)
)
OPTIONS(input_label_cols=['income'], model_type='logistic_reg')
AS
SELECT *
FROM
  `crazy-hippo-01.clv.earnings_per_year`
```



The transformation step is stored within the model and is applied automatically to the input data whenever the model is used with `ML.EVALUATE` or `ML.PREDICT`, so those calls can pass in the raw columns.
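
To make the idea concrete, here is a plain-Python sketch of a model object that keeps its preprocessing next to its weights, so the same transformation runs every time the model scores a row. It mirrors the notebook's `TRANSFORM` (drop `fnlwgt`, cross `marital_status` with `relationship`), but the class `TransformedModel`, the toy weights, and the `"a_b"` string format for the crossed value are hypothetical; check the BigQuery ML reference for the exact output of `ML.FEATURE_CROSS`.

```python
import math
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def transform(row: Row) -> Row:
    """Mimic the notebook's TRANSFORM: drop fnlwgt, add a crossed feature."""
    out = {k: v for k, v in row.items() if k != "fnlwgt"}
    # Assumed format: the two categories joined with "_" (hypothetical).
    out["cross_relationship"] = f"{row['marital_status']}_{row['relationship']}"
    return out


class TransformedModel:
    """Hypothetical stand-in for a model that carries its own TRANSFORM."""

    def __init__(self, transform_fn: Callable[[Row], Row],
                 weights: Dict[str, float], bias: float) -> None:
        self.transform_fn = transform_fn
        self.weights = weights  # toy weights keyed by "column=value"
        self.bias = bias

    def predict_proba(self, raw_row: Row) -> float:
        # The stored transform runs on every call, so callers pass raw rows.
        row = self.transform_fn(raw_row)
        # Toy logistic scorer: every column is treated as a categorical value.
        z = self.bias + sum(self.weights.get(f"{k}={v}", 0.0) for k, v in row.items())
        return 1.0 / (1.0 + math.exp(-z))


model = TransformedModel(
    transform, {"cross_relationship=Married-civ-spouse_Wife": 1.2}, bias=-1.5)
raw = {"age": 39, "fnlwgt": 297847,
       "marital_status": "Married-civ-spouse", "relationship": "Wife"}
print(round(model.predict_proba(raw), 4))  # 0.4256
```

Bundling the transform with the model is what lets the later `ML.PREDICT` call feed in untouched rows from `clv.prediction_sample`, including the `fnlwgt` column the model never sees.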

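#### Use the <b>ML.EVALUATE</b> function to evaluate model metrics.

Called without an input table, as below, `ML.EVALUATE` computes the metrics on the evaluation data that BigQuery ML set aside when the model was trained. The results are saved to the `clv.earnings_evaluation` table.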

```sql
%%bigquery
CREATE OR REPLACE TABLE clv.earnings_evaluation AS (
SELECT
  *
FROM
  ML.EVALUATE(MODEL `crazy-hippo-01.clv.earnings_model`)
)
```


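For a binary `logistic_reg` model, `ML.EVALUATE` reports metrics such as `precision`, `recall`, `accuracy`, `f1_score`, `log_loss` and `roc_auc`. The sketch below computes those same quantities in plain Python from 0/1 labels and predicted positive-class probabilities, assuming a 0.5 decision threshold. It illustrates the definitions only and is not BigQuery's implementation.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Binary classification metrics from 0/1 labels and positive-class probabilities."""
    n = len(y_true)
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = n - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    eps = 1e-15  # clip so log() never receives 0

    def clip(p: float) -> float:
        return min(max(p, eps), 1 - eps)

    log_loss = -sum(t * math.log(clip(p)) + (1 - t) * math.log(1 - clip(p))
                    for t, p in zip(y_true, y_prob)) / n

    # ROC AUC: chance a random positive outscores a random negative (ties count 1/2).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {"precision": precision, "recall": recall, "accuracy": (tp + tn) / n,
            "f1_score": f1, "log_loss": log_loss, "roc_auc": roc_auc}


print(binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

In this toy input one positive and one negative land on the wrong side of 0.5, so precision, recall, accuracy and F1 are all 0.5, while three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.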

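#### Let us generate some samples to predict on

`RAND() < 0.0005` keeps each row independently with probability 0.05%, so the size of the sample varies from run to run. These rows are drawn from the same table the model was trained on, so they are not a true hold-out set.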

```sql
%%bigquery
CREATE OR REPLACE TABLE `clv.prediction_sample`
AS (
SELECT *
FROM `crazy-hippo-01.clv.earnings_per_year`
WHERE RAND() < 0.0005
)
```


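The `WHERE RAND() < 0.0005` filter is Bernoulli sampling: each row is kept independently, so the expected sample size is n·p. The Python sketch below shows that behaviour and a repeatable, hash-based alternative. A common way to get repeatable samples in BigQuery is to filter on `FARM_FINGERPRINT` of a key column; the sketch uses MD5 from Python's `hashlib` purely as a stand-in, and `hash_sample` and `expected_size` are hypothetical helpers, not part of the notebook.

```python
import hashlib
import random
from typing import List, Sequence


def bernoulli_sample(rows: Sequence[str], p: float, rng: random.Random) -> List[str]:
    """Keep each row independently with probability p, like WHERE RAND() < p."""
    return [r for r in rows if rng.random() < p]


def hash_sample(keys: Sequence[str], p: float) -> List[str]:
    """Repeatable variant: keep a key iff its 64-bit hash falls below p * 2**64."""
    kept = []
    for k in keys:
        h = int.from_bytes(hashlib.md5(k.encode("utf-8")).digest()[:8], "big")
        if h < p * 2**64:
            kept.append(k)
    return kept


def expected_size(n_rows: int, p: float) -> float:
    """Mean sample size under Bernoulli sampling: n * p."""
    return n_rows * p


print(expected_size(32461, 0.0005))  # about 16.2 rows on average
```

Because the hash of a key never changes, `hash_sample` returns the same rows on every run, and raising `p` only ever adds rows to the sample.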

#### Loading the data from BQ to see how it looks

```sql
%%bigquery prediction_data
SELECT *
FROM `crazy-hippo-01.clv.prediction_sample`
```

The sample contains 7 rows.


```python
prediction_data
```

|   | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | ? | 132930 | Masters | 14 | Never-married | ? | Not-in-family | White | Female | 0 | 0 | 50 | United-States | >50K |
| 1 | 39 | Private | 216149 | Prof-school | 15 | Divorced | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 70 | United-States | >50K |
| 2 | 57 | State-gov | 109015 | 12th | 8 | Divorced | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 3 | 28 | ? | 222005 | HS-grad | 9 | Never-married | ? | Other-relative | White | Female | 0 | 0 | 40 | Mexico | <=50K |
| 4 | 26 | Private | 336404 | HS-grad | 9 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 5 | 32 | Private | 198452 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 6 | 46 | Private | 124733 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | Vietnam | <=50K |


#### Batch Predictions

```sql
%%bigquery
SELECT *
FROM
  ML.PREDICT(MODEL `crazy-hippo-01.clv.earnings_model`,
    (
    SELECT
      *
    FROM
      `crazy-hippo-01.clv.prediction_sample`))
```




|   | predicted_income | predicted_income_probs | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <=50K | [{'label': ' >50K', 'prob': 0.1280631427577256... | 56 | ? | 132930 | Masters | 14 | Never-married | ? | Not-in-family | White | Female | 0 | 0 | 50 | United-States | >50K |
| 1 | >50K | [{'label': ' >50K', 'prob': 0.5459859726474954... | 39 | Private | 216149 | Prof-school | 15 | Divorced | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 70 | United-States | >50K |
| 2 | <=50K | [{'label': ' >50K', 'prob': 0.0248695155665733... | 57 | State-gov | 109015 | 12th | 8 | Divorced | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 3 | <=50K | [{'label': ' >50K', 'prob': 0.0077506044851812... | 28 | ? | 222005 | HS-grad | 9 | Never-married | ? | Other-relative | White | Female | 0 | 0 | 40 | Mexico | <=50K |
| 4 | <=50K | [{'label': ' >50K', 'prob': 0.2902664563264017... | 26 | Private | 336404 | HS-grad | 9 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 5 | <=50K | [{'label': ' >50K', 'prob': 0.2037352128721179... | 32 | Private | 198452 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 6 | <=50K | [{'label': ' >50K', 'prob': 0.2167102400200784... | 46 | Private | 124733 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | Vietnam | <=50K |
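
In the prediction output, `predicted_income_probs` is an array of `{label, prob}` entries, and the labels carry a leading space (`' >50K'`). The sketch below pulls out the `>50K` probability and maps it to a label. It assumes a binary 0.5 threshold and that the array's second entry (cut off in the display) holds `' <=50K'` with the complementary probability; both are assumptions, and `positive_prob` and `predicted_label` are hypothetical helpers.

```python
from typing import Any, Mapping, Sequence


def positive_prob(probs: Sequence[Mapping[str, Any]], positive_label: str = ">50K") -> float:
    """Return the probability attached to positive_label, ignoring stray whitespace."""
    for entry in probs:
        if entry["label"].strip() == positive_label:
            return float(entry["prob"])
    raise KeyError(positive_label)


def predicted_label(probs: Sequence[Mapping[str, Any]], threshold: float = 0.5) -> str:
    """Pick '>50K' when its probability exceeds the (assumed) 0.5 threshold."""
    return ">50K" if positive_prob(probs) > threshold else "<=50K"


# Row 0 from the output above; the second entry is assumed, not shown in the display.
row0 = [{"label": " >50K", "prob": 0.1280631427577256},
        {"label": " <=50K", "prob": 1 - 0.1280631427577256}]
print(predicted_label(row0))  # <=50K
```

If the prediction cell saved its result to a DataFrame (for example `%%bigquery predictions`, a hypothetical variable name), `predictions["predicted_income_probs"].apply(positive_prob)` would add the `>50K` probability as a plain numeric column.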
