## TPC-H & TPC-DS BigQuery Data Import  
Import data from GCS to a previously created BigQuery dataset.  

This notebook assumes that you have already generated data at one or more scale factors and uploaded it to the project's Google Cloud Storage bucket, which is named in `config.gcs_data_bucket`.  

Three values are required to initiate an upload to BigQuery:  
1. `test` - the test name, either `h` or `ds`
2. `scale` - the scale factor in GB, usually one of `1`, `100`, `1000`, or `10000`  
3. `name` - a label for this instance of the `test` and `scale` combination, e.g. `time-partitioned` (the upload variables below set it as `cid`)

In [1]:
import config, bq

In [2]:
import pandas as pd

In [3]:
pd.set_option("display.max_rows", 1000)
pd.options.display.float_format = '{:.2f}'.format

#### Upload Variables

In [4]:
test = "h"
scale = 1
cid = "01"
schema = "bq_h_01.sql"

In [5]:
dataset_name = "{}_{}GB_{}".format(test, scale, cid)
dataset_name

'h_1GB_01'
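
The naming convention above can be wrapped in a small helper that also validates its inputs. This is only a sketch and not part of the `bq` module: `build_dataset_name` and `VALID_TESTS` are hypothetical names introduced here. It checks `test` against the two values listed at the top of the notebook and, because BigQuery dataset IDs may contain only letters, digits, and underscores, replaces any other character in the label with an underscore.

```python
import re

VALID_TESTS = ("h", "ds")  # the two test names this notebook accepts


def build_dataset_name(test: str, scale: int, cid: str) -> str:
    """Return '<test>_<scale>GB_<cid>' with the label made dataset-safe."""
    if test not in VALID_TESTS:
        raise ValueError(f"test must be one of {VALID_TESTS}, got {test!r}")
    if scale <= 0:
        raise ValueError(f"scale must be a positive number of GB, got {scale!r}")
    # BigQuery dataset IDs allow only letters, digits, and underscores,
    # so any other character in the label is replaced with an underscore.
    safe_cid = re.sub(r"[^A-Za-z0-9_]", "_", cid)
    return "{}_{}GB_{}".format(test, scale, safe_cid)


print(build_dataset_name("h", 1, "01"))                    # h_1GB_01
print(build_dataset_name("ds", 100, "time-partitioned"))   # ds_100GB_time_partitioned
```

With this in place, a hyphenated label such as `time-partitioned` would still yield a dataset name that BigQuery accepts.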

In [6]:
ddl_filepath = config.fp_schema + config.sep + schema

In [7]:
bq.create_dataset(dataset_name=dataset_name)

Dataset(DatasetReference('tpc-benchmarking-9432', 'h_1GB_01'))

In [8]:
bq.create_schema(schema_file=ddl_filepath, dataset=dataset_name)

In [9]:
bq_upload = bq.BQUpload(test=test, scale=scale, dataset=dataset_name)

#### Initiate Upload  
Set `verbose=True` for status printouts.

In [10]:
fp = bq_upload.upload(verbose=True)

Tables to upload:
customer
lineitem
nation
orders
part
partsupp
region
supplier
Loading Table: customer
t0: 2020-06-10 12:37:46.070476
...
t1: 2020-06-10 12:37:56.204152
Load Job Done: True
ID: 8b2011db-df64-45f7-b0e6-89a90fd3e766
dt: 0 days 00:00:10.133676
GB/s: 0.00
------------------------------
Loading Table: lineitem
t0: 2020-06-10 12:37:56.213698
...
t1: 2020-06-10 12:39:00.419974
Load Job Done: True
ID: 7d3c48d4-2b04-4bd0-b6ee-9236dc1cb11a
dt: 0 days 00:01:04.206276
GB/s: 0.01
------------------------------
Loading Table: nation
t0: 2020-06-10 12:39:00.425323
...
t1: 2020-06-10 12:39:03.382731
Load Job Done: True
ID: 3917a494-7b50-417d-a65d-c9d746e129e9
dt: 0 days 00:00:02.957408
GB/s: 0.00
------------------------------
Loading Table: orders
t0: 2020-06-10 12:39:03.388706
...
t1: 2020-06-10 12:40:22.230121
Load Job Done: True
ID: 2d046187-48b0-4e26-9178-6f1f520787ca
dt: 0 days 00:01:18.841415
GB/s: 0.00
------------------------------
Loading Table: part
t0: 2020-06-10 12:40:22.

#### Summary of Upload

In [11]:
fp

'/home/colin/code/bq_snowflake_benchmark/h/bq_upload-h_1GB-h_1GB_01-2020-06-10 12:37:46.068779.csv'

In [12]:
dfa = bq.parse_log(fp)
dfa

| Unnamed: 0 | test | scale | dataset | table | status | t0 | t1 | size_bytes | job_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | test | scale | dataset | table | status | t0 | t1 | size_bytes | job_id |
| 1 | h | 1 | h_1GB_01 | customer | start | 2020-06-10 12:37:46.070476 | | 24196144 | |
| 2 | h | 1 | h_1GB_01 | customer | end | 2020-06-10 12:37:46.070476 | 2020-06-10 12:37:56.204152 | 24196144 | 8b2011db-df64-45f7-b0e6-89a90fd3e766 |
| 3 | h | 1 | h_1GB_01 | lineitem | start | 2020-06-10 12:37:56.213698 | | 753862072 | |
| 4 | h | 1 | h_1GB_01 | lineitem | end | 2020-06-10 12:37:56.213698 | 2020-06-10 12:39:00.419974 | 753862072 | 7d3c48d4-2b04-4bd0-b6ee-9236dc1cb11a |
| 5 | h | 1 | h_1GB_01 | nation | start | 2020-06-10 12:39:00.425323 | | 2199 | |
| 6 | h | 1 | h_1GB_01 | nation | end | 2020-06-10 12:39:00.425323 | 2020-06-10 12:39:03.382731 | 2199 | 3917a494-7b50-417d-a65d-c9d746e129e9 |
| 7 | h | 1 | h_1GB_01 | orders | start | 2020-06-10 12:39:03.388706 | | 170452161 | |
| 8 | h | 1 | h_1GB_01 | orders | end | 2020-06-10 12:39:03.388706 | 2020-06-10 12:40:22.230121 | 170452161 | 2d046187-48b0-4e26-9178-6f1f520787ca |
| 9 | h | 1 | h_1GB_01 | part | start | 2020-06-10 12:40:22.233363 | | 23935125 | |
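
The `GB/s` figures printed during the upload are rounded to two decimals, so most of them read `0.00` at this scale. As a rough sketch, the rate can be recomputed at higher precision from the `size_bytes`, `t0`, and `t1` values of the completed `end` rows. The helper name `throughput_gb_per_s` is hypothetical, and the use of decimal gigabytes (1e9 bytes) is an assumption, since the notebook does not state which unit `bq_upload` uses.

```python
from datetime import datetime

# (table, size_bytes, t0, t1) copied from the completed "end" rows of the log.
END_ROWS = [
    ("customer", 24196144, "2020-06-10 12:37:46.070476", "2020-06-10 12:37:56.204152"),
    ("lineitem", 753862072, "2020-06-10 12:37:56.213698", "2020-06-10 12:39:00.419974"),
    ("nation", 2199, "2020-06-10 12:39:00.425323", "2020-06-10 12:39:03.382731"),
    ("orders", 170452161, "2020-06-10 12:39:03.388706", "2020-06-10 12:40:22.230121"),
]


def throughput_gb_per_s(size_bytes: int, t0: str, t1: str) -> float:
    """Bytes loaded divided by elapsed seconds, in decimal GB (assumed: 1e9 bytes)."""
    elapsed = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
    return size_bytes / 1e9 / elapsed


for table, size_bytes, t0, t1 in END_ROWS:
    rate = throughput_gb_per_s(size_bytes, t0, t1)
    print(f"{table:<10} {rate:.5f} GB/s")
```

Small tables such as `nation` are dominated by fixed load-job overhead, so their rate says little about sustained throughput; the larger tables are the more meaningful data points.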
