In [1]:
project: getenv `project_id
csbucketname: getenv `csbucketname

# From BigQuery to kdb

In [2]:
// extract from BigQuery to Cloud Storage
system "bq extract ", project, ":bqkdb.allBQSimpleTypes gs://", csbucketname, "/allBQSimpleTypes.csv"

Waiting on bqjob_rb65f9eeb7cb263c_0000016e5b56e7ca_1 ... (0s) Current status: DONE   

""


In [3]:
csfilename: "gs://", csbucketname, "/allBQSimpleTypes.csv"

In [4]:
// Copy from Cloud Storage to local box
system "gsutil cp ", csfilename, " /tmp/"

Copying gs://storagebodon/allBQSimpleTypes.csv...
- [1 files][  265.0 B/  265.0 B]                                                
Operation completed over 1 objects/265.0 B.                                      

## Setting types manually

* parsing TIME values as q time loses microsecond precision, because q time has millisecond resolution
* postprocessing is needed for the types
    * BOOL
    * TIMESTAMP

In [5]:
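// column types: s symbol, I int, F float, * string (BOOL and TIMESTAMP stay as text for now), D date, T time, P timestamp
allBQSimpleTypes: ("sIF**DTP"; enlist ",") 0: read0 hsym `$"/tmp/allBQSimpleTypes.csv"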

In [6]:
\c 25 125

In [7]:
allBQSimpleTypes

stringCol intCol floatCol boolCol tsCol                            dateCol    timeCol      dtCol                        
------------------------------------------------------------------------------------------------------------------------
GOOG      42     100.3    "true"  "2019-11-06 01:45:00 UTC"        2019.11.11 16:32:04.291 2019.11.11D16:32:04.291299000
AAPL      200    104.9    "false" "2019-11-11 16:32:04.291299 UTC" 2019.11.11 16:32:04.291 2019.11.11D16:32:04.291299000
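
The truncated timeCol comes from the T (time) type character, which has millisecond resolution. A minimal sketch, assuming an inline two-column CSV invented for illustration (the column names a and b are hypothetical), parses the same field once as time (T) and once as timespan (N):

```q
// inline CSV invented for illustration; a and b are hypothetical column names
csv: ("a,b"; "16:32:04.291299,16:32:04.291299")
t: ("TN"; enlist ",") 0: csv
// a is a time (millisecond resolution), b is a timespan (nanosecond resolution)
t
```

If the microseconds matter, replacing T with N in the type string of the load above (that is, "sIF**DNP") would be one option; timeCol then becomes a timespan instead of a time.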


Convert the true/false string literals to boolean values.

In [8]:
bigQueryToKdbBoolMap: ("true";"false")!10b

In [9]:
allBQSimpleTypes_fixed: update bigQueryToKdbBoolMap boolCol from allBQSimpleTypes
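
One caveat of the dictionary lookup, shown as a small sketch on made-up input: a string that is not a key of the map yields the boolean null, which is 0b, so an unexpected literal such as "TRUE" is silently read as false. Checking membership first is one way to catch this; the variable vals below is hypothetical.

```q
bigQueryToKdbBoolMap: ("true";"false")!10b
bigQueryToKdbBoolMap ("true";"false";"true")     // 101b
// "TRUE" is not a key, so the lookup silently returns 0b
bigQueryToKdbBoolMap enlist "TRUE"
// made-up input; list the values that the map does not cover
vals: ("true";"TRUE";"false")
vals where not vals in key bigQueryToKdbBoolMap  // ,"TRUE"
```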

Convert the timestamps by chopping off the " UTC" suffix.

In [10]:
update "P"$-4_/:tsCol from `allBQSimpleTypes_fixed

`allBQSimpleTypes_fixed


In [11]:
allBQSimpleTypes_fixed

stringCol intCol floatCol boolCol tsCol                         dateCol    timeCol      dtCol                        
---------------------------------------------------------------------------------------------------------------------
GOOG      42     100.3    1       2019.11.06D01:45:00.000000000 2019.11.11 16:32:04.291 2019.11.11D16:32:04.291299000
AAPL      200    104.9    0       2019.11.11D16:32:04.291299000 2019.11.11 16:32:04.291 2019.11.11D16:32:04.291299000
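
Dropping a fixed four characters assumes that every value ends in " UTC". A slightly more defensive sketch, using the hypothetical helper names stripUTC and parseBQTimestamp that are not part of the original notebook, removes the suffix only when it is present:

```q
// hypothetical helpers, not part of the original notebook
stripUTC: {$[x like "* UTC"; -4 _ x; x]}
parseBQTimestamp: {"P"$stripUTC x}
// the second value has no suffix and is parsed unchanged
parseBQTimestamp each ("2019-11-06 01:45:00 UTC"; "2019-11-11 16:32:04.291299")
```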


## Automatic type conversion

In [12]:
\l utils/csvutil.q

In [13]:
allBQSimpleTypes_auto: .csv.read hsym `$"/tmp/allBQSimpleTypes.csv"

In [14]:
allBQSimpleTypes_auto

stringCol intCol floatCol boolCol tsCol                            dateCol    timeCol              dtCol                  ..
--------------------------------------------------------------------------------------------------------------------------..
"GOOG"    42     100.3    "true"  "2019-11-06 01:45:00 UTC"        2019.11.11 0D16:32:04.291299000 2019.11.11D16:32:04.291..
"AAPL"    200    104.9    "false" "2019-11-11 16:32:04.291299 UTC" 2019.11.11 0D16:32:04.291299000 2019.11.11D16:32:04.291..


In [15]:
meta allBQSimpleTypes_auto

c        | t f a
---------| -----
stringCol| C    
intCol   | h    
floatCol | e    
boolCol  | C    
tsCol    | C    
dateCol  | d    
timeCol  | n    
dtCol    | p    


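Automatic type conversion works well for all types except BOOL and TIMESTAMP, which are left as strings (type C in the meta output). Note also that intCol is read as short (h) and floatCol as real (e).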
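
To find which columns still need manual treatment after an automatic read, the meta table can be filtered for string columns. A minimal sketch on a toy table (tbl and boolMap are names made up here, and the rows only mimic the output above):

```q
// toy table mimicking the columns of interest; not the real data set
tbl: ([] intCol: 42 200h; boolCol: ("true";"false"); tsCol: ("2019-11-06 01:45:00 UTC";"2019-11-11 16:32:04.291299 UTC"))
// columns that were left as strings
exec c from meta tbl where t="C"             // `boolCol`tsCol
// apply the manual conversions from the previous section
boolMap: ("true";"false")!10b
tbl: update boolMap boolCol, "P"$-4_/:tsCol from tbl
exec t from meta tbl                         // "hbp"
```

On the real table, genuine string columns such as stringCol show up in the same filter, so the list is a set of candidates to review and not a list of errors.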

# From kdb to BigQuery

Let us save the fixed kdb table to CSV and load it into BigQuery with schema auto-detection.

In [16]:
save `:/tmp/allBQSimpleTypes_fixed.csv

`:/tmp/allBQSimpleTypes_fixed.csv


In [17]:
system "bq load --autodetect bqkdb.allBQSimpleTypes_auto /tmp/allBQSimpleTypes_fixed.csv"

Waiting on bqjob_r6e8703bc49513062_0000016e5b56fcd0_1 ... (1s) Current status: DONE   

""
""


We can see that the boolean column (boolCol) and the timestamp columns (tsCol, dtCol) are not cast properly: boolCol is detected as an integer and the timestamp columns as strings.

In [18]:
system "bq show bqkdb.allBQSimpleTypes_auto"

"Table ferenc-world:bqkdb.allBQSimpleTypes_auto"
""
"   Last modified           Schema          Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields  ..
" ----------------- ---------------------- ------------ ------------- ------------ ------------------- ------------------ ..
"  11 Nov 17:41:13   |- stringCol: string   2            216                                                              ..
"                    |- intCol: integer                                                                                   ..
"                    |- floatCol: float                                                                                   ..
"                    |- boolCol: integer                                                                                  ..
"                    |- tsCol: string                                                                                     ..
"                    |- dateCol: date                                    

We can convert the boolean and timestamp columns manually in q before saving.

In [19]:
// map booleans back to true/false and rewrite timestamps in the YYYY-MM-DD HH:MM:SS form
allBQSimpleTypes2: update bigQueryToKdbBoolMap?boolCol,
    @[; 4 7 10; :; "-- "] each string tsCol, 
    @[; 4 7 10; :; "-- "] each string dtCol from allBQSimpleTypes_fixed

In [20]:
allBQSimpleTypes2

stringCol intCol floatCol boolCol tsCol                           dateCol    timeCol      dtCol                          
-------------------------------------------------------------------------------------------------------------------------
GOOG      42     100.3    "true"  "2019-11-06 01:45:00.000000000" 2019.11.11 16:32:04.291 "2019-11-11 16:32:04.291299000"
AAPL      200    104.9    "false" "2019-11-11 16:32:04.291299000" 2019.11.11 16:32:04.291 "2019-11-11 16:32:04.291299000"
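
The indexed amend is easier to follow on a single value. As a sketch, with toBQTimestamp a name introduced only for illustration: positions 4 and 7 of a q timestamp string hold the date dots and position 10 holds the D separator, and they are overwritten with "-", "-" and " ".

```q
// name introduced for illustration only
toBQTimestamp: @[; 4 7 10; :; "-- "]
toBQTimestamp string 2019.11.06D01:45:00.000000000    // "2019-11-06 01:45:00.000000000"
// reverse dictionary lookup turns booleans back into the string literals
bigQueryToKdbBoolMap: ("true";"false")!10b
bigQueryToKdbBoolMap?10b                               // ("true";"false")
```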


In [21]:
save `:/tmp/allBQSimpleTypes2.csv
system "bq load --autodetect bqkdb.allBQSimpleTypes2 /tmp/allBQSimpleTypes2.csv"

`:/tmp/allBQSimpleTypes2.csv


Waiting on bqjob_r5bbea663477e0e23_0000016e5b57433b_1 ... (4s) Current status: DONE   

""
""


In [22]:
system "bq show bqkdb.allBQSimpleTypes2"

"Table ferenc-world:bqkdb.allBQSimpleTypes2"
""
"   Last modified           Schema          Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields  ..
" ----------------- ---------------------- ------------ ------------- ------------ ------------------- ------------------ ..
"  11 Nov 17:41:32   |- stringCol: string   2            110                                                              ..
"                    |- intCol: integer                                                                                   ..
"                    |- floatCol: float                                                                                   ..
"                    |- boolCol: boolean                                                                                  ..
"                    |- tsCol: timestamp                                                                                  ..
"                    |- dateCol: date                                        

## Clean-up

In [23]:
// Cloud Storage
system "gsutil rm ", csfilename

Removing gs://storagebodon/allBQSimpleTypes.csv...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              

In [24]:
// local files
system "rm /tmp/allBQSimpleTypes.csv"
system "rm /tmp/allBQSimpleTypes_fixed.csv"
system "rm /tmp/allBQSimpleTypes2.csv"

In [25]:
// BigQuery table
system "bq rm -f bqkdb.allBQSimpleTypes_auto"
system "bq rm -f bqkdb.allBQSimpleTypes2"
