In [1]:
project: getenv `project_id
csbucketname: getenv `csbucketname

# From BigQuery to kdb

In [2]:
// extract from BigQuery to Cloud Storage
system "bq extract ", project, ":bqkdb.allBQSimpleTypes gs://", csbucketname, "/allBQSimpleTypes.csv"

Waiting on bqjob_r47c87866bd7f4e18_0000016e4623160f_1 ... (0s) Current status: DONE   

""


In [3]:
csfilename: "gs://", csbucketname, "/allBQSimpleTypes.csv"

In [4]:
// Copy from Cloud Storage to local box
system "gsutil cp ", csfilename, " /tmp/"

Copying gs://storagebodon/allBQSimpleTypes.csv...
- [1 files][  1.6 KiB/  1.6 KiB]                                                
Operation completed over 1 objects/1.6 KiB.                                      
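
The two steps above (extract to Cloud Storage, then copy to the local box) can be wrapped in one function. This is only a sketch: the name `bqToLocal` and its argument list are hypothetical, it assumes the `bq` and `gsutil` command-line tools are on the path, and it always writes to `/tmp/` as the cells above do.

```q
// Hypothetical helper: export a BigQuery table to Cloud Storage,
// copy the CSV to /tmp and return the local file handle.
// All four arguments are strings.
bqToLocal:{[project;dataset;table;bucket]
  uri: "gs://", bucket, "/", table, ".csv";
  system "bq extract ", project, ":", dataset, ".", table, " ", uri;
  system "gsutil cp ", uri, " /tmp/";
  hsym `$"/tmp/", table, ".csv"}
```

With the variables defined earlier, a call would look like `bqToLocal[project; "bqkdb"; "allBQSimpleTypes"; csbucketname]`.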




## Setting types manually

* parsing a TIME column into the q time type (`T`) loses microsecond precision, because q time has millisecond resolution
* post-processing is needed for the types
    * BOOL
    * TIMESTAMP
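
The first point can be checked on a single literal taken from the sample data. This is a small sketch; parsing to timespan with `"N"` is shown only as an assumed alternative that this notebook does not use.

```q
// "T" parses to the q time type (millisecond resolution)
t: "T"$"14:06:15.048017"
type t      // -19h, time
// "N" parses to the q timespan type (nanosecond resolution)
n: "N"$"14:06:15.048017"
type n      // -16h, timespan
```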

In [5]:
allBQSimpleTypes: ("sIF**DTP"; enlist ",") 0: read0 hsym `$"/tmp/allBQSimpleTypes.csv"

In [22]:
\c 25 125

In [23]:
allBQSimpleTypes

s    int f     b       ts                               date       time         dt                           
-------------------------------------------------------------------------------------------------------------
AAPL 200 104.9 "false" "2019-11-04 14:06:15.048017 UTC" 2019.11.04 14:06:15.048 2019.11.04D14:06:15.048017000
AAPL 200 104.9 "false" "2019-11-04 14:15:50.695853 UTC" 2019.11.04 14:15:50.695 2019.11.04D14:15:50.695853000
AAPL 200 104.9 "false" "2019-11-04 14:16:36.648513 UTC" 2019.11.04 14:16:36.648 2019.11.04D14:16:36.648513000
AAPL 200 104.9 "false" "2019-11-04 14:17:06.165037 UTC" 2019.11.04 14:17:06.165 2019.11.04D14:17:06.165037000
AAPL 200 104.9 "false" "2019-11-04 14:17:25.185851 UTC" 2019.11.04 14:17:25.185 2019.11.04D14:17:25.185851000
AAPL 200 104.9 "false" "2019-11-04 14:17:45.804987 UTC" 2019.11.04 14:17:45.804 2019.11.04D14:17:45.804987000
AAPL 200 104.9 "false" "2019-11-04 16:18:49.093655 UTC" 2019.11.04 16:18:49.093 2019.11.04D16:18:49.093655000
GOOG 42  1

Convert the true/false string literals to q boolean values:

In [7]:
bigQueryToKdbBoolMap: ("true";"false")!10b

In [8]:
allBQSimpleTypes_fixed: update bigQueryToKdbBoolMap b from allBQSimpleTypes
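
The `update` above works because indexing a dictionary with a list of keys returns the list of matching values. A minimal illustration on a made-up list of strings (the variable `flags` is hypothetical, and values other than the two keys are not handled here):

```q
bigQueryToKdbBoolMap: ("true";"false")!10b
// look up three string keys at once
flags: bigQueryToKdbBoolMap ("false";"true";"true")
flags       // 011b
```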

Convert the timestamps by chopping off the " UTC" suffix:

In [9]:
update "P"$-4_/:ts from `allBQSimpleTypes_fixed

`allBQSimpleTypes_fixed
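
The same conversion, broken into steps on one value from the table. The names `raw` and `parsed` are only for illustration.

```q
raw: "2019-11-04 14:06:15.048017 UTC"
// drop the last four characters, i.e. the " UTC" suffix
-4_raw
// parse what is left as a q timestamp
parsed: "P"$-4_raw
parsed      // 2019.11.04D14:06:15.048017000
// each-right (/:) applies the drop to every string of a column
"P"$-4_/:(raw; "2019-11-04 14:15:50.695853 UTC")
```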


In [10]:
allBQSimpleTypes_fixed

s    int f     b ts                            date       time         dt    ..
-----------------------------------------------------------------------------..
AAPL 200 104.9 0 2019.11.04D14:06:15.048017000 2019.11.04 14:06:15.048 2019.1..
AAPL 200 104.9 0 2019.11.04D14:15:50.695853000 2019.11.04 14:15:50.695 2019.1..
AAPL 200 104.9 0 2019.11.04D14:16:36.648513000 2019.11.04 14:16:36.648 2019.1..
AAPL 200 104.9 0 2019.11.04D14:17:06.165037000 2019.11.04 14:17:06.165 2019.1..
AAPL 200 104.9 0 2019.11.04D14:17:25.185851000 2019.11.04 14:17:25.185 2019.1..
AAPL 200 104.9 0 2019.11.04D14:17:45.804987000 2019.11.04 14:17:45.804 2019.1..
AAPL 200 104.9 0 2019.11.04D16:18:49.093655000 2019.11.04 16:18:49.093 2019.1..
GOOG 42  100.3 1 2019.11.04D14:05:07.166154000 2019.11.04 14:05:07.166 2019.1..
GOOG 42  100.3 1 2019.11.04D14:06:15.048017000 2019.11.04 14:06:15.048 2019.1..
GOOG 42  100.3 1 2019.11.04D14:15:50.695853000 2019.11.04 14:15:50.695 2019.1..
GOOG 42  100.3 1 2019.11.04D14:16:36.648

## Automatic type conversion

In [24]:
\l utils/csvutil.q

In [10]:
allBQSimpleTypes_auto: .csv.read hsym `$"/tmp/allBQSimpleTypes.csv"

.csv.read: .csv.read

In [10]:
allBQSimpleTypes_auto

allBQSimpleTypes_auto: allBQSimpleTypes_auto

In [10]:
meta allBQSimpleTypes_auto

allBQSimpleTypes_auto: allBQSimpleTypes_auto

Automatic type conversion works well for all types except for BOOL and TIMESTAMP.

# From kdb to BigQuery

Let us save the fixed kdb table to CSV. The `save` function takes the name of the global table from the file name.

In [11]:
save `:/tmp/allBQSimpleTypes_fixed.csv

`:/tmp/allBQSimpleTypes_fixed.csv


In [12]:
system "bq load --autodetect bqkdb.allBQSimpleTypes_auto /tmp/allBQSimpleTypes_fixed.csv"

Waiting on bqjob_r60b98cca9dba000b_0000016e462330d5_1 ... (5s) Current status: DONE   

""
""


We can see that the boolean (b) and timestamp (ts, dt) columns are not cast properly: b is detected as an integer column, while ts and dt are detected as string columns.

In [13]:
system "bq show bqkdb.allBQSimpleTypes_auto"

"Table ferenc-world:bqkdb.allBQSimpleTypes_auto"
""
"   Last modified        Schema        Total Rows   Total Bytes   Expiration ..
" ----------------- ----------------- ------------ ------------- ------------..
"  07 Nov 14:52:40   |- s: string      15           1620                     ..
"                    |- int: integer                                         ..
"                    |- f: float                                             ..
"                    |- b: integer                                           ..
"                    |- ts: string                                           ..
"                    |- date: date                                           ..
"                    |- time: time                                           ..
"                    |- dt: string                                           ..
""


We can convert the boolean and timestamp columns manually in q, into string forms that BigQuery autodetects:

In [14]:
kdbToBigQueryBoolMap: value[bigQueryToKdbBoolMap]!key bigQueryToKdbBoolMap
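
Swapping `key` and `value` inverts the earlier dictionary, so q booleans map back to the BigQuery literals. A short check on a made-up boolean list (the variable `literals` is hypothetical):

```q
bigQueryToKdbBoolMap: ("true";"false")!10b
kdbToBigQueryBoolMap: value[bigQueryToKdbBoolMap]!key bigQueryToKdbBoolMap
// look up three boolean keys at once
literals: kdbToBigQueryBoolMap 011b
literals    // ("false";"true";"true")
```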

In [15]:
allBQSimpleTypes2: update kdbToBigQueryBoolMap b,
    @[; 4 7 10; :; "-- "] each string ts, 
    @[; 4 7 10; :; "-- "] each string dt from allBQSimpleTypes_fixed
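
The expression `@[; 4 7 10; :; "-- "]` is a projection of amend: it overwrites the characters at positions 4, 7 and 10 of a string with `-`, `-` and a space. On the string form of a q timestamp those positions hold the two date dots and the `D` separator. A one-value sketch, where the names `toBQ` and `formatted` are hypothetical:

```q
// projection of amend with the first argument left open
toBQ: @[; 4 7 10; :; "-- "]
// string form of a timestamp is "2019.11.04D14:06:15.048017000"
formatted: toBQ string 2019.11.04D14:06:15.048017000
formatted   // "2019-11-04 14:06:15.048017000"
```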

In [16]:
allBQSimpleTypes2

s    int f     b       ts                              date       time       ..
-----------------------------------------------------------------------------..
AAPL 200 104.9 "false" "2019-11-04 14:06:15.048017000" 2019.11.04 14:06:15.04..
AAPL 200 104.9 "false" "2019-11-04 14:15:50.695853000" 2019.11.04 14:15:50.69..
AAPL 200 104.9 "false" "2019-11-04 14:16:36.648513000" 2019.11.04 14:16:36.64..
AAPL 200 104.9 "false" "2019-11-04 14:17:06.165037000" 2019.11.04 14:17:06.16..
AAPL 200 104.9 "false" "2019-11-04 14:17:25.185851000" 2019.11.04 14:17:25.18..
AAPL 200 104.9 "false" "2019-11-04 14:17:45.804987000" 2019.11.04 14:17:45.80..
AAPL 200 104.9 "false" "2019-11-04 16:18:49.093655000" 2019.11.04 16:18:49.09..
GOOG 42  100.3 "true"  "2019-11-04 14:05:07.166154000" 2019.11.04 14:05:07.16..
GOOG 42  100.3 "true"  "2019-11-04 14:06:15.048017000" 2019.11.04 14:06:15.04..
GOOG 42  100.3 "true"  "2019-11-04 14:15:50.695853000" 2019.11.04 14:15:50.69..
GOOG 42  100.3 "true"  "2019-11-04 14:16

In [17]:
save `:/tmp/allBQSimpleTypes2.csv
system "bq load --autodetect bqkdb.allBQSimpleTypes2 /tmp/allBQSimpleTypes2.csv"

`:/tmp/allBQSimpleTypes2.csv


Waiting on bqjob_r2e4eb136b1937b82_0000016e462383ee_1 ... (1s) Current status: DONE   

""
""


In [18]:
system "bq show bqkdb.allBQSimpleTypes2"

"Table ferenc-world:bqkdb.allBQSimpleTypes2"
""
"   Last modified         Schema        Total Rows   Total Bytes   Expiration..
" ----------------- ------------------ ------------ ------------- -----------..
"  07 Nov 14:52:57   |- s: string       30           1650                    ..
"                    |- int: integer                                         ..
"                    |- f: float                                             ..
"                    |- b: boolean                                           ..
"                    |- ts: timestamp                                        ..
"                    |- date: date                                           ..
"                    |- time: time                                           ..
"                    |- dt: timestamp                                        ..
""


## Clean-up

In [19]:
// Cloud Storage
system "gsutil rm ", csfilename

Removing gs://storagebodon/allBQSimpleTypes.csv...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              




In [20]:
// local files
system "rm /tmp/allBQSimpleTypes.csv"
system "rm /tmp/allBQSimpleTypes_fixed.csv"
system "rm /tmp/allBQSimpleTypes2.csv"







In [21]:
// BigQuery table
system "bq rm -f bqkdb.allBQSimpleTypes_auto"
system "bq rm -f bqkdb.allBQSimpleTypes2"



