# Teradata to BigQuery SQL Translation

## Introduction

Both BigQuery and Teradata Database conform to the [ANSI/ISO SQL:2011](https://wikipedia.org/wiki/SQL:2011) standard. In addition, Teradata has created extensions to the SQL standard to enable Teradata-specific functionality.

In contrast, BigQuery does not support these proprietary extensions, so some of your queries might need to be refactored during migration from Teradata to BigQuery. Restricting queries to the ANSI/ISO SQL that BigQuery supports has an added benefit: it helps keep them portable and agnostic to the underlying data warehouse.

This notebook addresses some of the challenges you might encounter while migrating SQL queries from Teradata to BigQuery. It explains when these translations should be applied in the context of an end-to-end staged migration.

* basic replace stuff?
* qualify
* sub selects?
* Multiset?

## Teradata SQL differences

This notebook discusses notable differences between Teradata SQL and the BigQuery standard SQL, and some strategies for translating between the two dialects. The list of differences presented in this notebook is not exhaustive. For additional information, see the [Teradata-to-BigQuery SQL translation reference](https://cloud.google.com/solutions/migration/dw2bq/td2bq/td-bq-sql-translation-reference-tables).

### Data Types

BigQuery supports a more concise set of data types than
Teradata, with groups of Teradata types mapping into a single standard SQL data
type. For instance:

-   `INTEGER`, `SMALLINT`, `BYTEINT`, and `BIGINT` all map to `INT64`.
-   `CLOB`, `JSON`, `XML`, `UDT` and other types that contain large
    character fields map to `STRING`.
-   `BLOB`, `BYTE`, and `VARBYTE` types that contain binary information map
    to `BYTES`.

For dates, the main types (`DATE`, `TIME`, and `TIMESTAMP`) are equivalent in
Teradata and BigQuery. However, other specialized date types from
Teradata need to be mapped, such as the following:

-   `TIME_WITH_TIME_ZONE` to `TIME`.
-   `TIMESTAMP_WITH_TIME_ZONE` to `TIMESTAMP`.
-   `INTERVAL_HOUR`, `INTERVAL_MINUTE`, and other `INTERVAL_*` types map to
    `INT64` in BigQuery.
-   `PERIOD(DATE)`, `PERIOD(TIME)`, and other `PERIOD(*)` types map to `STRING`.
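
The mappings listed above can be collected in a small lookup helper. The following Python sketch is illustrative only: the names `TERADATA_TO_BIGQUERY` and `map_type` are hypothetical, the table covers just the types named in this section, and any other type raises an error instead of guessing.

```python
import re

# Hypothetical lookup table; it covers only the types named in this section.
TERADATA_TO_BIGQUERY = {
    "INTEGER": "INT64", "SMALLINT": "INT64", "BYTEINT": "INT64", "BIGINT": "INT64",
    "CLOB": "STRING", "JSON": "STRING", "XML": "STRING",
    "BLOB": "BYTES", "BYTE": "BYTES", "VARBYTE": "BYTES",
    "DATE": "DATE", "TIME": "TIME", "TIMESTAMP": "TIMESTAMP",
    "TIME WITH TIME ZONE": "TIME",
    "TIMESTAMP WITH TIME ZONE": "TIMESTAMP",
}


def map_type(teradata_type: str) -> str:
    """Return the BigQuery type for a Teradata type name used in this section."""
    name = teradata_type.strip().upper()
    # Whole families of types collapse to a single BigQuery type.
    if name.startswith("PERIOD"):
        return "STRING"
    if name.startswith("INTERVAL"):
        return "INT64"
    # Drop length or precision such as "(6)", then treat "_" and spaces alike.
    name = re.sub(r"\(.*?\)", "", name)
    name = re.sub(r"[\s_]+", " ", name).strip()
    if name not in TERADATA_TO_BIGQUERY:
        raise ValueError(f"no mapping recorded for {teradata_type!r}")
    return TERADATA_TO_BIGQUERY[name]


print(map_type("BYTEINT"))                      # INT64
print(map_type("TIMESTAMP(6) WITH TIME ZONE"))  # TIMESTAMP
print(map_type("PERIOD(DATE)"))                 # STRING
```

Raising on unknown types is a deliberate choice in this sketch: a silent default would hide columns that need a manual decision during migration.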

[Multi-dimensional arrays](https://docs.teradata.com/reader/S0Fw2AVH8ff3MDA0wDOHlQ/D3QuBsLccP9JObIH8f4yJA)
are not directly supported in BigQuery. Instead, you create an
[array of structs](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#building_arrays_of_arrays),
with each struct containing a field of type `ARRAY`.
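
One way to picture this workaround: each row of a two-dimensional array becomes one struct whose single field holds that row as an array. The Python sketch below builds that shape from a nested list. The field name `row` and the helper `to_array_of_structs` are hypothetical; the resulting list of dictionaries corresponds to a BigQuery column of type `ARRAY<STRUCT<row ARRAY<INT64>>>`.

```python
import json


def to_array_of_structs(matrix, field="row"):
    """Wrap each inner list in a dict so no array directly contains another array."""
    return [{field: list(inner)} for inner in matrix]


matrix = [[1, 2, 3], [4, 5, 6]]
nested = to_array_of_structs(matrix)

# The JSON form matches how a nested, repeated record could be represented.
print(json.dumps(nested))  # [{"row": [1, 2, 3]}, {"row": [4, 5, 6]}]

# Element (1, 2) of the original matrix is reached through the struct field.
print(nested[1]["row"][2])  # 6
```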

### Data Types - Exercise

In this exercise, you will examine several of the `TIMESTAMP` and `TIME` functions and data types available to you. You will use a public BigQuery dataset that contains rental records from the London bike share program.

Use the `bq` command-line tool to preview a few rows of the table.

Using `bq head` or the `Preview` tab in the BigQuery UI is much more efficient than running a `SELECT * ... LIMIT 1` query, because the query triggers a full table scan.

In [45]:
!bq head -n 5  --selected_fields rental_id,duration,bike_id,end_date,end_station_id,start_date,start_station_id bigquery-public-data:london_bicycles.cycle_hire

+-----------+----------+---------+---------------------+----------------+---------------------+------------------+
| rental_id | duration | bike_id |      end_date       | end_station_id |     start_date      | start_station_id |
+-----------+----------+---------+---------------------+----------------+---------------------+------------------+
|  47469109 |     3180 |    7054 | 2015-09-03 12:45:00 |            111 | 2015-09-03 11:52:00 |              300 |
|  46915469 |     7380 |    3792 | 2015-08-16 11:59:00 |            407 | 2015-08-16 09:56:00 |              407 |
|  65899423 |     2040 |    3038 | 2017-06-09 18:30:00 |            165 | 2017-06-09 17:56:00 |              579 |
|  64280726 |     2280 |   10868 | 2017-04-22 10:14:00 |            553 | 2017-04-22 09:36:00 |              519 |
|  59235489 |     2340 |    7183 | 2016-10-09 04:31:00 |            100 | 2016-10-09 03:52:00 |              612 |
+-----------+----------+---------+---------------------+----------------+---------------------+------------------+

We can similarly view the schema of the table. Notice the `TIMESTAMP` fields.

In [46]:
!bq show --schema --format=prettyjson bigquery-public-data:london_bicycles.cycle_hire

[
  {
    "description": "", 
    "mode": "REQUIRED", 
    "name": "rental_id", 
    "type": "INTEGER"
  }, 
  {
    "description": "Duration of the bike trip in seconds.", 
    "mode": "NULLABLE", 
    "name": "duration", 
    "type": "INTEGER"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "bike_id", 
    "type": "INTEGER"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "end_date", 
    "type": "TIMESTAMP"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "end_station_id", 
    "type": "INTEGER"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "end_station_name", 
    "type": "STRING"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "start_date", 
    "type": "TIMESTAMP"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "start_station_id", 
    "type": "INTEGER"
  }, 
  {
    "description": "", 
    "mode": "NULLABLE", 
    "name": "start_station_name",

Run a query to return the 5 most recent rentals by `end_date`:

In [48]:
%%bigquery
SELECT
  rental_id,
  duration,
  bike_id,
  end_date,
  end_station_id,
  end_station_name,
  start_date,
  start_station_id,
  start_station_name
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire
ORDER BY
  end_date DESC
LIMIT
  5

Unnamed: 0,rental_id,duration,bike_id,end_date,end_station_id,end_station_name,start_date,start_station_id,start_station_name
0,66036945,56220,13042,2017-06-14 03:25:00+00:00,501,"Cephas Street, Bethnal Green",2017-06-13 11:48:00+00:00,501,"Cephas Street, Bethnal Green"
1,66036927,56280,6631,2017-06-14 03:25:00+00:00,501,"Cephas Street, Bethnal Green",2017-06-13 11:47:00+00:00,501,"Cephas Street, Bethnal Green"
2,66063574,15720,6088,2017-06-14 02:34:00+00:00,556,"Heron Quays DLR, Canary Wharf",2017-06-13 22:12:00+00:00,477,"Spindrift Avenue, Millwall"
3,66063566,15780,4726,2017-06-14 02:34:00+00:00,556,"Heron Quays DLR, Canary Wharf",2017-06-13 22:11:00+00:00,477,"Spindrift Avenue, Millwall"
4,66037030,52440,3686,2017-06-14 02:26:00+00:00,523,"Langdon Park, Poplar",2017-06-13 11:52:00+00:00,501,"Cephas Street, Bethnal Green"


__#TODO(you):__ Modify this query to print the `end_date` and `start_date` fields in UNIX seconds as well.

In [52]:
%%bigquery
SELECT
  rental_id,
  duration,
  bike_id,
  end_date,
  UNIX_SECONDS(end_date) AS end_date_unix,
  end_station_id,
  end_station_name,
  start_date,
  UNIX_SECONDS(start_date) AS start_date_unix,
  start_station_id,
  start_station_name
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire
ORDER BY
  end_date DESC
LIMIT
  5

Unnamed: 0,rental_id,duration,bike_id,end_date,end_date_unix,end_station_id,end_station_name,start_date,start_date_unix,start_station_id,start_station_name
0,66036927,56280,6631,2017-06-14 03:25:00+00:00,1497410700,501,"Cephas Street, Bethnal Green",2017-06-13 11:47:00+00:00,1497354420,501,"Cephas Street, Bethnal Green"
1,66036945,56220,13042,2017-06-14 03:25:00+00:00,1497410700,501,"Cephas Street, Bethnal Green",2017-06-13 11:48:00+00:00,1497354480,501,"Cephas Street, Bethnal Green"
2,66063574,15720,6088,2017-06-14 02:34:00+00:00,1497407640,556,"Heron Quays DLR, Canary Wharf",2017-06-13 22:12:00+00:00,1497391920,477,"Spindrift Avenue, Millwall"
3,66063566,15780,4726,2017-06-14 02:34:00+00:00,1497407640,556,"Heron Quays DLR, Canary Wharf",2017-06-13 22:11:00+00:00,1497391860,477,"Spindrift Avenue, Millwall"
4,66037030,52440,3686,2017-06-14 02:26:00+00:00,1497407160,523,"Langdon Park, Poplar",2017-06-13 11:52:00+00:00,1497354720,501,"Cephas Street, Bethnal Green"
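
As a cross-check on the result above, the same conversion can be reproduced locally in Python. This is a minimal sketch that assumes whole-second, UTC timestamps like the ones in this table; the helper name `unix_seconds` is hypothetical and only mirrors what `UNIX_SECONDS` returns for such values.

```python
from datetime import datetime, timezone


def unix_seconds(ts: datetime) -> int:
    """Seconds since 1970-01-01 00:00:00 UTC for a timezone-aware datetime."""
    return int(ts.timestamp())


# Values copied from the row for rental_id 66036927 in the output above.
end_date = datetime(2017, 6, 14, 3, 25, tzinfo=timezone.utc)
start_date = datetime(2017, 6, 13, 11, 47, tzinfo=timezone.utc)

print(unix_seconds(end_date))    # 1497410700
print(unix_seconds(start_date))  # 1497354420

# The difference equals the duration column for that rental (56280 seconds).
print(unix_seconds(end_date) - unix_seconds(start_date))  # 56280
```

Working in Unix seconds makes the relationship between `start_date`, `end_date`, and `duration` a plain integer subtraction.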


__#TODO(you):__ Modify this query to print the time portion of the `end_date` and `start_date` fields in Pacific time (the `America/Los_Angeles` time zone).

In [56]:
%%bigquery
SELECT
  rental_id,
  duration,
  bike_id,
  end_date,
  EXTRACT(TIME FROM end_date AT TIME ZONE "America/Los_Angeles") AS end_time_california,
  end_station_id,
  end_station_name,
  start_date,
  EXTRACT(TIME FROM start_date AT TIME ZONE "America/Los_Angeles") AS start_time_california,
  start_station_id,
  start_station_name
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire
ORDER BY
  end_date DESC
LIMIT
  5

Unnamed: 0,rental_id,duration,bike_id,end_date,end_time_california,end_station_id,end_station_name,start_date,start_time_california,start_station_id,start_station_name
0,66036927,56280,6631,2017-06-14 03:25:00+00:00,20:25:00,501,"Cephas Street, Bethnal Green",2017-06-13 11:47:00+00:00,04:47:00,501,"Cephas Street, Bethnal Green"
1,66036945,56220,13042,2017-06-14 03:25:00+00:00,20:25:00,501,"Cephas Street, Bethnal Green",2017-06-13 11:48:00+00:00,04:48:00,501,"Cephas Street, Bethnal Green"
2,66063574,15720,6088,2017-06-14 02:34:00+00:00,19:34:00,556,"Heron Quays DLR, Canary Wharf",2017-06-13 22:12:00+00:00,15:12:00,477,"Spindrift Avenue, Millwall"
3,66063566,15780,4726,2017-06-14 02:34:00+00:00,19:34:00,556,"Heron Quays DLR, Canary Wharf",2017-06-13 22:11:00+00:00,15:11:00,477,"Spindrift Avenue, Millwall"
4,66037030,52440,3686,2017-06-14 02:26:00+00:00,19:26:00,523,"Langdon Park, Poplar",2017-06-13 11:52:00+00:00,04:52:00,501,"Cephas Street, Bethnal Green"
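
The `end_time_california` values can be sanity-checked outside BigQuery. The Python sketch below converts one `end_date` from the output to the `America/Los_Angeles` zone with the standard-library `zoneinfo` module. The fallback to a fixed UTC-7 offset is an assumption for environments without a time zone database, and it is only valid for dates that fall in daylight saving time, such as these June rentals.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo  # Python 3.9+

    pacific = ZoneInfo("America/Los_Angeles")
except (ImportError, KeyError):
    # Assumption: no IANA time zone data is available, so use the fixed
    # daylight-time offset that applies to mid-June dates (UTC-7).
    pacific = timezone(timedelta(hours=-7))

# end_date of the first row in the output above, stored in UTC.
end_date = datetime(2017, 6, 14, 3, 25, tzinfo=timezone.utc)
local_end = end_date.astimezone(pacific)

print(local_end.time())  # 20:25:00
print(local_end.date())  # 2017-06-13
```

Note that the local calendar date is one day earlier than the UTC date, which the `EXTRACT(TIME ...)` column alone does not show.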


### The SELECT Statement

### Data Definition Language (DDL)

### Data Manipulation Language (DML)

### Stored Procedures