Merge a58d9b0 into f5e17c7
massdosage committed Nov 15, 2018
2 parents f5e17c7 + a58d9b0 commit 1d11edd
Showing 4 changed files with 23 additions and 6 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# [5.0.0] - 2018-11-15
### Changed
* Circus Train version upgraded to 13.0.0 (was 12.0.0). Note that this change is _not_ backwards compatible as this BigQuery extension now needs to be explicitly added to the Circus Train classpath using Circus Train's [standard extension loading mechanism](https://github.com/HotelsDotCom/circus-train#loading-extensions). See [#20](https://github.com/HotelsDotCom/circus-train-bigquery/issues/20).

# [4.0.0] - 2018-11-09
### Changed
* Replicated tables are now exported as AVRO files instead of CSV files. This allows BigQuery tables to be replicated without any schema or data change. See [#18](https://github.com/HotelsDotCom/circus-train-bigquery/issues/17).
19 changes: 16 additions & 3 deletions README.md
@@ -3,19 +3,29 @@
## Overview
This [Circus Train](https://github.com/HotelsDotCom/circus-train) plugin enables the conversion of BigQuery tables to Hive.

-# Start using
+## Start using
You can obtain Circus Train BigQuery from Maven Central:

[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.hotels/circus-train-bigquery/badge.svg?subject=com.hotels:circus-train-bigquery)](https://maven-badges.herokuapp.com/maven-central/com.hotels/circus-train-bigquery) [![Build Status](https://travis-ci.org/HotelsDotCom/circus-train-bigquery.svg?branch=master)](https://travis-ci.org/HotelsDotCom/circus-train-bigquery) [![Coverage Status](https://coveralls.io/repos/github/HotelsDotCom/circus-train-bigquery/badge.svg?branch=master)](https://coveralls.io/github/HotelsDotCom/circus-train-bigquery?branch=master) ![GitHub license](https://img.shields.io/github/license/HotelsDotCom/circus-train.svg)

## Installation
In order to be used by Circus Train the above `circus-train-bigquery` jar file must be added to Circus Train's classpath. It is highly recommended that the version of this library and the version of Circus Train are identical. The recommended way to make this extension available on the classpath is to store it in a standard location
and then add this to the `CIRCUS_TRAIN_CLASSPATH` environment variable (e.g. via a startup script):

export CIRCUS_TRAIN_CLASSPATH=$CIRCUS_TRAIN_CLASSPATH:/opt/circus-train-big-query/lib/*

Another option is to place the jar file in the Circus Train `lib` folder which will then automatically load it but risks interfering with any Circus Train jobs that do not require the extension's functionality.
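
As a minimal sketch of the startup-script approach described above (the `EXT_LIB` variable name is hypothetical, and the path simply reuses the example location), the extension directory can be appended so that an unset or empty `CIRCUS_TRAIN_CLASSPATH` does not end up with a leading `:`:

```shell
#!/bin/sh
# Hypothetical install location for the extension jars; adjust to your environment.
EXT_LIB='/opt/circus-train-big-query/lib/*'

# Append to any existing value. The ${VAR:+...} expansion emits the old value
# followed by ':' only when the variable is already set and non-empty.
# The quotes keep the trailing '*' literal so that the JVM, not the shell,
# expands the classpath wildcard.
CIRCUS_TRAIN_CLASSPATH="${CIRCUS_TRAIN_CLASSPATH:+$CIRCUS_TRAIN_CLASSPATH:}$EXT_LIB"
export CIRCUS_TRAIN_CLASSPATH
```

Sourcing a snippet like this before launching Circus Train should have the same effect as the one-line `export` shown above.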

## Configuration
* Add the `circus-train-bigquery` jar to your `CIRCUS_TRAIN_CLASSPATH`, or as a dependency on your Circus Train project.
* Add the following to the Circus Train YAML configuration in order to load the BigQuery extension via Circus Train's [extension loading mechanism](https://github.com/HotelsDotCom/circus-train#loading-extensions):

extension-packages: com.hotels.bdp.circustrain.bigquery

* Configure Circus Train as you would for a copy job from Google Cloud [Configuration](https://github.com/HotelsDotCom/circus-train/tree/master/circus-train-gcp)
* Provide the Google Cloud project ID that your BigQuery instance resides in as your `source-catalog` `hive-metastore-uris` parameter using the format `hive-metastore-uris: bigquery://<project-id>`
* To enable copying to Google Storage provide a path to your Google Credentials in the configuration under the gcp-security parameter.
* Provide your BigQuery dataset as `source-table` `database-name` and your BigQuery table name as `source-table` `table-name`


## Partition Generation
Circus Train BigQuery allows you to add partitions for one column to your Hive destination table upon replication from BigQuery to Hive. The partition must be a field present on your source BigQuery table. The user can configure the partition field by setting the `table-replications[n].copier-option : circus-train-bigquery-partition-by` property on a specific table replication within the Circus Train configuration file. The destination data will be repartitioned on the specified field.

@@ -24,6 +34,7 @@ If your destination data is partitioned you can also specify a partition filter
### Examples:

#### Simple Configuration
extension-packages: com.hotels.bdp.circustrain.bigquery
source-catalog:
name: my-google-source-catalog
hive-metastore-uris: bigquery://my-gcp-project-id
@@ -44,6 +55,7 @@ If your destination data is partitioned you can also specify a partition filter
table-location: s3://mybucket/foo/baz/

#### Configuration with partition generation configured
extension-packages: com.hotels.bdp.circustrain.bigquery
source-catalog:
... see above ...

@@ -60,6 +72,7 @@ If your destination data is partitioned you can also specify a partition filter
circustrain-bigquery-partition-by: date

#### Configuration with partition filter configured
extension-packages: com.hotels.bdp.circustrain.bigquery
source-catalog:
... see above ...

4 changes: 2 additions & 2 deletions pom.xml
@@ -8,7 +8,7 @@
</parent>

<artifactId>circus-train-bigquery</artifactId>
-<version>4.0.1-SNAPSHOT</version>
+<version>5.0.0-SNAPSHOT</version>
<name>Bigquery Client</name>
<inceptionYear>2018</inceptionYear>

@@ -20,7 +20,7 @@
</scm>

<properties>
-<circus-train-version>12.0.0</circus-train-version>
+<circus-train-version>13.0.0</circus-train-version>
<google-cloud.version>0.34.0-alpha</google-cloud.version>
<hadoop.version>2.7.1</hadoop.version>
<hive.version>2.3.3</hive.version>
@@ -23,8 +23,8 @@
import com.hotels.bdp.circustrain.bigquery.extraction.service.ExtractionService;
import com.hotels.bdp.circustrain.bigquery.table.service.TableServiceFactory;
import com.hotels.bdp.circustrain.bigquery.util.BigQueryMetastore;
-import com.hotels.bdp.circustrain.core.metastore.ConditionalMetaStoreClientFactory;
import com.hotels.hcommon.hive.metastore.client.api.CloseableMetaStoreClient;
+import com.hotels.hcommon.hive.metastore.client.api.ConditionalMetaStoreClientFactory;
import com.hotels.hcommon.hive.metastore.exception.MetaStoreClientException;

@Component