add until; update header; linter hot-fix
zhenik committed Sep 11, 2020
1 parent 09e586f commit e3db588
Showing 5 changed files with 122 additions and 96 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/on_pr_push_master.yml
@@ -12,7 +12,8 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Super-Linter
uses: github/super-linter@v3.9.4
# todo: switch back to github/super-linter once https://github.com/github/super-linter/issues/708 is resolved
uses: Neha-Sinha2305/super-linter@master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
VALIDATE_ANSIBLE: true
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -4,8 +4,8 @@

### Added

- Data examples & create tables
- Documentation
- Data examples & create tables #5 #4
- Documentation #3 #8
- Fixate linter version #10

## [0.0.1]
118 changes: 25 additions & 93 deletions README.md
@@ -1,12 +1,25 @@
<!-- markdownlint-disable MD041 -->
<p align="center">
<h2 align="center">Terraform-nomad-presto</h2>
<p align="center">Terraform module with example</p>
<a href="https://github.com/fredrikhgrelland/vagrant-hashistack/releases">
<img alt="Releases" src="https://img.shields.io/badge/dynamic/json?label=with%20vagrant-hashistack&query=%24.current_version.version&url=https%3A%2F%2Fapp.vagrantup.com%2Fapi%2Fv1%2Fbox%2Ffredrikhgrelland%2Fhashistack"/>
</a>
<h2 align="center">Terraform-nomad-presto</h2>
</p>
<p align="center">
<a href="https://github.com/fredrikhgrelland/vagrant-hashistack-template" alt="Built on">
<img src="https://img.shields.io/badge/Built%20from%20template-Vagrant--hashistack--template-blue?style=for-the-badge&logo=github"/>
</a>
<p align="center">
<a href="https://github.com/fredrikhgrelland/vagrant-hashistack" alt="Built on">
<img src="https://img.shields.io/badge/Powered%20by%20-Vagrant--hashistack-orange?style=for-the-badge&logo=vagrant"/>
</a>
</p>
</p>

---

The module contains a Nomad job, [./conf/nomad/presto.hcl](./conf/nomad/presto.hcl), that runs the [Presto SQL server](https://github.com/prestosql/presto).

Additional information:
- [consul-connect](https://www.consul.io/docs/connect) integration
- [nomad docker driver](https://www.nomadproject.io/docs/drivers/docker.html)

## Contents
0. [Prerequisites](#prerequisites)
@@ -38,7 +51,9 @@ Please follow [this section in original template](https://github.com/fredrikhgre
make up
```

Check the example of terraform-nomad-presto documentation [here](./example)
See the terraform-nomad-presto [example](./example) and its documentation.

The example includes data in the [CSV, JSON, Avro and Protobuf](./example/resources/data) file formats.

### Requirements

@@ -193,93 +208,10 @@ presto --server localhost:8080 --catalog hive --schema default --user presto --f
## Authors

## License
This work is licensed under the Apache 2.0 License. See [LICENSE](./LICENSE) for full details.

________
---

## References
- [Blog post](https://towardsdatascience.com/load-and-query-csv-file-in-s3-with-presto-b0d50bc773c9)
- As of release 340, Presto's CSV format [supports only varchar columns](https://github.com/prestosql/presto/pull/920#issuecomment-517593414).

## File types

### CSV
```sql
CREATE TABLE iris (
sepal_length varchar,
sepal_width varchar,
petal_length varchar,
petal_width varchar,
species varchar
)
WITH (
format = 'CSV',
external_location='s3a://hive/data/csv/',
skip_header_line_count=1
);
```

`NB!` Hive supports typed (e.g. numeric) columns for `csv` files.
You can create such a table for the `csv` file format using `hive-metastore`:
```sql
CREATE EXTERNAL TABLE iris (sepal_length DECIMAL, sepal_width DECIMAL,
petal_length DECIMAL, petal_width DECIMAL, species STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3a://hive/data/csv/'
TBLPROPERTIES ("skip.header.line.count"="1");
```

### JSON

```sql
CREATE TABLE somejson (
description varchar,
foo ROW (
bar varchar,
quux varchar,
level1 ROW (
l2string varchar,
l2struct ROW (
level3 varchar
)
)
),
wibble varchar,
wobble ARRAY (
ROW (
entry int,
EntryDetails ROW (
details varchar,
details2 int
)
)
)
)
WITH (
format = 'JSON',
external_location = 's3a://hive/data/json/'
);
```

### AVRO

```sql
CREATE TABLE tweets (
username varchar,
tweet varchar,
timestamp bigint
)
WITH (
format = 'AVRO',
external_location='s3a://hive/data/avro-tweet/'
);
```

### PROTOBUF
Reference: [using-protobuf-parquet](https://costimuraru.wordpress.com/2018/04/26/using-protobuf-parquet-with-aws-athena-presto-or-hive/)

todo
```sql

```
- Presto, so far (release 340), [supports only varchar columns](https://github.com/prestosql/presto/pull/920#issuecomment-517593414)
9 changes: 9 additions & 0 deletions dev/ansible/05_presto_create_tables.yml
@@ -37,16 +37,25 @@
- name: Create CSV table
shell: docker run --network host -v "/vagrant/example/resources/query/csv_create_table.sql:/csv_create_table.sql" fredrikhgrelland/presto-cli ./presto --server localhost:8888 --catalog hive --schema default --user presto --file /csv_create_table.sql
register: docker_output_cmd_csv
retries: 5
delay: 1
until: docker_output_cmd_csv.rc == 0
tags: example-upload

- name: Create JSON table
shell: docker run --network host -v "/vagrant/example/resources/query/json_create_table.sql:/json_create_table.sql" fredrikhgrelland/presto-cli ./presto --server localhost:8888 --catalog hive --schema default --user presto --file /json_create_table.sql
register: docker_output_cmd_json
retries: 5
delay: 1
until: docker_output_cmd_json.rc == 0
tags: example-upload

- name: Create AVRO table
shell: docker run --network host -v "/vagrant/example/resources/query/avro_tweets_create_table.sql:/avro_tweets_create_table.sql" fredrikhgrelland/presto-cli ./presto --server localhost:8888 --catalog hive --schema default --user presto --file /avro_tweets_create_table.sql
register: docker_output_cmd_avro
retries: 5
delay: 1
until: docker_output_cmd_avro.rc == 0
tags: example-upload

- name: Verify CSV table - available 150 records
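The `retries`/`delay`/`until` keys added to the three "Create ... table" tasks make Ansible re-run each `docker run` until the Presto CLI exits with return code 0, presumably so table creation tolerates a Presto server that is still starting. The Python sketch below only models that loop; it is not Ansible's implementation. `run_until_success` and `create_table_cmd` are hypothetical helpers, and counting one initial attempt plus up to `retries` repeats is an assumption (check the Ansible documentation for the exact semantics of your version).

```python
import time
from typing import Callable, List


def run_until_success(run: Callable[[], int], retries: int = 5, delay: float = 1.0) -> int:
    """Call `run` until it returns exit code 0, like a task with
    `until: result.rc == 0`. Assumption: one initial attempt plus up to
    `retries` further attempts. Returns the last exit code observed."""
    rc = run()
    attempt = 0
    while rc != 0 and attempt < retries:
        time.sleep(delay)  # pause between attempts, like Ansible's `delay`
        rc = run()
        attempt += 1
    return rc


def create_table_cmd(sql_file: str) -> List[str]:
    """Hypothetical helper rebuilding the docker invocation used by the tasks above."""
    return [
        "docker", "run", "--network", "host",
        "-v", f"/vagrant/example/resources/query/{sql_file}:/{sql_file}",
        "fredrikhgrelland/presto-cli", "./presto",
        "--server", "localhost:8888", "--catalog", "hive",
        "--schema", "default", "--user", "presto",
        "--file", f"/{sql_file}",
    ]
```

Against a running stack one could pass `lambda: subprocess.run(create_table_cmd("csv_create_table.sql")).returncode` as `run`; building the command as a list avoids the shell quoting the YAML needs.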
84 changes: 84 additions & 0 deletions example/resources/data/README.md
@@ -0,0 +1,84 @@
# File types
Different file types that are uploaded to the stack in the current example.

## CSV
```sql
CREATE TABLE iris (
sepal_length varchar,
sepal_width varchar,
petal_length varchar,
petal_width varchar,
species varchar
)
WITH (
format = 'CSV',
external_location='s3a://hive/data/csv/',
skip_header_line_count=1
);
```
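Because this table declares every column as `varchar` and sets `skip_header_line_count=1`, queries see the header line dropped and all values as strings. The Python sketch below approximates that view; `SAMPLE` is a hypothetical iris-shaped snippet (not the repository's file) and `read_like_presto_csv` is a hypothetical helper, not part of Presto.

```python
import csv
from typing import List

# Hypothetical sample shaped like the iris data (header line + rows).
SAMPLE = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
)


def read_like_presto_csv(text: str, skip_header_line_count: int = 1) -> List[List[str]]:
    """Approximate what the varchar-only CSV table returns: the first
    `skip_header_line_count` lines are dropped and every value stays a string."""
    lines = text.splitlines()[skip_header_line_count:]
    return [row for row in csv.reader(lines)]


rows = read_like_presto_csv(SAMPLE)
# Every value is a string, so numeric work needs an explicit conversion;
# in Presto that would be e.g. CAST(sepal_length AS double).
max_sepal_length = max(float(row[0]) for row in rows)
```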

`NB!` Hive supports typed (e.g. numeric) columns for `csv` files.
You can create such a table for the `csv` file format using `hive-metastore`:
```sql
CREATE EXTERNAL TABLE iris (sepal_length DECIMAL(3,1), sepal_width DECIMAL(3,1),
petal_length DECIMAL(3,1), petal_width DECIMAL(3,1), species STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3a://hive/data/csv/'
TBLPROPERTIES ("skip.header.line.count"="1");
```
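The Hive DDL splits rows on `'\n'` and fields on `','`, skips one header line, and converts each field to its column type. In Hive 0.13 and later a bare `DECIMAL` defaults to `DECIMAL(10,0)`, i.e. no fractional digits, so the scale matters for values such as `5.1`. The Python sketch below models the parse with the scale as a parameter; the helper names are hypothetical and half-up rounding is an assumption about how out-of-scale values are treated.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable, Dict, List, Tuple


def decimal_col(scale: int) -> Callable[[str], Decimal]:
    """Converter for a DECIMAL(p, scale) column; half-up rounding is an assumption."""
    quantum = Decimal(1).scaleb(-scale)  # scale=1 -> Decimal('0.1'), scale=0 -> Decimal('1')
    return lambda s: Decimal(s).quantize(quantum, rounding=ROUND_HALF_UP)


def iris_columns(scale: int) -> List[Tuple[str, Callable[[str], object]]]:
    # Column names and order from the Hive DDL above; STRING maps to str.
    dec = decimal_col(scale)
    return [("sepal_length", dec), ("sepal_width", dec),
            ("petal_length", dec), ("petal_width", dec), ("species", str)]


def parse_delimited(text: str, scale: int, skip_header_lines: int = 1) -> List[Dict[str, object]]:
    """Split rows on '\\n' (LINES TERMINATED BY) and fields on ',' (FIELDS TERMINATED BY),
    drop the header (skip.header.line.count) and convert each field by column type."""
    columns = iris_columns(scale)
    records = []
    for line in text.split("\n")[skip_header_lines:]:
        if not line:
            continue  # tolerate a trailing newline
        values = line.split(",")
        records.append({name: conv(v) for (name, conv), v in zip(columns, values)})
    return records
```

With `scale=0` the sketch turns `5.1` into `5`, which is the loss a bare `DECIMAL` column would cause.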

## JSON

```sql
CREATE TABLE somejson (
description varchar,
foo ROW (
bar varchar,
quux varchar,
level1 ROW (
l2string varchar,
l2struct ROW (
level3 varchar
)
)
),
wibble varchar,
wobble ARRAY (
ROW (
entry int,
EntryDetails ROW (
details varchar,
details2 int
)
)
)
)
WITH (
format = 'JSON',
external_location = 's3a://hive/data/json/'
);
```
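Each nested `ROW` in `somejson` corresponds to a nested JSON object and `ARRAY(ROW(...))` to a list of objects. The Python sketch below builds a record with that shape and writes it as newline-delimited JSON (one object per line), which is the layout assumed here for files under `s3a://hive/data/json/`. The record values and the `to_ndjson` helper are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical record shaped like the `somejson` table; the values are made up.
record = {
    "description": "example",
    "foo": {
        "bar": "b",
        "quux": "q",
        "level1": {"l2string": "s", "l2struct": {"level3": "deep"}},
    },
    "wibble": "w",
    "wobble": [
        {"entry": 1, "EntryDetails": {"details": "d1", "details2": 10}},
        {"entry": 2, "EntryDetails": {"details": "d2", "details2": 20}},
    ],
}


def to_ndjson(records: List[Dict[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON: one compact object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


text = to_ndjson([record])
row = json.loads(text.splitlines()[0])
# Roughly what a field dereference such as
# SELECT foo.level1.l2struct.level3 FROM somejson would return for this row.
level3 = row["foo"]["level1"]["l2struct"]["level3"]
```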

## AVRO

```sql
CREATE TABLE tweets (
username varchar,
tweet varchar,
timestamp bigint
)
WITH (
format = 'AVRO',
external_location='s3a://hive/data/avro-tweet/'
);
```
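For Avro, the table columns have to line up with the fields of the records stored in the files; Presto's `varchar` corresponds to Avro's `string` and `bigint` to `long`. The Python sketch below builds an Avro record schema (as a plain dict) matching the `tweets` columns. The record name `tweets` and the assumption that the files under `avro-tweet` were written with a schema of this shape are illustrative, not taken from the repository.

```python
import json
from typing import Any, Dict, List, Tuple

# Presto column type -> Avro primitive type, for the types used by `tweets`.
PRESTO_TO_AVRO = {"varchar": "string", "bigint": "long"}


def avro_record_schema(name: str, columns: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build an Avro record schema from (column, presto_type) pairs."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": PRESTO_TO_AVRO[t]} for col, t in columns],
    }


TWEETS_COLUMNS = [("username", "varchar"), ("tweet", "varchar"), ("timestamp", "bigint")]
schema = avro_record_schema("tweets", TWEETS_COLUMNS)
print(json.dumps(schema, indent=2))
```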

## PROTOBUF
Reference: [using-protobuf-parquet](https://costimuraru.wordpress.com/2018/04/26/using-protobuf-parquet-with-aws-athena-presto-or-hive/)

todo
```sql

```
