This repo includes:
- scripts for building a sparklyr cluster with Cloudera Director
- a sparklyr demo that analyzes US flight data
It is based on director-sparklyr-bootstrap.
If director-sparklyr-bootstrap fails while building the cluster, you can set scripts/bootstrap.sh as the bootstrapScript of every template.
Then log in to the gateway node and run scripts/post_crete_script_on_gateway.sh.
cluster.conf requires Director 2.3+.
It assumes the Tokyo region, so you should replace the region-specific settings to match your environment.
If you don't have the Director client installed, you can use the Docker-based tool for the Cloudera Director client.
This demo is for Big Data Analytics Tokyo.
It visualizes US flight data and builds a linear regression model to predict delays.
You can also view it on RPubs (Japanese version).
scripts/create_external_table.sql has been added.
Before running the query, upload the airlines_parquet and airports data to S3.
Before creating the tables, replace the bucket name in the script with your own.
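The two preparation steps could be scripted roughly as follows. This is only a sketch under assumptions: the bucket name `my-flights-bucket`, the local directories `./airlines_parquet` and `./airports`, the S3 key layout, and the helper names `upload_data` and `replace_bucket` are all hypothetical, and it assumes the SQL script refers to the bucket through `s3://` or `s3a://` URIs. Check the actual script before relying on it.

```shell
#!/usr/bin/env bash

# Hypothetical bucket name; export BUCKET to override it.
BUCKET="${BUCKET:-my-flights-bucket}"

# Upload both datasets to S3.
# Requires the AWS CLI and valid credentials; local paths are assumptions.
upload_data() {
  aws s3 sync ./airlines_parquet "s3://${BUCKET}/airlines_parquet/"
  aws s3 sync ./airports "s3://${BUCKET}/airports/"
}

# Print the given SQL file with the bucket in every s3:// or s3a:// URI
# replaced by $BUCKET. The input file itself is left unchanged.
replace_bucket() {
  sed -E "s#(s3a?://)[^/'\"]+/#\1${BUCKET}/#g" "$1"
}
```

One possible usage is `upload_data`, then `replace_bucket scripts/create_external_table.sql > /tmp/create_external_table.sql`, and finally running the rewritten file with your SQL client. Writing to a separate file keeps the original script intact.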