Build Scripts and Demo for Cloudera Director with sparklyr
This repo includes:

  • scripts for building a sparklyr cluster with Cloudera Director
  • a demo of analyzing US flight data with sparklyr

Scripts for Cloudera Director

These scripts are based on director-sparklyr-bootstrap.

If you have trouble building a cluster with director-sparklyr-bootstrap, you can set scripts/ as the bootstrapScript of every template.

Then log in to the gateway node and run scripts/.

Conf file for the Cloudera Director client

cluster.conf requires Director 2.3+.

It assumes deployment in the Tokyo region. You will need to replace several settings to match your environment.

If you have not installed the Director client, you can use a Docker-based tool for the Cloudera Director client.

Demo: Analyzing US flights

This demo was prepared for Big Data Analytics Tokyo.

It visualizes US flight data and builds a linear regression model to predict delays.

You can also view it on RPubs (Japanese version).

Loading external tables on S3

Added scripts/create_external_table.sql. Before running the query, upload the airlines_parquet and airports data to S3, and replace the bucket name in the SQL file with your own bucket.
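Replacing the bucket name can also be scripted. The following is a minimal sketch, not part of this repo: it assumes the bucket appears only inside s3://, s3a://, or s3n:// URIs in scripts/create_external_table.sql (for example in LOCATION clauses), and the helper names set_bucket and rewrite_file are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical helper: matches the scheme and bucket of s3://, s3a:// or s3n://
# URIs, e.g. the "old-bucket" part of 's3a://old-bucket/airlines_parquet/'.
# The bucket is everything up to the next slash, quote, or whitespace.
S3_URI = re.compile(r"(s3[an]?://)([^/'\"\s]+)")


def set_bucket(sql: str, bucket: str) -> str:
    """Return `sql` with the bucket of every S3 URI replaced by `bucket`."""
    # A function replacement avoids re treating backslashes in `bucket` specially.
    return S3_URI.sub(lambda m: m.group(1) + bucket, sql)


def rewrite_file(path: str, bucket: str) -> None:
    """Rewrite the SQL file at `path` in place with the new bucket name."""
    p = Path(path)
    p.write_text(set_bucket(p.read_text(), bucket))
```

For example, rewrite_file("scripts/create_external_table.sql", "my-bucket") edits the file in place, so keep the original under version control if you want to restore it. Matching on the URI scheme means the script does not need to know which bucket name is currently in the file.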