Build script and Demo for Cloudera Director with Sparklyr

cloudera-sparklyr

This repo includes:

  • scripts for building a sparklyr cluster with Cloudera Director
  • a sparklyr demo that analyzes US flight data

Scripts for Cloudera Director

These scripts are based on director-sparklyr-bootstrap.

If you have trouble building a cluster with director-sparklyr-bootstrap, you can set scripts/bootstrap.sh as the bootstrapScript of every instance template.

Then log in to the gateway node and run scripts/post_crete_script_on_gateway.sh.

Conf file for Cloudera Director client

cluster.conf requires Director 2.3+.

The file assumes the cluster runs in the Tokyo region, so replace the region-specific settings to match your own environment.

If you don't have the Director client installed, you can use the Docker-based tool for the Cloudera Director client.
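
As a rough sketch of how cluster.conf might be submitted once the client is installed, the helper below wraps the client's `bootstrap-remote` command. It assumes a Director server is already running and reachable; the function name `bootstrap_cluster`, the `DIRECTOR_BIN` override, and the user, password, and host values are hypothetical placeholders, not part of this repo.

```shell
# Sketch only: submit a conf file to a running Director server.
# DIRECTOR_BIN is a hypothetical override so the command can be dry-run
# (for example DIRECTOR_BIN=echo); by default the installed client is used.
bootstrap_cluster() {
  # $1 = Director user, $2 = password, $3 = host:port of the Director server
  # $4 = conf file (optional, defaults to cluster.conf)
  "${DIRECTOR_BIN:-cloudera-director}" bootstrap-remote "${4:-cluster.conf}" \
    --lp.remote.username="$1" \
    --lp.remote.password="$2" \
    --lp.remote.hostAndPort="$3"
}
```

A call could look like `bootstrap_cluster admin "$PASSWORD" director.example.com:7189`, where the host name is a placeholder. Prefixing the call with `DIRECTOR_BIN=echo` prints the arguments instead of contacting the server, which is a convenient way to check them first.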

Demo: Analyzing US flights

This demo is for Big Data Analytics Tokyo.

It visualizes US flight data and builds a linear regression model to predict delays.

You can also view it on RPubs (Japanese version).

Loading external tables on S3

scripts/create_external_table.sql creates the external tables. Before running the query, upload the airlines_parquet and airports data to S3, and replace the bucket name in the script with your own.
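
The two preparation steps above could be sketched as the shell helpers below. This is only an illustration: the function names, the `AWS_BIN` override, and every bucket name and path are hypothetical, and the S3 key layout that the SQL script expects is not stated here, so check the script before uploading. The upload step assumes the AWS CLI is installed and configured.

```shell
# Sketch only: all names and paths here are illustrative assumptions.

# Upload a local directory to s3://<bucket>/<prefix>/ with the AWS CLI.
# AWS_BIN is a hypothetical override for dry runs (for example AWS_BIN=echo).
upload_dataset() {
  # $1 = local directory, $2 = bucket name, $3 = key prefix
  "${AWS_BIN:-aws}" s3 sync "$1" "s3://$2/$3/"
}

# Print a SQL file with one bucket name swapped for another.
# Matching on "://<bucket>/" covers s3://, s3a:// and s3n:// URIs alike.
# A dot in the old bucket name is treated as a regex wildcard.
with_bucket() {
  # $1 = bucket name currently in the script, $2 = your bucket, $3 = SQL file
  sed "s|://$1/|://$2/|g" "$3"
}
```

For example, `with_bucket old-bucket my-bucket scripts/create_external_table.sql > my_tables.sql` writes an adjusted copy without modifying the original, where `old-bucket` stands for whatever bucket name the script currently contains.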