This repository has been archived by the owner on Apr 23, 2023. It is now read-only.

Build script and Demo for Cloudera Director with Sparklyr

chezou/cloudera-sparklyr

cloudera-sparklyr

This repo includes:

  • scripts for building a sparklyr cluster with Cloudera Director
  • a demo that uses sparklyr to analyze US flights

Scripts for Cloudera Director

The scripts are based on director-sparklyr-bootstrap.

If you have trouble building a cluster with director-sparklyr-bootstrap, you can set scripts/bootstrap.sh as the bootstrapScript of each template.

Then log in to the gateway node and run scripts/post_crete_script_on_gateway.sh.
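
As a rough sketch, the gateway step could be run from your workstation over SSH. The host name, login user, and the `run_on_gateway` helper below are assumptions for illustration, not part of this repo. The sketch also assumes the script needs no interactive input and that your SSH key is already authorized on the gateway.

```shell
set -eu

# Hypothetical values: replace with your gateway's address and login user.
GATEWAY_HOST="${GATEWAY_HOST:-gateway.example.com}"
SSH_USER="${SSH_USER:-ec2-user}"

# Stream a local script to bash on the remote host over SSH.
run_on_gateway() {
  local script="$1"
  ssh "${SSH_USER}@${GATEWAY_HOST}" 'bash -s' < "$script"
}

# Usage, from the repository root:
#   run_on_gateway scripts/post_crete_script_on_gateway.sh
```

If the script expects to be run from a particular directory or with elevated privileges, copying it to the gateway and running it in a login session may be the safer choice.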

Configuration file for the Cloudera Director client

cluster.conf requires Director 2.3 or later.

It assumes the Tokyo region, so you should replace several settings to match your own environment.

If you have not installed the Director client, you can use the Docker-based tool for the Cloudera Director client.

Demo: Analyzing US flights

This demo was prepared for Big Data Analytics Tokyo.

It visualizes US flight data and builds a linear regression model to predict delays.

You can also view it on RPubs (Japanese version).

Loading an external table from S3

scripts/create_external_table.sql creates the external tables. Before running the query, upload the airlines_parquet and airports data to S3 and replace the bucket name in the script with your own.
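
The preparation steps above might look like the following sketch. The bucket name `my-bucket`, the placeholder `YOUR_BUCKET`, the local directory names, the S3 key prefixes, and both helper functions are assumptions for illustration. Check scripts/create_external_table.sql for the bucket name it actually contains and the locations it expects.

```shell
set -eu

# Hypothetical values: substitute your own bucket and local data directories.
NEW_BUCKET="${NEW_BUCKET:-my-bucket}"
OLD_BUCKET="${OLD_BUCKET:-YOUR_BUCKET}"  # assumed: the bucket name currently written in the SQL file

# Write a copy of the SQL script with every occurrence of the old bucket name replaced.
# Note: sed treats OLD_BUCKET as a regular expression, so a "." in it matches any character.
replace_bucket() {
  local src="$1" dst="$2"
  sed "s|${OLD_BUCKET}|${NEW_BUCKET}|g" "$src" > "$dst"
}

# Upload both datasets with the AWS CLI (local paths and key prefixes are assumed).
upload_data() {
  aws s3 cp ./airlines_parquet "s3://${NEW_BUCKET}/airlines_parquet/" --recursive
  aws s3 cp ./airports "s3://${NEW_BUCKET}/airports/" --recursive
}

# Typical order:
#   upload_data
#   replace_bucket scripts/create_external_table.sql /tmp/create_external_table.sql
```

Writing the result to a separate file keeps the original script unchanged, so the substitution can be repeated for another bucket. Run the rewritten file with whichever SQL client you use on the cluster.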
