Chan edited this page Aug 8, 2017 · 49 revisions

Table of Contents

For users, installers, and other persons interested in the KAVE, or developing solutions on top of a KAVE.

Kave on Azure

For contributors

For someone who modifies the AmbariKave code itself and contributes to this project. Persons working on top of existing KAVEs or developing solutions on top of KAVE don't need to read any of this second part.

KPMG Analytics and Visualization Environment

Welcome to the wiki page for KAVE

Who are you?

The KAVE is modular: you can pick and choose your components from a complete toolkit, which looks like this:

The KAVE combines the power of a Lambda Stack with development and core analysis tools.

The KAVE is an extension of Apache Ambari. The versioning diagram can be seen below:

(read more here about Kave Versioning)

Introduction: If you were asked to design the future of data science, how would you do it?

Perhaps you would build a skilled team comprising:

  • big-data experts, scientists and statisticians from the world-renowned physics laboratory and big data powerhouse of CERN,
  • adding enterprise programmers with a computational science background,
  • adding data architects working with existing commercial datasets and platforms,
  • inserting the business know-how of commerce by bringing together sector experts in Finance, Healthcare, Public Sector and Retail,
  • heading the team with the UvA professor of Big Data Ecosystems,
  • and bringing them underneath the umbrella of a globally distributed organization of thousands of support staff like KPMG.

Q: What happens when you do that? A: You get the KAVE

The Core Working Principles

  • Zero data leaks
  • Multi-dimensional Scalability
  • Flexibility for Analysts
  • Evolves with you as you need it
  • The best of the best open-source software

The Installer:

  • For your entire cluster? Or a single-node Hadoop? Choose AmbariKave
  • For your single laptop? Or your single workstation? Or your single VM? Choose KaveToolbox

Access, Security and Privacy

  • We favor a model where the data sent to your KAVE is carefully considered a priori, and all analysts then have access to all of the data stored there.
  • Network, privacy, user rights and security are discussed in Access and Security
  • Access through SSH requires a local SSH client. PuTTY can be used for this purpose. See Accessing your KAVE

The Components

Core Analytics Tools | Data: HDP / Lambda Stack | Development Line | CI

Collaboration Tools | Misc | Management, Security, Access

Accessing your KAVE

Accessing KAVE is covered in the video tutorial here:

Connecting to KAVE

Or see the beginning of the text guide here: Accessing your KAVE

Licensing of KAVE

In principle, the KAVE licensing permits free and open use of the software for any purpose and free and open use of the documentation for any purpose. It slightly restricts modification of our imagery, to prevent third parties from unduly profiting by rebranding their products under the same imagery without correct attribution.

These details are repeated in the LICENSE and NOTICE files held with the software and distributed with the product.

Licensing of bundled components

KAVE gathers together a toolkit of pre-existing third-party open-source software components. These components are governed by their own licences, which the KAVE installer does not modify or supersede; please consult the originating authors. Together, the components carry a mixture of the following licences: Apache 2.0, GPL 2.0, AGPL, LGPL, ZPL, MIT, PSF, BSD and some BSD-like simple licences. For scipy and ipython see: .