KPMG Analytics and Visualization Environment
Welcome to the wiki page for KAVE
Who are you?
- I am an implementer, I need to know how these tools can fit together.
- I am a data scientist, what tools does KAVE give me?
- I am hands-on, what will the interfaces be like?
- I am a software engineer, what tools does KAVE give me?
- I am a data architect or data engineer, what tools does KAVE give me?
- I am a sys admin, I want to install KAVE.
- I am in network and information security, tell me what controls are in place!
- I am a business expert, or executive, explain KAVE to me.
- ==> Keep reading below!
The KAVE is modular: you can pick and choose your components from a complete toolkit, which looks like this:
The KAVE combines the power of a Lambda Stack with development and core analysis tools.
The KAVE is an extension of Apache Ambari. The versioning diagram can be seen below:
(read more here about Kave Versioning)
Introduction: If you were asked to design the future of data science, how would you do it?
Perhaps you would build a skilled team comprising:
- big-data experts, scientists and statisticians from the world-renowned physics laboratory and big data powerhouse of CERN,
- adding enterprise programmers with a computational science background,
- adding data architects working with existing commercial datasets and platforms,
- inserting the business knowhow of commerce by bringing together sector experts in Finance, Healthcare, Public Sector and Retail,
- heading the team with the UvA professor of Big Data Ecosystems,
- and bringing them underneath the umbrella of a globally distributed organization of thousands of support staff like KPMG.
Q: What happens when you do that? A: You get the KAVE.
The Core Working Principles
- Zero data leaks
- Multi-dimensional Scalability
- Flexibility for Analysts
- Evolves with you as you need it
- The best of the best open-source software
- For your entire cluster? Or a single-node Hadoop? Choose AmbariKave
- For your single laptop? Or your single workstation? Or your single VM? Choose KaveToolbox
Access, Security and Privacy
- We favor a model where the data sent to your KAVE is carefully considered a priori, and all analysts then have access to all of the data stored there.
- Network, privacy, user rights and security are discussed in Access and Security
- Access through SSH requires a local SSH client; PuTTY can be used for this purpose. See Accessing your KAVE
| Core Analytics Tools | Data: HDP / Lambda Stack | Development Line CI |
| Collaboration Tools | Misc | Management, Security, Access |
Accessing your KAVE
Accessing KAVE is covered in the video tutorial here: https://www.youtube.com/watch?v=eBgr2wXjOZw
Or see the beginning of the text guide here: Accessing your KAVE
Licensing of KAVE
- Software: http://opensource.org/licenses/Apache-2.0 (c) KPMG Advisory N.V. 2016 (unless otherwise stated)
- Documentation: http://creativecommons.org/licenses/by/4.0/ where not covered by above (c) KPMG Advisory N.V. 2016 (unless otherwise stated)
- Branding: original images, the KAVE logo, and the name "KAVE": distributed under (c) KPMG Advisory N.V. 2016 with both http://creativecommons.org/licenses/by-nd/4.0/ and http://creativecommons.org/licenses/by-nc-sa/4.0/ (unless otherwise stated)
In principle, this permits free and open use of both the software and the documentation for any purpose. It slightly restricts modification of our imagery, to prevent third parties from unduly profiting by rebranding their products under the same imagery without correct attribution.
These details are repeated in the LICENSE and NOTICE files held with the software and distributed with the product.
Licensing of bundled components
KAVE gathers together a toolkit of pre-existing third-party open-source software components. These components are governed by their own licences, which the KAVE installer does not modify or supersede; please consult the originating authors. Taken together, the components carry a mixture of the following licences: Apache 2.0, GPL 2.0, AGPL, LGPL, ZPL, MIT, PSF, BSD and some BSD-like simple licences. For scipy and ipython see: http://docs.continuum.io/anaconda/licenses.html .