This is the code repository for Apache Hadoop 3 Quick Start Guide, published by Packt.
Learn about big data processing and analytics
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be stored and processed efficiently across a cluster of machines, instead of relying on one large computer to store and process the data. This book will get you started with the Hadoop ecosystem and introduce you to the main technical topics, including MapReduce, YARN, and HDFS.
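To make the MapReduce idea mentioned above more concrete, here is a minimal sketch of a word count in plain Java. It is not taken from the book and does not use the Hadoop API. The class name `MapReduceSketch` and its methods are hypothetical, chosen only to mirror the map, shuffle, and reduce phases. On a real cluster, Hadoop would run these phases in parallel across many machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce model (hypothetical names, no Hadoop API).
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key here).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence on a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hadoop stores data", "Hadoop processes data");
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(lines));
    }
}
```

The sketch keeps everything in memory and on one thread, so it only shows the shape of the computation, not how Hadoop distributes it.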
This book covers the following exciting features:
- Store and analyze data at scale using HDFS, MapReduce and YARN
- Install and configure Hadoop 3 in different modes
- Use YARN effectively to run different applications on a Hadoop-based platform
- Understand and monitor how a Hadoop cluster is managed
- Consume streaming data using Storm, and then analyze it using Spark
If you feel this book is for you, get your copy today!
All of the code is organized into folders. For example, Chapter02.
The code will look like the following:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.0</version>
    </dependency>
</dependencies>
This book is for aspiring big data professionals who want to learn the essentials of Hadoop 3. Existing Hadoop users who want to get up to speed with the new features introduced in Hadoop 3 will also benefit from it. Knowledge of Java programming is an added advantage.
With the following software and hardware list, you can run all the code files in the book (Chapters 1-8).
| Chapter | Software required | OS required |
|---|---|---|
| 2 to 8 | OpenJDK 1.8.0_171 (64-bit), Apache Hadoop 3.1.0 | Ubuntu 16.04.3 LTS |
Hrishikesh Vijay Karambelkar is an innovator and an enterprise architect with 16 years of software design and development experience, specifically in the areas of big data, enterprise search, data analytics, text mining, and databases. He is passionate about architecting new software implementations for the next generation of software solutions for various industries, including oil and gas, chemicals, manufacturing, utilities, healthcare, and government infrastructure. In the past, he has authored three books for Packt Publishing: two editions of Scaling Big Data with Hadoop and Solr and one of Scaling Apache Solr. He has also worked with graph databases, and some of his work has been published at international conferences such as VLDB and ICDE.
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.