This is the code repository for Simplify Big Data Analytics with Amazon EMR, published by Packt.
A beginner’s guide to learning and implementing Amazon EMR for building data analytics solutions
Amazon EMR, formerly Amazon Elastic MapReduce, provides a managed Hadoop cluster in Amazon Web Services (AWS) that you can use to implement batch or streaming data pipelines. By gaining expertise in Amazon EMR, you can design and implement data analytics pipelines with persistent or transient EMR clusters in AWS.
This book covers the following exciting features:
- Explore Amazon EMR features, architecture, Hadoop interfaces, and EMR Studio
- Configure, deploy, and orchestrate Hadoop or Spark jobs in production
- Implement the security, data governance, and monitoring capabilities of EMR
- Build applications for batch and real-time streaming data analytics solutions
- Perform interactive development with a persistent EMR cluster and Notebook
- Orchestrate an EMR Spark job using AWS Step Functions and Apache Airflow
If you feel this book is for you, get your copy today!
All of the code is organized into folders, one per chapter. For example, Chapter02.
The code will look like the following:

    "Properties": {
        "mapred.tasktracker.map.tasks.maximum": "10",
        "mapreduce.map.sort.spill.percent": "0.80",
        "mapreduce.tasktracker.reduce.tasks.maximum": "20"
    }
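The `Properties` fragment above is only part of a configuration. As a minimal sketch, the Python snippet below wraps it in a configuration object with a `Classification` key and serializes the result as a JSON list, which is the general shape EMR uses for application configurations. The `mapred-site` classification and the `build_configuration` helper are assumptions made for illustration and are not taken from the book; the property names and values are copied verbatim from the fragment.

```python
import json

# Property names and values copied verbatim from the fragment above.
properties = {
    "mapred.tasktracker.map.tasks.maximum": "10",
    "mapreduce.map.sort.spill.percent": "0.80",
    "mapreduce.tasktracker.reduce.tasks.maximum": "20",
}


def build_configuration(classification, props):
    """Hypothetical helper: wrap a Properties mapping in one configuration object."""
    # Values are kept as strings, matching the fragment above.
    return {
        "Classification": classification,
        "Properties": {key: str(value) for key, value in props.items()},
    }


# "mapred-site" is an assumed classification, chosen only for illustration.
configurations = [build_configuration("mapred-site", properties)]

# Print the JSON document that could be saved to a file and reviewed.
print(json.dumps(configurations, indent=2))
```

Keeping the properties in a plain dictionary makes it easy to review or adjust the values before they are written out as JSON.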
Following is what you need for this book: This book is for data engineers, data analysts, data scientists, and solution architects who are interested in building data analytics solutions with Hadoop ecosystem services and Amazon EMR. Prior experience in Python, Scala, or Java, along with a basic understanding of Hadoop and AWS, will help you make the most of this book.
With the following software and hardware list, you can run all the code files present in the book (Chapters 1-14).
| Chapter | Software required | OS required |
|---|---|---|
| 1-14 | EMR version 6.3 to 6.5 | Windows, Mac OS X, and Linux (Any) |
| 1-14 | Spark 3.1 | Windows, Mac OS X, and Linux (Any) |
| 1-14 | Python 3/PySpark | Windows, Mac OS X, and Linux (Any) |
| 1-14 | SSH client/PuTTY | Windows, Mac OS X, and Linux (Any) |
We also provide a PDF file that has color images of the screenshots/diagrams used in this book.
The Code in Action videos for this book can be viewed at https://bit.ly/3HM9dpj.
Sakti Mishra is an engineer, architect, author, and technology leader with over 16 years of experience in the IT industry. He is currently working as a senior data lab architect at Amazon Web Services (AWS). He is passionate about technology and has expertise in big data, analytics, machine learning, artificial intelligence, graph networks, web/mobile applications, and cloud technologies such as AWS and Google Cloud Platform. Sakti has a bachelor’s degree in engineering and a master’s degree in business administration. He holds several certifications in Hadoop, Spark, AWS, and Google Cloud. He is also the author of multiple technology blogs, workshops, and white papers, and a public speaker who represents AWS across various domains and events.
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
