PacktPublishing/Simplify-Big-Data-Analytics-with-Amazon-EMR-

Simplify Big Data Analytics with Amazon EMR

This is the code repository for Simplify Big Data Analytics with Amazon EMR, published by Packt.

A beginner’s guide to learning and implementing Amazon EMR for building data analytics solutions

What is this book about?

Amazon EMR, formerly Amazon Elastic MapReduce, provides a managed Hadoop cluster in Amazon Web Services (AWS) that you can use to implement batch or streaming data pipelines. By gaining expertise in Amazon EMR, you can design and implement data analytics pipelines with persistent or transient EMR clusters in AWS.
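The difference between a persistent and a transient cluster can be sketched as a single flag in the cluster request. The example below is a minimal illustration rather than code from the book: it only builds a dictionary shaped like the keyword arguments of boto3's EMR `run_job_flow` call and makes no AWS request. The helper name `build_cluster_request`, the instance type, the instance count, and the cluster names are assumptions chosen for illustration; the release label is picked from the EMR 6.3 to 6.5 range this repository lists.

```python
def build_cluster_request(name, transient):
    """Build keyword arguments shaped for boto3's EMR run_job_flow call.

    Hypothetical helper: it only assembles a dictionary and never contacts AWS.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.5.0",  # assumption: any release from 6.3 to 6.5
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption: illustrative size
            "SlaveInstanceType": "m5.xlarge",   # assumption: illustrative size
            "InstanceCount": 3,                 # assumption: illustrative count
            # A persistent cluster stays up when it has no steps left to run;
            # a transient cluster terminates once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR instance profile name
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role name
    }


persistent_request = build_cluster_request("interactive-dev", transient=False)
transient_request = build_cluster_request("nightly-batch", transient=True)

print(persistent_request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
print(transient_request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

With boto3 installed and AWS credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**transient_request)`. A transient request would normally also carry a `Steps` list, since a cluster with no steps and the keep-alive flag disabled has nothing to run.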

This book covers the following exciting features:

  • Explore Amazon EMR features, architecture, Hadoop interfaces, and EMR Studio
  • Configure, deploy, and orchestrate Hadoop or Spark jobs in production
  • Implement the security, data governance, and monitoring capabilities of EMR
  • Build applications for batch and real-time streaming data analytics solutions
  • Perform interactive development with a persistent EMR cluster and Notebook
  • Orchestrate an EMR Spark job using AWS Step Functions and Apache Airflow

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigation

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

"Properties": {
  "mapred.tasktracker.map.tasks.maximum": "10",
  "mapreduce.map.sort.spill.percent": "0.80",
  "mapreduce.tasktracker.reduce.tasks.maximum": "20"
}
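On EMR, a "Properties" map like the one above normally sits inside a configuration object next to a "Classification" key that names the configuration file to change. The sketch below shows one way to assemble that structure in Python. It assumes the properties belong to the `mapred-site` classification, which the fragment above does not state, and the helper name `build_configuration` is hypothetical.

```python
import json


def build_configuration(classification, properties):
    """Wrap a properties map in an EMR-style configuration object."""
    return {
        "Classification": classification,
        # Property values are written as strings, as in the fragment above.
        "Properties": {key: str(value) for key, value in properties.items()},
    }


# Assumption: these MapReduce settings are applied through mapred-site.
configurations = [
    build_configuration(
        "mapred-site",
        {
            "mapred.tasktracker.map.tasks.maximum": 10,
            "mapreduce.map.sort.spill.percent": "0.80",
            "mapreduce.tasktracker.reduce.tasks.maximum": 20,
        },
    )
]

configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The printed JSON is a list with one configuration object. Saved to a file, a list like this could be supplied when creating a cluster, for example through the `--configurations` option of `aws emr create-cluster`.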

Following is what you need for this book: This book is for data engineers, data analysts, data scientists, and solution architects who are interested in building data analytics solutions with Hadoop ecosystem services and Amazon EMR. Prior experience in Python, Scala, or Java and a basic understanding of Hadoop and AWS will help you get the most out of this book.

The software and hardware listed below are enough to run all the code files in the book (Chapters 1-14).

Software and Hardware List

| Chapter | Software required | OS required |
| ------- | ----------------- | ----------- |
| 1-14 | EMR version 6.3 to 6.5 | Windows, Mac OS X, and Linux (Any) |
| 1-14 | Spark 3.1 | Windows, Mac OS X, and Linux (Any) |
| 1-14 | Python 3/PySpark | Windows, Mac OS X, and Linux (Any) |
| 1-14 | SSH client/PuTTY | Windows, Mac OS X, and Linux (Any) |

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Code in Action

The Code in Action videos for this book can be viewed at https://bit.ly/3HM9dpj.

Get to Know the Author

Sakti Mishra is an engineer, architect, author, and technology leader with over 16 years of experience in the IT industry. He is currently working as a senior data lab architect at Amazon Web Services (AWS). He is passionate about technology and has expertise in big data, analytics, machine learning, artificial intelligence, graph networks, web/mobile applications, and cloud technologies such as AWS and Google Cloud Platform. Sakti has a bachelor’s degree in engineering and a master’s degree in business administration. He holds several certifications in Hadoop, Spark, AWS, and Google Cloud. He has also authored multiple technology blogs, workshops, and white papers, and is a public speaker who represents AWS in various domains and events.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781801071079
