
PacktPublishing/Serverless-ETL-and-Analytics-with-AWS-Glue



Serverless ETL and Analytics with AWS Glue


This is the code repository for Serverless ETL and Analytics with AWS Glue, published by Packt.

Your comprehensive reference guide to learning about AWS Glue and its features

What is this book about?

Organizations increasingly turn to services such as AWS Glue, which take on undifferentiated heavy lifting and provide serverless Spark, so you can create and manage data lakes without managing servers. This guide shows you how to use AWS Glue to solve real-world problems while teaching you about data processing, data integration, and building data lakes.

This book covers the following exciting features:

  • Apply various AWS Glue features to manage and create data lakes
  • Use Glue DataBrew and Glue Studio for data preparation
  • Optimize data layout in cloud storage to accelerate analytics workloads
  • Manage metadata including database, table, and schema definitions
  • Secure your data with access control, encryption, auditing, and networking
  • Monitor AWS Glue jobs to detect delays and loss of data
  • Integrate Spark ML and SageMaker with AWS Glue to create machine learning models

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter03.

The code will look like the following:

root
|-- ColumnA: string (nullable = true)
|-- ColumnB: string (nullable = true)

Following is what you need for this book: This book is for data engineers, ETL developers, and data analysts who want to learn how to use AWS Glue for data processing, data integration, and building data lakes.

With the following software and hardware list, you can run all code files present in the book (Chapters 1-15).

Software and Hardware List

Chapter | Software required | OS required
1-15 | An AWS account; the AWS CLI; a web browser (Google Chrome, Mozilla Firefox, Microsoft Edge, or Safari) | Windows, Mac OS X, and Linux (Any)

Not every chapter's walkthrough requires the AWS CLI. Each chapter tells you when you need anything beyond the list above.

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.


Get to Know the Authors

Vishal Pathak is a Data Lab Solutions Architect at AWS. Vishal works with customers on their use cases, architects solutions to solve their business problems, and helps them build scalable prototypes. Before joining AWS, Vishal helped customers implement business intelligence, data warehouse, and data lake projects in the US and Australia.

Subramanya Vajiraya is a Big Data Cloud Engineer at AWS Sydney specializing in AWS Glue. He obtained his Bachelor of Engineering degree in Information Science & Engineering from NMAM Institute of Technology, Nitte, KA, India (Visvesvaraya Technological University, Belgaum) in 2015, and his Master of Information Technology degree, specializing in Internetworking, from the University of New South Wales, Sydney, Australia, in 2017. He is passionate about helping customers solve challenging technical issues related to their ETL workloads and about implementing scalable data integration and analytics pipelines on AWS.

Noritaka Sekiyama is a Senior Big Data Architect on the AWS Glue and AWS Lake Formation team. He has 11 years of experience working in the software industry. Based in Tokyo, Japan, he is responsible for implementing software artifacts, building libraries, troubleshooting complex issues, and helping guide customer architectures.

Tomohiro Tanaka is a senior cloud support engineer at AWS. He works to help customers solve their issues and build data lakes across AWS Glue, AWS IoT, and big data technologies such as Apache Spark, Hadoop, and Iceberg.

Albert Quiroga works as a senior solutions architect at Amazon, where he is helping to design and architect one of the largest data lakes in the world. Prior to that, he spent four years working at AWS, where he specialized in big data technologies such as EMR and Athena, and where he became an expert on AWS Glue. Albert has worked with several Fortune 500 companies on some of the largest data lakes in the world and has helped to launch and develop features for several AWS services.

Ishan Gaur has more than 13 years of IT experience in software development and data engineering, building distributed systems and highly scalable ETL pipelines using Apache Spark, Scala, and various ETL tools such as Ab Initio and DataStage. He currently works at AWS as a senior big data cloud engineer and is an AWS Glue subject matter expert (SME). He is responsible for helping customers build out large, scalable distributed systems and implement them in AWS cloud environments using various big data services, including EMR, Glue, and Athena, as well as other technologies, such as Apache Spark, Hadoop, and Hive.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781800564985
