Getting Started with DuckDB

This is the code repository for Getting Started with DuckDB, published by Packt.

A practical guide for accelerating your data science, data analytics, and data engineering workflows

What is this book about?

This hands-on book teaches you to analyze large datasets with blazing speed and ease. You will learn how to use DuckDB to quickly load, query, transform, analyze, and visualize data effectively through a series of practical examples.

This book covers the following exciting features:

  • Understand the properties and applications of a columnar in-process database
  • Use SQL to load, transform, and query a range of data formats
  • Discover DuckDB's rich extensions and learn how to apply them
  • Use nested data types to model semi-structured data and extract and model JSON data
  • Integrate DuckDB into your Python and R analytical workflows
  • Effectively leverage DuckDB's convenient SQL enhancements
  • Explore the wider ecosystem and pathways for building DuckDB-powered data applications

If you feel this book is for you, get your copy today! https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders named by chapter, for example, Chapter02.

The code will look like the following:

CREATE TABLE foods (
    food_name VARCHAR PRIMARY KEY,
    color VARCHAR,
    calories INT,
    is_healthy BOOLEAN
);
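As a minimal sketch of how a table like this might be used, the statements below insert a few rows into foods and query them. The sample rows are hypothetical and are not taken from the book; they only assume that the foods table above has already been created in a DuckDB session.

```sql
-- Hypothetical sample rows; the values are illustrative, not from the book.
INSERT INTO foods (food_name, color, calories, is_healthy) VALUES
    ('apple', 'red', 100, true),
    ('banana', 'yellow', 100, true),
    ('cookie', 'brown', 200, false);

-- Return only the healthy foods, lowest-calorie first.
SELECT food_name, calories
FROM foods
WHERE is_healthy
ORDER BY calories, food_name;
```

Because food_name is declared as the primary key, inserting a second row with an existing food_name would be rejected with a constraint error.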

Who this book is for: If you’re interested in expanding your analytical toolkit, this book is for you. It will be particularly valuable for data analysts wanting to rapidly explore and query complex data, data and software engineers looking for a lean and versatile data processing tool, and data scientists needing a scalable data manipulation library that integrates seamlessly with Python and R. You will get the most from this book if you have some familiarity with SQL and foundational database concepts, as well as exposure to a programming language such as Python or R.

With the following software and hardware, you can run all of the code files in the book (Chapters 1-12).

Software and Hardware List

| Chapter | Software required        | OS required              |
|---------|--------------------------|--------------------------|
| 1-12    | The DuckDB CLI client    | Windows, macOS, or Linux |
| 1-12    | The DuckDB Python client | Windows, macOS, or Linux |
| 1-12    | The DuckDB R client      | Windows, macOS, or Linux |

Get to Know the Authors

Simon Aubury is a data engineering specialist, with an extensive background in building large, flexible, highly-available distributed data systems. He has delivered critical data systems for finance, transport, healthcare, insurance, and telecommunications clients in Australia, Europe, and Asia Pacific.

Ned Letcher is a data science specialist who has worked across a range of industries, designing and building data-powered products and services. He has helped many individuals and teams apply best practices to data workflows, having worked as a Python trainer in enterprise and tertiary education. Ned works at Thoughtworks as a lead data science engineer.
